Unpack an Integer with SQL Server Recursion

By Frank Solomon

Starting with SQL Server 2005, developers have had recursion available as a T-SQL language feature. This article will describe recursion and its SQL Server implementations, complete with examples. It will also include SQL Server functions and a stored procedure that unpacks, or parses, an integer into its multiple-of-two components.

An algorithm that references, or calls, itself a finite number of times is called recursive. Although a recursive solution can use more memory and server resources compared to a more conventional approach, recursion can offer faster, simpler development and for some problems, recursion is the ideal solution. The factorial algorithm is a classic example of recursion. Mathematically, the factorial of a positive integer n is the product of the positive integers less than or equal to n. For example, this calculates the factorial of seven:

7! = 7 x 6 x 5 x 4 x 3 x 2 x 1 = 5040

We can certainly write a looping algorithm to calculate the factorial, but we can also write a recursive algorithm. In pseudo-code, it would look like this:

function fact(n)
{
   if (n > 1)
      return (n * fact(n - 1)); -- recursive call
   else
      return 1;                  -- base case
}

For a positive integer, n greater than one, the function recursively calls itself with (n – 1). This is key – the recursive step drives the function closer to the base case, to reach that case in a finite number of steps. At the final step, n = 1, the function returns the computed value.

Recursive stored procedure SP_factorial SP_1 calculates the factorial for an integer between 0 and 20. Like all the functions and stored procedures in this article, it starts with USE [master], but it can go in any database. To prevent overflow crashes, it restricts @param to values between 0 and 20. Otherwise, it looks and works a lot like the pseudo-code above. Starting with input parameter @param, the recursive calls drive closer to the base case, to finally finish the stored procedure calculation. When run in the SQL Server 2012 debugger, this behavior becomes clear when called with this code:

DECLARE  @fact bigint
EXEC     SP_factorial 5, @fact OUTPUT
SELECT   @fact

Recursive function FN_factorial FN_1 calculates the factorial for an integer between 0 and 20. This also looks and works like the pseudo-code and the stored procedure above. Starting with input parameter @param, the recursive calls drive closer to the base case, to finally finish the function calculation. This statement:

SELECT dbo.FN_factorial(5)

shows the recursive behavior in the SQL Server 2012 debugger.

The third recursive element available in SQL Server is the recursive Common Table Expression (CTE). A CTE works like a view, except it scopes to the query that declared it. Both functions and stored procedures can have them. A recursive factorial CTE in a function might look something like FN_CTE_factorial FN_2. We can easily modify this function to return a single value for the computed factorial for input parameter @param, but the present structure of the function better illustrates how it works, and this will help explain the parsing stored procedure and functions we’ll see shortly. Unlike the recursive stored procedure and function above, the base case must come before the recursive call. Also, unlike the examples above, this CTE “moves away” from the base case, and the WHERE clause in the recursive step tells the CTE when to stop – in this case, when CTE column param_val exceeds input parameter @param. This statement:

SELECT param_val, fact from FN_CTE_factorial (10)

returns this:

param_val  fact
---------- -------
0          1     -->   base case
1          1     -->   recursive calls
2          2        
3          6
4          24
5          120
6          720
7          5040
8          40320
9          362880
10         3628800

result set. The added comments in the stored procedure show how the CTE works. Starting with the base case, each recursive call recalculates param_val by adding one to the most recent param_val value above it, and recalculates fact by multiplying the most recent fact value above it by (param_val + 1), also using the most recent param_val value for this calculation. The recursive calls proceed as long as param_val is less than input parameter value @param. Recursive CTE’s can handle between 0 and 32767 recursion levels, with 100 as the default. To change the limit, place the MAXRECURSION hint below the SELECT statement that calls the CTE. Remember that recursion has a potential price – slow performance and high system resource demand. Therefore, it must be used with care. However, it is a useful tool and we’ll now use it to solve a real-world problem.

This screenshot PNG_1 shows a Visual Studio listbox with the SelectionMode property set to MultiSimple. This property setting allows multiple row selections. Each row shows its data value in parentheses for clarification, although a production application probably would not do this. Black maps to 1, clear maps to 4, silver maps to 256, etc. Notice that each data value is a multiple of two. We want to use integer multiples of two here because in a positive number, the largest count of component integers will involve multiples of two. Using the listbox example above, we see:

999 = (9 * 10

2

) + (9 * 10

1

) + (9 * 10

0

)   -- Base 10 integers
   
    = 900 + 99 + 9

999 = (1 * 2

9

) + (1 * 2

8

) + (1 * 2

7

) +    -- Integer multiples
      (1 * 2

6

) + (1 * 2

5

) + (1 * 2

2

) +    -- of 2
      (1 * 2

1

) + (1 * 2

0

)

    = 512 + 256 + 128 + 64 + 32 + 4 + 2 + 1

So to get the most information with the most component integers possible, we should use integer multiples of 2. An application using this listbox will gather all the data values together and ultimately pass them as parameters to an embedded query, a database function, or a stored procedure. We can easily build a string with the individual selected values at this point, but we could also add all the selected data values and pass that sum as a single integer value to the database. In this example, a single integer substituting for eight integers would help decrease the workload on the server for a high volume application, and make the application logic simpler. However, the function or stored procedure receiving it must somehow “unpack”, or parse, that single integer into its component integers. SQL Server 2012 stored procedure SP_parseparam_output_varchar and SQL Server 2012 functions FN_parseparam_TV_four_cols and FN_convert_TV_to_varchar will handle this. In the function names, “TV” means “Table Variable”. Stored procedure SP_parseparam_output_varchar sets a VARCHAR output variable with all the parsed components of a positive integer, and function FN_parseparam_TV_four_cols returns a table variable. Function FN_convert_TV_to_varchar also returns these values as a VARCHAR. We’ll first look at stored procedure SP_parseparam_output_varchar.

Stored procedure SP_parseparam_output_varchar SP_2 takes BIGINT parameter @param and returns the component integers in a comma-delimited varchar. Although this stored procedure operates only on integer values, it has a BIGINT datatype parameter to offer flexibility if a call to it passes a BIGINT value. These lines:

DECLARE  @parse_string NVARCHAR(250) = ''
EXEC     SP_parseparam_output_varchar 999, @parse_string OUTPUT
SELECT   @parse_string AS 'parse_string'

Return:

parse_string
--------------------------------
(512, 256, 128, 64, 32, 4, 2, 1)
(1 row(s) affected)

The stored procedure defines @parse_string as an output parameter because each recursive step would overwrite @parse_string if defined as an internal variable. First, it validates @param, allowing positive values up to (231 – 1) and returning NULL for anything else. A stored procedure can handle at most 32 recursion levels; this upper limit guarantees coverage of every possible positive INT value. Remember that 20 = 1, and the stored procedure must account for this component. This is why the upper limit is not (232 – 1). Next, when @param = 0, the stored procedure has reached the base case. If @parse_string is NULL here, it means @param is zero and the stored procedure returns (0). Otherwise, the IF-block cleans the finished @parse_string. The next block handles the recursive step. First, it uses FLOOR(LOG(@param, 2)) to find the base-two LOG value of the largest multiple of two integer component of @param, and uses this value in the POWER function to get the component itself. For example,

DECLARE @param INT
SET @param = 872


SELECT FLOOR(LOG(@param, 2))             --> 9
SELECT POWER(2.0, FLOOR(LOG(@param, 2))) --> 512.0 
SELECT POWER(2.0, FLOOR(LOG(872, 2)))    --> 512.0 
SELECT POWER(2.0, 9)                     --> 512.0

works because 29 = 512. The power function uses 2.0 instead of 2 for the base to prevent overflow issues when the exponent exceeds 30. The stored procedure then appends the new value for @component to @parse_string, and sets @param to @param – @component as the step driving it toward the base case. Then, it calls itself recursively.

SQL Server 2012 function FN_parseparam_TV_four_cols FN_3 uses a CTE to get past the integer parameter datatype restriction of SP_parseparam_output_varchar. This function takes a BIGINT input parameter and returns a four-column table. The largest positive BIGINT number is (263 – 1), but this function can only cover values up to (247 – 1) because the SQL Server LOG function won’t work reliably for numbers of this size or larger. Still, this function will parse numbers with as many as 47 components. An example call to the function:

SELECT      param_val, largest_exp_of_two_in_param_val,
            component, new_param_val
FROM        FN_parseparam_TV_four_cols (122)

Returns the result set below.

param_val   largest_exp_of_two_in_param_val component   new_param_val
---------   ------------------------------- ---------   -------------
122         6                                64           58
58          5                                32           26
26          4                                16           10
10          3                                 8            2
2           1                                 2            0

In each row, the CTE calculates the exponent of the largest multiple of two component in param_val, the component itself, and then subtracts that component from param_val to get new_param_val. This value then becomes param_val for the next row.

The function first validates @param, allowing positive values up to (247 – 1) and returning a row of NULLs for anything else. Next, it calculates the first “largest exponent of two” value for @param. Then comes the CTE itself. The first SELECT builds the base case, and the second SELECT is the recursive call, based on the values of the most recent row above it. The recursive calls continue as long as the new_param_val is greater than zero. Finally, the stored procedure SELECTs from the CTE into table @parsed_vals. The SQL Server 2012 function FN_parseparam_TV_component_col FN_4 returns a table with only the component column values, because a production environment would probably need only these values.

Function FN_convert_TV_to_varchar FN_5 returns the component integers of @param in a comma-delimited VARCHAR, for all positive integer values of @param up to (247 – 1). This contrasts with stored procedure SP_parseparam_output_varchar, which works only up to (231 – 1). This function first calls FN_parseparam_TV_component_col, placing the values into table variable @local_TV. Then, using COALESCE in a SELECT from @local_TV, it concatenates VARCHAR variable @component_list with the values in @local_TV and returns @component_list.

Although recursion does have potential performance and resource drawbacks, it can provide simple, flexible solutions and this makes it a viable option to consider.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles