Take maximum advantage of Grouping Sets, Roll-up, Cube and Pivoting

Updated! Missing images available!

1. INTRODUCTION

This article explains, with detailed examples, how to use PIVOTING to take maximum advantage of queries making use of GROUPING SETS. Besides, ROLLUP and CUBE operators will be analysed as well. We'll see throughout the article that it's very easy to get simple, practical and powerful queries by integrating all of them in the same query.

I think that master these operators contributes to make cleaner code and save a lot of time when developing. Thus, I hope this article will be useful to both improve your knowledge and develop faster cool queries for your reports, forms or views.

2. BACKGROUND

Let's suppose we have a table similar to the one showed below. It's a simple table showing the Quantity of Products sold by every Employee for each ShipCountryin a given company and period of time.

From the previous table, we would like to show a final result set that looks like to the next pictureas our first goal. Background colors are only painted for later explanations.

As you can see, original table has been rotating by converting values of ShipCountry field in rows into new columns and also, new calculations have been performed to obtain:

  • GREEN Cell: Sum of Quantity of Products sold by All Employees in All Countries.
  • YELLOW Cells: Sum of Quantity of Products sold for each Employee in All Countries.
  • ORANGE Cells: Sum of Quantity of Products sold by All Employeesfor each ShipCountry.
  • BLUE Cells: Sum of Quantity of Products sold for each Employee for each ShipCountry.

We'll see how to get these different calculated values by using GROUPING SETS, CUBE or ROLLUP operators over a GROUP BY clause and how to rotate data values in ShipCountry field from the original result set by using PIVOT operator.

3. USING THE CODE TO GET OUR FIRST GOAL

To begin with, we'll create a table with the proper structure:

Copy Code
 create table tbl_Sales(Id int,
    Employee nvarchar(50), 
    ShipCountry nvarchar(50), 
    ProductName nvarchar(50), 
    Quantity int
    primary key (Id)
    )

And we'll populate table with data:

Copy Code
insert into tbl_Sales
values (1,'Antonio Muro','Spain','Corsa',100),
       (2,'Antonio Muro','France','Corsa',34),
       (3,'Antonio Muro','Germany','Corsa',25),
       (4,'Wilson Smith','Spain','Corsa',98),
       (5,'Wilson Smith','France','Corsa',38),
       (6,'Wilson Smith','Germany','Corsa',52),
       (7,'Maria Johnson','Spain','Corsa',120),
       (8,'Maria Johnson','France','Corsa',40),
       (9,'Maria Johnson','Germany','Corsa',60),
       (10,'Antonio Muro','Spain','Insignia',50),
       (11,'Antonio Muro','France','Insignia',13),
       (12,'Antonio Muro','Germany','Insignia',38),
       (13,'Wilson Smith','Spain','Insignia',80),
       (14,'Wilson Smith','France','Insignia',50),
       (15,'Wilson Smith','Germany','Insignia',9),
       (16,'Maria Johnson','Spain','Insignia',32),
       (17,'Maria Johnson','France','Insignia',17),
       (18,'Maria Johnson','Germany','Insignia',21)

Before going on, I would like to point out briefly some issues about how GROUPING SETS works. You can go deeper on it by searching in this site or Microsoft Documentation. There's a lot of information about and there's no sense explain the same again.

On one hand, GROUPING SETS lets you perform aggregations (sum, avg, etc.) over a list of combinations or sets of different fields. Have a look to the next piece of code:

Copy Code
select Employee, 
	   ShipCountry, 
	   SUM(Quantity) as Quantity
from tbl_Sales
group by GROUPING SETS(
		(Employee, Shipcountry), 
		Employee, 
		ShipCountry,
		())    

I'm calculating the Sum of Quantity of Products grouping by four different combinations or sets of fields.

  • Employee, ShipCountry: Employee and ShipCountry.
  • Employee: Only Employee (All ShipCountries are taken for each employee).
  • ShipCountry: Only ShipCountry (All Employees are taken for each ShipCountry).
  • () : All Fields (All ShipCountries and All Employees are taken)

The result from que query is:

  • Employee, ShipCountry: Results in BLUE color.
  • Employee: Results in YELLOW color.
  • ShipCountry: Results in ORANGE Color.
  • () : Results in GREEN Color.

It's important to note that GROUPING SETS operator has calculated new aggregations and has added new rows to the result set. That being said, we can see that null values have been returned in rows that contains aggregations. I mean, a null value in ShipCountry column is returned as a equivalent to All Countries and a null value in Employee field is returned as an All Employees value.

At this point, I would like to mention two new point to consider:

  1. How could I differenciate a null value returned from an aggregation operation instead of a null from the original query?
  2. I dont'like much to deal with null values. Could I do something clearer or better?

To respond question one, there's a function called GROUPING that let us obtain exactly what we are looking for. I will use in the next piece of code. In short, this function returns an tinyint value of 1 when null value comes from aggregation and 0 if not.

To answer question two, I like to substitute null values from aggregations with strings such us All Employees or All Countries. We'll achieve this by using the next improved piece of code.

So, here is our query using GROUPING SETS with GROUPING function inside CASE statements:

Copy Code
select case GROUPING(Employee) 
            when 1 then 'All Employees' 
            else Employee 
        end as Employee, 
       case GROUPING(ShipCountry) 
            when 1 then 'All Countries' 
            else ShipCountry 
        end ShipCountry, 
       sum(Quantity) as Quantity
from tbl_Sales
group by GROUPING SETS(
        (Employee, Shipcountry), 
        Employee, 
        ShipCountry,
        ())

And here is our new obtained result set:

You should note that nullvalues have been replaced with strings All Employees and All Countries. Look at ORANGE, GREEN and YELLOW rows to check this.

So far, we have achieved all the calculations needed to put in our final result set. It will be necessary rotate the table, but we have done at least the half part of our first objective. Any other way of achieving this? Yes. We might have reached it by using CUBE operator instead of GROUPING SETS. Without diving a lot into it, CUBE operator calculates aggregated values for all possible combinations for specified fields in the GROUP BY clause. So, at least, you should know that there are 2n possible combinations, where n is the number of fields included in the GROUP BY clause. In this case n=2 so that there are four different cases or possible combinations. These combinations are exacty the same as used in our previous piece of code.

Anyway, here is the query using CUBE operator:

Copy Code
 select case GROUPING(Employee) 
            when 1 then 'All Employees' 
            else Employee 
        end as Employee, 
       case GROUPING(ShipCountry) 
            when 1 then 'All Countries' 
            else ShipCountry 
        end ShipCountry, 
       sum(Quantity) as Quantity
from tbl_Sales
group by Employee, Shipcountry with CUBE

As I said before, the result is the same as using the previous code. Here is the identical result:

Let's go on with our first example. Now, we just have to rotate CountryShipvalues in rows into new columns. How to get this? It's easy. We've only to make use of PIVOT operator. You can read some of my previous articles to go deeper on it (i.e. Using PIVOT with T-SQL)

Here is our final query using GROUPING SETS and PIVOTING our source query making previous use of a common table expression:

Copy Code
;with CTE_OriginalQuery as 
(select case GROUPING(Employee) 
         when 1 then 'All Employees' 
         else Employee 
       end as Employee, 
       case GROUPING(ShipCountry) 
         when 1 then 'All Countries' 
         else ShipCountry 
       end ShipCountry, 
       sum(Quantity) as Quantity
from tbl_Sales
group by GROUPING SETS(
 (Employee, Shipcountry), 
  Employee, 
  ShipCountry,
 ())
)
select Employee, Spain, France, Germany, [All Countries]
from CTE_OriginalQuery
PIVOT (sum(Quantity) for ShipCountry in 
 (Spain, France, Germany, [All Countries])) as Pivoted_FinalQuery

At last, by executing our final query we obtain our desired result set.

We've achieved our first goal only by using simple SQL (not cursors, not strange code...) I think that it's worthwhile.

To end up this section, only one more issue. Why not using ROLLUP operator? Is it possible to obtain the same result? If not, why? Let's check it with another example:

Our query using Rollup would be:

Copy Code
select case GROUPING(Employee) 
            when 1 then 'All Employees' 
            else Employee 
        end as Employee, 
       case GROUPING(ShipCountry) 
            when 1 then 'All Countries' 
            else ShipCountry 
        end ShipCountry, 
       sum(Quantity) as Quantity
from tbl_Sales
group by Employee, Shipcountry with ROLLUP

And the result of previous query:

Where are the ORANGE rows? They aren't up there. This is because ROLLUP operator doesn't generate aggregations only grouped by CountryShip field. Why? Because ROLLUP just calculates these grouping sets:

  • Employee, ShipCountry: Results in BLUE color.
  • Employee: Results in YELLOW color (All ShipCountries are taken for each employee).
  • () : Results in GREEN Color (All ShipCountries and All Employees are taken).

As you can check, ROLLUP operator only obtains aggregated values in ascending order in the hierarchy, I mean in the specified order of fields in the GROUP BY clause. Thus, aggregated values grouped by only ShipCountries are not calculated. It doesn't match our initial requirements. Anyway, which kind of final result set we would have obtained with this query using ROLLUP and applying PIVOTING. Let's see the differences with our next piece of code and related picture from results.

Query using ROLLUP and then PIVOTING Source Query Data:

Copy Code
;with CTE_OriginalQuery as (
select case GROUPING(Employee) 
         when 1 then 'All Employees' 
         else Employee 
       end as Employee, 
       case GROUPING(ShipCountry) 
         when 1 then 'All Countries' 
         else ShipCountry 
       end ShipCountry, 
       sum(Quantity) as Quantity
from tbl_Sales
group by Employee, Shipcountry with ROLLUP)
select Employee, Spain, France, Germany, [All Countries]
from CTE_OriginalQuery
PIVOT (sum(Quantity) for ShipCountry in (Spain, France, Germany, [All Countries])) as Pivoted_Sales

Results from previous query:

As you can see again, there are no values for All Employees for each single country.

4. ONE MORE ADVANCED EXAMPLE

I think that, at this point, there is enough SQL code for the article, but I would like to finish it showing one more advanced example. What if we wanted to obtain a even more advanced result set like the following? Again, I have given background color to cells only to help to explain the origin for each set of values.

It should be noted that a new column field Product has been added to our final result set. It might seem difficult to obtain this result, but not, it's easy taking into account all previously-explained before. You just have to add some new combinations of GROUPING SETS.

Here is the code with our new GROUPING SETS:

Copy Code
select case GROUPING(Employee) 
        when 1 then 'All Employees' 
        else Employee 
       end as Employee, 
	   case GROUPING(ShipCountry) 
        when 1 then 'All Countries' 
        else ShipCountry 
       end ShipCountry, 
	   case GROUPING(ProductName) 
        when 1 then 'All Products' 
        else ProductName 
       end ProductName,
	   sum(Quantity) as Quantity
from tbl_Sales
group by GROUPING SETS(
	(Employee, Shipcountry, ProductName), 
	(Employee, Shipcountry), 
	(Employee, ProductName), 
	(ShipCountry,ProductName),
	Employee, 
	ShipCountry,
	ProductName,
	()
)

Now, we have three fields to group by. Then, according to what we saw before, there are 23= 8 possible combinations to consider. I have maintained same code colors from first example and added new ones for the new combinations. Here are the new GROUPING SETS:

  • Employee, ShipCountry, ProductName: Employee, ShipCountry and ProductName.
  • Employee, ShipCountry: Employee and ShipCountry (All Products are taken for each combination)
  • Employee, ProductName: Employee and ProductName (All ShipCountries are taken for each combination)
  • ShipCountry, ProductName: ShipCountry and ProductName (All Employees are taken for each combination)
  • Employee: Only Employee (All ShipCountries and All Products are taken for each employee)
  • ShipCountry: Only ShipCountry (All Employees and All Products are taken for each employee)
  • ProductName: Only ProductName (All ShipCountries and All Employees are taken for each employee)
  • () : All Fields (All ShipCountries, All Employees and all Products are taken)

We have again all the information to obtain our second goal in this article. We just need pivoting the previous result. Here is the code:

Copy Code
;with cte_OriginalQuery as (
select case GROUPING(Employee) 
        when 1 then 'All Employees' 
        else Employee 
       end as Employee, 
	   case GROUPING(ShipCountry) 
        when 1 then 'All Countries' 
        else ShipCountry end ShipCountry, 
	   case GROUPING(ProductName) 
        when 1 then 'All Products' 
        else ProductName end ProductName,
	   sum(Quantity) as Quantity
from tbl_Sales
group by GROUPING SETS(
	(Employee, Shipcountry, ProductName), 
	(Employee, Shipcountry), 
	(Employee, ProductName), 
	(ShipCountry,ProductName),
	Employee, 
	ShipCountry,
	ProductName,
	()
    )
)
select Employee, ProductName, Spain, France, Germany, [All Countries]
from cte_OriginalQuery 
PIVOT (sum(Quantity) for ShipCountry in (
Spain, France, Germany, [All Countries])) as Pivoted_Sales
order by Employee, ProductName

And here are the explained results from PIVOTING our new GROUPING SETS:

  • Employee, ShipCountry, ProductName:Results in GREY color.
  • Employee, ShipCountry: Results in BLUE color.
  • Employee, ProductName: Results in BROWN color.
  • ShipCountry, ProductName: Results in RED color.
  • Employee: Results in YELLOW color.
  • ShipCountry: Results in ORANGE color.
  • ProductName: Results in PURPLE color.
  • (): Results in GREEN color.

5. POINTS OF INTEREST

I'm used to reviewing a lot of pieces of SQL code and very little times I have found use of GROUPING SETS, PIVOT or even other powerful SQL facilities such us WINDOWING functions, etc. Instead, massive use of cursors, temp tables, etc. has been coded. As a consequence, it usually implies worse maintainability and performance. So, I think it's worthwhile make a strong effort to master these options.

I hope this is useful for you.

Add comment