Take maximum advantage of Grouping Sets, Roll-up, Cube and Pivoting

Updated! Missing images available!

1. INTRODUCTION

This article explains, with detailed examples, how to use PIVOTING to take maximum advantage of queries making use of GROUPING SETS. Besides, ROLLUP and CUBE operators will be analysed as well. We'll see throughout the article that it's very easy to get simple, practical and powerful queries by integrating all of them in the same query.

I think that master these operators contributes to make cleaner code and save a lot of time when developing. Thus, I hope this article will be useful to both improve your knowledge and develop faster cool queries for your reports, forms or views.

2. BACKGROUND

Let's suppose we have a table similar to the one showed below. It's a simple table showing the Quantity of Products sold by every Employee for each ShipCountryin a given company and period of time.

From the previous table, we would like to show a final result set that looks like to the next pictureas our first goal. Background colors are only painted for later explanations.

As you can see, original table has been rotating by converting values of ShipCountry field in rows into new columns and also, new calculations have been performed to obtain:

  • GREEN Cell: Sum of Quantity of Products sold by All Employees in All Countries.
  • YELLOW Cells: Sum of Quantity of Products sold for each Employee in All Countries.
  • ORANGE Cells: Sum of Quantity of Products sold by All Employeesfor each ShipCountry.
  • BLUE Cells: Sum of Quantity of Products sold for each Employee for each ShipCountry.

We'll see how to get these different calculated values by using GROUPING SETS, CUBE or ROLLUP operators over a GROUP BY clause and how to rotate data values in ShipCountry field from the original result set by using PIVOT operator.

3. USING THE CODE TO GET OUR FIRST GOAL

To begin with, we'll create a table with the proper structure:

Copy Code
 create table tbl_Sales(Id int,
    Employee nvarchar(50), 
    ShipCountry nvarchar(50), 
    ProductName nvarchar(50), 
    Quantity int
    primary key (Id)
    )

And we'll populate table with data:

Copy Code
insert into tbl_Sales
values (1,'Antonio Muro','Spain','Corsa',100),
       (2,'Antonio Muro','France','Corsa',34),
       (3,'Antonio Muro','Germany','Corsa',25),
       (4,'Wilson Smith','Spain','Corsa',98),
       (5,'Wilson Smith','France','Corsa',38),
       (6,'Wilson Smith','Germany','Corsa',52),
       (7,'Maria Johnson','Spain','Corsa',120),
       (8,'Maria Johnson','France','Corsa',40),
       (9,'Maria Johnson','Germany','Corsa',60),
       (10,'Antonio Muro','Spain','Insignia',50),
       (11,'Antonio Muro','France','Insignia',13),
       (12,'Antonio Muro','Germany','Insignia',38),
       (13,'Wilson Smith','Spain','Insignia',80),
       (14,'Wilson Smith','France','Insignia',50),
       (15,'Wilson Smith','Germany','Insignia',9),
       (16,'Maria Johnson','Spain','Insignia',32),
       (17,'Maria Johnson','France','Insignia',17),
       (18,'Maria Johnson','Germany','Insignia',21)

Before going on, I would like to point out briefly some issues about how GROUPING SETS works. You can go deeper on it by searching in this site or Microsoft Documentation. There's a lot of information about and there's no sense explain the same again.

On one hand, GROUPING SETS lets you perform aggregations (sum, avg, etc.) over a list of combinations or sets of different fields. Have a look to the next piece of code:

Copy Code
select Employee, 
	   ShipCountry, 
	   SUM(Quantity) as Quantity
from tbl_Sales
group by GROUPING SETS(
		(Employee, Shipcountry), 
		Employee, 
		ShipCountry,
		())    

I'm calculating the Sum of Quantity of Products grouping by four different combinations or sets of fields.

  • Employee, ShipCountry: Employee and ShipCountry.
  • Employee: Only Employee (All ShipCountries are taken for each employee).
  • ShipCountry: Only ShipCountry (All Employees are taken for each ShipCountry).
  • () : All Fields (All ShipCountries and All Employees are taken)

The result from que query is:

  • Employee, ShipCountry: Results in BLUE color.
  • Employee: Results in YELLOW color.
  • ShipCountry: Results in ORANGE Color.
  • () : Results in GREEN Color.

It's important to note that GROUPING SETS operator has calculated new aggregations and has added new rows to the result set. That being said, we can see that null values have been returned in rows that contains aggregations. I mean, a null value in ShipCountry column is returned as a equivalent to All Countries and a null value in Employee field is returned as an All Employees value.

At this point, I would like to mention two new point to consider:

  1. How could I differenciate a null value returned from an aggregation operation instead of a null from the original query?
  2. I dont'like much to deal with null values. Could I do something clearer or better?

To respond question one, there's a function called GROUPING that let us obtain exactly what we are looking for. I will use in the next piece of code. In short, this function returns an tinyint value of 1 when null value comes from aggregation and 0 if not.

To answer question two, I like to substitute null values from aggregations with strings such us All Employees or All Countries. We'll achieve this by using the next improved piece of code.

So, here is our query using GROUPING SETS with GROUPING function inside CASE statements:

Copy Code
select case GROUPING(Employee) 
            when 1 then 'All Employees' 
            else Employee 
        end as Employee, 
       case GROUPING(ShipCountry) 
            when 1 then 'All Countries' 
            else ShipCountry 
        end ShipCountry, 
       sum(Quantity) as Quantity
from tbl_Sales
group by GROUPING SETS(
        (Employee, Shipcountry), 
        Employee, 
        ShipCountry,
        ())

And here is our new obtained result set:

You should note that nullvalues have been replaced with strings All Employees and All Countries. Look at ORANGE, GREEN and YELLOW rows to check this.

So far, we have achieved all the calculations needed to put in our final result set. It will be necessary rotate the table, but we have done at least the half part of our first objective. Any other way of achieving this? Yes. We might have reached it by using CUBE operator instead of GROUPING SETS. Without diving a lot into it, CUBE operator calculates aggregated values for all possible combinations for specified fields in the GROUP BY clause. So, at least, you should know that there are 2n possible combinations, where n is the number of fields included in the GROUP BY clause. In this case n=2 so that there are four different cases or possible combinations. These combinations are exacty the same as used in our previous piece of code.

Anyway, here is the query using CUBE operator:

Copy Code
 select case GROUPING(Employee) 
            when 1 then 'All Employees' 
            else Employee 
        end as Employee, 
       case GROUPING(ShipCountry) 
            when 1 then 'All Countries' 
            else ShipCountry 
        end ShipCountry, 
       sum(Quantity) as Quantity
from tbl_Sales
group by Employee, Shipcountry with CUBE

As I said before, the result is the same as using the previous code. Here is the identical result:

Let's go on with our first example. Now, we just have to rotate CountryShipvalues in rows into new columns. How to get this? It's easy. We've only to make use of PIVOT operator. You can read some of my previous articles to go deeper on it (i.e. Using PIVOT with T-SQL)

Here is our final query using GROUPING SETS and PIVOTING our source query making previous use of a common table expression:

Copy Code
;with CTE_OriginalQuery as 
(select case GROUPING(Employee) 
         when 1 then 'All Employees' 
         else Employee 
       end as Employee, 
       case GROUPING(ShipCountry) 
         when 1 then 'All Countries' 
         else ShipCountry 
       end ShipCountry, 
       sum(Quantity) as Quantity
from tbl_Sales
group by GROUPING SETS(
 (Employee, Shipcountry), 
  Employee, 
  ShipCountry,
 ())
)
select Employee, Spain, France, Germany, [All Countries]
from CTE_OriginalQuery
PIVOT (sum(Quantity) for ShipCountry in 
 (Spain, France, Germany, [All Countries])) as Pivoted_FinalQuery

At last, by executing our final query we obtain our desired result set.