Updated! Missing images available!
1. INTRODUCTION
This article explains, with detailed examples, how to use PIVOTING to take maximum advantage of queries making use of GROUPING SETS. Besides, ROLLUP and CUBE operators will be analysed as well. We'll see throughout the article that it's very easy to get simple, practical and powerful queries by integrating all of them in the same query.
I think that master these operators contributes to make cleaner code and save a lot of time when developing. Thus, I hope this article will be useful to both improve your knowledge and develop faster cool queries for your reports, forms or views.
2. BACKGROUND
Let's suppose we have a table similar to the one showed below. It's a simple table showing the Quantity
of Products
sold by every Employee
for each ShipCountry
in a given company and period of time.
From the previous table, we would like to show a final result set that looks like to the next pictureas our first goal. Background colors are only painted for later explanations.
As you can see, original table has been rotating by converting values of ShipCountry
field in rows into new columns and also, new calculations have been performed to obtain:
- GREEN Cell:
Sum
of Quantity
of Products sold by All Employees
in All Countries
.
- YELLOW Cells:
Sum
of Quantity
of Products sold for each Employee
in All Countries
.
- ORANGE Cells:
Sum
of Quantity
of Products sold by All Employees
for each ShipCountry
.
- BLUE Cells:
Sum
of Quantity
of Products sold for each Employee
for each ShipCountry
.
We'll see how to get these different calculated values by using GROUPING SETS, CUBE or ROLLUP operators over a GROUP BY clause and how to rotate data values in ShipCountry
field from the original result set by using PIVOT operator.
3. USING THE CODE TO GET OUR FIRST GOAL
To begin with, we'll create a table with the proper structure:
create table tbl_Sales(Id int,
Employee nvarchar(50),
ShipCountry nvarchar(50),
ProductName nvarchar(50),
Quantity int
primary key (Id)
)
And we'll populate table with data:
insert into tbl_Sales
values (1,'Antonio Muro','Spain','Corsa',100),
(2,'Antonio Muro','France','Corsa',34),
(3,'Antonio Muro','Germany','Corsa',25),
(4,'Wilson Smith','Spain','Corsa',98),
(5,'Wilson Smith','France','Corsa',38),
(6,'Wilson Smith','Germany','Corsa',52),
(7,'Maria Johnson','Spain','Corsa',120),
(8,'Maria Johnson','France','Corsa',40),
(9,'Maria Johnson','Germany','Corsa',60),
(10,'Antonio Muro','Spain','Insignia',50),
(11,'Antonio Muro','France','Insignia',13),
(12,'Antonio Muro','Germany','Insignia',38),
(13,'Wilson Smith','Spain','Insignia',80),
(14,'Wilson Smith','France','Insignia',50),
(15,'Wilson Smith','Germany','Insignia',9),
(16,'Maria Johnson','Spain','Insignia',32),
(17,'Maria Johnson','France','Insignia',17),
(18,'Maria Johnson','Germany','Insignia',21)
Before going on, I would like to point out briefly some issues about how GROUPING SETS works. You can go deeper on it by searching in this site or Microsoft Documentation. There's a lot of information about and there's no sense explain the same again.
On one hand, GROUPING SETS lets you perform aggregations (sum, avg, etc.) over a list of combinations or sets of different fields. Have a look to the next piece of code:
select Employee,
ShipCountry,
SUM(Quantity) as Quantity
from tbl_Sales
group by GROUPING SETS(
(Employee, Shipcountry),
Employee,
ShipCountry,
())
I'm calculating the Sum
of Quantity
of Products grouping by four different combinations or sets of fields.
Employee, ShipCountry:
Employee and ShipCountry.
Employee
: Only Employee (All ShipCountries are taken for each employee).
ShipCountry
: Only ShipCountry (All Employees are taken for each ShipCountry).
()
: All Fields (All ShipCountries and All Employees are taken)
The result from que query is:
Employee, ShipCountry:
Results in BLUE color.
Employee
: Results in YELLOW color.
ShipCountry
: Results in ORANGE Color.
()
: Results in GREEN Color.
It's important to note that GROUPING SETS operator has calculated new aggregations and has added new rows to the result set. That being said, we can see that null
values have been returned in rows that contains aggregations. I mean, a null
value in ShipCountry
column is returned as a equivalent to All Countries and a null
value in Employee
field is returned as an All Employees value.
At this point, I would like to mention two new point to consider:
- How could I differenciate a
null
value returned from an aggregation operation instead of a null
from the original query?
- I dont'like much to deal with
null
values. Could I do something clearer or better?
To respond question one, there's a function called GROUPING that let us obtain exactly what we are looking for. I will use in the next piece of code. In short, this function returns an tinyint
value of 1 when null
value comes from aggregation and 0 if not.
To answer question two, I like to substitute null
values from aggregations with strings such us All Employees or All Countries. We'll achieve this by using the next improved piece of code.
So, here is our query using GROUPING SETS with GROUPING function inside CASE statements:
select case GROUPING(Employee)
when 1 then 'All Employees'
else Employee
end as Employee,
case GROUPING(ShipCountry)
when 1 then 'All Countries'
else ShipCountry
end ShipCountry,
sum(Quantity) as Quantity
from tbl_Sales
group by GROUPING SETS(
(Employee, Shipcountry),
Employee,
ShipCountry,
())
And here is our new obtained result set:
You should note that null
values have been replaced with strings All Employees and All Countries. Look at ORANGE, GREEN and YELLOW rows to check this.
So far, we have achieved all the calculations needed to put in our final result set. It will be necessary rotate the table, but we have done at least the half part of our first objective. Any other way of achieving this? Yes. We might have reached it by using CUBE operator instead of GROUPING SETS. Without diving a lot into it, CUBE operator calculates aggregated values for all possible combinations for specified fields in the GROUP BY clause. So, at least, you should know that there are 2n possible combinations, where n is the number of fields included in the GROUP BY clause. In this case n=2
so that there are four different cases or possible combinations. These combinations are exacty the same as used in our previous piece of code.
Anyway, here is the query using CUBE operator:
select case GROUPING(Employee)
when 1 then 'All Employees'
else Employee
end as Employee,
case GROUPING(ShipCountry)
when 1 then 'All Countries'
else ShipCountry
end ShipCountry,
sum(Quantity) as Quantity
from tbl_Sales
group by Employee, Shipcountry with CUBE
As I said before, the result is the same as using the previous code. Here is the identical result:
Let's go on with our first example. Now, we just have to rotate CountryShip
values in rows into new columns. How to get this? It's easy. We've only to make use of PIVOT operator. You can read some of my previous articles to go deeper on it (i.e. Using PIVOT with T-SQL)
Here is our final query using GROUPING SETS and PIVOTING our source query making previous use of a common table expression:
;with CTE_OriginalQuery as
(select case GROUPING(Employee)
when 1 then 'All Employees'
else Employee
end as Employee,
case GROUPING(ShipCountry)
when 1 then 'All Countries'
else ShipCountry
end ShipCountry,
sum(Quantity) as Quantity
from tbl_Sales
group by GROUPING SETS(
(Employee, Shipcountry),
Employee,
ShipCountry,
())
)
select Employee, Spain, France, Germany, [All Countries]
from CTE_OriginalQuery
PIVOT (sum(Quantity) for ShipCountry in
(Spain, France, Germany, [All Countries])) as Pivoted_FinalQuery
At last, by executing our final query we obtain our desired result set.