The GROUP BY statement to summarize data is almost as old as SQL itself.
Microsoft introduced additional constructs of ROLLUP and CUBE to add power to
the GROUP BY clause in SQL Server 6.5 itself. What I have found during my
experiences while training SQL Server professionals is that awareness about
ROLLUP and CUBE is low and consequently professionals spend a lot of time
building queries that could have been easily accomplished by using these
constructs. I would attempt to explain the two constructs in this article. I
would also attempt to explain a new construct introduced in SQL Server 2008, the
GROUPING SETS operator. For the purpose of examples I have downloaded the
AdventureWorks sample database from the link provided in the box.
Direct Hit! |
Applies To: Database developers USP: Summarize data to a level of detail desired by your business Primary Link: http://bit.ly/75KkPgSearch Engine Keywords: group by, roll up, cube, grouping sets |
A Simple GROUP BY Clause
Let us say you want to analyze the sales AdventureWorks is doing customer
wise and product wise. You could write a query like this.
SELECT CustomerID, ProductID,
SUM(LineTotal) AS LineTotal
FROM Sales.SalesOrderHeader INNER JOIN Sales.SalesOrderDetail
ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID
GROUP BY CustomerID, ProductID
ORDER BY CustomerID, ProductID
In the sample output of the Group By Clause query, you see
the sum on a per-customer and per-product basis but you do not see what the sum
of the line total is for one customer across all products. For that you need to
use the ROLLUP operator.
Sample output of the Group By Clause query. Here, you see the sum on a per-customer basis on the left and per-product basis on the right. |
The ROLLUP operator
The query for the ROLLUP operator can be written as follows:
SELECT CustomerID, ProductID,
SUM(LineTotal) AS LineTotal
FROM Sales.SalesOrderHeader INNER JOIN Sales.SalesOrderDetail
ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID
GROUP BY CustomerID, ProductID WITH ROLLUP
ORDER BY CustomerID, ProductID
The output of the query using the CUBE operator. If you compare the statistics on the right hand side of the yellow status bar at the bottom of each screenshot, you will find that the number of rows and the time taken to run the query has been increasing to keep pace with the complexity of the query. |
In the screenshot on the left, you can see the output. The
first row in the screenshot shows you the sum across all products and customers.
The second row is for one particular customer across all products. You will see
for the other customers when you scroll down.What do you do if in the same
output you want the sum for one product across all customers? You use the CUBE
operator.
The output using the Grouping Sets operator. Observe the missing rows here compared to the CUBE operator. |
The CUBE Operator
A sample query written for the CUBE operator.
SELECT CustomerID, ProductID,
SUM(LineTotal) AS LineTotal
FROM Sales.SalesOrderHeader INNER JOIN Sales.SalesOrderDetail
ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID
GROUP BY CustomerID, ProductID WITH CUBE
ORDER BY CustomerID, ProductID
The screenshot on top right shows the output of the query
using the CUBE operator. If you compare the statistics on the right hand side of
the yellow status bar at the bottom of each screenshot, note that the number of
rows and the time taken to run the query has been increasing to keep pace with
the complexity of the query.
I have been summarizing data all along on two dimensions —
customer and product. If you take more than two dimensions, for example,
customer, product and the sales person, the number of rows in the result of CUBE
will start growing exponentially because the possible number of combinations
will start growing exponentially.
The GROUPING SETS Operator
Examine the output of the CUBE operator. What if you want the sum only
customer-wise and only product-wise and you are not interested in the breakup
customer-product-wise? You use the GROUPING SETS operator.
Here is the query:
SELECT CustomerID, ProductID,
SUM(LineTotal) AS LineTotal
FROM Sales.SalesOrderHeader INNER JOIN Sales.SalesOrderDetail
ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID
GROUP BY GROUPING SETS(CustomerID, ProductID)
ORDER BY CustomerID, ProductID
Summary
Presenting summary information to the level of detail desired is one of the
important challenges when the consumer of data is the business decision maker.
Not all of us today have the luxury of using decision support systems. In this
article, I have tried to introduce an easy to build query, summarizing features
into your day-to-day data driven applications.