Advertisment

Summarizing Data in Microsoft SQL Server 2008

author-image
PCQ Bureau
New Update

The GROUP BY statement to summarize data is almost as old as SQL itself.

Microsoft introduced additional constructs of ROLLUP and CUBE to add power to

the GROUP BY clause in SQL Server 6.5 itself.  What I have found during my

experiences while training SQL Server professionals is that awareness about

ROLLUP and CUBE is low and consequently professionals spend a lot of time

building queries that could have been easily accomplished by using these

constructs. I would attempt to explain the two constructs in this article. I

would also attempt to explain a new construct introduced in SQL Server 2008, the

GROUPING SETS operator. For the purpose of examples I have downloaded the

AdventureWorks sample database from the link provided in the box.

Advertisment

Direct Hit!

Applies To: Database developers



USP: Summarize data to a level of detail desired by your
business



Primary Link:



http://bit.ly/75KkPgSearch Engine



Keywords: group by, roll up, cube, grouping sets

A Simple GROUP BY Clause



Let us say you want to analyze the sales AdventureWorks is doing customer

wise and product wise. You could write a query like this.

SELECT CustomerID, ProductID,

SUM(LineTotal) AS LineTotal



FROM Sales.SalesOrderHeader INNER JOIN Sales.SalesOrderDetail


ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID


GROUP BY CustomerID, ProductID


ORDER BY CustomerID, ProductID


Advertisment

In the sample output of the Group By Clause query, you see

the sum on a per-customer and per-product basis but you do not see what the sum

of the line total is for one customer across all products. For that you need to

use the ROLLUP operator.

Sample output of the Group By Clause query. Here, you see the sum on a

per-customer basis on the left and per-product basis on the right.

The ROLLUP operator



The query for the ROLLUP operator can be written as follows:

Advertisment

SELECT CustomerID, ProductID,

SUM(LineTotal) AS LineTotal



FROM Sales.SalesOrderHeader INNER JOIN Sales.SalesOrderDetail


ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID


GROUP BY CustomerID, ProductID WITH ROLLUP


ORDER BY CustomerID, ProductID


The output of the query using the CUBE operator. If you compare the

statistics on the right hand side of the yellow status bar at the bottom of

each screenshot, you will find that the number of rows and the time taken to

run the query has been increasing to keep pace with the complexity of the

query.

In the screenshot on the left, you can see the output. The

first row in the screenshot shows you the sum across all products and customers.

The second row is for one particular customer across all products. You will see

for the other customers when you scroll down.What do you do if in the same

output you want the sum for one product across all customers? You use the CUBE

operator.

Advertisment
The output using the Grouping Sets operator. Observe the missing rows here

compared to the CUBE operator.

The CUBE Operator



A sample query written for the CUBE operator.

SELECT CustomerID, ProductID,

SUM(LineTotal) AS LineTotal



FROM Sales.SalesOrderHeader INNER JOIN Sales.SalesOrderDetail


ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID


GROUP BY CustomerID, ProductID WITH CUBE


ORDER BY CustomerID, ProductID


Advertisment

The screenshot on top right shows the output of the query

using the CUBE operator. If you compare the statistics on the right hand side of

the yellow status bar at the bottom of each screenshot, note that the number of

rows and the time taken to run the query has been increasing to keep pace with

the complexity of the query.

I have been summarizing data all along on two dimensions —

customer and product. If you take more than two dimensions, for example,

customer, product and the sales person, the number of rows in the result of CUBE

will start growing exponentially because the possible number of combinations

will start growing exponentially.

The GROUPING SETS Operator



Examine the output of the CUBE operator. What if you want the sum only

customer-wise and only product-wise and you are not interested in the breakup

customer-product-wise? You use the GROUPING SETS operator.

Advertisment

Here is the query:

SELECT CustomerID, ProductID,

SUM(LineTotal) AS LineTotal



FROM Sales.SalesOrderHeader INNER JOIN Sales.SalesOrderDetail


ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID


GROUP BY GROUPING SETS(CustomerID, ProductID)


ORDER BY CustomerID, ProductID


Summary



Presenting summary information to the level of detail desired is one of the
important challenges when the consumer of data is the business decision maker.

Not all of us today have the luxury of using decision support systems. In this

article, I have tried to introduce an easy to build query, summarizing features

into your day-to-day data driven applications.

Advertisment