Implementation Guides

Business Intelligence: Hyper Cubing

PCQ Bureau

04 Nov 2006 10:43 IST

New Update

Relational databases have been functional as life-lines for information

workers for decades now. They deliver results no doubt, but it is also a fact

that it is not easy (if not tough) to query and analyze information stored in

them, on the fly and with respect to multiple metrics. You would run queries for

analysis that would measure one or more KPI (Key Performance Indicators) for

your business and this would be against a particular metric. For example it

would either be sales data for a period of time or in a region or for a product.

How about analyzing the information of say sales figure for a product with

respect to a region over a period of time, or say sales figures for the same

product in a particular outlet of a particular region in a time period. Well as

for just a relational design, you might have to make sure that your sales table

has all these columns. And if they are not there, adding columns and values to

them would be the only solution left.

Advertisment

It is places like these where using OLAP (On Line Analytical Processing) can

give you all the information you want while keeping multiple variables in your

queries. In the previous part of the series, we talked about how data warehouses

can help information access and knowledge discovery, and subsequently your

business analytics.

Applies To: CIOs, CTOs

USP: OLAP systems basics, benefits and how to go about implementation

Primary Link: www.olap.com

Google Keywords: hypercubing, FASMI, OLAP

We also gave a brief overview of implementing a DW using MS SQL Server 2000.

In this part of the series, we will take you through the basic concepts related

to OLAP, how to go about it, what do you need for such an implementation and a

simple demo of how it can be done using the Business Intelligence Development

Studio provided with MS SQL Server 2005.

Advertisment

What is OLAP?

OLAP is a technique that enables you to analyze data with respect to multiple
metrics that are termed as 'Dimensions'. As it is a part of BI

implementation you need not worry about where the data is coming from (source)

or how is it presented (the format). In an OLAP setup, the entire data is

categorized into two types: Dimensions and Measures. Measures represent the data

that are measurable or countable in numeric terms. For example, the number of

units sold, the inventory stock, number of employees, and gallons of oil are all

examples of what can be included as a measure. Dimensions on the other hand are

non-numeric quantities but carry relevance in that they form one of the aspects

with respect to which other factors are to be measured. Examples are sales area

or region, type of product etc.

Multiple dimensions are organized in an OLAP system to give what is called a

'Multidimensional Cube' or simply a 'Cube'. A cube is a data-structure

that stores measures along predefined dimensions. The dimensions of a cube are

further organized in hierarchies. For example, daily figures aggregate to

monthly sales, monthly sales aggregate to give the annual sales revenue. In

cubes, each of these measures (also called variables) have multiple dimensions

(also called axes).

Decision makers can query OLAP databases

using Cubes having each metric along its dimensions

Advertisment

For example, the sales revenue measure can have dimensions of salesperson,

sales area, and outlet. Once defined, the cube pre-calculates and aggregates

values for these measures at each hierarchy of these dimensions. And this

pre-calculation is what makes OLAP quick. OLAP's speed is one of the major

characteristics associated with such a system.

Taking a step further

Now that you know about OLAP, cubes, dimensions and measures let us look at some
more jargon that form an integral part of OLAP implementations. We exemplified

cubes with three dimensions only in the preceding section. Although a 'cube'

has just three dimensions by geometrical definition, an OLAP Cube can have more

than one dimension. How many it has, depends on the complexity of your table

structure as well as your information workers' need. Such a cube is termed as

a 'Hypercube'. OLAP systems have an OLAP server and OLAP client software.

An OLAP server is a high-capacity, multi-user data manipulation engine
specifically designed to support and operate on multidimensional data

structures. The server houses all the cubes. OLAP clients are applications that

request multidimensional data from the OLAP server for manipulation,

calculations or analysis.

Measures and Dimensions are not the only part of a cube or hypercube. The

measures stored in a cube can be used as inputs in expressions for calculating

values for further variables on the fly. These expressions are termed as 'Formulas'

and the resultant variables are termed as 'Calculated Members'.

Advertisment

The most distinguishing characteristic for identifying a calculated member is

that it is always a resultant of existing measures in a Cube. What all these

facilitate in the end is termed as an Ad-hoc Analysis of data. Ad-hoc analysis

allows users generate new queries without defining anything in advance. This is

the most important resultant of an OLAP system, as ad-hoc queries do not exist

in the purely relational data model.

Star or Snowflake Models

You do not specifically need a data warehouse for OLAP. Your existing relational
models can also act as a source for cubes and hyper cubes. A Snow Flake schema

is a special, de-normalized data model used by relational databases to provide a

multidimensional data structure for OLAP applications.

Several dimension tables surround the fact table. A fact table is a table

that record the metrics of a KPI and provides independent attributes using which

different dimensions can be analyzed. This model is central to the

implementation.

Advertisment

The Physical Model of our database will contain several tables, with fields connected with relationships

Our sample database

For our sample, we have a holistic multi-location multi-store supply chain. This
lets us address several entities like clients, products, orders, outlets and

time. A real-life deployment would be more complex, involving more parameters.

In our database, we have tables for these entities: customer, order, outlet,

product and sales. The order table has basic fields like order number, date the

order was made, customer number and product code, and quantity. Then we have a

product table with the product number, name and unit price. Likewise there are

tables for customers and outlets.

First steps

Let's implement a cube in this database step-by-step. The first step is to get
a database with tables and columns. These tables are a low-level form of

relational databases. Our retail company wants to view monthly and weekly

reports. Someone feeds in data into a spreadsheet as rows and columns.

Advertisment

These relational databases only provide limited capability. If you want the

performance report for a particular time frame or for some specific customer,

product and location then you have to pull in all the data and work on it all

over again.

Creating these reports according to different requirements creates a lot of

work and number of sheets increase with every additional relationship added. The

structured tables from a database can be changed into a star schema which lets

you interconnect the different fields in it. These tables are linked together

using their inter-dependencies. For example, our order table is dependent upon

customer and product tables for customer id and product id respectively. Now the

sales table we created depends on the product, customer, outlet and order

tables. In our case, the dimensions invariably are Customer, Product, Order and

Outlet while the sole fact table is the Sales table.

Cubing

On top of the relational database we will build a cube for multi dimensional
analyses. As a general procedure you can take time and product on two axes and

as the third axis either you can put employee or outlet. This in turn provides

the data based on performance of an outlet for products in any given time frame.

If you choose to have the product performance over time for employee then

instead of changing the whole table, only one coordinate has to be changed.

These dimensions can have many different parameters and thousands of entries. In

a cube, there are three faces and each of these faces depicts a relation between

two of its dimensions.

Advertisment

Putting up a fourth dimension would let you make a hypercube for advanced

relationships and queries. Instead of creating a domain restricted to months or

years, it is more useful to create a larger 'time' domain. This domain will

invariably include all time domains for which you need to track the reports.

Further, the hierarchy system is to be put in place for every product in the

section it comes under. The locations can be categorized for region, zone,

country, and so on as this helps in finding the relevant data when queried. The

only problem faced while defining these hierarchies are dependencies between

them. These dependencies can also cascade. As the data grows, it becomes easier

for reports to have hierarchical patterns. This would let you look at the whole

picture instead of in small subsets.

The aggregate data from these hierarchical systems can be easily obtained as

the data is stored based on data hierarchy. The attributes in these dimensions

are to be defined like the product size, weight, etc. These attributes help in

narrowing the approach and making cube more versatile, by adding as much content

as it can. There are some calculations that would still exist and will require

inputs from existing measures. For example, over here you might need calculating

product revenues depending upon taxes and VAT. In such cases you need to

correlate two dimensions for calculative analysis depending on a derived

measure. With all the structures defined you can go ahead with deployment of the

cube using the specific tool that your database or warehouse vendor provides.

You can also use temporal calculations for some special cases. Using temporal

calculations, cases can be formulated for checking out on factors such as what

would be the effect on sales given one of the measures carried a different value

than what you have currently. You will also be able to compare two particular

entities using some calculations. And finally using conditional soothing i.e.

predicting trends based on assumed values of measures, and market situations,

you will be able to get the market trend for coming months.

Building a cube for OLAP is not an isolated exercise as we have outlined just

the concepts behind cubing. In the real world, you would be first running

processes to cleanup and get all the data into one universal format, followed by

defining a model that better suits the OLAP paradigm in case your existing model

does not suffice. Whatever be the processes involved, if carried out

meticulously, it is one exercise that would not only change but also ease the

way information workers look and execute queries for information retrieval or

even analysis.

Anadi Misra and Anubhav Verma

Advertisment