by November 4, 2006 0 comments



Relational databases have been functional as life-lines for information
workers for decades now. They deliver results no doubt, but it is also a fact
that it is not easy (if not tough) to query and analyze information stored in
them, on the fly and with respect to multiple metrics. You would run queries for
analysis that would measure one or more KPI (Key Performance Indicators) for
your business and this would be against a particular metric. For example it
would either be sales data for a period of time or in a region or for a product.
How about analyzing the information of say sales figure for a product with
respect to a region over a period of time, or say sales figures for the same
product in a particular outlet of a particular region in a time period. Well as
for just a relational design, you might have to make sure that your sales table
has all these columns. And if they are not there, adding columns and values to
them would be the only solution left.

It is places like these where using OLAP (On Line Analytical Processing) can
give you all the information you want while keeping multiple variables in your
queries. In the previous part of the series, we talked about how data warehouses
can help information access and knowledge discovery, and subsequently your
business analytics.

Applies To: CIOs, CTOs
USP:
OLAP systems basics, benefits and how to go about implementation
Primary Link:
www.olap.com
Google Keywords:
hypercubing, FASMI, OLAP

We also gave a brief overview of implementing a DW using MS SQL Server 2000.
In this part of the series, we will take you through the basic concepts related
to OLAP, how to go about it, what do you need for such an implementation and a
simple demo of how it can be done using the Business Intelligence Development
Studio provided with MS SQL Server 2005.

What is OLAP?
OLAP is a technique that enables you to analyze data with respect to multiple
metrics that are termed as ‘Dimensions’. As it is a part of BI
implementation you need not worry about where the data is coming from (source)
or how is it presented (the format). In an OLAP setup, the entire data is
categorized into two types: Dimensions and Measures. Measures represent the data
that are measurable or countable in numeric terms. For example, the number of
units sold, the inventory stock, number of employees, and gallons of oil are all
examples of what can be included as a measure. Dimensions on the other hand are
non-numeric quantities but carry relevance in that they form one of the aspects
with respect to which other factors are to be measured. Examples are sales area
or region, type of product etc.

Multiple dimensions are organized in an OLAP system to give what is called a
‘Multidimensional Cube’ or simply a ‘Cube’. A cube is a data-structure
that stores measures along predefined dimensions. The dimensions of a cube are
further organized in hierarchies. For example, daily figures aggregate to
monthly sales, monthly sales aggregate to give the annual sales revenue. In
cubes, each of these measures (also called variables) have multiple dimensions
(also called axes).

Decision makers can query OLAP databases
using Cubes having each metric along its dimensions

For example, the sales revenue measure can have dimensions of salesperson,
sales area, and outlet. Once defined, the cube pre-calculates and aggregates
values for these measures at each hierarchy of these dimensions. And this
pre-calculation is what makes OLAP quick. OLAP’s speed is one of the major
characteristics associated with such a system.

Taking a step further
Now that you know about OLAP, cubes, dimensions and measures let us look at some
more jargon that form an integral part of OLAP implementations. We exemplified
cubes with three dimensions only in the preceding section. Although a ‘cube’
has just three dimensions by geometrical definition, an OLAP Cube can have more
than one dimension. How many it has, depends on the complexity of your table
structure as well as your information workers’ need. Such a cube is termed as
a ‘Hypercube’. OLAP systems have an OLAP server and OLAP client software.
An OLAP server is a high-capacity, multi-user data manipulation engine
specifically designed to support and operate on multidimensional data
structures. The server houses all the cubes. OLAP clients are applications that
request multidimensional data from the OLAP server for manipulation,
calculations or analysis.

Measures and Dimensions are not the only part of a cube or hypercube. The
measures stored in a cube can be used as inputs in expressions for calculating
values for further variables on the fly. These expressions are termed as ‘Formulas’
and the resultant variables are termed as ‘Calculated Members’.

The most distinguishing characteristic for identifying a calculated member is
that it is always a resultant of existing measures in a Cube. What all these
facilitate in the end is termed as an Ad-hoc Analysis of data. Ad-hoc analysis
allows users generate new queries without defining anything in advance. This is
the most important resultant of an OLAP system, as ad-hoc queries do not exist
in the purely relational data model.

Star or Snowflake Models
You do not specifically need a data warehouse for OLAP. Your existing relational
models can also act as a source for cubes and hyper cubes. A Snow Flake schema
is a special, de-normalized data model used by relational databases to provide a
multidimensional data structure for OLAP applications.

Several dimension tables surround the fact table. A fact table is a table
that record the metrics of a KPI and provides independent attributes using which
different dimensions can be analyzed. This model is central to the
implementation.

The Physical Model of our database will contain several tables, with fields connected with relationships

Our sample database
For our sample, we have a holistic multi-location multi-store supply chain. This
lets us address several entities like clients, products, orders, outlets and
time. A real-life deployment would be more complex, involving more parameters.
In our database, we have tables for these entities: customer, order, outlet,
product and sales. The order table has basic fields like order number, date the
order was made, customer number and product code, and quantity. Then we have a
product table with the product number, name and unit price. Likewise there are
tables for customers and outlets.

First steps
Let’s implement a cube in this database step-by-step. The first step is to get
a database with tables and columns. These tables are a low-level form of
relational databases. Our retail company wants to view monthly and weekly
reports. Someone feeds in data into a spreadsheet as rows and columns.

These relational databases only provide limited capability. If you want the
performance report for a particular time frame or for some specific customer,
product and location then you have to pull in all the data and work on it all
over again.

Creating these reports according to different requirements creates a lot of
work and number of sheets increase with every additional relationship added. The
structured tables from a database can be changed into a star schema which lets
you interconnect the different fields in it. These tables are linked together
using their inter-dependencies. For example, our order table is dependent upon
customer and product tables for customer id and product id respectively. Now the
sales table we created depends on the product, customer, outlet and order
tables. In our case, the dimensions invariably are Customer, Product, Order and
Outlet while the sole fact table is the Sales table.

Cubing
On top of the relational database we will build a cube for multi dimensional
analyses. As a general procedure you can take time and product on two axes and
as the third axis either you can put employee or outlet. This in turn provides
the data based on performance of an outlet for products in any given time frame.
If you choose to have the product performance over time for employee then
instead of changing the whole table, only one coordinate has to be changed.
These dimensions can have many different parameters and thousands of entries. In
a cube, there are three faces and each of these faces depicts a relation between
two of its dimensions.

Putting up a fourth dimension would let you make a hypercube for advanced
relationships and queries. Instead of creating a domain restricted to months or
years, it is more useful to create a larger ‘time’ domain. This domain will
invariably include all time domains for which you need to track the reports.
Further, the hierarchy system is to be put in place for every product in the
section it comes under. The locations can be categorized for region, zone,
country, and so on as this helps in finding the relevant data when queried. The
only problem faced while defining these hierarchies are dependencies between
them. These dependencies can also cascade. As the data grows, it becomes easier
for reports to have hierarchical patterns. This would let you look at the whole
picture instead of in small subsets.

The aggregate data from these hierarchical systems can be easily obtained as
the data is stored based on data hierarchy. The attributes in these dimensions
are to be defined like the product size, weight, etc. These attributes help in
narrowing the approach and making cube more versatile, by adding as much content
as it can. There are some calculations that would still exist and will require
inputs from existing measures. For example, over here you might need calculating
product revenues depending upon taxes and VAT. In such cases you need to
correlate two dimensions for calculative analysis depending on a derived
measure. With all the structures defined you can go ahead with deployment of the
cube using the specific tool that your database or warehouse vendor provides.

You can also use temporal calculations for some special cases. Using temporal
calculations, cases can be formulated for checking out on factors such as what
would be the effect on sales given one of the measures carried a different value
than what you have currently. You will also be able to compare two particular
entities using some calculations. And finally using conditional soothing i.e.
predicting trends based on assumed values of measures, and market situations,
you will be able to get the market trend for coming months.

Building a cube for OLAP is not an isolated exercise as we have outlined just
the concepts behind cubing. In the real world, you would be first running
processes to cleanup and get all the data into one universal format, followed by
defining a model that better suits the OLAP paradigm in case your existing model
does not suffice. Whatever be the processes involved, if carried out
meticulously, it is one exercise that would not only change but also ease the
way information workers look and execute queries for information retrieval or
even analysis.

Anadi Misra and Anubhav Verma

No Comments so far

Jump into a conversation

No Comments Yet!

You can be the one to start a conversation.

Your data will be safe!Your e-mail address will not be published. Also other data will not be shared with third person.