Advertisment

Warehousing for Easy Information Access

author-image
PCQ Bureau
New Update

In the first part of this series, we talked about the benefits, market

proposition, some technologies involved and the tools that are available in the

market for business intelligence.

Advertisment

This time, we will be taking you into the world of data warehousing and its

structure definitions, various terms and entities associated with it. We will

talk about the data warehouse components and characteristics. We will list all

the challenges that are common while defining data warehouses. We will give you

a hands-on experience of our warehouse implementation.

Why do you need a data warehouse? This is the most common query you face

while convincing others to adopt the technology and implement structured data

system in their organization. The volume of data that one has to deal with in an

organization comprises thousands of accounts, product lists, transaction

details, and inventory and customer requests present in fragmented format (as

databases, files and spreadsheets). Each of these databases, files and sheets

has to be updated on a daily basis in a heterogenous environment of data

sourcing and collection and uploaded to stay active, which is not easy. For

those who implement data warehouses, this data would run into terabytes and

across different locations. So how do you extract any specific bit of data that

you want and within minutes? For this, specific architectures have to be laid

down to ensure data flow of your choice in a warehouse.

A properly configured warehouse has different as well as unique databases

integrated together. So if you want to build a data warehouse, you need a team

of dedicated IT professionals with good understanding of your business so that

they can correlate different data and implement data structures in the most

suitable manner for their organization. This involves categorizing databases

that house data from each department spread across locations and link them

accurately across structure in defined business process. And it's not the end

of exercise once the warehouse is implemented and has gone online. Since a

warehouse is not a static application which once installed or configured will

work painlessly, the team has to modify its properties quite often with every

change in the market conditions and company's management decisions.

Advertisment

Warehouse architectures



In many cases, a data warehouse contains the living history of the organization
while they usually contain historical data, often collected from different

sources such as OLTP (On-Line Transaction Processing) systems, legacy systems,

spreadsheets or even text files. A data warehouse combines this data, cleanses

it for accuracy and consistency, and organizes it for ease and efficiency of

querying. Some definitions of a data



warehouse include several elements such as a data preparation area, the
cleansing process, the database that holds the data warehouse data, and the

tools that organize and present the data to client applications. Other

definitions restrict the data warehouse to the database that contains the data

warehouse data. In large data warehousing applications, data is often segmented

into specialized components, called data marts that address individual

components of the organization. Some definitions consider data marts to be part

of the data warehouse; others consider them to be separate entities. The

intended meaning of the term data warehouse is usually clear from the context in

which it is used.

Critical solutions such as warehouses require a careful analysis of what is

needed and how a subsequent warehousing solution has to be implemented. While

going for a warehouse solution the choice of architecture matters a lot, as it

is to be based on factors such as organization's



infrastructure, business requirements and desired system structure. What
architecture one goes for is also affected by the location of the data marts and

the location of the central control. For example, an organization might go for

an architecture that allows data to be present, distributed and



controlled centrally or even independently. You can have architectures where the
data marts are independent or inter-connected or go for a warehouse that spans

through your organization. The scope here refers to the usage and not physical

location.

Advertisment

Architecture components



The architecture of the warehouse is made up of a set of components with clearly
defined operational and structural boundaries. At the base is what is termed as

Operational Data Sources or Data Sources which communicate with the layer above

them, namely Data Access Layer. This layer serves as an interface between the

data sources and the Data Staging Area. It is responsible for 'cleansing'

the data, ie bringing all the data from different sources to a universal format

before it is passed on to further layers in the warehouse. After this layer

comes the Data Warehouse Layer which is the organization's warehouse. Layers

above this are typically another Data Access Layer for acting as an interface to

processes and applications that are conceptually placed in the top most layer of

the architecture called the Information Access Layer or the User Analysis Layer.

These layers can be considered to be a part of the larger Application

Messaging Layer. This layer is responsible for transfer of information within

itself, ie between the components that make up this layer and also between its

component layers and external data sources like the Metadata.

Advertisment

Implementation



Implementing a data warehouse begins with a thorough phase of analysis of your
business, its requirements and environment, and analyzing shortcomings in the

current BI. Repeating processes iteratively leads to a better understanding of

the overall state of business and tells you where and how much the problem is.

So you need to gather business documents and analyze them to formulate the

decisions aforesaid.

After this, you analyze other requirements which affect running business

processes in the organization. The activities that constitute this phase include



end-user requirement analysis . This revolves around the data needs of a
day-to-day information worker. The functional requirements are evaluated next;

this answers questions such as what new information analyses are needed and what

pitfalls of the current information analysis



systems are, if they exist. The information infrastructure is also evaluated to
find out whether the current infrastructure suffices for the implementation or

an expansion is required.

DW with SQL

Server 2000
Let's see one sample DW implementation as a step by step

procedure. We used MS SQL Server 2000 on Windows 2003 Enterprise Server

SP1.
  • Define the DTS (Data Transformation Services), which defines the

    flow of data and its automation. The services take care of cleansing

    your data. You can also transform data from Oracle, Informix, Excel

    workbooks and even text files using DTS. This can be designed from the

    DTS designer or created programmatically. Next you have to identify

    the DTS applications and define data warehouse system and then apply

    DTS to data warehouse.
  • Define the warehouse structure by defining the source and

    destination structure, define the dimension tables and fact tables for

    your data. The dimension tables are tables that keep information of

    the KPI (Key Performance Indicators) of your organization's

    business, while fact tables store the facts metrics that remain

    constant over a time period.
  • Define a Star Schema for your data. The star schema, as the name

    suggests, breaks up your schema into a star shaped structure with each

    of your KPI at the branches. After you have described the star schema,

    then check the data load on star schema and define dimension and fact

    data loads.
  • The warehousing process also involves a staging area which is the

    place where data is kept before it is transformed into the warehouse

    standard. So the next step in the exercise is to implement the staging

    tables.
  • Apply the DTS to the configured data. It is very critical in the

    entire process to evaluate what transformations are required in

    warehousing. Deciding upon the DTS package elements is yet another

    important step as it directly affects the efficiency of your warehouse

    solution. You can use the DTS Package Designer to create the DTS flow

    to define the data filtering process more accurately with package

    connections, tasks and subsequent steps. Once all the designs have

    completed you can load you DTS modules and start executing them, well

    that's not the end of story but we have just provided you an

    overview of the process.
Advertisment

After all the analyses, comes the warehouse modeling phase. This involves

modeling the warehouse at both logical and physical levels. This modeling is

quite different from the modeling of a relational database. For instance, in

relational database modeling a direct end-user interaction is never a priority

whereas here the modeling is done so that the warehouse is end-user centric.

Unlike a database, the modeling here is done keeping in mind the fact that the

warehouse is being designed for data analysis and not just data storage in a

qualified pre-defined manner.

While these issues are addressed in the conceptual modeling, the physical

modeling looks into aspects such as how the implementation will proceed.

So it is not that hard to configure a warehouse. All it takes is an

initiative to implement the process and take that first step. After that, DTS

takes care of most of the things and you can get structured answers to your

queries.

Anadi Misra and Anubhav Verma

Advertisment