
Ensuring Information Quality for Business

PCQ Bureau

Organizations of all sizes and across industries are deluged with data spread across legacy systems, local databases, spreadsheets and other sources. They are constantly seeking ways to build an enterprise-wide data warehouse that delivers actionable, timely and reliable business information.


Most techniques used by organizations to build a comprehensive information management infrastructure employ either a top-down or a bottom-up development approach. In the top-down approach, an enterprise data warehouse (EDW) is built iteratively, business area by business area, and the dependent data marts are created as required from the EDW contents. In the bottom-up approach, independent data marts are created with a view to integrating them into an enterprise data warehouse at some point in the future.

Either way, a major blind spot in data warehousing project delivery quality is testing. Unlike custom software applications or product implementations, there tends to be minimal user involvement in testing of the data warehouse and its downstream data marts, given that the testing processes are technical and follow the same iterative cycle as the data warehouse development.

Datawarehouse testing



The importance of specialized data warehouse testing cannot be overstated. The key success factor is user confidence and consistent data quality, which leads to a high degree of user adoption. The ROI of information assets can be determined only when the data is put to use to derive business insight.

Figure: various data transformation layers in a typical DWH implementation, and the testing process flow and coverage.

Testing of data warehouse and business intelligence applications involves significant programming work: there are limited front-end screens, but mostly back-end processes that operate on data sets. Further, unlike the complicated features and functions of application software, DWH projects can be considered a simple sequence of data transformation, change and aggregation through a chain of processes. But this simple sequence of data movement leads to complication in testing. For every transformation of a data set, testing must ensure that the transformation is right by encoding the transformation logic into test scripts. With no front-end screens, most test scripts have to be created as back-end scripts (say, SQL queries). Thus, DWH testing is more intensive and more programmatic than regular application testing, and requires extensive domain knowledge and DWH concepts to create test scripts. There is no readily available user interface to visually inspect and validate.
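Such a back-end test script can be as simple as a SQL query whose result is asserted on. A minimal sketch of the pattern, using an in-memory SQLite database with hypothetical src_orders and dwh_orders tables (all names here are illustrative, not from the article):

```python
import sqlite3

# Hypothetical minimal schema: a source table and the warehouse target it loads.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL);
    CREATE TABLE dwh_orders (order_id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO dwh_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
""")

# The "test script" is a pair of SQL queries whose results are asserted on,
# in place of any visual inspection through a front-end screen.
src_count = con.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
tgt_count = con.execute("SELECT COUNT(*) FROM dwh_orders").fetchone()[0]
assert src_count == tgt_count, f"count mismatch: {src_count} vs {tgt_count}"
print("row-count check passed:", src_count, "rows")
```

In practice the same query pair would run against the real source system and warehouse rather than a toy in-memory database.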

Testing process and methodology



Data warehouse testing starts with identifying all the data sources that go into building the data warehouse, the downstream applications such as subject-specific data marts and reports, and other dependencies. Each data movement layer is defined in terms of its source data set, target data set, and data transformation and business logic. Testing focuses on verification and validation of correct and complete data movement from one data component or layer to the next, and on successful generation of any downstream deliverables such as reports.

Typically, testing may be differentiated into three distinct components: data movement from files or other source systems to an operational data store or staging area; further data transformation and creation of business logic in the data warehouse; and downstream data movement. Downstream data dependencies could be subject-specific data marts, where further aggregation and integration of business logic takes place, and reports. The key scenarios that should be tested are:


Data completeness: Data completeness checks involve the following sub-scenarios.

* Comparing record counts between the source data, the data loaded to the warehouse, and rejected records.

* Comparing unique values of key fields between the source data and the data loaded to the warehouse. Typically this can be done for entities like Customer, Account, Transactions and Product.


* Value range testing: test for the range and value distributions of fields in a data set. This can be used during testing and in production to compare source and target data sets and point out any data anomalies from source systems that may be missed even when the data movement is correct.
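These completeness checks translate directly into assertable queries. A sketch, again against an in-memory SQLite database with hypothetical src_customers, dwh_customers and rejected tables (illustrative names, not from the article):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_customers (cust_id INTEGER, age INTEGER);
    CREATE TABLE dwh_customers (cust_id INTEGER, age INTEGER);
    CREATE TABLE rejected      (cust_id INTEGER, age INTEGER);
    INSERT INTO src_customers VALUES (1, 34), (2, 51), (3, -7);
    INSERT INTO dwh_customers VALUES (1, 34), (2, 51);
    INSERT INTO rejected      VALUES (3, -7);   -- failed a validation rule
""")

# 1. Loaded plus rejected records should account for every source record.
src = con.execute("SELECT COUNT(*) FROM src_customers").fetchone()[0]
tgt = con.execute("SELECT COUNT(*) FROM dwh_customers").fetchone()[0]
rej = con.execute("SELECT COUNT(*) FROM rejected").fetchone()[0]
assert src == tgt + rej, f"{src} source rows, but {tgt} loaded + {rej} rejected"

# 2. Key values in the warehouse must all come from the source (no extras).
missing = con.execute("""
    SELECT cust_id FROM dwh_customers
    EXCEPT SELECT cust_id FROM src_customers
""").fetchall()
assert missing == [], f"keys in warehouse but not in source: {missing}"

# 3. Value-range profile: a negative minimum age flags a source-side anomaly
#    even though the data movement itself was correct.
lo, hi = con.execute("SELECT MIN(age), MAX(age) FROM src_customers").fetchone()
print(f"source age range: {lo}..{hi}")
```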

Data correctness: Data correctness checks look for inaccurate data coming from disparate source systems, different definitions of operational data elements, and loss of data during the ETL (Extraction, Transformation and Loading) process.

Data transformation: Testing involves checks to ensure that the data has been correctly transformed as per the business rules and that parent-child relationships are correctly defined.
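A common assertion here is that every child (fact) record references an existing parent (dimension) record. A sketch with hypothetical dim_product and fact_sales tables (illustrative names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales  (sale_id INTEGER, product_id INTEGER, qty INTEGER);
    INSERT INTO dim_product VALUES (10, 'Widget'), (20, 'Gadget');
    INSERT INTO fact_sales  VALUES (1, 10, 5), (2, 20, 3);
""")

# Orphan check: a fact row whose product_id has no matching dimension row
# indicates a broken parent-child relationship in the transformed data.
orphans = con.execute("""
    SELECT f.sale_id FROM fact_sales f
    LEFT JOIN dim_product d ON f.product_id = d.product_id
    WHERE d.product_id IS NULL
""").fetchall()
assert orphans == [], f"orphan fact rows: {orphans}"
print("parent-child relationship check passed")
```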


Data quality: Data quality checks focus on "how the ETL system handles data rejection, substitution, correction and notification without modifying data".

Business logic checks: Testing validates that all the business logic built into the ETL and the reports, as per the defined business specifications, is correct. At the report level, this involves testing all the measures with adequate data that mirrors production data to the extent possible, so that all the business conditions are exercised. Testing also covers report-level aggregations like averages, percentages and subtotals, as well as graphical representations.
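One way to test a report-level aggregation is to recompute each subtotal from the detail data and compare it against what the report layer shows. A sketch with hypothetical fact_sales and report_summary tables (illustrative names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_sales (region TEXT, amount REAL);
    CREATE TABLE report_summary (region TEXT, total REAL);  -- what the report shows
    INSERT INTO fact_sales VALUES ('North', 100), ('North', 50), ('South', 80);
    INSERT INTO report_summary VALUES ('North', 150), ('South', 80);
""")

# Recompute each regional subtotal from the detail rows and flag any region
# where the report disagrees (with a small tolerance for rounding).
mismatches = con.execute("""
    SELECT r.region FROM report_summary r
    JOIN (SELECT region, SUM(amount) AS total
          FROM fact_sales GROUP BY region) f
      ON r.region = f.region
    WHERE ABS(r.total - f.total) > 0.005
""").fetchall()
assert mismatches == [], f"subtotal mismatch in regions: {mismatches}"
print("report aggregation check passed")
```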

Downstream data reconciliation: This involves reconciling data from the source systems to the final downstream layer, across the various intermediate transformation layers.
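Reconciliation is often done by comparing a summed measure at every layer in the chain. A sketch with hypothetical staging, warehouse and mart tables (illustrative names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stg_txn  (amount REAL);   -- staging layer
    CREATE TABLE dwh_txn  (amount REAL);   -- warehouse layer
    CREATE TABLE mart_txn (amount REAL);   -- downstream data mart
    INSERT INTO stg_txn  VALUES (10), (20), (30);
    INSERT INTO dwh_txn  VALUES (10), (20), (30);
    INSERT INTO mart_txn VALUES (10), (20), (30);
""")

# Sum the measure at each layer; every layer should agree with the source.
totals = {
    layer: con.execute(f"SELECT SUM(amount) FROM {layer}").fetchone()[0]
    for layer in ("stg_txn", "dwh_txn", "mart_txn")
}
assert len(set(totals.values())) == 1, f"layers disagree: {totals}"
print("reconciliation across layers passed:", totals)
```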

Regression testing: Given the iterative methodology of building a data warehouse, regression testing is of utmost importance. The focus is on ensuring that each incremental release does not break the existing system in production and that the new functionality behaves as per specifications.

Test scripts are executed as a batch, and the results are generated as an output file or report that provides details on the source-target counts and other test scenarios. The batch file also accounts for the execution order of dependent scripts. Data warehouse testing needs a specialized skill set: a combination of domain knowledge, testing experience and the technical knowledge to understand the data warehouse design and the overall IM infrastructure. The testing strategy must start at the design stage itself, and key challenges in terms of data and infrastructure availability, and access to the test environment, must be determined at the initial design stages.
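Batch execution of such scripts might be sketched as follows, with a hypothetical dictionary of named test queries whose results are collected into a report (all names are illustrative, not from the article):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src (id INTEGER); CREATE TABLE tgt (id INTEGER);
    INSERT INTO src VALUES (1), (2); INSERT INTO tgt VALUES (1), (2);
""")

# A batch of named test queries; in practice each entry would be a script
# file, and ordering would respect dependencies between scripts.
tests = {
    "source_count": "SELECT COUNT(*) FROM src",
    "target_count": "SELECT COUNT(*) FROM tgt",
}

# Run every query and collect one result line per test into a report string,
# which would normally be written to an output file for review.
lines = [f"{name}: {con.execute(sql).fetchone()[0]}" for name, sql in tests.items()]
report = "\n".join(lines)
print(report)
```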

NK Subramaniyam - Executive Director & J Srivastava - Head, Products & Alliances, Saksoft
