
Data Access Strategies


.NET offers plenty of data-access mechanisms to choose from. Let’s take a bird’s-eye view of the different data-access choices.


The DNA World



The DNA world provided us with many choices for data access: proprietary database API calls, ODBC, DAO, RDO and so on. The big transition came with the ADO object model, with support for UDA (Universal Data Access) based on OLE DB. Thus, we could access any data, anywhere, using ADO.

ADO 2.6 



ADO has many ‘plus points’. It’s easy to learn, language-independent, implemented with a small memory footprint and uses minimal network traffic. The combination of OLE DB with ADO was primarily designed for a tightly coupled, COM-aware environment, where there was little cost involved in holding on to database connections.

Changing Landscape



There has been a shift from tightly coupled client-server applications to loosely coupled, Web-based applications. There is now a need for disparate applications, running on a diverse set of platforms and using myriad languages and object models, to exchange data seamlessly. This inevitable shift to the Web, where protocols are intrinsically connectionless, necessitated the following: (a) a data access model with rich features for ‘connectionless’ programming, (b) a model that is vendor-neutral and (c) a model that has ‘XML at its core’, as XML has emerged as the de facto standard for information exchange.


ADO’s reliance on COM custom marshaling to exchange data is a big limiting factor for interoperability across disparate platforms. Also, the ‘hold on to the connection’ style severely impacts scalability.

Of course, Microsoft did introduce features in ADO 2.6 to adapt to changing scenarios: the disconnected recordset and the ability to load and save disconnected recordsets into and out of a specific XML schema. However, these changes have been perceived, at best, as ‘patchy support’ for the Web-based programming paradigm. It is also rather difficult to reconcile the ADO data model, which is primarily relational, with the new world of XML, where the data model is heterogeneous and hierarchical. XML comes with its own object model (XMLDOM) and a different set of services (XSL/T, XPath, W3C Schemas). Many a time, we are forced to make a binary choice about which stack to use–MDAC or MSXML–when what we really want is to use both technologies in a complementary, need-based fashion.

ADO.NET



ADO.NET is a natural evolution of ADO, built around n-tier application development. Data access with ADO.NET is designed around an architecture that uses connections sparingly; the application is connected to the database only long enough to fetch or update the data. ADO.NET has been architected with XML support at a very fundamental level. Cross-platform exchange of data can now occur in a loosely-coupled fashion.


Disconnected Access to Relational Data Using Datasets

ADO.NET is well integrated with the .NET Framework. Hence, it enjoys all key benefits of .NET–automatic object lifetime management, language-neutral class libraries, cross-language inheritance, exception handling and debugging. The strength of the .NET Framework is in its uniformity–all components share a common type system, design patterns, and naming conventions. ADO.NET follows the same style.

The ADO.NET object model is broadly divided into two levels: the connected layer and the disconnected layer. 


The Connected Layer



Staying connected, we can perform DML and DDL operations, execute stored procedures and use transactions. We can also fetch records in a read-only, forward-only fashion. The connected layer consists of the classes that comprise the Managed Provider. A Managed Provider loosely maps to the concept of an OLE DB provider, but the object model looks a lot more like ADO. It consists of the Connection, Command, DataReader and DataAdapter classes. An ADO programmer will be familiar with the first two: they help to establish a connection to a data source and execute a command against it, respectively. The DataReader loosely corresponds to a forward-only, read-only recordset. Finally, the DataAdapter acts as the bridge between the Managed Provider and the DataSet, which we will discuss shortly.
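Here is a minimal C# sketch of the connected layer, assuming the SQL Server Managed Provider and a hypothetical Northwind database with a Customers table; the connection string and all names are illustrative only.

using System;
using System.Data.SqlClient;

class ConnectedDemo
{
    static void Main()
    {
        // Illustrative connection string; adjust for your environment
        string connStr = "Server=(local);Database=Northwind;Integrated Security=SSPI";

        using (SqlConnection conn = new SqlConnection(connStr))
        {
            SqlCommand cmd = new SqlCommand(
                "SELECT CustomerID, CompanyName FROM Customers", conn);
            conn.Open();

            // The DataReader gives forward-only, read-only, connected access
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}: {1}",
                        reader["CustomerID"], reader["CompanyName"]);
            }
        } // the connection is closed as soon as we leave the block
    }
}

Note how the connection lives only as long as the read loop: this is the ‘use connections sparingly’ style that ADO.NET encourages.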

Some Managed Providers currently available are:

  • Managed Provider for SQL Server
  • Managed Provider for any data source that has an OLE DB provider
  • Managed Provider for Oracle
  • Managed Provider for any database with an ODBC driver

Of course, we can roll our own Managed Provider too, if we have a proprietary data store with proprietary (and perhaps the most efficient) data access mechanisms!

The Disconnected Layer



The disconnected layer in ADO.NET revolves around the dataset. The concept of the disconnected recordset in ADO has been greatly enhanced and elevated to become the focal theme of the programming model in ADO.NET.

A common data task in any application is to retrieve data from the database and perform tasks like display, processing or transmission to another layer. The application may fetch data from one or more tables in the same database or different databases. Once these records have been fetched, the application typically works with them as a group. 


In many cases, after the query is executed, it is neither practical nor necessary to go back to the database each time the application needs to process the next record. A viable alternative, therefore, is to retrieve the required data in one shot and work offline with this temporary set. This is exactly what a dataset is all about. A dataset is a local cache of records retrieved from one or more data sources. It is interesting to note that part of the data could come from SQL Server, another portion from Oracle or, for that matter, a spreadsheet!

The dataset works like a virtual data store. It includes three key components: the data, constraints on the data and the relationships among the data, with the structure captured in the form of a schema. A dataset can contain one table or a collection of tables. An important aspect of the dataset is that it keeps track of the relationships between the tables it contains–as if it were an in-memory, relational data store.

Using a dataset, we can scroll through data. We can also do a hierarchical drill-down: for example, get all customers and their orders into the dataset and, given a customer, view the orders placed by that customer. We can make changes on the copy we are working with, and undo them too. All this happens while our application stays disconnected from the database.
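The following sketch, again against the hypothetical Northwind Customers and Orders tables, shows a dataset acting as an in-memory relational store, with a DataRelation enabling the hierarchical drill-down just described.

using System;
using System.Data;
using System.Data.SqlClient;

class DataSetDemo
{
    static void Main()
    {
        string connStr = "Server=(local);Database=Northwind;Integrated Security=SSPI";
        DataSet ds = new DataSet("CustomerOrders");

        using (SqlConnection conn = new SqlConnection(connStr))
        {
            // Fill two tables into one local cache; the connection is
            // released as soon as the adapters finish
            new SqlDataAdapter("SELECT CustomerID, CompanyName FROM Customers", conn)
                .Fill(ds, "Customers");
            new SqlDataAdapter("SELECT OrderID, CustomerID, OrderDate FROM Orders", conn)
                .Fill(ds, "Orders");
        }

        // Teach the dataset the parent-child relationship
        DataRelation rel = ds.Relations.Add("CustOrders",
            ds.Tables["Customers"].Columns["CustomerID"],
            ds.Tables["Orders"].Columns["CustomerID"]);

        // Hierarchical drill-down, fully offline
        foreach (DataRow cust in ds.Tables["Customers"].Rows)
        {
            Console.WriteLine(cust["CompanyName"]);
            foreach (DataRow order in cust.GetChildRows(rel))
                Console.WriteLine("   Order {0} on {1}",
                    order["OrderID"], order["OrderDate"]);
        }
    }
}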


Of course, we often need to update data in the database. We can perform update operations on the dataset using the DataAdapter, the ‘bridge’ object mentioned earlier. Remember that the dataset is a passive container for the data. To actually fetch data from the database and (optionally) write it back, data adapters are used.
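A minimal update sketch, using the same hypothetical database; here a SqlCommandBuilder derives the INSERT/UPDATE/DELETE commands from the SELECT statement, which suffices for simple single-table updates.

using System.Data;
using System.Data.SqlClient;

class UpdateDemo
{
    static void Main()
    {
        string connStr = "Server=(local);Database=Northwind;Integrated Security=SSPI";
        SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT CustomerID, CompanyName, City FROM Customers", connStr);

        // The command builder derives the update commands from the SELECT
        SqlCommandBuilder builder = new SqlCommandBuilder(adapter);

        DataSet ds = new DataSet();
        adapter.Fill(ds, "Customers");

        // Change the local copy while disconnected...
        ds.Tables["Customers"].Rows[0]["City"] = "Bangalore";

        // ...then reconnect just long enough to push the changes back
        adapter.Update(ds, "Customers");
    }
}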

Because a dataset is effectively a private copy of the database data, it does not necessarily reflect the current state of the database. This means that refresh operations need to be planned, periodically or as and when required, to keep the copy up-to-date. Also, as changes are made in our local copy, other users may directly modify the data in the database. Hence, concurrency conflicts cannot be ruled out. Concurrency conflict detection and resolution are two key design features to be carefully planned for.
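Continuing the update sketch above, ADO.NET surfaces such conflicts as a DBConcurrencyException during Update; here is the detection side only, the resolution policy being up to the application.

using System;
using System.Data;
using System.Data.SqlClient;

class ConcurrencyDemo
{
    static void SaveChanges(SqlDataAdapter adapter, DataSet ds)
    {
        try
        {
            adapter.Update(ds, "Customers");
        }
        catch (DBConcurrencyException ex)
        {
            // Another user changed this row after we fetched our copy;
            // ex.Row holds the offending row, which a real application
            // would refresh and let the user merge or retry
            Console.WriteLine("Conflict on customer {0}", ex.Row["CustomerID"]);
        }
    }
}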

The ‘disconnected’ scenario is by no means limited to Web-based applications alone. ‘Smart client’ applications that involve the use of hand-held devices also fit aptly into the ‘disconnected’ environment. ADO.NET fits in equally well here too.

XML Integration 



In ADO.NET, the format for data transmission is XML. If we have an XML file, we can use it like any data source and create a dataset out of it. Similarly, if data needs to be persisted, it can be stored as XML. This simply means that we can access and manipulate data in two ways: the ‘relational’ way and the ‘XML’ way.

This seamless integration offers a lot of interesting choices to developers. Essentially, the ‘relational’ and ‘hierarchical’ representations are two views of the same data. Therefore, as developers, we can choose to hold a ‘relational’ view when we wish to perform data binding and execute relational queries. On the same data, if we wish to perform XSL transformations and XPath queries, we just need to switch our view to ‘hierarchical’ mode. For example, we can read customer data from a SQL Server database into memory, switch to the XML view to perform XSL transformations and XPath queries, and perhaps write the modified data back into the table in the database. The dataset makes it extremely easy to choose or switch between the two views, thereby providing a tremendous degree of flexibility.
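A sketch of this view switching using XmlDataDocument, which stays synchronized with the dataset (same hypothetical Customers table; the XPath reflects the dataset’s default element mapping).

using System;
using System.Data;
using System.Data.SqlClient;
using System.Xml;

class XmlViewDemo
{
    static void Main()
    {
        string connStr = "Server=(local);Database=Northwind;Integrated Security=SSPI";
        DataSet ds = new DataSet("CustomerData");
        new SqlDataAdapter("SELECT CustomerID, CompanyName FROM Customers", connStr)
            .Fill(ds, "Customers");

        // Hierarchical view over the same in-memory rows: XmlDataDocument
        // is synchronized with the dataset, so XPath sees every row
        XmlDataDocument doc = new XmlDataDocument(ds);
        foreach (XmlNode name in doc.SelectNodes("//Customers/CompanyName"))
            Console.WriteLine(name.InnerText);

        // Persist or transmit the same data as XML, schema included
        ds.WriteXml("customers.xml", XmlWriteMode.WriteSchema);
    }
}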

Since the dataset’s serialization format is XML, dataset objects can easily pass through firewalls without restrictions. A dataset can also be remoted across tiers. For example, a business object in the middle tier might create and populate a dataset, then send it to another component, possibly running on a non-Microsoft platform, for subsequent processing.

XML Support in SQL Server



Consider a requirement to develop a middle-tier application over an existing customer information database and expose/consume data to and from a client that has only HTTP access. The obvious choice for formatting the data is XML.

To do this in ADO, we need to select data from the Customers, Orders, Order Details and Products tables, and then build the XML ourselves by cursoring through the recordsets, nesting elements based on the key relationships and adding all the appropriate XML markup. The DNA model, as we can see, places the onus of presentation on other layers.

What about ADO.NET? Here too, we need to retrieve the data in relational format using a DataAdapter, bring it into a dataset and switch to the ‘XML’ view. Why would we want to introduce an extra execution layer for this conversion to XML? There has to be a better way.

XML Support in SQL Server 2000



In SQL Server 2000, the user can execute SQL queries against relational database objects and get the results back directly as XML documents, rather than as standard rowsets.

This doesn’t mean that we have to change our database design to accommodate XML. We only intend to express the data in a different form. All of our well-normalized entities, finely tuned queries and well-groomed indexes will remain intact if we plan properly for XML integration. We are simply placing a thin veil over the database, so that an XML-based application will not be able to distinguish between the database’s translation to and from XML and a purely XML tier.

It is one thing for SQL Server to have XML capabilities and quite another to expose them via HTTP. This access mechanism mandates the creation of a virtual directory in IIS on top of a SQL Server database. The requirements here could be many: a few include insulating the names of columns and tables from the client application by providing meaningful names, and encapsulating database queries/calls inside an XML file that is invoked by the client. Without getting into details, let me just mention that mapping schemas and template files provide a lot of convenience and abstraction in this context.

More advanced features include exposing stored procedures as Web Services and programmatic access to these features, in ADO as well as in ADO.NET; in the latter model, managed classes have been specifically written to tap the rich, native XML support provided by SQL Server 2000.
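For example, SqlCommand.ExecuteXmlReader is the managed entry point to FOR XML queries; here is a sketch against the hypothetical Northwind database (FOR XML AUTO returns an XML fragment, hence the reader loop).

using System;
using System.Data.SqlClient;
using System.Xml;

class ForXmlDemo
{
    static void Main()
    {
        string connStr = "Server=(local);Database=Northwind;Integrated Security=SSPI";
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            // The server itself shapes the result as XML; AUTO mode nests
            // Orders elements inside the Customers element they belong to
            SqlCommand cmd = new SqlCommand(
                "SELECT Customers.CustomerID, Orders.OrderID " +
                "FROM Customers JOIN Orders " +
                "ON Customers.CustomerID = Orders.CustomerID " +
                "FOR XML AUTO", conn);
            conn.Open();

            XmlReader reader = cmd.ExecuteXmlReader();
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                    Console.WriteLine(reader.ReadOuterXml()); // advances the reader
                else
                    reader.Read();
            }
            reader.Close();
        }
    }
}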

We are not just talking about retrieving data in XML format. Client applications can also submit data in XML format for insert/update/delete purposes.

While ADO.NET, or for that matter ADO, offers high-performance data access, all blame cannot be heaped on a specific data-access technique if we are submitting “challenging” queries. For example, we may not have optimized the indexes, we may not have used stored procedures, the server may be overloaded, or we may ask the application to return too many rows. Basic database design, query design and optimization issues hold the key to making or breaking performance, notwithstanding the specific data-access technique.

Best Data-access Approach



Is Microsoft trying to overwhelm us with choices? No, since diverse applications have disparate needs. There is no one approach that can categorically be called the ‘best’. It all depends on the specific requirements of a scenario. Choosing a data-access method comes down to what our application requires in the following areas:

  • Performance 
  • Interoperability
  • Degree of Integration with XML
  • Managed/Native environment in which the application runs
  • Degree of coupling acceptable between different layers
  • Scalability

The emphasis we place on each factor should be motivated by both the current and future needs of the application. 

Best choice



The key to writing applications with efficient data access lies in not getting bogged down by the choices described in this article, or worse, getting ‘anchored’ to one choice for all situations! The smart strategy is to learn the features, merits and demerits of each choice and weigh them against the unique needs of our application; we will then be able to make the ‘appropriate’ choice.

Meena K runs Symbiosis Training and Consultancy, Bangalore
