September 17, 2003



Higher-end machines, such as the AS/400, use a database-based file system. Windows is trying to follow suit with WinFS, to be introduced with the next Windows OS (codenamed Longhorn). WinFS is already creating waves, with database major Oracle taking umbrage and waging an anti-SQL Server war on the Internet. What do file systems and databases have in common, you may ask?

First, let us examine the innards of a file system (FS). It contains information about files and folders: what and where they are, and how much space they occupy. File systems such as NTFS and EXT2 also contain other information, such as security descriptors. A single entry in the FS is similar to a row in a tabular representation of data; this is where the name FAT (File Allocation Table) comes from. The hard disk thus contains a set of such tables, which amounts to a database.

| File system | Max characters in name | Best for | Management schema | Salient features |
|---|---|---|---|---|
| FAT16 (MS-DOS) | 8+3 | File sizes in multiples of cluster size | HDD sector-based storage | Better handling for smaller hard disks. Limit on number of directories and files under a single directory |
| FAT32 (Win95 OSR2) | 255 | Storing a large number of files of various sizes | HDD sector-based storage. Windows Registry to store extra info | FAT16 limitations removed. Format and use partitions of up to 32 GB |
| NTFS (Win NT) | 255 | Encrypted and compressed files | Flat-file databases store additional information | On-the-fly encryption and compression. No size or number limits. Can set access control policies and restrict file access |
| WinFS | No limit | All files and data | True RDBMS | Queryable data and files. Faster searching and indexing. Better disk usage through elimination of inter-file gaps |

Mainframe systems have long used database engines (such as IBM’s DB2) as their FS back end, so the arrival of a database engine as the power behind the file system on the PC platform seems only natural. With WinFS, Microsoft has integrated its upcoming version of SQL Server (codenamed Yukon) into the kernel layer.

Primarily, SQL Server is a relational database system with features that can readily enhance file/folder usage and management. These include indexing (including clustered indexing), grouping of similar data, cubical searches (Cube and Rollup queries that perform multi-dimensional data analysis in SQL Server) and lexical searches (full-text searches). Further enhancements to the file system can be made simply by creating additional tables and adding feature-specific information to them.
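To make the indexing and grouping idea concrete, here is a minimal sketch of a file catalog kept as a database table. It uses Python's built-in SQLite module purely as a stand-in; it is not WinFS or SQL Server, and the `files` table, its columns and the function names are assumptions made up for illustration. SQLite has no Cube or Rollup, so a plain GROUP BY is shown instead.

```python
import sqlite3

def build_catalog(rows):
    """Create an in-memory file catalog (hypothetical schema) and index it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        "path TEXT PRIMARY KEY, folder TEXT, ext TEXT, size INTEGER)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", rows)
    # An index on the extension lets type-based lookups avoid a full scan.
    conn.execute("CREATE INDEX idx_files_ext ON files (ext)")
    conn.commit()
    return conn

def usage_by_type(conn):
    """Group similar files: count and total size per extension."""
    cur = conn.execute(
        "SELECT ext, COUNT(*), SUM(size) FROM files GROUP BY ext ORDER BY ext"
    )
    return cur.fetchall()

# Made-up sample entries, only to exercise the queries.
SAMPLE = [
    ("C:/music/a.mp3", "C:/music", "mp3", 4000),
    ("C:/music/b.mp3", "C:/music", "mp3", 6000),
    ("C:/docs/report.doc", "C:/docs", "doc", 1500),
]

if __name__ == "__main__":
    catalog = build_catalog(SAMPLE)
    for ext, count, total in usage_by_type(catalog):
        print(ext, count, total)
```

Adding a new file-system feature in this model would amount to adding another table or column next to `files`, which is the kind of extension the paragraph above describes.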

Today, the atomic unit of a file-system operation is a single byte-wise or cluster-wise read or write. That is too fine-grained: we need the ability to undo whole chunks of operations. With the RDBMS enhancement, WinFS is also slated to get transaction-logging features that let you perform rollbacks. Rollbacks are already a de facto feature in NTFS systems, with the system performing corrective measures behind the scenes based on data secretly filed away. However, WinFS could enable multiple-level file undelete and take data security and recovery to levels never envisaged before. We could maybe even encrypt the file system itself, instead of just the content, as is the case now.
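The idea of undoing a chunk of operations can be sketched with an ordinary database transaction. The example below again uses SQLite as a stand-in and a hypothetical `files` table; it says nothing about how WinFS itself would log or roll back changes. Several renames are applied as one unit, and if any of them fails, all of them are undone.

```python
import sqlite3

def make_fs():
    """Build a tiny in-memory 'file system' table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, content BLOB)")
    conn.executemany(
        "INSERT INTO files VALUES (?, ?)",
        [("a.txt", b"alpha"), ("b.txt", b"beta")],
    )
    conn.commit()
    return conn

def move_all(conn, renames):
    """Rename several files as one unit; undo everything if any step fails."""
    try:
        # The connection as a context manager commits on success and
        # rolls back the whole transaction if an exception escapes.
        with conn:
            for old, new in renames:
                cur = conn.execute(
                    "UPDATE files SET path = ? WHERE path = ?", (new, old)
                )
                if cur.rowcount != 1:
                    raise FileNotFoundError(old)
        return True
    except (FileNotFoundError, sqlite3.IntegrityError):
        return False

def listing(conn):
    """Return all paths in sorted order."""
    return [row[0] for row in conn.execute("SELECT path FROM files ORDER BY path")]

if __name__ == "__main__":
    fs = make_fs()
    # The second rename fails, so the first one is rolled back as well.
    print(move_all(fs, [("a.txt", "x.txt"), ("missing.txt", "y.txt")]))
    print(listing(fs))
```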

A store-and-forget mechanism could become the rule of the day: instead of the more complicated COM and API mechanisms prevalent today, simple SQL-based commands would be all that is required to perform various file I/O operations. RDBMS backup/restore mechanisms might replace the current file-system backup programs for operations on the entire file system, while backing up individual files may be best left to external tools.
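As a rough illustration of file I/O expressed as SQL commands, the sketch below stores file contents as rows and reads them back with plain INSERT, SELECT and DELETE statements. SQLite stands in for the database engine, and the `files` table and the helper names (`write_file`, `read_file`, `delete_file`) are hypothetical; the actual WinFS interface is not described in this article.

```python
import sqlite3

def open_store():
    """Open an in-memory store with one row per file (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, content BLOB NOT NULL)"
    )
    return conn

def write_file(conn, path, data):
    """Create or overwrite a file with a single SQL statement."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO files (path, content) VALUES (?, ?)",
            (path, data),
        )

def read_file(conn, path):
    """Fetch a file's bytes, or raise if there is no such row."""
    row = conn.execute(
        "SELECT content FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        raise FileNotFoundError(path)
    return bytes(row[0])

def delete_file(conn, path):
    """Remove a file's row."""
    with conn:
        conn.execute("DELETE FROM files WHERE path = ?", (path,))

if __name__ == "__main__":
    store = open_store()
    write_file(store, "notes.txt", b"store and forget")
    print(read_file(store, "notes.txt"))
    delete_file(store, "notes.txt")
```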

Okay, so there are some negatives as well. Doomsayers such as Oracle are already jumping to conclusions, alleging that Microsoft is conspiring to bundle SQL Server with Windows, though without facts to back the allegation. A key flaw is that there would now be a kernel-mode driver (the FS database engine) that may well be able to run plain-text SQL commands. Bugs in SQL Server could also end up inside the OS kernel. But the company claims that the entire OS, including the lite DB engine, is being written from scratch for Longhorn, and this might turn out to be a good thing. Maybe we would need special service packs and hot-fixes for the file system alone? We might also need to become conversant with SQL to get better performance out of the command line. And would the file system's resource requirements go up, now that the file-system driver is a database engine?

Basically, WinFS is rewriting the entire concept of file systems. The earliest FAT16 systems (for DOS) were 16-bit and had limits on maximum file size, on the amount of hard disk space they could address and on the number of folder levels they could accommodate. FAT32 extended the limits a bit. With Windows NT 3.1, NTFS arrived and introduced file protection schemes and the concept of application-less encryption. The NTFS versions in Windows 2000 and XP added indexing, error-correction mechanisms and optimized in-memory handling of file indexes, some also roping in the ubiquitous Windows Registry.

The NTFS version used in Windows Server 2003 is largely based on (hidden) files and is not very dissimilar from a database, containing similar records for its attributes. All in all, if we want to maintain interoperability with different OSs, and also with different versions of Windows, we need the lowest common denominator. And that is, and will remain, FAT16.

How easy would it be to upgrade to WinFS from other schemas? We will have to wait and see. However, there should be an easy utility to convert at any time (as there is in NT/2000/XP for FAT to NTFS). There would also be ways to format an existing partition with WinFS if needed.

The storage efficiency of WinFS is again a moot point. Since there would be all kinds of tables in this database file system, the file-system info (the equivalent of the FAT) can be expected to be large. However, the data itself (the files) will be stored in compressed form as individual rows. The I/O algorithm would no longer need to leave inter-file gaps, as a new row could be appended anywhere in the table. Cleanup routines would take care of reclaiming the row space of deleted data. This should help optimize file storage.

Microsoft itself is seriously debating the efficacy of this move, with one group rejecting the concept of WinFS outright.

However, at a recent press conference, Chairman and Chief Software Architect Bill Gates stated that WinFS would be a part of Longhorn when the public betas ship at the end of this year. Earlier file systems such as FAT16, FAT32 and NTFS would continue to be supported by Longhorn. WinFS is slated to be just another option, should you want to use it.

So maybe one day we will see something like:

C:\> SELECT * FROM Songs WHERE (Singer LIKE 'Bon Jovi%');

Bad command or filename.

Sujay Sarma
