Higher-end machines, such as the AS/400, use a database-based file system. Windows is trying to follow suit with WinFS, to be introduced with the next Windows OS (codenamed Longhorn). WinFS is already creating waves, with database major Oracle taking umbrage and waging an anti-SQL Server war on the Internet. What do file systems and databases have in common, you may ask?
First, let us examine the innards of a file system (FS). It contains information about files and folders: what and where they are, and how much space they occupy. File systems such as NTFS and EXT2 also contain other information, such as security descriptors. A single entry in the FS is similar to a row in a tabular representation of data; this is where the name FAT (File Allocation Table) comes from. The hard disk thus contains a set of such tables, which together amount to a simple database.
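To make the "entry as a row" idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table name `fs_entries` and its columns are made up for illustration; they are not the actual on-disk layout of FAT, NTFS or WinFS.

```python
import sqlite3

# Hypothetical schema: a toy "allocation table" held in an in-memory SQLite
# database. Column names are illustrative, not a real FAT or NTFS layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fs_entries (
        path          TEXT PRIMARY KEY,  -- where the file or folder is
        is_folder     INTEGER NOT NULL,  -- what it is: 1 = folder, 0 = file
        size_bytes    INTEGER NOT NULL,  -- how much space it occupies
        first_cluster INTEGER            -- where its data starts on disk
    )
    """
)

entries = [
    ("C:\\DOCS", 1, 0, None),
    ("C:\\DOCS\\LETTER.TXT", 0, 1200, 42),
    ("C:\\DOCS\\REPORT.DOC", 0, 48000, 57),
]
conn.executemany("INSERT INTO fs_entries VALUES (?, ?, ?, ?)", entries)

# A directory listing becomes a query over the table.
files = conn.execute(
    "SELECT path, size_bytes FROM fs_entries WHERE is_folder = 0 ORDER BY path"
).fetchall()
total_bytes = conn.execute(
    "SELECT SUM(size_bytes) FROM fs_entries WHERE is_folder = 0"
).fetchone()[0]

for path, size in files:
    print(path, size)
print("total:", total_bytes)
```

Each tuple in `entries` plays the role of one file-system entry, and questions such as "how much space do my files take?" turn into ordinary queries.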
| File System | Max characters in name | Best for | Mgmt schema | Salient features |
|---|---|---|---|---|
| FAT 16 (MS DOS) | 8+3 | File sizes in multiples of cluster-size | HDD sector based storage | Better handling for smaller hard disks. Limit on number of directories and files under a single directory |
| FAT 32 (Win95 OSR2) | 255 | Storing a large number of files of various sizes | HDD sector based storage. Windows Registry | FAT 16 limitations removed. Format and use partitions of up to 32 GB to store extra info |
| NTFS (Win NT) | 255 | Encrypted and compressed files | Flat-file databases store additional information | On the fly encryption and compression. No size or number limits. Can set access control policies and restrict file-access |
| WinFS | No limit | All files and data | True RDBMS | Query-able data and files. Faster searching and indexing. Better disk usage through elimination of inter-file gaps |
Mainframe systems have used database engines (such as IBM’s DB2) as their FS backend. The arrival of this approach on the PC platform, with a database engine powering the FS, therefore seems only natural. With WinFS, Microsoft has integrated its upcoming version of SQL Server (codenamed Yukon) into the kernel layer.
Primarily, SQL Server is a relational database system with features that can readily enhance file/folder usage and management. These include indexing (including clustered indexing), grouping of similar data, multi-dimensional data analysis (using Cube and Rollup queries in SQL Server) and lexical (full-text) searches. The file system can be enhanced further simply by creating additional tables and adding feature-specific information to them.
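A rough sketch of what indexing, grouping and "one more table per feature" could look like follows. It uses Python's sqlite3 module rather than SQL Server, so a plain GROUP BY stands in for Cube and Rollup queries, and the `files` and `ratings` tables are assumptions invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        extension  TEXT NOT NULL,
        size_bytes INTEGER NOT NULL
    );
    -- An index lets lookups by extension avoid scanning every row.
    CREATE INDEX idx_files_extension ON files (extension);
    -- A new feature (here, user ratings) is just one more table.
    CREATE TABLE ratings (
        file_id INTEGER REFERENCES files (id),
        stars   INTEGER NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO files (id, name, extension, size_bytes) VALUES (?, ?, ?, ?)",
    [
        (1, "always", "mp3", 5_000_000),
        (2, "runaway", "mp3", 4_000_000),
        (3, "budget", "xls", 30_000),
    ],
)
conn.executemany("INSERT INTO ratings VALUES (?, ?)", [(1, 5), (2, 3)])

# Grouping similar data: file count and total size per extension.
per_type = conn.execute(
    "SELECT extension, COUNT(*), SUM(size_bytes) FROM files "
    "GROUP BY extension ORDER BY extension"
).fetchall()

# Using the extra table: names of files rated four stars or more.
favourites = [
    row[0]
    for row in conn.execute(
        "SELECT f.name FROM files AS f JOIN ratings AS r ON r.file_id = f.id "
        "WHERE r.stars >= 4 ORDER BY f.name"
    )
]
print(per_type)
print(favourites)
```

The point of the design is that the ratings feature needed no change to the `files` table itself; it was bolted on as a separate table and joined in at query time.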
Today, the atomic unit of a file-system operation is a single byte-wise or cluster-wise read or write. This is too fine-grained to be useful; what we need is the ability to undo whole chunks of operations. With the RDBMS enhancement, WinFS is also slated to get transaction logging features, which let you perform rollbacks. Rollbacks already exist in NTFS, where the system performs corrective measures behind the scenes based on data it quietly files away. However, WinFS could enable multi-level file undelete and take data security and recovery to levels never envisaged before. We could maybe encrypt the file system itself, instead of just the content, as is the case now.
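The "undo a chunk of operations" idea is what a database transaction provides. The sketch below, again using Python's sqlite3 module, groups several renames into one unit; `move_all` and the `files` table are hypothetical names for illustration and say nothing about how WinFS or NTFS implement it.

```python
import sqlite3

# isolation_level=None: transactions are managed explicitly with
# BEGIN / COMMIT / ROLLBACK instead of implicitly by the module.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, content BLOB)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)", [("a.txt", b"one"), ("b.txt", b"two")]
)


def move_all(conn, moves):
    """Rename several files as one unit: every rename happens, or none does."""
    conn.execute("BEGIN")
    try:
        for old_path, new_path in moves:
            cur = conn.execute(
                "UPDATE files SET path = ? WHERE path = ?", (new_path, old_path)
            )
            if cur.rowcount != 1:
                raise FileNotFoundError(old_path)
        conn.execute("COMMIT")
        return True
    except Exception:
        # Undo everything done since BEGIN, including earlier renames.
        conn.execute("ROLLBACK")
        return False


def paths(conn):
    return [row[0] for row in conn.execute("SELECT path FROM files ORDER BY path")]


# The second rename fails, so the first one is rolled back as well.
failed = move_all(conn, [("a.txt", "x.txt"), ("missing.txt", "y.txt")])
after_failure = paths(conn)

succeeded = move_all(conn, [("a.txt", "x.txt"), ("b.txt", "y.txt")])
after_success = paths(conn)
print(failed, after_failure)
print(succeeded, after_success)
```

After the failed batch the table looks exactly as it did before, which is the behaviour a byte-wise or cluster-wise atomic unit cannot give you on its own.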
A store-and-forget mechanism could be the rule of the day, where instead of the more complicated COM and API mechanisms prevalent today, simple SQL-based commands are all that is required to perform various file I/O operations. RDBMS backup/restore mechanisms might replace the current file system backup programs for operations on the entire file system, while backing up individual files may be best left to external tools.
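As a purely speculative illustration of file I/O through SQL commands, the sketch below stores file contents as rows and maps write, read and delete onto INSERT, SELECT and DELETE. The helper names (`write_file`, `read_file`, `delete_file`) and the `files` table are assumptions made for this example, not a WinFS interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, content BLOB NOT NULL)")


def write_file(path, data):
    # INSERT OR REPLACE stands in for "create or overwrite".
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?)", (path, data))


def read_file(path):
    row = conn.execute(
        "SELECT content FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        raise FileNotFoundError(path)
    return row[0]


def delete_file(path):
    with conn:
        conn.execute("DELETE FROM files WHERE path = ?", (path,))


write_file("notes.txt", b"hello")
write_file("notes.txt", b"hello again")  # overwrites the earlier content
contents = read_file("notes.txt")
delete_file("notes.txt")
remaining = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
print(contents, remaining)
```

Whether a real database file system would expose raw SQL like this, or wrap it in a friendlier API, is an open question; the sketch only shows that the basic operations map over cleanly.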
Okay, so there are some negatives as well. Doomsayers such as Oracle are already jumping to conclusions, alleging that Microsoft is conspiring to bundle SQL Server with Windows, though without facts to back the allegation. A key flaw would be that there is now a kernel-mode driver, the FS database engine, that may well be able to run plain-text SQL commands. Also, bugs in SQL Server could themselves end up in the OS kernel. But the company claims that the entire OS, including the lite DB engine, is being written from scratch for Longhorn, and this might turn out to be a good thing. Maybe we would need special service packs and hot-fixes for the file system alone? We might also need to become conversant with SQL to get better performance out of the command line. And would system requirements go up, since the file system driver would now be a database engine?
Basically, WinFS is rewriting the entire concept of file systems. The earliest FAT16 systems (for DOS) were 16-bit and had limits on maximum file size, on the amount of hard disk they could address and on the number of folder levels they could accommodate. FAT32 extended the limits a bit. NTFS arrived with Windows NT 3.1 and introduced file protection schemes and the concept of application-less encryption. The NTFS versions in Windows 2000 and XP added indexing, error correction mechanisms and optimized in-memory handling of file indexes, some also roping in the ubiquitous Windows Registry.
The NTFS version used in Windows 2003 is largely based on (hidden) files and is not very dissimilar from a database, with similar records for its attributes. All in all, if we want to maintain interoperability with different OSes, and also with different versions of Windows, we need the lowest common denominator. And that is, and will remain, FAT16.
How easy would it be to upgrade to WinFS from other file systems? We will have to wait and see. However, there should be an easy conversion utility (like the one in NT/2000/XP for FAT to NTFS) that can be run at any time. There would also be ways to format an existing partition with WinFS if needed.
The storage efficiency of WinFS is again a moot point. Since there would be all kinds of tables in this database file system, the file-system info (the equivalent of the FAT tables) can be expected to be large. However, the data itself (the files) will be stored in compressed formats as individual rows. The I/O algorithm would no longer need to leave inter-file gaps as a new row could be appended anywhere in the table. The cleanup routines would take care of reclaiming row-spaces of data that got deleted. This should help optimize file storage.
Microsoft itself is seriously debating the efficacy of this move, with one group rejecting the concept of WinFS completely.
However, at a recent press conference, Chairman and Chief Software Architect Bill Gates stated that WinFS would be a part of Longhorn when the public betas ship at the end of this year. Also, earlier file systems such as FAT16, FAT32 and NTFS would continue to be supported by Longhorn. WinFS is slated to be just another option, should you want to use it.
So maybe one day, we will see something like:
C:\> SELECT * FROM Songs WHERE (Singer LIKE 'Bon Jovi%');
Bad command or filename.
Sujay Sarma