All computer applications need to store and retrieve information. This information should persist even after the application using it terminates or the computer is shut down. The solution is to store information on non-volatile storage such as hard drives, magnetic tapes, and optical media. Information on such media is stored in a logical storage unit, the file. The OS manages files, and the part of the OS that deals with files is known as the file system. A computer system can have thousands of files, so some mechanism is required to organize and keep track of them; this is the job of the file system. It consists of two distinct parts: a collection of files, each storing related data, and a directory structure, which organizes and provides information about all the files. In this article we’ll look at the basic structure of a file system and at some of the popular ones.
The basics
Files are an abstraction mechanism. They let users store and retrieve information while shielding them from the details of how and where on the disk this is done. Files are named for the convenience of their users. Most OSs allow strings of one to eight characters as file names, and some support longer file names of up to 255 characters.
Many OSs also support two-part names, with the first part being the actual name and the second part, called the file extension, identifying the file type. For example, prog.cpp represents a C++ source file named prog. OSs also provide various operations on files. Some of the common ones are CREATE, DELETE, OPEN, CLOSE, READ, WRITE, APPEND, SEEK, and RENAME.
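As a rough illustration, the operations listed above map onto ordinary library calls in most languages. The sketch below uses Python's standard library; the function name demo_file_operations and the file names are made up for this example.

```python
import os
import tempfile

def demo_file_operations(directory):
    """Run the common file operations on a scratch file inside `directory`."""
    path = os.path.join(directory, "prog.cpp")
    with open(path, "wb") as f:        # CREATE + OPEN; leaving the block does CLOSE
        f.write(b"int main() {}")      # WRITE
    with open(path, "ab") as f:        # APPEND adds data at the end of the file
        f.write(b"\n// end")
    with open(path, "rb") as f:
        f.seek(4)                      # SEEK to byte offset 4
        tail = f.read()                # READ from there to the end
    new_path = os.path.join(directory, "main.cpp")
    os.rename(path, new_path)          # RENAME
    # Split the two-part name into the actual name and the extension.
    name, ext = os.path.splitext(os.path.basename(new_path))
    os.remove(new_path)                # DELETE
    return tail, name, ext

with tempfile.TemporaryDirectory() as scratch:
    print(demo_file_operations(scratch))
```

Note that the program never says where on the disk the bytes go; that detail is left entirely to the file system.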
Figure: FAT file system directory entry
Files are stored on disk as discrete entities, but the OS links them logically using the directory structure. A directory typically contains a number of entries, one per file. Each entry stores information such as the name, location, size and type of a file placed logically under the directory. This way, although the files are stored separately on the disk, the user gets the impression that they are kept together in a directory. A directory is sometimes also called a folder.
Earlier systems had single-level directory structures, with all files contained in a single directory. The problem was that two users could not give their files the same name: since all files were in the same directory, every name had to be unique. Then came two-level directory structures. There was a top-level directory, also called the root directory, containing sub-directories for the users of the system. Users were allowed to save files in their own sub-directories but not in the root directory. This way two users could have files with the same name, as each stored the files in his or her own sub-directory. But soon users also felt the need to create further sub-directories within their sub-directories, so tree-structured, or hierarchical, directory structures came into being. A directory (or sub-directory) contains any number of files or further sub-directories. A file in a directory has an absolute path name consisting of the path from the root directory to the file.
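The idea of an absolute path in a hierarchical directory structure can be sketched with nested dictionaries. This is only a toy model: the user names, file names and the resolve function are invented for illustration and do not correspond to any real file system's on-disk layout.

```python
# Directories are dicts; files are strings holding their contents.
root = {
    "alice": {"notes.txt": "alice's notes", "src": {"prog.cpp": "int main() {}"}},
    "bob": {"notes.txt": "bob's notes"},
}

def resolve(root_dir, absolute_path):
    """Walk from the root directory down to the entry named by an absolute path."""
    node = root_dir
    # Ignore empty components so that "/" alone resolves to the root itself.
    for part in [p for p in absolute_path.split("/") if p]:
        if not isinstance(node, dict) or part not in node:
            raise FileNotFoundError(absolute_path)
        node = node[part]
    return node

# Two users can each own a file called notes.txt without any clash.
print(resolve(root, "/alice/notes.txt"))
print(resolve(root, "/bob/notes.txt"))
```

The same file name resolves to different files because the absolute paths differ, which is exactly what the single-level scheme could not offer.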
Popular file systems
This was all about files and directories, but in the real world there are a number of file systems used by different OSs. Each file system has a different directory organization and a different way to represent files.
Let’s look at some of the popular and widely used file systems.
FAT
FAT is an acronym for File Allocation Table. The FAT file system uses the 8.3 file-naming convention, and all filenames are created with the ASCII character set. 8.3 means that the filename can be up to eight characters long, followed by an extension of up to three characters to indicate the file type. The name cannot contain spaces and is not case sensitive; all characters are converted to uppercase once a file is created. All Microsoft operating systems, Mac OS and some versions of Unix support FAT.
In FAT, files are stored in clusters, whose size is determined by the size of the partition. A file can be stored in a single cluster or can span multiple clusters, depending on its size. Earlier versions of FAT used 16-bit addressing, so the file system is also called FAT16. The FAT itself is a table indexed by cluster number. With 16-bit addressing, a total of 2^16 (65,536 or 64K) clusters can be present in the file system. To support large disks, the cluster size can go up to 32K. This means the maximum disk size is 64K*32K = 2 GB. When a file is created, an entry is made in the directory containing the file name, the file attributes and the starting cluster number, which indexes into the FAT. That entry in the FAT either indicates that this is the last cluster of the file or points to the next cluster. To protect the data, two copies of the FAT are maintained in case one becomes damaged.
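The 8.3 rule as described here can be expressed as a small check. This is a simplified sketch: the set of allowed characters (letters, digits, underscore and hyphen) is an assumption made for the example, and the function name to_fat_name is hypothetical.

```python
import re

# Up to 8 name characters, optionally followed by a dot and up to 3 extension
# characters. The allowed character set is a simplification for this sketch.
EIGHT_DOT_THREE = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?")

def to_fat_name(filename):
    """Validate a name against the simplified 8.3 rule and return it in uppercase."""
    if not EIGHT_DOT_THREE.fullmatch(filename):
        raise ValueError(f"not a valid 8.3 name: {filename!r}")
    return filename.upper()

print(to_fat_name("prog.cpp"))
```

Names such as "my file.txt" (contains a space) or "longfilename.txt" (more than eight characters) are rejected by this check.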
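The chain-following scheme and the size arithmetic can be sketched in a few lines. The table is modelled as a dictionary, the cluster numbers and file names are invented, and the single end-of-chain marker value is an assumption made to keep the example short.

```python
END_OF_CHAIN = 0xFFFF  # assumed marker for "last cluster of the file" in this sketch

def cluster_chain(fat, start_cluster):
    """Follow FAT entries from a file's starting cluster to its last cluster."""
    chain = []
    seen = set()
    cluster = start_cluster
    while cluster != END_OF_CHAIN:
        if cluster in seen:                  # guard against a corrupted, looping table
            raise ValueError("loop in FAT chain")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]               # the entry points to the next cluster
    return chain

# Hypothetical table: PROG.CPP occupies clusters 2, 5, 6; NOTE.TXT occupies only 3.
fat = {2: 5, 5: 6, 6: END_OF_CHAIN, 3: END_OF_CHAIN}
directory = {"PROG.CPP": 2, "NOTE.TXT": 3}   # file name -> starting cluster number

print(cluster_chain(fat, directory["PROG.CPP"]))

# The 2 GB limit: 2^16 addressable clusters of at most 32K each.
max_clusters = 2 ** 16
max_cluster_size = 32 * 1024
max_volume_bytes = max_clusters * max_cluster_size
print(max_volume_bytes // 1024 ** 3, "GB")
```

Because a file's clusters are linked through the table, they need not be adjacent on the disk.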
VFAT
VFAT is an extension of the FAT file system and was introduced with Win 95. VFAT maintains backward compatibility with FAT, but relaxes some of the rules. VFAT filenames can contain up to 255 characters, spaces, and multiple periods. VFAT is not case sensitive but, unlike FAT, it preserves the case of the file name once created. The maximum disk size supported by VFAT is 4 GB. In Win NT 4.0, if you format a partition as FAT, it is actually formatted as VFAT.
FAT32
FAT32 is an extension of FAT and VFAT, first introduced with Win 95 OEM Service Release 2 (OSR2). It has all the filename features of VFAT, and its greatest advantage is 32-bit addressing. This results in smaller cluster sizes than FAT16, and smaller clusters can dramatically increase the amount of free hard disk space. To illustrate this, consider a 2 GB FAT16 partition, which has a cluster size of 32K. On this partition even a 1-byte file occupies an entire 32K cluster. Since the same rule applies to every file on your hard disk, a lot of space is wasted. In the FAT32 file system, partitions of less than 8 GB have a cluster size of 4 KB. This way it’s not uncommon to gain back hundreds of megabytes by using a FAT32 partition. The FAT32 file system supports disk sizes up to 2 TB. FAT32 is supported by Win 95 (OSR2)/98/2000 but not by NT.
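The space saving is easy to work out: a file always occupies a whole number of clusters, so the unused part of its last cluster is wasted. A minimal sketch, with a made-up list of file sizes:

```python
def allocated_bytes(file_size, cluster_size):
    """Disk space taken by a file that must occupy whole clusters."""
    clusters = -(-file_size // cluster_size)   # ceiling division
    return clusters * cluster_size

def wasted_bytes(file_sizes, cluster_size):
    """Total unused space in the last cluster of each file."""
    return sum(allocated_bytes(size, cluster_size) - size for size in file_sizes)

file_sizes = [1, 1500, 40_000]                 # hypothetical file sizes in bytes

print(allocated_bytes(1, 32 * 1024))           # 1-byte file with 32K clusters
print(allocated_bytes(1, 4 * 1024))            # the same file with 4 KB clusters
print(wasted_bytes(file_sizes, 32 * 1024))
print(wasted_bytes(file_sizes, 4 * 1024))
```

With many small files the difference between the two cluster sizes adds up quickly.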
NTFS
NTFS was created to provide the features that FAT lacked. It is sometimes called New Technology File System, but this is not the exact name. File and directory names can be up to 255 characters long. Filenames preserve case but are not case sensitive. The maximum size of an NTFS partition is 16 exabytes, i.e. 2^64 bytes. The cluster sizes of NTFS are 512 bytes, 1 KB, 2 KB and 4 KB, depending on the partition size. The goals of NTFS include reliability, fault tolerance, security and POSIX support. A file in NTFS is not treated as a single stream of data, as in FAT; instead, NTFS supports multiple data streams per file. The additional streams contain data that describe the file's attributes. FAT supports only the read-only, hidden, system, and archive file attributes. Apart from these, NTFS also supports last-access, last-write and file-creation date-time stamps, as well as security access restrictions. Thanks to the multiple data streams, a user can add his or her own user-defined attributes to a file. This is possible because each attribute of a file is an independent byte stream that can be created, deleted, read and written. These attributes can be specific to certain kinds of files.
Figure: Ext2fs directory entry
NTFS meets its goal of reliability by organizing I/O into transactions. Transactions are atomic, which means that either the entire I/O operation completes or none of it does. If anything interrupts a transaction in progress, such as a loss of power to the computer or a cancellation of the I/O operation, any changes made to the file system as part of the I/O operation are undone, or rolled back, returning the file system to its condition before the I/O operation began. NTFS allows the operating system to recover without having to use disk-checking utilities like chkdsk, which are required for the FAT and FAT32 file systems.
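The all-or-nothing behaviour can be illustrated with a toy in-memory model. This is not how NTFS implements transactions; it only shows the idea that an interrupted operation leaves the original state untouched. The names run_transaction, create and power_loss are invented for the sketch.

```python
def run_transaction(state, operations):
    """Apply every operation or none of them; return (resulting_state, committed)."""
    working = dict(state)            # changes go to a private copy first
    try:
        for operation in operations:
            operation(working)
    except Exception:
        return state, False          # rolled back: the original state is unchanged
    return working, True             # committed: all changes become visible together

def create(name, data):
    """Build an operation that adds a file to the toy file system."""
    def operation(fs):
        fs[name] = data
    return operation

def power_loss(fs):
    """Simulate an interruption in the middle of a transaction."""
    raise IOError("power lost")

disk = {"a.txt": "old"}
print(run_transaction(disk, [create("b.txt", "new")]))
print(run_transaction(disk, [create("c.txt", "x"), power_loss]))
```

In the second call the partially written c.txt never becomes visible, which mirrors the rollback described above.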
NTFS implements files and directories as securable objects. NTFS fully supports the Win NT security model. Access to file and directory objects can be restricted to specific users and groups. NTFS keeps access lists with files, which define which users and groups can access the file.
A FAT16 partition allows compression of the entire partition, but file access is slower after compression. FAT32 offers no compression. NTFS offers a much better option: it lets the user compress and encrypt individual files and directories of choice. This way you can compress seldom-used files to save space, and it won't slow down your overall system performance.
NTFS also supports the creation of hard links, a technique that allows a file to appear in more than one directory. The actual file remains the same, but additional directory entries are created that point to it. Any changes made to the file through one link are visible to applications accessing the file through the other links. A hard link is just like the original directory entry; once it is created, there is no difference between the two.
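Hard links can be tried out from Python with os.link. The sketch below assumes it is run on a file system that supports hard links; the file names and the function name hard_link_demo are made up.

```python
import os
import tempfile

def hard_link_demo(directory):
    """Create a file, add a second directory entry for it, and compare the two."""
    original = os.path.join(directory, "report.txt")
    alias = os.path.join(directory, "alias.txt")
    with open(original, "w") as f:
        f.write("v1")
    os.link(original, alias)                  # second directory entry, same file
    with open(alias, "a") as f:
        f.write(" v2")                        # change made through the link...
    with open(original) as f:
        content = f.read()                    # ...is visible through the original name
    link_count = os.stat(original).st_nlink   # number of directory entries for the file
    same_file = os.path.samefile(original, alias)
    os.remove(original)                       # the data survives while one link remains
    with open(alias) as f:
        survivor = f.read()
    return content, link_count, same_file, survivor

with tempfile.TemporaryDirectory() as scratch:
    print(hard_link_demo(scratch))
```

Removing the original name does not delete the data, which shows that neither entry is more "original" than the other.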
Ext2fs
Ext2fs, the second extended file system, is probably the most widely used file system in the Linux community. The directory structure used in Ext2fs is extremely simple, with each entry containing just the file name and its i-node number. The i-node is a structure that describes the file. All information about the file, including its type, size, timestamps, ownership, access rights and pointers to data blocks, is contained in the i-node.
When a file has to be accessed, the directory is searched for the file name to find the i-node number. Then the i-node is located and the disk locations of the file's blocks are read from it. Using these block addresses, the file is read from the disk. File blocks are similar to the clusters used in FAT to store files. The block size is typically 1 KB, 2 KB or 4 KB.
Apart from regular files and directories, the file system also has block special and character special files. These special files represent block devices (such as a hard disk) and character devices (such as a keyboard) in the file system. This way applications can access a device directly through normal file read and write operations. This mode of access is sometimes called raw I/O. Ext2fs supports a maximum partition size of 4 TB and long file names of up to 255 characters, which could be extended to 1012 if needed. Hard links can also be created in the file system. For security, files carry read, write, and execute attributes for the user, the group and others. A user can access a file in a particular mode only if the appropriate attribute bit is set; otherwise an access-denied message is displayed.
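The lookup sequence (directory entry, then i-node, then data blocks) can be sketched with a few dictionaries. The i-node here holds only a size and a list of block numbers, the block size is a toy 4 bytes, and all numbers are invented; a real Ext2fs i-node holds much more.

```python
from dataclasses import dataclass

@dataclass
class Inode:
    size: int              # file size in bytes
    block_numbers: list    # disk locations of the file's blocks, in order

def read_file(directory, inode_table, disk_blocks, name):
    """Directory entry -> i-node number -> i-node -> data blocks -> file contents."""
    inode_number = directory[name]           # the entry holds just name and i-node number
    inode = inode_table[inode_number]
    data = b"".join(disk_blocks[n] for n in inode.block_numbers)
    return data[:inode.size]                 # the last block may be only partly used

# Toy disk with 4-byte blocks; the file's blocks are not contiguous.
disk_blocks = {7: b"hell", 9: b"o wo", 12: b"rld\x00"}
inode_table = {11: Inode(size=11, block_numbers=[7, 9, 12])}
directory = {"hello.txt": 11}

print(read_file(directory, inode_table, disk_blocks, "hello.txt"))
```

Keeping the metadata in the i-node rather than in the directory entry is what makes the directory entries so small.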
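The read, write and execute attributes for user, group and others are conventionally stored as nine bits in the file mode. The sketch below tests them with constants from Python's stat module; the helper name may_access and the example mode are made up for illustration.

```python
import stat

# Map each (class of user, kind of access) pair to its permission bit.
PERMISSION_BITS = {
    ("user", "read"): stat.S_IRUSR,
    ("user", "write"): stat.S_IWUSR,
    ("user", "execute"): stat.S_IXUSR,
    ("group", "read"): stat.S_IRGRP,
    ("group", "write"): stat.S_IWGRP,
    ("group", "execute"): stat.S_IXGRP,
    ("others", "read"): stat.S_IROTH,
    ("others", "write"): stat.S_IWOTH,
    ("others", "execute"): stat.S_IXOTH,
}

def may_access(mode, who, access):
    """Return True if the attribute bit for this kind of access is set in `mode`."""
    return bool(mode & PERMISSION_BITS[(who, access)])

mode = 0o640   # user: read + write, group: read only, others: nothing

print(may_access(mode, "user", "write"))
print(may_access(mode, "group", "write"))
print(may_access(mode, "others", "read"))
```

A real system would refuse the operation and report an access-denied error wherever this check returns False.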
Ext3fs
Ext3fs is essentially an Ext2fs file system with a transaction log similar to that of NTFS. It is also called a journaling file system, as the transaction log is called a journal. If there is a power loss and the system reboots, the integrity of the file system is preserved and no fsck is necessary, so the system is up and running quickly.
Reiser file system
ReiserFS is a comparatively new file system for Linux. Like Ext3fs, it is a journaling file system; it facilitates crash recovery, speeds up the boot process and helps prevent data loss due to mechanical failures or serious user errors. The maximum partition size is 16 TB, with block sizes starting at 4 KB and going up to 64 KB. It performs well when reading and writing small files and may therefore be well suited to database servers. However, the developers of the file system say it is equally good with large files and is truly a general-purpose file system.
There are lots of other file systems too, and covering them all is beyond the scope of this article.
Anoop Mangla