General Parallel File System 

PCQ Bureau

The General Parallel File System (GPFS) is IBM’s first shared disk file system. It was initially released on the RS/6000 SP in 1998, using a software simulation of a storage area network called the IBM Virtual Shared Disk (VSD).


Subsequently, it has been ported to Linux. At its core, GPFS is a parallel disk file system. 

It guarantees that the entire file system is available to all nodes within a defined scope, and that the file system’s services can be applied safely to the same file system from multiple nodes simultaneously.

GPFS allows users to have shared access to files that may span multiple disk drives on multiple nodes. Every node can have concurrent read/write access to a file. 


The basic objective is to scale file system I/O, to meet the demand of applications such as digital library serving and data mining. 

It delivers its performance by striping data across multiple disks on multiple servers.
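To make the idea of striping concrete, here is a minimal sketch of how a byte offset in a file could map to a disk. It assumes plain round-robin placement and a fixed block size, neither of which this article specifies for GPFS; the names `locate_block` and `disks_touched` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class BlockLocation:
    disk: int           # index of the disk holding the block
    block_on_disk: int  # position of the block on that disk


def locate_block(file_offset: int, block_size: int, num_disks: int) -> BlockLocation:
    """Map a byte offset to a disk, assuming round-robin striping."""
    if file_offset < 0 or block_size <= 0 or num_disks <= 0:
        raise ValueError("offset must be >= 0; block size and disk count must be > 0")
    logical_block = file_offset // block_size
    return BlockLocation(
        disk=logical_block % num_disks,
        block_on_disk=logical_block // num_disks,
    )


def disks_touched(offset: int, length: int, block_size: int, num_disks: int) -> Set[int]:
    """Return the set of disks that a read of `length` bytes at `offset` would hit."""
    if length <= 0:
        raise ValueError("length must be > 0")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return {block % num_disks for block in range(first, last + 1)}
```

Under these assumptions, a 1 MB read with a 256 KB block size on four disks touches all four disks, so the transfers can proceed in parallel. That is the intuition behind the performance claim above.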

GPFS offers many of the standard UNIX file system interfaces, so most applications can run on it without modification or recompilation.


Standard UNIX file system utilities are also supported by GPFS. Thus, users can continue to use the same UNIX commands they already use for ordinary file operations.
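As a small illustration of what this means for application code, the sketch below writes and reads a file using only ordinary file calls. Nothing in it is specific to GPFS. The directory is passed in as a parameter, so the same code would run unchanged whether it points at a local disk or at a GPFS mount point (any such mount path is an assumption, not something named in this article).

```python
import os
import tempfile


def write_and_read_back(directory: str, name: str, payload: bytes) -> bytes:
    """Write `payload` to a file using ordinary file calls, then read it back."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to stable storage
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    # A temporary directory stands in for the file system under test.
    with tempfile.TemporaryDirectory() as scratch:
        data = write_and_read_back(scratch, "sample.dat", b"hello")
        print(len(data), "bytes read back")
```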

The only new commands are those for administering GPFS itself.

GPFS provides services to both parallel and serial applications. It allows parallel applications to access the same or different files simultaneously from any node in a GPFS node group, while maintaining a high level of control over all file system operations.


GPFS is appropriate in an environment where the aggregate peak demand for data exceeds the capability of a distributed file system server. It is not suited to environments where hot backup is the main requirement or where data is readily partitioned along individual node boundaries.

Many terms are used in relation to a GPFS cluster. Some of them are explained below.

GPFS cluster



A GPFS cluster is a collection of nodes with a shared-disk file system that can provide data access from all nodes in a cluster environment concurrently.


Nodeset



A GPFS nodeset is a group of nodes that run the same level of GPFS code and operate on the same file system.

Open Source Portability Layer



It is a set of source files that can be compiled to provide a Linux kernel abstraction layer for GPFS. It is used to enable communication between the Linux kernel and the GPFS kernel modules.

Network Shared Disk 



Network Shared Disk (NSD) is a GPFS disk subsystem that provides remote disk capability and global disk naming for the GPFS shared-disk file system.


Failure group



A failure group is a set of disks that share a common point of failure, one that could make all of them unavailable at the same time.
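The definition can be sketched in a few lines of code. The example below groups disks by a failure group number and works out which disks remain when one group's shared component fails. The disk names and group numbers are made up for illustration, and the helper names are hypothetical rather than GPFS commands.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_failure_group(disk_to_group: Dict[str, int]) -> Dict[int, List[str]]:
    """Collect disk names under the failure group they belong to."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for disk, group in sorted(disk_to_group.items()):
        groups[group].append(disk)
    return dict(groups)


def surviving_disks(disk_to_group: Dict[str, int], failed_group: int) -> List[str]:
    """Return the disks still available after every disk in one group is lost."""
    return sorted(d for d, g in disk_to_group.items() if g != failed_group)


if __name__ == "__main__":
    # Hypothetical layout: two disks behind each of two storage controllers.
    disks = {"disk1": 1, "disk2": 1, "disk3": 2, "disk4": 2}
    print(group_by_failure_group(disks))
    print(surviving_disks(disks, failed_group=1))
```

Knowing which disks fail together is what makes it possible to keep copies of the same data on disks that do not share a point of failure.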

Metadata



Metadata consists of i-nodes and indirect blocks that contain the file size, the time of last modification and the addresses of the disk blocks that comprise the file data.

It is used to locate and organize user data contained in GPFS’s striped blocks.
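A simplified model may help show how an i-node and an indirect block together locate file data. The sketch below is not the GPFS on-disk format, which this article does not describe. It assumes a single level of indirection, and the field names and block addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndirectBlock:
    addresses: List[int]  # disk block addresses that do not fit in the i-node


@dataclass
class Inode:
    size: int             # file size in bytes
    mtime: float          # time of last modification
    direct: List[int]     # addresses stored directly in the i-node
    indirect: Optional[IndirectBlock] = None


def block_address(inode: Inode, offset: int, block_size: int) -> int:
    """Find the disk block address that holds the byte at `offset`."""
    if not 0 <= offset < inode.size:
        raise ValueError("offset is outside the file")
    index = offset // block_size
    if index < len(inode.direct):
        return inode.direct[index]
    # Past the direct addresses: continue in the indirect block.
    extra = index - len(inode.direct)
    if inode.indirect is None or extra >= len(inode.indirect.addresses):
        raise LookupError("no address recorded for this block")
    return inode.indirect.addresses[extra]
```

With two direct addresses and three more in an indirect block, a lookup for the third block of the file skips the i-node's own list and returns the first entry of the indirect block.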


Quorum



Quorum is a simple rule to ensure the integrity of a cluster and the resources under its administration. 

Quorum is achieved if GPFS daemons are in the active state in at least half of all nodes and are able to communicate with each other.
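The rule as worded above can be written as a one-line check. This is only a sketch of the "at least half" rule as this article states it. The function name `has_quorum` is hypothetical, and actual GPFS releases may define quorum more strictly or through designated quorum nodes, which the sketch does not model.

```python
def has_quorum(total_nodes: int, active_connected_nodes: int) -> bool:
    """Apply the article's rule: at least half of all nodes must be active
    and able to communicate with each other."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= active_connected_nodes <= total_nodes:
        raise ValueError("active node count must be between 0 and total_nodes")
    # Compare doubled counts to avoid fractional arithmetic on odd node totals.
    return 2 * active_connected_nodes >= total_nodes
```

For an eight-node cluster this check passes with four active nodes and fails with three. For a five-node cluster it passes with three and fails with two.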

Shailendra Malik
