Chapter 3: Digging Deeper into IBM GPFS


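Term / Pronunciation / Definition
full-featured adj. having a complete set of features
demanding /dɪ'mɑːndɪŋ/ adj. exacting; requiring much effort, skill, or resources
gene sequencing /'siːkwənsɪŋ/ determining the order of nucleotides in a gene
retail /'riːteɪl/ n. the sale of goods to consumers; adj. relating to retail; vt. to sell at retail; to retell; adv. at retail
biotechnology /,baɪə(ʊ)tek'nɒlədʒɪ/ n. [biology] the use of biological processes for industrial and other purposes
quotas /'kwəʊtəz/ n. plural of quota; [economics] fixed shares or limits
POSIX n. Portable Operating System Interface
prohibitive /prə(ʊ)'hɪbɪtɪv/ adj. forbidding; restraining; (of a cost or price) excessively high
viable /'vaiəbl/ adj. feasible; capable of surviving; capable of reproducing
empower /ɪm'paʊə; em-/ vt. to authorize; to permit; to enable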


Today GPFS is a full-featured set of file management tools that includes advanced storage virtualization, integrated high availability, and automated tiered storage management, with the performance needed to manage very large quantities of file data effectively.


POSIX is an IEEE (Institute of Electrical and Electronics Engineers) family of standards for maintaining compatibility between different variations of UNIX and other operating systems.


GPFS accelerates time to results and maximizes utilization by providing parallel access to data, which delivers extreme performance and eliminates storage bottlenecks. It achieves this by:

Striping data across multiple disks attached to multiple servers
Providing efficient client-side caching
Executing high-performance metadata (inode) scans
Supporting a wide range of file system block sizes to match I/O requirements
Utilizing advanced algorithms that improve I/O operations
Using block-level locking based on a very sophisticated token management system to provide data consistency while allowing multiple application servers concurrent access to the files
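
The first item in the list, striping, can be illustrated with a small sketch. This is not GPFS code: the function names `stripe_blocks` and `read_back`, the in-memory "disks", and the simple round-robin placement are all assumptions made for illustration. It only shows why consecutive blocks of one file end up on different disks, so that several disks can serve a large read at the same time.

```python
def stripe_blocks(data: bytes, block_size: int, num_disks: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and place them round-robin on disks.

    Hypothetical model: each "disk" is just a Python list of blocks.
    """
    if block_size <= 0 or num_disks <= 0:
        raise ValueError("block_size and num_disks must be positive")
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for index, offset in enumerate(range(0, len(data), block_size)):
        # Block i goes to disk i mod num_disks.
        disks[index % num_disks].append(data[offset:offset + block_size])
    return disks


def read_back(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by visiting the disks in the same round-robin order."""
    pieces = []
    depth = max((len(d) for d in disks), default=0)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                pieces.append(disk[row])
    return b"".join(pieces)


layout = stripe_blocks(b"abcdefghij", block_size=2, num_disks=3)
restored = read_back(layout)
```

In this toy layout, disk 0 holds the first and fourth blocks, disk 1 the second and fifth, and disk 2 the third, so a sequential read touches all three disks.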

When many servers need to use the same set of files at the same time, the file system needs to ensure that all the files are protected, so one server can’t change a file without the other servers knowing about the change. Keeping thousands of servers “in the loop” on file status is difficult and scaling up is even harder.


GPFS provides file integrity protection through a token process that keeps file data consistent by always ensuring there is only one owner for any given file.


There are two parts to managing tokens and file consistency: handing out the tokens and keeping file metadata up to date.

Token manager: the server (or servers) that initially holds the tokens for all files not in use is called the token manager.

Multiple token managers help each other out by sharing the workload and by taking over when a fellow token manager fails.


When a file is opened, the token manager hands off the token for that file to the server that’s opening the file.


The server using the file is now responsible for all metadata changes to that file. If a server wants to open a file that is already open on another server, the token manager redirects the request to the server that already has the file open and lets the two servers work out the details among themselves.


This sharing of metadata maintenance across the entire cluster is what makes GPFS scale very effectively.
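
The token handoff described above can be modeled in a few lines. The sketch below is a simplified illustration under stated assumptions, not the GPFS implementation: the class `TokenManager`, its methods, and the one-token-per-file granularity are hypothetical. It shows only the two cases from the text: the first opener receives the token, and a later opener is pointed at the current holder.

```python
class TokenManager:
    """Toy model: tracks which server currently holds the token for each file."""

    def __init__(self) -> None:
        # File path -> name of the server holding that file's token.
        # Files absent from the map are "not in use"; the manager holds their token.
        self._holder: dict[str, str] = {}

    def open_file(self, path: str, server: str) -> str:
        """Return the server responsible for metadata changes to `path`.

        If nobody holds the token, hand it to `server`. Otherwise redirect
        the caller to the current holder so the two can coordinate directly.
        """
        holder = self._holder.get(path)
        if holder is None:
            self._holder[path] = server
            return server
        return holder

    def release(self, path: str, server: str) -> None:
        """Give the token back to the manager if `server` holds it."""
        if self._holder.get(path) == server:
            del self._holder[path]


manager = TokenManager()
first = manager.open_file("/data/report.dat", "server-a")   # server-a gets the token
second = manager.open_file("/data/report.dat", "server-b")  # redirected to server-a
```

Because the manager only answers "who holds the token?", the ongoing metadata work stays with the servers that actually use the file, which is the load-spreading effect the text credits for scalability.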


The global namespace is easy to administer and can be scaled quickly, as desired, by simply adding more scale-out resources — eliminating “filer-sprawl” and its associated issues.

A single GPFS command can perform a file system function across the entire cluster, and most can be issued from any server in the cluster. Optionally, you can designate a group of administration servers that can be used to perform all cluster administration tasks, or only authorize a single login session to perform admin commands cluster-wide. This allows for higher security by reducing the scope of server-to-server administrative access.


You can use snapshots to protect data from human error.

A snapshot is used to preserve the file system’s contents at a single point in time. It contains a copy of only the file system data that has changed since the last snapshot was created and keeps that data in the same pool as the original file, which keeps space usage at a minimum.



Snapshots provide an online backup capability that lets you (or an end user) easily recover from an accidental file deletion or compare a file with an older version.
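
The idea that a snapshot stores only the data changed since it was taken can be sketched as follows. This is a minimal model, assuming a hypothetical `BlockStore` class with integer block IDs; it is not how GPFS lays out snapshots on disk. A block's old content is preserved the first time it is overwritten after a snapshot, and unchanged blocks are read from the live data.

```python
from typing import Optional


class BlockStore:
    """Toy block store with snapshots that preserve only overwritten blocks."""

    def __init__(self, blocks: dict[int, bytes]) -> None:
        self.blocks = dict(blocks)
        # One dict per snapshot: block id -> content preserved before overwrite.
        self.snapshots: list[dict[int, Optional[bytes]]] = []

    def create_snapshot(self) -> int:
        """Start a new, initially empty snapshot and return its id."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_id: int, data: bytes) -> None:
        """Overwrite a block, saving its old content in the latest snapshot once."""
        if self.snapshots:
            saved = self.snapshots[-1]
            if block_id not in saved:
                saved[block_id] = self.blocks.get(block_id)
        self.blocks[block_id] = data

    def read_snapshot(self, snap_id: int, block_id: int) -> Optional[bytes]:
        """Return the block as it was when snapshot `snap_id` was created."""
        # The first preserved copy found in this or a later snapshot is the
        # content at snapshot time; otherwise the block never changed since.
        for saved in self.snapshots[snap_id:]:
            if block_id in saved:
                return saved[block_id]
        return self.blocks.get(block_id)


store = BlockStore({0: b"alpha", 1: b"beta"})
snap = store.create_snapshot()
store.write(0, b"ALPHA")  # only block 0 is copied into the snapshot
```

After the write, the snapshot holds one preserved block rather than a full copy of the store, which is why space usage stays small.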

Clustered NFS

To better enable end-user access to a GPFS file system, the file system can be exported to clients outside the cluster through NFS (Network File System), including the capability of exporting the same data from multiple servers. This GPFS feature is called Clustered NFS (cNFS). Clustered NFS allows you to provide scalable file service with simultaneous access to a common set of data from multiple servers. The cNFS feature includes failover capability, so if an NFS server fails, the clients connected to that server automatically connect to another server in the cluster.
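
The failover behavior can be pictured with a short sketch. It models only what the paragraph above states (clients of a failed server end up on a surviving server); the function `fail_over`, the client and server names, and the round-robin reassignment are assumptions for illustration and do not describe the mechanism cNFS actually uses.

```python
def fail_over(assignments: dict[str, str], failed: str, servers: list[str]) -> dict[str, str]:
    """Return a new client -> server map with the failed server's clients moved.

    Clients of other servers are left untouched; displaced clients are spread
    round-robin over the surviving servers.
    """
    survivors = [s for s in servers if s != failed]
    if not survivors:
        raise RuntimeError("no surviving NFS server to take over")
    result = dict(assignments)
    displaced = sorted(c for c, s in assignments.items() if s == failed)
    for i, client in enumerate(displaced):
        result[client] = survivors[i % len(survivors)]
    return result


servers = ["nfs1", "nfs2", "nfs3"]
before = {"client-a": "nfs1", "client-b": "nfs2", "client-c": "nfs2", "client-d": "nfs3"}
after = fail_over(before, failed="nfs2", servers=servers)
```

Since every server exports the same GPFS data, any survivor can serve a displaced client without copying files first.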

NFS is a network file system protocol that enables access to storage by using a standard protocol over a TCP/IP network. NFS protocol access is commonly provided by a network-attached storage (NAS) appliance or similar device. Samba enables file and print services for Microsoft Windows clients from UNIX and Linux based servers.

   1. Shared disk

A shared disk cluster is the most basic environment. In this configuration, the storage is directly attached to all servers in the cluster. Application data flows over the SAN, and control information flows among the GPFS servers in the cluster over a TCP/IP network.



This configuration is best for small clusters (1 to 50 servers) when all servers in the cluster need the highest performance access to the data. For example, this configuration is good for high-speed data access for digital media applications or a storage infrastructure for data analytics.

   2. Network Shared Disk (NSD) protocol

GPFS uses a network to transfer control information and data to NSD clients. The network doesn’t need to be dedicated to GPFS, but it should provide sufficient bandwidth to meet the needs of GPFS and any other applications sharing it.

In an NSD server configuration, a subset of the total server population is defined as NSD servers. The NSD servers are responsible for the abstraction of disk data blocks across an IP-based network. The fact that I/O is remote is transparent to the application. Figure 3-3 shows an example of a configuration where a set of compute servers is connected to a set of NSD servers via a high-speed interconnect or an IP-based network (such as Ethernet). In this example, data to the NSD servers flows over the SAN, and data and control information flow to the clients across the LAN.



An NSD server architecture is well suited to clusters with sufficient network bandwidth between the I/O servers and the clients. Examples include statistical applications such as financial fraud detection, supply chain management, and data mining.

   3. Empowering global collaboration

GPFS provides low latency access to data from anywhere in the world with Active File Management (AFM) distributed disk caching technology. AFM expands the GPFS global namespace across geographical distances, providing fast read and write performance with automated namespace management from anywhere in the world. As data is written or modified at one location, all other locations get the same data with minimal delays. These game-changing capabilities accelerate project schedules and improve productivity for globally distributed teams.


