
Software Defined Storage For Dummies(Chap6)

Author: Internet

Chap6: Ten Ways to Use Software Defined Storage

##Vocabulary and phrases

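| Word/phrase | Pronunciation | Definition |
| --- | --- | --- |
| retention | [rɪ'tenʃ(ə)n] | n. keeping or holding; detention; memory; urinary retention |
| bursty | ['bɜːstɪ] | adj. occurring in bursts; intermittent |
| overhead | [əʊvə'hed] | n. ceiling; (accounting) running costs, indirect costs; adj. elevated, above one's head; adv. overhead, in the air, up high |
| seamlessly | ['si:mlisli] | adv. without seams or interruption |
| streamline | ['striːmlaɪn] | n. streamline, streamlined shape; adj. streamlined; vt. to make streamlined, to modernize, to organize, to rationalize, to simplify |
| syntax | ['sɪntæks] | n. grammar; syntax; orderly arrangement |
| synchronous | ['sɪŋkrənəs] | adj. synchronized; simultaneous |
| semantic | [sɪ'mæntɪk] | adj. relating to meaning; of semantics (same as semantical) |
| fine-grained | [ɡreɪnd] | adj. having fine grains or a fine texture; detailed, in-depth |
| whereby | [weə'baɪ] | conj. by which, through which, in accordance with which; adv. by means of which |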

##Key points

Data-intensive applications are defined by the fact that they need to read or write a large amount of data to get the job done.


With many storage products, you have to use separate containers to get better overall throughput. The problem with this approach is that it partitions your data, which greatly complicates management tasks and makes it difficult to keep all of your hardware busy. With GPFS, the opposite is true. GPFS lets you fully leverage the performance of all of the underlying storage hardware by spreading file data across all of the available storage all the time. This means that you don’t have idle disks and, more importantly, you aren’t wasting money.
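
To make the striping idea concrete, here is a minimal sketch that assumes simple round-robin placement of fixed-size blocks. The function name `stripe_blocks`, the disk names, and the block size are illustrative only; the actual GPFS allocation logic is not described in these notes and is likely more sophisticated.

```python
from collections import Counter


def stripe_blocks(file_size, block_size, disks):
    """Assign each block of a file to a disk in round-robin order.

    Illustrative model only, not the real GPFS allocator.
    """
    num_blocks = -(-file_size // block_size)  # ceiling division
    return [disks[i % len(disks)] for i in range(num_blocks)]


# Hypothetical cluster with four disks and a 10 MiB file in 1 MiB blocks.
disks = ["disk0", "disk1", "disk2", "disk3"]
placement = stripe_blocks(
    file_size=10 * 1024 * 1024,
    block_size=1024 * 1024,
    disks=disks,
)
per_disk = Counter(placement)
print(per_disk)
```

Every disk receives a share of the file, so a large sequential read can draw on all of the disks at once instead of leaving some idle.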


Metadata in GPFS is distributed much like data. There are two aspects to metadata: attribute storage and data consistency.

As with other data, metadata is spread across all available storage, and metadata management is distributed across the entire cluster. Many metadata-intensive workloads also perform much better with GPFS because they can leverage its distributed metadata and load-balancing features. Applications that require dynamic load balancing need a file system that has excellent I/O performance and is very reliable. GPFS performs like a local file system, with the added advantages of flexibility, increased scalability, and the reliability of a clustered file system.

GPFS enables all of the servers in a cluster to access all system data and metadata equally and in parallel. This improves performance for metadata-intensive applications by speeding up application I/O operations. Because GPFS allows any server in a cluster to read from or write to any of the disks, applications that perform concurrent I/O can achieve very high data access rates.


GPFS provides a highly scalable, low-latency, high-performance, reliable file system for large-scale storage infrastructures, with capabilities for distributed parallel data streaming and no single point of failure. It’s not uncommon to have GPFS file systems of a petabyte or more, containing hundreds of millions of files.


Data-driven applications

Applications that synthesize and process results generated by large-scale simulations
Business transactions
Online trading and data processing
Long-running compute and analysis jobs
Applications that analyze large datasets for:
Medical imagery
Seismic data for oil and gas exploration
Industrial design
Internet collaboration
CRM (customer relationship management)
Social marketing trends
Market analysis


With an enterprise-wide GPFS environment, you can achieve cost-effective and efficient collaboration with features such as a common file system and massive global namespace across computing platforms. Users can seamlessly access data from any storage cluster server without having to first transfer the data from another location. This streamlines the collaboration process and is more cost effective and energy efficient because enterprises don’t have to purchase additional disk space to store duplicate files. By pooling storage purchases, you can build a much larger common shared data infrastructure. In addition, the data is available in a highly parallel manner, making access to massive amounts of data much faster.


With petabytes of data and billions of files, it isn’t practical to just ask each application group to “clean up their stuff.” Plus, government regulations often require you to keep certain data around for many years. Solving this problem requires automation, and GPFS provides a set of tools designed to help you.

GPFS helps simplify storage administration through its policy-based automation framework and Information Life Cycle Management (ILM) feature set. GPFS policy-based ILM tools can be used to manage sets of files and pools of storage, and to automate the management of file data. Using these tools, GPFS can automatically determine where to physically store user data, regardless of its placement in the logical directory structure. Storage pools, filesets, and user-defined policies provide the ability to match the cost of storage resources to the value of your data.

Storage pools

Storage pools allow you to create groups of disks within a file system. You can create tiers of storage by grouping your disks based on performance, locality, or reliability characteristics. For example, one pool could be solid state disks (SSDs) and another could be more economical SATA (Serial ATA) storage.


Fileset

A fileset is a sub-tree of the file system namespace and provides a way to partition the namespace into smaller, more manageable units. Filesets provide an administrative boundary that can be used to set quotas, take snapshots, and be specified in a policy to control initial data placement or data migration. Data in a single fileset can reside in one or more storage pools. Where the file data resides and how it is migrated is based on a set of rules in a user-defined policy.



user-defined policies: File placement & File management

File placement

When a file is created, GPFS needs to know where to put it. This is done by using file placement policies that direct file data as files are created to the appropriate storage pool. You can create file placement policies based on anything GPFS knows about a file when it is created, including filename and the user who is creating the file.

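
The placement decision can be pictured as an ordered rule list in which the first matching rule wins. The sketch below is a plain Python model, not GPFS policy syntax; the rule table, the pool names `ssd` and `sata`, and the user name `render` are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (filename pattern, user or None for any user, target pool).
# Only attributes known at creation time (name, creating user) are used.
PLACEMENT_RULES = [
    ("*.db", None, "ssd"),    # database files go to the fast pool
    ("*", "render", "ssd"),   # everything created by user "render" goes to the fast pool
    ("*", None, "sata"),      # default rule for everything else
]


def choose_pool(filename, user, rules=PLACEMENT_RULES):
    """Return the pool named by the first rule that matches the new file."""
    for pattern, rule_user, pool in rules:
        if fnmatchcase(filename, pattern) and rule_user in (None, user):
            return pool
    raise LookupError("no placement rule matched")


print(choose_pool("orders.db", "alice"))
print(choose_pool("notes.txt", "alice"))
```

Ending the list with a catch-all rule means every new file gets a pool, which mirrors the idea that a file must be placed somewhere the moment it is created.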

File management

After a file has been created, GPFS knows much more about it. In addition to the attributes available when a file is created, GPFS now knows additional information, including the size of the file, how long it’s been since someone accessed the file, and whether or not it’s been changed. Policies that operate on existing files are called file management policies and allow you to move, replicate, or delete files. You can use file management policies to move data from one pool to another without changing the file location in the directory structure. One popular use for file management policies doesn’t involve moving data at all: you can use them for reporting. The policy syntax is very powerful, allowing you to generate custom reports, for example, on the types of files using the most space. On Linux and AIX, you can use similar tools to get this information, but the policy engine is very fast; it can look at the metadata of millions of files per second.
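
The two uses of file management policies described above, migration and reporting, can be sketched over in-memory metadata records. This is a simplified Python model under stated assumptions: the `FileInfo` record, the pool names, and the 30-day threshold are hypothetical, and the real policy engine works on file system metadata rather than Python objects.

```python
import os
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical metadata record for an existing file."""
    path: str
    size: int                # bytes
    days_since_access: int
    pool: str


def migration_candidates(files, from_pool, min_idle_days):
    """Select files in from_pool that have not been accessed recently."""
    return [
        f for f in files
        if f.pool == from_pool and f.days_since_access >= min_idle_days
    ]


def space_by_extension(files):
    """Report total bytes per file extension, largest first."""
    usage = defaultdict(int)
    for f in files:
        ext = os.path.splitext(f.path)[1] or "(none)"
        usage[ext] += f.size
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)


files = [
    FileInfo("/data/a.log", 500, 90, "ssd"),
    FileInfo("/data/b.db", 2000, 1, "ssd"),
    FileInfo("/data/c.log", 700, 45, "sata"),
]
cold = migration_candidates(files, from_pool="ssd", min_idle_days=30)
report = space_by_extension(files)
print([f.path for f in cold])
print(report)
```

Note that migration only changes the `pool` a file would live in; the `path` stays the same, which matches the point that data can move between pools without changing its location in the directory structure.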


ILM tools need to have rich features, be automated, and be capable of operating on very large data sets to be useful when you are storing petabytes of data. The GPFS ILM toolset is well suited to this environment and is capable of managing billions of files.

The core GPFS software is fault tolerant. If a server or even a storage system fails, the other servers can continue to access the data. It does this by continuously monitoring the health of the cluster and file system components. When a failure is detected, the appropriate recovery action is taken automatically. Extensive logging and recovery capabilities maintain metadata consistency when application servers holding locks or performing cluster services fail.


MapReduce

MapReduce is a programming and data organization model for processing large data sets in parallel on a distributed computational cluster.
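
As a small illustration of the model, the sketch below runs the classic word-count example in a single process. The function names are illustrative; a real MapReduce framework would run the map and reduce phases in parallel across cluster servers and handle the shuffle over the network.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = word_count(["GPFS stores data", "HDFS stores data blocks"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over many servers, which is why the underlying storage needs to serve many readers at once.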

Comparing HDFS and GPFS

HDFS and GPFS both provide the basic storage tools needed for MapReduce workloads, but that is where the similarities end. HDFS is a basic storage solution for MapReduce, whereas GPFS is an enterprise storage software solution that supports MapReduce.

Some limitations of HDFS include:
Centralized master-slave architecture
No file locking
File data striped into uniformly sized blocks that are distributed across cluster servers
Block-level information exposed to applications
Simple coherency with a write-once, read-many model that restricts what users can do with data

GPFS features include
High-performance, shared-disk cluster architecture with POSIX semantics
Distributed metadata, space allocation, and lock management
File data blocks striped across multiple servers and disks
Block-level information not exposed to applications
Ability to open, read, and append to any section of a file
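
The contrast between a write-once, read-many model and POSIX semantics can be seen with ordinary file calls. The sketch below uses only standard Python file I/O on a temporary local file; nothing in it is specific to GPFS, and the file name and contents are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "record.bin")

    # Create the file with three 4-byte sections.
    with open(path, "wb") as f:
        f.write(b"AAAABBBBCCCC")

    # Overwrite the middle section in place; a write-once model forbids this.
    with open(path, "r+b") as f:
        f.seek(4)
        f.write(b"XXXX")

    # Append a new section to the end of the file.
    with open(path, "ab") as f:
        f.write(b"DDDD")

    # Read an arbitrary section, then the whole file.
    with open(path, "rb") as f:
        f.seek(4)
        middle = f.read(4)
        f.seek(0)
        content = f.read()

print(middle, content)
```

An application written against these standard calls needs no changes to run on a file system that offers POSIX semantics, which is the practical meaning of the first item in the GPFS feature list.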

Cloud storage is shared across different classes of applications, so standard file semantics are important. The standard interface needs to support new workloads, including MapReduce-style applications, so that you don’t have to use separate point solutions for these workloads. For mixed-workload environments that require access to large amounts of data in a cloud environment, GPFS can help you leverage standard components, along with its POSIX-compliant interfaces and support for the latest technologies, such as InfiniBand (IB) Remote Direct Memory Access (RDMA).

The IBM Smart Analytics System is a pre-integrated analytics system designed to deploy quickly and deliver fast time-to-value. Engineered for the rapid deployment of a business-ready solution, the IBM Smart Analytics System includes the following features:
Powerful data warehouse foundation
Extensive analytic capabilities
Fully integrated, scalable environment

GPFS is a core component of IBM’s Smart Analytics offering, providing a high-availability data storage platform.

Source: https://blog.csdn.net/u014454538/article/details/80268058