Common HDFS file operations with the Java API

Author: Internet

Environment setup

1. Local client machine

  1. Create a new Maven project in IDEA
  2. Add the Maven dependencies
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.1.3</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
  3. Map the virtual machines' hostnames on the client; on macOS this is done in /etc/hosts

2. Virtual machines

  1. In my cluster the NameNode is deployed on hadoop102

Basic workflow for file operations

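Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102"), configuration, "pitaya");
fs.close();
  1. Create a Configuration object
  2. Obtain a FileSystem object; for an hdfs:// URI the runtime type is DistributedFileSystem (polymorphism: its parent class is FileSystem, as the source code shows)
  3. Close the FileSystem to release its resources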
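
The snippets in this post pass the NameNode address both as hdfs://hadoop102 and as hdfs://hadoop102:8020. The following is a minimal JDK-only sketch of what the two URIs contain; it does not call Hadoop. The class name NameNodeUriDemo, the method effectivePort, and the constant ASSUMED_DEFAULT_PORT are hypothetical, and the fallback to 8020 is an assumption about the default NameNode RPC port in Hadoop 3.1.x, not something read from the cluster.

```java
import java.net.URI;

class NameNodeUriDemo {
    // Assumption: default NameNode RPC port used when the URI carries no port.
    static final int ASSUMED_DEFAULT_PORT = 8020;

    // Returns the explicit port if the URI has one, otherwise the assumed default.
    static int effectivePort(URI uri) {
        return uri.getPort() == -1 ? ASSUMED_DEFAULT_PORT : uri.getPort();
    }

    public static void main(String[] args) {
        URI withoutPort = URI.create("hdfs://hadoop102");
        URI withPort = URI.create("hdfs://hadoop102:8020");

        // java.net.URI reports -1 when no port is written in the URI.
        System.out.println(withoutPort.getScheme() + " " + withoutPort.getHost()
                + " " + withoutPort.getPort());
        System.out.println(withPort.getScheme() + " " + withPort.getHost()
                + " " + withPort.getPort());

        // Under the assumption above, both forms address the same endpoint.
        System.out.println(effectivePort(withoutPort) == effectivePort(withPort));
    }
}
```

If the cluster's NameNode listens on a non-default port, only the explicit form would reach it, so writing the port out is the safer habit.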

Uploading a file to HDFS

@Test
public void testCopyFromLocalFile() throws IOException, URISyntaxException, InterruptedException {
  Configuration configuration = new Configuration();
  // Uniform Resource Identifier (URI) of the NameNode
  FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "pitaya");

  // Copy the local file into the HDFS root directory
  fs.copyFromLocalFile(new Path("/Users/eric/blogs/code/uploadFiles/test.txt"),
                       new Path("/"));
  fs.close();
}

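Notes

  1. pitaya: the user name passed to FileSystem.get() should be the owner of the target directory (or otherwise have write permission on it); if not, the upload fails with the error below (official docs: Permissions and HDFS)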

Permission denied: user=ptaya, access=WRITE, inode="/":pitaya:supergroup:drwxr-xr-x
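
The error line can be read as follows: the inode "/" is owned by pitaya, belongs to the group supergroup, and has mode drwxr-xr-x, so only the owner holds the write bit. The sketch below mimics that decision in plain Java. It is a deliberately simplified model, not Hadoop's permission checker: it ignores the superuser, ACLs, and real group lookup, and the class PermissionSketch and method canWrite are hypothetical names.

```java
class PermissionSketch {
    // Simplified POSIX-style check on a mode string such as "drwxr-xr-x".
    // Index 0 is the file type; 1-3 are the owner bits, 4-6 group, 7-9 other.
    // Assumption: the caller tells us whether the user is in the file's group.
    static boolean canWrite(String user, boolean inGroup, String owner, String mode) {
        int writeIndex;
        if (user.equals(owner)) {
            writeIndex = 2;   // owner class
        } else if (inGroup) {
            writeIndex = 5;   // group class
        } else {
            writeIndex = 8;   // other class
        }
        return mode.charAt(writeIndex) == 'w';
    }

    public static void main(String[] args) {
        String mode = "drwxr-xr-x";
        // The owner has the write bit, so the upload is allowed.
        System.out.println(canWrite("pitaya", false, "pitaya", mode)); // true
        // Any other user falls into group or other, neither of which may write.
        System.out.println(canWrite("ptaya", false, "pitaya", mode));  // false
    }
}
```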

Downloading a file from HDFS

@Test
public void testDownload() throws IOException, URISyntaxException, InterruptedException {
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102"), configuration, "pitaya");
    // First argument (delSrc) = true: the source file is deleted from HDFS after the copy
    fs.copyToLocalFile(true, new Path("/test.txt"), new Path("/Users/eric/blogs/code/downloadFiles"));
    fs.close();
}

Renaming or moving a file in HDFS

fs.rename(new Path("/test.txt"),new Path("/success.txt"));

Deleting a file or directory in HDFS

fs.delete(new Path("/success.txt"), true);  // second argument: delete recursively (needed for non-empty directories)

Notes

  1. In practice, FileSystem.get() actually returns a DistributedFileSystem

    public class DistributedFileSystem extends FileSystem implements KeyProviderTokenIssuer {
    }

    // Declared in FileSystem (decompiled signature: var1 is the path, var2 the recursive flag)
    public abstract boolean delete(Path var1, boolean var2) throws IOException;

Viewing file details in HDFS

// Second argument true: list files recursively, starting from the root
RemoteIterator<LocatedFileStatus> locatedFileStatusRemoteIterator = fs.listFiles(new Path("/"), true);
while (locatedFileStatusRemoteIterator.hasNext()) {
    LocatedFileStatus fileStatus = locatedFileStatusRemoteIterator.next();
    System.out.println("=======" + fileStatus.getPath() + "======");
    System.out.println(fileStatus.getOwner());
    System.out.println(fileStatus.getModificationTime());  // milliseconds since the epoch
    System.out.println(fileStatus.getBlockSize());         // bytes
    System.out.println(Arrays.toString(fileStatus.getBlockLocations()));
}
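
The loop above prints the modification time as a raw count of milliseconds since the epoch and the block size as a raw count of bytes. A small JDK-only sketch of how those two numbers might be made readable follows. The class FileStatusFormatting and its methods are hypothetical helpers, and the sample values in main are made up for illustration rather than taken from the cluster.

```java
import java.time.Instant;

class FileStatusFormatting {
    // Converts epoch milliseconds to an ISO-8601 UTC timestamp.
    static String formatModificationTime(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    // Converts a size in bytes to whole mebibytes (integer division).
    static long toMebibytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long sampleModificationTime = 0L;      // illustrative value only
        long sampleBlockSize = 134217728L;     // illustrative value only

        System.out.println(formatModificationTime(sampleModificationTime)); // 1970-01-01T00:00:00Z
        System.out.println(toMebibytes(sampleBlockSize) + " MiB");          // 128 MiB
    }
}
```

In the loop, the same helpers could be applied to fileStatus.getModificationTime() and fileStatus.getBlockSize() before printing.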

Creating directories and files in HDFS

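fs.mkdirs(new Path("newcreate"));  // relative path: resolved against the working directory (/user/pitaya by default)
FSDataOutputStream fsDataOutputStream = fs.create(new Path("/newcreate"));  // absolute path: file under the root
fsDataOutputStream.close();  // close the stream so the file is finalized

Notes

  1. The directory ended up under the current user's home directory while the file did not. This is not a restriction on mkdirs: "newcreate" is a relative path, which HDFS resolves against the working directory (by default /user/<username>), whereas "/newcreate" is absolute. Passing an absolute path to mkdirs creates the directory wherever the user has write permission.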
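
The different locations of the directory and the file come down to relative versus absolute paths. The sketch below mimics that resolution with java.net.URI from the JDK. It is not Hadoop's Path implementation, the class PathResolutionSketch and method resolve are hypothetical names, and /user/pitaya is assumed to be the working directory (HDFS defaults to /user/<username>).

```java
import java.net.URI;

class PathResolutionSketch {
    // Resolves a path against a working directory, the way a relative
    // path is interpreted; absolute paths are returned unchanged.
    static String resolve(String workingDir, String path) {
        // A trailing slash makes URI.resolve treat the base as a directory.
        String base = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        return URI.create(base).resolve(path).getPath();
    }

    public static void main(String[] args) {
        String assumedHome = "/user/pitaya"; // assumption: default HDFS home directory

        // Relative path: lands under the working directory.
        System.out.println(resolve(assumedHome, "newcreate"));  // /user/pitaya/newcreate
        // Absolute path: lands under the root, regardless of the working directory.
        System.out.println(resolve(assumedHome, "/newcreate")); // /newcreate
    }
}
```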

Source: https://www.cnblogs.com/pitaya01/p/15571630.html