其他分享
首页 > 其他分享> > Hive的导入导出方式汇总

Hive的导入导出方式汇总

作者:互联网

导入

  1. 本地文件导入到Hive表CREATE TABLE testA (  
        id INT,  
        name string,  
        area string  
    ) PARTITIONED BY (create_time string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;  
    
    CREATE TABLE testB (  
        id INT,  
        name string,  
        area string,  
        code string  
    ) PARTITIONED BY (create_time string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; 
    
    // 数据文件(sourceA.txt):
    
    1,fish1,SZ  
    2,fish2,SH  
    3,fish3,HZ  
    4,fish4,QD  
    5,fish5,SR 
    
    // 数据文件(sourceB.txt):
    1,zy1,SZ,1001  
    2,zy2,SH,1002  
    3,zy3,HZ,1003  
    4,zy4,QD,1004  
    5,zy5,SR,1005 
    
    // 方式一:load进数据
    hive> load data local inpath '/tmp/20210504/sourceA.txt' into table testA partition(create_time='2021-05-04');
    Loading data to table one.testa partition (create_time=2021-05-04)
    OK
    Time taken: 1.552 seconds
    
    
    hive> load data local inpath '/tmp/20210504/sourceB.txt' into table testB partition(create_time='2021-05-04');
    Loading data to table one.testb partition (create_time=2021-05-04)
    OK
    Time taken: 0.789 seconds
    
     hive> select * from testA;
    OK
    1    fish1    SZ    2021-05-04
    2    fish2    SH    2021-05-04
    3    fish3    HZ    2021-05-04
    4    fish4    QD    2021-05-04
    5    fish5    SR    2021-05-04
    Time taken: 2.227 seconds, Fetched: 5 row(s)
    hive> select * from testB;
    OK
    1    zy1    SZ    1001    2021-05-04
    2    zy2    SH    1002    2021-05-04
    3    zy3    HZ    1003    2021-05-04
    4    zy4    QD    1004    2021-05-04
    5    zy5    SR    1005    2021-05-04
    Time taken: 0.175 seconds, Fetched: 5 row(s)
    
    
    // 方式二 insert
    
    
    
    删除表数据
    hive> truncate table testA;
    OK
    Time taken: 0.326 seconds
    
    删除分区:
    hive> alter table testB drop partition(create_time = '2021-05-04');
    Dropped the partition create_time=2021-05-04
    OK
    Time taken: 0.27 seconds
    
    hive不能删除行
    如果想删除部分数据,就创建临时表 ,把留下的数据查出来 插入进去

     

     

     

  2. Hive表导入到Hive表
    
    
    insert into table testA partition(create_time='2021-05-06') select id, name, area  from testB where id = 1;
    有MR
    Time taken: 173.64 seconds
    
    hive> SELECT * FROM testA;
    OK
    1    fish1    SZ    2021-05-04
    2    fish2    SH    2021-05-04
    3    fish3    HZ    2021-05-04
    4    fish4    QD    2021-05-04
    5    fish5    SR    2021-05-04
    1    zy1    SZ    2021-05-06
    1    zy1    SZ    2021-05-06
    Time taken: 0.246 seconds, Fetched: 7 row(s)
    
    
    INSERT INTO TABLE testA PARTITION(create_time) select id, name, area, code from testB where id = 2;
    hive> select * from testA;
    OK
    2    zy2    SH    1002
    2    zy2    SH    1002
    1    fish1    SZ    2021-05-04
    2    fish2    SH    2021-05-04
    3    fish3    HZ    2021-05-04
    4    fish4    QD    2021-05-04
    5    fish5    SR    2021-05-04
    1    zy1    SZ    2021-05-06
    1    zy1    SZ    2021-05-06
    Time taken: 0.212 seconds, Fetched: 9 row(s)
    
    
    1,将testB中id=1的行,导入到testA,分区为2021-05-06
    
    2,将testB中id=2的行,导入到testA,分区create_time为id=2行的code值。
    
    
    
    
    

     

  3. HDFS文件导入到Hive表
    HDFS文件导入到Hive表
    hdfs dfs -put /tmp/20210504/sourceA.txt /xiaoke003/sourceA
    hive> load data inpath '/xiaoke002/sourceA' into table testA partition(create_time= '2021-05-04');
    Loading data to table one.testa partition (create_time=2021-05-04)
    OK
    Time taken: 0.54 seconds
    hive> SELECT * FROM testA;
    OK
    1    fish1    SZ    2021-05-04
    2    fish2    SH    2021-05-04
    3    fish3    HZ    2021-05-04
    4    fish4    QD    2021-05-04
    5    fish5    SR    2021-05-04
    Time taken: 0.122 seconds, Fetched: 5 row(s)

     

  4. 创建表的过程中从其他表导入
    create table testC as select name, code from testB;

     

  5. 通过sqoop将mysql库导入到Hive表

 

导出

  1. Hive表导出到本地文件系统
    insert overwrite local directory '/tmp/20210504/output/one' row format delimited fields terminated by ',' select * from testA;
    
    通过INSERT OVERWRITE LOCAL DIRECTORY将hive表testA数据导入到/tmp/20210504/output/one/000000_0,众所周知,HQL会启动Mapreduce完成,
    /tmp/20210504/output/one就是Mapreduce输出路径,产生的结果存放在文件名为:000000_0。

     

  2. Hive表导出到HDFS
    insert overwrite directory '/tmp/xiaoke002/one' row format delimited fields terminated by ',' select * from testA;
    
    
    [root@ke03 output]# hdfs dfs -cat /xiaoke002/one/000000_0
    1fish1SZ2021-05-04
    2fish2SH2021-05-04
    3fish3HZ2021-05-04
    4fish4QD2021-05-04
    5fish5SR2021-05-04

     

  3. 使用-e -f 参数来导出数据
    -e 的使用方式,后面接SQL语句。>>后面为输出文件路径
    hive -e "use one;select * from testA" >> ./two
    
    
    
    -f 的使用方式,后面接存放sql语句的文件。>>后面为输出文件路径
    [root@ke03 output]# cat sql.sql 
    use one; select * from testA;
    
    hive -f sql.sql  >> three.txt
    
    
    
    [root@ke03 output]# cat two 
    1    fish1    SZ    2021-05-04
    2    fish2    SH    2021-05-04
    3    fish3    HZ    2021-05-04
    4    fish4    QD    2021-05-04
    5    fish5    SR    2021-05-04
    
    [root@ke03 output]# cat three.txt 
    1    fish1    SZ    2021-05-04
    2    fish2    SH    2021-05-04
    3    fish3    HZ    2021-05-04
    4    fish4    QD    2021-05-04
    5    fish5    SR    2021-05-04

     

  4. 从hdfs中直接导出
    hdfs dfs -get /xiaoke002/one/000000_0 ./four
    
    vi four
    1^Afish1^ASZ^A2021-05-04
    2^Afish2^ASH^A2021-05-04
    3^Afish3^AHZ^A2021-05-04
    4^Afish4^AQD^A2021-05-04
    5^Afish5^ASR^A2021-05-04

     

  5. 通过sqoop将Hive表导出到mysql库

标签:04,05,create,汇总,hive,Hive,testA,导入,2021
来源: https://www.cnblogs.com/bigdata-familyMeals/p/14730226.html