
ETL Tool -- Sqoop


1. Overview

Sqoop is a tool for transferring bulk data between Hadoop (HDFS, Hive, HBase) and structured datastores such as relational databases: import pulls RDBMS tables into Hadoop, and export pushes Hadoop data back out.

2. How Sqoop Works

Sqoop translates each import or export command into a map-only MapReduce job; the map tasks connect to the database over JDBC and transfer the data in parallel.

3. Sqoop1 vs. Sqoop2 Architecture

Sqoop1 (versions 1.4.x):
- Architecture: the sqoop client builds and submits jobs directly; there is no server component.
- Access: CLI console only.
- Security: the database username and password are given in the command or script.

Sqoop2 (versions 1.99.x):
- Architecture: introduces a sqoop server, which centrally manages the connectors.
- Access: REST API, Java API, Web UI, and the CLI console.

Sqoop Installation and Deployment


Step 1: Download the installation package

https://mirrors.bfsu.edu.cn/apache/sqoop/1.4.7

Step 2: Upload and extract

cd /bigdata/soft/
tar -xzvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /bigdata/install

Step 3: Edit the configuration file

cd /bigdata/install/sqoop-1.4.7.bin__hadoop-2.6.0/conf/
mv sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/bigdata/install/hadoop-3.1.4

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/bigdata/install/hadoop-3.1.4

#set the path to where bin/hbase is available
export HBASE_HOME=/bigdata/install/hbase-2.2.6

#Set the path to where bin/hive is available
export HIVE_HOME=/bigdata/install/hive-3.1.2

#Set the path for where zookeeper config dir is
export ZOOCFGDIR=/bigdata/install/zookeeper-3.6.2

Step 4: Add two required jars (java-json.jar and the MySQL JDBC driver)

cd /bigdata/soft
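# note: this and the following steps use /bigdata/install/sqoop-1.4.7, which assumes
# the extracted directory from step 2 was renamed, e.g.:
# mv /bigdata/install/sqoop-1.4.7.bin__hadoop-2.6.0 /bigdata/install/sqoop-1.4.7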
cp java-json.jar mysql-connector-java-5.1.38.jar /bigdata/install/sqoop-1.4.7/lib/

Step 5: Configure the sqoop environment variables

sudo vim /etc/profile	

# add the following
export SQOOP_HOME=/bigdata/install/sqoop-1.4.7
export PATH=$SQOOP_HOME/bin:$PATH
source /etc/profile
Common issues after installation

1. sqoop help prints warning logs
[hadoop@hadoop03 bin]$ pwd
/bigdata/install/sqoop-1.4.7/bin
# search for HCAT_HOME and comment out the HCAT_HOME/ACCUMULO_HOME warning checks (sketched below)
[hadoop@hadoop03 bin]$ vim configure-sqoop
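The warnings come from the availability checks for HCatalog and Accumulo in configure-sqoop. A sketch of that block as it appears in the stock 1.4.7 script, shown here already commented out (your copy may differ slightly):

#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi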

2. sqoop help reports an error

With HBase 2.x running on Hadoop 3.x, sqoop typically fails with an error such as "Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty".

Solution: patch HBase's bin/hbase launcher script so that the hbase-server jar is added to HADOOP_CLASSPATH before the GetJavaProperty call:
[hadoop@hadoop01 bin]$ cd /bigdata/install/hbase-2.2.6/bin/
[hadoop@hadoop01 bin]$ vim hbase
  # collect the hbase-server jar, skipping test jars
  temporary_cp=
  for f in "${HBASE_HOME}"/lib/hbase-server*.jar; do
    if [[ ! "${f}" =~ ^.*\-tests\.jar$ ]]; then
      temporary_cp=":$f"
    fi
  done
  # append it to HADOOP_CLASSPATH for the existing GetJavaProperty invocation
  HADOOP_JAVA_LIBRARY_PATH=$(HADOOP_CLASSPATH="$CLASSPATH${temporary_cp}" "${HADOOP_IN_PATH}" \
    org.apache.hadoop.hbase.util.GetJavaProperty java.library.path)

[hadoop@hadoop01 bin]$ pwd
/bigdata/install/hbase-2.2.6/bin
# distribute the patched script to the other nodes
[hadoop@hadoop01 bin]$ scp hbase hadoop02:$PWD
[hadoop@hadoop01 bin]$ scp hbase hadoop03:$PWD
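Once the patched script is on every node, a quick sanity check that sqoop now starts cleanly (sqoop version and sqoop help are built-in sqoop tools):

sqoop version   # should print the Sqoop 1.4.7 banner with no error
sqoop help      # should list the available tools without a stack trace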

Sqoop Data Import


1. List databases and tables

sqoop help   # list the available sqoop tools
sqoop list-databases --connect jdbc:mysql://hadoop02:3306/ --username root --password 123456
sqoop list-tables --connect jdbc:mysql://hadoop02:3306/mysql --username root --password 123456
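Putting --password on the command line exposes the credential (the sqoop1 security caveat noted earlier). Two standard alternatives, sketched here with an illustrative password-file path:

# prompt for the password interactively
sqoop list-databases --connect jdbc:mysql://hadoop02:3306/ --username root -P

# or read it from a file on HDFS (echo -n avoids a trailing newline)
echo -n '123456' > mysql.pwd
hdfs dfs -put mysql.pwd /user/hadoop/mysql.pwd
sqoop list-databases --connect jdbc:mysql://hadoop02:3306/ --username root \
  --password-file /user/hadoop/mysql.pwd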

2. Prepare test tables in MySQL

CREATE DATABASE /*!32312 IF NOT EXISTS*/`userdb` /*!40100 DEFAULT CHARACTER SET utf8 */;
USE `userdb`;

DROP TABLE IF EXISTS `emp`;

CREATE TABLE `emp` (
  `id` INT(11) DEFAULT NULL,
  `name` VARCHAR(100) DEFAULT NULL,
  `deg` VARCHAR(100) DEFAULT NULL,
  `salary` INT(11) DEFAULT NULL,
  `dept` VARCHAR(10) DEFAULT NULL,
  `create_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `is_delete` BIGINT(20) DEFAULT '1'
) ENGINE=INNODB DEFAULT CHARSET=latin1;

INSERT  INTO `emp`(`id`,`name`,`deg`,`salary`,`dept`) VALUES (1201,'gopal','manager',50000,'TP'),(1202,'manisha','Proof reader',50000,'TP'),(1203,'khalil','php dev',30000,'AC'),(1204,'prasanth','php dev',30000,'AC'),(1205,'kranthi','admin',20000,'TP');

DROP TABLE IF EXISTS `emp_add`;

CREATE TABLE `emp_add` (
  `id` INT(11) DEFAULT NULL,
  `hno` VARCHAR(100) DEFAULT NULL,
  `street` VARCHAR(100) DEFAULT NULL,
  `city` VARCHAR(100) DEFAULT NULL,
  `create_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `is_delete` BIGINT(20) DEFAULT '1'
) ENGINE=INNODB DEFAULT CHARSET=latin1;

INSERT  INTO `emp_add`(`id`,`hno`,`street`,`city`) VALUES (1201,'288A','vgiri','jublee'),(1202,'108I','aoc','sec-bad'),(1203,'144Z','pgutta','hyd'),(1204,'78B','old city','sec-bad'),(1205,'720X','hitec','sec-bad');

DROP TABLE IF EXISTS `emp_conn`;
CREATE TABLE `emp_conn` (
  `id` INT(100) DEFAULT NULL,
  `phno` VARCHAR(100) DEFAULT NULL,
  `email` VARCHAR(100) DEFAULT NULL,
  `create_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `is_delete` BIGINT(20) DEFAULT '1'
) ENGINE=INNODB DEFAULT CHARSET=latin1;

INSERT  INTO `emp_conn`(`id`,`phno`,`email`) VALUES (1201,'2356742','gopal@tp.com'),(1202,'1661663','manisha@tp.com'),(1203,'8887776','khalil@ac.com'),(1204,'9988774','prasanth@ac.com'),(1205,'1231231','kranthi@tp.com');

3. Import a database table into HDFS

With no --target-dir, the data lands under the current user's HDFS home directory, here /user/hadoop/emp; -m 1 runs the import as a single map task.

sqoop import --connect jdbc:mysql://hadoop02:3306/userdb --password 123456 --username root --table emp -m 1
hdfs dfs -ls /user/hadoop/emp

sqoop import --connect jdbc:mysql://hadoop02:3306/userdb --password 123456 --username root --table emp -m 4 --split-by id
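Because emp has no primary key, a parallel import must name a split column explicitly with --split-by; sqoop then partitions the id range across the four map tasks, each writing its own output file:

hdfs dfs -ls /user/hadoop/emp
# expect one file per map task: part-m-00000 through part-m-00003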

4. Import into a specified HDFS directory

--target-dir sets the output directory, and --delete-target-dir removes it first if it already exists, so the job can be rerun safely.

sqoop import --connect jdbc:mysql://hadoop02:3306/userdb --username root --password 123456 --delete-target-dir --table emp --target-dir /sqoop/emp -m 1
hdfs dfs -text /sqoop/emp/part-m-00000

1201,gopal,manager,50000,TP
1202,manisha,Proof reader,50000,TP
1203,khalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP

5. Import into a specified HDFS directory with a custom field delimiter

By default Sqoop writes comma-separated text; --fields-terminated-by '\t' switches the field separator to a tab.

sqoop import --connect jdbc:mysql://hadoop02:3306/userdb --username root --password 123456 --delete-target-dir --table emp --target-dir /sqoop/emp2 -m 1 --fields-terminated-by '\t'
hdfs dfs -text /sqoop/emp2/part-m-00000
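Beyond whole-table imports, sqoop can also import a subset of rows. A sketch against the same emp table, using the standard --where and --query import options (with --query, the statement must contain the $CONDITIONS placeholder and --target-dir is mandatory):

# import only the TP department rows
sqoop import --connect jdbc:mysql://hadoop02:3306/userdb \
  --username root --password 123456 \
  --delete-target-dir --table emp --where "dept = 'TP'" \
  --target-dir /sqoop/emp_tp -m 1

# free-form SQL import; sqoop substitutes $CONDITIONS with its split predicates
sqoop import --connect jdbc:mysql://hadoop02:3306/userdb \
  --username root --password 123456 \
  --delete-target-dir \
  --query 'select id, name, salary from emp where salary > 25000 and $CONDITIONS' \
  --target-dir /sqoop/emp_query -m 1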


Sqoop Data Export


Sqoop export pushes files from HDFS into an RDBMS table. The target table must already exist in the database; the data to be exported sits in HDFS and looks like this:

1201,gopal,manager,50000,TP,2018-06-17 18:54:32.0,2018-06-17 18:54:32.0,1
1202,manisha,Proof reader,50000,TP,2018-06-15 18:54:32.0,2018-06-17 20:26:08.0,1
1203,khalil,php dev,30000,AC,2018-06-17 18:54:32.0,2018-06-17 18:54:32.0,1
1204,prasanth,php dev,30000,AC,2018-06-17 18:54:32.0,2018-06-17 21:05:52.0,0
1205,kranthi,admin,20000,TP,2018-06-17 18:54:32.0,2018-06-17 18:54:32.0,1
Step 1: Create the target MySQL table
use userdb;

CREATE TABLE `emp_out` (
 `id` INT(11) DEFAULT NULL,
 `name` VARCHAR(100) DEFAULT NULL,
 `deg` VARCHAR(100) DEFAULT NULL,
 `salary` INT(11) DEFAULT NULL,
 `dept` VARCHAR(10) DEFAULT NULL,
 `create_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 `is_delete` BIGINT(20) DEFAULT '1' 
) ENGINE=INNODB DEFAULT CHARSET=utf8;
Step 2: Run the export command

Use sqoop export to push the HDFS data into MySQL:

sqoop export \
--connect jdbc:mysql://hadoop02:3306/userdb \
--username root --password 123456 \
--table emp_out \
--export-dir /sqoop/emp \
--input-fields-terminated-by ","
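Re-running this export appends the same rows again, since emp_out has no unique key constraint. When the target should instead be updated in place, sqoop export supports an update mode; a sketch using the standard --update-key/--update-mode options:

sqoop export \
--connect jdbc:mysql://hadoop02:3306/userdb \
--username root --password 123456 \
--table emp_out \
--export-dir /sqoop/emp \
--input-fields-terminated-by "," \
--update-key id \
--update-mode allowinsert   # update rows matching on id, insert the rest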
Step 3: Verify the MySQL table data
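A minimal check from the MySQL side (a sketch; assumes the mysql client is available on this node):

mysql -uroot -p123456 -e "SELECT * FROM userdb.emp_out;"
# all five emp rows should now be present in emp_out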


Source: https://www.cnblogs.com/hanease/p/16218389.html