WordCount Example
Author: Internet
How Hadoop interacts with Linux
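Hadoop runs as a cluster installed on Linux, so the two have to interact. Linux commands operate on the Linux file system, whereas Hadoop has its own file system, HDFS, so files stored in Hadoop cannot be manipulated directly with Linux commands. A separate command interface is needed for that.
Hadoop's file commands largely mirror their Linux counterparts; they are invoked as hadoop fs -<command>, for example hadoop fs -ls /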
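In Hadoop, the root directory / refers to hdfs://<hostname>:<port>/, the file system address configured as fs.defaultFS. Relative paths, by contrast, resolve under /user/<username>.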
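To make the path rule concrete, here is a minimal sketch that expands an absolute HDFS path into its full URI. The helper name hdfs_uri and the address hdfs://Hadoop01:8020 are assumptions for illustration only; the real value is whatever fs.defaultFS is set to on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: expand an absolute HDFS path into a full URI.
# DEFAULT_FS stands in for the fs.defaultFS setting; host and port are assumed.
DEFAULT_FS="hdfs://Hadoop01:8020"

hdfs_uri() {
    # Drop one trailing slash from the base, then append the absolute path.
    printf '%s%s\n' "${DEFAULT_FS%/}" "$1"
}

hdfs_uri /                       # hdfs://Hadoop01:8020/
hdfs_uri /data/wordcount/input   # hdfs://Hadoop01:8020/data/wordcount/input
```

On a running cluster, hdfs getconf -confKey fs.defaultFS should print the configured address, so there is no need to guess it.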
- Run hadoop with no arguments to list its commands
[root@Hadoop01 /]# hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
credential interact with credential providers
Hadoop jar and the required libraries
daemonlog get/set the log level for each daemon
s3guard manage data on S3
trace view and modify Hadoop tracing settings
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
- Run hadoop fs with no arguments to list the file system commands
[root@Hadoop01 /]# hadoop fs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-x] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-x] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]
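Since many of these subcommands share names with Linux commands, the mapping can be sketched as a tiny translator. The function name to_hadoop_fs is hypothetical, and it only prints the command rather than running it; it accepts just the subcommands that appear in the usage listing above.

```shell
#!/bin/sh
# Hypothetical helper: print the `hadoop fs` form of a familiar Linux file command.
# Only names that appear in the `hadoop fs` usage listing are accepted.
to_hadoop_fs() {
    cmd="$1"
    shift
    case "$cmd" in
        ls|cat|mkdir|rm|rmdir|mv|cp|chmod|chown|chgrp|du|df|tail|find)
            printf 'hadoop fs -%s %s\n' "$cmd" "$*"
            ;;
        *)
            echo "no direct hadoop fs counterpart for: $cmd" >&2
            return 1
            ;;
    esac
}

to_hadoop_fs ls /data                                  # hadoop fs -ls /data
to_hadoop_fs mkdir -p /data/wordcount/input            # hadoop fs -mkdir -p /data/wordcount/input
to_hadoop_fs cat /data/wordcount/input/wordcount.txt   # hadoop fs -cat /data/wordcount/input/wordcount.txt
```

The names match, but the options do not always carry over (compare the flags in the listing with the Linux man pages), so hadoop fs -help <cmd> is worth checking before relying on a flag.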
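The classic WordCount example
WordCount is the classic Hadoop example. Here we use it to exercise Hadoop's storage (HDFS), computation (MapReduce), and scheduling (YARN) from end to end.
- Create a local file, wordcount.txt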
[byy@Hadoop01 data]$ vim wordcount.txt
hello
word
bai
xue
bai
xue
1
1 2
1 22 3
1 22 3
~
"wordcount.txt" 10L, 47C
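If you prefer not to type the file in an editor, the same content can be written non-interactively. This is a sketch: the temporary directory is an assumption made so that nothing in your working directory gets overwritten. The line and byte counts it prints should agree with the 10L, 47C reported by vim.

```shell
#!/bin/sh
# Create the sample input without an editor; the content matches the vim session above.
workdir="$(mktemp -d)"
input="$workdir/wordcount.txt"

# One argument per line of the file.
printf '%s\n' hello word bai xue bai xue 1 '1 2' '1 22 3' '1 22 3' > "$input"

# Count lines and bytes; tr strips the padding some wc implementations add.
lines="$(wc -l < "$input" | tr -d ' ')"
bytes="$(wc -c < "$input" | tr -d ' ')"
echo "$lines lines, $bytes bytes"   # 10 lines, 47 bytes
```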
- Create a directory on HDFS to hold the file
[byy@Hadoop01 data]$ hadoop fs -mkdir -p /data/wordcount/input
[byy@Hadoop01 data]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - byy supergroup 0 2021-02-06 17:49 /data
[byy@Hadoop01 data]$ hadoop fs -ls /data
Found 1 items
drwxr-xr-x - byy supergroup 0 2021-02-06 17:49 /data/wordcount
[byy@Hadoop01 data]$ hadoop fs -ls /data/wordcount/
Found 1 items
drwxr-xr-x - byy supergroup 0 2021-02-06 17:49 /data/wordcount/input
- Upload wordcount.txt to HDFS
[byy@Hadoop01 data]$ hadoop fs -put wordcount.txt /data/wordcount/input/
[byy@Hadoop01 data]$ hadoop fs -ls /data/wordcount/input
Found 1 items
-rw-r--r-- 1 byy supergroup 47 2021-02-06 18:56 /data/wordcount/input/wordcount.txt
[byy@Hadoop01 data]$ hadoop fs -cat /data/wordcount/input/wordcount.txt
hello
word
bai
xue
bai
xue
1
1 2
1 22 3
1 22 3
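The directory creation and upload steps can be bundled into one repeatable script. The sketch below is an assumption-laden convenience, not part of the original walkthrough: the names run, upload_input, and DRY_RUN are hypothetical, and with DRY_RUN=1 (the default here) the commands are only printed, so it is safe to try on a machine without Hadoop. The -p and -f flags are the ones shown in the hadoop fs usage listing.

```shell
#!/bin/sh
# Sketch: create the HDFS input directory and upload a local file.
# With DRY_RUN=1 the hadoop commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
LOCAL_FILE="wordcount.txt"
HDFS_INPUT="/data/wordcount/input"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upload_input() {
    # -p: do not fail if the directory already exists
    run hadoop fs -mkdir -p "$HDFS_INPUT" &&
    # -f: overwrite a copy left by an earlier upload
    run hadoop fs -put -f "$LOCAL_FILE" "$HDFS_INPUT/" &&
    run hadoop fs -ls "$HDFS_INPUT"
}

upload_input
```

Setting DRY_RUN=0 on the cluster node runs the same three commands for real; each step only proceeds if the previous one succeeded.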
- Locate the jar that contains the MapReduce example jobs
[root@Hadoop01 /]# cd /opt/app/hadoop
[root@Hadoop01 hadoop]# find -name '*example*.jar'
./share/hadoop/mapreduce1/hadoop-examples-2.6.0-mr1-cdh5.16.2.jar
./share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-test-sources.jar
./share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-sources.jar
# This is the jar we need
./share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar
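The search above returns four candidates, only one of which is runnable. A narrower search can pick it directly; this is a sketch in which the function name find_examples_jar is hypothetical and HADOOP_HOME defaults to the /opt/app/hadoop path used in this session.

```shell
#!/bin/sh
# Sketch: pick the runnable examples jar, skipping source and test-source jars.
find_examples_jar() {
    # $1 is the Hadoop installation directory to search.
    find "$1" -name 'hadoop-mapreduce-examples-*.jar' ! -name '*sources*' | head -n 1
}

# Assumed location; adjust to your installation.
HADOOP_HOME="${HADOOP_HOME:-/opt/app/hadoop}"
if [ -d "$HADOOP_HOME" ]; then
    find_examples_jar "$HADOOP_HOME"
fi
```

The name pattern excludes the MR1 jar (hadoop-examples-...), and the ! -name '*sources*' test drops the two source archives.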
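- Run the jar to start a MapReduce job that counts the words in wordcount.txt
While the job is running, its progress can be followed in the YARN web UI. The output directory must not exist beforehand; the job creates it.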
[byy@Hadoop01 ~]$ hadoop jar /opt/app/hadoop/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar wordcount /data/wordcount/input /data/wordcount/output
# List the job's output (the files are created automatically under the output path)
[byy@Hadoop01 ~]$ hadoop fs -ls /data/wordcount/output
Found 2 items
-rw-r--r-- 1 root supergroup 0 2021-02-06 19:30 /data/wordcount/output/_SUCCESS # empty marker file, written when the job succeeds
-rw-r--r-- 1 root supergroup 44 2021-02-06 19:30 /data/wordcount/output/part-r-00000 # result file
[byy@Hadoop01 ~]$ hadoop fs -cat /data/wordcount/output/part-r-00000
1 4
2 1
22 2
3 2
bai 2
hello 1
word 1
xue 2
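To see where these numbers come from, the job's three phases can be imitated locally with a pipeline. This is only an illustration, not the code inside the examples jar: the function names wc_map, wc_shuffle, and wc_reduce are hypothetical, it assumes tokens are split on whitespace, and it writes tab-separated pairs. Run on the same ten lines, it yields the same eight counts in the same order.

```shell
#!/bin/sh
# Local imitation of the WordCount job's three phases; no cluster involved.
wc_map() {
    # Map: emit "<token><TAB>1" for every whitespace-separated token.
    awk '{ for (i = 1; i <= NF; i++) print $i "\t1" }'
}

wc_shuffle() {
    # Shuffle/sort: bring equal keys together, in byte order.
    LC_ALL=C sort
}

wc_reduce() {
    # Reduce: sum the counts of each run of identical keys.
    # Appending "" forces string comparison, so "1" and "1.0" stay distinct.
    awk -F '\t' '
        NR > 1 && ($1 "") != key { print key "\t" sum; sum = 0 }
        { key = $1 ""; sum += $2 }
        END { if (NR > 0) print key "\t" sum }
    '
}

result="$(printf '%s\n' hello word bai xue bai xue 1 '1 2' '1 22 3' '1 22 3' |
    wc_map | wc_shuffle | wc_reduce)"
printf '%s\n' "$result"
```

The byte-order sort is what places 22 between 2 and 3: keys are compared as text, not as numbers.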
Source: https://blog.csdn.net/weixin_45052608/article/details/113729176