
The WordCount Example


Interacting with Hadoop from Linux

Hadoop is installed on top of a Linux cluster, so the two need to interact. Linux commands operate on the Linux file system, while Hadoop has its own file system, HDFS, so plain Linux commands cannot manipulate files stored in Hadoop. Instead, HDFS provides its own command-line interface.
HDFS shell commands largely mirror their Linux counterparts; you simply prefix them with hadoop.
The HDFS root directory / refers to hdfs://<namenode-host>:<port>/
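Since every fully-qualified HDFS path is a URI rooted at the namenode, the mapping can be illustrated with Python's standard urlparse. The hostname Hadoop01 and port 8020 below are illustrative assumptions, not values read from this cluster:

```python
from urllib.parse import urlparse

# Hypothetical fully-qualified HDFS path; host and port are assumptions.
uri = urlparse("hdfs://Hadoop01:8020/data/wordcount/input")

print(uri.scheme)  # filesystem scheme: hdfs
print(uri.netloc)  # namenode authority: Hadoop01:8020
print(uri.path)    # the path that commands like hadoop fs -ls operate on
```

When you write a bare path such as /data, the HDFS client resolves it against this authority (configured as fs.defaultFS in core-site.xml).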

[root@Hadoop01 /] hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  s3guard              manage data on S3
  trace                view and modify Hadoop tracing settings
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
[root@Hadoop01 /] hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-x] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-usage [cmd ...]]

The classic WordCount example

WordCount is the classic Hadoop example. Here we use it to exercise Hadoop's storage, computation, and scheduling in one pass.
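What the MapReduce job computes can be sketched in plain Python: the mapper emits a (word, 1) pair for every token, and the reducer sums the pairs per word. This is a simplified single-process sketch of the algorithm, not Hadoop's actual Java implementation:

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    """Reducer: sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# The sample input file used in this walkthrough.
sample = ["hello", "word", "bai", "xue", "bai", "xue",
          "1", "1 2", "1 22 3", "1 22 3"]
print(reduce_phase(map_phase(sample)))
```

On a real cluster the map tasks run in parallel over HDFS blocks and the framework shuffles pairs to reducers by key, but the per-word totals are the same.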

[byy@Hadoop01 data]$ vim wordcount.txt
hello
word
bai
xue
bai
xue
1
1 2
1 22 3
1 22 3
[byy@Hadoop01 data]$ hadoop fs -mkdir -p /data/wordcount/input
[byy@Hadoop01 data]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - byy supergroup          0 2021-02-06 17:49 /data
[byy@Hadoop01 data]$ hadoop fs -ls /data
Found 1 items
drwxr-xr-x   - byy supergroup          0 2021-02-06 17:49 /data/wordcount
[byy@Hadoop01 data]$ hadoop fs -ls /data/wordcount/
Found 1 items
drwxr-xr-x   - byy supergroup          0 2021-02-06 17:49 /data/wordcount/input
[byy@Hadoop01 data]$ hadoop fs -put wordcount.txt /data/wordcount/input/
[byy@Hadoop01 data]$ hadoop fs -ls /data/wordcount/input
Found 1 items
-rw-r--r--   1 byy supergroup         47 2021-02-06 18:56 /data/wordcount/input/wordcount.txt
[byy@Hadoop01 data]$ hadoop fs -cat /data/wordcount/input/wordcount.txt
hello
word
bai
xue
bai
xue
1
1 2
1 22 3
1 22 3
[root@Hadoop01 /] cd /opt/app/hadoop
[root@Hadoop01 hadoop] find -name '*example*.jar'
./share/hadoop/mapreduce1/hadoop-examples-2.6.0-mr1-cdh5.16.2.jar
./share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-test-sources.jar
./share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-sources.jar
#this is the jar we need
./share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar
[byy@Hadoop01 ~]$ hadoop jar /opt/app/hadoop/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar wordcount /data/wordcount/input /data/wordcount/output
#inspect the job output (the files are generated automatically under the output path)
[byy@Hadoop01 ~]$ hadoop fs -ls /data/wordcount/output
Found 2 items
-rw-r--r--   1 root supergroup          0 2021-02-06 19:30 /data/wordcount/output/_SUCCESS #empty marker file written when the job succeeds
-rw-r--r--   1 root supergroup         44 2021-02-06 19:30 /data/wordcount/output/part-r-00000 #the result file
[byy@Hadoop01 ~]$ hadoop fs -cat /data/wordcount/output/part-r-00000
1	4
2	1
22	2
3	2
bai	2
hello	1
word	1
xue	2
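The part file is plain text with one tab-separated word/count pair per line. A minimal sketch for loading it back into a dict, using the output shown above as literal data since no cluster is available here:

```python
# Lines as they appear in part-r-00000: "<word>\t<count>".
part_r_00000 = ["1\t4", "2\t1", "22\t2", "3\t2",
                "bai\t2", "hello\t1", "word\t1", "xue\t2"]

counts = {}
for line in part_r_00000:
    word, n = line.rstrip("\n").split("\t")
    counts[word] = int(n)

print(counts["bai"])  # 2
```

In practice you would stream the same lines from hadoop fs -cat instead of a hard-coded list.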

Source: https://blog.csdn.net/weixin_45052608/article/details/113729176