
Hadoop Basics 03: Basic Concepts of HDFS


Contents

HDFS Overview (Hadoop Distributed File System)

HDFS Architecture in Detail

Official documentation

Example
A 150 MB file a.txt with a block size of 128 MB
is split into two blocks: block1 (128 MB) and block2 (22 MB).

The question then is: which DataNode should block1 and block2 each be stored on?
This is transparent to the user; HDFS takes care of the placement.
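Once a file is actually in HDFS, the split and the block locations can be inspected with fsck (a quick sketch, assuming a.txt has already been uploaded to the HDFS root):

hdfs fsck /a.txt -files -blocks -locations

The output lists each block's id, its size, and the DataNode(s) holding its replicas.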

HdfsDesign

Filesystem Namespace

HDFS Replication

Datanodes

Linux Environment Overview

(base) JackSundeMBP:~ jacksun$ ssh hadoop@192.168.68.200

[hadoop@hadoop000 ~]$ pwd
/home/hadoop


[hadoop@hadoop000 ~]$ ls
app   Desktop    Downloads  maven_resp  Pictures  README.txt  software   t.txt
data  Documents  lib        Music       Public    shell       Templates  Videos

Directory   Purpose
software    software installation packages
app         software installation directory
data        data files
lib         jar packages
shell       shell scripts
maven_resp  Maven dependency repository

[hadoop@hadoop000 ~]$ sudo vi /etc/hosts

192.168.68.200 hadoop000
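A quick check that the hostname now resolves (any standard tool works):

ping -c 1 hadoop000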

Hadoop部署

JDK 1.8 Deployment in Detail

Append the following to ~/.bash_profile:

PATH=$PATH:$HOME/.local/bin:$HOME/bin
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_91
export PATH=$JAVA_HOME/bin:$PATH

Make the changes take effect with source .bash_profile

java -version

java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

If the above is printed, the installation succeeded.

All software installation packages:
https://download.csdn.net/download/jankin6/12668545

Passwordless SSH Login Setup in Detail

-rw------- 1 hadoop hadoop  796 Aug 16 06:17 authorized_keys
-rw------- 1 hadoop hadoop 1675 Aug 16 06:14 id_rsa
-rw-r--r-- 1 hadoop hadoop  398 Aug 16 06:14 id_rsa.pub
-rw-r--r-- 1 hadoop hadoop 1230 Aug 16 18:05 known_hosts

id_rsa: the private key
id_rsa.pub: the public key
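The transcript does not show how these files were created; the usual steps are as follows (a sketch using standard OpenSSH commands):

ssh-keygen -t rsa                                  # accept the defaults; keys land in ~/.ssh/
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # authorize our own public key for login
chmod 600 ~/.ssh/authorized_keys                   # sshd rejects group/world-writable key files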

[hadoop@hadoop000 ~]$ ssh localhost 
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:LZvkeJHnqH0AtihqFB2AcQJKwMpH1/DorPi0bIEKcQM.
ECDSA key fingerprint is MD5:9f:b5:f3:bd:f2:aa:61:97:8b:8a:e2:a3:98:5a:e4:3d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Sun Aug 16 18:03:23 2020 from 192.168.1.3
[hadoop@hadoop000 ~]$ ls
app              Desktop    lib         Pictures    shell      t.txt
authorized_keys  Documents  maven_resp  Public      software   Videos
data             Downloads  Music       README.txt  Templates
[hadoop@hadoop000 ~]$ ssh localhost 
Last login: Sun Aug 16 18:05:21 2020 from 127.0.0.1

Hadoop Installation Directory Layout and hadoop-env Configuration

Configure JAVA_HOME

[hadoop@hadoop000 hadoop]$ ls
capacity-scheduler.xml      httpfs-env.sh            mapred-env.sh
configuration.xsl           httpfs-log4j.properties  mapred-queues.xml.template
container-executor.cfg      httpfs-signature.secret  mapred-site.xml
core-site.xml               httpfs-site.xml          mapred-site.xml.template
hadoop-env.cmd              kms-acls.xml             slaves
hadoop-env.sh               kms-env.sh               ssl-client.xml.example
hadoop-metrics2.properties  kms-log4j.properties     ssl-server.xml.example
hadoop-metrics.properties   kms-site.xml             yarn-env.cmd
hadoop-policy.xml           log4j.properties         yarn-env.sh
hdfs-site.xml               mapred-env.cmd           yarn-site.xml
[hadoop@hadoop000 hadoop]$ pwd
/home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop
[hadoop@hadoop000 hadoop]$ sudo vi hadoop-env.sh 

-----------------------------

# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}

export JAVA_HOME=/home/hadoop/app/jdk1.8.0_91 

vi ~/.bash_profile

export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.15.1
export PATH=$HADOOP_HOME/bin:$PATH

cd $HADOOP_HOME/bin

[hadoop@hadoop000 hadoop-2.6.0-cdh5.15.1]$ ls
bin             etc                  include  LICENSE.txt  README.txt  src
bin-mapreduce1  examples             lib      logs         sbin
cloudera        examples-mapreduce1  libexec  NOTICE.txt   share
Directory   Purpose
bin         Hadoop client commands
etc/hadoop  Hadoop configuration files
sbin        scripts that start/stop the Hadoop daemons
share       common examples

HDFS Formatting and Startup in Detail

http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.15.1/

vi etc/hadoop/core-site.xml:

This specifies that the master node (NameNode) runs on this machine at port 8020:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop000:8020</value>
    </property>
</configuration>

vi etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/app/tmp</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
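After editing both files, you can sanity-check that the values were picked up (a quick check with the standard hdfs getconf tool):

hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.replication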

vi slaves (for this single-node setup, the slaves file typically lists just the one worker host)

Format the filesystem before the first start only; do not run this again afterwards: hdfs namenode -format

The hdfs and hadoop client commands live in $HADOOP_HOME/bin:

cd $HADOOP_HOME/bin

Then start HDFS with the script under $HADOOP_HOME/sbin:

cd $HADOOP_HOME/sbin
./start-dfs.sh

Verify that startup succeeded:

[hadoop@hadoop000 sbin]$ jps
13607 NameNode
14073 Jps
13722 DataNode
13915 SecondaryNameNode

http://192.168.68.200:50070

If jps shows the processes but the page will not open in a browser, the culprit is most likely the firewall.

Check the firewall state: firewall-cmd --state
Stop the firewall: systemctl stop firewalld.service

[hadoop@hadoop000 sbin]$ firewall-cmd --state
not running
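Stopping the service only lasts until the next reboot; to keep the firewall off permanently, disable it as well (a standard systemd command, not shown in the original transcript):

systemctl disable firewalld.service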

start-dfs.sh is equivalent to:

hadoop-daemons.sh start namenode
hadoop-daemons.sh start datanode
hadoop-daemons.sh start secondarynamenode

Similarly, stop-dfs.sh is equivalent to the same three commands with stop.

Hadoop Command-Line Operations in Detail

After changing environment variables, remember to run:
source ~/.bash_profile

[hadoop@hadoop000 bin]$ ./hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  s3guard              manage metadata on S3
  trace                view and modify Hadoop tracing settings
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.


[hadoop@hadoop000 bin]$ ./hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-x] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-x] <path> ...]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-usage [cmd ...]]

[hadoop@hadoop000 hadoop-2.6.0-cdh5.15.1]$ hadoop fs -put README.txt  /
[hadoop@hadoop000 hadoop-2.6.0-cdh5.15.1]$ hadoop fs -ls /
Found 1 items
-rw-r--r--   1 hadoop supergroup       1366 2020-08-17 21:35 /README.txt

[hadoop@hadoop000 hadoop-2.6.0-cdh5.15.1]$ hadoop fs -cat /README.txt

......
and our wiki, at:
......
  Hadoop Core uses the SSL libraries from the Jetty project written 
by mortbay.org.

[hadoop@hadoop000 hadoop-2.6.0-cdh5.15.1]$ hadoop fs -get /README.txt ./


[hadoop@hadoop000 hadoop-2.6.0-cdh5.15.1]$ hadoop fs -mkdir /hdfs-test
[hadoop@hadoop000 hadoop-2.6.0-cdh5.15.1]$ hadoop fs -ls /
Found 2 items
-rw-r--r--   1 hadoop supergroup       1366 2020-08-17 21:35 /README.txt
drwxr-xr-x   - hadoop supergroup          0 2020-08-17 21:48 /hdfs-test
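To clean up the test files afterwards, the matching remove commands from the usage listing above can be used (a sketch):

hadoop fs -rm /README.txt
hadoop fs -rm -r /hdfs-test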

A Closer Look at HDFS Storage



[Figure: a file split into two blocks]

In the figure above we can see that a file was split into two blocks, but where are they actually stored on disk?
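The DataNode keeps each block as an ordinary file on the local filesystem, under a directory derived from the hadoop.tmp.dir we configured earlier (a sketch; the exact subpath varies by version):

find /home/hadoop/app/tmp/dfs/data -name 'blk_*'

Each blk_NNN file holds the raw bytes of one block, with a matching .meta file carrying its checksums.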

From this we can conclude:
put: a file is split into n blocks, which are then stored on different nodes.
get: the client first locates the n blocks on those n nodes and reads back the corresponding data.
