
Hadoop-2.6.0 HA (High-Availability) Deployment, Step by Step


Contents

Cluster plan

Hadoop HA deployment:

1) Software environment

2) System preparation

3) Configure SSH connectivity

4) Configure environment variables

5) Configure ZooKeeper

6) Configure Hadoop

7) Start the cluster

8) Cluster startup and shutdown order

9) Hadoop HA pitfalls


Cluster plan

Host       Software            Processes
hadoop001  Hadoop, ZooKeeper   NameNode, DFSZKFailoverController, JournalNode, DataNode, ResourceManager, JobHistoryServer, NodeManager, QuorumPeerMain
hadoop002  Hadoop, ZooKeeper   NameNode, DFSZKFailoverController, JournalNode, DataNode, ResourceManager, NodeManager, QuorumPeerMain
hadoop003  Hadoop, ZooKeeper   JournalNode, DataNode, NodeManager, QuorumPeerMain

Hadoop HA deployment

The example below uses Alibaba Cloud ECS hosts.

1) Software environment

CentOS (Alibaba Cloud image), JDK 1.8.0_45, Hadoop 2.6.0-cdh5.7.0, ZooKeeper 3.4.6.

2) System preparation

1. Create the hadoop user

[root@hadoop002 ~]# useradd hadoop
[root@hadoop002 ~]# passwd hadoop

2. Grant the user sudo privileges

[root@hadoop002 ~]# vi /etc/sudoers
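A minimal sketch of the edit, assuming the common approach of duplicating the root entry for the hadoop user:

root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL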

3)配置SSH通信

阿里云的centos镜像如果产生Permission denied, please try again错误,需要配置/etc/ssh/sshd_config文件。解决方法:
https://help.aliyun.com/knowledge_detail/41487.html?spm=a2c4e.11153987.0.0.6bcc4fbb6frbyn

a.修改主机的hosts文件

[root@hadoop001 .ssh]# vi /etc/hosts

在这里插入图片描述
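A sketch of the entries, using placeholder private IPs (the real addresses are masked in this post; only the last octets .43/.42/.41 are visible in the scp output below):

172.16.0.43 hadoop001
172.16.0.42 hadoop002
172.16.0.41 hadoop003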
Copy the hosts file to the other two hosts.
b. Generate a key pair on each of the three hosts

[hadoop@hadoop001 ~]$ ssh-keygen
Generating public/private rsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
4b:fe:5c:6a:f5:df:80:28:02:96:6f:b8:7e:be:a1:0a hadoop@hadoop001
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|                 |
|     .           |
|    +   S        |
|   . + o . ...   |
|E   . = + ..o..  |
| .   +.o +.o  ...|
|  .o+oo. .+    .o|
+-----------------+

c. Distribute each host's id_rsa.pub to hadoop001
Here id_rsa.pub2 is the key from hadoop002; the key from hadoop003 becomes id_rsa.pub3, and so on.

hadoop002

[hadoop@hadoop002 .ssh]$ scp id_rsa.pub root@hadoop001:/home/hadoop/.ssh/id_rsa.pub2
The authenticity of host 'hadoop001 (*.*.*.43)' can't be established.
RSA key fingerprint is 7f:5b:5d:20:6e:f1:9c:18:01:1e:c4:97:ea:6f:2c:a2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop001,*.*.*.43' (RSA) to the list of known hosts.
root@hadoop001's password: 
id_rsa.pub     100%  398     0.4KB/s   00:00

hadoop003

[hadoop@hadoop003 .ssh]$ scp id_rsa.pub root@hadoop001:/home/hadoop/.ssh/id_rsa.pub3
The authenticity of host 'hadoop001 (*.*.*.43)' can't be established.
RSA key fingerprint is 7f:5b:5d:20:6e:f1:9c:18:01:1e:c4:97:ea:6f:2c:a2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop001,*.*.*.43' (RSA) to the list of known hosts.
root@hadoop001's password: 
id_rsa.pub     100%  398     0.4KB/s   00:00

On hadoop001, append the public keys to authorized_keys and distribute it to the other hosts:

[hadoop@hadoop001 .ssh]$ cat id_rsa.pub >> authorized_keys
[hadoop@hadoop001 .ssh]$ ll
total 20
-rw-rw-r-- 1 hadoop hadoop  398 Feb 14 00:44 authorized_keys
-rw------- 1 hadoop hadoop 1675 Feb 14 00:32 id_rsa
-rw-r--r-- 1 hadoop hadoop  398 Feb 14 00:32 id_rsa.pub
-rw-r--r-- 1 root   root    398 Feb 14 00:36 id_rsa.pub2
-rw-r--r-- 1 root   root    398 Feb 14 00:42 id_rsa.pub3
[hadoop@hadoop001 .ssh]$ cat id_rsa.pub2 >> authorized_keys
[hadoop@hadoop001 .ssh]$ cat id_rsa.pub3 >> authorized_keys
[hadoop@hadoop001 .ssh]$ scp authorized_keys root@hadoop002:/home/hadoop/.ssh/
The authenticity of host 'hadoop002 (*.*.*.42)' can't be established.
RSA key fingerprint is a2:5c:9d:ee:67:0d:66:0d:df:1b:47:3d:f5:3c:2c:8d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop002,*.*.*.42' (RSA) to the list of known hosts.
root@hadoop002's password: 
authorized_key 100% 1194     1.2KB/s   00:00
[hadoop@hadoop001 .ssh]$ scp authorized_keys root@hadoop003:/home/hadoop/.ssh/
The authenticity of host 'hadoop003 (*.*.*.41)' can't be established.
RSA key fingerprint is aa:43:c2:8b:31:09:b7:46:d5:e2:a3:79:69:94:0c:50.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop003,*.*.*.41' (RSA) to the list of known hosts.
root@hadoop003's password: 
authorized_key 100% 1194     1.2KB/s   00:00

Because the files were copied over as root, exit the hadoop user (Xshell can send the commands to all hosts at once) and fix the ownership:

[hadoop@hadoop001 .ssh]$ exit
logout
[root@hadoop001 hadoop]# chown -R hadoop:hadoop /home/hadoop/.ssh/*
[root@hadoop001 hadoop]# chown -R hadoop:hadoop /home/hadoop/.ssh/

d. Fix the permissions (again on all hosts at once)

[hadoop@hadoop001 .ssh]$ sudo chmod 700 -R ~/.ssh
[sudo] password for hadoop: 
[hadoop@hadoop001 .ssh]$ sudo chmod 600 ~/.ssh/authorized_keys
[hadoop@hadoop001 .ssh]$ ll
total 12
-rw------- 1 hadoop hadoop  398 Feb 13 23:46 authorized_keys
-rwx------ 1 hadoop hadoop 1675 Feb 13 23:45 id_rsa
-rwx------ 1 hadoop hadoop  398 Feb 13 23:45 id_rsa.pub

Test the SSH connectivity.
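For example, each of these should now return without prompting for a password:

[hadoop@hadoop001 ~]$ ssh hadoop002 date
[hadoop@hadoop001 ~]$ ssh hadoop003 date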


4) Configure environment variables

Install Java under /usr/java and configure it system-wide. Note the pitfall here: the Java directory must be owned by root (see pitfall 1 at the end).

[root@hadoop001 jdk1.8.0_45]# vi /etc/profile
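A sketch of the lines appended to /etc/profile (the JAVA_HOME path matches the `which java` output below):

export JAVA_HOME=/usr/java/jdk1.8.0_45
export PATH=$JAVA_HOME/bin:$PATH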

[root@hadoop001 jdk1.8.0_45]# source /etc/profile
[root@hadoop001 jdk1.8.0_45]# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
[root@hadoop001 jdk1.8.0_45]# which java
/usr/java/jdk1.8.0_45/bin/java

Configure the Hadoop and ZooKeeper environment variables for the hadoop user. Both are installed under /home/hadoop/app.
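A sketch of the per-user variables, assuming they go in ~/.bash_profile (the install paths match the directories used throughout this post):

export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper-3.4.6
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$PATH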


Create the data, logs, and tmp directories, and set tmp to mode 777:

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ mkdir data logs tmp
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ sudo chmod -R 777 tmp
[sudo] password for hadoop: 
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ ll
total 68
drwxrwxr-x 2 hadoop hadoop  4096 Jan 12 22:15 bin
drwxrwxr-x 2 hadoop hadoop  4096 Feb 14 01:52 data
drwxrwxr-x 3 hadoop hadoop  4096 Jan 12 22:15 etc
drwxrwxr-x 2 hadoop hadoop  4096 Jan 12 22:15 include
drwxrwxr-x 3 hadoop hadoop  4096 Jan 12 22:15 lib
drwxrwxr-x 2 hadoop hadoop  4096 Jan 12 22:15 libexec
-rw-rw-r-- 1 hadoop hadoop 17087 Jan 12 22:15 LICENSE.txt
drwxrwxr-x 2 hadoop hadoop  4096 Feb 14 01:58 logs
-rw-rw-r-- 1 hadoop hadoop   101 Jan 12 22:15 NOTICE.txt
-rw-rw-r-- 1 hadoop hadoop  1366 Jan 12 22:15 README.txt
drwxrwxr-x 2 hadoop hadoop  4096 Jan 12 22:15 sbin
drwxrwxr-x 4 hadoop hadoop  4096 Jan 12 22:15 share
drwxrwxrwx 2 hadoop hadoop  4096 Feb 14 01:58 tmp

5) Configure ZooKeeper

[hadoop@hadoop001 conf]$ pwd
/home/hadoop/app/zookeeper-3.4.6/conf
[hadoop@hadoop001 conf]$ ls
configuration.xsl  zoo_sample.cfg
log4j.properties
[hadoop@hadoop001 conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@hadoop001 conf]$ ls
configuration.xsl  zoo.cfg
log4j.properties   zoo_sample.cfg
[hadoop@hadoop001 conf]$ vi zoo.cfg

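A sketch of the zoo.cfg settings (dataDir matches the directory created below, clientPort 2181 matches the quorum addresses used in the Hadoop configs; 2888/3888 are the conventional peer and election ports):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/app/zookeeper-3.4.6/data
clientPort=2181
server.1=hadoop001:2888:3888
server.2=hadoop002:2888:3888
server.3=hadoop003:2888:3888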
Copy zoo.cfg to the other two hosts:

[hadoop@hadoop001 conf]$ scp zoo.cfg hadoop002:/home/hadoop/app/zookeeper-3.4.6/conf
zoo.cfg       100% 1033     1.0KB/s   00:00    
[hadoop@hadoop001 conf]$ scp zoo.cfg hadoop003:/home/hadoop/app/zookeeper-3.4.6/conf
zoo.cfg       100% 1033     1.0KB/s   00:00
[hadoop@hadoop001 zookeeper-3.4.6]$ mkdir data
[hadoop@hadoop001 zookeeper-3.4.6]$ touch data/myid
[hadoop@hadoop001 zookeeper-3.4.6]$ echo 1 >data/myid
[hadoop@hadoop001 zookeeper-3.4.6]$ cat data/myid
1

hadoop002 and hadoop003 get their own myid values (create the data directory and myid file there the same way):

[hadoop@hadoop002 zookeeper-3.4.6]$ echo 2 >data/myid
[hadoop@hadoop003 zookeeper-3.4.6]$ echo 3 >data/myid

6) Configure Hadoop

Set JAVA_HOME in hadoop-env.sh:

[hadoop@hadoop001 hadoop]$ vi hadoop-env.sh
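The edited line, matching the JDK path configured above:

export JAVA_HOME=/usr/java/jdk1.8.0_45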

[hadoop@hadoop001 hadoop]$ scp hadoop-env.sh hadoop002:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
hadoop-env.sh 100% 4233     4.1KB/s   00:00    
[hadoop@hadoop001 hadoop]$ scp hadoop-env.sh hadoop003:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
hadoop-env.sh 100% 4233     4.1KB/s   00:00
Edit the slaves file.
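Per the cluster plan, all three hosts run a DataNode, so slaves should contain:

hadoop001
hadoop002
hadoop003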

Edit core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<!-- fs.defaultFS specifies the NameNode URI (YARN needs it as well) -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://alterpan</value>
	</property>
	<!--============================== Trash settings ======================================= -->
	<property>
		<!-- How often the checkpointer running on the NameNode creates a checkpoint from Current; default 0 means it follows fs.trash.interval -->
		<name>fs.trash.checkpoint.interval</name>
		<value>0</value>
	</property>
	<property>
		<!-- Minutes after which a checkpoint under .Trash is deleted; the server-side value takes precedence over the client's; default 0 means never delete -->
		<name>fs.trash.interval</name>
		<value>1440</value>
	</property>
	<!-- Hadoop's base temporary directory; many other paths derive from it. If hdfs-site.xml does not configure the namenode/datanode storage locations, they default to this path -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp</value>
	</property>
	<!-- ZooKeeper quorum -->
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
	</property>
	<!-- ZooKeeper session timeout, in milliseconds -->
	<property>
		<name>ha.zookeeper.session-timeout.ms</name>
		<value>2000</value>
	</property>
	<property>
		<name>hadoop.proxyuser.hadoop.hosts</name>
		<value>*</value>
	</property>
	<property>
		<name>hadoop.proxyuser.hadoop.groups</name>
		<value>*</value>
	</property>
	<property>
		<name>io.compression.codecs</name>
		<value>org.apache.hadoop.io.compress.GzipCodec,
			org.apache.hadoop.io.compress.DefaultCodec,
			org.apache.hadoop.io.compress.BZip2Codec,
			org.apache.hadoop.io.compress.SnappyCodec
		</value>
	</property>
</configuration>
Edit hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<!-- HDFS superuser group -->
	<property>
		<name>dfs.permissions.superusergroup</name>
		<value>hadoop</value>
	</property>

	<!-- Enable WebHDFS -->
	<property>
		<name>dfs.webhdfs.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/name</value>
		<description>Local directory where the NameNode stores the name table (fsimage); adjust for your environment</description>
	</property>
	<property>
		<name>dfs.namenode.edits.dir</name>
		<value>${dfs.namenode.name.dir}</value>
		<description>Local directory where the NameNode stores the transaction file (edits); adjust for your environment</description>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/data</value>
		<description>Local directory where the DataNode stores blocks; adjust for your environment</description>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	<!-- Block size 256 MB (default 128 MB) -->
	<property>
		<name>dfs.blocksize</name>
		<value>268435456</value>
	</property>
	<!--======================================================================= -->
	<!-- HDFS HA settings -->
	<!-- The HDFS nameservice is alterpan; it must match core-site.xml -->
	<property>
		<name>dfs.nameservices</name>
		<value>alterpan</value>
	</property>
	<property>
		<!-- NameNode IDs; this version supports at most two NameNodes -->
		<name>dfs.ha.namenodes.alterpan</name>
		<value>nn1,nn2</value>
	</property>

	<!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID] RPC addresses -->
	<property>
		<name>dfs.namenode.rpc-address.alterpan.nn1</name>
		<value>hadoop001:8020</value>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.alterpan.nn2</name>
		<value>hadoop002:8020</value>
	</property>

	<!-- HDFS HA: dfs.namenode.http-address.[nameservice ID] HTTP addresses -->
	<property>
		<name>dfs.namenode.http-address.alterpan.nn1</name>
		<value>hadoop001:50070</value>
	</property>
	<property>
		<name>dfs.namenode.http-address.alterpan.nn2</name>
		<value>hadoop002:50070</value>
	</property>

	<!--================== NameNode edit-log synchronization ============================================ -->
	<!-- Keeps the standby's edit log in sync so data can be recovered -->
	<property>
		<name>dfs.journalnode.http-address</name>
		<value>0.0.0.0:8480</value>
	</property>
	<property>
		<name>dfs.journalnode.rpc-address</name>
		<value>0.0.0.0:8485</value>
	</property>
	<property>
		<!-- JournalNode servers; the QuorumJournalManager stores the edit log on them -->
		<!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>; the port matches dfs.journalnode.rpc-address -->
		<name>dfs.namenode.shared.edits.dir</name>
		<value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/alterpan</value>
	</property>

	<property>
		<!-- Where each JournalNode stores its data -->
		<name>dfs.journalnode.edits.dir</name>
		<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/jn</value>
	</property>
	<!--================== Client failover ============================================ -->
	<property>
		<!-- How DataNodes and clients determine which NameNode is active -->
		<!-- Implementation class for automatic failover -->
		<name>dfs.client.failover.proxy.provider.alterpan</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property>
	<!--================== NameNode fencing =============================================== -->
	<!-- Prevents the stopped NameNode from coming back after failover and creating two active services -->
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
	</property>
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/home/hadoop/.ssh/id_rsa</value>
	</property>
	<property>
		<!-- Milliseconds after which fencing is considered failed -->
		<name>dfs.ha.fencing.ssh.connect-timeout</name>
		<value>30000</value>
	</property>

	<!--================== NameNode automatic failover via ZKFC and ZooKeeper ====================== -->
	<!-- Enable ZooKeeper-based automatic failover -->
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>
	<!-- Whitelist of DataNodes allowed to connect to the NameNode -->
	<property>
		<name>dfs.hosts</name>
		<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/slaves</value>
	</property>
</configuration>
Edit mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<!-- Run MapReduce applications on YARN -->
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<!-- JobHistory Server ============================================================== -->
	<!-- MapReduce JobHistory Server address; default port 10020 -->
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>hadoop001:10020</value>
	</property>
	<!-- MapReduce JobHistory Server web UI address; default port 19888 -->
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>hadoop001:19888</value>
	</property>

	<!-- Compress map output with Snappy -->
	<property>
		<name>mapreduce.map.output.compress</name>
		<value>true</value>
	</property>
	<property>
		<name>mapreduce.map.output.compress.codec</name>
		<value>org.apache.hadoop.io.compress.SnappyCodec</value>
	</property>
</configuration>
Edit yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<!-- NodeManager settings ================================================= -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
	<property>
		<name>yarn.nodemanager.localizer.address</name>
		<value>0.0.0.0:23344</value>
		<description>Address where the localizer IPC is.</description>
	</property>
	<property>
		<name>yarn.nodemanager.webapp.address</name>
		<value>0.0.0.0:23999</value>
		<description>NM Webapp address.</description>
	</property>

	<!-- HA settings =============================================================== -->
	<!-- ResourceManager configs -->
	<property>
		<name>yarn.resourcemanager.connect.retry-interval.ms</name>
		<value>2000</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>
	<!-- Use embedded automatic failover; works with the ZKRMStateStore to handle fencing -->
	<property>
		<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
		<value>true</value>
	</property>
	<!-- Cluster ID, so HA election targets the right cluster -->
	<property>
		<name>yarn.resourcemanager.cluster-id</name>
		<value>yarn-cluster</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.rm-ids</name>
		<value>rm1,rm2</value>
	</property>

	<!-- Optionally pin this host's RM id; set it per host if needed
	<property>
		<name>yarn.resourcemanager.ha.id</name>
		<value>rm2</value>
	</property>
	-->

	<property>
		<name>yarn.resourcemanager.scheduler.class</name>
		<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
	</property>
	<property>
		<name>yarn.resourcemanager.recovery.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
		<value>5000</value>
	</property>
	<!-- ZKRMStateStore settings -->
	<property>
		<name>yarn.resourcemanager.store.class</name>
		<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
	</property>
	<property>
		<name>yarn.resourcemanager.zk-address</name>
		<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
	</property>
	<property>
		<name>yarn.resourcemanager.zk.state-store.address</name>
		<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
	</property>
	<!-- Client RPC addresses (applications manager interface) -->
	<property>
		<name>yarn.resourcemanager.address.rm1</name>
		<value>hadoop001:23140</value>
	</property>
	<property>
		<name>yarn.resourcemanager.address.rm2</name>
		<value>hadoop002:23140</value>
	</property>
	<!-- ApplicationMaster RPC addresses (scheduler interface) -->
	<property>
		<name>yarn.resourcemanager.scheduler.address.rm1</name>
		<value>hadoop001:23130</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address.rm2</name>
		<value>hadoop002:23130</value>
	</property>
	<!-- RM admin interface -->
	<property>
		<name>yarn.resourcemanager.admin.address.rm1</name>
		<value>hadoop001:23141</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address.rm2</name>
		<value>hadoop002:23141</value>
	</property>
	<!-- NodeManager RPC addresses (resource tracker) -->
	<property>
		<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
		<value>hadoop001:23125</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
		<value>hadoop002:23125</value>
	</property>
	<!-- RM web application addresses -->
	<property>
		<name>yarn.resourcemanager.webapp.address.rm1</name>
		<value>hadoop001:8088</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm2</name>
		<value>hadoop002:8088</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.https.address.rm1</name>
		<value>hadoop001:23189</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.https.address.rm2</name>
		<value>hadoop002:23189</value>
	</property>

	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.log.server.url</name>
		<value>http://hadoop001:19888/jobhistory/logs</value>
	</property>

	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>2048</value>
	</property>
	<property>
		<name>yarn.scheduler.minimum-allocation-mb</name>
		<value>1024</value>
		<description>Minimum memory a single container can request; default 1024 MB</description>
	</property>
	<property>
		<name>yarn.scheduler.maximum-allocation-mb</name>
		<value>2048</value>
		<description>Maximum memory a single container can request; default 8192 MB</description>
	</property>
	<property>
		<name>yarn.nodemanager.resource.cpu-vcores</name>
		<value>2</value>
	</property>
</configuration>

7) Start the cluster

a. Start ZooKeeper on all three hosts and check that one node was elected leader.
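A quick way to verify (zkServer.sh status prints the role; exactly one of the three nodes should report Mode: leader, the other two Mode: follower):

[hadoop@hadoop001 bin]$ ./zkServer.sh start
[hadoop@hadoop001 bin]$ ./zkServer.sh status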

b. Start Hadoop (HDFS + YARN)
Before formatting, first start the JournalNodes on the JN hosts:

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ cd sbin
[hadoop@hadoop001 sbin]$ ./hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-journalnode-hadoop001.out
[hadoop@hadoop001 sbin]$ jps
1955 JournalNode
2006 Jps
1878 QuorumPeerMain

c. Format the NameNode on hadoop001

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hadoop namenode -format

d. Copy the metadata to hadoop002

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ scp -r data hadoop002:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/
in_use.lock           100%   14     0.0KB/s   00:00    
VERSION               100%  154     0.2KB/s   00:00    
fsimage_0000000000000 100%   62     0.1KB/s   00:00    
seen_txid             100%    2     0.0KB/s   00:00    
VERSION               100%  204     0.2KB/s   00:00    
fsimage_0000000000000 100%  338     0.3KB/s   00:00

e. Initialize the ZKFC znode in ZooKeeper

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hdfs zkfc -formatZK

f. Start HDFS and YARN

[hadoop@hadoop001 sbin]$ ./start-dfs.sh
[hadoop@hadoop001 sbin]$ ./start-yarn.sh

g. Start the standby ResourceManager on hadoop002

[hadoop@hadoop002 ~]$ yarn-daemon.sh start resourcemanager

8) Cluster startup and shutdown order

-------------------- Startup -----------------------------
a. Start ZooKeeper

[hadoop@hadoop001 bin]$ zkServer.sh start
[hadoop@hadoop002 bin]$ zkServer.sh start
[hadoop@hadoop003 bin]$ zkServer.sh start

b. Start Hadoop (HDFS + YARN)

[hadoop@hadoop001 sbin]$ start-dfs.sh
[hadoop@hadoop001 sbin]$ start-yarn.sh
[hadoop@hadoop002 sbin]$ yarn-daemon.sh start resourcemanager
[hadoop@hadoop001 ~]$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
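After everything is up, run jps on each host and compare against the cluster plan above; on hadoop001, for example, you should see NameNode, DFSZKFailoverController, JournalNode, DataNode, ResourceManager, NodeManager, JobHistoryServer, and QuorumPeerMain.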

-------------------- Shutdown -----------------------------
a. Stop HDFS + YARN

[hadoop@hadoop001 sbin]$ stop-yarn.sh
[hadoop@hadoop002 sbin]$ yarn-daemon.sh stop resourcemanager
[hadoop@hadoop001 sbin]$ stop-dfs.sh

b. Stop ZooKeeper

[hadoop@hadoop001 bin]$ zkServer.sh stop
[hadoop@hadoop002 bin]$ zkServer.sh stop
[hadoop@hadoop003 bin]$ zkServer.sh stop

9) Hadoop HA pitfalls

1. Java permission problems: fix by running chown -R root:root on the Java installation directory.

2. "Permission denied, please try again" while setting up SSH:
1) See https://help.aliyun.com/knowledge_detail/41487.html?spm=a2c4e.11153987.0.0.6bcc4fbb6frbyn
2) Check the SSH files; a public key that never reached the target host's authorized_keys causes the same error.

3. "Name or service not known" for hadoop001 when starting the cluster:
1) Check the logs.
2) Starting the daemons by hand works, which points at the scripts' input.
3) Check the slaves file (this was my problem: if it was edited on Windows and uploaded, convert the line endings first, otherwise vim shows [dos]), as sketched below.
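One way to fix the line endings from inside vim (a sketch; dos2unix also works if it is installed):

[hadoop@hadoop001 hadoop]$ vi slaves
:set fileformat=unix
:wq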

Source: https://blog.csdn.net/aubekpan/article/details/87213303