
Fixing a NameNode (NN) That Cannot Connect to the JournalNodes (JN)


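After automatic failover has been configured, starting the HDFS cluster with the start-dfs.sh cluster startup script can leave you with NameNode processes that come up for a short while and then exit on their own. The NameNode log shows the following errors: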

2020-08-17 10:11:40,658 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop104/ Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:40,659 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop102/ Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:40,659 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop103/ Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:41,660 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop104/ Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:41,660 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop102/ Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:41,665 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop103/ Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:42,661 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop104/ Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:42,661 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop102/ Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:42,667 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop103/ Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:43,662 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop104/ Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:43,662 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop102/ Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:43,668 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop103/ Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:44,663 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop104/ Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:44,663 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop102/ Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:44,670 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop103/ Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:45,467 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for selectStreamingInputStreams. No responses yet.
2020-08-17 10:11:45,664 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop102/ Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:45,664 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop104/ Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:45,672 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop103/ Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:46,469 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7003 ms (timeout=20000 ms) for a response for selectStreamingInputStreams. No responses yet.
2020-08-17 10:11:46,665 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop102/ Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:46,665 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop104/ Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:46,673 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop103/ Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:47,470 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8004 ms (timeout=20000 ms) for a response for selectStreamingInputStreams. No responses yet.
2020-08-17 10:11:47,666 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop102/ Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:47,667 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop104/ Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:47,674 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop103/ Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:48,471 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9005 ms (timeout=20000 ms) for a response for selectStreamingInputStreams. No responses yet.
2020-08-17 10:11:48,668 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop102/ Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:48,668 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop104/ Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:48,675 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop103/ Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:49,669 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop102/ Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:49,673 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop104/ Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:49,676 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop103/ Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:49,678 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [,,]. Skipping.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
Call From hadoop102/ to hadoop103:8485 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Call From hadoop102/ to hadoop102:8485 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Call From hadoop102/ to hadoop104:8485 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

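The log shows that the NameNode failed because it could not connect to the JournalNodes (拒绝连接 is the localized "Connection refused" message), yet jps shows that all three JNs are up and running. So why can the NN still not reach the JNs? The cause is the default startup order of the start-dfs.sh script: it starts the NNs first, then the DNs, and only then the JNs. The default RPC connection settings allow 10 retries at an interval of 1 s each, so if the JNs are still not up roughly 10 s after the NN has started, the NN fails with the error above.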

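core-default.xml defines the two relevant parameters as follows:

<!-- Number of times the NN retries connecting to a JN; the default is 10 -->
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>10</value>
</property>
<!-- Interval between retries; the default is 1 s (1000 ms) -->
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>1000</value>
</property>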
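
To make the arithmetic behind the 10 s window concrete, here is a minimal shell sketch. The helper name retry_window_seconds is made up for illustration. It assumes the window is simply retries multiplied by the interval, which is only an approximation of what the RPC client does.

```shell
# retry_window_seconds RETRIES INTERVAL_MS
# Hypothetical helper: approximate time (in whole seconds) the NN keeps
# retrying before it gives up, assuming window = retries * interval.
retry_window_seconds() {
  echo $(( $1 * $2 / 1000 ))
}

# With the defaults from core-default.xml (10 retries, 1000 ms):
retry_window_seconds 10 1000   # prints 10
```

On a live cluster the two inputs could be read with hdfs getconf -confKey ipc.client.connect.max.retries and hdfs getconf -confKey ipc.client.connect.retry.interval, which print the values from the configuration files on that node.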

Solution: when you hit this problem, wait a moment until the JNs have started successfully, then start the three NNs manually:

[root@hadoop102 ~]$ hdfs --daemon start namenode
[root@hadoop103 ~]$ hdfs --daemon start namenode
[root@hadoop104 ~]$ hdfs --daemon start namenode
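
Rather than guessing how long "a moment" is, the wait can be scripted. The following is a rough sketch, not part of the original procedure. The function names port_open and wait_for_journalnodes are hypothetical. It assumes that bash and the coreutils timeout command are available, and that the JNs listen on port 8485 as shown in the log above.

```shell
# port_open HOST PORT
# Returns 0 if a TCP connection to HOST:PORT succeeds within 1 second.
# Uses bash's /dev/tcp pseudo-device, so bash must be installed.
port_open() {
  timeout 1 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# wait_for_journalnodes TIMEOUT_SECONDS PORT HOST...
# Polls each host once per second until its port accepts connections.
# Returns 1 if a host is still unreachable once the deadline has passed.
wait_for_journalnodes() {
  deadline=$(( $(date +%s) + $1 ))
  port=$2
  shift 2
  for host in "$@"; do
    until port_open "$host" "$port"; do
      if [ "$(date +%s)" -ge "$deadline" ]; then
        echo "JournalNode $host:$port not reachable" >&2
        return 1
      fi
      sleep 1
    done
  done
  return 0
}
```

On each NN host this could then be combined with the manual start, for example: wait_for_journalnodes 60 8485 hadoop102 hadoop103 hadoop104 && hdfs --daemon start namenode. The 60 s timeout is an arbitrary choice.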

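Alternatively, raise the two parameters above to suitably larger values in core-site.xml:

<!-- ipc.client.connect.max.retries: number of times the NN retries connecting to a JN; the default is 10 -->
<!-- ipc.client.connect.retry.interval: interval between retries; the default is 1 s -->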
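
As a sketch of what such an override might look like inside the <configuration> element of core-site.xml: the values 20 and 5000 below are illustrative assumptions, not values given in the original post, and should be tuned to how long the JNs actually take to start on your cluster.

```xml
<!-- Illustrative values only (assumption): 20 retries x 5000 ms is roughly a 100 s window -->
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>20</value>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>5000</value>
</property>
```

The NN only picks up the new values on its next start, and the same core-site.xml would normally need to be present on every NN host.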

Source: https://blog.csdn.net/weixin_45417821/article/details/121273115