How to fix slow Spark applications after an HA failover on an HDFS cluster with multiple NameNodes

After a failover, the Spark client has to probe the NameNodes one by one until it reaches the Active NameNode before it can proceed.
Checking hdfs-site.xml shows that the FailoverProxyProvider implementation configured there is ConfiguredFailoverProxyProvider. Its source code works as follows: it reads all of the dfs.namenode.rpc-address keys from hdfs-site.xml and tries the listed NameNodes one after another until it finds the Active NameNode.
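For reference, a minimal sketch of such an HA client configuration, assuming a hypothetical nameservice named mycluster with NameNode IDs nn1 through nn5 and made-up hostnames (only the first two RPC address entries are shown):

    <!-- logical nameservice and its NameNodes -->
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2,nn3,nn4,nn5</value>
    </property>
    <!-- one RPC address per NameNode; nn3 through nn5 follow the same pattern -->
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>namenode1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>namenode2.example.com:8020</value>
    </property>
    <!-- the client-side failover proxy provider discussed above -->
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>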

The FailoverProxyProvider interface has another, improved implementation: org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider. It sends requests to all NameNodes concurrently; as soon as one of them returns a successful response, that NameNode is treated as Active and all other outstanding requests are cancelled immediately.
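Switching over is a client-side configuration change; a sketch of the relevant hdfs-site.xml entry, again using the hypothetical mycluster nameservice from above:

    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider</value>
    </property>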
I ran a test (sketched after the results below):
Fail over to the 5th NameNode, then run hdfs dfs -ls /
ConfiguredFailoverProxyProvider: 10 seconds
RequestHedgingProxyProvider: 1 second
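A sketch of how the test above could be reproduced, assuming the hypothetical NameNode IDs nn1 through nn5 from earlier and that nn1 is currently Active:

    # fail over so that the 5th NameNode becomes Active
    hdfs haadmin -failover nn1 nn5
    # time a simple listing from the client to measure the Active NameNode lookup cost
    time hdfs dfs -ls /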

As the numbers show, the RequestHedgingProxyProvider lookup strategy is clearly faster than the original one.
