Configuring Hadoop with Blob storage on Azure

2016-01-14T17:56:43

We are testing a Hadoop HA cluster on the Azure v2 cloud, running on Linux, and we are trying to switch to Azure Blob storage. We are not sure how the NameNodes should be configured when using Blob storage. We are getting the following error:

2015-12-22 13:05:50,193 INFO  ha.StandbyCheckpointer (StandbyCheckpointer.java:start(129)) - Starting standby checkpoint thread...
Checkpointing active NN at http://bd-azure-qa-nn2:50070
Serving checkpoints at http://bd-azure-qa-nn1:50070
2015-12-22 13:07:50,240 INFO  ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(269)) - Triggering log roll on remote NameNode bd-azure-qa-nn2/10.0.0.7:8020
2015-12-22 13:07:51,387 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:52,391 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:53,400 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:54,416 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:55,425 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:56,450 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:57,456 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:58,462 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:07:59,473 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:00,478 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:01,482 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 10 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:02,490 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 11 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:03,501 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 12 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:04,515 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 13 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:05,520 INFO  ipc.Client (Client.java:handleConnectionFailure(858)) - Retrying connect to server: bd-azure-qa-nn2/10.0.0.7:8020. Already tried 14 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-12-22 13:08:05,966 WARN  ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(274)) - Unable to trigger a roll of the active NN
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1719)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1352)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6339)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:933)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139)
    at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:11214)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

    at org.apache.hadoop.ipc.Client.call(Client.java:1468)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy14.rollEditLog(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:145)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:271)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:313)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)

We are not quite sure how to configure the NameNodes with Azure Blob storage, since Blob storage effectively takes over the HDFS functionality. We have a standard HA NameNode configuration in hdfs-site.xml and have modified core-site.xml with the following properties to enable Blob storage:

    <property>
      <name>fs.AbstractFileSystem.wasb.impl</name>
      <value>org.apache.hadoop.fs.azure.Wasb</value>
    </property>

    <property>
      <name>fs.azure.account.key.OUR_STORAGE_ACCOUNT.blob.core.windows.net</name>
      <!-- raw storage account key; surrounding quotes would become part of the value -->
      <value>OUR_KEY</value>
    </property>

    <property>
      <name>fs.defaultFS</name>
      <value>wasb://blob-hdfs@OUR_STORAGE_ACCOUNT.blob.core.windows.net</value>
      <final>true</final>
    </property>

    <property>
      <name>fs.azure.page.blob.dir</name>
      <value>/datadir</value>
    </property>

    <property>
      <name>fs.azure.selfthrottling.read.factor</name>
      <value>1.000000</value>
    </property>

    <property>
      <name>fs.azure.selfthrottling.write.factor</name>
      <value>1.000000</value>
    </property>
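A quick way to catch mistakes such as an unclosed <property> element or an account key wrapped in quotes, before restarting the NameNodes, is to parse core-site.xml and dump its properties. The following is a minimal sketch using only Python's standard library. The helpers load_hadoop_properties and check_wasb_config are hypothetical, not part of Hadoop. The quote check also rests on an assumption of ours: that Hadoop uses the key value verbatim, quotes included.

```python
import xml.etree.ElementTree as ET


def load_hadoop_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict.

    Raises xml.etree.ElementTree.ParseError if the XML is malformed,
    e.g. when a <property> element is opened but never closed.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        props[name] = value
    return props


def check_wasb_config(props):
    """Return a list of human-readable problems found in the WASB settings."""
    problems = []
    default_fs = props.get("fs.defaultFS", "")
    if not default_fs.startswith(("wasb://", "wasbs://")):
        problems.append("fs.defaultFS does not use the wasb:// or wasbs:// scheme")
    for name, value in props.items():
        # Assumption: the key is used verbatim, so quotes would corrupt it.
        if name.startswith("fs.azure.account.key.") and value[:1] in ('"', "'"):
            problems.append(name + " value is wrapped in quotes")
    return problems


# Hypothetical sample mirroring the properties from the question.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>wasb://blob-hdfs@OUR_STORAGE_ACCOUNT.blob.core.windows.net</value>
  </property>
  <property>
    <name>fs.azure.account.key.OUR_STORAGE_ACCOUNT.blob.core.windows.net</name>
    <value>"OUR_KEY"</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for problem in check_wasb_config(load_hadoop_properties(SAMPLE)):
        print(problem)
```

To check the real file, pass its contents to load_hadoop_properties; a malformed file raises xml.etree.ElementTree.ParseError instead of returning properties.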

The HA name service in the original cluster was configured as:

<!--    <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenodeha</value>
  </property> -->
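The two fs.defaultFS values above differ in kind. hdfs://namenodeha names a logical HA nameservice that clients resolve through the NameNode entries in hdfs-site.xml. A WASB URI, by contrast, carries the container and storage account itself, in the form wasb://<container>@<account>.blob.core.windows.net. The sketch below uses Python's standard library only, and describe_default_fs is a hypothetical helper. It splits both URIs and derives the account-key property name that core-site.xml must then define.

```python
from urllib.parse import urlparse


def describe_default_fs(uri):
    """Split an fs.defaultFS URI into its Hadoop-relevant parts (illustrative only)."""
    parsed = urlparse(uri)
    if parsed.scheme in ("wasb", "wasbs"):
        # wasb://<container>@<account>.blob.core.windows.net[/path]
        container, _, host = parsed.netloc.partition("@")
        account = host.split(".", 1)[0]
        return {
            "scheme": parsed.scheme,
            "container": container,
            "account": account,
            # Property that holds the storage account key in core-site.xml.
            "key_property": "fs.azure.account.key." + host,
        }
    if parsed.scheme == "hdfs":
        # hdfs://<nameservice>: a logical name resolved via the HA settings in hdfs-site.xml.
        return {"scheme": "hdfs", "nameservice": parsed.netloc}
    raise ValueError("unsupported scheme: " + parsed.scheme)


if __name__ == "__main__":
    print(describe_default_fs("wasb://blob-hdfs@OUR_STORAGE_ACCOUNT.blob.core.windows.net"))
    print(describe_default_fs("hdfs://namenodeha"))
```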

We did not touch hdfs-site.xml at all.

We are not sure about the NameNode settings. The two NameNodes from the original setup are probably overkill, since the underlying Blob storage should handle replication and similar concerns.

Could someone please clarify?

Copyright License:
Author:「jaksky」,Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link to:https://stackoverflow.com/questions/34786454/configuring-hadoop-with-blob-storage-on-azure
