Hadoop tasktracker node fails to reconnect due to slow startup of jobtracker

2013-11-06T04:37:21

I'm running a Hadoop cluster (version 0.20.205), and I periodically have to deploy new code to it, which involves taking the cluster down and bringing it back up with the new code. My problem is that, for reasons too complicated to go into here, I can't ensure that the jobtracker comes up before the tasktracker nodes. The tasktracker nodes try to connect to a jobtracker that hasn't come up yet, then shut down after printing this to the logs:

- Can not start task tracker because java.io.IOException: Call to <jobtracker node> failed on local exception: java.io.IOException: Connection reset by peer
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
at org.apache.hadoop.ipc.Client.call(Client.java:1071)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at org.apache.hadoop.mapred.$Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:429)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:331)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:296)
at org.apache.hadoop.mapred.TaskTracker$3.run(TaskTracker.java:794)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:790)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1428)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3674)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:342)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:800)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:745)

- SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at <tasktracker node>
************************************************************/

My question is this: is there some way I can configure the tasktracker nodes to try to reconnect in a loop until they've successfully connected to the jobtracker?
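One possible workaround, short of a Hadoop setting, is to gate the tasktracker start on the jobtracker being reachable. The Java sketch below is a hypothetical helper, not part of Hadoop: the class name `WaitForJobTracker`, its methods, the backoff values and the default ten-minute deadline are all assumptions. The host and port would be whatever `mapred.job.tracker` is set to in mapred-site.xml. One caveat: the stack trace above shows the reset arriving while reading the RPC response, i.e. after the TCP connection was established, so an open port may not mean the jobtracker is ready to serve RPCs. This check narrows the race rather than eliminating it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical pre-start helper (not a Hadoop feature): blocks until the
 * jobtracker's RPC port accepts TCP connections, so a startup script can
 * launch the tasktracker afterwards. Host and port come from the caller.
 */
final class WaitForJobTracker {

    private WaitForJobTracker() {}

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out or unresolvable: treat as "not up yet".
            return false;
        }
    }

    /**
     * Calls check repeatedly, doubling the sleep between attempts (capped at
     * maxDelayMs), until it returns true or deadlineMs has elapsed.
     */
    static boolean waitUntil(BooleanSupplier check, long initialDelayMs,
                             long maxDelayMs, long deadlineMs) {
        long start = System.nanoTime();
        long delay = Math.max(1, initialDelayMs);
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMs >= deadlineMs) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(delay, deadlineMs - elapsedMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
            delay = Math.min(delay * 2, maxDelayMs);
        }
    }

    /** Usage: java WaitForJobTracker <host> <port> [deadlineSeconds] */
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: WaitForJobTracker <host> <port> [deadlineSeconds]");
            System.exit(2);
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        long deadlineMs = (args.length > 2 ? Long.parseLong(args[2]) : 600) * 1000;
        // Poll with 2 s connect timeouts, backing off from 1 s up to 30 s.
        boolean up = waitUntil(() -> isReachable(host, port, 2000), 1000, 30_000, deadlineMs);
        System.exit(up ? 0 : 1);
    }
}
```

A deploy script could then run something like `java WaitForJobTracker jobtracker.example.com 9001 && bin/hadoop-daemon.sh start tasktracker`, where the host and port are placeholders. Because of the caveat above, the script may still need to check that the tasktracker stayed up and start it again if it didn't.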

Thanks for the help!

Copyright License:
Author:「Seth」,Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link to:https://stackoverflow.com/questions/19798465/hadoop-tasktracker-node-fails-to-reconnect-due-to-slow-startup-of-jobtracker