Hadoop crashed while running TeraSort?

2015-04-23T15:37:59

I am working with a single-node Hadoop setup and may later move on to a multi-node cluster. Right now the same node acts as both master and slave, so the NameNode, DataNode, ResourceManager, and NodeManager all run on the same PC.

Whenever I trigger TeraSort against a separate testing disk mounted at /home/hadoop/hdfs (here hadoop is the user name), the job fails.
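For reference, this is roughly how I invoke it (the examples jar path and the row count below are from my setup and may differ on yours; each TeraGen row is 100 bytes, so 10^9 rows is about 100 GB):

    # generate the input data (10^9 rows x 100 bytes = ~100 GB)
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        teragen 1000000000 /terasort-input

    # sort it
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        terasort /terasort-input /terasort-output

It fails with the following errors: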

INFO mapreduce.Job: Task Id : attempt_1429766544852_0001_m_001255_0, Status : FAILED
Error: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1429766544852_0001_m_001255_0_spill_1.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1467)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:769)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

15/04/23 11:36:07 INFO mapreduce.Job: Task Id : attempt_1429766544852_0001_m_001258_0, Status : FAILED
Error: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:236)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
        at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.mapred.IFile$Writer.close(IFile.java:163)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1633)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1510)

(The same "No space left on device" stack trace was then repeated verbatim for another task attempt.)

Error: java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1540)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$300(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1352)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1329)
        at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
        at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
        at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
        at org.apache.hadoop.io.Text.write(Text.java:323)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1127)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)

In short: spill failures, DiskChecker errors, and "no space left on device".
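A rough back-of-the-envelope, assuming the ~100 GB input from the commands above (my own estimates, not measured precisely):

    input in HDFS:          ~100 GB
    map spill files:        ~100 GB on local disk (more if maps spill multiple times)
    merge/shuffle copies:   up to another ~100 GB transiently
    sorted output in HDFS:  ~100 GB

So a single node realistically needs several times the input size free, split between the HDFS mount and the local scratch directories.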

When I looked into the issue, keeping df -h running in a separate terminal gave me the clue: Hadoop was using the / filesystem for some internal operations while the job was in progress. Once no space was left on /, the job failed.
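Concretely, I kept something like this running in a second terminal (the 5-second interval is arbitrary):

    watch -n 5 df -h

As far as I understand, / fills up because hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}, and the local directories that map tasks spill to derive from it unless overridden (for example, yarn.nodemanager.local-dirs defaults to ${hadoop.tmp.dir}/nm-local-dir), so all the spill files land on the root filesystem.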

I tried changing hadoop.tmp.dir to another mounted disk. That worked for a while, but the job failed again once that disk also ran out of space.
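That change was just the following in core-site.xml (the path is a placeholder for my other mounted disk):

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/mnt/otherdisk/hadoop-tmp</value>
    </property>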

My question is: why is this happening, and can the issue be avoided entirely? What exact parameters should be configured in the .xml config files to either keep the intermediate data within RAM, or let the job use whatever disk space is available without crashing with the errors shown above?

Thanks in advance.

PS: I have studied almost all the config parameters and gone through roughly every kind of trial and error, but the job still failed. Hence I thought of asking here; I hope you can help.
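For context, these are the kinds of settings I have been experimenting with (the values are examples from my trials, not recommendations, and the paths are placeholders for my mounted disks):

    <!-- yarn-site.xml: where NodeManager containers (and hence map spills) write -->
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/mnt/disk1/yarn-local,/mnt/disk2/yarn-local</value>
    </property>

    <!-- mapred-site.xml -->
    <property>
      <name>mapreduce.task.io.sort.mb</name>          <!-- in-memory sort buffer per map task, MB -->
      <value>256</value>
    </property>
    <property>
      <name>mapreduce.map.sort.spill.percent</name>   <!-- buffer fill level that triggers a spill -->
      <value>0.90</value>
    </property>
    <property>
      <name>mapreduce.map.output.compress</name>      <!-- compress spills/map output to save local disk -->
      <value>true</value>
    </property>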

Copyright License:
Author: "Omkant", reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link: https://stackoverflow.com/questions/29816754/hadoop-crashed-while-running-terasort
