Hadoop Job throws java.io.IOException: Attempted read from closed stream

2013-01-08T04:35:53

I'm running a simple map-reduce job. This job uses 250 files from the Common Crawl data.

e.g. s3://aws-publicdatasets/common-crawl/parse-output/segment/1341690169105/

If I use 50 or 100 files, everything works OK, but with 250 files I get this error:

java.io.IOException: Attempted read from closed stream.
    at org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:159)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.commons.httpclient.AutoCloseInputStream.read(AutoCloseInputStream.java:107)
    at org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:76)
    at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:136)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:111)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readByte(DataInputStream.java:248)
    at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299)
    at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320)
    at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1707)
    at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1773)
    at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1849)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:74)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:180)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:268)
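For reference, the driver is set up roughly like this (the class names, key/value types, and thread count below are placeholders; the MultithreadedMapper + SequenceFileInputFormat combination and the s3n input path are what the stack trace shows):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class CrawlSegmentJob {

        // Placeholder mapper; the real one parses each crawl record.
        // Key/value types assumed to be Text/Text for the parse-output SequenceFiles.
        public static class SegmentMapper extends Mapper<Text, Text, Text, Text> {
            @Override
            protected void map(Text key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(key, value);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "common-crawl segment parse");
            job.setJarByClass(CrawlSegmentJob.class);

            // Input: SequenceFiles from the public Common Crawl segment on S3.
            job.setInputFormatClass(SequenceFileInputFormat.class);
            FileInputFormat.addInputPath(job,
                new Path("s3n://aws-publicdatasets/common-crawl/parse-output/segment/1341690169105/"));

            // The mapper runs wrapped in MultithreadedMapper, as in the stack trace.
            job.setMapperClass(MultithreadedMapper.class);
            MultithreadedMapper.setMapperClass(job, SegmentMapper.class);
            MultithreadedMapper.setNumberOfThreads(job, 8); // thread count is a guess

            job.setNumReduceTasks(0); // map-only here, for simplicity
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileOutputFormat.setOutputPath(job, new Path(args[0]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }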

Any clues?

Copyright License:
Author: psabbate. Reproduced under the CC BY-SA 4.0 license, with a link to the original source and disclaimer.
Link: https://stackoverflow.com/questions/14203621/hadoop-job-throws-java-io-ioexception-attempted-read-from-closed-stream
