SequenceFile is not created in Hadoop

2016-01-08T21:16:32

I am writing a MapReduce job to test some calculations. I split my input across the map tasks so that each map does part of the calculation; the result is a list of (x, y) pairs which I want to flush into a SequenceFile.

The map part goes well but when the Reducer kicks in I get this error: Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://172.16.199.132:9000/user/hduser/FractalJob_1452257628594_410365359/out/reduce-out.

Another observation: this error appears only when I use more than one map.

UPDATE: Here is my Mapper and Reducer code.

public static class RasterMapper extends Mapper<IntWritable, IntWritable, IntWritable, IntWritable> {
        private int imageS;
        private static Complex mapConstant;


        @Override
        public void setup(Context context) throws IOException {
            imageS = context.getConfiguration().getInt("image.size", -1);

            mapConstant = new Complex(context.getConfiguration().getDouble("constant.re", -1),
                    context.getConfiguration().getDouble("constant.im", -1));

        }

        @Override
        public void map(IntWritable begin, IntWritable end, Context context) throws IOException, InterruptedException {
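            // Each input record holds a [begin, end) range of x columns; every (x, y)
            // pixel in that range is mapped to a point in the complex plane and colored
            // by its escape time.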


            for (int x = (int) begin.get(); x < end.get(); x++) {
                for (int y = 0; y < imageS; y++) {

                    float hue = 0, brightness = 0;
                    int icolor = 0;
                    Complex z = new Complex(2.0 * (x - imageS / 2) / (imageS / 2),
                            1.33 * (y - imageS / 2) / (imageS / 2));

                    icolor = startCompute(generateZ(z), 0);

                    if (icolor != -1) {
                        brightness = 1f;
                    }


                    hue = (icolor % 256) / 255.0f;

                    Color color = Color.getHSBColor(hue, 1f, brightness);
                    try {
                        context.write(new IntWritable(x + y * imageS), new IntWritable(color.getRGB()));
                    } catch (Exception e) {
                        e.printStackTrace();

                    }

                }
            }

        }


        private static Complex generateZ(Complex z) {
            return (z.times(z)).plus(mapConstant);
        }

        private static int startCompute(Complex z, int color) {
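            // Escape-time iteration: keep applying z -> z^2 + c (c = mapConstant) until
            // |z| > 4, returning the step count as the color index, or -1 after 255 steps.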

            if (z.abs() > 4) {
                return color;
            } else if (color >= 255) {
                return -1;
            } else {
                color = color + 1;
                return startCompute(generateZ(z), color);
            }
        }

    }

    public static class ImageReducer extends Reducer<IntWritable, IntWritable, WritableComparable<?>, Writable> {
        private SequenceFile.Writer writer;

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            writer.close();
        }
        @Override
        public void setup(Context context) throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
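            // Open a SequenceFile writer by hand, pointing at "pixels-out" inside the
            // job's configured output directory.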
            Path outDir = new Path(conf.get(FileOutputFormat.OUTDIR));
            Path outFile = new Path(outDir, "pixels-out");

            Option optPath = SequenceFile.Writer.file(outFile);
            Option optKey = SequenceFile.Writer.keyClass(IntWritable.class);
            Option optVal = SequenceFile.Writer.valueClass(IntWritable.class);
            Option optCom = SequenceFile.Writer.compression(CompressionType.NONE);
            try {
                writer = SequenceFile.createWriter(conf, optCom, optKey, optPath, optVal);
            } catch (Exception e) {
                e.printStackTrace();
            }

        }
        @Override
        public void reduce(IntWritable key, Iterable<IntWritable> value, Context context) throws IOException, InterruptedException {
            try {
                writer.append(key, value.iterator().next());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
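
For reference, since SequenceFileOutputFormat is already set on the job (see the configuration further down), the usual way to end up with a SequenceFile would be to emit the pairs through the context and let the framework write out/part-r-00000. A minimal sketch of that variant, not the code I am actually running:

public static class ImageReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Each pixel index appears once, so just forward its single color value;
        // the configured SequenceFileOutputFormat turns these records into a SequenceFile.
        context.write(key, values.iterator().next());
    }
}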

I hope you guys can help me out. Thank you!

EDIT:

Job failed as tasks failed. failedMaps:1 failedReduces:0

Looking more closely at the logs, I think the issue comes from the way I feed my data to the maps. I split the image size range into several sequence files so that each map can read its range from there and compute the colors for the pixels in that area.

This is how I create the files:

try {
    int offset = 0;

    // generate an input file for each map task
    for (int i = 0; i < mapNr; ++i) {

        final Path file = new Path(input, "part" + i);

        final IntWritable begin = new IntWritable(offset);
        final IntWritable end = new IntWritable(offset + imgSize / mapNr);
        offset = (int) end.get();

        Option optPath = SequenceFile.Writer.file(file);
        Option optKey = SequenceFile.Writer.keyClass(IntWritable.class);
        Option optVal = SequenceFile.Writer.valueClass(IntWritable.class);
        Option optCom = SequenceFile.Writer.compression(CompressionType.NONE);
        SequenceFile.Writer writer = SequenceFile.createWriter(conf, optCom, optKey, optPath, optVal);
        try {
            writer.append(begin, end);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            writer.close();
        }
        System.out.println("Wrote input for Map #" + i);
    }
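
To double-check what each map actually receives, the part files can be read back with a SequenceFile.Reader. A small debugging sketch, assuming the same conf, input path and mapNr as above:

// Dump the (begin, end) record stored in each generated input file.
for (int i = 0; i < mapNr; ++i) {
    Path file = new Path(input, "part" + i);
    try (SequenceFile.Reader reader =
            new SequenceFile.Reader(conf, SequenceFile.Reader.file(file))) {
        IntWritable begin = new IntWritable();
        IntWritable end = new IntWritable();
        while (reader.next(begin, end)) {
            System.out.println(file + " -> [" + begin.get() + ", " + end.get() + ")");
        }
    }
}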

Log file:

 16/01/10 19:06:04 INFO client.RMProxy: Connecting to ResourceManager at /172.16.199.132:8032
16/01/10 19:06:07 INFO input.FileInputFormat: Total input paths to process : 4
16/01/10 19:06:07 INFO mapreduce.JobSubmitter: number of splits:4
16/01/10 19:06:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1452444283951_0007
16/01/10 19:06:08 INFO impl.YarnClientImpl: Submitted application application_1452444283951_0007
16/01/10 19:06:08 INFO mapreduce.Job: The url to track the job: http://172.16.199.132:8088/proxy/application_1452444283951_0007/
16/01/10 19:06:08 INFO mapreduce.Job: Running job: job_1452444283951_0007
16/01/10 19:06:19 INFO mapreduce.Job: Job job_1452444283951_0007 running in uber mode : false
16/01/10 19:06:20 INFO mapreduce.Job:  map 0% reduce 0%
16/01/10 19:06:49 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000002_0, Status : FAILED
16/01/10 19:06:49 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000001_0, Status : FAILED
16/01/10 19:06:49 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000000_0, Status : FAILED
16/01/10 19:06:49 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000003_0, Status : FAILED
16/01/10 19:07:07 INFO mapreduce.Job:  map 25% reduce 0%
16/01/10 19:07:08 INFO mapreduce.Job:  map 50% reduce 0%
16/01/10 19:07:10 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000001_1, Status : FAILED
16/01/10 19:07:11 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000003_1, Status : FAILED
16/01/10 19:07:25 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_r_000000_0, Status : FAILED
16/01/10 19:07:32 INFO mapreduce.Job:  map 100% reduce 0%
16/01/10 19:07:32 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000003_2, Status : FAILED
16/01/10 19:07:32 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_m_000001_2, Status : FAILED
16/01/10 19:07:33 INFO mapreduce.Job:  map 50% reduce 0%
16/01/10 19:07:43 INFO mapreduce.Job:  map 75% reduce 0%
16/01/10 19:07:44 INFO mapreduce.Job: Task Id : attempt_1452444283951_0007_r_000000_1, Status : FAILED
16/01/10 19:07:50 INFO mapreduce.Job:  map 100% reduce 100%
16/01/10 19:07:51 INFO mapreduce.Job: Job job_1452444283951_0007 failed with state FAILED due to: Task failed task_1452444283951_0007_m_000003
Job failed as tasks failed. failedMaps:1 failedReduces:0

16/01/10 19:07:51 INFO mapreduce.Job: Counters: 40
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=3048165
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=765
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=12
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Failed map tasks=9
        Failed reduce tasks=2
        Killed reduce tasks=1
        Launched map tasks=12
        Launched reduce tasks=3
        Other local map tasks=8
        Data-local map tasks=4
        Total time spent by all maps in occupied slots (ms)=239938
        Total time spent by all reduces in occupied slots (ms)=34189
        Total time spent by all map tasks (ms)=239938
        Total time spent by all reduce tasks (ms)=34189
        Total vcore-seconds taken by all map tasks=239938
        Total vcore-seconds taken by all reduce tasks=34189
        Total megabyte-seconds taken by all map tasks=245696512
        Total megabyte-seconds taken by all reduce tasks=35009536
    Map-Reduce Framework
        Map input records=3
        Map output records=270000
        Map output bytes=2160000
        Map output materialized bytes=2700018
        Input split bytes=441
        Combine input records=0
        Spilled Records=270000
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=538
        CPU time spent (ms)=5520
        Physical memory (bytes) snapshot=643928064
        Virtual memory (bytes) snapshot=2537975808
        Total committed heap usage (bytes)=408760320
    File Input Format Counters 
        Bytes Read=324
Constructing image...
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://172.16.199.132:9000/user/hduser/FractalJob_1452445557585_342741171/out/pixels-out
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1752)
    at FractalJob.generateFractal(FractalJob.j..
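
From the stack trace, generateFractal opens the reducer's output with a SequenceFile.Reader right after the job returns, roughly along these lines (a reconstruction from the trace, not the exact code):

// The reader below is what throws FileNotFoundException when the reducer
// never managed to create pixels-out.
Path outFile = new Path(output, "pixels-out");
try (SequenceFile.Reader reader =
        new SequenceFile.Reader(conf, SequenceFile.Reader.file(outFile))) {
    IntWritable key = new IntWritable();
    IntWritable value = new IntWritable();
    while (reader.next(key, value)) {
        // key = x + y * imageSize, value = packed RGB color for that pixel
    }
}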

This is the configuration:

conf.setInt("image.size", imgSize);
    conf.setDouble("constant.re", FractalJob.constant.re());
    conf.setDouble("constant.im", FractalJob.constant.im());

    Job job = Job.getInstance(conf);
    job.setJobName(FractalJob.class.getSimpleName());

    job.setJarByClass(FractalJob.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);

    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(IntWritable.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);

    job.setMapperClass(RasterMapper.class);

    job.setReducerClass(ImageReducer.class);
    job.setNumReduceTasks(1);

    job.setSpeculativeExecution(false);

    final Path input = new Path(filePath, "in");
    final Path output = new Path(filePath, "out");

    FileInputFormat.setInputPaths(job, input);
    FileOutputFormat.setOutputPath(job, output);

