How do I find the value in a Hadoop SequenceFile?

2020-07-14T08:00:40

I have written some binary image data to a Hadoop SequenceFile and would like to write it out as a PNG outside of Hadoop, if possible, using Java.

[Edited] Overview of the data flow: Input files → Generate BufferedImages from input → Convert BufferedImages into binary arrays → Store as SequenceFile in HDFS → Trying to take the SequenceFile outside of HDFS and convert it into PNG.

However, I am not sure how to locate where the data starts inside the SequenceFile. From what I have read in the SequenceFile documentation, I can use the sync marker to locate the end of the SequenceFile header, and then use the record length and key length fields to find the beginning of the value.
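To make the header layout concrete, here is a minimal stdlib-only sketch. It is based on my reading of how Hadoop's SequenceFile.Writer writes a version-6 header, which is an assumption worth checking against your Hadoop version's source. The header holds the magic bytes `SEQ` plus a version byte, the key and value class names as length-prefixed strings, two compression flags, an optional codec class name, the metadata (a 4-byte entry count followed by Text key/value pairs), and finally the 16-byte sync marker. The marker is generated per file (an MD5 digest of a unique ID and the creation time), so it cannot be calculated in advance; it is simply read from the end of the header. `SeqHeader`, `Header`, `readHeader` and `demoHeader` are hypothetical names, and `demoHeader` only builds synthetic test bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Sketch: walks a version-6 SequenceFile header and finds the 16-byte sync marker. */
public class SeqHeader {
    static final class Header {
        String keyClass, valueClass;
        boolean compressed, blockCompressed;
        byte[] sync = new byte[16];
        int firstRecordOffset;                 // byte offset where the first record starts
    }

    // Decodes Hadoop's variable-length int/long (the WritableUtils.writeVLong format).
    static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) return first;       // values -112..127 are stored in this single byte
        boolean negative = first < -120;
        int extra = negative ? -(first + 120) : -(first + 112); // number of bytes that follow
        long v = 0;
        for (int i = 0; i < extra; i++) v = (v << 8) | (in.readByte() & 0xFF);
        return negative ? ~v : v;
    }

    // Text.writeString layout: vint byte count, then UTF-8 bytes.
    static String readString(DataInputStream in) throws IOException {
        byte[] buf = new byte[(int) readVLong(in)];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    /** fileStart must hold at least the whole header (the first few KB of the file). */
    static Header readHeader(byte[] fileStart) throws IOException {
        ByteArrayInputStream bytes = new ByteArrayInputStream(fileStart);
        DataInputStream in = new DataInputStream(bytes);
        byte[] magic = new byte[4];
        in.readFully(magic);                   // 'S', 'E', 'Q', version
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q' || magic[3] != 6)
            throw new IOException("not a version-6 SequenceFile");
        Header h = new Header();
        h.keyClass = readString(in);
        h.valueClass = readString(in);
        h.compressed = in.readBoolean();
        h.blockCompressed = in.readBoolean();
        if (h.compressed) readString(in);      // compression codec class name
        int entries = in.readInt();            // metadata: 4-byte count, then Text key/value pairs
        for (int i = 0; i < 2 * entries; i++) readString(in);
        in.readFully(h.sync);                  // the header ends with the sync marker
        h.firstRecordOffset = fileStart.length - bytes.available();
        return h;
    }

    // Hypothetical test data laid out like the Writer's header (short names, so one-byte vints).
    static byte[] demoHeader(byte[] sync) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.write(new byte[] {'S', 'E', 'Q', 6});
        for (String name : new String[] {"org.apache.hadoop.io.Text", "org.apache.hadoop.io.BytesWritable"}) {
            out.writeByte(name.length());
            out.writeBytes(name);
        }
        out.writeBoolean(false);               // not compressed
        out.writeBoolean(false);               // not block-compressed
        out.writeInt(0);                       // no metadata entries
        out.write(sync);
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] sync = new byte[16];
        for (int i = 0; i < sync.length; i++) sync[i] = (byte) (0xA0 + i);
        Header h = readHeader(demoHeader(sync));
        System.out.println(h.keyClass + " / " + h.valueClass + ", compressed=" + h.compressed);
        System.out.println("first record at byte " + h.firstRecordOffset);
    }
}
```

For a real file, `readHeader(java.nio.file.Files.readAllBytes(path))` works for small files; for large ones, reading only the first few kilobytes is enough.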

I am unsure, though, how to find the sync marker. How would I find where the header's metadata stops and where the sync marker begins and ends? Would it be possible to calculate the value of the sync marker and search for it that way? Also, how many bytes do the record length and key length take up?
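For the record layout, my reading of the Writer source is that each record starts with two 4-byte big-endian ints (written with `DataOutput.writeInt`): the record length (key bytes plus value bytes) and the key length, followed by the serialized key and then the serialized value. Every so often the Writer inserts a sync escape instead of a record: the int -1 followed by the 16-byte sync marker. The hypothetical `SeqRecords` class below sketches this for a Text/BytesWritable file written with `CompressionType.NONE` only. If I read the Hadoop source correctly, `createWriter` without a compression option falls back to `io.seqfile.compression.type`, whose default is `RECORD`; in that case the framing stays the same but each value is compressed by the codec, and this sketch would reject the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: reads uncompressed Text/BytesWritable records that follow a SequenceFile header. */
public class SeqRecords {
    static final int SYNC_ESCAPE = -1;        // written instead of a record length before a sync marker

    /** in must be positioned at the first record; sync is the marker read from the header. */
    static List<byte[]> readValues(DataInputStream in, byte[] sync) throws IOException {
        List<byte[]> values = new ArrayList<>();
        while (true) {
            int recordLength;
            try {
                recordLength = in.readInt();   // 4 bytes: key length + value length
            } catch (EOFException endOfFile) {
                return values;
            }
            if (recordLength == SYNC_ESCAPE) {
                byte[] marker = new byte[16];
                in.readFully(marker);          // periodic sync marker, not a record
                if (!Arrays.equals(marker, sync)) throw new IOException("sync marker mismatch");
                continue;
            }
            int keyLength = in.readInt();      // 4 bytes: length of the serialized key
            in.readFully(new byte[keyLength]); // Text key (vint length + UTF-8), skipped here
            int payloadLength = in.readInt();  // BytesWritable starts with its own 4-byte size
            if (recordLength - keyLength != 4 + payloadLength)
                throw new IOException("value is not a plain BytesWritable (compressed file?)");
            byte[] payload = new byte[payloadLength];
            in.readFully(payload);             // the PNG bytes in this question's case
            values.add(payload);
        }
    }

    // Hypothetical test data: one record as an uncompressed Writer would lay it out (key < 128 bytes).
    static void writeRecord(DataOutputStream out, String key, byte[] payload) throws IOException {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        out.writeInt((1 + k.length) + (4 + payload.length)); // record length
        out.writeInt(1 + k.length);                          // key length
        out.writeByte(k.length);                             // Text: one-byte vint length
        out.write(k);
        out.writeInt(payload.length);                        // BytesWritable: size, then bytes
        out.write(payload);
    }

    public static void main(String[] args) throws IOException {
        byte[] sync = new byte[16];
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        writeRecord(out, "a.png", new byte[] {1, 2, 3});
        out.writeInt(SYNC_ESCAPE);
        out.write(sync);
        writeRecord(out, "b.png", new byte[] {9});
        List<byte[]> values = readValues(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())), sync);
        for (byte[] v : values) System.out.println(Arrays.toString(v));
    }
}
```

Note that the value bytes include BytesWritable's own 4-byte size prefix, so the PNG data starts 4 bytes after the end of the key.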

If there are alternative ways of finding the SequenceFile value, please let me know. If it helps, here is a little bit of code that I used to write to the SequenceFile.

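```java
// img (BufferedImage), conf (Configuration) and imgPath (String) are defined earlier
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(img, "png", baos);      // encode the image as PNG into baos
byte[] imBytes = baos.toByteArray();  // the PNG bytes to store
SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(new Path(imgPath)),
        SequenceFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(BytesWritable.class));
writer.append(new Text(imgPath), new BytesWritable(imBytes));
writer.close();                        // flush and close the file
```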

Essentially, I took a BufferedImage generated by the program, wrote it to a byte array as a PNG, and then wrote that array to the SequenceFile.
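An alternative that avoids byte-level parsing entirely, and also copes with compression and sync markers, is to read the file with Hadoop's own `SequenceFile.Reader`. This needs the Hadoop client libraries on the classpath, but should not need a running cluster when reading a local copy of the file. Because the values were produced with `ImageIO.write(img, "png", ...)`, each value should already be a complete PNG, so it can be written to disk unchanged. A minimal sketch, assuming Hadoop 2.x or later; the class name `SeqToPng`, the two command-line arguments and the `image-N.png` naming are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

/** Sketch: dumps every BytesWritable value of a SequenceFile to a PNG file on local disk. */
public class SeqToPng {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path seqPath = new Path(args[0]);      // e.g. a local copy of the file, or an hdfs:// URI
        String outDir = args[1];               // local directory for the .png files
        Text key = new Text();
        BytesWritable value = new BytesWritable();
        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(seqPath))) {
            int n = 0;
            while (reader.next(key, value)) {  // header, sync markers and decompression handled here
                // getBytes() may return a padded buffer; only the first getLength() bytes are data
                byte[] png = Arrays.copyOf(value.getBytes(), value.getLength());
                Files.write(Paths.get(outDir, "image-" + (n++) + ".png"), png);
            }
        }
    }
}
```

The key (the original `imgPath`) could be used to name the output files instead of a counter; a counter is used here only because a path key may contain characters that are awkward in a local file name.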

[Edit] I've looked through the SequenceFile source code and found what seems to be a function called getSync(), but I think it is private, so I'm not sure how I would use it.

Copyright License:
Author: 「dcs」. Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link to: https://stackoverflow.com/questions/62886095/how-do-i-find-the-value-in-a-hadoop-sequencefile
