Split reduced data into output and new input in Hadoop

2013-01-14T00:15:47

I've been looking around for days trying to find a way to use reduced data for further mapping in Hadoop. I've got objects of class A as input data and objects of class B as output data. The problem is that the map phase generates not only Bs but new As as well.

Here's what I'd like to achieve:

1.1 input: a list of As
1.2 map result: for each A a list of new As and a list of Bs is generated
1.3 reduce: filtered Bs are saved as output, filtered As are added to the map jobs

2.1 input: a list of As produced by the first map/reduce
2.2 map result: for each A a list of new As and a list of Bs is generated
2.3 ...

3.1 ...

You should get the basic idea.
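The rounds above can be sketched in plain Java, without any Hadoop dependency, to make the intended data flow concrete. Everything in this sketch is an assumption made for illustration: As are modelled as `int`s and Bs as `String`s, `map` is a hypothetical stand-in for the real mapper, and the "filter" in the reduce step simply drops duplicates and As that were already processed. In an actual Hadoop setup, each pass of the loop would roughly correspond to one job started by a driver, with the As and Bs written to separate outputs (for example with `MultipleOutputs`) and the As output used as the next job's input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IterativeSplit {

    /** What one map call emits for a single A: new As and Bs. */
    static class MapResult {
        final List<Integer> newAs = new ArrayList<>();
        final List<String> bs = new ArrayList<>();
    }

    // Hypothetical map step: every A yields one B, and an A above 1 yields a new A.
    static MapResult map(int a) {
        MapResult result = new MapResult();
        result.bs.add("B" + a);
        if (a > 1) {
            result.newAs.add(a / 2);
        }
        return result;
    }

    // Runs rounds until a round produces no new As; returns all Bs in emission order.
    static List<String> run(List<Integer> initialAs) {
        List<String> output = new ArrayList<>();
        Set<Integer> seenAs = new HashSet<>(initialAs);
        List<Integer> current = new ArrayList<>(initialAs);

        while (!current.isEmpty()) {
            // Map phase: collect everything emitted in this round.
            Set<String> roundBs = new LinkedHashSet<>();
            Set<Integer> roundAs = new LinkedHashSet<>();
            for (int a : current) {
                MapResult r = map(a);
                roundBs.addAll(r.bs);
                roundAs.addAll(r.newAs);
            }

            // Reduce phase: Bs go to the final output ...
            output.addAll(roundBs);

            // ... and As that were not processed before become the next input.
            List<Integer> next = new ArrayList<>();
            for (int a : roundAs) {
                if (seenAs.add(a)) {
                    next.add(a);
                }
            }
            current = next;
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(4, 3))); // prints [B4, B3, B2, B1]
    }
}
```

Dropping already-processed As is only one possible filter; it is used here because it guarantees that the loop terminates.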

I've read a lot about chaining, but I'm not sure how to combine ChainReducer and ChainMapper, or even whether this would be the right approach.

So here's my question: how can I split the mapped data in the reduce step so that one part is saved as output and the other part becomes new input data?

Copyright License:
Author:「Mennny」,Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link to:https://stackoverflow.com/questions/14305351/split-reduced-data-into-output-and-new-input-in-hadoop
