Are Hadoop and Map/Reduce useful for BIG parallel processes?

2015-11-17T02:35:26

I have a superficial understanding of Hadoop and Map/Reduce. I see it can be useful for running many instances of small independent processes. But can I use this infrastructure (with its fault tolerance, scalability and ease of use) to run BIG independent processes?

Let's say I want to run a certain analysis on the status of each of my company's clients (600 of them). The analysis takes about 1 minute of processing per client and reads a variety of static data, but the analysis of one client is independent of the others. Today that means about 10 hours of centralized processing. If I can distribute the work across 20 nodes, I can expect it to finish in about half an hour (plus some overhead for replicating the data). And if I can rent 100 nodes on Amazon EC2 for an affordable price, it will be done in about 6 minutes, which would radically change the usability of my analysis.
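To make the arithmetic behind those estimates explicit, here is a minimal sketch. It assumes perfectly even distribution of clients across nodes and ignores scheduling and data-replication overhead; the function name `estimated_minutes` is only illustrative.

```python
import math


def estimated_minutes(clients, minutes_per_client, nodes):
    """Idealized wall-clock time when clients are split evenly across nodes.

    Assumes every analysis takes the same time and ignores all overhead.
    """
    # The busiest node handles ceil(clients / nodes) clients, one after another.
    clients_on_busiest_node = math.ceil(clients / nodes)
    return clients_on_busiest_node * minutes_per_client


for nodes in (1, 20, 100):
    print(nodes, "nodes:", estimated_minutes(600, 1, nodes), "minutes")
```

With 600 clients at 1 minute each, this gives 600 minutes on one node, 30 minutes on 20 nodes and 6 minutes on 100 nodes, which are the figures quoted above before any overhead.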

Is Hadoop the right tool to solve my problem? Can it run big Mapper processes that take 1 min each? If not, where should I look?
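For illustration, the kind of mapper I have in mind could be sketched roughly like this, assuming a map-only job run through Hadoop Streaming where each input line is one client ID. `analyze_client` is a hypothetical placeholder for my real one-minute analysis, and `run_mapper` is just a name I made up for the sketch.

```python
import sys


def analyze_client(client_id):
    """Hypothetical placeholder for the real analysis (about 1 minute per client)."""
    return "ok"


def run_mapper(lines, out=sys.stdout):
    """Read one client ID per line and emit 'client_id<TAB>result' lines."""
    for line in lines:
        client_id = line.strip()
        if not client_id:
            continue  # skip blank lines in the input
        result = analyze_client(client_id)
        out.write(f"{client_id}\t{result}\n")
```

Used as a streaming mapper, the script would end by calling `run_mapper(sys.stdin)`, so that each map task reads its share of client IDs from standard input and writes one tab-separated result line per client.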

Thanks in advance.

Copyright License:
Author:「Andrés Parra」,Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link to:https://stackoverflow.com/questions/33742240/are-hadoop-and-map-reduce-useful-for-big-parallel-processes
