How to change memory in EMR hadoop streaming job

2014-06-07T07:19:27

I am trying to overcome the following error in a Hadoop streaming job on EMR.

Container [pid=30356,containerID=container_1391517294402_0148_01_000021] is running beyond physical memory limits

I tried searching for answers, but the one I found isn't working. My job is launched as shown below.

hadoop jar ../.versions/2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
 -input  determinations/part-00000 \
 -output  determinations/aggregated-0 \
 -mapper cat \
 -file ./det_maker.py \
 -reducer det_maker.py \
 -Dmapreduce.reduce.java.opts="-Xmx5120M"

The last line above is supposed to do the trick, as far as I understand, but instead I get this error:

ERROR streaming.StreamJob: Unrecognized option: -Dmapreduce.reduce.java.opts="-Xmx5120M"

What is the correct way to change the memory usage? Also, is there some documentation that explains these things to n00bs like me?
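From what I've read so far, it sounds like the generic -D options have to come right after the jar name, before streaming-specific options like -input and -mapper, and that the YARN container limit (mapreduce.reduce.memory.mb) probably needs to be raised along with the JVM heap. The sketch below is what I would try next (the 6144 MB container size is just my guess at something comfortably above the 5 GB heap), but I'd like to confirm this is the right approach:

hadoop jar ../.versions/2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
 -D mapreduce.reduce.memory.mb=6144 \
 -D mapreduce.reduce.java.opts=-Xmx5120m \
 -input determinations/part-00000 \
 -output determinations/aggregated-0 \
 -mapper cat \
 -file ./det_maker.py \
 -reducer det_maker.py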

Copyright License:
Author: user3394040. Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link to: https://stackoverflow.com/questions/24091973/how-to-change-memory-in-emr-hadoop-streaming-job
