How to solve this problem in MapReduce in Hadoop?

2020-11-13T16:51:08

I am learning the MapReduce computing paradigm in the Hadoop environment. I created two Python files containing a mapper and a reducer.

with open('mapper_hadoop.py', 'w') as fh:
    fh.write("""#!/usr/bin/env python

import sys

# Emit one (key, value) pair per metric for every input line.
for line in sys.stdin:
    print("chars %d" % len(line.rstrip('\\n')))
    print("words %d" % len(line.split()))
    print("lines 1")
    """)
with open('reducer_hadoop.py', 'w') as fh:
    fh.write("""#!/usr/bin/env python

import sys

# Accumulate the per-line counts emitted by the mapper.
counts = {"chars": 0, "words": 0, "lines": 0}
for line in sys.stdin:
    key, value = line.rstrip().split()
    counts[key] += int(value)

for k, v in counts.items():
    print("%s %s" % (k, v))
    """)

I set the file permissions and ran both scripts locally, without Hadoop; they worked as expected. I then wanted to run the same computation through Hadoop MapReduce and see its output. In Jupyter, I ran the following code:

!hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.4.jar \
        -files mapper_hadoop.py,reducer_hadoop.py \
        -mapper mapper_hadoop.py -reducer reducer_hadoop.py \
        -input hadoop_git_readme.txt \
        -output /tmp/mr.out

As a result, I got the following error:

packageJobJar: [/tmp/hadoop-unjar8018849492984741801/] [] /tmp/streamjob7225804067539527250.jar tmpDir=null
20/11/12 18:44:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/11/12 18:44:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/11/12 18:44:05 INFO mapred.FileInputFormat: Total input paths to process : 1
20/11/12 18:44:05 INFO mapreduce.JobSubmitter: number of splits:2
20/11/12 18:44:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1605187077393_0005
20/11/12 18:44:06 INFO impl.YarnClientImpl: Submitted application application_1605187077393_0005
20/11/12 18:44:06 INFO mapreduce.Job: The url to track the job: http://sparkbox:8088/proxy/application_1605187077393_0005/
20/11/12 18:44:06 INFO mapreduce.Job: Running job: job_1605187077393_0005
20/11/12 18:44:17 INFO mapreduce.Job: Job job_1605187077393_0005 running in uber mode : false
20/11/12 18:44:17 INFO mapreduce.Job:  map 0% reduce 0%
20/11/12 18:44:17 INFO mapreduce.Job: Job job_1605187077393_0005 failed with state FAILED due to: Application application_1605187077393_0005 failed 2 times due to AM Container for appattempt_1605187077393_0005_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://sparkbox:8088/proxy/application_1605187077393_0005/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1605187077393_0005_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
20/11/12 18:44:17 INFO mapreduce.Job: Counters: 0
20/11/12 18:44:17 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

What is the problem? The tracking page http://sparkbox:8088/proxy/application_1605187077393_0005/ does not open.
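When the web proxy page is unreachable, the container logs can usually be pulled from the command line instead (assuming YARN log aggregation is enabled on the cluster; the application id is taken from the job output above):

```shell
# Fetch the aggregated container logs for the failed application.
yarn logs -applicationId application_1605187077393_0005
```

The stderr of the failed container typically shows why the streaming scripts exited with code 1 (for example, a missing interpreter or a syntax error).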

Copyright License:
Author:「aspcartman111」,Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link to:https://stackoverflow.com/questions/64817902/how-to-solve-this-problem-in-mapreduce-in-hadoop
