Hadoop streaming with private python interpreter

2015-11-04T01:11:38

I am trying to use Hadoop streaming with a private Python interpreter (Hortonworks Data Platform 2.2.0). The interpreter is private in the sense that it belongs to a virtual environment in a home directory, and only that specific user account has permission to run it.

I specify the Python interpreter in the hashbang line. My streaming job works with the system Python or with #!/usr/bin/env python. However, it fails with a permission denied error when I use the private interpreter: #!/home/dmazur/test/tempenv/bin/python

Here is a segment of the output that shows the error message:

15/11/03 11:31:13 INFO mapreduce.Job:  map 0% reduce 0%
15/11/03 11:31:22 INFO mapreduce.Job: Task Id : attempt_1440596114865_0249_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    ... 22 more
Caused by: java.io.IOException: Cannot run program "/gs/hadoop/yarn/local/lm-2r01-n10/usercache/dmazur/appcache/application_1440596114865_0249/container_1440596114865_0249_01_000002/./mapper_mean.py": error=13, Permission denied
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
    ... 23 more
Caused by: java.io.IOException: error=13, Permission denied
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
    ... 24 more

I believe the problem lies with the permissions on the Python interpreter, not on the mapper_mean.py file: changing only the hashbang line, without touching the permissions on the script itself, is enough to make the job run. I suspect this means the MapReduce task is launched by a daemon process owned by another user, which is not allowed to execute the interpreter in my home directory. I haven't found anything in the documentation about using a private interpreter with Hadoop streaming. Is it possible? If so, what permissions need to be set to let it run?
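
One way to test that hypothesis is sketched below. It assumes the task runs under a different account that shares no group with the owner, so only the "other" permission bits apply. For such an account to execute the interpreter, every directory on the way to it needs the search (execute) bit, and so does the interpreter file itself. The function names `path_components` and `blocked_for_others` are made up for this illustration. The check ignores ACLs and group membership, and it has to be run on a worker node, since it says nothing about whether the path exists on the other nodes.

```python
import os
import stat


def path_components(path):
    """Return every ancestor of a path, from the root down to the path itself."""
    path = os.path.abspath(path)
    parts = []
    while True:
        parts.append(path)
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return list(reversed(parts))


def blocked_for_others(path):
    """Return the components that lack the 'other' execute bit or cannot be stat'ed.

    Assumption: the calling process is unrelated to the owner and group,
    so only the S_IXOTH bit matters. ACLs are not inspected.
    """
    blocked = []
    for component in path_components(path):
        try:
            mode = os.stat(component).st_mode
        except OSError:
            # Missing on this node, or not reachable by the current user.
            blocked.append(component)
            continue
        if not mode & stat.S_IXOTH:
            blocked.append(component)
    return blocked


if __name__ == "__main__":
    # Interpreter path taken from the hashbang line in the question.
    interpreter = "/home/dmazur/test/tempenv/bin/python"
    for component in blocked_for_others(interpreter):
        print("not executable/searchable by other users:", component)
```

Any component the script prints is a candidate for the error=13 above. A home directory with mode 700 would be enough to block the interpreter even if the interpreter file itself is world-executable.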

Copyright License:
Author: Dan Mazur. Reproduced under the CC BY-SA 4.0 license with a link to the original source and disclaimer.
Link to: https://stackoverflow.com/questions/33505172/hadoop-streaming-with-private-python-interpreter
