hadoop - Loading my own python modules for Pig UDFs on Amazon EMR -
I am trying to call two of my own Python modules from Pig.
Here's module_one.py:

import sys
print sys.path

def foo():
    pass

Here's module_two.py:

from module_one import foo

def bar():
    foo()
I have got both of them on S3.
Here's what happens when I try to import them in Pig:
2015-06-14 12:12:10,578 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.0-amzn-2 (rexported) compiled May 05 2015, 19:03:23
2015-06-14 12:12:10,579 [main] INFO  org.apache.pig.Main - Logging error messages to: /mnt/var/log/apps/pig.log
2015-06-14 12:12:10,620 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2015-06-14 12:12:11,277 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-06-14 12:12:11,279 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-06-14 12:12:11,279 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://1.1.1.1:9000
2015-06-14 12:12:12,794 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

grunt> register 's3://mybucket/pig/module_one.py' using jython as m1;
2015-06-14 12:12:15,177 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-06-14 12:12:17,457 [main] INFO  com.amazon.ws.emr.hadoop.fs.EmrFileSystem - Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
2015-06-14 12:12:17,889 [main] INFO  amazon.emr.metrics.MetricsSaver - MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500
2015-06-14 12:12:17,889 [main] INFO  amazon.emr.metrics.MetricsSaver - Created MetricsSaver j-5g45fr7n987g:i-a95a5379:RunJar:03073 period:60 /mnt/var/em/raw/i-a95a5379_20150614_RunJar_03073_raw.bin
2015-06-14 12:12:18,633 [main] INFO  com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://mybucket/pig/module_one.py' for reading
2015-06-14 12:12:18,661 [main] INFO  amazon.emr.metrics.MetricsSaver - Thread 1 created MetricsLockFreeSaver 1
2015-06-14 12:12:18,743 [main] INFO  org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_4599752347759040376
2015-06-14 12:12:21,060 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
['/home/hadoop/.versions/pig-0.12.0-amzn-2/lib/Lib', '/home/hadoop/.versions/pig-0.12.0-amzn-2/lib/jython-standalone-2.5.3.jar/Lib', '__classpath__', '__pyclasspath__/', '/home/hadoop']
2015-06-14 12:12:21,142 [main] INFO  org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: m1.foo

grunt> register 's3://mybucket/pig/module_two.py' using jython as m2;
2015-06-14 12:12:33,870 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-06-14 12:12:33,918 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-06-14 12:12:34,020 [main] INFO  com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://mybucket/pig/module_two.py' for reading
2015-06-14 12:12:34,064 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
2015-06-14 12:12:34,621 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1121: Python Error. Traceback (most recent call last):
  File "/tmp/pig1436120267849453375tmp/module_two.py", line 1, in <module>
    from module_one import foo
ImportError: No module named module_one
Details at logfile: /mnt/var/log/apps/pig.log
I tried:

- the usual sys.path.append('./lib') and sys.path.append('.'), which didn't help
- hacking around the folder location with sys.path.append(os.path.dirname(__file__)), which got me NameError: name '__file__' is not defined
- creating an __init__.py, loading it with register, and sys.path.append('s3://mybucket/pig/')

None of that worked either.
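(For what it's worth, that NameError is expected: __file__ is only bound when a module is loaded from a file by the import machinery, and a script evaluated dynamically, roughly the way Pig's Jython script engine runs a registered script, gets no such binding. A minimal sketch of the same effect in plain Python:

```python
# Code executed via exec() into a fresh namespace has no __file__
# binding, unlike a module imported from a file on disk.
namespace = {}
exec(
    "try:\n"
    "    path = __file__\n"
    "except NameError:\n"
    "    path = None\n",
    namespace,
)

print(namespace["path"])  # None: __file__ was never defined here
```

So any path trick built on __file__ cannot work inside a registered UDF script.)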
I'm using Apache Pig version 0.12.0-amzn-2, since that's apparently the only one that can be selected.
You are importing the first Python UDF as m1, and therefore you should access it through the m1 namespace (m1.foo()), and not as module_one.
Edit: the second Python file should be:

from m1 import foo

def bar():
    foo()
I tested it on Amazon EMR and it works.
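Putting it together, the registration sequence then looks like this (same bucket and aliases as in the question; the second script now imports foo from the m1 namespace created by the first register):

```
grunt> register 's3://mybucket/pig/module_one.py' using jython as m1;
grunt> register 's3://mybucket/pig/module_two.py' using jython as m2;
```

After both registers succeed, bar is callable from Pig Latin as m2.bar().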