hadoop - How to package a Python script with its dependencies into a zip/tar?
I've got a Hadoop cluster that I'm doing data analytics on using numpy, scipy, and pandas. I'd like to be able to submit my Hadoop jobs as a zip/tar file using the '--file' argument to the command. The zip file should contain everything my Python program needs to execute, so that no matter which node in the cluster the script runs on, it won't raise an ImportError at runtime.
Due to company policies, installing these libraries on every node isn't feasible, which hurts exploratory/agile development. I do have pip and virtualenv installed, though, so I can create sandboxes as needed.
I've looked at zipimport and Python packaging, but none of it seems to fulfill my needs, or I'm having difficulty using these tools. A rough sketch of the direction I've been trying is below.
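For reference, here's the kind of bootstrap I've been experimenting with (a minimal sketch; `deps.zip`, `job.py`, and the `six` import are placeholder names, and the hadoop-streaming invocation in the comment is just illustrative):

```python
# job.py - a minimal sketch. Shipped alongside the job, e.g. with
# Hadoop streaming's -file/-files option:
#   hadoop jar hadoop-streaming.jar -files job.py,deps.zip -mapper job.py ...
import os
import sys

# Put the shipped zip on sys.path before any other imports. zipimport
# kicks in automatically for zip paths on sys.path, but only for
# pure-Python packages; C extensions (numpy, scipy, pandas) cannot be
# loaded from inside a zip.
sys.path.insert(0, os.path.join(
    os.path.dirname(os.path.abspath(__file__)), "deps.zip"))

import six  # hypothetical pure-Python dependency bundled in deps.zip


def main():
    # Typical streaming mapper: read stdin, write stdout.
    for line in sys.stdin:
        sys.stdout.write(line)


if __name__ == "__main__":
    main()
```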
Has anyone had luck doing this? I can't seem to find any success stories online.
Thanks!
I have solved a similar problem in an Apache Spark and Python context by creating a Docker image that has the needed Python libraries and the Spark slave script installed. The image is distributed to the other machines, and when a container is started it auto-joins the cluster; we have one such image per host machine.
Our ever-changing Python project is submitted as a zip file along with the job, and imports work transparently from there. Luckily we rarely need to re-create the slave images, and we don't run jobs with conflicting requirements.
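In case it helps, this is roughly how the zip submission looks on the driver side (a minimal sketch; `project.zip` and `myproject.pipeline` are made-up names for illustration):

```python
# Minimal PySpark driver sketch; project.zip and myproject.pipeline
# are hypothetical names standing in for our bundled project.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("analytics-job")
sc = SparkContext(conf=conf)

# Ship the pure-Python project zip to every worker; addPyFile also puts
# it on the driver's sys.path, so imports resolve on both sides.
sc.addPyFile("project.zip")

import myproject.pipeline  # lives inside project.zip

# Functions from the shipped module can now run on the executors.
rdd = sc.parallelize(range(100))
print(rdd.map(myproject.pipeline.transform).take(5))
```

Note that this only covers pure-Python code; the compiled libraries themselves live in the Docker image, not in the zip.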
I'm not sure how applicable this is in your scenario, though, since (in my understanding) these Python libraries must be compiled.