java - Spark NotSerializableException
In my Spark code, I am attempting to create an IndexedRowMatrix from a CSV file. However, I get the following error:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
...
Caused by: java.io.NotSerializableException: org.apache.spark.api.java.JavaSparkContext
Here is the code:
JavaSparkContext sc = new JavaSparkContext("local", "app", "/srv/spark", new String[]{"target/app.jar"});

JavaRDD<String> csv = sc.textFile("data/matrix.csv").cache();

JavaRDD<IndexedRow> entries = csv.zipWithIndex().map(
        new Function<scala.Tuple2<String, Long>, IndexedRow>() {

            private static final long serialVersionUID = 4795273163954440089L;

            @Override
            public IndexedRow call(Tuple2<String, Long> tuple) throws Exception {
                String line = tuple._1;
                long index = tuple._2;
                String[] strings = line.split(",");
                double[] doubles = new double[strings.length];
                for (int i = 0; i < strings.length; i++) {
                    doubles[i] = Double.parseDouble(strings[i]);
                }
                Vector v = new DenseVector(doubles);
                return new IndexedRow(index, v);
            }
        });
I had the same issue. It drove me around the twist. It is a Java restriction around anonymous instances and serializability. The solution is to replace the anonymous instance of Function with a named static class that implements Serializable, and to instantiate that instead. I declared a function-library outer class that contained static inner class definitions of the functions I wanted to use.
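A minimal sketch of that approach, assuming Spark's Java API (org.apache.spark.api.java.function.Function itself extends Serializable); the names Functions and ParseLineToIndexedRow are placeholders of my own, not from the original post:

import java.io.Serializable;

import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.DenseVector;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.distributed.IndexedRow;

import scala.Tuple2;

// Function-library outer class holding named functions (hypothetical name).
public class Functions implements Serializable {

    // Static nested class: it carries no hidden reference to an enclosing
    // instance, so Spark serializes only the function itself and never
    // tries to serialize the JavaSparkContext.
    public static class ParseLineToIndexedRow
            implements Function<Tuple2<String, Long>, IndexedRow> {

        private static final long serialVersionUID = 1L;

        @Override
        public IndexedRow call(Tuple2<String, Long> tuple) throws Exception {
            String[] strings = tuple._1.split(",");
            double[] doubles = new double[strings.length];
            for (int i = 0; i < strings.length; i++) {
                doubles[i] = Double.parseDouble(strings[i]);
            }
            Vector v = new DenseVector(doubles);
            return new IndexedRow(tuple._2, v);
        }
    }
}

The map call in the question would then become something like:

JavaRDD<IndexedRow> entries = csv.zipWithIndex().map(new Functions.ParseLineToIndexedRow());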
Of course, if you write it in Scala it's one file of much neater code, but that wasn't going to happen in this instance.