scala - Avoiding nested RDDs in Spark without using an Array -
I have a big problem!
I have an RDD[(Int, Vector)], where the Int is a sort of label.
For example:
(0, (a,b,c)); (0, (d,e,f)); (1, (g,h,i))
etc...
Now I need to use this RDD (I'll call it myrdd) like this:
myrdd.map { case (l, v) => myrdd.map { case (l_, v_) => compare(v, v_) } }
Now, I know it's impossible to nest RDDs in Spark.
I can bypass the problem by collecting to an Array, but I can't use an Array or it runs out of memory.
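For reference, the Array bypass would look something like this (a sketch only, keeping the question's compare as a placeholder; collect() pulls the entire RDD to the driver, which is what exhausts memory):

val arr = myrdd.collect()  // Array[(Int, Vector)] materialized on the driver
myrdd.map { case (l, v) => arr.map { case (l_, v_) => compare(v, v_) } }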
How can I solve this problem without using an Array?
Thanks in advance!
cartesian
It sounds like cartesian should do the job:
myrdd.cartesian(myrdd).map { case ((_, v), (_, v_)) => compare(v, v_) }
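For completeness, here is a minimal self-contained sketch of the cartesian approach. The vector type (Array[Double]) and the compare function (squared Euclidean distance) are assumptions for illustration, not from the original post:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("CartesianCompare").setMaster("local[*]"))

// Hypothetical compare: squared Euclidean distance between two vectors.
def compare(v: Array[Double], w: Array[Double]): Double =
  v.zip(w).map { case (a, b) => (a - b) * (a - b) }.sum

// (label, vector) pairs, as in the question.
val myrdd = sc.parallelize(Seq(
  (0, Array(1.0, 2.0, 3.0)),
  (0, Array(4.0, 5.0, 6.0)),
  (1, Array(7.0, 8.0, 9.0))
))

// Every ordered pair of rows, compared, with no nested RDDs and no driver-side Array.
val comparisons = myrdd.cartesian(myrdd)
  .map { case ((_, v), (_, v_)) => compare(v, v_) }

comparisons.collect().foreach(println)

Note that cartesian produces all n² ordered pairs, including each row paired with itself, so filter the pairs first if you only want distinct comparisons.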