apache spark - How do I flatMap a row of arrays into multiple rows?
After parsing some JSON files, I have a one-column DataFrame of arrays:
scala> val jj = sqlContext.jsonFile("/home/aahu/jj2.json")
res68: org.apache.spark.sql.DataFrame = [r: array<bigint>]

scala> jj.first()
res69: org.apache.spark.sql.Row = [List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]
I'd like to explode each row out into several rows. How?
EDIT:

Original JSON file:
{"r": [0,1,2,3,4,5,6,7,8,9]} {"r": [0,1,2,3,4,5,6,7,8,9]}
I want an RDD or a DataFrame with 20 rows.

I can't simply use flatMap here, and I'm not sure what the appropriate command in Spark is:
scala> jj.flatMap(r => r)
<console>:22: error: type mismatch;
 found   : org.apache.spark.sql.Row
 required: TraversableOnce[?]
              jj.flatMap(r => r)
You can use DataFrame.explode to achieve what you desire. Below is what I tried in spark-shell with your sample JSON data.
import scala.collection.mutable.ArrayBuffer

val jj1 = jj.explode("r", "r1") { list: ArrayBuffer[Long] => list.toList }
val jj2 = jj1.select($"r1")
jj2.collect
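With the two sample records above, jj2.collect should return 20 rows of r1, one per array element (the values 0 through 9, twice), which matches the 20 rows you asked for.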
You can refer to the API documentation to understand more about DataFrame.explode.
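If you prefer the flatMap route from your attempt, a minimal sketch that should work with the Spark 1.3-era API in your transcript is to drop to the underlying RDD[Row] and pull the array out of each Row (the element type Seq[Long] is an assumption based on the array<bigint> schema shown above):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// jj.rdd exposes the DataFrame's underlying RDD[Row].
// getAs[Seq[Long]](0) reads column 0 ("r") as a sequence,
// which flatMap can then unroll into individual elements.
val flat: RDD[Long] = jj.rdd.flatMap((row: Row) => row.getAs[Seq[Long]](0))
flat.count() // 20 for the two sample records

Your original jj.flatMap(r => r) failed because the function receives a Row, and a Row is not itself a TraversableOnce; extracting the sequence from the Row first resolves the type mismatch.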