apache spark - How do I flatMap a row of arrays into multiple rows?


After parsing some JSON files, I have a one-column DataFrame of arrays:

    scala> val jj = sqlContext.jsonFile("/home/aahu/jj2.json")
    res68: org.apache.spark.sql.DataFrame = [r: array<bigint>]

    scala> jj.first()
    res69: org.apache.spark.sql.Row = [List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]

I'd like to explode each row out into several rows. How do I do that?

EDIT:

The original JSON file:

    {"r": [0,1,2,3,4,5,6,7,8,9]}
    {"r": [0,1,2,3,4,5,6,7,8,9]}

I want an RDD or a DataFrame with 20 rows.

I can't simply use flatMap here; I'm not sure what the appropriate command in Spark is:

    scala> jj.flatMap(r => r)
    <console>:22: error: type mismatch;
     found   : org.apache.spark.sql.Row
     required: TraversableOnce[?]
                  jj.flatMap(r => r)

You can use DataFrame.explode to achieve what you want. Below is what I tried in spark-shell with your sample JSON data:

    import scala.collection.mutable.ArrayBuffer

    val jj1 = jj.explode("r", "r1") { list: ArrayBuffer[Long] => list.toList }
    val jj2 = jj1.select($"r1")
    jj2.collect
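Since Spark 1.4 there is also a built-in explode function in org.apache.spark.sql.functions that does the same thing without the explicit ArrayBuffer annotation. A minimal sketch, assuming Spark 1.4+ and the column name r from the file above ("r1" is just a hypothetical name for the output column):

    import org.apache.spark.sql.functions.explode
    import sqlContext.implicits._  // brings the $"..." column syntax into scope

    // one output row per array element, 20 rows total for the two records
    val exploded = jj.select(explode($"r").as("r1"))
    exploded.collect()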

You can refer to the API documentation to understand more about DataFrame.explode.
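As for the original flatMap attempt: the type mismatch happens because flatMap receives a Row, not a collection. If you drop down to the underlying RDD and pull the array out of each Row first, flatMap works as expected. A sketch, assuming the array<bigint> column comes back as Seq[Long] on the Scala side:

    // jj.rdd is an RDD[Row]; getAs[Seq[Long]](0) extracts the array from the
    // single column so flatMap can flatten it into individual elements
    val flat = jj.rdd.flatMap(r => r.getAs[Seq[Long]](0))
    flat.collect()  // 20 elements: 0 through 9, twice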

