How to transform tabular data into transactions in Spark (Scala)?


I have an order transaction dataset that looks like the following table:

    1,john,iphone cover,9.99
    2,jack,iphone cover,9.99
    4,jill,samsung galaxy cover,9.95
    3,john,headphones,5.49
    5,bob,ipad cover,5.45

I am considering grouping the data into different transactions based on price differences. For example, I would group products 1, 2 and 4 into one transaction, List(1, 2, 4), because the absolute differences in their prices are less than 1. On the other hand, I would put products 3 and 5 into the same transaction, List(3, 5).

I know I can do this in Python with the following code:

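    current_price = 0
    res = []   # finished groups
    ary = []   # ids in the group currently being built

    with open('test.csv', 'r') as f:
        for idx, line in enumerate(f):
            dt = line.strip().split(',')
            if idx == 0:
                # the first row sets the reference price of the first group
                current_price = float(dt[3])
            if abs(float(dt[3]) - current_price) < 1:
                ary.append(dt[0])
            else:
                # price moved too far: close the group and start a new one
                res.append(ary)
                current_price = float(dt[3])
                ary = [dt[0]]

    res.append(ary)
    print(res)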

But Scala is a functional programming language. How can I achieve the same goal in a functional programming style?

Something like this:

    val xs = input.map(_.split(","))
    // List(Array(1, john, iphone cover, 9.99),
    //      Array(2, jack, iphone cover, 9.99),
    //      Array(4, jill, samsung galaxy cover, 9.95),
    //      Array(3, john, headphones, 5.49),
    //      Array(5, bob, ipad cover, 5.45))

    xs.tail.foldLeft((xs.head(3), List(List(xs.head(0))))) {
      case ((cur, acc), e) =>
        if (math.abs(cur.toDouble - e(3).toDouble) < 1.0)
          (cur, (acc.head :+ e(0)) :: acc.tail)
        else (e(3), List(e(0)) :: acc)
    }._2.reverse
    // List(List(1, 2, 4), List(3, 5))

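At each iteration we pass along the current price of the current group and the list of groups so far. If the current price is close enough to the next row's price, we add that row's id to the current group. Otherwise, we start a new group with the next element and change the current price to that element's price. New groups are prepended to the list, which is why the result is reversed at the end.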
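To make the state passed through the fold visible, here is a minimal sketch that swaps `foldLeft` for `scanLeft`, which keeps every intermediate state instead of only the last one. The `FoldTrace` object, the `step` helper and the hard-coded `input` list are assumptions made for illustration; the question reads the rows from a file.

```scala
object FoldTrace {
  // Assumed input: the five rows from the question as raw CSV lines.
  val input: List[String] = List(
    "1,john,iphone cover,9.99",
    "2,jack,iphone cover,9.99",
    "4,jill,samsung galaxy cover,9.95",
    "3,john,headphones,5.49",
    "5,bob,ipad cover,5.45"
  )

  val xs: List[Array[String]] = input.map(_.split(","))

  // Same logic as the fold body: state is (current price, groups so far),
  // with the newest group at the head of the list.
  def step(state: (String, List[List[String]]),
           e: Array[String]): (String, List[List[String]]) = {
    val (cur, acc) = state
    if (math.abs(cur.toDouble - e(3).toDouble) < 1.0)
      (cur, (acc.head :+ e(0)) :: acc.tail) // extend the current group
    else
      (e(3), List(e(0)) :: acc)             // start a new group
  }

  // scanLeft returns the initial state plus one state per remaining row.
  val states: List[(String, List[List[String]])] =
    xs.tail.scanLeft((xs.head(3), List(List(xs.head(0)))))(step)

  def main(args: Array[String]): Unit =
    states.foreach { case (cur, groups) =>
      println(s"price=$cur groups=${groups.reverse}")
    }
}
```

Running `FoldTrace.main` prints one line per row, so you can see the reference price stay at 9.99 for the first three rows and switch to 5.49 when row 3 opens the second group.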

This looks more complex than it is. If I were doing this for real, I'd do it as below: define a case class to hold the values of each line, with a method for "close enough in price".

    case class Line(id: Int, person: String, product: String, price: Double) {
      // true when the two prices differ by less than 1
      def closeEnough(other: Line) = math.abs(price - other.price) < 1.0
    }

Then make objects from the lines:

    val xs = input.map { l =>
      val fields = l.split(",")
      Line(fields(0).toInt, fields(1), fields(2), fields(3).toDouble)
    }
    // List(Line(1,john,iphone cover,9.99),
    //      Line(2,jack,iphone cover,9.99),
    //      Line(4,jill,samsung galaxy cover,9.95),
    //      Line(3,john,headphones,5.49),
    //      Line(5,bob,ipad cover,5.45))

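Now fold, working with the Line objects. Each row is compared with the first element of the current group (acc.head.head), which plays the role of the "current price" above: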

    val groups = xs.tail.foldLeft(List(List(xs.head))) {
      case (acc, e) =>
        if (e.closeEnough(acc.head.head))
          (acc.head :+ e) :: acc.tail
        else List(e) :: acc
    }.reverse

And if you need to, convert the result to lists of lists of ids:

    groups.map(_.map(_.id))
    // List(List(1, 2, 4), List(3, 5))
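One detail worth checking before reusing this: like the Python version, the fold compares each row with the first row of the current group, not with the row immediately before it. The sketch below wraps the same fold in a function to show the effect on slowly drifting prices. The `AnchorDemo` object, the `group` function and the `drifting` rows are made up for illustration; the pattern match on an empty list is an addition, since `xs.head` would throw on empty input.

```scala
object AnchorDemo {
  case class Line(id: Int, person: String, product: String, price: Double) {
    def closeEnough(other: Line): Boolean = math.abs(price - other.price) < 1.0
  }

  // Same fold as in the answer, returning ids; an empty input gives no groups.
  def group(xs: List[Line]): List[List[Int]] = xs match {
    case Nil => Nil
    case first :: rest =>
      rest.foldLeft(List(List(first))) { (acc, e) =>
        // acc.head.head is the first row of the current group (its anchor)
        if (e.closeEnough(acc.head.head)) (acc.head :+ e) :: acc.tail
        else List(e) :: acc
      }.reverse.map(_.map(_.id))
  }

  // Hypothetical rows: each price is within 1 of the previous one,
  // but the third is more than 1 away from the first.
  val drifting: List[Line] = List(
    Line(1, "ann", "case", 9.99),
    Line(2, "ben", "case", 9.50),
    Line(3, "cat", "case", 8.80)
  )

  def main(args: Array[String]): Unit =
    println(group(drifting)) // List(List(1, 2), List(3))
}
```

Row 3 starts a new group because 9.99 - 8.80 is more than 1, even though it is within 1 of row 2. If you would rather compare against the previous row, using `acc.head.last` in place of `acc.head.head` is one way to do it.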
