How to transform tabular data into transactions in Spark (Scala)?
I have an order transaction dataset that looks like the following table:
    1,john,iphone cover,9.99
    2,jack,iphone cover,9.99
    4,jill,samsung galaxy cover,9.95
    3,john,headphones,5.49
    5,bob,ipad cover,5.45
I am considering grouping the data into different transactions based on price differences. For example, I would group products 1, 2, and 4 into one transaction list, List(1,2,4), because the absolute differences in their prices are less than 1. On the other hand, I would put products 3 and 5 into the same transaction, List(3,5).
I know I can do this in Python with the following code:
    current_price = 0
    res = []   # finished groups
    ary = []   # ids in the group currently being built

    with open('test.csv', 'r') as f:
        for idx, line in enumerate(f.readlines()):
            dt = line.strip().split(',')
            if idx == 0:
                current_price = float(dt[3])
            if abs(float(dt[3]) - current_price) < 1:
                ary.append(dt[0])
            else:
                # price moved too far: close the group and start a new one
                res.append(ary)
                current_price = float(dt[3])
                ary = [dt[0]]

    res.append(ary)
    print(res)
But Scala is a functional programming language, so how can I achieve the same goal in a functional programming style?
Something like this:
    val xs = input.map(_.split(","))
    // List(Array(1, john, iphone cover, 9.99),
    //      Array(2, jack, iphone cover, 9.99),
    //      Array(4, jill, samsung galaxy cover, 9.95),
    //      Array(3, john, headphones, 5.49),
    //      Array(5, bob, ipad cover, 5.45))

    xs.tail.foldLeft((xs.head(3), List(List(xs.head(0))))) {
      case ((cur, acc), e) =>
        if (math.abs(cur.toDouble - e(3).toDouble) < 1.0)
          (cur, (acc.head :+ e(0)) :: acc.tail)
        else
          (e(3), List(e(0)) :: acc)
    }._2.reverse
    // List(List(1, 2, 4), List(3, 5))
We pass to each iteration the price of the current group and the list of groups so far. If the current price is close enough to the next price, we add the next element's id to the current group. Otherwise, we start a new group with the next element and change the current price to that element's price.
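To make the state threading concrete, here is a minimal sketch that swaps foldLeft for scanLeft so that every intermediate (price, groups) pair stays visible. It assumes `input` is a List[String] holding the five sample rows; the names `FoldTrace`, `State`, `step`, `start`, and `states` are illustrative and do not come from the code above.

```scala
object FoldTrace {
  // Assumption: `input` holds the sample rows as plain strings.
  val input: List[String] = List(
    "1,john,iphone cover,9.99",
    "2,jack,iphone cover,9.99",
    "4,jill,samsung galaxy cover,9.95",
    "3,john,headphones,5.49",
    "5,bob,ipad cover,5.45"
  )

  // (price of the current group, groups so far with the newest group first)
  type State = (String, List[List[String]])

  // One fold step: extend the current group or open a new one.
  def step(state: State, e: Array[String]): State = {
    val (cur, acc) = state
    if (math.abs(cur.toDouble - e(3).toDouble) < 1.0)
      (cur, (acc.head :+ e(0)) :: acc.tail)
    else
      (e(3), List(e(0)) :: acc)
  }

  val xs: List[Array[String]] = input.map(_.split(","))
  val start: State = (xs.head(3), List(List(xs.head(0))))

  // scanLeft keeps every intermediate state; its last element is
  // what foldLeft would have returned.
  val states: List[State] = xs.tail.scanLeft(start)(step)

  def main(args: Array[String]): Unit =
    states.foreach { case (cur, acc) =>
      println(s"price=$cur groups=${acc.reverse}")
    }
}
```

The carried price stays at 9.99 while ids 2 and 4 join the first group, then switches to 5.49 when id 3 opens the second group.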
This looks more complex than it is. If I were doing this for real, I'd do it as below: define a case class to hold the values of each line, with a method for "close enough in price".
    case class Line(id: Int, person: String, product: String, price: Double) {
      def closeEnough(other: Line) = math.abs(price - other.price) < 1.0
    }
Then make objects from the lines:
    val xs = input.map { l =>
      val cols = l.split(",")
      Line(cols(0).toInt, cols(1), cols(2), cols(3).toDouble)
    }
    // List(Line(1,john,iphone cover,9.99),
    //      Line(2,jack,iphone cover,9.99),
    //      Line(4,jill,samsung galaxy cover,9.95),
    //      Line(3,john,headphones,5.49),
    //      Line(5,bob,ipad cover,5.45))
Now fold, working with the Line objects:
    val groups = xs.tail.foldLeft(List(List(xs.head))) {
      case (acc, e) =>
        // acc.head.head is the first Line of the current group
        if (e.closeEnough(acc.head.head))
          (acc.head :+ e) :: acc.tail
        else
          List(e) :: acc
    }.reverse
And if you need to, convert the result to lists of lists of ids:
    groups.map(_.map(_.id))
    // List(List(1, 2, 4), List(3, 5))
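Putting the pieces together, here is a self-contained sketch of the case-class version. It assumes the rows are already available as a local List[String] in the original order (how they get there from Spark is outside this sketch), and the names `GroupByPrice`, `parse`, and `groupByPrice` are hypothetical. As a small variation, the fold starts from an empty list rather than from `xs.head`, so an empty input returns Nil instead of throwing.

```scala
object GroupByPrice {
  case class Line(id: Int, person: String, product: String, price: Double) {
    def closeEnough(other: Line): Boolean = math.abs(price - other.price) < 1.0
  }

  // Assumption: the sample rows as plain strings, in their original order.
  val input: List[String] = List(
    "1,john,iphone cover,9.99",
    "2,jack,iphone cover,9.99",
    "4,jill,samsung galaxy cover,9.95",
    "3,john,headphones,5.49",
    "5,bob,ipad cover,5.45"
  )

  // Assumes well-formed rows with exactly four comma-separated fields.
  def parse(row: String): Line = {
    val cols = row.split(",")
    Line(cols(0).toInt, cols(1), cols(2), cols(3).toDouble)
  }

  // Groups consecutive lines whose price is within 1.0 of the first
  // line of the current group.
  def groupByPrice(lines: List[Line]): List[List[Line]] =
    lines.foldLeft(List.empty[List[Line]]) {
      // current.head is the line that opened the current group
      case (current :: done, e) if e.closeEnough(current.head) =>
        (current :+ e) :: done
      // no group yet, or the price moved too far: open a new group
      case (acc, e) =>
        List(e) :: acc
    }.reverse

  def main(args: Array[String]): Unit = {
    val groups = groupByPrice(input.map(parse))
    println(groups.map(_.map(_.id)))
  }
}
```

Appending with `:+` copies the current group on every step, which is fine for small groups; for long runs, prepending and reversing each group at the end would be the usual alternative.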