PySpark reduceByKey on multiple values
If I have key-value pairs like:
(k, (v1, v2)), (k, (v3, v4))
how can I sum the values to get (k, (v1 + v3, v2 + v4))?
reduceByKey accepts a function that combines two values for the same key into one. Let a be an RDD of key-value pairs:
output = a.reduceByKey(lambda x, y: (x[0] + y[0], x[1] + y[1]))
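To check the reducer without a Spark cluster, here is a minimal sketch in plain Python. The names add_pairs and local_reduce_by_key are hypothetical helpers made up for this illustration; local_reduce_by_key only imitates what reduceByKey does on a small list and is not part of PySpark.

```python
from collections import defaultdict
from functools import reduce


def add_pairs(x, y):
    """Element-wise sum of two 2-tuples: (v1, v2) + (v3, v4) -> (v1 + v3, v2 + v4)."""
    return (x[0] + y[0], x[1] + y[1])


def local_reduce_by_key(pairs, func):
    """Hypothetical pure-Python stand-in for reduceByKey, for illustration only."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Fold each key's values pairwise with func, as reduceByKey does per key.
    return {key: reduce(func, values) for key, values in grouped.items()}


# Sample data (made up for the example).
data = [("k", (1, 2)), ("k", (3, 4)), ("j", (10, 20))]
result = local_reduce_by_key(data, add_pairs)
print(result)  # {'k': (4, 6), 'j': (10, 20)}
```

With a real RDD, the same function can be passed directly, e.g. a.reduceByKey(add_pairs). The function must return a tuple of the same shape as its inputs, because its result is fed back in as an argument when a key has more than two values.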