Apache Pig - Count the grouped records in a Pig query
I have the test data below.

    john,q1,correct
    jack,q1,wrong
    john,q2,correct
    jack,q2,wrong
    john,q3,wrong
    jack,q3,correct
    john,q4,wrong
    jack,q4,wrong
    john,q5,wrong
    jack,q5,wrong

I want to find the following:

    john wrong 4
    john correct 1
    jack wrong 3
    jack correct 2

My code:
    data = load '/stackoverflowq4.txt' using PigStorage(',') as (name:chararray, number:chararray, result:chararray);
    b = group data by (name, result);

Now the output looks like this:

    ((john,wrong),{(john,q5,wrong),(john,q4,wrong),(john,q2,wrong),(john,q1,wrong)})
    ((john,correct),{(john,q3,correct)})
    ((jack,wrong),{(jack,q5,wrong),(jack,q4,wrong),(jack,q3,wrong)})
    ((jack,correct),{(jack,q2,correct),(jack,q1,correct)})

How should I calculate the count of the grouped records?
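The COUNT function gives you the number of elements in a bag, which is exactly what you want. After grouping by user and result, you end up with one bag per (name, result) combination, and the number of tuples in that bag is the number of times the combination appeared.
Therefore, you only have to add one line: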
    data = load '/stackoverflowq4.txt' using PigStorage(',') as (name:chararray, number:chararray, result:chararray);
    b = group data by (name, result);
    c = foreach b generate flatten(group) as (name, result), COUNT(data) as count;
    dump c;

    (jack,wrong,4)
    (jack,correct,1)
    (john,wrong,3)
    (john,correct,2)

The flatten(group) is there because grouping generates a tuple containing the elements you grouped by, and from the looks of your desired output you don't want them inside a tuple. Without it, the output would look like ((jack,wrong),4).
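To make the semantics concrete outside of Pig, here is a minimal plain-Python sketch of what the group and COUNT steps do. It is only an emulation for illustration, not how Pig executes the script. The name `group_and_count` and the inlined `RAW` string are assumptions made for this example; the records are the test data from the question.

```python
from collections import defaultdict

# Test data from the question, one "name,number,result" record per line.
RAW = """\
john,q1,correct
jack,q1,wrong
john,q2,correct
jack,q2,wrong
john,q3,wrong
jack,q3,correct
john,q4,wrong
jack,q4,wrong
john,q5,wrong
jack,q5,wrong
"""


def group_and_count(lines):
    """Emulate `group data by (name, result)` followed by COUNT on each bag."""
    bags = defaultdict(list)
    for line in lines:
        name, number, result = line.split(",")
        # The group key is the (name, result) tuple; the bag collects the full records.
        bags[(name, result)].append((name, number, result))
    # COUNT(data) corresponds to the size of each bag.
    return {key: len(bag) for key, bag in bags.items()}


counts = group_and_count(RAW.splitlines())

# Unpacking the key here plays the role of flatten(group):
# each row prints as name, result, count instead of (name, result), count.
for (name, result), n in sorted(counts.items()):
    print(name, result, n)
# jack correct 1
# jack wrong 4
# john correct 2
# john wrong 3
```

The counts agree with the dump output above; only the row order differs, because this sketch sorts the keys before printing.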