Hadoop - Get counts by iterating over a data bag, with a different count condition for each value of a field
The data below has the schema: student_name, question_number, actual_result (either false or correct).
(b,q1,correct) (a,q1,false) (b,q2,correct) (a,q2,false) (b,q3,false) (a,q3,correct) (b,q4,false) (a,q4,false) (b,q5,false) (a,q5,false)
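What I want is a count for each student (a and b) of the total correct and false answers he or she has given.
For the use case shared, the Pig script below suffices. It groups the rows by student, then uses a nested FOREACH to filter each student's bag once per result value and count the matches.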
Pig script:
    -- Load the CSV with an explicit schema.
    student_data = LOAD 'student_data.csv' USING PigStorage(',')
        AS (student_name:chararray, question_number:chararray, actual_result:chararray);
    -- One bag of rows per student.
    student_data_grp = GROUP student_data BY student_name;
    -- Nested FOREACH: filter each student's bag per result value, then count.
    student_correct_answer_data = FOREACH student_data_grp {
        answers = student_data.actual_result;
        correct_answers = FILTER answers BY actual_result == 'correct';
        incorrect_answers = FILTER answers BY actual_result == 'false';
        GENERATE group AS student_name,
                 COUNT(correct_answers) AS correct_ans_count,
                 COUNT(incorrect_answers) AS incorrect_ans_count;
    };
Input: student_data.csv:
    b,q1,correct
    a,q1,false
    b,q2,correct
    a,q2,false
    b,q3,false
    a,q3,correct
    b,q4,false
    a,q4,false
    b,q5,false
    a,q5,false
Output (DUMP student_correct_answer_data;):
    -- schema: (student_name, correct_ans_count, incorrect_ans_count)
    (a,1,4)
    (b,2,3)
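To sanity-check the expected counts on a small sample without a Hadoop cluster, the same group-then-count logic can be sketched in plain Python. This is only an illustrative cross-check, not part of the Pig solution; the names ROWS and count_results are hypothetical, and the rows are copied from the sample input above.

```python
from collections import Counter, defaultdict

# Sample rows copied from student_data.csv above (hypothetical inline copy).
ROWS = [
    "b,q1,correct", "a,q1,false",
    "b,q2,correct", "a,q2,false",
    "b,q3,false",   "a,q3,correct",
    "b,q4,false",   "a,q4,false",
    "b,q5,false",   "a,q5,false",
]


def count_results(lines):
    """Return {student_name: (correct_count, false_count)}."""
    # One Counter per student plays the role of the grouped bag in Pig.
    tallies = defaultdict(Counter)
    for line in lines:
        student, _question, result = line.strip().split(",")
        tallies[student][result] += 1
    # A Counter returns 0 for a missing key, so a student with no
    # 'correct' (or no 'false') answers still gets a zero count.
    return {
        student: (counts["correct"], counts["false"])
        for student, counts in sorted(tallies.items())
    }


if __name__ == "__main__":
    for student, (correct, wrong) in count_results(ROWS).items():
        print((student, correct, wrong))
```

Unlike the Pig version, which filters the bag once per result value, this sketch tallies every value in a single pass, so adding a third result value would need no extra filter.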
Ref: see the Pig documentation on nested FOREACH for more details.