hadoop - Count values per group by iterating over a data bag in Pig, with a separate count for each value of a field -


The data below has the schema: student_name, question_number, actual_result (either false or correct).

(b,q1,correct)
(a,q1,false)
(b,q2,correct)
(a,q2,false)
(b,q3,false)
(a,q3,correct)
(b,q4,false)
(a,q4,false)
(b,q5,false)
(a,q5,false)

The goal is to count, for each student (a and b), the total number of correct and false answers he/she has given.

For the use case shared, the Pig script below suffices.

pig script :

-- load the CSV with an explicit schema
student_data = LOAD 'student_data.csv' USING PigStorage(',') AS (student_name:chararray, question_number:chararray, actual_result:chararray);

-- one group (bag of rows) per student
student_data_grp = GROUP student_data BY student_name;

-- nested foreach: filter each student's bag twice, then count both results
student_correct_answer_data = FOREACH student_data_grp {
    answers = student_data.actual_result;
    correct_answers = FILTER answers BY actual_result == 'correct';
    incorrect_answers = FILTER answers BY actual_result == 'false';
    GENERATE group AS student_name, COUNT(correct_answers) AS correct_ans_count, COUNT(incorrect_answers) AS incorrect_ans_count;
};

input : student_data.csv :

b,q1,correct
a,q1,false
b,q2,correct
a,q2,false
b,q3,false
a,q3,correct
b,q4,false
a,q4,false
b,q5,false
a,q5,false

output : dump student_correct_answer_data :

-- schema : (student_name, correct_ans_count, incorrect_ans_count)
(a,1,4)
(b,2,3)
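As a possible alternative to the nested foreach, the same per-student counts can be sketched by turning each row into 0/1 flags with the conditional (bincond) operator and summing those flags per group. This is only a sketch under the same input file and schema as above; the relation names flagged, by_student and answer_counts and the fields is_correct and is_incorrect are made up for illustration and are not from the original script.

```pig
-- Same input and schema as the script above.
student_data = LOAD 'student_data.csv' USING PigStorage(',')
    AS (student_name:chararray, question_number:chararray, actual_result:chararray);

-- Tag every row with 0/1 flags (hypothetical field names) so SUM can do the counting.
flagged = FOREACH student_data GENERATE
    student_name,
    (actual_result == 'correct' ? 1 : 0) AS is_correct,
    (actual_result == 'false' ? 1 : 0) AS is_incorrect;

-- One group per student.
by_student = GROUP flagged BY student_name;

-- Summing the flags gives the number of rows that matched each condition.
answer_counts = FOREACH by_student GENERATE
    group AS student_name,
    SUM(flagged.is_correct) AS correct_ans_count,
    SUM(flagged.is_incorrect) AS incorrect_ans_count;

DUMP answer_counts;
```

For the sample input this should give the same counts per student as the nested foreach version. The nested form is likely easier to extend when the per-value logic grows beyond a simple equality test, while the flag-and-sum form keeps every step a plain top-level statement.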

ref : more details on nested foreach

  1. http://pig.apache.org/docs/r0.12.0/basic.html#foreach
  2. http://chimera.labs.oreilly.com/books/1234000001811/ch06.html#more_on_foreach
