python - Pandas: creating dataframe rows from other dataframe information


I'm working with aggregated data that I need to disaggregate in order to process it further. The original df contains a value 'no. of students' per row, and I need 1 row in the new df per student:

original df:

                 faculty a   faculty b   faculty x
male students            2           7         ...
female students          4           3         ...

new df:

no.   gender   faculty   ...
1     m        a
2     m        a
3     f        a

and so on. The original df contains more information (like nationality and regional info), which should be dealt with the same way as gender, etc. I'd start by transposing (df.T), and then the fun begins... I'm quite a beginner, so any pointer is welcome.

I think the easiest way to "disaggregate" the data is to use a generator expression to enumerate the desired rows:

(key for key, val in series.items() for i in range(val))

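import re
import pandas as pd

df = pd.DataFrame({'faculty a': [2, 4], 'faculty b': [7, 3]},
                  index=['male students', 'female students'])
# Shorten the labels: 'faculty a' -> 'a', 'male students' -> 'm', etc.
df.columns = [re.sub(r'faculty ', '', col) for col in df.columns]
df.index = ['m', 'f']
# stack() gives a Series keyed by (gender, faculty) tuples.
series = df.stack()
# Emit each (gender, faculty) key once per student counted under it.
df = pd.DataFrame(
    (key for key, val in series.items() for i in range(val)),
    columns=['gender', 'faculty'])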

yields

   gender faculty
0       m       a
1       m       a
2       m       b
3       m       b
4       m       b
5       m       b
6       m       b
7       m       b
8       m       b
9       f       a
10      f       a
11      f       a
12      f       a
13      f       b
14      f       b
15      f       b
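As a possible alternative to the generator expression, the same rows can be produced by repeating row labels with Index.repeat. This is a different technique, sketched here on the same example counts. The column names `gender`, `faculty` and `count` and the variable names are chosen purely for illustration.

```python
import pandas as pd

# Same example counts as above, with the labels already shortened.
counts = pd.DataFrame({'a': [2, 4], 'b': [7, 3]}, index=['m', 'f'])

# Long format: one row per (gender, faculty) pair plus its count.
long = counts.stack().reset_index()
long.columns = ['gender', 'faculty', 'count']

# Repeat each row label 'count' times, then keep only the descriptive columns.
students = (long.loc[long.index.repeat(long['count']), ['gender', 'faculty']]
            .reset_index(drop=True))

print(len(students))  # 16 rows, one per student
```

Because the repetition is done on whole rows, further columns in the long table (nationality, region and so on) would be carried along without extra code, assuming they are present as ordinary columns.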

PS. The above shows that it is possible to "disaggregate" the data, but are you sure you want to do that? Disaggregation seems rather inefficient. If one of the values is a million, you would end up with a million duplicate rows...

Instead of disaggregating, you might be better off finding a way to perform your computation on the aggregated data.
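To illustrate what working on the aggregated data could look like, here is a minimal sketch that computes a few summary figures directly from the count table. The question does not say which computation is actually needed, so the statistics below (totals and the share of female students per faculty) are only assumed examples.

```python
import pandas as pd

# Aggregated counts: rows are genders, columns are faculties.
counts = pd.DataFrame({'a': [2, 4], 'b': [7, 3]}, index=['m', 'f'])

# Totals come straight from sums over the count table.
students_per_gender = counts.sum(axis=1)    # m: 9, f: 7
students_per_faculty = counts.sum(axis=0)   # a: 6, b: 10

# Share of female students in each faculty, without building one row per student.
female_share = counts.loc['f'] / students_per_faculty   # a: 4/6, b: 3/10

print(students_per_gender)
print(female_share)
```

The cost of these operations depends on the number of cells in the table, not on the number of students, so a count of a million is no more expensive than a count of two.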

