python - Pandas: creating dataframe rows from other dataframe information
I'm working with aggregated data, which I need to dis-aggregate in order to process it further. The original df contains a value 'no. of students' per row, and I need one row in the new df per student:
Original df:
                     faculty a  faculty b  faculty x
    male students            2          7        ...
    female students          4          3        ...
New df:
    no.  gender  faculty  ...
    1    m       a
    2    m       a
    3    f       a
And so on. The original df contains more information (like nationality and regional info), which should be dealt with in the same way as gender, etc. I'd start by transposing (df.T), but then the fun begins... I'm quite a beginner, so any pointer is welcome.
I think the easiest way to "disaggregate" the data is to use a generator expression to enumerate the desired rows:
    (key for key, val in series.items() for _ in range(val))
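    import re
    import pandas as pd

    # Aggregated counts: one row per gender, one column per faculty.
    df = pd.DataFrame({'faculty a': [2, 4], 'faculty b': [7, 3]},
                      index=['male students', 'female students'])
    df.columns = [re.sub(r'faculty ', '', col) for col in df.columns]
    df.index = ['m', 'f']
    # stack() gives a Series indexed by (gender, faculty) holding the counts.
    series = df.stack()
    # Emit each (gender, faculty) key once per counted student.
    # Series.iteritems() was removed in pandas 2.0; items() is the equivalent.
    df = pd.DataFrame(
        (key for key, val in series.items() for _ in range(val)),
        columns=['gender', 'faculty'])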
yields
       gender faculty
    0       m       a
    1       m       a
    2       m       b
    3       m       b
    4       m       b
    5       m       b
    6       m       b
    7       m       b
    8       m       b
    9       f       a
    10      f       a
    11      f       a
    12      f       a
    13      f       b
    14      f       b
    15      f       b
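The same expansion can also be done without a Python-level loop. The following is only a sketch of one alternative, not the method from the answer above: it repeats row labels with `Index.repeat` and selects them with `.loc`. The names `counts`, `tidy`, `rows` and the helper column `n` are made up for illustration.

```python
import pandas as pd

# Aggregated counts, already using the short labels from the example above.
counts = pd.DataFrame({'a': [2, 4], 'b': [7, 3]}, index=['m', 'f'])

# Reshape to one row per (gender, faculty) pair with its count in column 'n'.
tidy = (counts.stack()
              .rename_axis(['gender', 'faculty'])
              .reset_index(name='n'))

# Repeat each row label 'n' times, then keep only the descriptive columns.
rows = (tidy.loc[tidy.index.repeat(tidy['n']), ['gender', 'faculty']]
            .reset_index(drop=True))

print(len(rows))  # one row per student
```

Further attributes such as nationality would presumably become extra index levels or columns in `tidy`; the repeat step itself would stay the same.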
PS. The above shows that it is possible to "disaggregate" the data, but are you sure you want to do that? Disaggregation seems rather inefficient. If one of the values is a million, you would end up with a million duplicate rows...
Instead of disaggregating, you might be better off finding a way to perform your computation on the aggregated data.
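As a rough illustration of working on the aggregated data directly, here is a sketch that computes totals and the share of female students per faculty from the counts table itself. Which computation is actually needed is not stated in the question, so the quantities below (and the names `counts`, `totals`, `female_share`) are assumptions chosen for the example.

```python
import pandas as pd

# Same aggregated counts as in the example above (rows: gender, columns: faculty).
counts = pd.DataFrame({'a': [2, 4], 'b': [7, 3]}, index=['m', 'f'])

# Students per faculty: sum down each column.
totals = counts.sum()

# Share of female students per faculty, computed straight from the counts.
female_share = counts.loc['f'] / totals

# Overall number of students, without ever building one row per student.
n_students = int(counts.to_numpy().sum())

print(totals)
print(female_share)
print(n_students)
```

The cost of this version depends only on the size of the counts table, not on how many students each cell represents.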