python - How to apply functions with multiple arguments on Pandas selected columns data frame -


I have the following data frame:

    import pandas as pd

    data = {'gene': ['a', 'b', 'c', 'd', 'e'],
            'count': [61, 320, 34, 14, 33],
            'gene_length': [152, 86, 92, 170, 111]}
    df = pd.DataFrame(data)
    # Fix the column order explicitly
    df = df[["gene", "count", "gene_length"]]

It looks like this:

    In [9]: df
    Out[9]:
      gene  count  gene_length
    0    a     61          152
    1    b    320           86
    2    c     34           92
    3    d     14          170
    4    e     33          111

What I want to do is apply this function:

    def calculate_rpkm(thec, then, thel):
        """
        thec  == total reads mapped to a feature (gene/linc)
        thel  == length of feature (gene/linc)
        then  == total reads mapped
        """
        rpkm = float((10**9) * thec)/(then * thel)
        return rpkm

to the count and gene_length columns, together with a constant n=12345, and name the new result column 'rpkm'. Why does the following fail?

    n = 12345
    df["rpkm"] = calculate_rpkm(df['count'], n, df['gene_length'])

What's the right way to do it? The first row should look like this:

      gene  count  gene_length       rpkm
    0    a     61          152  32508.366

Update: the error I got is this:

    --------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-4-6270e1d19b89> in <module>()
    ----> 1 df["rpkm"] = calculate_rpkm(df['count'],n,df['gene_length'])

    <ipython-input-1-48e311ca02f3> in calculate_rpkm(thec, then, thel)
         13     then  == total reads mapped
         14     """
    ---> 15     rpkm = float((10**9) * thec)/(then * thel)
         16     return rpkm

    /u21/coolme/.anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in wrapper(self)
         74             return converter(self.iloc[0])
         75         raise TypeError(
    ---> 76             "cannot convert the series to {0}".format(str(converter)))
         77     return wrapper
         78

If you don't cast to float in your method, it works fine:

    In [9]:
    def calculate_rpkm(thec, then, thel):
        """
        thec  == total reads mapped to a feature (gene/linc)
        thel  == length of feature (gene/linc)
        then  == total reads mapped
        """
        # No float() cast: the arithmetic operators work element-wise on a Series
        rpkm = ((10**9) * thec)/(then * thel)
        return rpkm

    n = 12345
    df["rpkm"] = calculate_rpkm(df['count'], n, df['gene_length'])
    df

    Out[9]:
      gene  count  gene_length           rpkm
    0    a     61          152   32508.366908
    1    b    320           86  301411.926493
    2    c     34           92   29936.429112
    3    d     14          170    6670.955138
    4    e     33          111   24082.405613

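The error message is telling you that you cannot cast a pandas Series to a float: float() expects a single value, not a whole column. Whilst you could call apply to run your method row-wise, you should look at rewriting your method so that it can work on the entire Series. That version is vectorised and much faster than calling apply, which is essentially a for loop.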
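To see the failure in isolation, here is a minimal sketch. The three-element Series `counts` is a made-up stand-in for `df['count']`, and the variable names are only for illustration. It shows the built-in `float()` rejecting a multi-element Series, while `astype(float)` converts every element.

```python
import pandas as pd

# Hypothetical stand-in for df['count']; not part of the original question.
counts = pd.Series([61, 320, 34])

cast_error = None
try:
    # float() wants one scalar, so a Series with several elements is rejected.
    float(counts)
except TypeError as exc:
    cast_error = exc
    print("float(Series) failed:", exc)

# Element-wise alternative: astype(float) returns a new Series of floats.
as_float = counts.astype(float)
print(as_float.dtype)
```

The same idea applies inside `calculate_rpkm`: either drop the cast entirely or cast the column with `astype` before passing it in.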

Timings

    In [11]:
    def calculate_rpkm1(thec, then, thel):
        """
        thec  == total reads mapped to a feature (gene/linc)
        thel  == length of feature (gene/linc)
        then  == total reads mapped
        """
        # Vectorised version: no cast, works on whole Series
        rpkm = ((10**9) * thec)/(then * thel)
        return rpkm

    def calculate_rpkm(thec, then, thel):
        """
        thec  == total reads mapped to a feature (gene/linc)
        thel  == length of feature (gene/linc)
        then  == total reads mapped
        """
        # Scalar version: float() only accepts a single value
        rpkm = float((10**9) * thec)/(then * thel)
        return rpkm

    n = 12345

    %timeit calculate_rpkm1(df['count'], n, df['gene_length'])
    %timeit df[['count', 'gene_length']].apply(lambda x: calculate_rpkm(x[0], n, x[1]), axis=1)

    1000 loops, best of 3: 238 µs per loop
    100 loops, best of 3: 1.5 ms per loop

You can see that the non-casting version is over 6x faster here (238 µs versus 1.5 ms), and the gap should widen on larger datasets.
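If you want to check this on a larger frame yourself, the following is a rough sketch rather than a benchmark. The 5,000-row random frame `big`, the helper names `rpkm_vectorised` and `rpkm_scalar`, and the repeat count are all assumptions made for illustration. Actual timings depend on your machine and pandas version, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd


def rpkm_vectorised(count, total, length):
    # Works on scalars and on whole Series alike.
    return (10**9 * count) / (total * length)


def rpkm_scalar(count, total, length):
    # Mirrors the casting version: only valid for one row at a time.
    return float(10**9 * count) / (total * length)


# Hypothetical larger data set with random counts and lengths.
rng = np.random.default_rng(0)
rows = 5_000
big = pd.DataFrame({
    "count": rng.integers(1, 1_000, size=rows),
    "gene_length": rng.integers(50, 5_000, size=rows),
})
n = 12345

vectorised = rpkm_vectorised(big["count"], n, big["gene_length"])
row_wise = big.apply(
    lambda row: rpkm_scalar(row["count"], n, row["gene_length"]), axis=1
)

# Both approaches should agree before their speed is compared.
assert np.allclose(vectorised, row_wise)

t_vec = timeit.timeit(
    lambda: rpkm_vectorised(big["count"], n, big["gene_length"]), number=5
)
t_apply = timeit.timeit(
    lambda: big.apply(
        lambda row: rpkm_scalar(row["count"], n, row["gene_length"]), axis=1
    ),
    number=5,
)
print(f"vectorised: {t_vec:.4f} s, apply: {t_apply:.4f} s (5 runs each)")
```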

update

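If you still want an explicit cast, convert the column with astype(float) before passing it in. Combined with the non-casting version of your method, the following is semantically equivalent: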

    df['rpkm'] = calculate_rpkm1(df['count'].astype(float), n, df['gene_length'])
    df

    Out[16]:
      gene  count  gene_length           rpkm
    0    a     61          152   32508.366908
    1    b    320           86  301411.926493
    2    c     34           92   29936.429112
    3    d     14          170    6670.955138
    4    e     33          111   24082.405613
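A quick way to convince yourself of the equivalence is to compute the column both ways and compare. This sketch rebuilds the example frame and assumes Python 3, where `/` on integer Series already performs true division; the names `without_cast`, `with_cast` and `same` are only illustrative.

```python
import numpy as np
import pandas as pd


def calculate_rpkm1(thec, then, thel):
    # Non-casting version from the answer above.
    return ((10**9) * thec) / (then * thel)


df = pd.DataFrame({
    "gene": ["a", "b", "c", "d", "e"],
    "count": [61, 320, 34, 14, 33],
    "gene_length": [152, 86, 92, 170, 111],
})
n = 12345

without_cast = calculate_rpkm1(df["count"], n, df["gene_length"])
with_cast = calculate_rpkm1(df["count"].astype(float), n, df["gene_length"])

# Compare the two result columns numerically.
same = np.allclose(without_cast, with_cast)
print(same, without_cast.dtype, with_cast.dtype)
```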
