r - cumsum-like function for splitting a data frame


Given the following data frame:

mydf <- data.frame(x=c(1:10,10:1),y=c(10:1,1:10)) 

How can I split it so that each sub-data-frame contains consecutive rows in which the values of one column are greater than those of the other column?

For example, in mydf, the outcome I am hoping for is a split into 3 data frames:

  1. (y > x; should contain first 5 rows of mydf)
  2. (x > y; should contain rows 6 to 15 of mydf)
  3. (y > x again; should contain last 5 rows of mydf)

I tried the following code, but it produced bad results: every row where x > y is split off into its own group, and the last group mixes one x > y row (its first row) with the y > x rows that follow it:

split(mydf, cumsum(mydf$x > mydf$y)) 
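To see why this grouping goes wrong, it can help to print the grouping vector itself. The sketch below uses the mydf from the question; the name g is only for illustration. cumsum() increments on every TRUE, so the group number changes on each row where x > y instead of only where the condition flips.

```r
mydf <- data.frame(x = c(1:10, 10:1), y = c(10:1, 1:10))

# cumsum() adds 1 for every TRUE, so every row with x > y opens a new group
g <- cumsum(mydf$x > mydf$y)
g
# 0 0 0 0 0 1 2 3 4 5 6 7 8 9 10 10 10 10 10 10

# number of distinct groups produced (the goal was 3)
length(unique(g))
```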

Another, less elegant approach I tried was sapply with individual ifs inside the split function, but I don't want to go down that path because of performance issues.

Try

rl <- with(mydf, rle(x > y))
grp <- inverse.rle(within.list(rl, values <- seq_along(values)))
split(mydf, grp)
#$`1`
#  x  y
#1 1 10
#2 2  9
#3 3  8
#4 4  7
#5 5  6
#
#$`2`
#    x y
#6   6 5
#7   7 4
#8   8 3
#9   9 2
#10 10 1
#11 10 1
#12  9 2
#13  8 3
#14  7 4
#15  6 5
#
#$`3`
#   x  y
#16 5  6
#17 4  7
#18 3  8
#19 2  9
#20 1 10
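The intermediate objects in the rle approach can be inspected step by step. This is a sketch on the same mydf; it assigns to rl$values directly rather than using within.list, which should give the same grouping. rle() collapses the logical vector into runs, the run values are replaced by run numbers, and inverse.rle() expands them back to one id per row.

```r
mydf <- data.frame(x = c(1:10, 10:1), y = c(10:1, 1:10))

rl <- with(mydf, rle(x > y))
rl$lengths   # 5 10 5: three runs
rl$values    # FALSE TRUE FALSE

# replace each run's value with its position (1, 2, 3), then expand per row
rl$values <- seq_along(rl$values)
grp <- inverse.rle(rl)

# rows per group after splitting
sizes <- sapply(split(mydf, grp), nrow)
sizes
```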

Or

group <- with(mydf, cumsum(c(1, abs(diff(x > y)))))
split(mydf, group)
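Breaking the one-liner apart may make it easier to follow. In this sketch (cond and change are illustrative names), diff() on the logical vector is nonzero only where the condition flips between neighbouring rows, abs() turns both kinds of flip into 1, and cumsum() then counts the flips, starting from 1 for the first row.

```r
mydf <- data.frame(x = c(1:10, 10:1), y = c(10:1, 1:10))

cond <- mydf$x > mydf$y
change <- abs(diff(cond))   # 1 where the condition flips, 0 otherwise
which(change == 1)          # 5 15: flips happen after rows 5 and 15

# the leading 1 gives the first row a group; each flip starts the next group
group <- cumsum(c(1, change))
unique(group)               # 1 2 3
```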

Or you can use rleid from the devel version of data.table (from @David Arenburg's comments), i.e. v1.9.5. Instructions to install are here

library(data.table)
split(mydf, rleid(with(mydf, y > x)))
