R: cumsum-like function for splitting a data frame
Given the following data frame:
mydf <- data.frame(x=c(1:10,10:1),y=c(10:1,1:10))
How is it possible to split it such that each sub-data-frame holds consecutive rows in which the values of one column are greater than those of the other column?
For example, in mydf the outcome I am hoping for is a split into 3 data frames:
- y > x (should contain the first 5 rows of mydf)
- x > y (should contain rows 6 to 15 of mydf)
- y > x again (should contain the last 5 rows of mydf)
I tried using the following code, but it produced bad results, as each y > x is split individually; moreover, the data frames with x > y contain a y > x row in the first row:
split(mydf, cumsum(mydf$x > mydf$y))
Another, less elegant approach I tried was to sapply individual ifs inside the split function, but I don't want to go down that path because of performance issues.
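To see where the attempt goes wrong, it can help to print the grouping vector that gets passed to split. This is only a small base R sketch on the mydf from the question; the name bad_grp is made up for illustration.

```r
mydf <- data.frame(x = c(1:10, 10:1), y = c(10:1, 1:10))

# cumsum() adds 1 for every TRUE row, not for every change in the comparison,
# so each TRUE row opens a new group and the FALSE rows that follow it
# are attached to the most recent TRUE row
bad_grp <- cumsum(mydf$x > mydf$y)
bad_grp
#  [1]  0  0  0  0  0  1  2  3  4  5  6  7  8  9 10 10 10 10 10 10
```

What is needed instead is a counter that only increments where the result of the comparison changes from one row to the next.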
Try
rl <- with(mydf, rle(x > y))
grp <- inverse.rle(within.list(rl, values <- seq_along(values)))
split(mydf, grp)
#$`1`
#  x  y
#1 1 10
#2 2  9
#3 3  8
#4 4  7
#5 5  6
#$`2`
#    x y
#6   6 5
#7   7 4
#8   8 3
#9   9 2
#10 10 1
#11 10 1
#12  9 2
#13  8 3
#14  7 4
#15  6 5
#$`3`
#   x  y
#16 5  6
#17 4  7
#18 3  8
#19 2  9
#20 1 10
Or
group <- with(mydf, cumsum(c(1, abs(diff(x > y)))))
split(mydf, group)
Or you can use rleid from the devel version of data.table (from @David Arenburg's comments), i.e. v1.9.5. Instructions to install it are here.
library(data.table)
split(mydf, rleid(with(mydf, y > x)))
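For anyone wondering how the two base R approaches work, here is a rough step-by-step sketch. It builds both grouping vectors on the same mydf and checks that they agree. The names gt, grp_rle and grp_diff are only illustrative, and rl$values is assigned directly as an alternative to the within.list call.

```r
mydf <- data.frame(x = c(1:10, 10:1), y = c(10:1, 1:10))
gt <- mydf$x > mydf$y

# Approach 1: run-length encode the comparison, renumber the runs 1, 2, 3, ...
# and expand the run numbers back to one value per row
rl <- rle(gt)
rl$values <- seq_along(rl$values)
grp_rle <- inverse.rle(rl)

# Approach 2: diff() is non-zero exactly where the comparison flips,
# so the cumulative sum of those flips (starting at 1) numbers the runs
grp_diff <- cumsum(c(1, abs(diff(gt))))

# Both approaches assign the same group to every row
stopifnot(all(grp_rle == grp_diff))

# Number of rows in each resulting piece
sizes <- sapply(split(mydf, grp_diff), nrow)
sizes
#  1  2  3
#  5 10  5
```

Both vectors number the runs of consecutive equal comparison results, which is also what rleid does according to the answer above.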