In R, for all columns count factors/characters occurence, group by key -
this question needs solving data.table
or dplyr
. have dataset (data frame) looks this:
summary(mooc_events) signup_id time source event min. : 1 min. :2013-10-27 browser:3869940 access :3112191 1st qu.: 18721 1st qu.:2013-12-19 server :4287337 discussion: 649259 median : 48331 median :2014-05-30 navigate :1009309 mean : 63476 mean :2014-04-05 page_close:1237883 3rd qu.:110375 3rd qu.:2014-06-15 problem :1261170 max. :200905 max. :2014-08-01 video : 796958 wiki : 90507 artefact_sha length:8157277 class :character mode :character
one signup_id have multiple events, there many rows starts same signup_id.
what i'm trying achieve aggregated dataset (data.table or data frame) columns there distinct values per particular column, grouped signup_id
, data this:
signup_id, source_browser, source_server, event_access, event_discussion, ... , event_wiki, artefact_sha_{first_element_in_whole_dataset}, ..., artefact_sha_{last_element_in_whole_dataset} 1, 23, 37, 9, 0, ..., 3, 7, ..., 1 2, 2, 7, 2, 2, ..., 1, 0, ..., 0
in other words, it's counting occurrences given set of columns, grouped single column signup_id
, i'm not interesting in grouping e.g. both signup_id , source.
column naming not strict (_
can substituted makes sense).
(let's skip time column @ moment)
best regards , upfront.
it's more reshaping problem solved using tidyr , reshape2 libraries.
reshaping tidyr , counting occurences reshape2 :
my example doesn't include artefact_sha
because didn't understand want it.
library(dplyr) # or library(magrittr) pipe syntax library(tidyr) library(reshape2) set.seed(42) mooc_events <- data.frame(signup_id = rep(1:3, each = 5), time = sys.date(), source = sample(c("browser", "server"), 15, rep = true), event = sample(c("access", "discussion", "navigate"), 15, rep = true), stringsasfactors = false) mooc_events.m <- mooc_events %>% gather(key, value, -c(signup_id, time)) %>% unite(var, key, value, sep = "_") mytable <- dcast(mooc_events.m, signup_id ~ var, fun.aggregate = length) > mytable signup_id event_access event_discussion event_navigate source_browser source_server 1 1 1 2 2 1 4 2 2 2 0 3 1 4 3 3 0 3 2 3 2
Comments
Post a Comment