r - caret with doSNOW not using parallel processing on a particular set of data


I have 2 data frames (here, for reproducibility), trainfin1 and trainfin2, both sampled from the same bigger dataset.

I'm trying to run a cross-validated rpart on them using caret across multiple processors with the doSNOW package.

Interestingly, trainfin1 trained nicely across 4 processors (finishing in 25 seconds). trainfin2 seems stuck on 1 processor (as observed in the Windows Task Manager window), and I never see it finish processing, even after half an hour.

My code is below:

    require(caret)
    require(rpart)
    load("trainfin.rdata")

    # 5-fold cross-validation, repeated 5 times
    fitcontrol <- trainControl(method = "repeatedcv", number = 5, repeats = 5)

    # set up parallel processing
    require(doSNOW)
    cl <- makeCluster(4, type = "SOCK")
    registerDoSNOW(cl)

    # train (non-formula interface: predictors passed as x, outcome as y)
    set.seed(12345)
    firstset <- train(x = trainfin1[, names(trainfin1) != "happiness"],
                      y = trainfin1$happiness,
                      method = "rpart2", trControl = fitcontrol)

    set.seed(12345)
    secondset <- train(x = trainfin2[, names(trainfin2) != "happiness"],
                       y = trainfin2$happiness,
                       method = "rpart2", trControl = fitcontrol)

    stopCluster(cl)

Do note that I avoided the formula interface in train and instead fed in the raw data, to stop caret from converting the ordinal variables into dummy categorical variables (see the answer to this question). When I used the formula (i.e. train(happiness ~ ., data = trainfin2, method = "rpart2", trControl = fitcontrol)), there seems to be no issue with parallel processing. However, I want to avoid using the formula, as per the other question.

Any suggestions on how I can process the data in parallel without converting the predictors into categorical dummies?
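To make the difference between the two interfaces concrete, here is a minimal sketch in base R. It assumes that the formula interface expands predictors roughly the way `model.matrix` does. The `toy` data frame and its `income` and `age` columns are hypothetical and only stand in for the real trainfin2 predictors.

```r
# Hypothetical toy data; column names are illustrative, not taken from trainfin2.
toy <- data.frame(
  happiness = c(1, 0, 1, 0),
  income = factor(c("low", "mid", "high", "mid"),
                  levels = c("low", "mid", "high"), ordered = TRUE),
  age = c(23, 35, 41, 52)
)

# Formula-style expansion: the ordered factor is replaced by numeric
# contrast columns, so the model no longer sees a single factor predictor.
expanded <- model.matrix(happiness ~ ., data = toy)
print(colnames(expanded))

# Non-formula style (as in the x/y call above): the predictors are passed
# through unchanged, so the ordered factor is preserved.
x <- toy[, names(toy) != "happiness"]
print(sapply(x, function(col) class(col)[1]))
```

Comparing the two printed results on a small subset of the real data may help show which columns are treated differently by the two calls.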

