r - caret with SNOW not using parallel processing on a particular set of data
I have 2 data frames (here for reproducibility), trainfin1 and trainfin2, both sampled from the same bigger dataset.
I'm trying to run a cross-validated rpart on them using caret on multiple processors with the doSNOW package.
Interestingly, trainfin1 trained nicely across 4 processors (finishing in 25 seconds). trainfin2 seems stuck on 1 processor (observed in the Windows Task Manager window), and I never see it finish processing, even after half an hour.
My code is below:
    require(caret)
    require(rpart)
    load("trainfin.rdata")

    fitcontrol <- trainControl(method = "repeatedcv", number = 5, repeats = 5)

    # set up parallel processing
    require(doSNOW)
    cl <- makeCluster(4, type = "SOCK")
    registerDoSNOW(cl)

    # train on the first sample
    set.seed(12345)
    firstset <- train(x = trainfin1[, names(trainfin1) != "happiness"],
                      y = trainfin1$happiness,
                      method = "rpart2",
                      trControl = fitcontrol)

    # train on the second sample
    set.seed(12345)
    secondset <- train(x = trainfin2[, names(trainfin2) != "happiness"],
                       y = trainfin2$happiness,
                       method = "rpart2",
                       trControl = fitcontrol)

    stopCluster(cl)
Do note that I avoided the use of a formula in train and instead fed in the raw data, to avoid caret converting ordinal variables to dummy categorical variables (see the answer to this question). When I used a formula (i.e. train(happiness ~ ., data = trainfin2, method = "rpart2", trControl = fitcontrol)), there seems to be no issue with parallel processing. But I want to avoid using the formula, as per the other question.
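For context, the conversion I'm trying to avoid can be reproduced in base R. This is only a minimal sketch: the toy data frame and its columns (income, age) are made up for illustration and are not part of my real data, and I'm assuming caret's formula method expands factors the way model.matrix does, which is my understanding rather than something I've checked in its source.

```r
# Toy data frame (hypothetical, not the real trainfin data):
# one ordered factor and one numeric predictor.
toy <- data.frame(
  income = factor(c("low", "mid", "high", "mid"),
                  levels = c("low", "mid", "high"),
                  ordered = TRUE),
  age = c(23, 35, 47, 51)
)

# A formula-style expansion goes through a model matrix, which replaces
# the single ordered factor with numeric contrast columns.
expanded <- model.matrix(~ ., data = toy)
print(colnames(expanded))

# Passing the data frame itself (the x/y style) leaves the column as an
# ordered factor, so a tree can split on its ordering directly.
print(class(toy$income))
```

The printed column names no longer include a plain income column, whereas the data frame still holds it as a single ordered factor. That difference is the reason I'm using the x/y interface.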
Any suggestions on how I can parallel-process the data without converting the predictors to categorical dummies?