R - caret with doSNOW not using parallel processors on a particular set of data
I have two data frames (here for reproducibility), trainfin1 and trainfin2, both sampled from the same bigger dataset.
I'm trying to run a cross-validated rpart on them using caret on a multiprocessor machine, with the doSNOW package.
Interestingly, trainfin1 trained nicely across 4 processors (finishing in 25 seconds). trainfin2, however, seems stuck on 1 processor (as observed in the Windows Task Manager window), and I never see it finish processing, even after half an hour.
My code is below:
```r
require(caret)
require(rpart)

load("trainfin.rdata")

fitcontrol <- trainControl(method = "repeatedcv",
                           number = 5,
                           repeats = 5)

# set up parallel processing
require(doSNOW)
cl <- makeCluster(4, type = "SOCK")
registerDoSNOW(cl)

# train
set.seed(12345)
firstset <- train(x = trainfin1[, names(trainfin1) != "happiness"],
                  y = trainfin1$happiness,
                  method = "rpart2",
                  trControl = fitcontrol)

set.seed(12345)
secondset <- train(x = trainfin2[, names(trainfin2) != "happiness"],
                   y = trainfin2$happiness,
                   method = "rpart2",
                   trControl = fitcontrol)

stopCluster(cl)
```

Do note that I avoided using a formula in train and instead fed it the raw data, to stop caret from converting ordinal variables into dummy categorical variables (see the answer to this question). When I used the formula (i.e. train(happiness ~ ., data = trainfin2, method = "rpart2", trControl = fitcontrol)), there seemed to be no issue with parallel processing. However, I want to avoid using the formula, as per the other question.
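To make the dummy-variable concern concrete, here is a minimal base-R sketch on made-up toy data (the data frame df and its columns are hypothetical, not taken from trainfin.rdata). It assumes, as I understand caret's formula method, that the formula path builds its predictors with model.matrix(), which expands an ordered factor into polynomial-contrast columns; the data frame handed to the x argument, by contrast, still holds a single ordered-factor column.

```r
# Hypothetical toy data: one ordered-factor predictor and a numeric outcome
df <- data.frame(
  happiness = c(1, 2, 3, 4),
  income = factor(c("low", "mid", "high", "mid"),
                  levels = c("low", "mid", "high"),
                  ordered = TRUE)
)

# Formula-style expansion: model.matrix() applies the default contrasts,
# which for an ordered factor is contr.poly (linear + quadratic terms here)
mm <- model.matrix(happiness ~ ., data = df)
print(colnames(mm))

# x/y-style input: drop = FALSE keeps a data frame even with one predictor,
# and the predictor remains one ordered-factor column
x <- df[, names(df) != "happiness", drop = FALSE]
print(class(x$income))
```

With the default contrasts, the formula route turns the single 3-level income column into income.L and income.Q, while the x/y route keeps income as one column.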
Any suggestions on how I can parallel-process this data without converting the predictors into categorical dummies?