R bigmemory matrix k-means: converting from a data frame
I am new to k-means clustering in R. I tried a sample application that clusters two files, and it succeeded with the following code. My real files are much larger than these initial test files, and the code below holds everything in RAM, so I think it would be inefficient on the larger files.
    # Read both tab-separated files and replace missing values with 0
    file1 <- read.csv("//tmp//file1.txt", sep="\t", header=TRUE)
    file1[is.na(file1)] <- 0
    file2 <- read.csv("//tmp//file2.txt", sep="\t", header=TRUE)
    file2[is.na(file2)] <- 0
    # Tag each row with the file it came from, then stack the two files
    file1_new <- cbind(file1, file_number = 1)
    file2_new <- cbind(file2, file_number = 2)
    total_input <- rbind(file1_new, file2_new)
    # Cluster on col1 and the file number, then attach the cluster labels
    myvars <- data.frame(col1 = total_input$col1, file_number = total_input$file_number)
    myvars_k_means <- kmeans(myvars, 6)
    myvars_k_clustered <- cbind(myvars$col1, myvars$file_number, myvars_k_means$cluster)
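Before moving to another package, the data frame version itself can probably be made lighter. The sketch below is only an illustration under assumptions: the inline strings `txt1` and `txt2` are hypothetical two-column stand-ins for the real files, the helper `read_col1` is made up for this example, and 2 clusters are used instead of 6 because the toy data has only six rows. It reads just the column that is clustered and builds one numeric matrix instead of several intermediate data frames.

```r
# Toy stand-ins for the two tab-separated files (hypothetical contents).
txt1 <- "col1\tcol2\n1\ta\nNA\tb\n3\tc\n"
txt2 <- "col1\tcol2\n100\ta\n101\tb\nNA\tc\n"

# Hypothetical helper: read only col1. A colClasses entry of "NULL"
# tells read.csv to skip that column entirely.
read_col1 <- function(txt) {
  read.csv(text = txt, sep = "\t", header = TRUE,
           colClasses = c("numeric", "NULL"))$col1
}

v1 <- read_col1(txt1)
v2 <- read_col1(txt2)

# One numeric matrix: the values plus the number of the file they came from.
myvars <- cbind(
  col1 = c(v1, v2),
  file_number = rep(c(1, 2), times = c(length(v1), length(v2)))
)
myvars[is.na(myvars)] <- 0   # same NA handling as the original code

set.seed(1)                  # k-means starts from random centers
fit <- kmeans(myvars, centers = 2)
clustered <- cbind(myvars, cluster = fit$cluster)
```

With real files, `read.csv(path, ...)` would replace the `text =` argument, and the `colClasses` vector would need one entry per column in the file.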
I came across bigmemory and the bigkmeans function in biganalytics. I am struggling to translate the code above to use a big.matrix. Here is the code I am working on right now.
    file1 <- read.big.matrix("//tmp//bigfile1.txt", sep="\t", header=FALSE)
    file2 <- read.big.matrix("//tmp//bigfile2.txt", sep="\t", header=FALSE)
    file1[is.finite(file1)] <- 0
    file1[is.finite(file2)] <- 0
    total_input <- list(file1, file2)
    myvars <- cbind(total_input[,1], total_input[,2])
    myvars_k_means <- bigkmeans(myvars, 6)
    myvars_k_clustered <- cbind(total_input[,1], total_input[,2], myvars_k_means$cluster)
Replacing NA with 0 is not working, and if I skip that step I get a single cluster because of the NAs. The cbind calls and column additions are not working either. I think I am missing an easier way, but I couldn't understand the bigmemory/matrix documentation. Can anyone please help?
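One possible direction, offered as a sketch rather than a definitive answer: instead of calling `cbind` or an NA test on whole big.matrix objects, allocate a single two-column big.matrix up front and fill it one column at a time, cleaning the NAs in the plain R vector that `x[, col]` returns. The helper name `stack_column` is hypothetical, the toy matrices created with `as.big.matrix` stand in for the results of `read.big.matrix`, and 2 clusters are used only because the toy data is tiny.

```r
library(bigmemory)
library(biganalytics)

# Hypothetical helper: stack column `col` of several big.matrix objects into
# one two-column big.matrix (value, file number), replacing NA with 0.
stack_column <- function(mats, col = 1) {
  n_rows <- vapply(mats, nrow, numeric(1))
  out <- big.matrix(nrow = sum(n_rows), ncol = 2, type = "double", init = 0)
  start <- 1
  for (k in seq_along(mats)) {
    rows <- seq(start, length.out = n_rows[k])
    vals <- mats[[k]][, col]      # one column pulled into RAM as a vector
    vals[is.na(vals)] <- 0        # NA handling on the ordinary vector
    out[rows, 1] <- vals
    out[rows, 2] <- rep(k, length(rows))
    start <- start + n_rows[k]
  }
  out
}

# Toy stand-ins for the two files (assumed numeric, two columns each).
file1 <- as.big.matrix(matrix(c(1, NA, 3, 10, 20, 30), ncol = 2))
file2 <- as.big.matrix(matrix(c(100, 101, NA, 1, 2, 3), ncol = 2))

myvars <- stack_column(list(file1, file2), col = 1)

set.seed(1)                       # initial centers are chosen at random
fit <- bigkmeans(myvars, centers = 2)

# Extracting columns with [, j] gives plain vectors, so base cbind works here.
clustered <- cbind(myvars[, 1], myvars[, 2], fit$cluster)
```

Only one column per file is held in ordinary memory at a time. The final `cbind` does copy the result into a regular matrix, so for very large inputs it may be better to write `fit$cluster` into a third column of a big.matrix allocated for that purpose.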