r - Grow a ffdf data frame on disk gradually -


From the documentation of save.ffdf:

Using ‘save.ffdf’ automagically sets the ‘finalizer’s of the ‘ff’ vectors to ‘"close"’. This means that the data will be preserved on disk when the object is removed or the R session is closed. Data can be deleted either by using ‘delete’ or by removing the directory where the object was saved (‘dir’).
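As a minimal sketch of the behaviour the documentation describes, the following saves an ffdf, removes the R object, and reloads it with load.ffdf. The scratch directory under tempdir() is an assumption made for illustration; it is not the path used in the question.

```r
library(ff)
library(ffbase)

# Hypothetical scratch directory; any writable path would do.
dir <- file.path(tempdir(), "iris_ffdf")

ffiris <- as.ffdf(iris)
save.ffdf(ffiris, dir = dir, overwrite = TRUE)

# Removing the R object does not delete the .ff files in `dir`.
rm(ffiris)
list.files(dir)

# load.ffdf() restores the saved object under its original name.
load.ffdf(dir)
nrow(ffiris)
```

Note that this only shows that saved data survives removal of the object; it does not by itself update the files when the ffdf is later modified.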

I want to start with a small ffdf data frame, add a bit of new data at a time, and grow it on disk. I did a little experiment:

# In R
library(ff)
library(ffbase)
ffiris = as.ffdf(iris)
save.ffdf(ffiris, dir = "~/desktop/iris")

# In bash
ls ~/desktop/iris/
## ffiris$Petal.Length.ff ffiris$Petal.Width.ff  ffiris$Sepal.Length.ff ffiris$Sepal.Width.ff  ffiris$Species.ff

# In R
# Add a new column
ffiris = transform(ffiris, new1 = rep(99, nrow(iris)))
rm(ffiris)

# In bash
ls ~/desktop/iris/
## ffiris$Petal.Length.ff ffiris$Petal.Width.ff  ffiris$Sepal.Length.ff ffiris$Sepal.Width.ff  ffiris$Species.ff

It turns out that it doesn't automatically update the ff data on disk when I remove ffiris. What about saving it manually?

# In R
# Add a new column
ffiris = transform(ffiris, new1 = rep(99, nrow(iris)))
save.ffdf(ffiris, "~/desktop/iris")

# In bash
ls ~/desktop/iris/
## ffiris$Petal.Length.ff ffiris$Petal.Width.ff  ffiris$Sepal.Length.ff ffiris$Sepal.Width.ff  ffiris$Species.ff

Hmm, still no luck. Why?

What about removing the folder before saving?

# In R
ffiris = as.ffdf(iris)
unlink("~/desktop/iris", recursive = TRUE, force = TRUE)
save.ffdf(ffiris, "~/desktop/iris", overwrite = TRUE)
ffiris = transform(ffiris, new1 = rep(99, nrow(iris)))
unlink("~/desktop/iris", recursive = TRUE, force = TRUE)
save.ffdf(ffiris, "~/desktop/iris", overwrite = TRUE)

# In bash
ls ~/desktop/iris/
# ls: /users/ky/desktop/iris/: no such file or directory

Even stranger. And even if it worked, it would still be terribly inefficient. I am looking for something like:

updateondisk(ffiris) 

Could anyone help?

ff and ffbase offer out-of-memory R vectors, but they introduce reference semantics that can cause problems with R idioms.

R is a functional programming language, meaning that functions do not change their parameters and objects, but return modified copies. In ffbase we implement functions the R way, i.e. transform returns a copy of the original ffdf data.frame. This can be seen by looking at the filenames:

ffiris = as.ffdf(iris)
save.ffdf(ffiris, dir = "~/desktop/iris")
filename(ffiris)  # shows the contents of ~/desktop/iris

ffiris = transform(ffiris, new1 = 99)  # creates a copy of the whole data.frame!
filename(ffiris)

ffiris$new2 <- ff(rep(99, nrow(iris)))  # creates a new column, but not yet in the right directory
filename(ffiris)

save.ffdf(ffiris, dir = "~/desktop/iris", overwrite = TRUE)  # fixes that
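The copy semantics described above can also be seen in base R, without ff. The sketch below uses a hypothetical helper, add_column, that is not part of ff or ffbase: it modifies its argument and returns the result, while the caller's original object stays untouched.

```r
# A function that "adds a column" only changes its local copy.
add_column <- function(df) {
  df$new1 <- 99
  df
}

original <- head(iris)
modified <- add_column(original)

names(original)  # unchanged: no new1 column
names(modified)  # includes new1
```

transform on an ffdf follows the same convention, which is why the result points to new files rather than updating the ones saved earlier.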

transform is an inefficient way to add a new column, because it copies the whole data frame (that is R semantics). It does so because the result of transform might be temporary, and you don't want to change the original data.

In ffbase2 we are fixing this issue.

