R - Grow an ffdf data frame on disk gradually
From the documentation of save.ffdf:
Using ‘save.ffdf’ automagically sets the ‘finalizer’s of the ‘ff’ vectors to ‘"close"’. This means that the data will be preserved on disk when the object is removed or the R session is closed. Data can be deleted either using ‘delete’ or by removing the directory where the object was saved (‘dir’).
I want to start with a small ffdf data frame, add a bit of new data at a time, and grow it on disk. I did a little experiment:
```
# In R
library(ffbase)   # loads ff and provides save.ffdf
ffiris = as.ffdf(iris)
save.ffdf(ffiris, dir = "~/Desktop/iris")

# In bash
ls ~/Desktop/iris/
## ffiris$Petal.Length.ff  ffiris$Petal.Width.ff  ffiris$Sepal.Length.ff  ffiris$Sepal.Width.ff  ffiris$Species.ff

# In R
# add a new column
ffiris = transform(ffiris, new1 = rep(99, nrow(iris)))
rm(ffiris)

# In bash
ls ~/Desktop/iris/
## ffiris$Petal.Length.ff  ffiris$Petal.Width.ff  ffiris$Sepal.Length.ff  ffiris$Sepal.Width.ff  ffiris$Species.ff
```
It turns out that it doesn't automatically update the ff data on disk when I remove ffiris. What about saving manually?
```
# In R
# add a new column
ffiris = transform(ffiris, new1 = rep(99, nrow(iris)))
save.ffdf(ffiris, "~/Desktop/iris")

# In bash
ls ~/Desktop/iris/
## ffiris$Petal.Length.ff  ffiris$Petal.Width.ff  ffiris$Sepal.Length.ff  ffiris$Sepal.Width.ff  ffiris$Species.ff
```
Hmm, still no luck. Why?
What about removing the folder before saving?
```
# In R
ffiris = as.ffdf(iris)
unlink("~/Desktop/iris", recursive = TRUE, force = TRUE)
save.ffdf(ffiris, "~/Desktop/iris", overwrite = TRUE)
ffiris = transform(ffiris, new1 = rep(99, nrow(iris)))
unlink("~/Desktop/iris", recursive = TRUE, force = TRUE)
save.ffdf(ffiris, "~/Desktop/iris", overwrite = TRUE)

# In bash
ls ~/Desktop/iris/
# ls: /Users/ky/Desktop/iris/: No such file or directory
```
Even stranger. And even if this worked, it would still be terribly inefficient. I'm looking for something like:
```
updateondisk(ffiris)
```
Could anyone help?
ff and ffbase offer out-of-memory R vectors, but they introduce reference semantics, which can cause problems with R idioms.
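To make the reference-semantics point concrete, here is a minimal sketch using only the ff package: assigning an ff object with <- copies the R handle, not the file on disk, so a write through one name is visible through the other. ff's clone() is used to get a genuinely independent copy; the variable names are just for illustration.

```r
library(ff)

a <- ff(1:3)     # integer vector backed by a file on disk
b <- a           # copies only the R handle; a and b share the same file
b[1] <- 100L     # writes through to the shared file
a[1]             # also 100: the change is visible through a

d <- clone(a)    # clone() creates an independent copy with its own file
d[2] <- 0L
a[2]             # still 2: a is unaffected by changes to d
```

This is the opposite of what R users expect from ordinary vectors, where b <- a followed by b[1] <- 100L leaves a unchanged.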
R is a functional programming language, meaning that functions do not change their parameters and objects but return modified copies. In ffbase we implement functions in the R way, i.e. transform returns a copy of the original ffdf data.frame. This can be seen by looking at the filenames:
```r
library(ffbase)   # loads ff and provides save.ffdf

ffiris = as.ffdf(iris)
save.ffdf(ffiris, dir = "~/Desktop/iris")
filename(ffiris)   # shows the files in ~/Desktop/iris

ffiris = transform(ffiris, new1 = 99)     # creates a copy of the whole data.frame!
filename(ffiris)

ffiris$new2 <- ff(rep(99, nrow(iris)))    # creates a new column, not yet in the right directory
filename(ffiris)

save.ffdf(ffiris, dir = "~/Desktop/iris", overwrite = TRUE)   # fixes that
```
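The same copy-on-modify behaviour is easy to see with an ordinary in-memory data frame (base R only, no ff involved): transform() returns a new object and leaves its argument untouched. With an ffdf, that copy happens on disk, which is why filename(ffiris) changes after transform().

```r
df <- data.frame(x = 1:3)
df2 <- transform(df, new1 = 99)   # returns a modified copy

names(df)    # "x"          (the original is unchanged)
names(df2)   # "x" "new1"
```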
transform is an inefficient way to add a new column, because it copies the whole data frame (that is R semantics). It does so because the result of transform might be temporary, and you don't want to change the original data.
In ffbase2 we are fixing this issue.
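For the question's original goal of growing the data a chunk at a time, one option is ffbase's ffdfappend() combined with save.ffdf() and load.ffdf(). The answer does not use this; it is a minimal sketch that assumes ffdfappend(x, dat) appends the rows of the data.frame dat to the ffdf x and returns the result, and that load.ffdf(dir) recreates the saved object (here ffiris) in the calling environment. The directory is a hypothetical temporary location.

```r
library(ffbase)   # loads ff; provides ffdfappend, save.ffdf and load.ffdf

db_dir <- file.path(tempdir(), "iris_db")   # hypothetical location for this sketch

# Start with a small chunk and persist it
ffiris <- as.ffdf(iris[1:50, ])
save.ffdf(ffiris, dir = db_dir, overwrite = TRUE)

# Later (possibly in a new R session): reload, append the next chunk, save again
load.ffdf(db_dir)                              # recreates 'ffiris' from db_dir
ffiris <- ffdfappend(ffiris, iris[51:150, ])   # append rows instead of copying columns
save.ffdf(ffiris, dir = db_dir, overwrite = TRUE)

nrow(ffiris)
```

Appending rows this way extends the existing columns rather than building a copy of the whole data frame, as transform does.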