R - Calculating a mean from data held in multiple files

July 15, 2013

I am trying to write an R script that calculates the mean of a specified pollutant (nitrate or sulfate) based on data from one or more of 332 monitor stations. The data for each station is held in a separate file, numbered 1:332. I am new to R and, to be fair to anyone who chooses to help me, I should say that this is a homework problem. I have written the script below, which works for one file:

pollutantmean <- function(directory, pollutant, id = 1:332) {
    filepath <- "/users/jim/documents/coursera/2_r_prog/data"
    for(i in seq_along(id)) {
        if(id < 10) {
            name <- paste("00", id[i], sep = "")
        }
        if(id >= 10 && id < 100) {
            name <- paste("0", id[i], sep = "")
        }
        if(id >= 100) {
            name <- id[i]
        }
    }
    file <- paste(name, "csv", sep = ".")
    station <- paste(filepath, directory, file, sep = "/")
    monitor <- read.csv(station)
    if(pollutant == "nitrate") {
        x <- mean(monitor$nitrate, na.rm = TRUE)
    }
    if(pollutant == "sulfate") {
        x <- mean(monitor$sulfate, na.rm = TRUE)
    }
    x
}

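However, if I enter more than one file (e.g. 70:72), I get the mean of the last file (72). This suggests to me that it is calculating the mean of each file and overwriting it with the mean of the next, so that only the last is output. I might be able to solve this using rbind(), but I can't figure out how to assign unique names to each variable so that they can become arguments to rbind(). I would be grateful for any help you can offer. Cheers, Jim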

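You don't loop over the files: the read.csv() call sits outside the for loop, so it runs only once.

And you get the mean of the last file because, when you loop over the ids to create the names, name is overwritten on every pass, so only the last name created is left when the loop ends.

You should create a vector of names for the stations and loop over it!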
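The overwriting can be seen in isolation with a minimal sketch. The ids 70:72 are taken from the question; the variable all_names is a name introduced here purely for illustration.

```r
# ids from the question's example call
id <- 70:72

# Scalar assignment inside the loop: every pass overwrites `name`.
for (i in seq_along(id)) {
  name <- sprintf("%03d", id[i])
}
name       # only the value from the final pass is left

# Vectorised alternative: one name per id, nothing is lost.
all_names <- sprintf("%03d", id)
all_names
```

After the loop, name holds a single string, whereas all_names holds one string per requested station and can be iterated over.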

Tip: you don't need a loop and conditional statements to create the names. Use sprintf, specifying the width of the string you expect (3) and the character you want to pad it with (0):

> id <- c(1, 10, 100)
> names <- sprintf("%03d", id)
> names
[1] "001" "010" "100"

And this should work:

pollutantmean <- function(directory, pollutant, id = 1:332) {
  filepath <- "/users/jim/documents/coursera/2_r_prog/data"

  # Build one zero-padded file path per requested station
  names <- sprintf("%03d", id)
  files <- paste0(names, ".csv") # or directly : files <- sprintf("%03d.csv", id)
  station <- file.path(filepath, directory, files)

  # One mean per file
  means <- numeric(length(station))

  for (i in seq_along(station)) {
    monitor <- read.csv(station[i])
    if(pollutant == "nitrate") {
      means[i] <- mean(monitor$nitrate, na.rm = TRUE)
    } else if(pollutant == "sulfate") {
      means[i] <- mean(monitor$sulfate, na.rm = TRUE)
    }
  }
  return(means)
}

Edit: if you want a single mean, you can use the code above and weight each mean by the number of non-NA rows in its file. Replace the loop with:

means <- numeric(length(station))
counts <- numeric(length(station))

for (i in seq_along(station)) {
  monitor <- read.csv(station[i])
  if(pollutant == "nitrate") {
    means[i] <- mean(monitor$nitrate, na.rm = TRUE)
    counts[i] <- sum(!is.na(monitor$nitrate))   # non-missing observations
  } else if(pollutant == "sulfate") {
    means[i] <- mean(monitor$sulfate, na.rm = TRUE)
    counts[i] <- sum(!is.na(monitor$sulfate))   # non-missing observations
  }
}

# Weighted mean of the per-file means
mymean <- sum(means * counts) / sum(counts)
return(mymean)
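To see why the weighting matters, here is a small check on made-up numbers (the vectors a and b are hypothetical readings, not taken from the real station files). It compares the count-weighted mean with the mean of the pooled values and with a plain average of the per-file means.

```r
# Made-up readings for two hypothetical stations (NA = missing measurement)
a <- c(1, 2, NA, 3)
b <- c(10, NA)

# Per-station means and non-missing counts, as in the loop above
means  <- c(mean(a, na.rm = TRUE), mean(b, na.rm = TRUE))
counts <- c(sum(!is.na(a)), sum(!is.na(b)))

weighted <- sum(means * counts) / sum(counts)  # count-weighted mean
pooled   <- mean(c(a, b), na.rm = TRUE)        # mean over all values together
naive    <- mean(means)                        # unweighted average of means

all.equal(weighted, pooled)
```

The weighted and pooled results agree, while the unweighted average differs because the two stations contribute different numbers of observations. One caveat: a station with no non-missing values has a mean of NaN and a count of 0, and NaN * 0 is still NaN, so such stations would probably need to be dropped before summing.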

Since your first intention was to gather the data into one vector, here is a solution that creates a list in which each element is the desired "pollutant" variable from one of the data frames. unlist then gathers those vectors into one, and you can compute the mean on that vector.

pollutantmean <- function(directory, pollutant, id = 1:332) {
  filepath <- "/users/jim/documents/coursera/2_r_prog/data"

  names <- sprintf("%03d", id)
  files <- paste0(names, ".csv") # or directly : files <- sprintf("%03d.csv", id)
  station <- file.path(filepath, directory, files)

  # One vector of pollutant values per file
  li <- lapply(station, function(x) {
    monitor <- read.csv(x)
    if(pollutant == "nitrate") {
      monitor$nitrate
    } else if(pollutant == "sulfate") {
      monitor$sulfate
    }
  })

  # na.rm = TRUE as in the versions above; without it any NA makes the result NA
  mymean <- mean(unlist(li), na.rm = TRUE)

  return(mymean)
}
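The list-based approach can be tried end to end without the real data. The sketch below is only an illustration under stated assumptions: pollutant_mean_sketch is a hypothetical variant that takes the full directory path as its argument and selects the column by name with [[ ]], the two CSV files are made up and written to a temporary directory, and na.rm = TRUE is passed on the assumption that the files contain missing values.

```r
# Hypothetical variant: `directory` is the full path to the folder of CSVs,
# and the pollutant column is selected by name instead of with if/else.
pollutant_mean_sketch <- function(directory, pollutant, id) {
  station <- file.path(directory, sprintf("%03d.csv", id))
  values <- lapply(station, function(path) read.csv(path)[[pollutant]])
  mean(unlist(values), na.rm = TRUE)
}

# Write two tiny made-up station files into a temporary directory.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(0.5, 0.5, NA)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7), nitrate = c(NA, 1.5)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# Pooled mean over both files, ignoring missing values
sulfate_mean <- pollutant_mean_sketch(demo_dir, "sulfate", 1:2)
nitrate_mean <- pollutant_mean_sketch(demo_dir, "nitrate", 1:2)
```

Selecting the column with [[pollutant]] avoids repeating the same branch for each pollutant, at the cost of returning NULL (and therefore a mean of NaN) if the column name is misspelled, so some input validation may be worth adding.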
