Renaming data frame columns in lists
OK, so the scenario is as follows:
- we have a list of 2 elements, each of which is in turn a list of 2 data frames
- none of the elements in question carry meaningful names (the list entries are unnamed and the data frame columns only have the default names V1 and V2)
- we only want to set the column names of the data frames that are buried 2 levels down the main list
First we create some mock data that resembles the scenario (mimicking temperature and relative humidity observations during January and February 2010):
## create 2 mock months
## note: the offsets are added to the origin, so each series starts on the
## 2nd of the month (as seen in the str() output); seq(0, 30, 1) and
## seq(0, 27, 1) would start on the 1st
date_jan <- as.Date(seq(1, 31, 1), origin = "2010-01-01")
date_feb <- as.Date(seq(1, 28, 1), origin = "2010-02-01")

## create mock observations for the months
Ta_200_jan <- rnorm(31, 10, 3)
Ta_200_feb <- rnorm(28, 11, 3)
rH_200_jan <- rnorm(31, 75, 10)
rH_200_feb <- rnorm(28, 70, 10)

df1 <- data.frame(V1 = date_jan, V2 = Ta_200_jan)
df2 <- data.frame(V1 = date_jan, V2 = rH_200_jan)
df3 <- data.frame(V1 = date_feb, V2 = Ta_200_feb)
df4 <- data.frame(V1 = date_feb, V2 = rH_200_feb)

lst <- list(list(df1, df2), list(df3, df4))
So now we have a list of two elements, each of which is itself a list of 2 data frames.
Neither the outer nor the inner list entries are named, and the columns of the data frames only carry the names V1 and V2, which is not very informative.
This is what the list structure looks like:
str(lst)
## List of 2
##  $ :List of 2
##   ..$ :'data.frame': 31 obs. of 2 variables:
##   .. ..$ V1: Date[1:31], format: "2010-01-02" ...
##   .. ..$ V2: num [1:31] 9.95 15.49 9.45 12.16 8.84 ...
##   ..$ :'data.frame': 31 obs. of 2 variables:
##   .. ..$ V1: Date[1:31], format: "2010-01-02" ...
##   .. ..$ V2: num [1:31] 70.4 87.6 69.6 80.2 59 ...
##  $ :List of 2
##   ..$ :'data.frame': 28 obs. of 2 variables:
##   .. ..$ V1: Date[1:28], format: "2010-02-02" ...
##   .. ..$ V2: num [1:28] 11.95 8.42 13.06 9.55 10.76 ...
##   ..$ :'data.frame': 28 obs. of 2 variables:
##   .. ..$ V1: Date[1:28], format: "2010-02-02" ...
##   .. ..$ V2: num [1:28] 78.7 63.9 62.6 67.5 73.5 ...
Now we define the names to set:
name.x <- c("Date")
name.y <- c("Ta_200", "rH_200")
And finally, we use two nested lapply() calls to set the column names of the data frames within the list of lists.
The crux is to define a data frame (y) in the inner call, which is subsequently returned (and as lapply() always returns a list, we again get a list of lists):
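## outer loop: one iteration per month (first list level)
lst <- lapply(seq_along(lst), function(i) {
  ## inner loop: one iteration per parameter (second list level)
  lapply(seq_along(name.y), function(j) {
    y <- data.frame(lst[[i]][[j]])
    names(y) <- c(name.x, name.y[j])
    return(y)
  })
})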
And this is what we end up with:
str(lst)
## List of 2
##  $ :List of 2
##   ..$ :'data.frame': 31 obs. of 2 variables:
##   .. ..$ Date  : Date[1:31], format: "2010-01-02" ...
##   .. ..$ Ta_200: num [1:31] 9.95 15.49 9.45 12.16 8.84 ...
##   ..$ :'data.frame': 31 obs. of 2 variables:
##   .. ..$ Date  : Date[1:31], format: "2010-01-02" ...
##   .. ..$ rH_200: num [1:31] 70.4 87.6 69.6 80.2 59 ...
##  $ :List of 2
##   ..$ :'data.frame': 28 obs. of 2 variables:
##   .. ..$ Date  : Date[1:28], format: "2010-02-02" ...
##   .. ..$ Ta_200: num [1:28] 11.95 8.42 13.06 9.55 10.76 ...
##   ..$ :'data.frame': 28 obs. of 2 variables:
##   .. ..$ Date  : Date[1:28], format: "2010-02-02" ...
##   .. ..$ rH_200: num [1:28] 78.7 63.9 62.6 67.5 73.5 ...
Problem solved!
We now have a list of lists in which the columns of every data frame are labelled correctly with the date and the observed parameter.
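As a variation on the index-based approach, the same renaming can be sketched by iterating over the data frames themselves with Map() and setNames(). This is only a sketch under assumptions: the helper name rename_cols, the result name renamed and the small stand-in list are hypothetical and not part of the original example, and it assumes that every inner list holds its data frames in the same order as name.y.

```r
## small stand-in for the list of lists used above (hypothetical data)
lst <- list(
  list(data.frame(V1 = 1:3, V2 = c(9.9, 15.5, 9.4)),
       data.frame(V1 = 1:3, V2 = c(70.4, 87.6, 69.6))),
  list(data.frame(V1 = 1:2, V2 = c(11.9, 8.4)),
       data.frame(V1 = 1:2, V2 = c(78.7, 63.9)))
)

name.x <- c("Date")
name.y <- c("Ta_200", "rH_200")

## hypothetical helper: rename the data frames of one inner list,
## pairing each data frame with the matching entry of name.y
rename_cols <- function(inner) {
  Map(function(df, ny) setNames(df, c(name.x, ny)), inner, name.y)
}

## apply the helper to every first-level entry
renamed <- lapply(lst, rename_cols)
str(renamed)
```

Because the functions receive the data frames directly, no indexing into lst is needed inside the loop.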
PS: if you wanted to name the first level entries of the list according to the month of observation, this would do the job:
names(lst) <- c("January", "February")
str(lst)
## List of 2
##  $ January :List of 2
##   ..$ :'data.frame': 31 obs. of 2 variables:
##   .. ..$ Date  : Date[1:31], format: "2010-01-02" ...
##   .. ..$ Ta_200: num [1:31] 9.95 15.49 9.45 12.16 8.84 ...
##   ..$ :'data.frame': 31 obs. of 2 variables:
##   .. ..$ Date  : Date[1:31], format: "2010-01-02" ...
##   .. ..$ rH_200: num [1:31] 70.4 87.6 69.6 80.2 59 ...
##  $ February:List of 2
##   ..$ :'data.frame': 28 obs. of 2 variables:
##   .. ..$ Date  : Date[1:28], format: "2010-02-02" ...
##   .. ..$ Ta_200: num [1:28] 11.95 8.42 13.06 9.55 10.76 ...
##   ..$ :'data.frame': 28 obs. of 2 variables:
##   .. ..$ Date  : Date[1:28], format: "2010-02-02" ...
##   .. ..$ rH_200: num [1:28] 78.7 63.9 62.6 67.5 73.5 ...
I leave it up to your imagination how to set the names of the second level list entries…
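For readers who would rather see one possibility spelled out, a sketch follows. It is an assumption and not necessarily the solution the author had in mind: it reuses the parameter names from name.y as entry names, and it works on a small stand-in list (hypothetical data) so that the snippet runs on its own.

```r
## stand-in for the renamed list of lists (hypothetical data)
lst <- list(
  January  = list(data.frame(Date = 1:3, Ta_200 = c(9.9, 15.5, 9.4)),
                  data.frame(Date = 1:3, rH_200 = c(70.4, 87.6, 69.6))),
  February = list(data.frame(Date = 1:2, Ta_200 = c(11.9, 8.4)),
                  data.frame(Date = 1:2, rH_200 = c(78.7, 63.9)))
)

name.y <- c("Ta_200", "rH_200")

## set the names of every inner list;
## lapply() keeps the first-level names of lst
lst <- lapply(lst, function(inner) {
  names(inner) <- name.y
  inner
})

## the data frames can now be addressed by month and parameter
names(lst$January)
head(lst$February$rH_200)
```

Iterating over lst itself rather than over its indices is what preserves the month names here, since lapply() carries over the names of its input.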
sessionInfo()
## R version 2.15.2 (2012-10-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=C                 LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.1
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.3 evaluate_0.4.3 formatR_0.7 stringr_0.6.2
## [5] tools_2.15.2