Example 8.20: Referencing lists of variables, part 2

Posted on January 10, 2011 by Ken Kleinman in R bloggers, Uncategorized

[This article was first published on SAS and R, and kindly contributed to R-bloggers].

In Example 8.19, we discussed how to refer to a group of variables with sequential names, such as varname1, varname2, varname3. This is trivial in SAS and can be done in R as we showed.

It’s also sometimes useful to refer to all variables which begin with a common character string. For example, in the HELP data set, there are the variables cesd, cesd1, cesd2, cesd3 and cesd4.

SAS
In SAS, this can be done with the : operator. Placed after a prefix, the colon functions much like the * wildcard available in many operating systems.

proc means data="c:\book\help.sas7bdat" mean;
  var cesd:;
run;

Variable            Mean
------------------------
CESD1         22.7154472
CESD2         23.5837321
CESD3         22.0685484
CESD4         20.1428571
CESD          32.8476821
------------------------

R
R has no built-in shorthand for this. But, as with the sequentially named variable problem, you can use the string functions available within R to replicate the effect.

In this case, we use the names() function (section 1.3.4) to get a character vector of the variable names in the data set, then search for names whose beginnings match the desired string using the substr() function (section 1.4.3). Note that the substr() == comparison returns a vector of logicals, rather than variable names; that logical vector is then used to select the columns.

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
# colMeans() averages each selected column (mean() does not accept a data frame in current R)
colMeans(ds[, substr(names(ds), 1, 4) == "cesd"], na.rm=TRUE)

   cesd1    cesd2    cesd3    cesd4     cesd 
22.71545 23.58373 22.06855 20.14286 32.84768
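
To make the intermediate steps visible, here is a minimal sketch on a small made-up data frame. The object toy and its values are hypothetical and only mimic the HELP variable names; they are not part of the original example.

```r
# Hypothetical toy data; column names imitate the HELP data set
toy = data.frame(id    = 1:3,
                 cesd  = c(30, 35, NA),
                 cesd1 = c(20, 25, 22),
                 age   = c(40, 51, 33))

# Step 1: character vector of the variable names
names(toy)

# Step 2: logical vector, TRUE where the first four characters are "cesd"
keep = substr(names(toy), 1, 4) == "cesd"
keep   # FALSE TRUE TRUE FALSE

# Step 3: select the matching columns and average each one
colMeans(toy[, keep], na.rm=TRUE)
```

The logical vector has one element per column, so it can be used directly as a column index.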

The typing required for the previous statement is rather involved, and requires counting characters. You may want to make a function to do this instead.

The function accepts a data frame as input and returns a data frame with just the desired variables. It looks much like the direct version displayed above, but uses the substitute() function to capture the “varname” parameter as an unevaluated name, rather than as an object, and deparse() to convert that name to text. I store those characters in the object vname.

matchin = function(dsname, varname) {
  # capture the unquoted argument and convert it to a character string
  vname = deparse(substitute(varname))
  # keep the columns whose names begin with that string
  return(dsname[substr(names(dsname), 1, nchar(vname)) == vname])
}

Now we can just type

colMeans(matchin(ds, cesd), na.rm=TRUE)

with results identical to those displayed above.
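
If you would rather pass the prefix as an ordinary quoted string, a variant along the following lines may be simpler. This is only a sketch: the name matchprefix and the toy data are hypothetical, and it assumes R 3.3.0 or later, where base R provides startsWith().

```r
# Hypothetical variant of matchin(): the prefix is a quoted string
matchprefix = function(dsname, prefix) {
  # startsWith() is TRUE for each variable name that begins with prefix
  dsname[startsWith(names(dsname), prefix)]
}

# Hypothetical toy data; column names imitate the HELP data set
toy = data.frame(id    = 1:3,
                 cesd  = c(30, 35, NA),
                 cesd1 = c(20, 25, 22),
                 age   = c(40, 51, 33))

colMeans(matchprefix(toy, "cesd"), na.rm=TRUE)
```

Because the prefix is a plain character value, it can also be held in a variable and the function can be called inside loops or other functions without any special handling.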
