
In Example 8.19, we discussed how to refer to a group of variables with sequential names, such as varname1, varname2, varname3. This is trivial in SAS and can be done in R as we showed.

It’s also sometimes useful to refer to all variables which begin with a common character string. For example, in the HELP data set, there are the variables cesd, cesd1, cesd2, cesd3 and cesd4.

SAS
In SAS, this can be done with the : operator. This functions much like the * wildcard available in many operating systems.
proc means data="c:\book\help.sas7bdat" mean;
var cesd:;
run;


Variable            Mean
------------------------
CESD1         22.7154472
CESD2         23.5837321
CESD3         22.0685484
CESD4         20.1428571
CESD          32.8476821
------------------------


R
This functionality is not built into R. But, as with the sequentially named variable problem, you can use the string functions available within R to replicate the effect.

In this case, we use the names() function (section 1.3.4) to get a list of the variables in the data set, then search for names whose beginnings match the desired string using the substr() function (section 1.4.3). Note that the substr() == comparison returns a vector of logicals, rather than variable names; that logical vector is what selects the columns.
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
# colMeans() gives one mean per column; mean() no longer accepts a data frame
colMeans(ds[, substr(names(ds), 1, 4) == "cesd"], na.rm=TRUE)

cesd1    cesd2    cesd3    cesd4     cesd
22.71545 23.58373 22.06855 20.14286 32.84768
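
To make the note about logicals concrete, here is a minimal sketch on a small made-up data frame. The object toy and its values are hypothetical stand-ins, not the HELP data, and keep is just an illustrative name for the logical index.

```r
# Hypothetical toy data frame standing in for the HELP data
toy = data.frame(id    = 1:3,
                 cesd  = c(30, 41, 25),
                 cesd1 = c(20, NA, 22),
                 age   = c(35, 52, 44))

# Compare the first four characters of each name to the prefix;
# the result is one logical per column, not a set of names
keep = substr(names(toy), 1, 4) == "cesd"
keep

# The logical vector can index the names or the columns themselves
names(toy)[keep]
colMeans(toy[, keep], na.rm = TRUE)
```

Because the index is a logical vector, it has to be the same length as names(toy); that is why it can be dropped straight into the column position of the brackets.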


The substr() statement used on the HELP data takes a fair amount of typing, and requires counting the characters in the prefix. You may want to write a function to do this instead.

The function will accept a data frame as input and return the data frame with just the desired variables. It looks much like the direct version displayed above, but uses the substitute() function to capture the “varname” parameter unevaluated, and deparse() to turn it into text, rather than treating it as an object. I store those characters in the object vname.
matchin = function(dsname, varname)   {
   # capture the unquoted argument and convert it to a character string
   vname = deparse(substitute(varname))
   # keep only the columns whose names start with that string
   return(dsname[substr(names(dsname), 1, nchar(vname)) == vname])
}


Now we can just type
colMeans(matchin(ds, cesd), na.rm=TRUE)


with results identical to those displayed above.
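
As a variation, here is a sketch that takes the prefix as a quoted string, so neither substitute() nor character counting is needed. It relies on startsWith(), which is in base R from version 3.3.0 on, with an anchored grepl() pattern shown as an equivalent. The names match_prefix and toy are hypothetical and used only for illustration.

```r
# Hypothetical helper: keep the columns whose names begin with `prefix`
match_prefix = function(df, prefix) {
  # drop = FALSE keeps a data frame even when only one column matches
  df[, startsWith(names(df), prefix), drop = FALSE]
}

# Hypothetical toy data frame standing in for the HELP data
toy = data.frame(id    = 1:3,
                 cesd  = c(30, 41, 25),
                 cesd1 = c(20, NA, 22),
                 age   = c(35, 52, 44))

colMeans(match_prefix(toy, "cesd"), na.rm = TRUE)

# The same selection with a regular expression anchored at the start
by_regex = toy[, grepl("^cesd", names(toy)), drop = FALSE]
```

Passing the prefix as a string means typing the quotes, but it also lets the prefix live in a variable, which the unquoted version does not allow.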