# Example 8.20: Referencing lists of variables, part 2

[This article was first published on **SAS and R**, and kindly contributed to R-bloggers.]

In Example 8.19, we discussed how to refer to a group of variables with sequential names, such as `varname1, varname2, varname3`. This is trivial in SAS and can be done in R as we showed.
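As a brief reminder of the sequential-name case, one way to build such a list in R is with `paste0()`. This is only a sketch and not necessarily the code used in Example 8.19; the `toy` data frame and its columns are made up for illustration.

```r
# Hypothetical data frame with sequentially named columns
toy = data.frame(varname1 = 1:2, varname2 = 3:4, varname3 = 5:6, other = 7:8)

# Build the names varname1, varname2, varname3 as character strings
seq_names = paste0("varname", 1:3)
print(seq_names)

# Select just those columns by name
subset_cols = toy[, seq_names]
print(names(subset_cols))
```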

It’s also sometimes useful to refer to all variables which begin with a common character string. For example, in the HELP data set, there are the variables `cesd, cesd1, cesd2, cesd3` and `cesd4`.

**SAS**

In SAS, this can be done with the `:` operator. This functions much like the `*` wildcard available in many operating systems.

    proc means data="c:\book\help.sas7bdat" mean;
      var cesd:;
    run;

    Variable            Mean
    ------------------------
    CESD1         22.7154472
    CESD2         23.5837321
    CESD3         22.0685484
    CESD4         20.1428571
    CESD          32.8476821
    ------------------------

**R**

This functionality is not built into R. But, as with the sequentially named variable problem, you can use the string functions available within R to replicate the effect.

In this case, we use the `names()` function (section 1.3.4) to get a list of the variables in the data set, then search for names whose beginnings match the desired string using the `substr()` function (section 1.4.3). Note that the `substr() ==` expression returns a vector of logicals, rather than variable names.

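    ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
    # colMeans() is used here because mean() no longer accepts a data frame in current versions of R
    colMeans(ds[, substr(names(ds), 1, 4) == "cesd"], na.rm=TRUE)

       cesd1    cesd2    cesd3    cesd4     cesd
    22.71545 23.58373 22.06855 20.14286 32.84768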
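To see the logical vector on its own, the following sketch applies the same comparison to a small made-up data frame. The `toy` object and its values are hypothetical and are not part of the HELP data set.

```r
# Hypothetical toy data frame standing in for the HELP data set
toy = data.frame(id    = 1:3,
                 cesd  = c(30, 25, 40),
                 cesd1 = c(20, NA, 35),
                 age   = c(41, 52, 33))

# Logical vector: TRUE where the first four characters of a name are "cesd"
keep = substr(names(toy), 1, 4) == "cesd"
print(keep)

# Use the logical vector to pick columns, then average each one
toy_means = colMeans(toy[, keep], na.rm = TRUE)
print(toy_means)
```

The logical vector has one entry per column, so it can be used directly as a column index.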

The typing required for the previous statement is rather involved, and requires counting characters. You may want to make a function to do this instead.

The function will accept a data frame as input and return the data frame with just the desired variables. It looks much like the direct version displayed above, but uses the `substitute()` function to access the `varname` parameter as text, rather than as an object. I store those characters in the object `vname`.

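    matchin = function(dsname, varname) {
      # capture the unquoted argument and convert it to a character string
      vname = deparse(substitute(varname))
      # keep the columns whose names begin with that string
      return(dsname[substr(names(dsname), 1, nchar(vname)) == vname])
    }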
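The role of `substitute()` may be easier to see in isolation. In this minimal sketch, `show_arg` is a hypothetical helper, not part of the original post. It uses `deparse()` to turn the captured expression into a character string explicitly.

```r
# Hypothetical helper that reports how its argument was written
show_arg = function(x) {
  expr = substitute(x)   # the unevaluated expression (here, a symbol)
  deparse(expr)          # the same expression as a character string
}

# No object named cesd needs to exist, because x is never evaluated
arg_text = show_arg(cesd)
print(arg_text)
print(nchar(arg_text))
```

Because the argument is never evaluated, the name can be typed without quotes, which is what lets `matchin(ds, cesd)` work.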

Now we can just type

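    # colMeans() is used here because mean() no longer accepts a data frame in current versions of R
    colMeans(matchin(ds, cesd), na.rm=TRUE)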

with results identical to those displayed above.
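As an alternative that avoids counting characters, base R also provides `startsWith()` (in R 3.3.0 and later) and regular-expression matching with `grepl()`. The sketch below is not the approach taken in this post. The `toy` data frame and the helper `matchin_str`, which takes the prefix as a quoted string, are hypothetical.

```r
# Hypothetical toy data frame standing in for the HELP data set
toy = data.frame(id    = 1:3,
                 cesd  = c(30, 25, 40),
                 cesd1 = c(20, NA, 35),
                 age   = c(41, 52, 33))

# Hypothetical variant of the helper that takes the prefix as a string
matchin_str = function(dsname, prefix) {
  dsname[startsWith(names(dsname), prefix)]
}

# Two ways to find the columns whose names begin with "cesd"
by_prefix = names(matchin_str(toy, "cesd"))
by_regex  = names(toy)[grepl("^cesd", names(toy))]
print(by_prefix)
print(by_regex)
```

Passing the prefix as a string gives up the unquoted-name convenience, but it makes the helper easier to call from other functions or loops.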
