(This article was first published on **SAS and R**, and kindly contributed to R-bloggers)

In Example 8.19, we discussed how to refer to a group of variables with sequential names, such as `varname1, varname2, varname3`. This is trivial in SAS and can be done in R as we showed.

It's also sometimes useful to refer to all variables that begin with a common character string. For example, in the HELP data set, there are the variables `cesd, cesd1, cesd2, cesd3` and `cesd4`.

**SAS**

In SAS, this can be done with the `:` operator. This functions much like the `*` wildcard available in many operating systems.

proc means data="c:\book\help.sas7bdat" mean;

var cesd:;

run;

Variable Mean

------------------------

CESD1 22.7154472

CESD2 23.5837321

CESD3 22.0685484

CESD4 20.1428571

CESD 32.8476821

------------------------

**R**

This functionality is not built into R. But, as with the sequentially named variable problem, you can use the string functions available within R to replicate the effect.

In this case, we use the `names()` function (section 1.3.4) to get a list of the variables in the data set, then search for names whose beginnings match the desired string using the `substr()` function (section 1.4.3). Note that the `substr() ==` comparison returns a vector of logicals, one per variable name, rather than the variable names themselves.

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")

colMeans(ds[, substr(names(ds), 1, 4) == "cesd"], na.rm=TRUE)

cesd1 cesd2 cesd3 cesd4 cesd

22.71545 23.58373 22.06855 20.14286 32.84768
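To make the logical vector visible, and to show a way of avoiding the character count, here is a minimal sketch on a small made-up data frame. The `toy` object and its values are purely illustrative and are not taken from the HELP data. The sketch assumes a version of R that provides `startsWith()` (base R 3.3.0 or later); `grepl()` with an anchored pattern should do the same job in older versions. It uses `colMeans()` because current versions of R no longer compute `mean()` directly on a data frame.

```r
# Hypothetical toy data standing in for the HELP data set
toy = data.frame(id    = 1:3,
                 cesd  = c(30, 40, NA),
                 cesd1 = c(20, 25, 30),
                 age   = c(35, 42, 51))

# The comparison yields one logical per column name, not the names
keep = substr(names(toy), 1, 4) == "cesd"
keep

# The same selection without counting characters
keep2 = startsWith(names(toy), "cesd")
keep3 = grepl("^cesd", names(toy))

# Column means for the selected variables
colMeans(toy[, keep2], na.rm = TRUE)
```

All three logical vectors pick out the same columns here; which form to prefer is mostly a matter of taste and of the R version available.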

The typing required for the previous statement is rather involved, and requires counting characters. You may want to make a function to do this instead.

The function will accept a data frame as input and return the data frame with just the desired variables. It looks much like the direct version displayed above, but uses the `substitute()` function to capture the "varname" argument unevaluated, and `deparse()` to convert it to text rather than treating it as an object. I store those characters in the object `vname`.

matchin = function(dsname, varname) {

vname = deparse(substitute(varname))

return(dsname[substr(names(dsname),1,nchar(vname)) == vname])

}

Now we can just type

colMeans(matchin(ds, cesd), na.rm=TRUE)

with results identical to those displayed above.
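One limitation worth noting: because `substitute()` captures whatever name is typed, `matchin()` cannot take a prefix that is stored in another variable. As a rough sketch of an alternative, the hypothetical function `matchin_chr()` below accepts the prefix as a quoted string instead. The function name and the `toy` data are assumptions made for illustration, and `startsWith()` requires R 3.3.0 or later.

```r
# Hypothetical variant of matchin(): the prefix is passed as a character string
matchin_chr = function(dsname, prefix) {
  # drop = FALSE keeps a data frame even when only one column matches
  dsname[, startsWith(names(dsname), prefix), drop = FALSE]
}

# Made-up data for illustration only
toy = data.frame(id    = 1:3,
                 cesd  = c(30, 40, NA),
                 cesd1 = c(20, 25, 30))

# The prefix can now come from a variable
prefix = "cesd"
colMeans(matchin_chr(toy, prefix), na.rm = TRUE)
```

The trade-off is a pair of quotation marks at the call site in exchange for a function that can be used inside loops or other functions.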
