[This article was first published on R – Statistical Odds & Ends, and kindly contributed to R-bloggers].

In a previous post, I showed how to include a dummy variable for the baseline level in the output of the `model.matrix` function. In this post, I show how to change the column names of `model.matrix`'s output to make downstream parsing a little easier.

Let’s use the iris dataset again:

```r
data(iris)
str(iris)
# 'data.frame':	150 obs. of  5 variables:
#  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

x <- model.matrix(Sepal.Length ~ Species, iris)
head(x)
#   (Intercept) Speciesversicolor Speciesvirginica
# 1           1                 0                0
# 2           1                 0                0
# 3           1                 0                0
# 4           1                 0                0
# 5           1                 0                0
# 6           1                 0                0
```

Notice the default behavior for the column names of the returned matrix: for a given level, the column name is the name of the variable concatenated with the name of the level, with no spaces in between. For example, the last column in the matrix above represents the `virginica` level of the `Species` variable.

Because the concatenation happens with no characters in between the variable and level names, it can be hard to programmatically separate the two parts in the returned column names. We can make our lives easier by having `model.matrix` put some special character, e.g. `.`, between the variable and level names.
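To see why the default names can be ambiguous, here is a minimal sketch with a made-up data frame (`toy`, with hypothetical variables `a` and `ab`; none of this is from the iris example). Variable `a` with level `bc` and variable `ab` with level `c` should both produce a column named `abc`:

```r
# Hypothetical toy data: variable `a` has level "bc", variable `ab` has level "c".
# "0" sorts first, so it is the baseline level for both factors.
toy <- data.frame(
  a  = factor(c("0", "bc", "0", "bc")),
  ab = factor(c("0", "0", "c", "c"))
)

# Each dummy column name is the variable name pasted directly onto the level name.
toyNames <- colnames(model.matrix(~ a + ab, data = toy))
print(toyNames)

# Count how many columns ended up with the name "abc".
sum(toyNames == "abc")
```

With a separator such as `.` between the two parts, the same columns would read `a.bc` and `ab.c`, which can be told apart.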

We can achieve this by modifying the `contrasts.arg` argument of `model.matrix`. In our example, leaving this argument at its default value (`NULL`) has the same effect as passing `list(Species = contrasts(iris$Species))`. The code below shows what `contrasts(iris$Species)` is:

```r
contrasts(iris$Species)
#            versicolor virginica
# setosa              0         0
# versicolor          1         0
# virginica           0         1
```
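As a quick check of the claim about `contrasts.arg`, the sketch below compares the matrix built with no `contrasts.arg` against the one built by passing the factor's own contrast matrix explicitly. The names `xDefault` and `xExplicit` are illustrative only. The comparison looks at column names and values rather than at the whole object, since the two matrices may differ in their attributes.

```r
f <- Sepal.Length ~ Species

# Leave contrasts.arg unset.
xDefault <- model.matrix(f, iris)

# Pass the factor's own contrast matrix explicitly.
xExplicit <- model.matrix(
  f,
  iris,
  contrasts.arg = list(Species = contrasts(iris$Species))
)

# Same column names and same entries in both matrices.
sameNames <- identical(colnames(xDefault), colnames(xExplicit))
sameValues <- all(xDefault == xExplicit)
print(c(sameNames, sameValues))
```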

We can modify the column names of `contrasts(iris$Species)` to achieve the desired effect:

```r
speciesContrast <- contrasts(iris$Species)
# Prefix each level name with "." so the columns become Species.<level>
colnames(speciesContrast) <- paste0(".", colnames(speciesContrast))
x <- model.matrix(
  Sepal.Length ~ Species,
  iris,
  contrasts.arg = list(Species = speciesContrast)
)
head(x)
#   (Intercept) Species.versicolor Species.virginica
# 1           1                  0                 0
# 2           1                  0                 0
# 3           1                  0                 0
# 4           1                  0                 0
# 5           1                  0                 0
# 6           1                  0                 0
```
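With a separator in place, the downstream parsing can be sketched as follows. This is one possible approach, not the only one: it rebuilds the matrix so the block is self-contained, then splits each dummy column name on the literal `.`. The names `dummyCols`, `parts` and `parsed` are illustrative.

```r
speciesContrast <- contrasts(iris$Species)
colnames(speciesContrast) <- paste0(".", colnames(speciesContrast))
x <- model.matrix(
  Sepal.Length ~ Species,
  iris,
  contrasts.arg = list(Species = speciesContrast)
)

# Drop the intercept, then split the remaining names on the literal "."
# (fixed = TRUE stops "." from being treated as a regex wildcard).
dummyCols <- setdiff(colnames(x), "(Intercept)")
parts <- strsplit(dummyCols, ".", fixed = TRUE)

# First piece is the variable name, second piece is the level name.
parsed <- data.frame(
  column   = dummyCols,
  variable = vapply(parts, `[`, character(1), 1),
  level    = vapply(parts, `[`, character(1), 2)
)
print(parsed)
```

Note that `.` is common in R variable names (e.g. `Sepal.Length`), so this split only works cleanly when neither the variable names nor the level names contain the separator. A rarer string is a safer choice in general.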

We can do this programmatically for all factor variables in a data frame too. Here is our example data frame:

```r
df <- data.frame(x = factor(rep(c("a", "b", "c"), times = 3)),
                 y = factor(rep(c("d", "e", "f"), times = 3)),
                 z = 1:9)
str(df)
# 'data.frame':	9 obs. of  3 variables:
#  $ x: Factor w/ 3 levels "a","b","c": 1 2 3 1 2 3 1 2 3
#  $ y: Factor w/ 3 levels "d","e","f": 1 2 3 1 2 3 1 2 3
#  $ z: int  1 2 3 4 5 6 7 8 9

x <- model.matrix(~ ., data = df)
head(x)
#   (Intercept) xb xc ye yf z
# 1           1  0  0  0  0 1
# 2           1  1  0  1  0 2
# 3           1  0  1  0  1 3
# 4           1  0  0  0  0 4
# 5           1  1  0  1  0 5
# 6           1  0  1  0  1 6
```

Here is the code that adds `---` between the variable and level names:

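```r
# Prefix every contrast column (level) name with the separator
ChangeColnames <- function(x) {
  colnames(x) <- paste0("---", colnames(x))
  x
}

x <- model.matrix(
  ~ .,
  data = df,
  # Build one modified contrast matrix per factor column of df
  contrasts.arg = lapply(df[, sapply(df, is.factor), drop = FALSE],
                         function(x) ChangeColnames(contrasts(x)))
)
head(x)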
#   (Intercept) x---b x---c y---e y---f z
# 1           1     0     0     0     0 1
# 2           1     1     0     1     0 2
# 3           1     0     1     0     1 3
# 4           1     0     0     0     0 4
# 5           1     1     0     1     0 5
# 6           1     0     1     0     1 6
```
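If this is needed in several places, the steps above could be wrapped in a small helper. The sketch below is one way to do it. `ModelMatrixWithSep` and its `sep` parameter are hypothetical names introduced here, not part of base R, and the helper assumes the data frame has at least one factor column and that every factor column appears in the formula.

```r
# Hypothetical helper: build a model matrix whose dummy columns are
# named <variable><sep><level>.
# Assumes `data` has at least one factor column and that the formula
# uses all of the factor columns.
ModelMatrixWithSep <- function(formula, data, sep = "---") {
  isFactor <- vapply(data, is.factor, logical(1))
  contrastList <- lapply(data[, isFactor, drop = FALSE], function(f) {
    m <- contrasts(f)
    colnames(m) <- paste0(sep, colnames(m))
    m
  })
  model.matrix(formula, data = data, contrasts.arg = contrastList)
}

df <- data.frame(x = factor(rep(c("a", "b", "c"), times = 3)),
                 y = factor(rep(c("d", "e", "f"), times = 3)),
                 z = 1:9)

xSep <- ModelMatrixWithSep(~ ., df)
print(colnames(xSep))

# The same helper with a different separator.
xColon <- ModelMatrixWithSep(~ ., df, sep = ":")
print(colnames(xColon))
```

Taking the separator as an argument makes it easy to pick a string that does not occur in any variable or level name of the data at hand.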