# Casting a Wide (and Sparse) Matrix in R

January 19, 2016

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I routinely use `melt()` and `dcast()` from the reshape2 package as part of my data munging workflow. Recently I’ve noticed that the data frames I’ve been casting are often extremely sparse. Stashing these in a dense data structure just feels wasteful. And the dismal drone of page thrashing is unpleasant.
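To put a rough number on that waste, here is a minimal sketch that stores the same mostly-empty data twice: once as a sparse matrix from the Matrix package and once as an ordinary dense R matrix. The 2,000 by 2,000 size and the 5,000 random entries are arbitrary assumptions chosen purely for illustration.

```r
library(Matrix)

set.seed(1)
n_rows <- 2000      # arbitrary, illustrative sizes
n_cols <- 2000
n_entries <- 5000   # roughly 0.1% of the cells end up non-zero

# Random (row, column, value) triplets; any duplicated cells are summed
i <- sample(n_rows, n_entries, replace = TRUE)
j <- sample(n_cols, n_entries, replace = TRUE)
x <- runif(n_entries)

sparse <- sparseMatrix(i = i, j = j, x = x, dims = c(n_rows, n_cols))
dense <- as.matrix(sparse)   # base R matrix that stores every zero explicitly

print(object.size(dense), units = "Mb")
print(object.size(sparse), units = "Kb")
```

A dense double-precision matrix costs 8 bytes per cell however many of them are zero, so the dense copy here needs roughly 32 MB. The sparse version stores only the non-zero values plus their indices, so its size grows with the number of entries rather than with the number of cells.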

So I had a look around for an alternative. As it turns out, it’s remarkably easy to cast a sparse matrix using `sparseMatrix()` from the Matrix package. Here’s an example.

First we’ll put together some test data.

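```r
set.seed(11)   # sample() changed in R 3.6.0, so newer versions may give different data

N = 10

# Long-format data: a row key, a column key and a value for each record
data = data.frame(
  row = sample(1:3, N, replace = TRUE),
  col = sample(LETTERS, N, replace = TRUE),
  value = sample(1:3, N, replace = TRUE))

# Convert the keys to factors; their levels will label the matrix
data = transform(data,
                 row = factor(row),
                 col = factor(col))
```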

It’s just a data frame with two fields, `row` and `col`, which will become the rows and columns of the matrix, and a third field, `value`, which gives the values to be stored in the matrix.

```
> data
   row col value
1    1   E     1
2    1   L     3
3    2   X     2
4    1   W     2
5    1   T     1
6    3   O     2
7    1   M     2
8    1   I     1
9    3   E     1
10   1   M     2
```

Doing the cast with `sparseMatrix()` is easy because you specify the row and column of every entry inserted into the matrix. Multiple entries for a single cell (like records 7 and 10 above, which both land in row 1, column M) are simply summed, which is generally the behaviour that I am after anyway.
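The summing of duplicates is easy to see on a toy example. This minimal sketch uses made-up indices: two entries are aimed at the same cell and a third goes elsewhere.

```r
library(Matrix)

# Two entries target cell (1, 2); one entry targets cell (2, 3)
m <- sparseMatrix(i = c(1, 1, 2), j = c(2, 2, 3), x = c(2, 2, 5))

m[1, 2]   # the two values for the same cell are added together
m[2, 3]   # a single entry is stored as is
```

If you would rather keep only the last value supplied for a repeated cell, `sparseMatrix()` also accepts `use.last.ij = TRUE`.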

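```r
library(Matrix)

# One entry per record: the factor codes give each value's (row, column) position
data.sparse = sparseMatrix(i = as.integer(data$row), j = as.integer(data$col), x = data$value)

# Label the dimensions with the factor levels
colnames(data.sparse) = levels(data$col)
rownames(data.sparse) = levels(data$row)
```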

And here’s the result:

```
> data.sparse
3 x 8 sparse Matrix of class "dgCMatrix"
  E I L M O T W X
1 1 1 3 4 . 1 2 .
2 . . . . . . . 2
3 1 . . . 2 . . .
```

The post Casting a Wide (and Sparse) Matrix in R appeared first on Exegetic Analytics.
