# Subset and Fill via an Index

January 4, 2017

(This article was first published on R – Jason unedited, and kindly contributed to R-bloggers)

I had a very large problem (70,000+ columns) that I needed to reduce. A function took two matrices and transformed them into a single vector with the same length as the number of columns in the two inputs. I needed to reduce the inputs and then map each output value back to the original position of its corresponding column. This may seem obvious to veteran `R` users; I am mainly writing it down as a reference for myself. Here is a small example of what I needed.

```r
time <- c(1, 1, 2, 2, 3, 3)
money <- c(2, 2, 4, 4, 6, 6)
ownership <- c(1, 0, 1, 0, 1, 0)
mat <- rbind(time, money, ownership)
print(mat)
```

```
##           [,1] [,2] [,3] [,4] [,5] [,6]
## time         1    1    2    2    3    3
## money        2    2    4    4    6    6
## ownership    1    0    1    0    1    0
```

```r
dat <- c(1, 0, 2, 0, 3, 0)
obj <- matrix(dat, nrow = 1)
print(obj)
```

```
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    0    2    0    3    0
```

```r
f <- function(mat, obj) {
  # placeholder: generic function whose output has the same number of columns as obj
}
soln <- f(mat, obj)
```

where `soln` is a 1×6 matrix.
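
The real `f` is not shown in this post. To make the shapes concrete, the sketch below uses a hypothetical stand-in, `f_demo`, which multiplies each column sum of `mat` by the matching entry of `obj`. The name, the arithmetic, and the `_demo` objects are assumptions for illustration only.

```r
# Hypothetical stand-in for f; the real function is not part of this post.
f_demo <- function(mat, obj) {
  stopifnot(ncol(mat) == ncol(obj))
  # One value per column, returned as a 1 x ncol(obj) matrix.
  matrix(colSums(mat) * obj[1, ], nrow = 1)
}

mat_demo <- rbind(
  time      = c(1, 1, 2, 2, 3, 3),
  money     = c(2, 2, 4, 4, 6, 6),
  ownership = c(1, 0, 1, 0, 1, 0)
)
obj_demo <- matrix(c(1, 0, 2, 0, 3, 0), nrow = 1)

soln_demo <- f_demo(mat_demo, obj_demo)
dim(soln_demo)  # 1 6
```

Any function that returns one value per input column would serve equally well here; only the output shape matters for what follows.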

The size of my problem made the function `f` extremely slow and unreliable. I needed a way to reduce the inputs and then map the output appropriately to a results matrix.

The matrices `mat` and `obj` reduce to:

```r
time <- c(1, 2, 3)
money <- c(2, 4, 6)
ownership <- c(1, 1, 1)
mat <- rbind(time, money, ownership)
print(mat)
```

```
##           [,1] [,2] [,3]
## time         1    2    3
## money        2    4    6
## ownership    1    1    1
```

```r
dat <- c(1, 2, 3)
obj <- matrix(dat, nrow = 1)
print(obj)
```

```
##      [,1] [,2] [,3]
## [1,]    1    2    3
```

```r
soln <- f(mat, obj)
```

with the `soln` being a 1×3 matrix. For example:

```
##          [,1]     [,2]     [,3]
## [1,] 4.151969 5.759826 5.537563
```

Here the decision to exclude a column from `mat` is based on a value of 0 in `ownership`, and the same columns are excluded from `obj`. The added difficulty is that I need to assign each value in `soln` to its corresponding original position in a larger `SOLN` matrix, in this case columns 1, 3, and 5. Ownership is randomly assigned, so there is no pattern to rely on other than the zeros described above.
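
Since ownership has no pattern, the positions to keep have to be read from the `ownership` values themselves. A minimal sketch of turning the 0/1 values into an index, where `keep` and `idx` are names introduced here for illustration:

```r
ownership <- c(1, 0, 1, 0, 1, 0)

keep <- as.logical(ownership)  # TRUE where the column is owned, FALSE otherwise
idx  <- which(keep)            # integer positions of the owned columns

print(keep)
print(idx)   # 1 3 5
sum(keep)    # number of columns that survive the reduction: 3
```

Either form can be used inside `[ ]`: the logical vector and the integer positions select the same columns.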

```r
time <- c(1, 1, 2, 2, 3, 3)
money <- c(2, 2, 4, 4, 6, 6)
ownership <- c(1, 0, 1, 0, 1, 0)
mat <- rbind(time, money, ownership)
print(mat)
```

```
##           [,1] [,2] [,3] [,4] [,5] [,6]
## time         1    1    2    2    3    3
## money        2    2    4    4    6    6
## ownership    1    0    1    0    1    0
```

```r
dat <- c(1, 0, 2, 0, 3, 0)
obj <- matrix(dat, nrow = 1)
print(obj)
```

```
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    0    2    0    3    0
```

```r
obj2 <- obj[, as.logical(ownership), drop = FALSE]
print(obj2)
```

```
##      [,1] [,2] [,3]
## [1,]    1    2    3
```
```r
# drop = FALSE keeps mat2 a matrix even if only one column is owned
mat2 <- mat[, as.logical(ownership), drop = FALSE]
print(mat2)
```

```
##           [,1] [,2] [,3]
## time         1    2    3
## money        2    4    6
## ownership    1    1    1
```
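
The `drop = FALSE` argument used when subsetting `obj` matters whenever the index might select a single column. A small sketch of the difference, using a hypothetical two-column matrix `mat_small` in which only one column is owned:

```r
# Hypothetical two-column input where only the first column is owned.
mat_small <- rbind(
  time      = c(1, 2),
  money     = c(2, 4),
  ownership = c(1, 0)
)
keep_small <- as.logical(mat_small["ownership", ])

dropped <- mat_small[, keep_small]                # collapses to a named vector
kept    <- mat_small[, keep_small, drop = FALSE]  # stays a 3 x 1 matrix

is.matrix(dropped)  # FALSE
dim(kept)           # 3 1
```

A function that calls `ncol()` or indexes by column would fail or misbehave on `dropped`, so keeping the matrix shape is the safer default.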

The reduced inputs are then passed to `f`:

```r
soln <- f(mat2, obj2)
```

with the `soln` being a 1×3 matrix. For example:

```
##          [,1]     [,2]     [,3]
## [1,] 4.151969 5.759826 5.537563
```

I need to create a result space:

```r
SOLN <- matrix(data = NA, nrow = 1, ncol = 6)
print(SOLN)
```

```
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]   NA   NA   NA   NA   NA   NA
```

Then map the results from `soln` to columns 1, 3, & 5.

```r
SOLN[, as.logical(ownership)] <- soln
print(SOLN)
```

```
##          [,1] [,2]     [,3] [,4]     [,5] [,6]
## [1,] 4.151969   NA 5.759826   NA 5.537563   NA
```
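
The three steps (subset, compute, fill) can be wrapped in one helper. This is a sketch under stated assumptions: `subset_and_fill` and `f_demo` are hypothetical names, and `f_demo` is an invented stand-in for the real `f`, which is not shown in this post.

```r
# Hypothetical helper: run `fun` on the owned columns only, then write the
# results back into their original positions; unowned columns stay NA.
subset_and_fill <- function(mat, obj, keep, fun) {
  out <- matrix(NA_real_, nrow = nrow(obj), ncol = ncol(obj))
  if (any(keep)) {
    res <- fun(mat[, keep, drop = FALSE], obj[, keep, drop = FALSE])
    stopifnot(ncol(res) == sum(keep))  # one result per owned column
    out[, keep] <- res
  }
  out
}

# Invented stand-in for f: column sums of mat scaled by obj.
f_demo <- function(mat, obj) matrix(colSums(mat) * obj[1, ], nrow = 1)

mat_demo <- rbind(
  time      = c(1, 1, 2, 2, 3, 3),
  money     = c(2, 2, 4, 4, 6, 6),
  ownership = c(1, 0, 1, 0, 1, 0)
)
obj_demo <- matrix(c(1, 0, 2, 0, 3, 0), nrow = 1)

SOLN_demo <- subset_and_fill(mat_demo, obj_demo,
                             as.logical(mat_demo["ownership", ]), f_demo)
print(SOLN_demo)  # columns 1, 3, 5 hold 4, 14, 30; columns 2, 4, 6 are NA
```

The same logical index is used for both the subset and the fill, which is what keeps each result aligned with its original column.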

A much more elegant solution than the `for` loops I was trying to write!
