# Subset and Fill via an Index

January 4, 2017
(This article was first published on R – Jason unedited, and kindly contributed to R-bloggers)

I had a very large problem (70,000+ columns) that I needed to reduce. A function took two matrices and transformed them into a single vector with the same number of columns as the two inputs. I needed to reduce the inputs and then map the output back to the original position of the corresponding column. This entry may seem obvious to veteran `R` users; I am mainly writing it as a reference for myself. Here is a small example of what I needed.

``````time <- c(1, 1, 2, 2, 3, 3)
money <- c(2, 2, 4, 4, 6, 6)
ownership <- c(1, 0, 1, 0, 1, 0)
mat <- rbind(time, money, ownership)
print(mat)``````
``````##           [,1] [,2] [,3] [,4] [,5] [,6]
## time         1    1    2    2    3    3
## money        2    2    4    4    6    6
## ownership    1    0    1    0    1    0``````
``````dat <- c(1, 0, 2, 0, 3, 0)
obj <- matrix(dat,nrow=1)
print(obj)``````
``````##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    0    2    0    3    0``````
``````f <- function(mat,obj){
#generic function with output of the same number of columns as obj
}
soln <- f(mat, obj)``````

where `soln` is a 1×6 matrix.
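The post leaves `f` unspecified. To run the snippets end to end, a stand-in is needed; the sketch below uses a hypothetical `f_demo` (an assumption for illustration only, not the function from the original problem) that returns one value per column.

```r
# Hypothetical stand-in for f: returns a 1 x ncol(obj) matrix.
# The arithmetic is arbitrary; only the output shape matters here.
f_demo <- function(mat, obj) {
  stopifnot(ncol(mat) == ncol(obj))
  matrix(colSums(mat) * obj[1, ], nrow = 1)
}

time <- c(1, 1, 2, 2, 3, 3)
money <- c(2, 2, 4, 4, 6, 6)
ownership <- c(1, 0, 1, 0, 1, 0)
mat <- rbind(time, money, ownership)
obj <- matrix(c(1, 0, 2, 0, 3, 0), nrow = 1)

soln_demo <- f_demo(mat, obj)
dim(soln_demo)  # 1 6
```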

The size of my problem made the function `f` extremely slow and unreliable. I needed a way to reduce the inputs and then map the output appropriately to a results matrix.

The matrices `mat` and `obj` reduce to:

``````time <- c(1, 2, 3)
money <- c(2, 4, 6)
ownership <- c(1, 1, 1)
mat <- rbind(time, money, ownership)
print(mat)``````
``````##           [,1] [,2] [,3]
## time         1    2    3
## money        2    4    6
## ownership    1    1    1``````
``````dat <- c(1, 2, 3)
obj <- matrix(dat,nrow=1)
print(obj)``````
``````##      [,1] [,2] [,3]
## [1,]    1    2    3``````
``soln <- f(mat, obj)``

with `soln` being a 1×3 matrix. For example:

``````##          [,1]     [,2]     [,3]
## [1,] 4.151969 5.759826 5.537563``````

A column is excluded from `mat`, and from `obj`, when its value in `ownership` is 0. The added difficulty is that I need to assign the output in `soln` to the corresponding original positions in a larger `SOLN` matrix, in this case columns 1, 3, and 5. Ownership is randomly assigned, so there is no pattern to exploit other than the zeros described above.
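As a quick check of which columns survive, the 0/1 `ownership` row can be turned into a logical mask, and `which()` recovers the original column positions. This is a minimal sketch; the names `keep` and `idx` are introduced here for illustration.

```r
ownership <- c(1, 0, 1, 0, 1, 0)

# TRUE where the column is kept, FALSE where ownership is 0.
keep <- as.logical(ownership)
print(keep)  # TRUE FALSE TRUE FALSE TRUE FALSE

# Integer positions of the kept columns in the full matrix.
idx <- which(keep)
print(idx)   # 1 3 5

# The same works when the zeros follow no regular pattern.
ownership_irregular <- c(0, 1, 1, 0, 0, 1)
idx_irregular <- which(as.logical(ownership_irregular))
print(idx_irregular)  # 2 3 6
```

Either the logical mask or the integer positions can be used to index columns; the mask has the advantage of always being the same length as the full matrix.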

``````time <- c(1, 1, 2, 2, 3, 3)
money <- c(2, 2, 4, 4, 6, 6)
ownership <- c(1, 0, 1, 0, 1, 0)
mat <- rbind(time, money, ownership)
print(mat)``````
``````##           [,1] [,2] [,3] [,4] [,5] [,6]
## time         1    1    2    2    3    3
## money        2    2    4    4    6    6
## ownership    1    0    1    0    1    0``````
``````dat <- c(1, 0, 2, 0, 3, 0)
obj <- matrix(dat,nrow=1)
print(obj)``````
``````##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    0    2    0    3    0``````
``````obj2 <- obj[, as.logical(ownership), drop = FALSE]
print(obj2)``````
``````##      [,1] [,2] [,3]
## [1,]    1    2    3``````
``````mat2 <- mat[, as.logical(ownership), drop = FALSE]
print(mat2)``````
``````##           [,1] [,2] [,3]
## time         1    2    3
## money        2    4    6
## ownership    1    1    1``````

The reduced inputs are then passed to `f`:

``soln <- f(mat2, obj2)``

with `soln` being a 1×3 matrix. For example:

``````##          [,1]     [,2]     [,3]
## [1,] 4.151969 5.759826 5.537563``````
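One detail in the subsetting step is worth a note: `drop = FALSE` keeps the result a matrix even if only a single column survives. The sketch below uses made-up data in which only one column is owned, to show what happens with and without it.

```r
# Made-up example where only the first column has ownership == 1.
mat <- rbind(time = c(1, 1, 2), money = c(2, 2, 4), ownership = c(1, 0, 0))
keep <- as.logical(mat["ownership", ])

dropped <- mat[, keep]                # collapses to a plain named vector
kept    <- mat[, keep, drop = FALSE]  # stays a 3 x 1 matrix

is.null(dim(dropped))  # TRUE
dim(kept)              # 3 1
```

A function that calls `ncol()` or indexes by column on its inputs would likely fail on `dropped`, so keeping `drop = FALSE` on both `mat` and `obj` is the safer habit.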

I need to create a result space:

``````SOLN <- matrix(data=NA,nrow=1,ncol=6)
print(SOLN)``````
``````##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]   NA   NA   NA   NA   NA   NA``````

Then I map the results from `soln` to columns 1, 3, and 5.

``````SOLN[, as.logical(ownership)] <- soln
print(SOLN)``````
``````##          [,1] [,2]     [,3] [,4]     [,5] [,6]
## [1,] 4.151969   NA 5.759826   NA 5.537563   NA``````

A much more elegant solution than the `for` loops I was trying to write!
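The three steps (subset, evaluate, fill) can also be bundled into one helper. The sketch below is one possible way to do that; `subset_and_fill` and `f_demo` are hypothetical names, and `f_demo` is only an arbitrary stand-in for the real `f`.

```r
# Hypothetical helper: run f on the kept columns only, then write the
# results back to their original positions. Other columns stay NA.
subset_and_fill <- function(f, mat, obj, keep) {
  keep <- as.logical(keep)
  stopifnot(length(keep) == ncol(mat), ncol(mat) == ncol(obj))
  out <- matrix(NA_real_, nrow = 1, ncol = ncol(mat))
  if (any(keep)) {
    out[, keep] <- f(mat[, keep, drop = FALSE], obj[, keep, drop = FALSE])
  }
  out
}

# Arbitrary stand-in for f (assumption): one value per input column.
f_demo <- function(mat, obj) matrix(colSums(mat) * obj[1, ], nrow = 1)

mat <- rbind(time = c(1, 1, 2, 2, 3, 3),
             money = c(2, 2, 4, 4, 6, 6),
             ownership = c(1, 0, 1, 0, 1, 0))
obj <- matrix(c(1, 0, 2, 0, 3, 0), nrow = 1)

SOLN <- subset_and_fill(f_demo, mat, obj, mat["ownership", ])
print(SOLN)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    4   NA   14   NA   30   NA
```

The logical index does the bookkeeping on both sides: the same `keep` vector selects the inputs and addresses the output, so no loop over columns is needed.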
