Moving columns using basic English prepositions!

(This article was first published on Digital Age Economist, and kindly contributed to R-bloggers)

Moveme!

I recently worked with a dataset that had over 100 columns, and I had to keep reordering the columns to make my analysis easier. For example, whenever you conduct a multiple-factor analysis (FactoMineR::MFA), the function requires your variables to be arranged in specific groups. This meant that after feature engineering, I was left with the problem of having to order my columns so that the analysis could be run. By now you can guess the problem statement… how the heck was I supposed to move 100 columns to specific places in the data set, and do so in a clean, easy-to-read format?

If you are a regular user of tidyverse packages, you should be VERY familiar with the following code:

library(dplyr)
iris %>% 
  select(Species, everything()) %>% 
  head
##   Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1  setosa          5.1         3.5          1.4         0.2
## 2  setosa          4.9         3.0          1.4         0.2
## 3  setosa          4.7         3.2          1.3         0.2
## 4  setosa          4.6         3.1          1.5         0.2
## 5  setosa          5.0         3.6          1.4         0.2
## 6  setosa          5.4         3.9          1.7         0.4

So, why is this code so familiar to you? Well, it’s because you have been using it to change the order of the columns in your data.frame. But what if you didn’t want to move columns only to the front or back, but rather after a certain column, or between two other columns? Imagine you could tell your data.frame to please (because #rstats people are polite) move column A just after B, move C to the front, and move column G after F.

Well, thanks to the wonderful world of Stack Overflow, such a function exists if you know where to look! The original code is credited to user A5C1D2H2I1M1N2O1R2T1, who answered a question on moving columns within a data frame without retyping. So if we take his code and sprinkle on a tiny bit of magic, we too can integrate this into our tidy workflow:

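moveme <- function (df, movecommand) {
  invec <- names(df)
  
  # Split the command string on ";" into separate moves, then split each
  # move on commas or whitespace and drop the empty tokens
  movecommand <- lapply(strsplit(strsplit(movecommand, ";")[[1]], 
                                 ",|\\s+"), function(x) x[x != ""])
  # For each move, separate the columns to move from the position keyword
  # (plus its reference column, for "before" and "after")
  movelist <- lapply(movecommand, function(x) {
    Where <- x[which(x %in% c("before", "after", "first", 
                              "last")):length(x)]
    ToMove <- setdiff(x, Where)
    list(ToMove, Where)
  })
  myVec <- invec
  # Apply the moves one at a time, in the order they were given
  for (i in seq_along(movelist)) {
    # Column names that stay put during this move
    temp <- setdiff(myVec, movelist[[i]][[1]])
    A <- movelist[[i]][[2]][1]
    if (A %in% c("before", "after")) {
      ba <- movelist[[i]][[2]][2]
      if (A == "before") {
        after <- match(ba, temp) - 1
      }
      else if (A == "after") {
        after <- match(ba, temp)
      }
    }
    else if (A == "first") {
      after <- 0
    }
    else if (A == "last") {
      after <- length(myVec)
    }
    # Reinsert the moved columns after position `after`
    myVec <- append(temp, values = movelist[[i]][[1]], after = after)
  }
  
  # Reorder the columns to match the new name order; drop = FALSE keeps
  # a one-column data.frame from collapsing to a vector
  df[, match(myVec, names(df)), drop = FALSE]
}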
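To see what the function does under the hood, here is a minimal sketch of its two core steps in base R: tokenising the command string, and reinserting the moved names with append(). The variable names (cmd, cols, move, target) are my own and only serve the illustration; they are not part of the original answer.

```r
# Step 1: tokenise a command string the same way moveme() does.
# Split on ";" first, then on commas/whitespace, and drop empty tokens.
cmd <- "g first; a, b after e"
tokens <- lapply(strsplit(strsplit(cmd, ";")[[1]], ",|\\s+"),
                 function(x) x[x != ""])
print(tokens)

# Step 2: move columns "a" and "b" to just after "e".
cols   <- c("a", "b", "c", "d", "e", "f", "g")  # illustrative column names
move   <- c("a", "b")                           # names being moved
target <- "e"                                   # reference column

rest <- setdiff(cols, move)                     # names that stay in place
new_order <- append(rest, move, after = match(target, rest))
print(new_order)
```

Note that the reference column is looked up in the vector of remaining names (rest), not in the original order, which is why the position stays correct even after the moved columns have been taken out.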

To use your new function, simply pipe the data.frame into moveme as follows:

a <- b <- c <- d <- e <- f <- g <- 1:100
df <- data.frame(a,b,c,d,e,f,g)
df <- df %>% as_tibble()

# Usage
df %>% moveme(., "g first")
## # A tibble: 100 x 7
##        g     a     b     c     d     e     f
##    <int> <int> <int> <int> <int> <int> <int>
##  1     1     1     1     1     1     1     1
##  2     2     2     2     2     2     2     2
##  3     3     3     3     3     3     3     3
##  4     4     4     4     4     4     4     4
##  5     5     5     5     5     5     5     5
##  6     6     6     6     6     6     6     6
##  7     7     7     7     7     7     7     7
##  8     8     8     8     8     8     8     8
##  9     9     9     9     9     9     9     9
## 10    10    10    10    10    10    10    10
## # ... with 90 more rows

OK, so that isn’t that impressive, so let’s try stringing multiple move commands into one character string, separating the commands with a semicolon (;):

df %>% moveme(., "g first; a last; e before c")
## # A tibble: 100 x 7
##        g     b     e     c     d     f     a
##    <int> <int> <int> <int> <int> <int> <int>
##  1     1     1     1     1     1     1     1
##  2     2     2     2     2     2     2     2
##  3     3     3     3     3     3     3     3
##  4     4     4     4     4     4     4     4
##  5     5     5     5     5     5     5     5
##  6     6     6     6     6     6     6     6
##  7     7     7     7     7     7     7     7
##  8     8     8     8     8     8     8     8
##  9     9     9     9     9     9     9     9
## 10    10    10    10    10    10    10    10
## # ... with 90 more rows

As you can see, this API lets you chain as many moves as you like into a single call, and the moves are applied from left to right. The supported position keywords are:

  • before
  • after
  • first
  • last
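For comparison, and only as a sketch: newer versions of dplyr (1.0.0 and later, which I am assuming you have installed) ship a relocate() verb that covers similar ground with .before and .after arguments. This is a different technique from moveme, not part of the original Stack Overflow answer, but it reproduces the three-move example above. The small data.frame here is made up for the illustration.

```r
library(dplyr)

# Illustrative data with the same column names as the example above
df <- data.frame(a = 1:3, b = 1:3, c = 1:3, d = 1:3,
                 e = 1:3, f = 1:3, g = 1:3)

out <- df %>%
  relocate(g) %>%                        # "g first"
  relocate(a, .after = last_col()) %>%   # "a last"
  relocate(e, .before = c)               # "e before c"

print(names(out))
```

The trade-off is one relocate() call per move instead of a single command string, in exchange for tidyselect helpers and no string parsing.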

So, once again, thanks to the amazing #rstats community out there!
