R tips: Swapping columns in a matrix

March 31, 2009 · Posted in R bloggers · Comments Off 

In R, the statistical analysis and computing platform, swapping two columns in a matrix is really easy: m[ , c(1,2)] <- m[ , c(2,1)].

Note, however, that this swaps only the values, not the column names (if you have any). You could do something like colnames(m)[c(1,2)] <- colnames(m)[c(2,1)] if you need the names changed as well, but it is perhaps better simply to reassign the reordered matrix (this form assumes m has at least three columns, since 3:ncol(m) counts backwards otherwise):

m <- m[ , c(2, 1, 3:ncol(m)) ]
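
The difference between the two approaches is easy to see on a small named matrix. The following is only a sketch: the matrix and the variable names swapped_values and swapped_all are made up for illustration, and seq_len(ncol(m))[-(1:2)] is used in place of 3:ncol(m) so that the reordering also works for a two-column matrix.

```r
# A small example matrix with named columns (illustrative data only)
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# Approach 1: swap the values in place; the column names stay where they are
swapped_values <- m
swapped_values[, c(1, 2)] <- swapped_values[, c(2, 1)]

# Approach 2: reorder the columns; values and names move together.
# seq_len(ncol(m))[-(1:2)] is empty when there are only two columns,
# whereas 3:ncol(m) would then count backwards (3, 2) and fail.
swapped_all <- m[, c(2, 1, seq_len(ncol(m))[-(1:2)]), drop = FALSE]

print(colnames(swapped_values))
print(colnames(swapped_all))
```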

Enhanced tidy.source() (Preserve Some Comments)

March 31, 2009 · Posted in R bloggers · Comments Off 

After a few hours’ work, I modified the function tidy.source() in the animation package so that it can preserve complete comment lines. See the tidy.source() wiki page for example.

Download the R code here
tidy.source <- function(source = "clipboard", keep.comment = TRUE,
  keep.blank.line = FALSE, begin.comment, end.comment, ...) {
  # parse and deparse the code
  tidy.block = function(block.text) {
      exprs = parse(text = block.text)
      n = length(exprs)
      res = character(n)
      for (i in seq_len(n)) {
        dep = paste(deparse(exprs[i]), collapse = "\n")
        res[i] = substring(dep, 12, nchar(dep) - 1)
      }
      return(res)
  }
  text.lines = readLines(source, warn = FALSE)
  if (keep.comment) {
      # identifier for comments
      identifier = function() paste(sample(LETTERS), collapse = "")
      if (missing(begin.comment))
        begin.comment = identifier()
      if (missing(end.comment))
        end.comment = identifier()
      # remove leading and trailing white spaces
      text.lines = gsub("^[[:space:]]+|[[:space:]]+$", "",
        text.lines)
      # make sure the identifiers are not in the code
      # or the original code might be modified
      while (length(grep(sprintf("%s|%s", begin.comment, end.comment),
        text.lines))) {
        begin.comment = identifier()
        end.comment = identifier()
      }
      head.comment = substring(text.lines, 1, 1) == "#"
      # add identifiers to comment lines to cheat R parser
      if (any(head.comment)) {
        text.lines[head.comment] = gsub("\"", "\'", text.lines[head.comment])
        text.lines[head.comment] = sprintf("%s=\"%s%s\"",
          begin.comment, text.lines[head.comment], end.comment)
      }
      # keep blank lines?
      blank.line = text.lines == ""
      if (any(blank.line) & keep.blank.line)
        text.lines[blank.line] = sprintf("%s=\"%s\"", begin.comment,
          end.comment)
      text.tidy = tidy.block(text.lines)
      # remove the identifiers
      text.tidy = gsub(sprintf("%s = \"|%s\"", begin.comment,
        end.comment), "", text.tidy)
  }
  else {
      text.tidy = tidy.block(text.lines)
  }
  cat(paste(text.tidy, collapse = "\n"), "\n", ...)
  invisible(text.tidy)
}

Note that inline comments will still be removed. I don’t plan to spend any more time on handling them.
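
The core trick in the function above is to disguise each comment line as a string assignment so that it survives parse() and deparse(), and then to strip the disguise again. A minimal sketch of that round trip for a single line follows; the fixed markers BEGINMARK and ENDMARK are stand-ins for the random identifiers the function generates.

```r
# One comment line that a plain parse()/deparse() round trip would drop
comment_line <- "# a comment worth keeping"

# Hypothetical fixed markers; tidy.source() draws random letter strings instead
begin_marker <- "BEGINMARK"
end_marker <- "ENDMARK"

# Wrap the comment into an assignment such as BEGINMARK="...ENDMARK"
masked <- sprintf("%s=\"%s%s\"", begin_marker, comment_line, end_marker)

# The parser now sees ordinary code and keeps it
deparsed <- deparse(parse(text = masked)[[1]])

# Strip the markers again to recover the comment text
restored <- gsub(sprintf("%s = \"|%s\"", begin_marker, end_marker), "", deparsed)
print(restored)
```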

Multiple plots in a single image using ImageMagick

March 31, 2009 · Posted in R bloggers · Comments Off 
Sometimes you need to add several plots/images, either by row or by column, to a single page/sheet.
If you generate all your plots with R base graphics, you can easily accomplish the task using the par() function, e.g., using par(mfrow=c(2,2)) and then drawing 4 plots of your choice.
However, if you need to create a single image built up from different sources (e.g. external images, plots not compatible with R base graphics, etc.), you can create or retrieve the individual images and then merge them using the tools from the ImageMagick suite, available on Unix-like systems (Linux, Mac OS X, etc.).
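
For the pure base-graphics case, the par(mfrow=c(2,2)) route might look like the sketch below. The four plots and the temporary output file are arbitrary choices for illustration, and a PDF device is used only because it is available on every R installation.

```r
# Write a 2 x 2 grid of base-graphics plots to a temporary file
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 6)

old_par <- par(mfrow = c(2, 2))   # 2 rows, 2 columns, filled row by row
plot(density(rnorm(1000)), main = "Density")
hist(rnorm(1000), main = "Histogram")
plot(cumsum(rnorm(100)), type = "l", main = "Random walk")
boxplot(rnorm(100), main = "Boxplot")
par(old_par)                      # restore the previous layout

dev.off()
```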

## Example
# we generate some random plots
require(seqLogo)
## the first plot is taken from the seqLogo help ( ?seqLogo )
## I selected this example on purpose because the seqLogo function is based on grid graphics
## and is coded in such a way that doesn't allow the use of the par() function
mFile <- system.file("Exfiles/pwm1", package="seqLogo")
m <- read.table(mFile)
pwm <- makePWM(m)
png("seqLogo1.png", width=400, height=400)
seqLogo(pwm)
dev.off()
## totally unrelated
png("plot1.png", width=400, height=400)
plot(density(rnorm(1000)))
dev.off()


Then you can type:

system("convert \\( seqLogo1.png plot1.png +append \\) \\( seqLogo1.png plot1.png +append \\) -background none -append final.png")

Remember that inside an R string the backslash must itself be escaped, so every '\' of the shell command is written as '\\'.

Or, alternatively, from the command line:

convert \( seqLogo1.png plot1.png +append \) \( seqLogo1.png plot1.png +append \) -background none -append final.png

See man convert and man ImageMagick for the full story.
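
If the grid of images changes often, the command string can be assembled in R instead of typed by hand. The helper below, montage_command(), is a hypothetical name and only a sketch: it uses the same convert options as above (+append to join each row horizontally, -append to stack the rows) and nothing else from ImageMagick.

```r
# Build the convert command for a grid of images.
# rows: a list of character vectors, one vector of file names per row.
montage_command <- function(rows, output) {
  row_groups <- vapply(rows, function(files) {
    # "\\(" in R source is the two characters \( that the shell expects
    paste("\\(", paste(shQuote(files), collapse = " "), "+append", "\\)")
  }, character(1))
  paste("convert", paste(row_groups, collapse = " "),
        "-background none -append", shQuote(output))
}

cmd <- montage_command(
  rows = list(c("seqLogo1.png", "plot1.png"), c("seqLogo1.png", "plot1.png")),
  output = "final.png"
)
cat(cmd, "\n")
# system(cmd)  # uncomment to run; requires ImageMagick's convert on the PATH
```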

How accurate or reliable are R calculations?

March 28, 2009 · Posted in R bloggers · Comments Off 

On the REvolutions Blog there is a nice post addressing the often-raised concern of “how good or reliable R is”. At my university R is hardly used, and lecturers have sometimes asked me whether the calculations done by R and its packages are accurate. The linked post addresses this matter and tries to clarify the point.


R: Zip fastener for two data frames / combining rows or columns of two dataframes in an alternating manner

March 27, 2009 · Posted in R bloggers · Comments Off 

Sometimes I find it useful to merge two data frames like the following ones

  X1 X2 X3 X4      Y1 Y2 Y3 Y4   
1  o  o  o  o       X  X  X  X
2  o  o  o  o       X  X  X  X
3  o  o  o  o       X  X  X  X

by using zip feeding either along the columns

   X1 Y1 X2 Y2 X3 Y3 X4 Y4
1  o  X  o  X  o  X  o  X
2  o  X  o  X  o  X  o  X
3  o  X  o  X  o  X  o  X

or along the rows of the data frames.

  V1 V2 V3 V4
1  o  o  o  o
4  X  X  X  X
2  o  o  o  o
5  X  X  X  X
3  o  o  o  o
6  X  X  X  X

The following function acts like a “zip fastener” for combining two data frames. It takes the first column (or row) of the first data frame and places it next to the first column (or row) of the second data frame and so on. Only one dimension of the data frames has to be equal to do this. E.g. to combine the columns by zip feeding, the number of rows must be equal, and vice versa.

So here comes the code for the zipFastener() function. Actually it’s only the last few lines (from # zip fastener operations on) that do the job, but as I did not want to restrict the function to equal dimensions there is a little prelude.

###############################################################

# zipFastener for TWO dataframes of unequal length
zipFastener <- function(df1, df2, along=2)
{
    # parameter checking
    if(!is.element(along, c(1,2))){
        stop("along must be 1 or 2 for rows and columns respectively")
    }
    # if merged by using zip feeding along the columns, the
    # same no. of rows is required and vice versa
    if(along==1 && (ncol(df1)!= ncol(df2))) {
        stop("the no. of columns has to be equal to merge them by zip feeding")
    }
    if(along==2 && (nrow(df1)!= nrow(df2))) {
        stop("the no. of rows has to be equal to merge them by zip feeding")
    }

    # zip fastener preparations
    d1 <- dim(df1)[along]
    d2 <- dim(df2)[along]
    i1 <- 1:d1           # index vector 1
    i2 <- 1:d2 + d1      # index vector 2

    # set biggest dimension dMax
    if(d1==d2) {
        dMax <- d1
    } else if (d1 > d2) {
        length(i2) <- length(i1)    # make vectors same length, 
        dMax <- d1                  # fill blanks with NAs   
    } else  if(d1 < d2){
        length(i1) <- length(i2)    # make vectors same length,
        dMax <- d2                  # fill blanks with NAs   
    }
    
    # zip fastener operations
    index <- as.vector(matrix(c(i1, i2), ncol=dMax, byrow=TRUE))
    index <- index[!is.na(index)]         # remove NAs
    
    if(along==1){
        colnames(df2) <- colnames(df1)   # keep 1st colnames                  
        res <- rbind(df1,df2)[ index, ]  # reorder data frame
    }
    if(along==2) res <- cbind(df1,df2)[ , index]           

    return(res)
}

###############################################################
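
To see why the index vector produces the interleaving, here is the same construction on its own for two small data frames with 3 and 2 columns. The data frames and their column names are invented for this sketch; the index logic mirrors the lines in the function above.

```r
# Two illustrative data frames with the same number of rows
df1 <- data.frame(X1 = 1:3, X2 = 4:6, X3 = 7:9)
df2 <- data.frame(Y1 = 11:13, Y2 = 14:16)

d1 <- ncol(df1)
d2 <- ncol(df2)
i1 <- seq_len(d1)        # positions of df1's columns in cbind(df1, df2)
i2 <- seq_len(d2) + d1   # positions of df2's columns in cbind(df1, df2)
length(i2) <- length(i1) # pad the shorter vector with NA

# Stack the two index vectors as rows, then read the matrix column by column
index <- as.vector(matrix(c(i1, i2), ncol = max(d1, d2), byrow = TRUE))
index <- index[!is.na(index)]
print(index)

zipped <- cbind(df1, df2)[, index]
print(names(zipped))
```

Reading the two-row matrix column-wise is what alternates between the first and the second data frame; the surplus column of the longer one simply follows at the end.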

Here come some examples.

###############################################################
### examples ###
require(plyr)

# data frames equal dimensions
df1 <- rdply(3, rep("o",4))[ ,-1]       # from plyr package
df2 <- rdply(3, rep("X",4))[ ,-1]       

zipFastener(df1, df2)
zipFastener(df1, df2, 2)
zipFastener(df1, df2, 1)

# data frames unequal in no. of rows
df1 <- rdply(10, rep("o",4))[ ,-1]
zipFastener(df1, df2, 1)
zipFastener(df2, df1, 1)

# data frames unequal in no. of columns
df2 <- rdply(10, rep("X",3))[ ,-1]
zipFastener(df1, df2)
zipFastener(df2, df1, 2)

###############################################################

I hope you find that useful.

Ciao, Mark


R tips: Eliminating the “save workspace image” prompt on exit

March 26, 2009 · Posted in R bloggers · Comments Off 

When using R, the statistical analysis and computing platform, I find it really annoying that it always prompts to save the workspace when I exit. This is how I turn it off.

I wish there was an option to change the default of the q/quit functions. I start and stop R frequently and so the exit question which I have to answer every time is really annoying:

Save workspace image? [y/n/c]:

Why is there no R option to disable this prompt? If I want to save the image, I have already saved it. And I don’t like the default name anyhow, preferring to give my own with save.image(file=...). For a while, I had a function defined in my ~/.Rprofile that terminated the session without prompting.

exit <- function() { q("no") }

While this means I can type exit() and avoid the annoying prompt, in practice I normally type Control-D to end the session which still calls the normal q function with its annoying prompt.

So instead I use the alias functionality of my (bash) shell to change the default. In my ~/.bashrc I now have

alias R="$(/usr/bin/which R) --no-save"

And finally I am happy. But I still think R should have an option (accessible through options) to change the default behavior.

R tips: Keep your packages up-to-date

March 25, 2009 · Posted in R bloggers · Comments Off 

In this entry in a small series of tips for the use of the R statistical analysis and computing tool, we look at how to keep your add-on packages up-to-date.

One of the great strengths of R is the many packages available. All the new approaches, as well as some of the best implementations of your old favorites are there. But it can also be a little daunting, and so the CRAN task views are often the best way to get started and download a reasonable “bundle” of packages for your analysis.

First we need a place to store the packages. On Linux (and other Unix-like systems) I use the file ~/.Renviron to set the R_LIBS variable to where I want the files:

## R environment
R_LIBS="~/R"

On Windows, I set the same variable for the user account. Don’t forget to create the directory.
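
Creating the directory can also be done from within R. The helper below, ensure_library(), is a hypothetical name used only for this sketch; it creates the directory if needed and puts it at the front of the library search path for the current session.

```r
# Create a personal library directory if it is missing and make R use it
ensure_library <- function(lib_dir) {
  lib_dir <- path.expand(lib_dir)
  if (!dir.exists(lib_dir)) {
    dir.create(lib_dir, recursive = TRUE)
  }
  # Prepend to the search path for this session only;
  # R_LIBS in ~/.Renviron is what makes the setting permanent
  .libPaths(c(lib_dir, .libPaths()))
  invisible(.libPaths())
}

# Example call for the directory used in this post:
# ensure_library("~/R")
```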

Now you can start R and install the CRAN task view package:

> install.packages("ctv")

Then I have a few things in my ~/.Rprofile startup file. The previous command probably prompted you for a download mirror which is annoying, so let’s exit R and edit the startup file to contain:

## Default CRAN mirror
local({r <- getOption("repos"); r["CRAN"] <- "http://cran.uk.r-project.org"; options(repos=r)})
## Libraries
require("utils", quietly=TRUE)
require("ctv", quietly=TRUE)

Then I define three functions. The first is to install the views I need. I like to try new things, so my list is long. Edit it to suit your needs:

install.myviews <- function() {
  require("ctv", quietly=TRUE)
  my.views = c("Bayesian", "Cluster", "Graphics", "gR", "HighPerformanceComputing", "MachineLearning", "Multivariate", "NaturalLanguageProcessing", "Robust", "SocialSciences", "Spatial", "Survival", "TimeSeries")
  install.views(views=my.views, lib=Sys.getenv("R_LIBS"), dependencies=c("Depends","Suggests"))
}

Try it out! Save the file, start R, and type install.myviews() at the prompt. If your list is as long as mine, then this may take some time and you may get some warnings and errors. We might add a tip on these later, but the main reason for the errors is probably that you are missing the development files for external libraries (or that R just can’t find them).

Now that we have finally got them, we need to make sure they are up-to-date. I add two functions to ~/.Rprofile:

update.local <- function() {
  update.packages(lib.loc=Sys.getenv("R_LIBS"), ask=FALSE)
}

update.myviews <- function() {
  require("ctv", quietly=TRUE)
  my.views = c("Bayesian", "Cluster", "Graphics", "gR", "HighPerformanceComputing", "MachineLearning", "Multivariate", "NaturalLanguageProcessing", "Robust", "SocialSciences", "Spatial", "Survival", "TimeSeries")
  update.views(views=my.views, lib.loc=Sys.getenv("R_LIBS"))
}

The first allows me to easily update all my locally installed libraries (not just those installed from views). The second updates my views, which is useful when the view definitions change (rarely, but it happens as the recommended packages evolve).
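
Since the same list of views appears in two functions, one possible variation is to define it once and pass it as a default argument. This is only a sketch of that idea, with a shortened list; it assumes the ctv package is installed and uses the same install.views() and update.views() calls as above.

```r
# The views of interest, defined once (shortened here for illustration)
my.views <- c("Bayesian", "Cluster", "Graphics")

install.myviews <- function(views = my.views, lib = Sys.getenv("R_LIBS")) {
  if (!requireNamespace("ctv", quietly = TRUE)) {
    stop("package 'ctv' is not installed")
  }
  ctv::install.views(views = views, lib = lib,
                     dependencies = c("Depends", "Suggests"))
}

update.myviews <- function(views = my.views, lib.loc = Sys.getenv("R_LIBS")) {
  if (!requireNamespace("ctv", quietly = TRUE)) {
    stop("package 'ctv' is not installed")
  }
  ctv::update.views(views = views, lib.loc = lib.loc)
}
```

With this arrangement, editing my.views in one place changes what both functions install and update.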

Now I can of course update from the R command prompt using update.local() or update.myviews(). But that is not the main benefit. I can now update directly from the shell command line using commands like:

echo "update.local()" > /tmp/r.cmd
R CMD BATCH /tmp/r.cmd /tmp/r.out

The beauty of this is that I can add it to my crontab(5) and have it run automatically every night or every week as I feel I need it. This way I always have the latest versions installed.

Alternative implementations using ggplot2

March 25, 2009 · Posted in R bloggers · Comments Off 
Here and here, you can find alternative implementations of two plots (1, 2) I created some time ago using R base graphics. The author recreates the plots taking advantage of the excellent ggplot2 package.

Inference for R

March 24, 2009 · Posted in R bloggers · Comments Off 

CREATE AUTOMATICALLY UPDATED R CHARTS AND TABLES INSIDE WORD & EXCEL

Decision Science News’ imagination has been recently captured by an innovative product called Inference for R. (R as in the open-source language for statistical computation.) To use it, you simply insert some code into your Microsoft Office documents. The Inference product connects to the R engine on your computer and outputs the results of the computation directly into your Word doc or Excel spreadsheet. It even works for plots.

The 2 minute video walk-through is informative. Since most DSN readers are academics, they might be happy to know that there is a free one-year academic license.

If you are interested in learning R, don’t miss the excellent:

Comparison of different circle graphs

March 24, 2009 · Posted in R bloggers · Comments Off 
See the graphs in my Picasa album here and get the corrplot package here. Thanks to Bob O'Hara for his advice :) I found that people's tastes differ, so the input parameters col (fill color) and bg (background color) were added in the new edition. What is more, you can now order your variables using PCA (order=TRUE) to get a better impression.
