## Changing Axis Values in R Plot

March 15, 2013

A colleague asked me how one can change axis attributes in a basic plot. Plotting anything in R is really, really easy: typing plot(x, y) is enough. In general, plot functions are nicely pre-cooked, so one hardly needs to change anything. But if changes in the default attributes are needed, it is possible
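
As a minimal sketch of the idea (not the original post's code), one common base-graphics approach is to suppress the default axes with `xaxt = "n"` and `yaxt = "n"` and then draw them with `axis()`. The data, tick positions, and labels below are hypothetical and chosen only for illustration.

```r
# Hypothetical data, for illustration only
x <- 1:10
y <- x^2

# Draw to a temporary PDF device so the sketch also runs non-interactively
pdf(file = tempfile(fileext = ".pdf"))

# Suppress the default axes, then add custom ones
plot(x, y, xaxt = "n", yaxt = "n", xlab = "x", ylab = "x squared")

x_ticks <- seq(2, 10, by = 2)     # assumed tick positions on the x axis
y_ticks <- seq(0, 100, by = 25)   # assumed tick positions on the y axis

axis(side = 1, at = x_ticks, labels = paste0("t", x_ticks))  # bottom axis, custom labels
axis(side = 2, at = y_ticks, las = 1)                         # left axis, horizontal labels

dev.off()
```

In an interactive session the `pdf()` and `dev.off()` calls can be dropped so the plot appears in the default graphics window.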

## Evaluation of Orthogonal Signal Correction for PLS modeling (OSC-PLS and OPLS)

March 15, 2013

Partial least squares projection to latent structures, or PLS, is one of my favorite modeling algorithms. PLS is an optimal algorithm for predictive modeling using wide data, or data with rows << variables. While there is a wealth of literature regarding the application of PLS to various tasks, I find it especially useful for biological

## How Did I Miss “The Golden Dilemma”?

March 15, 2013

I am ashamed to admit that I am way behind (about 10,127 downloads) in discovering this wonderful paper: The Golden Dilemma (January 8, 2013), Erb, Claude B. and Harvey, Campbell R. Available at SSRN: http://ssrn.com/abstract=2078535 Here are the authors presenting the concept in July 2012 if you prefer slideshow format (thanks...

## How do I make my graphs?

March 15, 2013

Someone who wishes to remain anonymous writes: I’ve been following your blog a long time and enjoy your posts on visualization/statistical graphics matters. I don’t recall, however, you ever describing the details of your setup for plotting. I’m a new R user (convert from matplotlib) and would love to know your thoughts on the ideal...

## Calendar Heatmap with Google Analytics Data

March 15, 2013

As a data analytics consulting firm, we think we are fortunate that we keep finding problems to solve. Recently my teammate found a glaring problem: there was no connector for R with Google. With inspiration from Michael, Ajay O, it soon became a problem worth solving. With the RGoogleAnalytics package now, we have

## Veterinary Epidemiologic Research: GLM – Logistic Regression

March 14, 2013

We continue to explore the book Veterinary Epidemiologic Research and today we’ll have a look at generalized linear models (GLM), specifically logistic regression (chapter 16). In veterinary epidemiology, the outcome is often dichotomous (yes/no), representing the presence or absence of disease or mortality. We code 1 for the presence of the outcome and 0
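
A minimal sketch of fitting a logistic regression to a 0/1 outcome with base R's `glm()` follows. The data are simulated and the variable names (`exposure`, `disease`) and effect sizes are hypothetical; nothing here comes from the book or the original post.

```r
# Simulated data, for illustration only (not from the book)
set.seed(42)
n <- 200
exposure <- rbinom(n, 1, 0.5)                    # hypothetical binary risk factor
p_disease <- plogis(-1 + 1.2 * exposure)         # assumed true model on the logit scale
disease <- rbinom(n, 1, p_disease)               # outcome coded 1 = present, 0 = absent

# Logistic regression is a GLM with a binomial family (logit link by default)
fit <- glm(disease ~ exposure, family = binomial)

# Exponentiated coefficients; the 'exposure' entry is the estimated odds ratio
odds_ratios <- exp(coef(fit))
print(odds_ratios)
```

The coefficient for `exposure` is on the log-odds scale, which is why it is exponentiated before being read as an odds ratio.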

## Data Science Education gets personal

March 14, 2013

by Joseph B. Rickert. It is difficult to imagine that there is anyone on the planet with an internet connection and a desire to learn something new who has not at least looked into taking a massive open online course (MOOC). Last fall, in an 11/4/12 article, the New York Times declared the Year of the MOOC and quoted...

## Upcoming events

March 14, 2013

Highlighted: LondonR is soon; see the “Previously Announced” section. New Events: Thirsty Quants 2013, March 21, London. Some thirsty quants will be going for a drink on the 21st of March from 18.30 at the Lamb Tavern in Leadenhall Market. http://www.lambtavernleadenhall.com/ Rethinking the Economics of Pensions 2013, March 21 & 22 in London. …

## Apply-style commands in R

March 14, 2013

Here's a quick table of what I think are the most useful apply-style commands in R:

| Function | Input | Output | Best for |
|---|---|---|---|
| apply | Rectangular | Rectangular or vector | Applying function to rows or columns |
| lapply | Anything | List | Non-trivial operations on almost any data type |
| sap... | | | |
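
The two fully listed functions can be sketched as follows. This is an illustrative example with made-up inputs, not code from the original post; only `apply` and `lapply` are shown because the rest of the table is cut off.

```r
# A small rectangular input: 2 rows, 3 columns, filled column by column
m <- matrix(1:6, nrow = 2)

# apply: MARGIN = 1 works over rows, MARGIN = 2 over columns
row_sums <- apply(m, 1, sum)   # one value per row
col_max  <- apply(m, 2, max)   # one value per column

# lapply: accepts almost any input and always returns a list
items <- list(a = 1:3, b = letters[1:2])
lens  <- lapply(items, length)

print(row_sums)
print(col_max)
str(lens)
```

Because `lapply` always returns a list, its result is predictable even when the elements of the input have different types or lengths.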

March 14, 2013

Nomen Est Omen? Lately, the terms "data science" and "data scientist" turn up at an increasing pace in the R-blog-sphere. Since its first occurrence (to my knowledge, "data scientist" was coined by DJ Patil and Jeff Hammerbacher in 2008), th...

## Using bigmemory with Rcpp

March 14, 2013

The bigmemory package allows users to create matrices that are stored on disk, rather than in RAM. When an element is needed, it is read from the disk and cached in RAM. These objects can be much larger than native R matrices. Objects stored as such larger-than-RAM matrices are defined in the big.matrix class and they are designed...

## On ENAR, or Statistical Meetings in General

March 14, 2013

Last year I accepted an invitation from Ben to go to ENAR 2013 -- my first ENAR. I used to go to JSM and useR!, and apparently I enjoy useR! most. The reason is not, or not only, because I'm more of a technical person. It is just hard to concentrate at large statistical conferences. I want...

## qdap 0.2.1 Released

March 13, 2013

I’m very pleased to announce the release of qdap 0.2.1. This is the second installment of the qdap package available at CRAN. The qdap package automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse, including …

## In case you missed it: February 2013 Roundup

March 13, 2013

In case you missed them, here are some articles from February of particular interest to R users. How to resample from a large data set with RHadoop, and a video introduction to the RHadoop packages. A 90-second video explains: What is Revolution R Enterprise? Jeffrey Stanton has published a free e-book "An Introduction to Data Science" using R. I...

## John Snow’s Cholera data in more formats

March 13, 2013

In honour of the bicentenary of John Snow’s birth – and because I was asked to by someone via email – I have now released my digitisation of John Snow’s Cholera data in a few other formats: KML and as Google Fusion Tables. To save you reading my previous blog posts on the subject, I’ll

## Using maps and ggplot2 to visualize college hockey championships

March 13, 2013

Short: I plot the frequency of college hockey championships by state using the maps package and ggplot2. Note: this example is based heavily on the example provided at http://www.dataincolour.com/2011/07/maps-with-ggplot2/ Data reference: http://en.wikipedia.org/wiki/NCAA_Men%27s_Ice_Hockey_Championship Question of interest: As a good Minnesotan, I've believed for quite some time that the colder, Northern states enjoy a competitive advantage when it...

## Webinar tomorrow: 100% R and More

March 13, 2013

A quick note that I'll be hosting our regularly-scheduled webinar, Revolution R Enterprise, 100% R and More, at 10AM Pacific tomorrow. If you're new to R, or want to learn about the power, scalability and productivity features of Revolution R Enterprise, this is a great place to start. Revolution Analytics webinars: Revolution R Enterprise, 100% R and More

## New package for ensembling R models

March 13, 2013

I've written a new R package called caretEnsemble for creating ensembles of caret models in R.  It currently works well for regression models, and I've written some preliminary support for binary classification models. At th...

## R needs some bureaucracy

March 12, 2013

Writing a program in R is almost bureaucracy free: variables don’t need to be declared, the language does a reasonable job of guessing the type a value might need to be automatically converted to, there is no need to create a function having a special name that gets called at program startup, the commonly

## Rcpp master class in New York last weekend

March 12, 2013

On Saturday I had the opportunity to teach another one-day master class on Rcpp. The class had been organized by Jared Lander, and organized very well I might add. The weekend started with a slight disappointment. I had taken Friday off, and hoped t...

March 12, 2013

Conrad released a first bug-fix release 3.800.1 of Armadillo earlier today. This has been wrapped up in release 0.3.800.1 of RcppArmadillo as usual. This release also contains a very nice function sample() (contributed by Christian Gunning) which p...

## A map of worldwide email traffic, created with R

March 12, 2013

The Washington Post reports that by analyzing more than 10 million emails sent through the Yahoo! Mail service in 2012, a team of researchers used the R language to create a map of countries whose citizens email each other most frequently: The chart above shows the top 1000 country-country pairs by email frequency, arranged in a clustered network using...

## Distrust of R

March 12, 2013

I guess I've been living in a bubble for a bit, but apparently there are a lot of people who still mistrust R. I got asked this week why I used R (and, specifically, the package rpart) to generate classification and regression trees instead of SAS Ente...

## R to Latex packages: Coverage

March 12, 2013

There are now quite a few R packages to turn cross-tables and fitted models into nicely formatted LaTeX. In a previous post I showed how to use one of them to display regression tables on the fly. In this post I summarise what types of R object each of the major packages can deal with.

March 12, 2013

The AQP family of R packages has seen a lot of development over the last 3 months. Some of the highlights include: HTML manual pages with syntax-highlighting and figures, c/o knitr; new vignettes: "dealing with bad data", gridded SSURGO (gSSURGO) demo,...

March 12, 2013

We finally got around to preparing everything we needed to advertise the position that will be available on the MRC grant we were awarded last year. The project will run for 30 months and we're looking for a post-doctoral candidate to work on the Rese...

## reports 0.1.2 Released

March 12, 2013

I’m very pleased to announce the release of reports, an R package to assist in the workflow of writing academic articles and other reports. This is the first CRAN release of reports: http://cran.r-project.org/web/packages/reports/index.html The reports package assists in writing …

## Third Milano R net meeting to be held on April 18, 2013

March 12, 2013

Third Milano R net meeting: April 18, 2013 @ 6.00 PM, Fiori Oscuri Bistrot & Bar, Via Fiori Oscuri, 3, Milano. Further details will be published shortly. Stay connected!

## How to use optim in R

March 12, 2013

A friend of mine asked me the other day how she could use the function optim in R to fit data. Of course there are functions for fitting data in R and I wrote about this earlier. However, she wanted to understand how to do this from scratch using optim. The function optim provides algorithms for general...
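
To give a flavour of fitting "from scratch", here is a minimal sketch that uses `optim` to fit a straight line by minimising the sum of squared residuals. The data are simulated, and the model, the helper `sse`, and the starting values are assumptions made for illustration; the original post may use a different model or objective.

```r
# Simulated data, for illustration only
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 0.2)   # assumed true intercept 2 and slope 0.5

# Objective (hypothetical helper): sum of squared residuals for y = a + b * x,
# where par[1] is the intercept a and par[2] is the slope b
sse <- function(par, x, y) {
  sum((y - (par[1] + par[2] * x))^2)
}

# optim minimises fn starting from par; extra arguments (x, y) are passed on to fn
fit <- optim(par = c(0, 0), fn = sse, x = x, y = y, method = "BFGS")

print(fit$par)          # estimated intercept and slope
print(fit$convergence)  # 0 indicates successful convergence
```

For this particular model the estimates should be close to those from `lm(y ~ x)`, which is a quick way to sanity-check the objective function before moving on to models that have no ready-made fitting function.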