The R Journal Volume 4/1, June 2012

July 6, 2012

As first reported by Paolo, the new R Journal is out! You can download the complete issue from here. Refereed articles may be downloaded individually using the links below.

Table of Contents

Editorial

Contributed Research Articles

Analysing Seasonal Data, by Adrian G Barnett, Peter Baker and Annette J Dobson
MARSS: Multivariate Autoregressive State-space...

Three hours of pure soccer emotion, visualized with R

July 6, 2012

The biggest prize in UK soccer, the Premier League Championship, is decided by a points system. Unlike most sports competitions, there's no final round or playoff series: once the regular round of games is complete, the team that has accumulated the most points (three for a win, and one for a draw) is the champion of English football. In...
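
The points system described above (three for a win, one for a draw) can be sketched in a few lines of R. The team names and results below are made up purely for illustration; they are not data from the post.

```r
# Hypothetical results for illustration only (not real match data).
results <- data.frame(
  team   = c("A", "A", "A", "B", "B", "B"),
  result = c("win", "draw", "loss", "win", "win", "draw"),
  stringsAsFactors = FALSE
)

# Points awarded per result: three for a win, one for a draw, none for a loss.
points_per_result <- c(win = 3, draw = 1, loss = 0)
results$points <- unname(points_per_result[results$result])

# Total points per team as a named numeric vector, highest first.
totals <- vapply(split(results$points, results$team), sum, numeric(1))
league_table <- sort(totals, decreasing = TRUE)
print(league_table)
```

The team at the top of `league_table` once all games are played would be the champion under this scheme.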

Soda vs. Pop with Twitter

July 6, 2012

One of the great things about Twitter is that it’s a global conversation anyone can join anytime. Eavesdropping on the world, what what! Of course, it gets even better when you can mine all this chatter to study the way humans live and interact. For example, how do people in New York City differ from those in Silicon Valley? We...

Error metrics for multi-class problems in R: beyond Accuracy and Kappa

July 6, 2012

The caret package for R provides a variety of error metrics for regression models and 2-class classification models, but only calculates Accuracy and Kappa for multi-class models.  Therefore, I wrote the following function to allow caret:::train t...

RSAP, Rook and ERP

As I wrote in my blog post Analytics with SAP and R (Windows version), we can use RSAP to connect to our ERP system and play with the data. This time I wanted, of course, to keep exploring the capabilities of RSAP, but using something else. As everybody kno...

Fix Overplotting with Colored Contour Lines

July 6, 2012

I saw this plot in the supplement of a recent paper comparing microarray results to RNA-seq results. Nothing earth-shattering in the paper - you've probably seen a similar comparison many times before - but I liked how they solved the overplotting...

Interest Differencing: Folk Commonly Followed by Tweeting MPs of Different Parties

July 6, 2012

Earlier this year I doodled a recipe for comparing the folk commonly followed by users of a couple of BBC programme hashtags (Social Media Interest Maps of Newsnight and BBCQT Twitterers). Prompted in part by a tweet from Michael Smethurst/@fantasticlife about generating an ESP map for UK politicians (something I’ve also doodled before – Sketching

A practical introduction to garch modeling

July 6, 2012

We look at volatility clustering, and some aspects of modeling it with a univariate GARCH(1,1) model.

Volatility clustering

Volatility clustering, the phenomenon of there being periods of relative calm and periods of high volatility, is a seemingly universal attribute of market data. There is no universally accepted explanation of it. GARCH (Generalized AutoRegressive …
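
To make volatility clustering concrete, here is a minimal base-R sketch that simulates a GARCH(1,1) process using the standard recursion sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. The parameter values are hypothetical, chosen only so that alpha + beta < 1; they do not come from the post, and the post itself may use a dedicated package instead.

```r
set.seed(1)

n     <- 1000
omega <- 0.05   # hypothetical constant term
alpha <- 0.10   # hypothetical weight on the previous squared return
beta  <- 0.85   # hypothetical weight on the previous conditional variance

sigma2  <- numeric(n)   # conditional variances
returns <- numeric(n)   # simulated returns

# Start at the unconditional variance omega / (1 - alpha - beta).
sigma2[1]  <- omega / (1 - alpha - beta)
returns[1] <- sqrt(sigma2[1]) * rnorm(1)

for (t in 2:n) {
  # GARCH(1,1) recursion: today's variance depends on yesterday's
  # squared return and yesterday's variance.
  sigma2[t]  <- omega + alpha * returns[t - 1]^2 + beta * sigma2[t - 1]
  returns[t] <- sqrt(sigma2[t]) * rnorm(1)
}

# Autocorrelation of squared returns, a common way to look for clustering.
acf_sq <- acf(returns^2, lag.max = 5, plot = FALSE)
```

Plotting `returns` or `sqrt(sigma2)` against time should show calm and turbulent stretches alternating, which is the pattern the excerpt describes.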

The R Journal Volume 4/1

July 6, 2012

The 'Summer edition' of the R Journal is out! Get it from here.

automated cell phenotyping — R package “EBImage”

July 5, 2012

Counting cells under a microscope is always laborious and dull. Practitioners will welcome the help of a powerful image processing package, EBImage. Images are treated as “Image” objects, essentially multi-dimensional arrays. The class “Image” contains spatial information, pixel …

More Exploration of Crazy RUT

July 5, 2012

Unintentionally while playing with the lawstat package in R, I started trying to build systems (STANDARD DISCLAIMER: NOT INVESTMENT ADVICE AND WILL LOSE LOTS OF MONEY SO PROCEED WITH CAUTION) based on the Jarque Bera test of normality (entry in Wikiped...

A better ‘nls’ (?)

July 5, 2012

Those who do a lot of nonlinear regression will love the nls function in R. In most cases it works really well, but mishaps can occur when the starting values for the parameters are poor. One of the most dreaded is the “singular gradient matrix at initial parameter estimates” which

Health Care Costs – Part 1, "The Problem"

July 5, 2012

The Problem

In the United States, health care costs have been going up for a number of years, even when adjusted for inflation. Not unlike a runaway freight train, this rampant inflation cannot continue indefinitely without crashing. ...

New R User Group in Leipzig, Germany

July 5, 2012

Leipzig R Statistical Computing is the sixth local R user group in Germany, and has been holding meetings since February. In the next meeting on July 12, member Claudia Beleites will talk about her packages softclassval (for classifier performance measures) and hyperspec (for hyperspectral data). meetup.com: Leipzig R Statistical Computing

Validating email addresses in R

July 5, 2012

I am currently programming automated report generation in R – participants fill out a questionnaire, and they receive a nicely formatted PDF with their personality profile. I use knitr, LaTeX, and the sendmailR package. Some participants did not provide valid email addresses, which caused the sendmail function to crash. Therefore I wanted some validation of
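
One way such a check might look is a simple regular-expression filter applied before any mail is sent. This is only a sketch: the helper name `is_valid_email` and the pattern are assumptions made for illustration, not the author's solution, and the pattern is deliberately loose rather than a full implementation of the email address grammar.

```r
# Hypothetical helper: TRUE where a string looks like "local@domain.tld".
# The pattern is an illustrative assumption and does not cover every
# address that the mail standards allow.
is_valid_email <- function(x) {
  grepl("^[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}$", x)
}

# Made-up addresses for illustration.
addresses <- c(
  "jane.doe@example.com",
  "no-at-sign.example.com",
  "trailing@dot.",
  "a@b.org"
)

valid <- is_valid_email(addresses)
print(data.frame(address = addresses, valid = valid))

# Only the addresses that pass the check would be handed to the mailer.
to_send <- addresses[valid]
```

Filtering first means a malformed address is skipped rather than stopping the whole report run.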

A tiny RCurl headache ;)

July 4, 2012

As more and more data go online (plus we love Google Drive) we are forced to connect to our data over the net. We mostly do this via RCurl (but we could do this using RGoogleDocs as well). In that case all that is required to get the data into R is the two lines of

A new open journal on Data Science

July 4, 2012

Springer has introduced a new open, peer-reviewed journal focused on Data Science: EPJ Data Science. What makes this a Data Science journal is novel uses of statistics, data analysis, computer techniques and public data sources to research a topic in another domain, rather than methodological research. Here are a few examples of the papers you'll find in the journal:...

Alternative to Monte Carlo Testing

July 4, 2012

When we backtest a strategy on a portfolio, it is a simple analysis of a single period in time. There are ways to “stress test” a strategy, such as Monte Carlo, random portfolios, or shuffling the returns in a random order. I could never really wrap my head around Monte Carlo and shuffling the returns …
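
For readers unfamiliar with the "shuffling the returns" idea mentioned above, here is a minimal base-R sketch. The return series is simulated and the helper `max_drawdown` is a hypothetical name introduced for illustration; none of it is taken from the post, which goes on to propose an alternative.

```r
set.seed(42)

# Hypothetical daily strategy returns, simulated for illustration only.
returns <- rnorm(250, mean = 0.0005, sd = 0.01)

# Largest peak-to-trough loss of the compounded equity curve.
max_drawdown <- function(r) {
  equity <- cumprod(1 + r)
  max(1 - equity / cummax(equity))
}

# Reorder the same returns many times. The final compounded return is
# unchanged by reordering, but the path, and so the drawdown, is not.
shuffled_dd <- replicate(1000, max_drawdown(sample(returns)))

observed_dd <- max_drawdown(returns)
print(quantile(shuffled_dd, c(0.05, 0.5, 0.95)))
```

Comparing `observed_dd` with the spread of `shuffled_dd` gives a rough sense of how much the backtested drawdown depends on the particular ordering of returns.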

Three Questions about a Matrix of Coefficient Plots

July 4, 2012

It's Independence Day in the U.S., so I am taking the day off, but I received the following request for advice and thought I'd pass it along to my readers.

I wonder if you could help – I am trying to create 9 different coefficient plots, which repr...

A tutorial on outlier detection techniques

July 4, 2012

by Yanchang Zhao, RDataMining.com

There is an excellent tutorial on outlier detection techniques, presented by Hans-Peter Kriegel et al. at ACM SIGKDD 2010. It presents many popular outlier detection algorithms, most of which were published between the mid-1990s and 2010, …

The Higgs boson: 5-sigma and the concept of p-values

July 4, 2012

Why are physicists talking about 5-sigma, and what's it got to do with statistics? In this short post I'll explain what 5-sigma is and why it's not a measure of how certain scientists are that they've found the Higgs boson.
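
As a quick numerical illustration, the tail probability that corresponds to 5 sigma under a standard normal distribution can be computed with base R's `pnorm`. Whether a one-sided or two-sided tail is the appropriate convention is an assumption here; both are shown.

```r
# Probability of a standard normal value at least 5 standard deviations
# above the mean (one-sided tail).
p_one_sided <- pnorm(5, lower.tail = FALSE)

# Two-sided version: at least 5 standard deviations away in either direction.
p_two_sided <- 2 * p_one_sided

print(p_one_sided)  # roughly 2.9e-07, about 1 in 3.5 million
print(p_two_sided)
```

This is the probability of seeing a result at least this extreme if there were no signal at all, which is why it is not the probability that the particle exists.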

Glmnet_1.8 uploaded to CRAN

July 4, 2012

(by Trevor Hastie) Glmnet_1.8 has been uploaded to CRAN. This is a major revision, with two additional models included. 1) Multiresponse regression: family="mgaussian". Here we have a matrix of M responses, and we fit a series of linear models in parallel. We use a group-lasso penalty on the set of M coefficients for each variable. This means they are...

To the Basics: Bayesian Inference on A Binomial Proportion

July 4, 2012

Think of something observable – countable – that you care about with only one outcome or another. It could be the votes cast in a two-way election in your town, or the free throw shots the center on your favorite...
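
A minimal sketch of Bayesian inference on a binomial proportion, using the conjugate Beta prior: with a Beta(a, b) prior and k successes in n trials, the posterior is Beta(a + k, b + n - k). The counts and the uniform Beta(1, 1) prior below are assumptions chosen for illustration, not values from the post.

```r
# Hypothetical data: 7 successes (e.g. made free throws) in 20 attempts.
successes <- 7
trials    <- 20

# Uniform Beta(1, 1) prior on the proportion (an illustrative assumption).
prior_a <- 1
prior_b <- 1

# Conjugate update: add successes to a and failures to b.
post_a <- prior_a + successes
post_b <- prior_b + (trials - successes)

# Posterior mean and a central 95% credible interval.
post_mean <- post_a / (post_a + post_b)
cred_int  <- qbeta(c(0.025, 0.975), post_a, post_b)

print(post_mean)
print(cred_int)
```

A more informative prior would simply change `prior_a` and `prior_b`; the update step stays the same.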

Example of Factor Attribution

July 3, 2012

In the prior post, Factor Attribution 2, I showed how Factor Attribution can be applied to decompose a fund’s returns into Market, Capitalization, and Value factors, the “three-factor model” of Fama and French. Today, I want to show you a different application of Factor Attribution. First, let’s run Factor Attribution on each of the stocks

RcppBDT 0.2.0

A new release of the RcppBDT package appeared on CRAN earlier today. RcppBDT uses Rcpp, and in particular the nifty Rcpp modules feature of wrapping C++ code for R just by declaring the (class or function) interfaces. It uses this to bring in some useful functions from Boost Date.Time to R so that one can do things like

R> library(RcppBDT)
R> sapply(2012:2016, function(year) +...

The role of Statistics in the Higgs Boson discovery

July 3, 2012

News is starting to leak that the Large Hadron Collider may have accomplished its primary mission of confirming the existence of the hypothesised and heretofore elusive subatomic particle, the Higgs Boson. And sure, billions of Euros worth of state-of-the-art high-energy machinery and an army of experimental and theoretical physicists probably had something to do with the discovery. But did...

An Improvement to Coefficient Plots

July 3, 2012

I recently posted about coefficient plots, discussing my approach and providing some example R code to create the graphs. I had the good fortune of hearing Amanda Driscoll give a talk recently, and she made a small, but really nice …

Combining ggplot Images

July 3, 2012

The ggplot2 package provides an excellent platform for data visualization. One (minor) drawback of this package is that combining ggplot images into one plot, like the par() function does for regular plots, is not a straightforward procedure. Fortunately, R user Stephen Turner has kindly provided a function called “arrange” that does exactly this. The function,
