## Comparing ESPN’s, CBS’s, and NFL.com’s Fantasy Football Projections using R

March 17, 2013
In the future, we will determine how to select the best possible team by maximizing your team's projected points and minimizing its downside risk.  But in order to do this, we will have to rely on our best guess of how many points each player will score.  We will use 2012 projections from ESPN, CBS, and NFL.com and actual...

## Extracting Information From Objects Using Names()

March 17, 2013
One of the big differences between a language like Stata and R is R's ability to handle many different types of objects at once, and to combine them or pull them apart.  I had a post about objects last year, but I thought I'd sh...
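
As a minimal sketch of the idea (the excerpt is cut off, so this is not necessarily the author's own example), `names()` lists the components stored in an object such as a fitted model, and each component can then be pulled out with `$`. The built-in `mtcars` data and the object names `fit`, `fit_names`, `coefs`, and `coef_names` are assumptions made for illustration.

```r
# Fit a simple linear model on a built-in dataset (illustrative choice).
fit <- lm(mpg ~ wt, data = mtcars)

# names() lists the components stored inside the fitted object.
fit_names <- names(fit)
print(fit_names)

# Any listed component can be extracted by name.
coefs <- fit$coefficients
print(coefs)

# names() also works on the extracted piece itself.
coef_names <- names(coefs)
print(coef_names)
```

The same pattern applies to most list-like R objects: inspect with `names()`, then extract with `$` or `[[ ]]`.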

## Mumbai, Mar 2013 – Portfolio Tutorial

March 17, 2013
(This article was first published on Rmetrics blogs, and kindly contributed to R-bloggers)

## Variability of garch predictions

March 17, 2013
How variable are garch predictions? Previously: there have been several posts on garch, in particular “A practical introduction to garch modeling” and “The components garch model in the rugarch package”. Both of these posts speak about the two common prediction targets: prediction (of volatility) at the individual times (usually days), and term structure prediction, i.e. the average …

## Ordinal Data

March 17, 2013
I expect to be getting some ordinal data, from 5 or 9 point rating scales, pretty soon, so I am having a look ahead at how to treat those. Often ANOVA is used, even though it is well known not to be ideal from a statistical point of view, so that is the st...

## Happy St Patrick’s Day

March 17, 2013
I love Saint Patrick’s Day for at least two reasons. The first is that, on March 17th, you can play The Pogues out loud; the second is that it’s the only day of the year when I really enjoy getting a Guinness in a pub. And Guinness is important in statistical science (I did mention a couple...

## caretEnsemble Classification example

March 16, 2013
Here's a quick demo of how to fit a binary classification model with caretEnsemble.  Please note that I haven't spent as much time debugging caretEnsemble for classification models, so there are probably more bugs than in my last post.  ...

## Blend what?

March 16, 2013
Why? Over the years I have learned quite a few things about machine learning, but I have never thought of writing them down properly. Too often I can't figure out exactly what I did when I look at my old code. The time is NOW! More importantly, I have fa...

## GNU R loop speed comparison

March 16, 2013
Recently I had several discussions about using for loops in GNU R and how they compare to the *apply family in terms of speed. I have not seen a direct benchmark comparing them, so I decided to run one (warning: some of the code presented today tak...
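
A comparison of this kind can be sketched in base R with `system.time()`. This is a hedged illustration, not the benchmark from the post: the workload (squaring each element), the vector size `n`, and the helper names `loop_square` and `sapply_square` are all assumptions, and timings will differ across machines and R versions.

```r
# Hypothetical workload: square each element of a numeric vector.
n <- 1e5
x <- runif(n)

# Version 1: explicit for loop writing into a preallocated vector.
loop_square <- function(v) {
  out <- numeric(length(v))
  for (i in seq_along(v)) {
    out[i] <- v[i]^2
  }
  out
}

# Version 2: the same computation with sapply().
sapply_square <- function(v) sapply(v, function(e) e^2)

# system.time() reports elapsed time; results vary by machine and R version.
t_loop   <- system.time(res_loop   <- loop_square(x))
t_sapply <- system.time(res_sapply <- sapply_square(x))

print(t_loop["elapsed"])
print(t_sapply["elapsed"])

# Both versions should agree with the vectorised answer x^2.
same_result <- isTRUE(all.equal(res_loop, res_sapply)) &&
  isTRUE(all.equal(res_loop, x^2))
print(same_result)
```

Preallocating `out` matters for a fair comparison: growing a vector inside the loop would measure reallocation cost rather than loop overhead.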

March 16, 2013
Scholarly metadata - the meta-information surrounding articles - can be super useful. Although metadata does not contain the full content of articles, it contains a lot of useful information, including title, authors, abstract, URL to the article, etc. One of the largest sources of metadata is provided via the Open Archives Initiative Protocol for Metadata Harvesting or OAI-PMH....

## Changing Axis Values in R Plot

March 15, 2013
A colleague asked me how one can change axis attributes in a basic plot. Plotting anything in R is really, really easy: typing plot(x, y) is enough. In general, plot functions are nicely pre-cooked, so one hardly needs to change anything. But if changes in the default attributes are needed, it is possible
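
One common way to do this in base graphics, shown here as a sketch rather than the post's own code, is to suppress the default axis with `xaxt = "n"` and draw a custom one with `axis()`. The data, tick positions, and labels below are made up for illustration.

```r
# Draw to a null PDF device so the sketch runs without opening a window;
# drop this line (and dev.off()) when working interactively.
pdf(NULL)

x <- 1:10
y <- x^2

# Suppress the default x axis, then draw a custom one.
plot(x, y, xaxt = "n", xlab = "Step", ylab = "Value")
tick_positions <- seq(2, 10, by = 2)
tick_labels <- paste0("t", tick_positions)
axis(side = 1, at = tick_positions, labels = tick_labels)

invisible(dev.off())
```

Here `side = 1` is the bottom axis; `side = 2` would address the left axis in the same way (with `yaxt = "n"` in the `plot()` call).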

## Evaluation of Orthogonal Signal Correction for PLS modeling (OSC-PLS and OPLS)

March 15, 2013
Partial least squares projection to latent structures or PLS is one of my favorite modeling algorithms. PLS is an optimal algorithm for predictive modeling using wide data, or data with rows << variables. While there is a wealth of literature regarding the application of PLS to various tasks, I find it especially useful for biological

## How Did I Miss “The Golden Dilemma”?

March 15, 2013
I am ashamed to admit that I am way behind (about 10,127 downloads) in discovering this wonderful paper: The Golden Dilemma (January 8, 2013), Erb, Claude B. and Harvey, Campbell R. Available at SSRN: http://ssrn.com/abstract=2078535 Here are the authors presenting the concept in July 2012 if you prefer slideshow format (thanks...

## How do I make my graphs?

March 15, 2013
Someone who wishes to remain anonymous writes: I’ve been following your blog a long time and enjoy your posts on visualization/statistical graphics matters. I don’t recall however you ever describing the details of your setup for plotting. I’m a new R user (convert from matplotlib) and would love to know your thoughts on the ideal...

## Calendar Heatmap with Google Analytics Data

March 15, 2013
As a data analytics consulting firm, we think we are fortunate that we keep finding problems to solve. Recently my teammate found a glaring problem: there was no connector for R with Google. With inspiration from Michael, Ajay O, it soon became a problem worth solving. With the RGoogleAnalytics package, we now have

## Veterinary Epidemiologic Research: GLM – Logistic Regression

March 14, 2013
We continue to explore the book Veterinary Epidemiologic Research and today we’ll have a look at generalized linear models (GLM), specifically logistic regression (chapter 16). In veterinary epidemiology, the outcome is often dichotomous (yes/no), representing the presence or absence of disease or mortality. We code 1 for the presence of the outcome and 0
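
A logistic regression with a 0/1 outcome can be sketched in base R with `glm()` and a binomial family. The data below are simulated purely for illustration and are not the book's dataset; the variable names `exposure` and `disease` and the true coefficients are assumptions.

```r
set.seed(42)

# Simulated data (an assumption for illustration, not the book's dataset):
# higher 'exposure' raises the log-odds of disease.
n <- 500
exposure <- rnorm(n)
p <- plogis(-0.5 + 1.2 * exposure)
disease <- rbinom(n, size = 1, prob = p)   # 1 = present, 0 = absent

# Logistic regression is a GLM with a binomial family and logit link.
fit <- glm(disease ~ exposure, family = binomial(link = "logit"))
print(summary(fit)$coefficients)

# Exponentiated coefficients are odds ratios.
odds_ratios <- exp(coef(fit))
print(odds_ratios)
```

An odds ratio above 1 for `exposure` indicates that the odds of the outcome increase as the predictor increases.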

## Data Science Education gets personal

March 14, 2013
by Joseph B. Rickert. It is difficult to imagine that there is anyone on the planet with an internet connection and a desire to learn something new who has not at least looked into taking a massive open online course (MOOC). Last Fall, in an 11/4/12 article, the New York Times declared the Year of the MOOC and quoted...

## Upcoming events

March 14, 2013
Highlighted: LondonR is soon; see the “Previously Announced” section. New Events: Thirsty Quants, 2013 March 21, London. Some thirsty quants will be going for a drink on the 21st of March as of 18.30 at the Lamb Tavern in Leadenhall Market. http://www.lambtavernleadenhall.com/ Rethinking the Economics of Pensions, 2013 March 21 & 22 in London. …

## Apply-style commands in R

March 14, 2013
Here's a quick table of what I think are the most useful apply-style commands in R:

| Function | Input | Output | Best for |
|---|---|---|---|
| apply | Rectangular | Rectangular or vector | Applying function to rows or columns |
| lapply | Anything | List | Non-trivial operations on almost any data type |
| sap... | | | |
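
The two functions described in full here can be illustrated with a short base R sketch. The matrix `m` and the list `lst` are made-up inputs chosen for illustration.

```r
# 2 x 3 matrix filled column-wise: columns are (1,2), (3,4), (5,6).
m <- matrix(1:6, nrow = 2)

# apply(): rectangular input, function applied over rows (1) or columns (2).
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)

# lapply(): accepts almost any list-like input and always returns a list.
lst <- list(a = 1:3, b = c(10, 20))
means <- lapply(lst, mean)

print(row_sums)
print(col_sums)
print(means)
```

The margin argument of `apply()` is what selects rows or columns, while `lapply()` keeps the names of its input, so `means$a` and `means$b` are available afterwards.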

March 14, 2013
Nomen Est Omen? Lately, the terms "data science" and "data scientist" turn up at an increasing pace in the R-blog-sphere. Since its first occurrence (to my knowledge, "data scientist" was coined by DJ Patil and Jeff Hammerbacher in 2008), th...

## Using bigmemory with Rcpp

March 14, 2013
The bigmemory package allows users to create matrices that are stored on disk, rather than in RAM. When an element is needed, it is read from the disk and cached in RAM. These objects can be much larger than native R matrices. Objects stored as such larger-than-RAM matrices are defined in the big.matrix class and they are designed...

## On ENAR, or Statistical Meetings in General

March 14, 2013
Last year I accepted an invitation from Ben to go to ENAR 2013 -- my first ENAR. I used to go to JSM and useR!, and apparently I enjoy useR! most. The reason is not, or not only, because I'm more of a technical person. It is just hard to concentrate at large statistical conferences. I want...

## qdap 0.2.1 Released

March 13, 2013
I’m very pleased to announce the release of qdap 0.2.1. This is the second installment of the qdap package available at CRAN. The qdap package automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse, including …

## In case you missed it: February 2013 Roundup

March 13, 2013
In case you missed them, here are some articles from February of particular interest to R users. How to resample from a large data set with RHadoop, and a video introduction to the RHadoop packages. A 90-second video explains: What is Revolution R Enterprise? Jeffrey Stanton has published a free e-book "An Introduction to Data Science" using R. I...

## John Snow’s Cholera data in more formats

March 13, 2013
In honour of the bicentenary of John Snow’s birth – and because I was asked to by someone via email – I have now released my digitisation of John Snow’s Cholera data in a few other formats: KML and as Google Fusion Tables. To save you reading my previous blog posts on the subject, I’ll

## Using maps and ggplot2 to visualize college hockey championships

March 13, 2013
Short: I plot the frequency of college hockey championships by state using the maps package and ggplot2. Note: this example is based heavily on the example provided at http://www.dataincolour.com/2011/07/maps-with-ggplot2/ Data reference: http://en.wikipedia.org/wiki/NCAA_Men%27s_Ice_Hockey_Championship Question of interest: As a good Minnesotan, I've believed for quite some time that the colder, Northern states enjoy a competitive advantage when it...

## Webinar tomorrow: 100% R and More

March 13, 2013
A quick note that I'll be hosting our regularly-scheduled webinar, Revolution R Enterprise, 100% R and More, at 10AM Pacific tomorrow. If you're new to R, or want to learn about the power, scalability and productivity features of Revolution R Enterprise, this is a great place to start. Revolution Analytics webinars: Revolution R Enterprise, 100% R and More

## New package for ensembling R models

March 13, 2013
I've written a new R package called caretEnsemble for creating ensembles of caret models in R.  It currently works well for regression models, and I've written some preliminary support for binary classification models. At th...

## R needs some bureaucracy

March 12, 2013
Writing a program in R is almost bureaucracy free: variables don’t need to be declared, the language does a reasonable job of guessing the type a value might need to be automatically converted to, there is no need to create a function having a special name that gets called at program startup, the commonly