## Analyze web traffic data with Google Analytics and R

February 7, 2013

If you run an e-commerce site, blog or other web property there's a good chance you use Google Analytics to monitor traffic, look at visitor sources, and measure conversions. And while Google Analytics is quite powerful at looking at historic activity on your site, it lacks much in the way of predictive analytics. That's where R shines of course,...

## Slideshows in R

February 7, 2013

A while ago I was asked to give a presentation at my job about using R to create statistical graphics. I had also just read some reviews of the Slidify package in R and I thought it would be extremely appropriate to create my presentation about visuali...

## Getting staRted with R.

February 7, 2013

As a PhD student and researcher, I often hear friends and colleagues say that they want to learn R, but that the learning curve is so steep that they can't seem to get started.  It's true that learning any tool as powerful as R can be confusing at...

## Slightly silly D3 example: shift one datapoint to get a significant result

February 7, 2013

Have you ever seen those scatterplots that report a significant correlation between X and Y, where it looks like just the one point in the upper right is driving the correlation? Thanks to this interactive tool, you too can do this at home.

## Why cost and fuel efficiency are unrelated: Uncorrelated manifest variables can share the same latent causes

February 7, 2013

In structural equation modelling, we are typically proposing theoretical causes of observed phenomena. These are termed "latent" (the unobserved causes) and "manifest" (the observed variables we measure, otherwise known as data). Importantly, the theoretical causes of behavior need not have a structure remotely resembling the correlations observed in the data. You might have hundreds of columns of...

## Using ARPACK to compute the largest eigenvalue of a matrix

February 7, 2013

Thanks to Gábor Csárdi, author of the R interface to ARPACK, for this example of using (the R/igraph interface to) arpack for finding the largest eigenvalue of a matrix. The key insight is that arpack solves the function passed to …

## Creating ‘Tags’ For PPC Keywords

February 7, 2013

When performing search engine marketing, it is usually beneficial to construct a system for making sense of keywords and their performance. While one could construct Bayesian Belief Networks to model the process of consumers clicking on ads, I have found that using ‘tags’ to categorize keywords is just as useful for conducting post-hoc analysis on the effectiveness of marketing

## Data analysis class

February 7, 2013

I've been writing software to help others do data analysis for a number of years and at the same time trying to work up my nerve to try my own analysis. Why let other people have all the fun? So, when I saw that Jeffrey Leek, biostatistician at Johns Hopkins and coauthor of Simply Statistics, was teaching...

## R Bootcamp @ Sector67 in Madison

February 6, 2013

I am pleased to announce that together with Justin Meyer (also from the Wisconsin Department of Public Instruction) I will be presenting a two-hour version of the R Bootcamp. Sector67 is a collaborative maker/hacker space in Madison, and is a great ven...

## Ryan Peek on using xts and ggplot for time-series data

February 6, 2013

At the Davis R Users’ Group today, Ryan Peek gave a presentation on how he takes data from his field instruments and visualizes it in R. Here are his notes. The original *.Rmd file and data can be found here.

SHORT HOW-TO ON USING XTS AND GGPLOT FOR TIME SERIES DATA

XTS is a very helpful package...

## Set operations on more than two sets in R

February 6, 2013

Problem

Set operations are a commonplace thing to do in R, and the enabling functions in the base package are intersect(x, y), union(x, y), setdiff(x, y) and setequal(x, y). That said, you'll note that each ONLY takes two arguments - i.e. set X and set Y - ...
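One common way around the two-argument limit is to fold a two-set function over a list of sets with Reduce(). This is a minimal sketch, not necessarily the solution the truncated post goes on to give; the three example sets and the variable names are made up for illustration.

```r
# Hypothetical example sets (illustrative values only).
s1 <- c(1, 2, 3, 4, 5)
s2 <- c(3, 4, 5, 6)
s3 <- c(4, 5, 6, 7)
sets <- list(s1, s2, s3)

# Reduce() applies the two-argument function pairwise, left to right:
# intersect(intersect(s1, s2), s3), and likewise for union().
common    <- Reduce(intersect, sets)
all_elems <- Reduce(union, sets)

print(common)
print(all_elems)
```

The same pattern works for any number of sets, since Reduce() only needs the list to grow.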

## Modelling memory and news trajectories

February 6, 2013

Modelling memory

In the text below I present two models I've made to quantify and visualise the diverging trajectories of memory and news events, and conclude that linear regression may be used to test which model best describes the story. First, though, I contextualise this with an illustration from the...

## Oracle R Enterprise 1.3 gives predictive analytics an in-database performance boost

February 6, 2013

Recently released Oracle R Enterprise 1.3 adds packages to R that enable even more in-database analytics. These packages provide horizontal, commonly used techniques that are blazingly fast in-database for large data. With Oracle R Enterprise 1.3, Oracle makes R even better and usable in enterprise settings. (You can download ORE 1.3 here and documentation here.) ...

## Make building R packages easier with devtools

February 6, 2013

If you're writing any significant amount of R code, you might want to start thinking about bundling it up into packages. An R package combines functions, data, documentation and unit tests, and is a convenient and reliable system to manage and version collections of R content that could otherwise become unwieldy. And if you want to share your code...

## The new Stan 1.1.1, featuring Gaussian processes!

February 6, 2013

We just released Stan 1.1.1 and RStan 1.1.1. As usual, you can find download and install instructions at http://mc-stan.org/. This is a patch release and is fully backward compatible with Stan and RStan 1.1.0. The main thing you should notice is that the multivariate models should be much faster and all the bugs reported for...

## xts object – subscript out of bounds

February 6, 2013

I bet you have seen this error a few times. When I compare large xts objects with different numbers of observations, it hits me right towards the end of the analysis. I wrote a small function, which allows me to check the length of the sets in advanc...

February 5, 2013

Some little birds had already been whispering about it, but I didn't want to jinx it and told myself I would wait with an announcement until the booksellers have (at least) placeholder pages. And as I learned from Duncan Murdoch via email earlier toda...

## counts numbers in an interval

February 5, 2013

Say I have a list of values, and I cut them by some break points. How do I know the number of values in each interval? We know the cut() function in R works for the purpose. For example,

    tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
    x <- rep(0:8, tx0)
    > ...
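The counting step can be sketched by passing the factor returned by cut() to table(). The break points below are an assumption chosen for illustration; the post's own breaks and output are cut off above.

```r
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)   # value 0 repeated 9 times, value 1 repeated 4 times, ...

# Hypothetical break points giving the intervals (-1,2], (2,5] and (5,8].
breaks <- c(-1, 2, 5, 8)

# cut() assigns each value to an interval; table() counts values per interval.
counts <- table(cut(x, breaks = breaks))
print(counts)
```

Because cut() uses right-closed intervals by default, the lowest break has to sit below the smallest value (here -1) or the zeros would be dropped as NA.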

## Learning RStudio for R Statistical Computing

February 5, 2013

"Learning RStudio for R Statistical Computing" will teach you how to quickly and efficiently create and manage statistical analysis projects, import data, develop R scripts, and generate reports and graphics. R developers will learn about package development, coding principles, and …

## Dallas R Users: Learn Shiny this Saturday, 2/9

February 5, 2013

Just a heads-up for any R users in the Dallas/Fort Worth Metroplex: I’ll be presenting at the Dallas R Users Group this Saturday, 2/9/2013 at 10:00AM at the University of Dallas (1845 East Northgate Drive, Irving, TX). I’ll be talking about how to use RStudio’s new Shiny framework to create R-powered web applications. For the

## Collinearity and stepwise VIF selection

February 5, 2013

Collinearity, or excessive correlation among explanatory variables, can complicate or prevent the identification of an optimal set of explanatory variables for a statistical model. For example, forward or backward selection of variables could produce inconsistent results, variance partitioning analyses may be unable to identify unique sources of variation, or parameter estimates may include substantial amounts
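For reference, the variance inflation factor of a variable is 1 / (1 - R²), where R² comes from regressing that variable on all the other explanatory variables. Stepwise VIF selection can be sketched in base R as follows. The function names vif_values and stepwise_vif, the threshold of 10 and the simulated data are all assumptions for illustration; this is not the function from the original post.

```r
# VIF of every column in a numeric data frame: 1 / (1 - R^2), with R^2 taken
# from the regression of that column on all remaining columns.
vif_values <- function(dat) {
  sapply(names(dat), function(v) {
    others <- setdiff(names(dat), v)
    fit <- lm(reformulate(others, response = v), data = dat)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Repeatedly drop the variable with the largest VIF until every VIF is
# below the threshold (or only two variables remain).
stepwise_vif <- function(dat, thresh = 10) {
  while (ncol(dat) > 2) {
    v <- vif_values(dat)
    if (max(v) < thresh) break
    dat <- dat[, names(dat) != names(v)[which.max(v)], drop = FALSE]
  }
  names(dat)
}

# Simulated data (hypothetical): x3 is nearly a linear combination of x1 and x2.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.05)
x4 <- rnorm(n)
dat <- data.frame(x1, x2, x3, x4)

kept <- stepwise_vif(dat)
print(kept)
```

Recomputing the VIFs after each removal matters, because dropping one collinear variable usually lowers the VIFs of the variables it was correlated with.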

## Learn about R through data mining

February 5, 2013

If you're in San Francisco for this week's DeveloperWeek conference, our own Joe Rickert will also be giving a presentation on Wednesday at 2:10PM on Predictive Modeling with Big Data in R, which will feature several demos of data mining massive data sets using Revolution R Enterprise. Incidentally, the whole team at Revolution Analytics was proud to receive the Top...

## Natura non facit saltus

February 5, 2013
$\mathbb{E}_{\mathbb{P}}\left(\sum_{i=1}^N Y_i\right)=\mathbb{E}_{\mathbb{P}}(N) \cdot \mathbb{E}_{\mathbb{P}}(Y_i)$

(See John Wilkins’ article on the – interesting – history of that phrase: http://scienceblogs.com/evolvingthoughts/….) This week in class, we will see several smoothing techniques for insurance ratemaking. As a starting point, assume that we do not want to use segmentation techniques: everyone will pay exactly the same price.

no segmentation of the premium

And that price should be related to...
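The identity shown with this post says that, when the number of claims N is independent of the i.i.d. claim amounts Y_i, the expected total loss equals E(N) times E(Y_i). Without segmentation, that is the single price everyone would pay. A quick simulation sketch of the identity follows; the Poisson claim counts, exponential claim amounts and parameter values are assumptions chosen for illustration, not taken from the course.

```r
set.seed(42)
n_policies <- 100000
lambda     <- 0.1    # assumed expected number of claims per policy
mean_claim <- 1000   # assumed expected cost of a single claim

# Number of claims per policy, then the total loss on each policy.
n_claims   <- rpois(n_policies, lambda)
total_loss <- sapply(n_claims, function(k) sum(rexp(k, rate = 1 / mean_claim)))

empirical   <- mean(total_loss)      # simulated average total loss
theoretical <- lambda * mean_claim   # E(N) * E(Y)

print(c(empirical = empirical, theoretical = theoretical))
```

The two numbers should agree up to simulation noise, which shrinks as n_policies grows.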

## Relearn boxplot and label the outliers

February 5, 2013

Despite the fact that the box plot is used almost everywhere and taught in undergraduate statistics classes, I recently had to re-learn the box plot in order to know how to label the outliers. This stackoverflow post was where I found how...

## New Rcpp page on upcoming events — including Master Class in New York

February 5, 2013

Lots of exciting things are happening with and around Rcpp. I just added a new page about Upcoming Events to the recently-created Rcpp site. This events page has lots to cover: an upcoming talk at Columbia on March 8 (details still TBD), a day-lon...

## MCMSki IV, Jan. 6-8 (9?), 2014, Chamonix (news #3)

February 5, 2013

In case you have not been constantly tracking the changes on the MCMSki IV webpage, here is some news: the number of invited and accepted contributed sessions in the program has considerably increased, to the point of almost filling two parallel sessions for the whole duration of the meeting. This includes an exciting round-table on

## 2011 Census Open Atlas Project

February 5, 2013

This month has seen the release of the 2011 census data for England and Wales at Output Area level. This offers the possibility to map various attributes about people and places for very small geographic areas. Output Areas represent the most detailed geography for which Census data are released and are the building blocks for

## Tables from R into Word

February 5, 2013

A good looking table matters! This tutorial is on how to create a neat table in Word by combining knitr and R Markdown. I'll be using my own function, htmlTable, from the Gmisc package. Background: Because most journals that I submit to want...

## Proposed techniques for communicating the amount of information contained in a statistical result

February 5, 2013

A couple of weeks ago, I posted about how much we can expect to learn about the state of the world on the basis of a statistical significance test. One way of framing this question is: if we’re trying to come to scientific conclusions on the basis of statistical results, how much can we update