# Monthly Archives: February 2013

## Dallas R Users: Learn Shiny this Saturday, 2/9

February 5, 2013

Just a heads-up for any R users in the Dallas/Fort Worth Metroplex: I’ll be presenting at the Dallas R Users Group this Saturday, 2/9/2013 at 10:00AM at the University of Dallas (1845 East Northgate Drive, Irving, TX). I’ll be talking about how to use RStudio’s new Shiny framework to create R-powered web applications. For the

## Collinearity and stepwise VIF selection

February 5, 2013

Collinearity, or excessive correlation among explanatory variables, can complicate or prevent the identification of an optimal set of explanatory variables for a statistical model. For example, forward or backward selection of variables could produce inconsistent results, variance partitioning analyses may be unable to identify unique sources of variation, or parameter estimates may include substantial amounts
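To make the idea in the title concrete, here is a minimal sketch of stepwise selection by variance inflation factor (VIF). It is written in plain Python rather than R and is not the post author's function. It relies on the standard result that the VIF of each predictor is the corresponding diagonal element of the inverse of the predictors' correlation matrix. The helper names (`pearson`, `invert`, `vif`, `stepwise_vif`), the simulated data, and the threshold of 10 (a common rule of thumb) are all assumptions made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination.

    Fails with a division by zero if the matrix is singular,
    i.e. if some predictors are perfectly collinear.
    """
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(data):
    """Return {name: VIF} for a dict mapping predictor names to value lists."""
    names = list(data)
    corr = [[1.0 if a == b else pearson(data[a], data[b]) for b in names]
            for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


def stepwise_vif(data, threshold=10.0):
    """Drop the predictor with the largest VIF until all fall below threshold."""
    kept = dict(data)
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        del kept[worst]
    return list(kept)


# Hypothetical data: x2 is nearly a copy of x1, x3 is unrelated to both.
random.seed(1)
x1 = [random.gauss(0, 1) for _ in range(200)]
x2 = [v + random.gauss(0, 0.05) for v in x1]
x3 = [random.gauss(0, 1) for _ in range(200)]
data = {"x1": x1, "x2": x2, "x3": x3}

selected = stepwise_vif(data, threshold=10.0)
print(vif(data))
print(selected)
```

In this toy setup the two near-duplicate predictors start with very large VIFs, so one of them is removed and the unrelated predictor is retained.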

## Learn about R through data mining

February 5, 2013

If you're in San Francisco for this week's DeveloperWeek conference, our own Joe Rickert will also be giving a presentation on Wednesday at 2:10PM on Predictive Modeling with Big Data in R, which will feature several demos of data mining massive data sets using Revolution R Enterprise. Incidentally, the whole team at Revolution Analytics was proud to receive the Top...

## Natura non facit saltus

February 5, 2013
$\mathbb{E}_{\mathbb{P}}\left(\sum_{i=1}^N Y_i\right)=\mathbb{E}_{\mathbb{P}}(N) \cdot \mathbb{E}_{\mathbb{P}}(Y_i)$

(See John Wilkins’ article on the interesting history of that phrase: http://scienceblogs.com/evolvingthoughts/…) This week in class, we will see several smoothing techniques for insurance ratemaking. As a starting point, assume that we do not want to use segmentation techniques, so there is no segmentation of the premium: everyone will pay exactly the same price. And that price should be related to...
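The identity shown above this excerpt holds when the claim count N is independent of the claim amounts and the amounts share a common mean. As a rough numerical check, and not the post's own code, the sketch below simulates a portfolio under made-up assumptions: claim counts uniform on {0, 1, 2, 3} and exponentially distributed claim amounts with mean 1000. All names and distributions are hypothetical.

```python
import random

random.seed(42)

MEAN_CLAIM = 1000.0      # assumed E(Y_i)
EXPECTED_COUNT = 1.5     # E(N) for N uniform on {0, 1, 2, 3}
N_POLICIES = 100_000

totals = []
for _ in range(N_POLICIES):
    n_claims = random.randint(0, 3)  # claim count, drawn independently of amounts
    amounts = [random.expovariate(1 / MEAN_CLAIM) for _ in range(n_claims)]
    totals.append(sum(amounts))      # total claims for one policy

simulated_mean = sum(totals) / N_POLICIES     # estimate of E(sum of Y_i)
flat_premium = EXPECTED_COUNT * MEAN_CLAIM    # E(N) * E(Y_i)

print(simulated_mean, flat_premium)
```

With no segmentation, the same expected total would be charged to every policyholder, which is why the product E(N) · E(Y_i) is the natural starting point.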

## Relearn boxplot and label the outliers

February 5, 2013

Despite the fact that the box plot is used almost everywhere and taught in undergraduate statistics classes, I recently had to re-learn the box plot in order to know how to label the outliers. This stackoverflow post was where I found how...

## New Rcpp page on upcoming events — including Master Class in New York

February 5, 2013

Lots of exciting things are happening with and around Rcpp. I just added a new page about Upcoming Events to the recently-created Rcpp site. This events page has lots to cover: an upcoming talk at Columbia on March 8 (details still TBD), a day-lon...

## MCMSki IV, Jan. 6-8 (9?), 2014, Chamonix (news #3)

February 5, 2013

In case you have not been constantly tracking the changes on the MCMSki IV webpage, here is some news: the number of invited and accepted contributed sessions in the program has considerably increased, to the point of almost filling two parallel sessions for the whole duration of the meeting. This includes an exciting round-table on

## 2011 Census Open Atlas Project

February 5, 2013

This month has seen the release of the 2011 Census data for England and Wales at Output Area level. This offers the possibility to map various attributes about people and places for very small geographic areas. Output Areas represent the most detailed geography for which Census data are released and are the building blocks for

## Tables from R into Word

February 5, 2013

A good-looking table matters! This tutorial is on how to create a neat table in Word by combining knitr and R Markdown. I'll be using my own function, htmlTable, from the Gmisc package. Background: Because most journals that I submit to want...

## Proposed techniques for communicating the amount of information contained in a statistical result

February 5, 2013

A couple of weeks ago, I posted about how much we can expect to learn about the state of the world on the basis of a statistical significance test. One way of framing this question is: if we’re trying to come to scientific conclusions on the basis of statistical results, how much can we update