## Reserving with negative increments in triangles

April 11, 2013

A few months ago, I published a post on negative values in triangles and how to deal with them when using a Poisson regression (the post was published in French). The idea was to use a translation technique: fit a model not on the $Y_i$'s but on $Y_i+k$, for some $k>0$; use that model to make predictions; and then...

## Stepwise Regression for Big Data with RevoScaleR

April 11, 2013

by Joseph Rickert. In a recent blog post, Revolution's Thomas Dinsmore announced stepwise regression for big data as a new feature of Revolution R Enterprise 6.2, which is scheduled for general availability later this month. Today, I would like to provide a simple example of doing stepwise regression with rxLinMod() (the RevoScaleR analog of lm()), using a 100,000 row...

## High Obesity levels found among fat-tailed distributions

April 11, 2013

In my never-ending quest to find the perfect measure of tail fatness, I ran across this recent paper by Cooke, Nieboer, and Misiewicz. They created a measure called the “Obesity index.” Here’s how it works: Step 1: Sample four times from a distribution. The sample points should be independent and identically distributed (did your
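The excerpt breaks off after Step 1, so the following is only a sketch based on the published definition of the index as I recall it, not on the post itself: sort the four draws so that x1 ≤ x2 ≤ x3 ≤ x4, and take the obesity index to be the probability that x1 + x4 > x2 + x3. The function name `obesity_index`, the `sampler` callback, and the Monte Carlo approach are assumptions made for illustration.

```python
import random


def obesity_index(sampler, n_trials=100_000, seed=0):
    """Monte Carlo estimate of P(x1 + x4 > x2 + x3) for four sorted i.i.d. draws.

    `sampler` is a hypothetical callback that takes a random.Random instance
    and returns one draw from the distribution of interest.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = 0
    for _ in range(n_trials):
        # Step 1: four independent draws, then sort them.
        x1, x2, x3, x4 = sorted(sampler(rng) for _ in range(4))
        # Count the trials where the two extremes outweigh the two middle points.
        if x1 + x4 > x2 + x3:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print("uniform:    ", obesity_index(lambda rng: rng.random()))
    print("exponential:", obesity_index(lambda rng: rng.expovariate(1.0)))
```

Under this definition the uniform distribution should come out near 0.5 and the exponential near 0.75, with heavier tails pushing the value toward 1.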

## Spring Cleaning Data: 4 of 6- Combining the files & Changing the Dates/Credit Type

April 11, 2013

So far, the individual files have been left on their own; it is now time to combine them using the rbind function (simple enough after all we have done so far), followed by a quick check with summary. Now that we have one data frame, it is time to make larger changes to ...

## Summarizing Data in R

April 10, 2013

When working with large amounts of data that is structured in a tabular format, a common operation is to summarize that data in different ways using specific variables. In Microsoft Excel, pivot tables are a nice feature that is used for this purpose. Of course, R also has similar calculations that can be used to

## In case you missed it: March 2013 Roundup

April 10, 2013

In case you missed them, here are some articles from March of particular interest to R users. Facebook used R to analyze profile photo changes to create a map of same-sex marriage support in the USA. Joe Rickert contrasts random sampling with fitting models directly to large data sets. A presentation by Carlos Somohano summarizes the history, skills and...

## A quick introduction to ggplot2

April 10, 2013

My friend Jonah asked me to guest lecture in his R seminar aimed at grad students and postdocs in Integrative Biology. I gave Jonah a bunch of topic options ranging from reproducible research with R to data manipulation. The consensus was data visualization, so I put together a 2-hour talk/hands-on presentation for ggplot2

## Tweaking Movie Subtitles with R

April 10, 2013

I use R to fix subtitles that are not in sync with my movies. In the example below, the subs were showing too early, so I added some time to each sequence in the srt file. For simplicity, I used exactly 1 second in the example below. You'll see that I use my function dl_from_dropbox(), on which I wrote...
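The original post does this in R, and its code is not shown in the excerpt. As a rough illustration of the same idea, here is a minimal Python sketch that adds a fixed offset to every timing line of an srt file. The names `shift_srt` and `shift_timestamp` are hypothetical, and the sketch assumes standard timing lines of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`.

```python
import re
from datetime import timedelta

# Matches SRT timestamps such as 00:01:02,345
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset):
    """Return the matched timestamp moved by `offset`, clamped at zero."""
    h, m, s, ms = (int(g) for g in match.groups())
    shifted = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms) + offset
    # Integer division by one millisecond avoids floating-point rounding.
    total_ms = max(0, shifted // timedelta(milliseconds=1))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text, seconds=1.0):
    """Shift every timing line in SRT `text` by `seconds` (may be negative)."""
    offset = timedelta(seconds=seconds)
    out = []
    for line in text.splitlines():
        # Only touch timing lines, so timestamps quoted in dialogue are left alone.
        if "-->" in line:
            line = TIMESTAMP.sub(lambda mt: shift_timestamp(mt, offset), line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "1\n00:00:59,500 --> 00:01:01,000\nHello"
    print(shift_srt(sample, seconds=1.0))
```

A positive offset delays subtitles that appear too early; a negative one pulls late subtitles forward.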

April 10, 2013

Here is a useful snippet that I stole from qdap::url_dl to download files from my Dropbox to the working directory. Argument x is the document name and d the document key. dl_from_dropbox require(RCurl) ...

## Are knuckleballers more volatile?

April 10, 2013

For years, the Blue Jays have been also-rans in the AL East, but they splashed out this season, turning prospects into established stars in the hope of reaching the World Series. Seven games in, the 2-5 start has the perennial doubts resurfacing, particularly as none of the much-vaunted starters has yet to pitch a seventh

## R and social media

April 10, 2013

R is a piece of software, but it is also a community. Help community: The most visible aspect of the R community is help. This is also the most useful to new users. The initial sense of cooperation with R was driven mainly by people helping each other. You don’t need to actively participate in...

## A few lists for data scientists and statisticians

April 10, 2013

Looking for more resources on the web or people to follow on Twitter? Here are some lists you may find useful: 100 Savvy Sites on Statistics, which includes 17 sites that focus on R programming; Kalido offers this list of 30 influential data scientists on Twitter; Big Data Republic is taking votes for this list of 100 influential tweeters...

## Highlight cells in markdown tables

April 10, 2013

Although I have always wanted to add such a feature to pander, a recent question on SO urged me to create some helper functions so that users could easily highlight some rows, columns or even just a few cells in a table and export the result to markdown,...

## Video: Using R for causal inference in a study of expensive public policy decisions

April 10, 2013

This post shares the video from a talk presented on 9th April 2013 by Jim Savage at Melbourne R Users. Billions of dollars a year are spent subsidising tuition of Australian university students. A controversial report last year by the …

## Spring Cleaning Data: 3 of 6- The Little but Big Correction

April 10, 2013

Building on the previous posts (post 1 & post 2), I found 12 instances in the 2010 Q4 data where the type of credit was "Primary*", which means the lender borrowed twice in the same day. It would seem simple enough in Excel, ...

## Global Distribution of Breast Cancer: some initial considerations

April 10, 2013

As mentioned in a previous post, I am interested in analysing whether people’s ‘unhealthy’ lifestyle is associated with new cases of cancer diagnosed globally. The outcome variable I want to explore (at least for now) is the number of new cases of breast cancer per 100,000 female residents. I have this data for 173...

## highlight 0.4.1

April 10, 2013

The highlight package has been missing from CRAN for quite some time. Now it is back, with fewer dependencies. It used to depend on Rcpp and parser, but since the code logic from parser has been brought to R, highlight …

## Mobile version of the graph gallery

April 10, 2013

The R Graph Gallery has been a popular website for many years now. The number of graphics keeps growing as people send me their code. When browsing the website with a mobile device the experience was frustrating, as too much …

## Milano (Italy). April 18, 2013. Third Milano R net meeting: agenda

April 10, 2013

April 18, 2013, 18:00 - 21:00, at Fiori Oscuri Bistrot & Bar (www.fiorioscuri.it), Via Fiori Oscuri, 3 - Milano (Zona Brera). Agenda: 18.00 - 18.15 Registration; 18.15 - 18.30 Welcome presentation, Andrea Spanò, Partner at Quantide; 18.30 - 19.00 Digit recognition Machine …

## Finding the Distribution Parameters

April 9, 2013

This is a brief description of one way to determine the distribution of given data. There are several ways to accomplish this in R, especially if one is trying to determine whether the data comes from a normal distribution. Rather than focusing on hypothesis testing and determining if a distribution is actually the said distribution

## 2013-4 Generating Structured and Labelled SVG

April 9, 2013

This article discusses the importance of providing structure and labelling within SVG code, particularly when the SVG code is generated indirectly by a high-level system and when the SVG code describes a complex image such as a statistical plot. We …

## Second edition of Crawley’s The R Book

April 9, 2013

The second edition of Michael Crawley's The R Book is available from Wiley. According to the publisher, the new edition boasts the following features: "Features full colour text and extensive graphics throughout. Introduces a clear structure with numbered...

## Some R User Group Presentations from Europe

April 9, 2013

by Joseph Rickert. I am beginning to get excited about going to Spain for useR 2013, which will be held at the University of Castilla-La Mancha, so I have been using the links on Revolution's local user directory webpage to see what the European R user groups are doing. Here are just a few highlights of materials that...

## Behind the NCAA Visualizer: Python, R and JavaScript

April 9, 2013

Rodrigo Zamith's NCAA Tournament Visualizer is a great example of an interactive data visualization. If you want to create something similar, Rodrigo has shared detailed behind-the-scenes information on how it was created. He used a mix of tools: Python was used to scrape team statistics from the NCAA website; R was used to prepare the data for analysis, and...

## Matrix Cumulative Coherence: Fourier Bases, Random and Sensing Matrices

April 9, 2013

Compressive sampling (CS) is revolutionizing the way we process analog-to-digital conversion, our understanding of linear systems and the limits of information theory. One of the key concepts in CS is that a signal can be represented in a sparse basis o...

## Spring Cleaning Data: 2 of 6- Changing Column Names and Adding a Column

April 9, 2013

In the first post (found here), we downloaded the data and imported it into R using the gdata package. In this post, we will be changing the column names to make them more reasonable, and adding a quarter variable. The reason for changing the column names is bec...

## Happy biRthday

April 9, 2013

Today is my birthday. It’s also the birthday of a close friend. What an incredible coincidence! Or wait, maybe it is just expected. Once again, R comes to our aid, because it has a built-in function to answer our question. …
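The excerpt does not name the R function it has in mind (R's stats package does ship `pbirthday()` and `qbirthday()` for this family of questions). As a language-neutral illustration of why such coincidences are "just expected", here is a small Python sketch of the classic birthday calculation. The function name `prob_shared_birthday` is hypothetical, and the sketch assumes 365 equally likely birthdays and ignores leap years.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes birthdays are independent and uniform over `days` possible dates.
    """
    if n > days:
        return 1.0  # pigeonhole principle: a shared date is certain
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k dates already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (10, 23, 50):
        print(n, round(prob_shared_birthday(n), 4))
```

With these assumptions, a group of 23 people is already more likely than not to contain a shared birthday.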

## How to set axis options in googleVis

April 9, 2013

Setting axis options in googleVis charts can be a bit tricky. Here I present two examples where I set several options to customise the layout of a line and combo chart with two axes. The parameters have to be set in line with the Google Chart Tools API, which uses a JavaScript syntax....

## Changing figure options mid-chunk (in a loop) using the pander package.

April 9, 2013

I have already written about changing figure options mid-chunk in reproducible research. This can be important, e.g. if you are looping through a dataset to produce a graphic for each variable but the figure width or height needs to depend on properties of the variables, e.g. if you are producing histograms and want the figures to