## Poisson regression on non-integers

May 7, 2013
In the course on claims reserving techniques, I mentioned the use of Poisson regression even though incremental payments were not integers. For instance, we considered incremental triangles:

```
> source("http://perso.univ-rennes1.fr/arthur.charpentier/bases.R")
> INC=PAID
> INC[,-1]=PAID[,-1]-PAID[,-ncol(PAID)]
> INC
3209 1163 39 17  7 21
3367 1292 37 24 10 NA
3871...
```
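A non-integer response does not break the method. The Poisson GLM's estimating equations only involve the mean and the variance function, so a quasi-Poisson fit (Poisson mean-variance relationship, dispersion estimated) is well defined for non-negative responses. Below is a minimal sketch on a hypothetical 4×4 triangle (made-up numbers, not the course data). It checks two things: that `quasipoisson` and `poisson` give the same coefficients, and that, as is well known for this row/column model, the predicted lower triangle sums to the chain-ladder reserve.

```r
# Hypothetical incremental triangle with non-integer payments (NOT the course data)
inc <- matrix(c(
  3200.5, 1150.3, 40.2, 18.7,
  3350.1, 1290.8, 36.4,   NA,
  3860.7, 1410.2,   NA,   NA,
  4150.9,     NA,   NA,   NA), nrow = 4, byrow = TRUE)

# Long format: one row per cell, with occurrence-year and development-year factors
cells <- data.frame(
  y   = as.vector(inc),
  occ = factor(as.vector(row(inc))),
  dev = factor(as.vector(col(inc)))
)
obs <- cells[!is.na(cells$y), ]  # upper triangle (observed)
fut <- cells[is.na(cells$y), ]   # lower triangle (to predict)

# Quasi-Poisson: Poisson estimating equations, no integer requirement
fit_qp <- glm(y ~ occ + dev, family = quasipoisson(link = "log"), data = obs)
# Plain Poisson: identical coefficients, but dpois() warns about non-integer y
fit_p <- suppressWarnings(glm(y ~ occ + dev, family = poisson(link = "log"), data = obs))

glm_reserve <- sum(predict(fit_qp, newdata = fut, type = "response"))

# Chain-ladder reserve on the cumulative triangle, for comparison
cum <- t(apply(inc, 1, cumsum))
m <- nrow(cum)
f <- sapply(seq_len(m - 1), function(j) {
  rows <- seq_len(m - j)  # rows where development year j + 1 is observed
  sum(cum[rows, j + 1]) / sum(cum[rows, j])
})
latest   <- sapply(seq_len(m), function(i) cum[i, m - i + 1])
ultimate <- sapply(seq_len(m), function(i) latest[i] * prod(tail(f, i - 1)))
cl_reserve <- sum(ultimate - latest)

c(glm = glm_reserve, chain_ladder = cl_reserve)
```

Only the fitted means matter for the reserve; the quasi-Poisson family additionally estimates a dispersion parameter, which is what one would use for standard errors.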

## R in Insurance: Programme and Abstracts published

May 7, 2013
I am delighted to announce that the programme and abstracts for the first R in Insurance conference at Cass Business School in London, 15 July 2013, have been published. The conference committee received strong abstracts from academia and the industry,...

## SAS Big Data Analytics Benchmark (Part Two)

May 7, 2013
by Thomas Dinsmore. On April 26, SAS published on its website an undated Technical Paper entitled Big Data Analytics: Benchmarking SAS, R and Mahout. In the paper, the authors (Allison J. Ames, Ralph Abbey and Wayne Thompson) describe a recent project to compare model quality, product completeness and ease of use for two SAS products together with open source...

## Eigen-analysis of Linear Model Behavior in R

May 7, 2013
This post is actually about replicating the figures in Otto and Day: A Biologist’s Guide to Mathematical Modeling in Ecology and Evolution. The figures I’m interested in for this post are Figures 9.1 and 9.2 in the chapter ‘General Solutions…
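For a discrete-time linear model $\mathbf{n}(t+1) = M\mathbf{n}(t)$, the general solution can be written with the eigen-decomposition $M = A D A^{-1}$ as $\mathbf{n}(t) = A D^t A^{-1}\mathbf{n}(0)$. The sketch below checks that formula against brute-force iteration. The matrix `M` and starting vector `n0` are hypothetical and not taken from the book's figures.

```r
# Hypothetical 2x2 transition matrix for n(t + 1) = M %*% n(t)
M <- matrix(c(0.5, 0.3,
              0.4, 0.9), nrow = 2, byrow = TRUE)
n0 <- c(10, 5)

# Eigen-decomposition M = A D A^{-1}: columns of A are right eigenvectors
e <- eigen(M)
A <- e$vectors
A_inv <- solve(A)

# General solution: n(t) = A D^t A^{-1} n(0)
general_solution <- function(t) {
  as.vector(A %*% diag(e$values^t) %*% A_inv %*% n0)
}

# Brute-force iteration of the recursion, for comparison
iterate <- function(t) {
  n <- n0
  for (i in seq_len(t)) n <- as.vector(M %*% n)
  n
}

rbind(eigen = general_solution(20), iterated = iterate(20))
```

Here the eigenvalues are 1.1 and 0.3, so the first term dominates and the total eventually grows by about 10% per step; that long-run behaviour is exactly what the eigen-analysis exposes without iterating.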

## DataMind & The R Service Bus @ RBelgium

In two weeks, on Friday, May 24, the RBelgium R user group will hold its next regular meeting in Leuven. This is the schedule:

- Jonathan Cornelissen (DataMind): Discover DataMind, a new online learning platform for d...

## Subsetting data

May 6, 2013
At School we use R across many courses, because students are expected to use statistics in a variety of contexts. Imagine their disappointment when they pass stats and discover that R and statistics haven’t gone away! When students start working…

## Passing columns of a dataframe to a function without quotes

May 6, 2013
I love the syntax of calls to lm and ggplot, wherein the dataframe is specified as a variable and specific columns are referenced as though they were separate variables. While developing some of my functions, I’d wanted to introduce something similar. I often find that I have a single large dataframe and want to execute
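In base R, this kind of interface is usually built with `substitute()`, which captures the unevaluated argument, and `eval()`, which evaluates it with the data frame as the environment. The sketch below shows that pattern; `col_summary` is a hypothetical name, and the post's own implementation may differ.

```r
# Evaluate an unquoted column name (or expression) inside a data frame
col_summary <- function(df, col, fun = mean) {
  col_expr <- substitute(col)  # capture the argument as an unevaluated expression
  # Look names up in df first, then in the caller's environment
  values <- eval(col_expr, envir = df, enclos = parent.frame())
  fun(values)
}

col_summary(mtcars, mpg)              # same as mean(mtcars$mpg)
col_summary(mtcars, hp / wt, median)  # arbitrary expressions of columns work too
threshold <- 20
col_summary(mtcars, mpg > threshold, sum)  # mixes a column with a caller variable
```

Passing `enclos = parent.frame()` matters: without it, names that are not columns (like `threshold`) would not be found in the caller's environment.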

## Explaining real-time predictive analytics with big data (video)

May 6, 2013
In my presentation to the Strata Santa Clara 2013 conference earlier this year, my goal was to give a succinct (under 20 minutes!) explanation of three terms that are too often used as mere buzzwords: predictive analytics, real time, and big data. You can download the slides for my presentation, Real-time Big Data Analytics: From Deployment to Production, from...

## Veterinary Epidemiologic Research: Count and Rate Data – Zero Counts

May 6, 2013

Continuing with the examples from the book Veterinary Epidemiologic Research, today we look at modelling counts when the number of zeros may be higher or lower than expected from a Poisson or negative binomial distribution. When there’s an excess of zero counts, you can fit either a zero-inflated model or a hurdle model. If zero
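The two model families differ in how they generate zeros. A minimal sketch using the standard textbook probability mass functions for the Poisson case; `dzip`, `dhurdle` and their arguments are hypothetical names, and fitting either model to data (which needs a dedicated package) is not shown.

```r
# Zero-inflated Poisson: a structural zero with probability pzero,
# otherwise a draw from Poisson(lambda), which can itself be zero
dzip <- function(y, lambda, pzero) {
  ifelse(y == 0,
         pzero + (1 - pzero) * dpois(0, lambda),
         (1 - pzero) * dpois(y, lambda))
}

# Poisson hurdle: P(Y = 0) = p0 directly; positive counts follow
# a zero-truncated Poisson(lambda)
dhurdle <- function(y, lambda, p0) {
  ifelse(y == 0,
         p0,
         (1 - p0) * dpois(y, lambda) / (1 - dpois(0, lambda)))
}

dpois(0, 2)           # zeros expected under a plain Poisson(2)
dzip(0, 2, 0.3)       # zero-inflated: more zeros
dhurdle(0, 2, 0.05)   # hurdle: can also give fewer zeros
```

With `pzero >= 0`, a zero-inflated Poisson can only add zeros on top of the Poisson ones, whereas the hurdle model sets $P(Y = 0)$ freely and can therefore also represent a deficit of zeros.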

## Bayesian and Frequentist Approaches: Ask the Right Question

May 6, 2013
It occurred to us recently that we don’t have any articles about Bayesian approaches to statistics here. I’m not going to get into the “Bayesian versus Frequentist” war; in my opinion, which style of approach to use is less about philosophy, and more about figuring out the best way to answer a question. Once you

## Incomplete Data by Design: Bringing Machine Learning to Marketing Research

May 6, 2013
Survey research deals with the problem of question wording by always asking the same question. Thus, the Gallup Daily Tracking is filled with examples of moving averages for the exact same question asked precisely the same way every day...

## Creating a QGIS-Style (qml-file) with an R-Script

May 6, 2013
How to get from a txt-file with short names and labels to a QGIS-Style (qml-file)?
I used the R script below to create a style for this legend table, copy-pasting the parts I needed into a txt-file, e.g. for WRB-FULL (the full soil code of the STU from the World Reference Base for Soil Resources). The...

## The half variance approximation for mean returns

May 6, 2013
What’s that thing about arithmetic and geometric returns and the variance?

Previously: an introduction to the difference between simple and log returns is “A tale of two returns”.

Issue: Suppose you are predicting the mean annual return of an asset for some number of years. To simplify the discussion, let’s buy into the fantasy that…
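As a short derivation, assuming log returns $r = \log(1+R)$ are normally distributed with mean $\mu$ and variance $\sigma^2$ (an assumption for this sketch; the excerpt does not state the post's model), the half variance relation follows from the lognormal mean:

$$
\mathbb{E}[1+R] = e^{\mu + \sigma^2/2}
\quad\Longrightarrow\quad
\mu = \log \mathbb{E}[1+R] - \frac{\sigma^2}{2}
\approx \mathbb{E}[R] - \frac{\sigma^2}{2},
$$

where the last step uses $\log(1+x) \approx x$ for small $x$. Since $e^{\mu}$ is the long-run (geometric) growth factor per period, the geometric mean return is roughly the arithmetic mean return minus half the variance.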

## analyze the social security administration public use microdata files (ssapumf) with r

May 5, 2013
the social security administration (ssa) must be overflowing with quiet heroes, because their public-use microdata files are as inconspicuous as they are thorough.  sure, ssa publishes enough great statistical research of their own that outside re...

## Google Analytics + R = FUN!

May 5, 2013
The scope of this post is to show how simple it is to get data out of Google Analytics and create your own reports (which, you hope, can at least be semi-automated) and your favourite statistical graphs (those that GA is currently missing). As you already know, R is a favourite tool…

## R/Finance 2013 Is Coming Quickly…

May 5, 2013
There are about two weeks remaining until R/Finance 2013, being held on May 17th and 18th at UIC in Chicago. Make sure you register beforehand to ensure you have a spot, and, yes, you do want to come to the conference dinner on Friday. I am particularly excited about the lineup of keynotes

## Simulation shows gain of clmm over ANOVA is small

May 5, 2013
After setting up a simulation in my last post, it is now time to look at how the models compare. To my disappointment, with my simple simulations of assessor behavior the gain is minimal. Unfortunately, the simulation took much more time than I ...

## Volatility Regimes: Part 2

### Strategy Implications

In this part of the volatility regimes analysis, we’ll use the regime identification framework established in part 1 to draw conclusions about which strategies work best in each regime. That should prove useful to us and goes a long way to answering the question, “What strategies should I be pursuing right...

## Quandl Package – 5,000,000 free datasets at the tip of your fingers!

May 5, 2013
Yes, you read that correctly, and no, Quandl (http://www.quandl.com/) did not pay me anything. Quandl is a new database management tool which seeks to become the place to find datasets. They boast of having over 5x10^6 data sets available t...

## A Prototype of Monotonic Binning Algorithm with R

May 4, 2013
I’ve been asked many times whether I have a piece of R code implementing the monotonic binning algorithm, similar to the one that I developed with SAS (http://statcompute.wordpress.com/2012/06/10/a-sas-macro-implementing-monotonic-woe-transformation-in-scorecard-development) and with Python (http://statcompute.wordpress.com/2012/12/08/monotonic-binning-with-python). Today, I finally had time to draft a quick prototype with 20 lines of R code, which, however, is barely usable without the
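One common way to prototype monotonic binning works like this: start with many quantile bins and reduce their number until the Spearman correlation between the bin-level mean of `x` and the bin-level bad rate is ±1. That approach is assumed here; the author's SAS, Python and R code is not reproduced in this excerpt, and `mono_bin` plus the simulated data are hypothetical.

```r
# Sketch of Spearman-based monotonic binning (assumed approach, not the author's code)
mono_bin <- function(y, x, n = 20) {
  repeat {
    # Quantile cut points; unique() guards against tied quantiles
    breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n + 1), names = FALSE))
    bucket <- cut(x, breaks = breaks, include.lowest = TRUE)
    mean_x <- as.vector(tapply(x, bucket, mean))
    mean_y <- as.vector(tapply(y, bucket, mean))
    rho <- suppressWarnings(cor(mean_x, mean_y, method = "spearman"))
    # Stop once bad rates are perfectly monotone across bins, or at 2 bins
    if (isTRUE(abs(rho) > 1 - 1e-12) || n <= 2) break
    n <- n - 1
  }
  data.frame(bin = levels(bucket),
             count = as.vector(table(bucket)),
             mean_x = mean_x,
             bad_rate = mean_y)
}

# Simulated binary outcome whose probability increases with x
set.seed(2013)
x <- rnorm(5000)
y <- rbinom(5000, size = 1, prob = plogis(-1 + 0.8 * x))
bins <- mono_bin(y, x)
bins
```

Spearman correlation only looks at ranks, so it equals ±1 exactly when the bad rate moves in one direction across the ordered bins, which is the monotonicity a scorecard WoE transformation needs.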

## Backporting R 3.0.0 to Quantal, Precise, and Lucid

May 4, 2013
Today (May 4, 2013) I will begin the process of backporting R 3.0.0 to Quantal, Precise, and Lucid. This will include all the recommended packages and the packages for R found in the universe repository for Ubuntu. Things to keep in mind: If you do...

## TV shows rated by episode as a Shiny App

May 3, 2013
A few days ago there was an interesting R-based article by diffuseprior on the decline and fall in the quality of The Simpsons. The author scraped results from GEOS, an online survey of TV programs, and applied the R package changepoint to offer an analysis of the show over time. This seemed a candidate...

## LaTeX in R graphs

May 3, 2013
A nice post about the tikzDevice R package was recently published on the rsnippets blog. This package is indeed awesome, even though it has been removed from CRAN. Of course, it can still be downloaded from the archive folder, on http://cran.r-project.org/…, but also (for a more recent version) on http://download.r-forge.r-project.org/…. But first, it is necessary to install...

## Animation, from R to LaTeX

May 3, 2013
Just a short post to share some code used to generate animated graphs with R. Assume that we would like to illustrate the law of large numbers, and the convergence of the average value of a binomial sample. We can generate samples $X_{i,j}\sim\mathcal{B}(1/2)$ using

```
> n=200
> k=1000
> set.seed(1)
> X=matrix(sample(0:1,size=n*k,replace=TRUE),n,k)
```

Each row will be a trajectory of heads and...
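To turn the simulated tosses into frames, one needs the running average of each trajectory after $j$ tosses. Here is a minimal sketch reusing the excerpt's parameters. `M` and `draw_frame` are hypothetical names, and the tool used to assemble frames into an animation (for instance ImageMagick or the animation package) is not shown in this excerpt.

```r
# Same simulation as in the excerpt: n = 200 trajectories of k = 1000 fair coin tosses
n <- 200
k <- 1000
set.seed(1)
X <- matrix(sample(0:1, size = n * k, replace = TRUE), n, k)

# Running average along each row: M[i, j] = mean(X[i, 1:j])
M <- t(apply(X, 1, cumsum)) / matrix(seq_len(k), n, k, byrow = TRUE)

# One frame: all trajectories up to toss j, with the limit 1/2 in red
draw_frame <- function(j) {
  matplot(t(M[, seq_len(j), drop = FALSE]), type = "l", lty = 1,
          col = rgb(0, 0, 1, 0.1), xlim = c(1, k), ylim = c(0, 1),
          xlab = "number of tosses", ylab = "running average")
  abline(h = 0.5, col = "red")
}
```

Calling `draw_frame(j)` for increasing `j`, for example in a loop that writes one PNG file per frame, produces the images of the animation. The spread of `M[, j]` shrinks like $1/\sqrt{j}$, which is the convergence the animation is meant to show.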

## Old Post with New d3 Life–GARCH and MA Performance

May 3, 2013
Parallel coordinates become much more useful when they are interactive, so I revisited one of my favorite blog posts, "Trend is Not Your Friend" Applied to 48 Industries, and converted the chart to a living, breathing d3 parallel coordinates chart courtesy ...

## Extending RevoScaleR for Mining Big Data – Naive Bayes

May 3, 2013
by Derek McCrae Norton, Senior Sales Engineer. In this third installment (following part 1 and part 2) of Extending RevoScaleR for Mining Big Data, we look at how to use the building blocks provided by RevoScaleR to create a Naive Bayes model. Motivation: fit a Naive Bayes model to big data. Naive Bayes is a simple probabilistic classifier based...
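The classifier itself only needs class priors and, for each feature, counts of feature level by class. Below is a minimal base-R sketch for categorical features with Laplace smoothing, on hypothetical toy data. It does not use RevoScaleR, and `nb_fit`/`nb_predict` are illustrative names. On big data, these same counts are the quantities a chunked implementation would need to accumulate.

```r
# Hypothetical toy data: two categorical features and a binary class label
train <- data.frame(
  outlook = c("sunny", "sunny", "rain", "rain", "overcast", "sunny", "rain", "overcast"),
  windy   = c("no", "yes", "no", "yes", "no", "no", "yes", "yes"),
  play    = c("no", "no", "yes", "no", "yes", "yes", "no", "yes"),
  stringsAsFactors = TRUE
)

# Fit: class priors P(C) and per-feature conditionals P(x_f | C), Laplace-smoothed
nb_fit <- function(data, class, laplace = 1) {
  y <- data[[class]]
  features <- setdiff(names(data), class)
  cond <- lapply(features, function(f) {
    counts <- table(data[[f]], y) + laplace          # rows: feature levels, cols: classes
    unclass(sweep(counts, 2, colSums(counts), "/"))  # each class column sums to 1
  })
  names(cond) <- features
  list(prior = prop.table(table(y)), cond = cond, features = features)
}

# Predict: P(C | x) is proportional to P(C) * prod_f P(x_f | C); work in log space
nb_predict <- function(model, newdata) {
  classes <- names(model$prior)
  logpost <- matrix(log(as.vector(model$prior)), nrow(newdata), length(classes),
                    byrow = TRUE, dimnames = list(NULL, classes))
  for (f in model$features) {
    lev <- as.character(newdata[[f]])  # levels must have been seen in training
    logpost <- logpost + log(model$cond[[f]][lev, classes, drop = FALSE])
  }
  post <- exp(logpost - apply(logpost, 1, max))  # subtract row maximum for stability
  post / rowSums(post)
}

model <- nb_fit(train, "play")
nb_predict(model, data.frame(outlook = "overcast", windy = "no"))
```

For this query the posterior of `play = "yes"` is $(3/7 \cdot 4/6) / (3/7 \cdot 4/6 + 1/7 \cdot 2/6) = 6/7$, which can be checked by hand from the smoothed counts.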

## All About Spherically Distributed Regression Errors

May 2, 2013
(This article was first published on Econometrics Beat: Dave Giles' Blog, and kindly contributed to R-bloggers) This post is based on a handout that I use for one of my courses, and it relates to the usual linear regression model, y = Xβ...

## Improved R Profiling Summaries

May 2, 2013
In my last post I mentioned that I had improved on R’s `summaryRprof()` function with a custom function called `proftable()`. I’ve updated `proftable()` to take advantage of R 3.0.0’s ability to record line numbers while profiling. I’ve put it on GitHub; you can get it there or below.

`proftable` reads in a file generated by...