## The law of small numbers

January 28, 2013

In insurance, the law of large numbers (originally named loi des grands nombres by Siméon Poisson, see e.g. http://en.wikipedia.org/…) is usually invoked to justify large portfolios, because of pooling and diversification: the larger the pool, the more ‘predictable’ the losses will be (in a given period). Of course, under standard statistical assumptions, namely finite expected value, and independence (see http://freakonometrics.blog.free.fr/…....
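
The pooling argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: the losses are taken to be independent exponential draws with mean 1, and the pool sizes and number of replications are arbitrary choices, none of which come from the post.

```r
# Minimal sketch: variability of the average loss per policy as the pool grows.
# Assumption (not from the post): independent exponential losses with mean 1.
set.seed(1)
pool_sizes <- c(10, 100, 1000, 10000)

# For each pool size, simulate 500 portfolios and record the average loss,
# then measure how much that average varies from portfolio to portfolio.
sd_of_average <- sapply(pool_sizes, function(n) {
  averages <- replicate(500, mean(rexp(n, rate = 1)))
  sd(averages)
})

data.frame(pool_size = pool_sizes, sd_of_average = round(sd_of_average, 4))
```

Under these assumptions the standard deviation of the average loss shrinks roughly like one over the square root of the pool size, which is the sense in which larger pools are more ‘predictable’.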

## Evolution of a logistic regression

January 28, 2013

In my last post I showed how one can easily summarize the outcome of a logistic regression. Here I want to show how this really depends on the data-points that are used to estimate the model. Taking a cue from the evolution of a correlation I have plotted the estimated Odds Ratios (ORs) depending on
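
A rough way to see this dependence is to refit the model on a growing number of observations and track the odds ratio. The sketch below uses simulated data; the sample size, the true coefficients and the step of 50 observations are assumptions for illustration and are not taken from the post.

```r
# Minimal sketch: how an estimated odds ratio evolves as more data points are used.
# Assumption: simulated data with a true log-odds slope of 0.7 for x.
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.7 * x))

# Refit the logistic regression on the first k observations
# and keep the odds ratio (exponentiated coefficient) of x.
sample_sizes <- seq(50, n, by = 50)
odds_ratios <- sapply(sample_sizes, function(k) {
  fit <- glm(y[1:k] ~ x[1:k], family = binomial)
  unname(exp(coef(fit)[2]))
})

plot(sample_sizes, odds_ratios, type = "b",
     xlab = "Number of observations used", ylab = "Estimated odds ratio")
abline(h = exp(0.7), lty = 2)  # true odds ratio in this simulation
```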

## analyze the survey of consumer finances (scf) with r

January 28, 2013

the survey of consumer finances (scf) tracks the wealth of american families.  every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on th...

## The components garch model in the rugarch package

January 28, 2013

How to fit and use the components model.

Previously

Related posts are:

- A practical introduction to garch modeling
- Variability of garch estimates
- garch estimation on impossibly long series
- Variance targeting in garch estimation

The model

The components model (created by Engle and Lee) generally works better than the more common garch(1,1) model. Some hints about … Continue reading...

## My template for controlling publication quality figures

January 28, 2013

The following is a template that I usually start with when producing figures for publication. It allows me to control:

- The overall size of the figure (in inches) (WIDTH, HEIGHT)
- The layout of figure subplots (using the layout() function) (LO)
- The resolution of the figure (for a .png file) (RESO)

I define the overall dimensions of...
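
A rough sketch of how such a template might look is given here. The variable names WIDTH, HEIGHT, LO and RESO come from the excerpt; the values, the output file name and the two example panels are assumptions for illustration, not the author's actual template.

```r
# Minimal sketch of a figure template; all values below are assumed.
WIDTH  <- 6                       # overall width in inches
HEIGHT <- 4                       # overall height in inches
RESO   <- 300                     # resolution in pixels per inch for the .png
LO     <- matrix(1:2, nrow = 1)   # layout: two panels side by side

# Hypothetical output file, written to a temporary directory.
out_file <- file.path(tempdir(), "figure_template.png")

png(out_file, width = WIDTH, height = HEIGHT, units = "in", res = RESO)
layout(LO)                        # arrange subplots according to LO
par(mar = c(4, 4, 1, 1))          # compact margins for each panel
plot(1:10, xlab = "x", ylab = "y")
hist(rnorm(100), main = "", xlab = "value")
dev.off()
```

Keeping the size, layout and resolution in named variables at the top means a figure can be resized or re-rendered at a different resolution without touching the plotting code.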

## I thought R was a letter…intro/installation

January 27, 2013

I will make a confession. This past summer, I didn’t spend my spare time watching relentlessly addicting TV shows nor clubbing in San Francisco. Instead, I checked out figures. No, not the sort of figures you’re probably thinking about. The ones that are included in research papers and have the potential to be beautiful works of

## European Fishing

January 27, 2013

I am playing around with Eurostat data and ggplot2 a bit more. As I progress, the plotting gets easier, the data pre-processing gets a bit simpler, and the surprises in the data remain.

Eurostat data

The data used are fish_fleet (number of ships) and fish_pr (production=catch+aquaculture). After a bit of year selection, 1992 and later, I decided to...

## A slightly different introduction to R, part II

January 27, 2013

In part I, we looked at importing data into R and simple ways to manipulate data frames. Once we’ve gotten our data safely into R, the first thing we want to do is probably to make some plots. Below, we’ll make some simple plots of the made-up comb gnome data. If you want to play

## Regression tree using Gini’s index

January 27, 2013

In order to illustrate the construction of a regression tree (using the CART methodology), consider the following simulated dataset,

```r
set.seed(1)
n=200
X1=runif(n)
X2=runif(n)
# Probability of Y=1 is piecewise constant on rectangles of (X1, X2)
P=.8*(X1<.3)*(X2<.5)+
  .2*(X1<.3)*(X2>.5)+
  .8*(X1>.3)*(X1<.85)*(X2<.3)+
  .2*(X1>.3)*(X1<.85)*(X2>.3)+
  .8*(X1>.85)*(X2<.7)+
  .2*(X1>.85)*(X2>.7)
Y=rbinom(n,size=1,P)
B=data.frame(Y,X1,X2)
```

with one dichotomous variable (the variable of interest, Y), and two continuous ones (the explanatory ones X1 and X2). `> tail(B)` Y...
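
To give an idea of how a Gini-based split can be chosen, here is a minimal sketch for a single explanatory variable. It uses one common form of the Gini index for a binary outcome, 2p(1-p), which may differ in sign or scaling from the convention used in the post. The helper names `gini` and `split_impurity` and the toy data are assumptions for illustration.

```r
# Gini impurity of a 0/1 vector: 2 * p * (1 - p), with p the share of ones.
gini <- function(y) {
  p <- mean(y)
  2 * p * (1 - p)
}

# Weighted impurity of the two child nodes when splitting x at threshold s.
split_impurity <- function(y, x, s) {
  left <- x < s
  (sum(left) * gini(y[left]) + sum(!left) * gini(y[!left])) / length(y)
}

# Toy data (an assumption, not the dataset from the post):
# the probability of y = 1 changes at x = 0.3.
set.seed(1)
x <- runif(200)
y <- rbinom(200, size = 1, prob = ifelse(x < 0.3, 0.8, 0.2))

# Candidate thresholds: midpoints between consecutive sorted x values,
# so that both child nodes are always non-empty.
xs <- sort(x)
candidates <- (head(xs, -1) + tail(xs, -1)) / 2
impurities <- sapply(candidates, function(s) split_impurity(y, x, s))

# The selected split is the threshold with the lowest weighted impurity.
best_split <- candidates[which.min(impurities)]
best_split
```

A tree is then grown by repeating this search over all explanatory variables and recursively within each child node.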

## Tracking Number of Historical Clusters

January 26, 2013

In the prior post, Optimal number of clusters, we looked at methods of selecting the number of clusters. Today, I want to continue with the clustering theme and show the historical Number of Clusters time series using these methods. In particular, I will look at the following methods of selecting the optimal number of clusters: Minimum number of clusters

## ggplot2 multiple boxplots with metadata

January 26, 2013

Recently I was asked for advice on how to plot values with an additional attached condition separating the boxplots. This turns out to be ugly in base graphics, but amazingly simple in ggplot2.

## Learning R using a Chemical Reaction Engineering Book: Part 3

January 26, 2013

In case you missed previous parts, the links to them are listed below. Part 1 Part 2 In this part, I tried to recreate the examples in section A.2.3 of the computational appendix in the reaction engineering book (by Rawlings and … Continue reading →

## Learning R using a Chemical Reaction Engineering Book: Part 2

January 26, 2013

In case you missed part 1, you can view it here. In this part, I tried to recreate the examples in section A.2.2 of the computational appendix in the reaction engineering book by Rawlings and Ekerdt. Solving a nonlinear system of equations … Continue reading →

## Waiting for an API request to complete

January 26, 2013

Dealing with API tokens in R

In my previous post I showed an example of calling the Phylotastic taxonomic name resolution API Taxosaurus here. When you query their API they give you a token which you use later to retrieve the result (see examples on their page above). However, you don't know when the query will be...
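
One generic way to handle this situation is to poll until the result is ready or a retry limit is reached. The sketch below is not the approach from the post and does not call the Taxosaurus API: `wait_for_result` and `make_fake_check` are hypothetical helpers, and the fake check simply stands in for a real "retrieve by token" request.

```r
# Hypothetical helper: call check() repeatedly until it returns a non-NULL
# result, sleeping between attempts, and give up after max_tries attempts.
wait_for_result <- function(check, interval = 0.1, max_tries = 20) {
  for (i in seq_len(max_tries)) {
    result <- check()
    if (!is.null(result)) return(result)
    Sys.sleep(interval)
  }
  stop("result not ready after ", max_tries, " attempts")
}

# Stand-in for a real API request (an assumption for illustration):
# returns NULL until it has been called ready_after times.
make_fake_check <- function(ready_after = 3) {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls < ready_after) NULL else list(status = "done", calls = calls)
  }
}

res <- wait_for_result(make_fake_check(3), interval = 0.01)
res$calls
```

In real use, `check` would wrap the request that exchanges the token for the result, and the interval and retry limit would be chosen to be polite to the service.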

## Code Pollution With Command Prompts

January 26, 2013

This is not the first time I have ranted about command prompts, but I cannot help ranting about them whenever I see them in source code. In short, a piece of source code with command prompts is like a bag of cooked shrimps in the market -- it does not make sense, and an otherwise good thing is...

## Learning R using a Chemical Reaction Engineering Book: Part 1

January 25, 2013

Chemical Reactor Analysis and Design Fundamentals by J.B. Rawlings and J. G. Ekerdt is a textbook for studying Chemical Reaction Engineering. The popular open source package Octave has its origins in the reaction engineering course offered by Prof. Rawlings. This book … Continue reading →

## Lambda.r 1.1.0 released

January 25, 2013

This is a quick post to announce that lambda.r version 1.1.0 is released and available on CRAN. This release has a … Continue reading »

## A simple way to cluster music

January 25, 2013

In my last blog, I discussed the tuneR functions that provide an option to transcribe musical notes from audio frequencies. In this blog, I’ll write about functions for comparison of audio spectrum distributions, available in the seewave library. The idea I … Continue reading →

## Slides and replay of my “Using R with Hadoop” webinar now available #rstats #hadoop

January 25, 2013

I owe a big “thank you” to all of you who attended my webinar yesterday “Using R with Hadoop”. Revolution Analytics partnered with us at Think Big Analytics to produce the webinar, and I owe them thanks as well. For those of you who missed it, the slides and replay are now available from Revolution

## R and foreign characters

January 25, 2013

Working with Russian characters can be mind-numbingly frustrating. This is true for R, as for other applications, so below I've written out my top five tricks for making Russian inputs work in R; I believe they should be transferable to most other languages....

## Shiny 0.3.0 released

January 25, 2013

Version 0.3.0 of Shiny is now available on CRAN. This version of Shiny has several new features and bug fixes. Some of the changes are under the hood: for example, Shiny now uses a more efficient algorithm for scheduling the execution of reactive functions. There are also some user-facing changes: for example, the new runGitHub()

## Video: Using R with Hadoop

January 25, 2013

If you weren't one of the almost 2000 people who signed up for yesterday's webinar "Using R with Hadoop", the replay and slides are now available. During the webinar, Jeffrey Breen (Principal at Think Big Academy) talked about extracting analytics from data in Hadoop and covered:

- How to use R and Hadoop
- Hadoop streaming
- Various R packages and RHadoop...

## Regressions 101: “Significance”

January 25, 2013

SETUP (CAN BE SKIPPED) We start with data (how was it collected?) and the hope that Read more »

## Effects of forest management on a woodland salamander in Missouri

January 25, 2013

Effects of experimental forest management on a terrestrial, woodland salamander in Missouri by Daniel J. Hocking, Grant M. Connette, Christopher A. Conner, Brett R. Scheffers, Shannon E. Pittman, William E. Peterman, and Raymond D. Semlitsch. This is the first post … Continue reading →

## Object Orientation in R – Notes from a novice

January 25, 2013

Having posted some code to Git a few days ago and having been wholly dissatisfied with it, I began to do what I often do with code I don’t like. I started re-writing it bigger and weirder and more philosophically pure. Part of this search for Platonic code led me to explore object oriented programming

## Tensor Algebra: Efficient Operations on Multidimensional Arrays with R

January 25, 2013

Multidimensional arrays are ubiquitous. Any complex problem having multivariate observables would easily generate a need to represent corresponding data in multidimensional arrays. Most of the practitioners would choose to apply operations on the...