The law of small numbers

January 28, 2013

In insurance, the law of large numbers (originally named loi des grands nombres by Siméon Poisson, see e.g. http://en.wikipedia.org/…) is usually invoked to justify large portfolios, because of pooling and diversification: the larger the pool, the more ‘predictable’ the losses will be (in a given period). Of course, under standard statistical assumptions, namely finite expected value and independence (see http://freakonometrics.blog.free.fr/…....
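
The pooling argument in this teaser can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the original post: losses are drawn as independent exponential variables with mean 1000, and the pool sizes and number of simulated periods are arbitrary.

```r
# Sketch: relative variability of the average loss per policy shrinks as the pool grows.
# Assumptions (not from the original post): i.i.d. exponential losses with mean 1000.
set.seed(42)
pool_sizes <- c(10, 100, 1000, 10000)  # hypothetical portfolio sizes
n_sim <- 500                           # simulated periods per portfolio size

cv_of_mean_loss <- sapply(pool_sizes, function(n) {
  # average loss per policy in each simulated period
  mean_losses <- replicate(n_sim, mean(rexp(n, rate = 1 / 1000)))
  # coefficient of variation: standard deviation relative to the mean
  sd(mean_losses) / mean(mean_losses)
})

print(data.frame(pool_size = pool_sizes, cv = cv_of_mean_loss))
```

Under these assumptions the coefficient of variation should fall roughly like one over the square root of the pool size, which is the sense in which a larger pool is more ‘predictable’.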

Evolution of a logistic regression

January 28, 2013

In my last post I showed how one can easily summarize the outcome of a logistic regression. Here I want to show how this really depends on the data-points that are used to estimate the model. Taking a cue from the evolution of a correlation I have plotted the estimated Odds Ratios (ORs) depending on

analyze the survey of consumer finances (scf) with r

January 28, 2013

the survey of consumer finances (scf) tracks the wealth of american families.  every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on th...

The components garch model in the rugarch package

January 28, 2013

How to fit and use the components model. Previously: related posts are A practical introduction to garch modeling; Variability of garch estimates; garch estimation on impossibly long series; Variance targeting in garch estimation. The model: the components model (created by Engle and Lee) generally works better than the more common garch(1,1) model. Some hints about …

My template for controlling publication quality figures

January 28, 2013

The following is a template that I usually start with when producing figures for publication. It allows me to control: the overall size of the figure in inches (WIDTH, HEIGHT); the layout of figure subplots, using the layout() function (LO); and the resolution of the figure for a .png file (RESO). I define the overall dimensions of...
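
A template of this kind might look like the sketch below. The names WIDTH, HEIGHT, LO and RESO come from the teaser, but the values, the output file name, the margins and the example plots are assumptions made for illustration; the author's actual template is not shown in this excerpt.

```r
# Sketch of a figure template; all values are hypothetical.
WIDTH  <- 6     # overall figure width, in inches
HEIGHT <- 4     # overall figure height, in inches
RESO   <- 300   # resolution of the .png file, in pixels per inch
LO     <- matrix(1:2, nrow = 1, ncol = 2)  # two subplots side by side

out_file <- file.path(tempdir(), "figure_sketch.png")  # hypothetical output path

png(out_file, width = WIDTH, height = HEIGHT, units = "in", res = RESO)
layout(LO)                 # arrange the subplots as described by LO
par(mar = c(4, 4, 1, 1))   # margins in lines: bottom, left, top, right
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared")
hist(c(1, 2, 2, 3, 3, 3, 4, 4, 5), main = "", xlab = "value")
dev.off()                  # close the device so the file is written
```

Keeping the size, layout and resolution in named variables at the top means a figure can be resized for a different journal by editing three or four lines rather than hunting through the plotting code.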

I thought R was a letter…intro/installation

January 27, 2013

I will make a confession. This past summer, I didn’t spend my spare time watching relentlessly addictive TV shows or clubbing in San Francisco. Instead, I checked out figures. No, not the sort of figures you’re probably thinking about. The ones that are included in research papers and have the potential to be beautiful works of

European Fishing

January 27, 2013

I am playing around with Eurostat data and ggplot2 a bit more. As I progress, the plotting gets easier, the data pre-processing a bit simpler, and the surprises in the data remain. Eurostat data: the data used are fish_fleet (number of ships) and fish_pr (production = catch + aquaculture). After a bit of year selection, 1992 and later, I decided to...

A slightly different introduction to R, part II

January 27, 2013

In part I, we looked at importing data into R and simple ways to manipulate data frames. Once we’ve gotten our data safely into R, the first thing we want to do is probably to make some plots. Below, we’ll make some simple plots of the made-up comb gnome data. If you want to play

Regression tree using Gini’s index

January 27, 2013

In order to illustrate the construction of a regression tree (using the CART methodology), consider the following simulated dataset,

    set.seed(1)
    n=200
    X1=runif(n)
    X2=runif(n)
    P=.8*(X1<.3)*(X2<.5)+
      .2*(X1<.3)*(X2>.5)+
      .8*(X1>.3)*(X1<.85)*(X2<.3)+
      .2*(X1>.3)*(X1<.85)*(X2>.3)+
      .8*(X1>.85)*(X2<.7)+
      .2*(X1>.85)*(X2>.7)
    Y=rbinom(n,size=1,P)
    B=data.frame(Y,X1,X2)

with one dichotomous variable (the variable of interest, Y), and two continuous ones (the explanatory ones, X1 and X2).

    tail(B)
    ## Y...
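
To make the role of Gini's index concrete, here is a minimal sketch of how a single split on X1 could be scored. The helper names gini_impurity and split_impurity are hypothetical, and the impurity definition used (2p(1-p) for a binary outcome, weighted by group size) is the textbook one, which may differ in sign or scaling from the convention in the full post.

```r
# Gini impurity of a binary (0/1) vector: 1 - p^2 - (1 - p)^2 = 2 * p * (1 - p)
gini_impurity <- function(y) {
  p <- mean(y)
  2 * p * (1 - p)
}

# Size-weighted impurity of the two groups obtained by splitting x at threshold s
split_impurity <- function(y, x, s) {
  left  <- y[x <= s]
  right <- y[x > s]
  (length(left) * gini_impurity(left) +
     length(right) * gini_impurity(right)) / length(y)
}

# Simulated data generated as in the excerpt
set.seed(1)
n  <- 200
X1 <- runif(n)
X2 <- runif(n)
P  <- .8 * (X1 < .3) * (X2 < .5) + .2 * (X1 < .3) * (X2 > .5) +
  .8 * (X1 > .3) * (X1 < .85) * (X2 < .3) + .2 * (X1 > .3) * (X1 < .85) * (X2 > .3) +
  .8 * (X1 > .85) * (X2 < .7) + .2 * (X1 > .85) * (X2 > .7)
Y  <- rbinom(n, size = 1, P)

# Candidate thresholds: midpoints between consecutive sorted X1 values,
# so that both sides of every split are non-empty
x_sorted   <- sort(unique(X1))
candidates <- (head(x_sorted, -1) + tail(x_sorted, -1)) / 2
scores     <- sapply(candidates, function(s) split_impurity(Y, X1, s))
best_split <- candidates[which.min(scores)]
```

The threshold with the lowest weighted impurity is the first split a greedy tree would choose on X1; repeating the same search on X2 and then within each resulting group gives the recursive construction.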

Tracking Number of Historical Clusters

January 26, 2013

In the prior post, Optimal number of clusters, we looked at methods of selecting the number of clusters. Today, I want to continue with the clustering theme and show the historical Number of Clusters time series using these methods. In particular, I will look at the following methods of selecting the optimal number of clusters: Minimum number of clusters

ggplot2 multiple boxplots with metadata

January 26, 2013

Recently I was asked for advice on how to plot values with an additional attached condition separating the boxplots. This turns out to be ugly in base graphics, but amazingly simple in ggplot2.

Learning R using a Chemical Reaction Engineering Book: Part 3

January 26, 2013

In case you missed previous parts, the links to them are listed below: Part 1, Part 2. In this part, I tried to recreate the examples in section A.2.3 of the computational appendix in the reaction engineering book (by Rawlings and …

Learning R using a Chemical Reaction Engineering Book: Part 2

January 26, 2013

In case you missed part 1, you can view it here. In this part, I tried to recreate the examples in section A.2.2 of the computational appendix in the reaction engineering book by Rawlings and Ekerdt. Solving a nonlinear system of equations …

Waiting for an API request to complete

January 26, 2013

Dealing with API tokens in R. In my previous post I showed an example of calling the Phylotastic taxonomic name resolution API Taxosaurus here. When you query their API they give you a token which you use later to retrieve the result (see examples on their page above). However, you don't know when the query will be...
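
The general pattern of waiting on a token can be sketched as a polling loop. Everything here is hypothetical: wait_for_result and make_mock_fetch are illustrative names, and the mock stands in for a real request to the service, whose actual calls and response format are not shown in this excerpt.

```r
# Hypothetical helper: call `fetch` repeatedly until it returns a non-NULL
# result, pausing `delay` seconds between attempts, and give up after `max_tries`.
wait_for_result <- function(fetch, max_tries = 10, delay = 0.1) {
  for (i in seq_len(max_tries)) {
    result <- fetch()
    if (!is.null(result)) return(result)
    Sys.sleep(delay)
  }
  stop("result not ready after ", max_tries, " tries")
}

# Stand-in for a token lookup: returns NULL ("not ready") until it has been
# called `ready_after` times, then returns a small result list.
make_mock_fetch <- function(ready_after = 3) {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls < ready_after) NULL else list(status = "done", calls = calls)
  }
}

res <- wait_for_result(make_mock_fetch(3), max_tries = 5, delay = 0.01)
```

In real use, fetch would wrap the request that exchanges the token for the result and return NULL while the service still reports the job as pending. A fixed delay with a cap on attempts is the simplest choice; a growing delay would be gentler on the server for long-running queries.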

Code Pollution With Command Prompts

January 26, 2013

This is not the first time I have ranted about command prompts, but I cannot help ranting about them whenever I see them in source code. In short, a piece of source code with command prompts is like a bag of cooked shrimps in the market -- it does not make sense, and an otherwise good thing is...

Learning R using a Chemical Reaction Engineering Book: Part 1

January 25, 2013

Chemical Reactor Analysis and Design Fundamentals by J.B. Rawlings and J. G. Ekerdt is a textbook for studying Chemical Reaction Engineering. The popular open source package Octave has its origins in the reaction engineering course offered by Prof. Rawlings. This book …

Lambda.r 1.1.0 released

January 25, 2013

This is a quick post to announce that lambda.r version 1.1.0 is released and available on CRAN. This release has a …

A simple way to cluster music

January 25, 2013

In my last blog, I discussed the tuneR functions that provide an option to transcribe musical notes from audio frequencies. In this blog, I’ll write about functions for comparison of audio spectrum distributions, available in the seewave library. The idea I …

Slides and replay of my “Using R with Hadoop” webinar now available #rstats #hadoop

January 25, 2013

I owe a big “thank you” to all of you who attended my webinar yesterday “Using R with Hadoop”. Revolution Analytics partnered with us at Think Big Analytics to produce the webinar, and I owe them thanks as well. For those of you who missed it, the slides and replay are now available from Revolution

R and foreign characters

January 25, 2013

Working with Russian characters can be mind-numbingly frustrating. This is true for R, as for other applications, so below I've written out my top five tricks for making Russian inputs work in R; I believe they should be transferable to most other languages....

Shiny 0.3.0 released

January 25, 2013

Version 0.3.0 of Shiny is now available on CRAN. This version of Shiny has several new features and bug fixes. Some of the changes are under the hood: for example, Shiny now uses a more efficient algorithm for scheduling the execution of reactive functions. There are also some user-facing changes: for example, the new runGitHub()

Video: Using R with Hadoop

January 25, 2013

If you weren't one of the almost 2000 people who signed up for yesterday's webinar "Using R with Hadoop", the replay and slides are now available. During the webinar, Jeffrey Breen (Principal at Think Big Academy) talked about extracting analytics from data in Hadoop and covered: how to use R and Hadoop; Hadoop streaming; various R packages and RHadoop...

Regressions 101: “Significance”

January 25, 2013

SETUP (CAN BE SKIPPED) We start with data (how was it collected?) and the hope that

Effects of forest management on a woodland salamander in Missouri

January 25, 2013

Effects of experimental forest management on a terrestrial, woodland salamander in Missouri by Daniel J. Hocking, Grant M. Connette, Christopher A. Conner, Brett R. Scheffers, Shannon E. Pittman, William E. Peterman, and Raymond D. Semlitsch. This is the first post …

Object Orientation in R – Notes from a novice

January 25, 2013

Having posted some code to Git a few days ago and having been wholly dissatisfied with it, I began to do what I often do with code I don’t like. I started re-writing it bigger and weirder and more philosophically pure. Part of this search for Platonic code led me to explore object oriented programming

Tensor Algebra: Efficient Operations on Multidimensional Arrays with R

January 25, 2013

Multidimensional arrays are ubiquitous. Any complex problem with multivariate observables easily generates a need to represent the corresponding data in multidimensional arrays. Most practitioners would choose to apply operations on the...

Resolving species names when you have a lot of them

January 25, 2013

taxize use case: resolving species names when you have a lot of them. Species names can be a pain in the ass, especially if you are an ecologist. We ecologists aren't trained in taxonomy, yet we often end up with huge species lists. Of course we want to correct any spelling errors in the names, and get the newest...
