Monthly Archives: September 2013

Generate and Retrieve Many Objects with Sequential Names

September 8, 2013

While coding ensemble methods in data mining with R, e.g. bagging, we often need to generate many data and model objects with sequential names. Below is a quick example of how to use the assign() function to generate many prediction objects on the fly and then retrieve these predictions with mget() to do the model averaging.
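A minimal sketch of that pattern follows. It is not the code from the original post: the toy data frame, the linear model, and the object names `pred1` to `pred5` are all assumptions made for illustration.

```r
# Toy data (hypothetical): a noisy linear relationship.
set.seed(1)
df <- data.frame(x = rnorm(50))
df$y <- 2 * df$x + rnorm(50)

n_models <- 5

# Fit one model per bootstrap sample and store its predictions
# under a sequential name: pred1, pred2, ..., pred5.
for (i in seq_len(n_models)) {
  idx <- sample(nrow(df), replace = TRUE)
  fit <- lm(y ~ x, data = df[idx, ])
  assign(paste0("pred", i), predict(fit, newdata = df))
}

# Retrieve all prediction vectors by name as a named list.
pred_list <- mget(paste0("pred", seq_len(n_models)))

# Model averaging: element-wise mean across the predictions.
avg_pred <- Reduce(`+`, pred_list) / n_models
```

Storing the predictions in a list from the start would avoid assign() and mget() altogether; the sketch keeps them only because they are the functions the post describes.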

Read more »

Mixed models; Random Coefficients, part 1

September 8, 2013

Continuing with my exploration of mixed models, I am now at the first part of random coefficients: example 59.5 for proc mixed (page 5034 of the SAS/STAT 12.3 Manual). This means I skipped examples 59.3 (plotting the likelihood) and 59.4 (known G and R)...

Read more »

Rforecastio – Simple R Package To Access forecast.io Weather Data

September 8, 2013

It doesn’t get much better for me than when I can combine R and weather data in new ways. I’ve got something brewing with my Nest thermostat and needed to get some current wx readings plus forecast data. I could have chosen a number of different sources or APIs but I wanted to play with

Read more »

Maximum Likelihood Estimation and the Origin of Life

September 8, 2013

Maximum likelihood estimation (MLE) is a powerful tool in econometrics which allows for the consistent and asymptotically efficient estimation of parameters given a correct identification (in terms of distribution) of the random variable. It i...

Read more »

The Problem with Percentiles

September 8, 2013

Percentiles (or, more accurately, quantiles) are deeply embedded in the psyche of actuaries, statisticians and similar beasts. They are referred to implicitly in the Solvency 2 directive (Article 100, Value at Risk) without explanation. They are so ingrained...

Read more »

Visualizing optimization process

September 8, 2013

One of the approaches to graph drawing is the application of so-called force-directed algorithms. In its simplest form, the idea is to lay out the nodes on a plane so that all edges in the graph have approximately equal length. This problem has very intuitive ...

Read more »

Linear regression from a contingency table

September 7, 2013

This morning, Benoit sent me an email about an exercise on linear regression that he found in an econometrics textbook. Consider the following dataset. Here, variable X denotes the income, and Y the expenses. The goal was to fit a linear regression (actually, in the email, it was mentioned that we should try to fit a heteroscedastic model, but let...

Read more »

Vectors, Looping, and Performance

September 7, 2013

Vectors are at the heart of R and represent a true convenience. Moreover, vectors are essential for good performance, especially when you are working with lots of data. We’ll explore these concepts in this posting. As a motivational example, let’s generate a sequence of data from -3 to 3. We’ll also use each point as

Read more »

A bit of benchmarking with string distances

September 7, 2013

After my last post about the stringdist package, Zachary Mayer pointed out to me that the implementations of the Levenshtein and Jaro-Winkler distances in the RecordLinkage package are about two to three times faster. His benchmark compares randomly generated character strings …

Read more »