The R-Podcast Screencast 1: Basic Interaction with R

March 12, 2012
By

Here is the inaugural R-Podcast Screencast: Basic Interaction with R. This screencast contains audio from episode 3 of the R-Podcast. In this screencast I demonstrate how to create a vector of numerical data, calculate means, install and load packages, and get help for a function. You can find the R code demonstrated in this episode

Read more »
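As a rough illustration of the four tasks the screencast excerpt lists, here is a minimal sketch. It is not the code from the episode: the data values are made up, and ggplot2 is used only as an example package name.

```r
# Create a vector of numerical data (values are invented for illustration)
x <- c(2.5, 3.1, 4.7, 5.0, 6.2)

# Calculate the arithmetic mean
x_mean <- mean(x)

# With missing values, mean() returns NA unless na.rm = TRUE is set
y <- c(x, NA)
y_mean <- mean(y, na.rm = TRUE)

# Install and load a package (commented out: installation needs network access)
# install.packages("ggplot2")
# library(ggplot2)

# Get help for a function (commented out: opens the help viewer)
# ?mean
# help("mean")
```

The install and help calls are left commented out so the block can be pasted into a non-interactive session without side effects.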

XYZ geographic data interpolation, part 3

March 12, 2012
By

This will probably be a final posting on interpolation of xyz data, as I believe I have come to some conclusions on my original issues. I show three methods of xyz interpolation:
1. The quick and dirty method of interpolating projected xyz points (bi-linear)
2. Interpolation using Cartesian coordinates (bi-linear)
3. Interpolation using spherical coordinates and...

Read more »

useR! 2012 Abstract Submission Deadline Today!

March 12, 2012
By

useR! 2012 is just around the corner. The deadline for talk and poster abstract submissions is today! Submit your abstract here.

Read more »

The quality of variance matrix estimation

March 12, 2012
By

A bit of testing of the estimation of the variance matrix for S&P 500 stocks in 2011. Previously: there was a plot in “Realized efficient frontiers” showing the realized volatility in 2011 versus a prediction of volatility at the beginning of the year for a set of random portfolios. A reader commented to me privately … Continue reading...

Read more »

Compiling government positions from the Manifesto Project data with R

March 12, 2012
By

The Manifesto Project (formerly Manifesto Research Group, Comparative Manifestos Project) has assembled a database of ‘quantitative content analyses of parties’ election programs from more than 50 countries covering all free, democratic elections since 1945’, which is freely accessible online. The … Continue reading →

Read more »

Change in life expectancy animated with geo charts

March 12, 2012
By

The data of the World Bank is absolutely amazing. I had said this before, but their updated iPhone App gives me a reason to return to this topic. Version 3 of the DataFinder App allows you to visualise the data on your phone, including motion maps, see...

Read more »

Generating lag/lead variables

March 11, 2012
By

A few days ago, a friend asked me whether there is any function in R to generate lag/lead variables in a data.frame, or to do something similar to _n in Stata. He would like to use it to clean up his dataset in R. From the Stata help manual: _n contains the number of the current observation. Here’s an

Read more »
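The lag/lead idea in the excerpt above can be sketched in base R as follows. This is not the code from the linked post: `shift_by` is a hypothetical helper written for this illustration, and the data frame is invented.

```r
# Hypothetical helper: shift a vector by n positions, padding with NA.
# n > 0 produces a lag (earlier values), n < 0 produces a lead (later values).
shift_by <- function(x, n = 1L) {
  len <- length(x)
  if (n == 0L) return(x)
  if (abs(n) >= len) return(rep(NA, len))
  if (n > 0L) {
    c(rep(NA, n), x[seq_len(len - n)])
  } else {
    c(x[(1 - n):len], rep(NA, -n))
  }
}

# Invented example data
df <- data.frame(id = 1:5, value = c(10, 20, 30, 40, 50))

# Observation number, comparable in spirit to Stata's _n
df$obs <- seq_len(nrow(df))

# One-step lag and one-step lead of 'value'
df$value_lag1  <- shift_by(df$value, 1)
df$value_lead1 <- shift_by(df$value, -1)
```

Padding with NA keeps the new columns the same length as the data frame, so they can be assigned directly.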

The R-Podcast Episode 3: Basic Interaction with R

March 11, 2012
By

In this episode: New versions of R and ggplot2 available, listener feedback, and an interactive session with R. The R code discussed in this episode will be available in our GitHub repository, see the show notes for details. There will be a companion screencast to accompany this episode which will be posted shortly. As always,

Read more »

Hindi/Devanagari presentations using orgmode, R, latex and beamer

March 11, 2012
By

I recently had to prepare a beamer presentation in Hindi/Devanagari. I usually use emacs-orgmode with a lot of R source code embedded in it to prepare my beamer presentations. To adapt the entire setup to work with Devanagari, this is what I needed to do. Make orgmode export to latex using xetex rather than

Read more »

IS vs. self-normalised IS

March 11, 2012
By

I was grading my Master projects this morning and came upon a graph that compares the variability of an importance-sampling estimator versus its self-normalised alternative… This is an interesting case in that self-normalisation does considerably degrade the quality of the approximation in that setting. In other cases, self-normalisation may bring a clear improvement. (This reminded

Read more »
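To make the comparison in the excerpt above concrete, here is a minimal sketch of the two estimators. The setting is an assumption chosen purely for illustration and is not the one behind the graph: the target is N(0, 1), the proposal is N(0, 2^2), and the integrand is h(x) = x^2, whose true expectation under the target is 1.

```r
set.seed(1)

# Integrand (assumed for illustration); E[h(X)] = 1 under the N(0, 1) target
h <- function(x) x^2

# Return both estimates computed from the same proposal sample
is_estimates <- function(n) {
  x <- rnorm(n, mean = 0, sd = 2)             # draws from the proposal
  w <- dnorm(x) / dnorm(x, mean = 0, sd = 2)  # importance weights: target / proposal
  c(plain           = mean(w * h(x)),          # standard IS estimator
    self_normalised = sum(w * h(x)) / sum(w))  # weights renormalised to sum to one
}

# Repeat the experiment to compare the variability of the two estimators
reps <- t(replicate(200, is_estimates(1000)))
variability <- apply(reps, 2, sd)

# A single larger run; both estimates should be close to the true value of 1
est <- is_estimates(1e4)
```

Which estimator is less variable depends on the target, the proposal, and the integrand, so `variability` should be read as a property of this toy setting only.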

Plotting stuff on an image

March 11, 2012
By

Recently, I needed to figure out how many extension cords I was going to need to buy in order to reach parts of my field site. Wandering around in the field with a surveyor's tape was an option, but so was plotting distances on an aerial image I had of...

Read more »

Interactive function for distances in plots

March 11, 2012
By

The following R function returns the distance between two points located on a plot. The distance returned is in the same units as those of the plot.
interDist aa dx dy sqrt(sum(c(dx^2, dy^2)))}

Read more »
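The function in the excerpt above did not survive extraction intact, so the following is a hedged reconstruction rather than the author's code. It assumes the two points come from `locator(2)`, which is the usual base-graphics way to click points on an open plot; the name `inter_dist` and the `pts` argument are introduced here so the function can also be called without a graphics device.

```r
# Euclidean distance between two points, in plot units.
# pts: a list with elements x and y, each of length 2.
# By default the points are picked interactively with locator(2) (an assumption).
inter_dist <- function(pts = locator(2)) {
  dx <- diff(pts$x)
  dy <- diff(pts$y)
  sqrt(sum(c(dx^2, dy^2)))
}

# Non-interactive use with explicit coordinates
d <- inter_dist(list(x = c(0, 3), y = c(0, 4)))
```

Calling `inter_dist()` with no arguments after `plot(...)` waits for two mouse clicks on the plot and returns the distance between them.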

A Julia version of the multinomial sampler

March 11, 2012
By

In the previous post on RcppEigen I described an example of sampling from a collection of multinomial distributions represented by a matrix of probabilities. In the timing example the matrix was 100000 by 5 with each of the 100000 rows summing...

Read more »
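The sampling task described in the excerpt above can be sketched in plain R as follows. This is neither the Julia nor the RcppEigen code from the posts: `sample_rows` is a hypothetical name, the sketch assumes each row of the matrix is a probability vector, and it uses the inverse-CDF method with one draw per row.

```r
# Draw one category index per row of a probability matrix (inverse-CDF method).
# prob: numeric matrix, assumed to hold one probability vector per row.
sample_rows <- function(prob) {
  u <- runif(nrow(prob))             # one uniform draw per row
  cum <- t(apply(prob, 1, cumsum))   # row-wise cumulative probabilities
  idx <- rowSums(u > cum) + 1        # first column whose cumulative sum reaches u
  pmin(idx, ncol(prob))              # guard against rounding in the last column
}

# Small invented example: 4 rows, 3 categories
set.seed(42)
prob <- rbind(c(0.2, 0.3, 0.5),
              c(0.6, 0.3, 0.1),
              c(1.0, 0.0, 0.0),
              c(0.0, 0.0, 1.0))
draws <- sample_rows(prob)
```

The comparison `u > cum` recycles `u` down the columns, so each row of `cum` is compared against its own uniform draw.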

An RcppEigen example

March 11, 2012
By

R is an Open Source project providing an interactive language and environment for statistical computing.  It has become the lingua franca for research in statistical methods.  Because R is an interpreted language it is comparat...

Read more »

The name

March 11, 2012
By

When I bought the first computer for use in our Statistics Department - a Vax 11/750 that cost about a quarter of a million dollars in 1983 - I was considered extravagant because I purchased and installed a second megabyte of memory for the machine.

Read more »

A Crash Course in git for Data Scientists

March 10, 2012
By

I really like git. It’s the first versioning tool I’ve ever used so I have nothing else to compare it to, but in the world of statistical model building where iteration is constant (and almost never a strict linear progression)...

Read more »

github with Multiple Accounts: An Analyst Perspective

March 10, 2012
By

After using github for data mining competitions and a project on statistical language models I found I enjoyed it so much I wanted to use it at work too. The trick is there’s a lot of overlap between what I...

Read more »

R Meets Java: An Absolute Beginners’ Introduction

March 10, 2012
By

My guess is R is most commonly integrated with C/C++ to handle heavy-duty computing (thanks in no small part to the productivity of Dirk Eddelbuettel!). That said, if you’re like most statisticians and physical scientists and aren’t already a programming...

Read more »

Get ROAuth to work on Windows 7

March 10, 2012
By

Jeff Gentry has created a couple of really fun and handy R packages for working with Twitter data called twitteR and ROAuth. He’s also written an easy to read vignette on how to get started. As of right now (March...

Read more »

Thoughts on SPSS and R Integration

March 10, 2012
By

As part of considering SPSS as a platform for modeling I wanted to test SPSS’ integration with R. What I found out is that getting SPSS to work with R isn’t embarrassingly obvious. What’s worse, I found it quite difficult to...

Read more »

Slides from today’s Big Data Step-by-Step Tutorials: Infrastructure series and Intro to R+Hadoop with RHadoop’s rmr

March 10, 2012
By

Slides from the Boston Predictive Analytics Big Data Workshop tutorials:
Big Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily... with Whirr
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)

Read more »

"Fear of floating exchange rate" or "fear of losing international reserves".

March 10, 2012
By

We were recently required to do an assignment for the International Finance course where we had to investigate the policy that the emerging economies adopt towards holding international reserves. A recent research paper at the NBER by Joshua Aizen...

Read more »

Detour in taste wordclouds

March 10, 2012
By

I read Mining Twitter for consumer attitudes towards hotels in my feed of R-bloggers. That reminded me that I intended to look at generating wordclouds for salt and MSG at some point. Salt, or sodium, is linked to hypertension, which is linked...

Read more »

German train monitor provides access to train delay data

March 10, 2012
By

The German newspaper Süddeutsche Zeitung (SZ) worked together with OpenDataCity to create an online train monitor of the German network: Zugmonitor. This is another great example of the new form of data journalism. The project provides access to data o...

Read more »

Recovering Marginal Effects and Standard Errors of Interactions Terms Pt. II: Implement and Visualize

March 9, 2012
By

In the last post I presented a function for recovering marginal effects of interaction terms. Here we implement the function with simulated data and plot the results using ggplot2.
#---Simulate Data and Fit a linear model with an...

Read more »

find | xargs … Like a Boss

March 9, 2012
By

*Edit March 12* Be sure to look at the comments, especially the commentary on Hacker News - you can supercharge the find|xargs idea by using find|parallel instead.
---
Do you ever discover a trick to do something better, faster, or easier, and wish you c...

Read more »

Two-minute tutorials for R beginners

March 9, 2012
By

R user Anthony Damico has created "Twotorials": a series of two-minute tutorials for newcomers to R. Topics include how to download and install R, how to do simple arithmetic in R, how to work with data tables in R, and many others. The tutorials are especially useful for users of R on Windows, with video demonstrations using the Windows...

Read more »