## Tutorial: Basic data processing with R

December 2, 2013

R can do a lot of really amazing things, but to use just about any of R's many features you need to first import your data and get it into the appropriate shape. For R beginners, this "data wrangling" task can be daunting. Fortunately, ComputerWorld's Sharon Machlis has created an in-depth tutorial for many data preparation tasks, which is...

## Visualizing systems of linear equations and linear transformations

December 2, 2013

This is a lecture post for my students in the CUNY MS Data Analytics program. In this series of lectures …

December 2, 2013

I love to have fun with R and Twitter. There are a lot of cool things that you can do with it, so I just thought of building a small Twitter Battle application: something that will grab the number of followers and lists from two users, apply some cra...

## Twitter unfollowers with R and Rook – Revisited

December 2, 2013

Some time ago I wrote a post called Twitter unfollowers with R and Rook, where I used R and Twitter to get a list of the people that we follow but who don't follow us back. Right now, that post is obsolete as Twitter changed its API to API 1...

December 2, 2013

Very shortly, I'll upload the newest release of BCEA, my R package to post-process the output of a (Bayesian) health economic model and produce systematic summaries (such as graphs and tables) for a full economic evaluation and probabilistic sensitivit...

## Basic overview of the rmongodb package for R

December 2, 2013

I have been playing around with MongoDB quite a bit over the last few months.  Because I am much better at coding in R, I decided to write up my notes on how to use the rmongodb package. This is not a comprehensive tutorial by any stretch, but I wanted to share my notes as

## Shaping up Laplace Approximation using Importance Sampling

December 2, 2013

In the last post I showed how to use Laplace approximation to quickly (but dirtily) approximate the posterior distribution of a Bayesian model coded in R. This is just a short follow up where I show how to use importance sampling as an easy method to shape up the Laplace approximation in order to approximate the true...

## Speeding up model bootstrapping in GNU R

December 2, 2013

After my last post I have repeatedly received two questions: (a) is it worthwhile to analyze GNU R speed in simulations and (b) how would simulation speed compare between GNU R and Python. In this post I want to address the former question and next ti...

## Upcoming courses: Dec 2013

December 2, 2013

We’re pleased to announce two upcoming in-person training opportunities: Advanced R programming. SF, Dec 16-17. Learn the most important topics from advanced R programming in person. On day one, you’ll learn about metaprogramming, functional programming and object oriented programming in R, as well as general best practices for programming. Taught by Hadley Wickham, RStudio’s Chief Scientist.

## Probabilities and P-Values

December 2, 2013

P-values seem to be the bane of a statistician’s existence. I’ve seen situations where entire narratives are written without p-values and provide only the effects. P-values can also be used as a data reduction tool, but ultimately they reduce the world into a binary system: yes/no, accept/reject. Not only that but the binary threshold is

## Converting C code to C++ code: An example from plyr

December 2, 2013

The plyr package uses a couple of small C functions to optimise a number of particularly bad bottlenecks. Recently, two functions were converted to C++. This was mostly stimulated by a segmentation fault caused by some large inputs to the split_indices() function: rather than figuring out exactly what was going wrong with the complicated C code, it was...

## Evaluating Quandl Data Quality – part II

December 2, 2013

This post is a more in-depth analysis of Quandl futures data vs. Bloomberg data. Since my last post Quandl has updated its futures database to 200+ contracts from 68 contracts originally. For practical reasons, I limit myself here to the initial list of 60+ contracts. I’m still comparing the “Front Month” contract between the

## Recent Rcpp talks at U Chicago / Booth and U Kansas

December 1, 2013

In early October, I had an opportunity to talk about Rcpp and RcppArmadillo at the Statistical Computing Seminar at the Booth School of Business at the University of Chicago. And then two weeks ago, I had an invitation to talk at the Center for Re...

## 24 Days of R: Day 1

December 1, 2013

Last year, the good people at is.R() spent December publishing an R advent calendar. This meant that for 24 days, every day, there was an interesting post featuring analysis and some excellent visualizations in R. I think it's an interesting (if very challenging) exercise and I'm going to try to do it myself this year.

## R: Explore ARIMA(2, 2, 2) subclass family on Shiny

December 1, 2013

I've been thinking that it might be better to explore the Box-Jenkins ARIMA (Autoregressive Integrated Moving-Average) three-stage iterative modelling on Shiny. So here is what I got: this app is intended for the ARIMA(2, 2, 2) subclass family only. The app has s...

## Read line by line of a file in R

December 1, 2013

Are you using R for data manipulation for later use with other programs, i.e., a workflow something like this: read data sets from a disk, modify the data, and write it back to a disk? All fine, but if the data set is really big, then you will soon stumble on ...

## Comment on Comments in R

December 1, 2013

When you are busy with a lengthy project, like writing a paper, you create many objects along the way. Every time you log into the project, you need to remember what is what. In the past, each new working session …

## JAGS model Fe concentration in rainwater including values below detection level

December 1, 2013

In my previous post I ignored the fact that some data are below the detection level. I would not know how to handle those in a mixed model from lme4 or nlme. However, JAGS can handle these values. Next to that I kept the usual independent variables, su...

## More Explorations with catR

December 1, 2013

For the purposes of simulating computerized adaptive tests, the R package catR is unparalleled. catR is an excellent tool for students who are curious about how a computerized adaptive test might work. It is also useful for testing companie...

## Analyzing the DVI Indicator

November 30, 2013

The DVI indicator is a well-known indicator, created by David Varadi from CSS Analytics. It was introduced in 2009 as a good predictor for the S&P 500 over the past 30 years. Its performance on the S&P 500 has been studied in the blogosphere comprehensively. None of these studies, however, contained everything I was looking

## R Syntax for Ranked Choice Voting

November 30, 2013

I have gotten several requests for the R syntax I used to analyze the ranked-choice voting data and create the animated GIF. Rather than just posting the syntax, I thought I might write a detailed post describing the process. Reading …

## The attendants of useR! 2013 around the world

November 30, 2013

Alex and I had a great time in Albacete this summer, where the annual useR! conference took place. Of course we were really interested in the exciting news on R development, new packages and other related topics that we hoped to hear about there, and we also wanted to present what we have created with our R packages...

## RcppCNPy 0.2.2

November 30, 2013

Right on the heels of release 0.2.1 of RcppCNPy, a new version 0.2.2 is now on CRAN. RcppCNPy uses the CNPY library by Carl Rogers to provide R with easy read and write access to NumPy files. The reason for the new version is that I had experimented ...

## Le Monde puzzle [#842]

November 29, 2013

An easily phrased (and solved?) Le Monde mathematical puzzle that does not require an R code: The five triplets A,B,C,D,E are such that and Given that find the five triplets. Adding up both sets of equations shows everything solely depends upon E1… So running an R code that checks for all possible values of

November 29, 2013

A few months ago I did a mini project using open crime data and R to create crime visualisations. At that time, I was already thinking about a web app using Shiny but I couldn't justify the time to develop the app and then set up a server etc. Not unti...

## Unusual timing shows how random mass murder can be (or even less)

November 29, 2013

This post follows the original one on the headline of the USA Today I read during my flight to Toronto last month. I remind you that the unusual pattern was about observing four U.S. mass murders happening within four days, “for the first time in at least seven years”. Which means that the difference between

## Weisberg Growth Model

November 28, 2013

A fishR user asked me if I had a Weisberg Linear Growth Model (LGM) vignette. In the past, the Weisberg LGM referred to the fixed-effects models described in Weisberg (1993) and implemented in software that was written in XLISP-STAT and distributed …