## Big Data ETL and Big Data Analysis

November 14, 2012
I was at Strata New York 2012 last month. Great conference! Thanks to O'Reilly Media for assembling the industry leaders and running it well. I understand it was too crowded for some of my out-of-town friends. Stepping out to the streets of mid-town Manhat...

## Textual Healing

November 14, 2012
While I know there are several awesome guides for how to make fine-tuning adjustments to plot labels/axes/ticks/text in ggplot, this is still the most common question I get from people new to ggplot: how do I change the size/font/color/position of (som...

## Expand delimited columns in R

November 14, 2012
A postdoctoral researcher asked me the other day to help him expand a vector of comma-delimited values so he could do computations with it in R. I wrote an R function to solve the problem. Here is the before...
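The post's own function is not shown in this excerpt. As a minimal sketch of one way to do this kind of expansion in base R, assuming a hypothetical character vector `x` with one comma-delimited string per element, `strsplit()` can split each element and the pieces can be padded with `NA` into a numeric matrix:

```r
# Hypothetical input (not from the original post): one comma-delimited string per element
x <- c("1,2,3", "4,5", "6")

# Split every element on literal commas; the result is a list of character vectors
parts <- strsplit(x, ",", fixed = TRUE)

# Convert the pieces to numbers
values <- lapply(parts, as.numeric)

# Pad shorter rows with NA so everything fits in one rectangular matrix
n <- max(lengths(values))
padded <- lapply(values, function(v) c(v, rep(NA_real_, n - length(v))))
expanded <- do.call(rbind, padded)

# Row-wise computations are now straightforward
row_totals <- rowSums(expanded, na.rm = TRUE)
```

Padding with `NA` is only one possible design choice; keeping `values` as a list avoids the padding when rows have very different lengths.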

## Timeline Maps with googleVis & Twitter Bootstrap Carousel (& updated Slidify)

November 14, 2012
I've wanted to create timeline maps with interactive googleVis Geomaps for a while. These would be a nice way to quickly show the spatial distribution of some data over time. It turns out that it's pretty easy to do with a plugin for Twitter Bootstra...

## Building a Simple Web App using R

November 13, 2012
I’ve been interested in building a web app using R for a while, but never put any time into it until I was informed of the Shiny package. It looked too easy, so I absolutely had to try it out. First you need to install the package from the command line: `options(repos=c(RStudio="http://rstudio.org/_packages", getOption("repos")))` and then `install.packages("shiny")`.

## Influential Data in Multilevel Regression: What are your strategies?

November 13, 2012
The application of multilevel regression models has become common practice in the field of social sciences. Multilevel regression models take into account that observations on individual respondents are nested within higher-level groups such as schools, classrooms, states, and countries. In ...

## On Box-Cox transform in regression models

November 13, 2012
$Y_i=\beta_0+\beta_1 X_i+\varepsilon_i$

A few days ago, a former student of mine, David, contacted me about Box-Cox tests in linear models. It made me look more carefully at the test, and I do not understand what is computed, to be honest. Let us start with something simple, like a linea...

## Trees with the rpart package

November 13, 2012
What are trees? Trees (also called decision trees, recursive partitioning) are a simple yet powerful tool in predictive statistics. The idea is to split the covariable space into many partitions and to fit a constant model of the response variable in each partition. In case of regression, the mean...
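To make the "constant model in each partition" idea concrete, here is a small sketch using `rpart` on the built-in `mtcars` data. The data set and the formula are illustrative assumptions, not taken from the original post. It checks that each prediction of a regression tree is the mean of the response among the training observations in the same leaf:

```r
library(rpart)  # recommended package shipped with standard R distributions

# Regression tree on the built-in mtcars data (an illustrative choice, not from the post)
fit <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova")

# fit$where gives, for each observation, the row of fit$frame that holds its leaf
leaf <- fit$where

# Mean of the response within each leaf
leaf_means <- tapply(mtcars$mpg, leaf, mean)

# The tree's prediction for each training observation
pred <- predict(fit, newdata = mtcars)

# Compare each prediction with the mean of its own leaf
same <- all.equal(as.numeric(pred), as.numeric(leaf_means[as.character(leaf)]))
```

If `same` is `TRUE`, the fitted tree is a piecewise-constant function whose value in each partition is the leaf mean.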

## Presentation at the RECENS Group

November 13, 2012
I had a great opportunity to present my work on dynamic networks to the RECENS Group today at the Hungarian Academy of Sciences, Centre for Social Sciences. It was a great honor; many thanks to the organizers, Károly Takács and Judit Pál, for giving me the chance to get some feedback on my work. The...

## My first R GUI

November 13, 2012
This post is a huge jump from the last two - this is not for beginners!! But if you've ever considered building a GUI in R, looked at some of the online documentation, gotten scared, and decided not to, read this!!! Ok here goes. Dorian Auto GUI Setup: I built this for a school project. The basic problem setup is from...

## SAP CodeJam Montreal

November 13, 2012
Thanks to an initiative of Krista Elkin, Jonathan Druker and myself (with a lot of support from Craig Cmehil and Helena Losada), SAP CodeJam Montreal is going live on Thursday, December 13, 2012 from 3 to 9 pm in the SAP Labs Montreal offices. This is...

## Benchmarking bigglm

November 13, 2012
By Joseph Rickert. In a recent blog post, David Smith reported on a talk that Steve Yun and I gave at STRATA in NYC about building and benchmarking Poisson GLM models on various platforms. The results presented showed that the rxGlm function from Revolution Analytics’ RevoScaleR package running on a five-node cluster outperformed a MapReduce/Hadoop implementation...

## The R-Podcast Episode 11: Reproducible Analysis Part 1 (Introduction)

November 13, 2012
Season 2 of the R-Podcast is up and running! This episode begins a multi-part series on reproducible analysis using R. In this episode I discuss the usage of Sweave and LaTeX for producing reproducible reports, an introduction to the capabilities of the knitr package (more episodes will be coming dedicated to this package), and my

## Can’t a plot catch a break(s)?

November 13, 2012
This post continues with the theme of how to modify plots from within ggplot; today we will specifically be looking at custom axis breaks. Plots later in the week will examine the commands to change text in the plot area. The various other shortcomings o...

## analyze the consumer expenditure survey (ce) with r

November 13, 2012
the consumer expenditure survey (ce) is the primo data source to understand how americans spend money.  participating households keep a running diary about every little purchase over the year.  those diaries are then summed up into precise expenditure categories.  how else are you gonna know that the average american household spent \$34 (±2) on bacon, \$826 (±17) on cellular...

## Simulating neurons or how to solve delay differential equations in R

November 13, 2012
I discussed earlier how the action potential of a neuron can be modelled via the Hodgkin-Huxley equations. Here I will present a simple model that describes how action potentials can be generated and propagated across neurons. The tricky bit here is that I use delay differential equations (DDE) to take into account the propagation time of the signal...

November 12, 2012
I like word clouds because they are visually appealing and provide a ton of information in a small space. Ever since I saw Drew Conway’s post (LINK) I have been looking for ways to improve word clouds. One of the …

## Comparing Shiny with gWidgetsWWW2.rapache

November 12, 2012
(A guest post by John Verzani) A few days back the RStudio blog announced Shiny, a new product for easily creating interactive web applications (http://www.rstudio.com/shiny/). I wanted to compare this new framework to one I’ve worked on, gWidgetsWWW2.rapache – a version of …

## RStudio releases Shiny

November 12, 2012
RStudio has released a new package for R. Shiny allows R developers to build simple interactive Web-based interfaces for R scripts, using only R code (no JavaScript development required!). You can see some examples of Shiny in action in this blog post, and there are more details about Shiny's capabilities in this tutorial. Shiny was first announced to beta...

## Using R — Calling C code with Rcpp

November 12, 2012
This entry is part 12 of 12 in the series Using R. In two previous posts we described how R can call C code with .C() and the more complex yet more robust option of calling C code with .Call(). Here …

## You can’t play a broken link

November 12, 2012
Just like James Morrison and Nelly Furtado say, you really can't play a broken string. And quite similarly, you just can't use a broken link. I always find it very annoying when, while browsing a website, I find a broken link. You fall victim to the pro...

## Some Thoughts on Teaching R to 50,000 Students

November 12, 2012
Two weeks ago I finished teaching my course Computing for Data Analysis through Coursera. Since then I’ve had some time to think about how it went, what I learned, and what I’d do differently. First off, let me say that it was a lot of fun....

## Kappa – Lambda light chain ratio

November 12, 2012
Hi everybody, a few days back I was going through the development of immunoglobulin receptors on B cells. I came across a few facts: 1. The ratio between kappa (K) and lambda (L) light chains attached to the immunoglobulin (Ig) receptor is approxi...

## The guts of a statistical factor model

November 12, 2012
Specifics of statistical factor models and of a particular implementation of them. Previously: posts that are background for this one include "Three things factor models do", "Factor models of variance in finance", "The BurStFin R package" and "The quality of variance matrix estimation". The problem: someone asked me some questions about the statistical factor model in …

November 12, 2012
Portfolio Trading

In finance and investing the term portfolio refers to the collection of assets one owns. Compared to just holding a single asset at a time, a portfolio has a number of potential benefits. A universe of asset holdings within the …

## Introduction to R and Biostatistics (2012 version): presentation

November 12, 2012
To follow up on my Introducing R and Biostatistics to first year LCG students (2012 version) post, you can now find the presentation online on my site in presentation format, in a single webpage format, or as the raw Rmd file. To prove the point that publishing to RPubs is super easy, you can also find the single...

## PDQ 6.0 is On Its Way

November 12, 2012
PDQ (Pretty Damn Quick) version 6.0.β is in the QA pipeline. Although this is a major release, cosmetically, things won't look any different when it comes to writing PDQ models. All the big changes have taken place under the hood in order to make PDQ more consistent with the R statistical environment. R version 2.15.2...

## "Sample Sets" plots (Shootout-2012)

November 11, 2012
Histograms of all the sample sets together and individually; raw spectra; spectra treated with MSC (Multiplicative Scatter Correction); spectra treated with SG filters.

## How I cracked Troyis (the online flash game)

November 11, 2012
Troyis™ is an addictive online flash game where you move a chess knight through increasingly difficult puzzles. After hours and hours of playing, sometimes late into the night, I decided I'd waste even more of my time and write a little program tha...