## Learning R has really made me appreciate SAS

July 25, 2012
For the past 18 months, it seems like all I’ve heard about in the digital marketing industry is “big data”, and with that, mentions of using Hadoop and R to solve these sorts of problems. Why are these tools the … Continue reading → Learning R has really made me appreciate SAS is an article from randyzwitch.com,...

## Really Big Objects Coming to R

July 25, 2012
I noticed in the development version of R the following note in the NEWS file: There is a subtle change in behaviour for numeric index values 2^31 and larger.  These used never to be legitimate and so were treated as NA, sometimes with a warning.
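
As a small illustration of why 2^31 is the boundary in that note: R's integer type is 32-bit, so the largest integer is 2^31 - 1, and anything at or above 2^31 can only be held as a double. The sketch below checks only these base-R facts; it does not exercise the development-version indexing behaviour the NEWS entry describes.

```r
# Largest value representable by R's 32-bit integer type
int_max <- .Machine$integer.max
at_limit <- int_max == 2^31 - 1

# 2^31 itself is stored as a double; coercing it to integer gives NA
# (R also emits a warning, silenced here)
big <- 2^31
overflow <- suppressWarnings(as.integer(big))

print(c(at_limit = at_limit, is_double = is.double(big), is_na = is.na(overflow)))
```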

## Measuring persistence in a time series : Application of rolling window regression

During my final semester at IGIDR I did a project paper in macroeconomics involving time series econometrics. The concept I focused on in my study was the unit root, which I have touched upon in my earlier posts. This study presents a novel...
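
The excerpt does not show the author's method, so the following is only a minimal sketch of one common way to do a rolling window regression: estimate the AR(1) slope of a series inside a moving window and track how it changes. The simulated series, the window length of 100, and the variable names are all assumptions made for illustration.

```r
set.seed(1)
n <- 300
# Simulated AR(1) series with persistence 0.7 (illustrative data only)
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))

window <- 100
starts <- seq_len(n - window + 1)

# For each window, regress y_t on y_{t-1} and keep the slope,
# which serves as the persistence estimate for that window
rho <- sapply(starts, function(s) {
  w <- y[s:(s + window - 1)]
  fit <- lm(w[-1] ~ w[-length(w)])
  unname(coef(fit)[2])
})

summary(rho)
```

Plotting `rho` against `starts` shows how the estimated persistence drifts over the sample; a slope close to 1 in some windows would be the kind of pattern a unit root study looks for.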

## Displaying time series, spatial and space-time data with R

Over the next few months I will be working on the book “Displaying time series, spatial and space-time data with R: stories … Continue reading »

## Plotting 95% Confidence Bands in R

July 24, 2012
I am comparing estimates from subject-specific GLMMs and population-average GEE models as part of a publication I am working on. As part of this, I want to visualize predictions of each type of model including 95% confidence bands. First I … Continue reading →
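
The post itself works with GLMMs and GEE models, which are not shown in the excerpt. As a simpler stand-in, here is a sketch of the general idea using an ordinary logistic `glm`: predict on the link scale with standard errors, form the band there, and back-transform. The data are simulated and all names are hypothetical.

```r
set.seed(42)
# Simulated binary outcome (illustrative data only)
d <- data.frame(x = runif(200, -2, 2))
d$y <- rbinom(200, 1, plogis(0.5 + 1.2 * d$x))

fit <- glm(y ~ x, family = binomial, data = d)

# Predict on the link (logit) scale, where a normal approximation is more reasonable
grid <- data.frame(x = seq(-2, 2, length.out = 50))
p <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# Build the 95% band on the link scale, then map back to probabilities
z <- qnorm(0.975)
grid$fit   <- plogis(p$fit)
grid$lower <- plogis(p$fit - z * p$se.fit)
grid$upper <- plogis(p$fit + z * p$se.fit)

plot(grid$x, grid$fit, type = "l", ylim = c(0, 1),
     xlab = "x", ylab = "Predicted probability")
lines(grid$x, grid$lower, lty = 2)
lines(grid$x, grid$upper, lty = 2)
```

Building the band on the link scale keeps the limits inside [0, 1] after back-transformation, which a band computed directly on the probability scale would not guarantee.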

## RcppClassic 0.9.2

July 24, 2012
Similar to yesterday's post about RcppGSL, we have another pure maintenance release to announce, this time of RcppClassic. The package, which supports the deprecated classic Rcpp API defined in the 2005 to 2006 releases, is now on CRAN. Ther...

## Civic Data Challenge closes July 29

July 24, 2012
There are only a few days left to enter the Civic Data Challenge: entries are due before midnight EST on July 29 to qualify for the $100,000 in prizes. The competition, open to US residents, challenges participants to create applications and visualizations from civic health data. Prizes will be awarded by a panel of prestigious judges. Looks like a great opportunity...

## How to tell when error bars correspond to a significant p-value

July 24, 2012
Can you tell when error bars based on 95% CIs or standard errors correspond to a significant p-value? Don’t fret if you think it’s hard: a study from 2005 showed that researchers in psychology, behavioral neuroscience and medicine had a hard time judging when error bars from two independent groups signified a significant difference.
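
To make the question concrete, here is a small sketch that compares the two things directly for two independent groups: whether the per-group 95% confidence intervals overlap, and what a Welch t-test on the difference reports. The data are simulated and the group means are arbitrary assumptions.

```r
set.seed(7)
# Two simulated independent groups (illustrative data only)
a <- rnorm(30, mean = 0.0)
b <- rnorm(30, mean = 0.6)

# 95% confidence interval for each group mean
ci <- function(x) t.test(x)$conf.int
ci_a <- ci(a)
ci_b <- ci(b)

# Do the two intervals overlap?
overlap <- ci_a[1] <= ci_b[2] && ci_b[1] <= ci_a[2]

# Welch two-sample t-test on the difference in means
p_value <- t.test(a, b)$p.value

print(list(ci_a = ci_a, ci_b = ci_b, overlap = overlap, p_value = p_value))
```

Non-overlapping 95% CIs imply a significant difference at the 5% level, but the reverse does not hold: the intervals can overlap somewhat while the t-test is still significant, which is part of what makes eyeballing error bars hard.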

## get UCSC images for a list of regions in batch

July 24, 2012
Here is my working R code for the task. It can be simplified to 3 lines. # example of controlling individual track #theURL="http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm9&wgRna=hide&cpgIslandExt=pack&ensGene=hide&mrna=hide&intronEst=hi...

## The Failure of Asset Allocation – Bonds Are An Imperfect Hedge

July 24, 2012
US investors were spoiled by US Treasuries which acted as a near perfect hedge to stocks during the 2008-2009 crisis. However, in a real crisis, bonds rarely offer any comfort, and asset allocation fails (see post Death Spiral of a Country and IMF ...

## What’s wrong with LOESS for palaeo data?

July 24, 2012
Locally weighted scatterplot smoothing (LOWESS) or local regression (LOESS) is widely used to highlight “signal” in variables from stratigraphic sequences. It is a user-friendly way of fitting a local model that derives its form from the data themselves rather than having … Continue reading →
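
For readers who have not used it, a minimal sketch of fitting LOESS in base R follows. The "stratigraphic" data are simulated and the two span values are arbitrary assumptions; the point is only that the chosen span changes how much of the variation is treated as signal.

```r
set.seed(3)
# Hypothetical core: 120 samples at irregular depths with a noisy proxy
depth <- sort(runif(120, 0, 100))
proxy <- sin(depth / 15) + rnorm(120, sd = 0.4)

# Same data, two spans: a small span follows local wiggles,
# a large span gives a much smoother curve
fit_narrow <- loess(proxy ~ depth, span = 0.2)
fit_wide   <- loess(proxy ~ depth, span = 0.75)

smooth_narrow <- predict(fit_narrow)
smooth_wide   <- predict(fit_wide)

# Roughness of each fitted curve, measured by the spread of second differences
roughness <- c(narrow = sd(diff(smooth_narrow, differences = 2)),
               wide   = sd(diff(smooth_wide, differences = 2)))
print(roughness)
```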

## Williams designs with 5 products

July 24, 2012
In a previous post I created small Williams designs for an even number of products. This worked very well, also because the number of permutations could be restricted significantly due to symmetry. Unfortunately this does not work so well with an odd n...

## renaming data frame columns in lists

July 24, 2012
Renaming the columns of data frames which are stored in lists of lists. OK, so the scenario is as follows: we have a...
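
The excerpt stops before the author's solution, so the following is only one possible sketch of the scenario: a list of lists whose leaves are data frames, all of which should get the same new column names. The structure, the names `site1`/`y2011`, and the new column names are hypothetical.

```r
# Hypothetical nested structure: a list of lists of data frames
make_df <- function() data.frame(a = 1:2, b = 3:4)
nested <- list(
  site1 = list(y2011 = make_df(), y2012 = make_df()),
  site2 = list(y2011 = make_df(), y2012 = make_df())
)

new_names <- c("id", "value")

# Outer lapply walks the top-level list, inner lapply walks each sub-list;
# setNames() returns a copy of the data frame with the new column names
renamed <- lapply(nested, function(inner) lapply(inner, setNames, new_names))

str(renamed, max.level = 2)
```

Because `lapply` preserves list names, the `site`/`year` labels survive the renaming and only the column names of the data frames change.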

## RcppGSL 0.2.0

July 23, 2012
Earlier today, a minor update / maintenance release of RcppGSL---our interface package between R and the GNU GSL using our Rcpp package for seamless R and C++ integration---arrived on CRAN. It contains a number of minor changes to accommodate chan...

## Faster R in Hadoop: rmr 1.3 now available

July 23, 2012
The RHadoop project continues the Big Data integration of R and Hadoop, with a new update to its rmr package. Version 1.3 of rmr improves the performance of map-reduce jobs for Hadoop written in R. New features include: An optional vectorized API for efficient R programming when dealing with small records. Fast C implementations for serialization and deserialization from...

## Deploy Rook Apps with rApache: Part I

July 23, 2012
Since rApache 1.1.15 you’ve been able to deploy your Rook applications like so: # Run the Rook application named 'app'. On each request, the expression # 'Rook::Server$call(app)' is evaluated in an environment populated by # rookapp.R. 'app' is expected to be found in that environment. <Location /test/RookApp> SetHandler r-handler ...

## Success does not require understanding

July 23, 2012
I took part in the second Data Science London Hackathon last weekend (also my second hackathon) and it was a very different experience compared to the first hackathon. Once again Carlos and his team really looked after us. The data was released 24 hours before the competition started and even though I had spent less

## How to write a rapport template

July 23, 2012
This post introduces users to writing a rapport template, so that they can produce results similar to those shown on rapport's homepage or in our forthcoming reporting web application. The post was written from the point of view of a Windows user; if problems come up because you use...

## Estimating required hospital bed capacity

July 23, 2012
Estimating required hospital bed capacity requires a thorough analysis. There are a lot of ways of approaching a capacity requirement problem, but I think we can agree that a simple spreadsheet analysis just won't cut it. The approach described in this post makes use of discrete-event simulation and, just to  Read...
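
The post's own simulation is not shown in the excerpt. As a rough sketch of what a discrete-event approach can look like in base R, the code below generates patient arrivals and departures, processes them in time order, and reads a high quantile off the resulting occupancy. The arrival rate, mean length of stay, horizon, and warm-up period are all made-up assumptions.

```r
set.seed(11)
horizon  <- 365   # days simulated (assumed)
rate     <- 5     # mean arrivals per day (assumed)
mean_los <- 4     # mean length of stay in days (assumed)

# Arrival times from a Poisson process; each patient gets an exponential stay
n_arr  <- rpois(1, rate * horizon)
arrive <- sort(runif(n_arr, 0, horizon))
depart <- arrive + rexp(n_arr, rate = 1 / mean_los)

# Event list: +1 bed at each arrival, -1 at each departure, in time order
events <- data.frame(time   = c(arrive, depart),
                     change = rep(c(1, -1), each = n_arr))
events <- events[order(events$time), ]
occupancy <- cumsum(events$change)

# Ignore a 30-day warm-up and anything after the horizon.
# Note: this quantile is taken over events, not weighted by time,
# so it is only a rough approximation of the time spent at each level.
in_window <- events$time > 30 & events$time <= horizon
beds_95 <- quantile(occupancy[in_window], 0.95)
print(beds_95)
```

This version assumes unlimited beds and asks how many would have been enough most of the time; a fuller model would add a capacity limit and track patients who are turned away or queued.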

## Music Data Hackathon 2012 – Beginner’s view

July 23, 2012
When I first heard of the existence of Hackathons (receive a data set, predict the response as well as possible, win money, all within 24 hours), I had two thoughts: 1. Wow, that sounds great. Like a huge game for intelligent people. 2. My skills are no...

## Modeling Trick: Impact Coding of Categorical Variables with Many Levels

July 23, 2012
One of the shortcomings of regression (both linear and logistic) is that it doesn’t handle categorical variables with a very large number of possible values (for example, postal codes). You can get around this, of course, by going to another modeling technique, such as Naive Bayes; however, you lose some of the advantages of regression...
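
The excerpt ends before the technique is described, so the following is a sketch of one simple reading of "impact coding" for a numeric outcome: replace each level of the categorical variable with the difference between that level's mean outcome and the grand mean, then use that single numeric column in the regression. The data, the 50 hypothetical `zip` levels, and all names are assumptions.

```r
set.seed(5)
levels_zip <- sprintf("z%03d", 1:50)

# Simulated data: a 50-level categorical variable with a per-level effect
d <- data.frame(zip = sample(levels_zip, 1000, replace = TRUE),
                stringsAsFactors = FALSE)
level_effect <- setNames(rnorm(50), levels_zip)
d$y <- as.numeric(level_effect[d$zip]) + rnorm(1000)

# Impact code: per-level mean outcome minus the grand mean
grand_mean <- mean(d$y)
impact <- tapply(d$y, d$zip, mean) - grand_mean

# Look up each row's level to get one numeric column
d$zip_impact <- as.numeric(impact[d$zip])

# The regression now needs one coefficient instead of 49 dummy variables
fit <- lm(y ~ zip_impact, data = d)
print(coef(fit))
```

In this toy version the codes are computed on the same rows used to fit the model, which flatters the fit; computing them on a separate portion of the data is a sensible precaution with real data.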

## Computing the degree of dependency (jointness) among explanatory variables using BMS

July 23, 2012
Capturing the dependence between explanatory variables in the posterior distribution while implementing a Bayesian analysis is crucial. Taking such a dependence into account reveals the sensitivity of posterior distributions of parameters to depen...

## Third year wrap-up

July 23, 2012
July marks the end of three years of blogging for us. By our count, we've posted 121 examples across the first three years. We aim to be helpful and interesting. As always, it's hard to get a sense of our readership. At the time we wrote this, Feedbur...

## XLConnect 0.2-0

July 23, 2012
Mirai Solutions GmbH (http://www.mirai-solutions.com) is very pleased to announce the release of XLConnect 0.2-0, which can be found at CRAN. As one of the updates, XLConnect has moved to the newest release of Apache POI: 3.8. Also, the lazy evaluation … Continue reading →

## R Optimization Test

I have tested several R optimization functions before: nlm, optim (Nelder-Mead), optim (BFGS), optim (SANN), nlminb, and optim (L-BFGS-B), for an eight-parameter Vasicek interest rate model. Overall I find that for my setting, nlminb is the best and all R func...
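
The Vasicek likelihood from the post is not available in the excerpt, so this sketch runs several of the same base-R optimizers on a standard stand-in, the two-parameter Rosenbrock function, whose minimum is at c(1, 1). It illustrates how to call them side by side and is not a reproduction of the author's comparison; SANN is left out because it is stochastic and slow on this problem.

```r
# Rosenbrock function: a standard test problem with its minimum at c(1, 1)
rosenbrock <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
start <- c(-1.2, 1)

# Each optimizer is called with numerical derivatives and default settings
res <- list(
  nlm         = nlm(rosenbrock, start)$estimate,
  nelder_mead = optim(start, rosenbrock, method = "Nelder-Mead")$par,
  bfgs        = optim(start, rosenbrock, method = "BFGS")$par,
  l_bfgs_b    = optim(start, rosenbrock, method = "L-BFGS-B")$par,
  nlminb      = nlminb(start, rosenbrock)$par
)

# One row per optimizer, one column per parameter
estimates <- do.call(rbind, res)
print(estimates)
```

Comparing the rows against c(1, 1) gives a quick sense of how close each method gets with default settings; a fair benchmark would also record function evaluations and run time.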

## Two Free Sets of Methods Lectures

July 23, 2012
I provide links to two (free, publicly available) graduate level political methodology classes by Justin Esarey (Rice University) and Gary King (Harvard University). Both classes focus on statistical theory and modeling in R.

## A comparison of some heuristic optimization methods

July 23, 2012
A simple portfolio optimization problem is used to look at several R functions that use randomness in various ways to do optimization. Orientation: Some optimization problems are really hard. In these cases sometimes the best approach is to use randomness to get an approximate answer. Once you decide to go down this route, you need … Continue reading...
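
To give a flavour of what "using randomness to get an approximate answer" can mean, here is a sketch of a very basic stochastic hill climb for a minimum-variance, long-only portfolio. It is not one of the R functions compared in the post, and the four-asset covariance matrix is made up for illustration.

```r
set.seed(2012)
# Hypothetical covariance matrix for four assets (made-up numbers)
vol   <- c(0.10, 0.15, 0.20, 0.25)
corr  <- matrix(0.3, 4, 4)
diag(corr) <- 1
sigma <- diag(vol) %*% corr %*% diag(vol)

# Objective: portfolio variance for weight vector w
port_var <- function(w) as.numeric(t(w) %*% sigma %*% w)

# Start from equal weights
best_w <- rep(1 / 4, 4)
best_v <- port_var(best_w)

for (i in 1:5000) {
  # Propose a random perturbation, clip at zero (long-only),
  # and rescale so the weights sum to 1
  cand <- pmax(best_w + rnorm(4, sd = 0.05), 0)
  if (sum(cand) == 0) next
  cand <- cand / sum(cand)

  # Keep the candidate only if it lowers the variance
  v <- port_var(cand)
  if (v < best_v) {
    best_w <- cand
    best_v <- v
  }
}

print(round(best_w, 3))
print(best_v)
```

Accepting only improvements keeps the code short but can stall in a local optimum on harder problems, which is the weakness that methods such as simulated annealing or genetic algorithms try to address.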

