## Learning R has really made me appreciate SAS

July 25, 2012

For the past 18 months, it seems like all I’ve heard about in the digital marketing industry is “big data”, and with that, mentions of using Hadoop and R to solve these sorts of problems. Why are these tools the … (“Learning R has really made me appreciate SAS” is an article from randyzwitch.com, ...)

## Really Big Objects Coming to R

July 25, 2012

I noticed the following note in the NEWS file of the development version of R: “There is a subtle change in behaviour for numeric index values 2^31 and larger. These used never to be legitimate and so were treated as NA, sometimes with a warning.”
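
To make the note concrete, here is a minimal sketch in R. It is not taken from the post or from the NEWS file: the vector `x` and all variable names are made up for illustration, and whether a warning appears depends on the R version in use.

```r
# Largest value an R integer can hold: 2^31 - 1
int_max <- .Machine$integer.max

# 2^31 is stored as a double, so it can exceed the integer range
big_index <- 2^31
exceeds_int <- big_index > int_max

# Hypothetical short vector: subsetting far past its end yields NA
x <- c(10, 20, 30)
out_of_range <- x[big_index]
```

The point of the sketch is only that an index of 2^31 cannot be represented as an R integer, which is why such values needed special treatment before long vectors were supported.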

## Measuring persistence in a time series: Application of rolling window regression

During my final semester at IGIDR I did a project paper in macroeconomics involving time series econometrics. The concept I focused on in my study was the unit root, which I have touched upon in my earlier posts. This study presents a novel...

## Displaying time series, spatial and space-time data with R

During the next months I will be working on the book “Displaying time series, spatial and space-time data with R: stories …

## Plotting 95% Confidence Bands in R

July 24, 2012

I am comparing estimates from subject-specific GLMMs and population-average GEE models as part of a publication I am working on. As part of this, I want to visualize predictions of each type of model including 95% confidence bands. First I …
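
As a rough illustration of the general idea, the sketch below draws a 95% confidence band around a fitted line. It deliberately swaps in an ordinary linear model (`lm`) on R's built-in `cars` data instead of the GLMM and GEE models the post is about, so it shows the plotting pattern only and is not the author's code.

```r
# Stand-in model: ordinary linear regression on the built-in 'cars' data
fit <- lm(dist ~ speed, data = cars)

# Grid of predictor values at which to evaluate the band
new_data <- data.frame(
  speed = seq(min(cars$speed), max(cars$speed), length.out = 50)
)

# Columns: fit (prediction), lwr and upr (95% confidence limits for the mean)
band <- predict(fit, newdata = new_data, interval = "confidence", level = 0.95)

# Scatterplot with the fitted line (solid) and the band limits (dashed)
plot(dist ~ speed, data = cars, pch = 16, col = "grey40")
lines(new_data$speed, band[, "fit"], lwd = 2)
lines(new_data$speed, band[, "lwr"], lty = 2)
lines(new_data$speed, band[, "upr"], lty = 2)
```

For mixed or marginal models the limits would have to come from the model-specific prediction machinery, but the drawing step would likely look much the same.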

## RcppClassic 0.9.2

July 24, 2012

Similar to yesterday's post about RcppGSL, we have another pure maintenance release to announce, this time of RcppClassic. The package, which supports the deprecated classic Rcpp API defined in the earlier 2005 to 2006 releases, is now on CRAN. Ther...

## Civic Data Challenge closes July 29

July 24, 2012

There are only a few days left to enter the Civic Data Challenge: entries are due before midnight EST on July 29 to qualify for the \$100,000 in prizes. The competition, open to US residents, challenges participants to create applications and visualizations from civic health data. Prizes will be awarded by a panel of prestigious judges. Looks like a great opportunity...

## How to tell when error bars correspond to a significant p-value

July 24, 2012

Can you tell when error bars based on 95% CIs or standard errors correspond to a significant p-value? Don’t fret if you think it’s hard: a study from 2005 showed that researchers in psychology, behavioral neuroscience and medicine had a hard time judging when error bars from two independent groups signified a significant difference.
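
A small sketch in R shows why the judgment is tricky. The data are made up for illustration (two groups of 30 with the same spread, one shifted by 0.64) and are not from the post or the 2005 study. With these numbers the two 95% confidence intervals overlap, yet a two-sample t-test still gives a p-value below 0.05.

```r
# Made-up data: two independent groups of 30 with identical spread (sd = 1)
g1 <- as.numeric(scale(1:30))   # mean 0, sd 1
g2 <- g1 + 0.64                 # same shape, shifted upward

# 95% confidence interval for each group mean
ci1 <- t.test(g1)$conf.int
ci2 <- t.test(g2)$conf.int

# Do the two intervals overlap?
cis_overlap <- ci1[2] > ci2[1]

# Welch two-sample t-test for the difference in means
p_value <- t.test(g1, g2)$p.value
```

So overlapping 95% intervals for two independent groups do not by themselves imply a non-significant difference.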

## get UCSC images for a list of regions in batch

July 24, 2012

Here is my working R code for the task. It can be simplified to 3 lines. `# example of controlling individual track` `#theURL="http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm9&wgRna=hide&cpgIslandExt=pack&ensGene=hide&mrna=hide&intronEst=hi...`

## The Failure of Asset Allocation – Bonds Are An Imperfect Hedge

July 24, 2012

US investors were spoiled by US Treasuries, which acted as a near-perfect hedge to stocks during the 2008-2009 crisis. However, in a real crisis, bonds rarely offer any comfort, and asset allocation fails (see post Death Spiral of a Country and IMF ...

## Williams designs with 5 products

July 24, 2012

In a previous post I created small Williams designs for an even number of products. This worked very well, partly because the number of permutations could be restricted significantly due to symmetry. Unfortunately this does not work so well with an odd n...

## renaming data frame columns in lists

July 24, 2012

Renaming the columns of data frames which are stored in lists of lists. OK, so the scenario is as follows: we have a...
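
One possible way to approach the task named in the title is sketched below. The teaser is cut off before the actual scenario, so the nested list `nested`, its element names, and the new column names are all hypothetical; the sketch simply assumes every data frame should receive the same set of names.

```r
# Hypothetical input: a list of lists, each inner element a data frame
nested <- list(
  a = list(df1 = data.frame(x = 1:2, y = 3:4)),
  b = list(df2 = data.frame(x = 5:6, y = 7:8))
)

# Assumed target column names, applied to every data frame
new_names <- c("id", "value")

# Outer lapply walks the top-level list, inner lapply renames each data frame
renamed <- lapply(nested, function(inner) {
  lapply(inner, setNames, new_names)
})
```

Because `lapply` keeps list names, `renamed$a$df1` and `renamed$b$df2` are still reachable under their original names, only with the new column names.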

## What’s wrong with LOESS for palaeo data?

July 24, 2012

Locally weighted scatterplot smoothing (LOWESS) or local regression (LOESS) is widely used to highlight “signal” in variables from stratigraphic sequences. It is a user-friendly way of fitting a local model that derives its form from the data themselves rather than having to be specified a priori by the user. There are generally two things that a user has...

## RcppGSL 0.2.0

July 23, 2012

Earlier today, a minor update / maintenance release of RcppGSL, our interface package between R and the GNU GSL using our Rcpp package for seamless R and C++ integration, arrived on CRAN. It contains a number of minor changes to accommodate chan...

## Faster R in Hadoop: rmr 1.3 now available

July 23, 2012

The RHadoop project continues the Big Data integration of R and Hadoop, with a new update to its rmr package. Version 1.3 of rmr improves the performance of map-reduce jobs for Hadoop written in R. New features include: An optional vectorized API for efficient R programming when dealing with small records. Fast C implementations for serialization and deserialization from...

## Deploy Rook Apps with rApache: Part I

July 23, 2012

Since rApache 1.1.15 you’ve been able to deploy your Rook applications like so: # Run the Rook application named 'app'. On each request, the expression # 'Rook::Server\$call(app)' is evaluated in an environment populated by # rookapp.R. 'app' is expected to be found in that environment. <Location /test/RookApp> SetHandler r-handler ...

## Success does not require understanding

July 23, 2012

I took part in the second Data Science London Hackathon last weekend (also my second hackathon) and it was a very different experience compared to the first hackathon. Once again Carlos and his team really looked after us. The data was released 24 hours before the competition started and even though I had spent less

## How to write a rapport template

July 23, 2012

This post introduces users to writing a template, that is, how to produce results similar to those on rapport's homepage or in our forthcoming reporting web application. The post was written from the point of view of a Windows user; if problems came up because you use...

## Music Data Hackathon 2012 – Beginner’s view

July 23, 2012

When I first heard of the existence of Hackathons (receive a data set, predict the response as well as possible, win money. All within 24 hours), I had two thoughts: 1. Wow, that sounds great. Like a huge game for intelligent people. 2. My skills are no...

## Estimating required hospital bed capacity

July 23, 2012

Estimating required hospital bed capacity requires a thorough analysis. There are a lot of ways of approaching a capacity requirement problem, but I think we can agree that a simple spreadsheet analysis just won't cut it. The approach described in this post makes use of discrete-event simulation and, just to clarify, makes abstraction from a lot of variables which should be taken into consideration in...

## Modeling Trick: Impact Coding of Categorical Variables with Many Levels

July 23, 2012

One of the shortcomings of regression (both linear and logistic) is that it doesn’t handle categorical variables with a very large number of possible values (for example, postal codes). You can get around this, of course, by going to another modeling technique, such as Naive Bayes; however, you lose some of the advantages of regression …
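
The teaser stops before describing the trick itself. As a hedged illustration, one common reading of impact coding replaces each level of the categorical variable with a single number: the mean outcome for that level minus the overall mean outcome. The sketch below follows that reading on a made-up data frame `d`; the post may differ in details such as smoothing for rare levels or computing the codes on held-out data.

```r
# Made-up data: a categorical predictor with three levels and a numeric outcome
d <- data.frame(
  zip = c("A", "A", "B", "B", "B", "C"),
  y   = c(1, 3, 10, 12, 14, 5)
)

# Overall mean of the outcome
grand_mean <- mean(d$y)

# Mean outcome within each level of the categorical variable
level_means <- tapply(d$y, d$zip, mean)

# Impact code: level mean minus grand mean, looked up for every row
d$zip_impact <- as.numeric(level_means[as.character(d$zip)]) - grand_mean
```

The new numeric column `zip_impact` can then enter a regression as a single predictor instead of one indicator variable per level.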

## Computing the degree of dependency (jointness) among explanatory variables using BMS

July 23, 2012

Capturing the dependence between explanatory variables in the posterior distribution while implementing a Bayesian analysis is crucial. Taking such a dependence into account reveals the sensitivity of posterior distributions of parameters to depen...

## Third year wrap-up

July 23, 2012

July marks the end of three years of blogging for us. By our count, we've posted 121 examples across the first three years. We aim to be helpful and interesting. As always, it's hard to get a sense of our readership. At the time we wrote this, Feedbur...

## XLConnect 0.2-0

July 23, 2012

Mirai Solutions GmbH (http://www.mirai-solutions.com) is very pleased to announce the release of XLConnect 0.2-0, which can be found at CRAN. As one of the updates, XLConnect has moved to the newest release of Apache POI: 3.8. Also, the lazy evaluation …

## R Optimization Test

I have tested several R optimization functions before: nlm, optim(Nelder-Mead), optim(BFGS), optim(SANN), nlminb, optim(L-BFGS-B) for an eight-parameter Vasicek interest rate model. Overall, I find that, for my setting, nlminb is the best and all R func...
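
For readers who want to try a comparison of this kind, here is a minimal sketch. It does not use the eight-parameter Vasicek model from the post, which is not shown in the teaser; instead it assumes the two-parameter Rosenbrock function as a stand-in objective, and it covers only three of the optimizers listed.

```r
# Rosenbrock "banana" function, a standard test problem with minimum at (1, 1)
rosenbrock <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
start <- c(-1.2, 1)

# Run three of the optimizers mentioned above from the same starting point
fit_nm     <- optim(start, rosenbrock, method = "Nelder-Mead")
fit_bfgs   <- optim(start, rosenbrock, method = "BFGS")
fit_nlminb <- nlminb(start, rosenbrock)

# Collect the objective value reached by each optimizer
results <- c(
  nelder_mead = fit_nm$value,
  bfgs        = fit_bfgs$value,
  nlminb      = fit_nlminb$objective
)
print(results)
```

Note that `optim` reports the minimum in `$value` while `nlminb` reports it in `$objective`. Rankings on a toy problem like this need not carry over to a harder likelihood surface.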

## Two Free Sets of Methods Lectures

July 23, 2012

I provide links to two (free, publicly available) graduate level political methodology classes by Justin Esarey (Rice University) and Gary King (Harvard University). Both classes focus on statistical theory and modeling in R.

## A comparison of some heuristic optimization methods

July 23, 2012

A simple portfolio optimization problem is used to look at several R functions that use randomness in various ways to do optimization. Orientation: Some optimization problems are really hard. In these cases, sometimes the best approach is to use randomness to get an approximate answer. Once you decide to go down this route, you need …
