Universal portfolio, part 9

July 25, 2012

Part 8 discussed the distribution of the absolute wealth of the Universal Portfolio across all possible tuples of length 2, 3 and 4. However, comparing the absolute wealth against some reference, especially against simple portfolio selection algor...

Read more »

Getting rasters into shape from R

July 25, 2012

Today I needed to convert a raster to a polygon shapefile for further processing and plotting in R. I like to keep my code together so that I can easily keep track of what I’ve done, so it made sense to …

Read more »

Another R mention in the NYT

July 25, 2012

The R language gets a brief mention in an article in yesterday's New York Times on automated bond trading: “The traders here are mostly educated in math or physics, often outside the United States, and their desks are piled high with textbooks like the ‘R Graphs Cookbook,’ for working with obscure computer programming languages.” R an obscure programming language?...

Read more »

Hierarchical Cluster Analysis (ChemoSpec) – 03

July 25, 2012

It is clear that we can discriminate between olive oil and sunflower oil, but let's see the reason for the sub-clusters in the sunflower oil. Samples sflw6da, sflw7da, sflw8da, sflw9da and sflw10da are refined sunflower oil, which is filtered and processed, t...

Read more »

Heaviside Signal Detection Part 1: Informed non-parametric testing

July 25, 2012

Steps are frequently found in geophysical datasets, particularly time series (e.g. GPS). A common approach to estimating the size of the offset is to assume (or estimate) the statistical structure of the noise and then estimate the size and …

Read more »

Inspirational Stack Overflow Dendrogram Applied to Currencies

July 25, 2012

When I saw the answer to this Stack Overflow question, I immediately remembered working on my old post "Clustering with Currencies and Fidelity Funds" and just had to try applying this technique. As I should have guessed, it worked with only a mini...

Read more »

Revolution Newsletter: July 2012

July 25, 2012

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full July edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Quick Start Program for Hadoop: Revolution Analytics makes it easy for data analysts and...

Read more »

Long-vector kludge in R

July 25, 2012

Just recently, I found out that R is limited to 32-bit integers, even on 64-bit hardware. Bummer, huh? As a consequence, the maximum size of a vector is 2^31-1. To be fair, dealing with numeric types across machine architectures is hard. A fixed repr...

Read more »

The housing bubble: Where are we?

July 25, 2012

Last spring we looked at the state of the housing bubble in the US. The question on readers' minds then was "Where is it going next?" It's been more than a year, so let's have another look above. The post "The housing bubble: Where are we?" appeared first on Decision Science News.

Read more »

Learning R has really made me appreciate SAS

July 25, 2012

For the past 18 months, it seems like all I’ve heard about in the digital marketing industry is “big data”, and with that, mentions of using Hadoop and R to solve these sorts of problems. Why are these tools the … “Learning R has really made me appreciate SAS” is an article from randyzwitch.com,...

Read more »

Really Big Objects Coming to R

July 25, 2012

I noticed the following note in the NEWS file of the development version of R: “There is a subtle change in behaviour for numeric index values 2^31 and larger. These used never to be legitimate and so were treated as NA, sometimes with a warning.”

Read more »

Measuring persistence in a time series : Application of rolling window regression

During my final semester at IGIDR, I wrote a project paper in macroeconomics involving time series econometrics. The concept I focused on in my study was the unit root, which I have touched upon in earlier posts. This study presents a novel...

Read more »

Displaying time series, spatial and space-time data with R

During the next few months I will be working on the book “Displaying time series, spatial and space-time data with R: stories …

Read more »

Plotting 95% Confidence Bands in R

July 24, 2012

I am comparing estimates from subject-specific GLMMs and population-average GEE models as part of a publication I am working on. As part of this, I want to visualize the predictions of each type of model, including 95% confidence bands. First I …

Read more »

RcppClassic 0.9.2

July 24, 2012

Similar to yesterday's post about RcppGSL, we have another pure maintenance release to announce: RcppClassic, the package supporting the deprecated older "classic" Rcpp API defined in the earlier 2005 to 2006 releases, is now on CRAN. Ther...

Read more »

Civic Data Challenge closes July 29

July 24, 2012

There are only a few days left to enter the Civic Data Challenge: entries are due before midnight EST on July 29 to qualify for the $100,000 in prizes. The competition, open to US residents, challenges participants to create applications and visualizations from civic health data. Prizes will be awarded by a panel of prestigious judges. Looks like a great opportunity...

Read more »

How to tell when error bars correspond to a significant p-value

July 24, 2012

Can you tell when error bars based on 95% CIs or standard errors correspond to a significant p-value? Don’t fret if you think it’s hard: a study from 2005 showed that researchers in psychology, behavioral neuroscience and medicine had a hard time judging when error bars from two independent groups signified a significant difference.

Read more »

get UCSC images for a list of regions in batch

July 24, 2012

Here is my working R code for the task. It can be simplified to 3 lines.
# example of controlling an individual track
#theURL="http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm9&wgRna=hide&cpgIslandExt=pack&ensGene=hide&mrna=hide&intronEst=hi...

Read more »

The Failure of Asset Allocation – Bonds Are An Imperfect Hedge

July 24, 2012

US investors were spoiled by US Treasuries, which acted as a near-perfect hedge for stocks during the 2008-2009 crisis. However, in a real crisis, bonds rarely offer any comfort, and asset allocation fails (see post Death Spiral of a Country and IMF ...

Read more »

What’s wrong with LOESS for palaeo data?

July 24, 2012

Locally weighted scatterplot smoothing (LOWESS) or local regression (LOESS) is widely used to highlight “signal” in variables from stratigraphic sequences. It is a user-friendly way of fitting a local model that derives its form from the data themselves rather than having to be specified a priori by the user. There are generally two things that a user has...

Read more »

Williams designs with 5 products

July 24, 2012

In a previous post, I created small Williams designs for an even number of products. This worked very well, partly because symmetry allowed the number of permutations to be restricted significantly. Unfortunately, this does not work so well with an odd n...

Read more »

renaming data frame columns in lists

July 24, 2012

Renaming the columns of data frames which are stored in lists of lists. OK, so the scenario is as follows: we have a...

Read more »

RcppGSL 0.2.0

July 23, 2012

Earlier today, a minor update and maintenance release of RcppGSL (our interface package between R and the GNU GSL, using our Rcpp package for seamless R and C++ integration) arrived on CRAN. It contains a number of minor changes to accommodate chan...

Read more »

Faster R in Hadoop: rmr 1.3 now available

July 23, 2012

The RHadoop project continues the Big Data integration of R and Hadoop with a new update to its rmr package. Version 1.3 of rmr improves the performance of map-reduce jobs for Hadoop written in R. New features include: an optional vectorized API for efficient R programming when dealing with small records; fast C implementations for serialization and deserialization from...

Read more »

Deploy Rook Apps with rApache: Part I

July 23, 2012

Since rApache 1.1.15, you’ve been able to deploy your Rook applications like so:
# Run the Rook application named 'app'. On each request, the expression
# 'Rook::Server$call(app)' is evaluated in an environment populated by
# rookapp.R. 'app' is expected to be found in that environment.
<Location /test/RookApp>
    SetHandler r-handler
    ...

Read more »

Success does not require understanding

July 23, 2012

I took part in the second Data Science London Hackathon last weekend (also my second hackathon), and it was a very different experience from the first one. Once again, Carlos and his team really looked after us. The data was released 24 hours before the competition started, and even though I had spent less...

Read more »

How to write a rapport template

July 23, 2012

This post gives users an introduction to writing a template, that is, how to produce results similar to those on rapport's homepage or in our forthcoming reporting web application. The post was written from the point of view of a Windows user; if problems come up because you use...

Read more »

Estimating required hospital bed capacity

July 23, 2012

Estimating required hospital bed capacity requires a thorough analysis. There are many ways of approaching a capacity requirement problem, but I think we can agree that a simple spreadsheet analysis just won't cut it. The approach described in this post makes use of discrete-event simulation and, just to ...

Read more »