A tiny RCurl headache ;)

July 4, 2012

As more and more data go online (plus, we love Google Drive), we are forced to connect to our data over the net. We mostly do this via RCurl (though we could also use RGoogleDocs). In that case, all that is required to get the data into R is the two lines of ...

A new open journal on Data Science

July 4, 2012

Springer has introduced a new open, peer-reviewed journal focused on Data Science: EPJ Data Science. What makes it a Data Science journal is its focus on novel uses of statistics, data analysis, computer techniques and public data sources to research a topic in another domain, rather than on methodological research. Here are a few examples of the papers you'll find in the journal:...

Alternative to Monte Carlo Testing

July 4, 2012

When we backtest a strategy on a portfolio, we get a simple analysis of a single period in time. There are ways to “stress test” a strategy, such as Monte Carlo simulation, random portfolios, or shuffling the returns into a random order. I could never really wrap my head around Monte Carlo and shuffling the returns …

Three Questions about a Matrix of Coefficient Plots

July 4, 2012

It's Independence Day in the U.S., so I am taking the day off, but I received the following request for advice and thought I'd pass it along to my readers. I wonder if you could help: I am trying to create 9 different coefficient plots, which repr...

A tutorial on outlier detection techniques

July 4, 2012

by Yanchang Zhao, RDataMining.com. There is an excellent tutorial on outlier detection techniques, presented by Hans-Peter Kriegel et al. at ACM SIGKDD 2010. It presents many popular outlier detection algorithms, most of which were published between the mid-1990s and 2010, …

The Higgs boson: 5-sigma and the concept of p-values

July 4, 2012

Why are physicists talking about 5-sigma, and what's it got to do with statistics? In this short post I'll explain what 5-sigma is and why it's not a measure of how certain scientists are that they've found the Higgs boson.
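
As a rough illustration (not taken from the post itself), the 5-sigma threshold can be turned into a tail probability with base R's pnorm(). The sketch assumes the one-sided convention commonly used for discovery claims in particle physics. The number is how often background alone would produce a fluctuation at least this large, not the probability that the Higgs boson exists.

```r
# Probability that a standard normal variable exceeds 5 (one-sided upper tail).
p_one_sided <- pnorm(5, lower.tail = FALSE)

# Two-sided version, for comparison with textbook hypothesis tests.
p_two_sided <- 2 * pnorm(5, lower.tail = FALSE)

print(p_one_sided)      # about 2.87e-07
print(1 / p_one_sided)  # about 1 in 3.5 million
print(p_two_sided)      # about 5.7e-07
```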

Glmnet_1.8 uploaded to CRAN

July 4, 2012

(by Trevor Hastie) Glmnet_1.8 has been uploaded to CRAN. This is a major revision, with two additional models included. 1) Multiresponse regression, family="mgaussian": here we have a matrix of M responses, and we fit a series of linear models in parallel. We use a group-lasso penalty on the set of M coefficients for each variable. This means they are...

To the Basics: Bayesian Inference on A Binomial Proportion

July 4, 2012

Think of something observable (countable) that you care about, with only one outcome or another. It could be the votes cast in a two-way election in your town, or the free throw shots the center on your favorite...
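
One standard way to do Bayesian inference on such a proportion is the conjugate Beta-binomial model: with a Beta(a, b) prior on the proportion and s successes in n trials, the posterior is Beta(a + s, b + n - s). This is a sketch only; the linked post may proceed differently, and the counts and the flat Beta(1, 1) prior below are hypothetical, chosen purely for illustration.

```r
# Hypothetical data: 14 successes (e.g. made free throws) in 20 trials.
successes <- 14
trials <- 20

# Flat Beta(1, 1) prior on the unknown proportion (an assumption).
prior_a <- 1
prior_b <- 1

# Conjugacy: Beta prior + binomial likelihood gives a Beta posterior.
post_a <- prior_a + successes
post_b <- prior_b + trials - successes

# Posterior summaries: mean and central 95% credible interval.
post_mean <- post_a / (post_a + post_b)              # 15 / 22
cred_int <- qbeta(c(0.025, 0.975), post_a, post_b)

print(post_mean)
print(cred_int)
```

A flat prior keeps the example neutral; a more informative Beta(a, b) would simply shift a and b before the data are added.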

Example of Factor Attribution

July 3, 2012

In the prior post, Factor Attribution 2, I showed how Factor Attribution can be applied to decompose a fund’s returns into Market, Capitalization, and Value factors, the “three-factor model” of Fama and French. Today, I want to show you a different application of Factor Attribution. First, let’s run Factor Attribution on each of the stocks ...

RcppBDT 0.2.0

A new release of the RcppBDT package appeared on CRAN earlier today. RcppBDT uses Rcpp, and in particular the nifty Rcpp modules feature of wrapping C++ code for R just by declaring the (class or function) interfaces. It uses this to bring in some useful functions from Boost Date.Time to R so that one can do things like

R> library(RcppBDT)
R> sapply(2012:2016, function(year)
+...

The role of Statistics in the Higgs Boson discovery

July 3, 2012

News is starting to leak that the Large Hadron Collider may have accomplished its primary mission of confirming the existence of the hypothesised and heretofore elusive subatomic particle, the Higgs Boson. And sure, billions of Euros worth of state-of-the-art high-energy machinery and an army of experimental and theoretical physicists probably had something to do with the discovery. But did...

An Improvement to Coefficient Plots

July 3, 2012

I recently posted about coefficient plots, discussing my approach and providing some example R code to create the graphs. I recently had the good fortune of hearing Amanda Driscoll give a talk, and she made a small but really nice …

Combining ggplot Images

July 3, 2012

The ggplot2 package provides an excellent platform for data visualization. One (minor) drawback of this package is that combining several ggplot images into one figure, as the par() function allows for base graphics plots, is not straightforward. Fortunately, R user Stephen Turner has kindly provided a function called “arrange” that does exactly this. The function, ...

Blog with Knitr and Jekyll

July 3, 2012

The knitr package provides an easy way to embed R code in a Jekyll-Bootstrap blog post. The only required input is an R Markdown source file. The source file used to generate this post is 2012-07-03-knitr-jekyll.Rmd, available here. The steps taken to build this post are as follows. Step 1: Create a Jekyll-Bootstrap blog if...

Applying a function successively in R

July 3, 2012

At the R in Finance conference, Paul Teetor gave a fantastic talk about Fast(er) R Code. Paul mentioned the common higher-order function Reduce, which I hadn't used before. Reduce lets me apply a function successively over a vector. What does that...
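
A small base-R sketch of what "applying a function successively" means with Reduce(): the function is folded over the vector from left to right, optionally starting from an init value, and accumulate = TRUE keeps every intermediate result. The compounding example uses hypothetical returns and is only meant to show a finance-flavoured use, not the post's own code.

```r
# Reduce() folds a two-argument function over a vector, left to right:
# Reduce(`+`, 1:4) evaluates ((1 + 2) + 3) + 4.
total <- Reduce(`+`, 1:4)

# accumulate = TRUE returns every intermediate value (here the same as cumsum(1:4)).
running <- Reduce(`+`, 1:4, accumulate = TRUE)

# Compound a vector of hypothetical periodic returns, starting from
# an initial wealth of 1 supplied through `init`.
returns <- c(0.01, -0.02, 0.03)
wealth_path <- Reduce(function(wealth, r) wealth * (1 + r), returns,
                      init = 1, accumulate = TRUE)

print(total)        # 10
print(running)      # 1 3 6 10
print(wealth_path)  # starts at 1, ends at prod(1 + returns)
```

For simple sums and products, cumsum() and cumprod() are faster; Reduce() earns its keep when the step function is something without a vectorized equivalent.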

RcppArmadillo 0.3.2.3

Conrad released version 3.2.3 of Armadillo a few days ago, and the corresponding RcppArmadillo package 0.3.2.3 is now on CRAN. (For those keeping score, 3.2.1 was never a full release, and 3.2.2 contained fixes for a build issue that did not affect the ...

A big list of the things R can do

July 2, 2012

R is an incredibly comprehensive statistics package. Even if you just look at the standard R distribution (the base and recommended packages), R can do pretty much everything you need for data manipulation, visualization, and statistical analysis. And for everything else, there are more than 5000 packages on CRAN and other repositories, and the big-data capabilities of Revolution R Enterprise....

precise pangolin (Ubuntu 12.04)

July 2, 2012

Following the crash of my hard drive right before leaving Kyoto, I bought a cheap Compaq Presario CQ57 to reinstall Ubuntu 12.04 over the weekend (and to have a laptop available before leaving for Australia…). It took about one hour to install from the DVD, and everything seems to be working out of the box. The ...

Graphics Artifacts from Quarterly Commentary

July 2, 2012

For my Q2 2012 commentary, I tried multiple graphs to illustrate the disconnect between the US stock markets and the rest of the world. I think I finally settled on a simple Excel bar graph populated by Bloomberg data, but I thought some might lik...

Project Euler — problem 11

July 2, 2012

It’s been a while since I last solved a Project Euler problem; I have been busy. Now I’m back and continuing with the next problem, which is to find a maximum. Let’s take a look at the 11th problem: What …

Citing R or SAS

July 2, 2012

One of us recently read a colleague's first draft of a paper, in which she had written: "All analyses were done in R 2.14.0." We assume we're preaching to the converted here when we say that the enormous amount of work that goes into R needs to be re...

My first competition at Kaggle

July 2, 2012

For me, Kaggle is becoming a social network for data scientists, as stackoverflow.com or github.com are for programmers. If you are a data scientist, machine learner or statistician, you had better have a profile there; otherwise you do not exist. Nevertheless, I wouldn’t bet on a rosy future for data scientists as journalists suggest (sexy job for next ...

Popularity of R continues

July 2, 2012

No doubt those who read my blog know that the tools I use for my Industrial Engineering and Operations Research work rely heavily on open source software. That is why I try to support as many open source projects as possible, such as COIN-OR, G...

Moving beyond hopeless graphics

July 2, 2012

I was at a talk a while ago where the speaker presented tables with 4, 5, 6, even 8 significant digits, even though, as is usual, only the first or second digit of each number conveyed any useful information. A graph would be better, but even if you’re too lazy to make a plot, a bit ...

Random portfolios versus Monte Carlo

July 2, 2012

What is the difference between Monte Carlo (as it is usually defined in finance) and random portfolios? The meaning of “Monte Carlo”: the idea of “Monte Carlo” is very simple. It is a fancy word for “simulation”. As usual, it is all too possible to find incredibly muddied explanations of such a simple …

Simple distribution plot in R

July 2, 2012

Plot the distribution of a sample as bars and add a histogram line to visualize the sample's characteristics.

MatLab, SAS, STATA, SPSS, Excel users: Try R, damn it!

July 2, 2012

Having worked with a multitude of statistical packages in my career, I feel able to evaluate many of them. I first used Excel for my calculations, as most ordinary users do. I like the idea behind a spreadsheet and the combination of data and click-to-do functions. Nevertheless, I’ve often ...

Olive vs. Sunflower oil Spectra – 002 (ChemoSpec)

July 1, 2012

I added another data set, “sunflower oil”, to import together with the olive oil data into the ChemoSpec R package. As I showed earlier in a video (Preparing spectra to import into ChemoSpec), every sample was acquired with a NIR instrument (in transmitta...

Visualizing uncertainty using Jackknife

July 1, 2012

Once again, I (re)discovered last week at the Rmetrics conference that old tools can be extremely interesting for illustrating complex ideas, like uncertainty in financial markets and stock prices. For instance, a 99.5% quantile: we look for the scena...
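
A minimal base-R sketch of the jackknife idea applied to a 99.5% quantile: the statistic is recomputed with each observation left out in turn, and the spread of those leave-one-out values gives a rough standard error and something to plot. The simulated Student-t returns are hypothetical stand-ins for real market data, and this is not necessarily how the post builds its graphs. The jackknife is known to be unreliable for non-smooth statistics such as quantiles, so treat the standard error as illustrative only.

```r
set.seed(1)

# Hypothetical data: 1000 simulated daily returns (scaled Student t, 4 df).
x <- 0.01 * rt(1000, df = 4)
n <- length(x)

# Statistic of interest: the 99.5% quantile of the full sample.
q_hat <- quantile(x, 0.995, names = FALSE)

# Jackknife replicates: the same quantile with observation i left out.
q_jack <- vapply(seq_len(n),
                 function(i) quantile(x[-i], 0.995, names = FALSE),
                 numeric(1))

# Standard jackknife standard-error formula.
se_jack <- sqrt((n - 1) / n * sum((q_jack - mean(q_jack))^2))

print(q_hat)
print(se_jack)
# The spread of q_jack can be visualized, e.g. with hist(q_jack).
```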
