Installing the RMySQL package on Windows 7

October 25, 2011

So you want to get statistical? Nowadays one of the ways to go is to use R, mostly in combination with ggplot2 for generating the plots. These plots and graphs need data, however, and for that we use data sources. There are a lot of data sources availa...

Example 9.11: Employment plot

October 25, 2011

A Facebook friend posted the picture reproduced above. It makes the case that President Obama has been a successful creator of jobs, and also paints GW Bush as a president who lost jobs. Another friend pointed out that to be fair, all of Bush's presi...

Consecutive number and lottery

October 25, 2011

Recently, I have been reading odd things about strategies to win at the lottery. I wrote something a long time ago, but maybe it would be better to write another post. First, it is easy to get data on the French lotteries, including dra...

Longitudinal analysis: autocorrelation makes a difference

October 25, 2011

Back to posting after a long weekend and more than enough rugby coverage to last a few years. Anyway, back to linear models, where we usually assume normality, independence and homogeneous variances. In most statistics courses we live in a …

Expected shortfall (CVaR) and Conditional Drawdown at Risk (CDaR) risk measures

October 25, 2011

In the Maximum Loss and Mean-Absolute Deviation risk measures post, I started the discussion about alternative risk measures we can use to construct an efficient frontier. Two other alternative risk measures I want to discuss are Expected shortfall (CVaR) and Conditional Drawdown at Risk (CDaR). I will use methods presented in Comparative Analysis of Linear Portfolio Rebalancing
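The teaser above names expected shortfall without defining it. As a rough illustration only, here is a minimal sketch of the historical (empirical) version, written in Python rather than the R used by the post. It assumes one simple convention: the tail is the ceil((1 - alpha) * n) largest losses, VaR is the smallest loss in that tail, and CVaR is the tail average. The function name and the toy returns are hypothetical and do not come from the post, which builds on a linear-programming formulation instead.

```python
import math


def var_cvar(returns, alpha=0.95):
    """Historical VaR and CVaR (expected shortfall) of a return sample.

    Losses are negated returns. Assumed convention: the tail consists of
    the ceil((1 - alpha) * n) largest losses.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    tail_size = max(1, math.ceil((1 - alpha) * len(losses)))
    tail = losses[:tail_size]
    var = tail[-1]                 # smallest loss that still falls in the tail
    cvar = sum(tail) / len(tail)   # average loss inside the tail
    return var, cvar


# Hypothetical toy returns, for illustration only.
sample = [0.02, -0.01, 0.015, -0.05, 0.01, -0.03, 0.005, 0.0, -0.02, 0.03]
var_80, cvar_80 = var_cvar(sample, alpha=0.80)
```

Under this convention CVaR is never smaller than VaR at the same level, because it averages losses that are all at least as large as the VaR threshold.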

Email Netiquette

October 25, 2011

A short piece of web scraping I sent as a reminder to my colleague. If you run it, the result should be something like... Datatata!

Sabermetrics Meets R Meetup

October 25, 2011

I just ran across this post at Big Computing. On November 14th, there will be an R User meet-up in Washington, DC (Tyson's Corner) led by Mike Driscoll about using R for sabermetric analysis (linked here). I will actually be home in Maryland for a co...


Pair trading strategy : how to use "PairTrading" package

October 25, 2011

Mr. Ishikawa (an old friend of mine) and I developed the "PairTrading" package and uploaded it to CRAN. This article shows how you can use it. Pair trading is a market-neutral trading strategy that gives traders a chance to profit regardless of market conditions. The idea of this strategy is quite simple. 1: Select two stocks (or any assets) that move similarly. 2: Short...

Approximate Bayesian computational methods on-line

October 25, 2011

Fig. 4 – Boxplots of the evolution of ABC approximations to the Bayes factor. The representation is made in terms of frequencies of visits to models MA(1) and MA(2) during an ABC simulation when ε corresponds to the 10%, 1%, 0.1%, and 0.01% quantiles on the simulated autocovariance distances. The data is a time

Machine Learning Ex 5.1 – Regularized Linear Regression

October 25, 2011

The first part of Exercise 5.1 requires implementing a regularized version of linear regression. Adding a regularization parameter can prevent over-fitting when fitting a high-order polynomial.

Vanilla C code for the Stochastic Simulation Algorithm

October 24, 2011

The Gillespie stochastic simulation algorithm (SSA) is the gold standard for simulating state-based stochastic models. If you are an R buff, an SSA novice, and want to get up and running quickly with stochastic models (in particular ecological models) that are not …
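For readers new to the SSA, the core loop is short. The following is a minimal sketch of Gillespie's direct method for a linear birth-death process, written in Python for illustration (the post itself provides C code). The model, the function name, and the parameter values are assumptions chosen for the example, not taken from the post.

```python
import random


def gillespie_birth_death(n0, birth, death, t_max, seed=None):
    """Simulate a linear birth-death process with Gillespie's direct method.

    Each individual gives birth at rate `birth` and dies at rate `death`.
    Returns the event times and the population size after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while n > 0:
        rate_birth = birth * n
        rate_death = death * n
        total = rate_birth + rate_death
        if total <= 0:
            break                      # no reaction can fire
        t += rng.expovariate(total)    # exponential waiting time to next event
        if t > t_max:
            break
        # Pick which event fires, with probability proportional to its rate.
        if rng.random() * total < rate_birth:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


# Hypothetical parameters, for illustration only.
times, counts = gillespie_birth_death(n0=10, birth=1.0, death=1.1, t_max=5.0, seed=1)
```

The two random draws per step are the whole algorithm: one for when the next event happens, one for which event it is.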

Simple Heatmap in R with Formula One Dataset

October 24, 2011

Now that the 2011 F1 season is over, I decided to quickly scrape the Formula 1 data from the F1.com website, such as the list of drivers, ordered by approximate salary (the driver at the top of the list makes the most, approx. 30MM), and position at the end of each race. There

One week left to enter the $20,000 "Applications of R" contest

October 24, 2011

The deadline to enter the "Applications of R in Business" contest is just a week away. To qualify for $20,000 in prizes from Revolution Analytics, your entry must be submitted to inside-r.org by midnight PST on October 31. Note that this doesn't have to be your final submission: as long as you've entered a draft version, you can still...


Two seasonal investors – R snippet

October 24, 2011

In “A tale of 2 Seasonal Investors”, the Big Picture discusses the simple idea of comparing two simple investment approaches: being exposed to the market 6 months every year (from November to April), as opposed to investing in the other 6 months of every year (from May to October). Going back 50 years in the
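The comparison described above comes down to compounding monthly returns separately for the two half-years. Here is a minimal sketch in Python (the post itself offers an R snippet). It assumes the input is a sequence of (month number, simple monthly return) pairs; the function name and the toy numbers are hypothetical and are not market data.

```python
def seasonal_growth(monthly_returns):
    """Compound returns separately for November-April and May-October.

    monthly_returns: iterable of (month_number, simple_return) pairs,
    with month_number in 1..12. Returns the growth of 1 unit of capital
    for each of the two seasonal investors.
    """
    nov_apr, may_oct = 1.0, 1.0
    for month, r in monthly_returns:
        if month in (11, 12, 1, 2, 3, 4):
            nov_apr *= 1.0 + r
        else:
            may_oct *= 1.0 + r
    return nov_apr, may_oct


# Hypothetical toy returns, for illustration only (not real market data).
toy = [(11, 0.10), (5, -0.10), (1, 0.10), (7, 0.0)]
nov_apr, may_oct = seasonal_growth(toy)
```

Each investor is assumed to hold cash at zero return while out of the market, which is a simplification.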

NYT on Big Data and R

October 24, 2011

In the New York Times' "Bits" blog today, Quentin Hardy offers recollections on Big Data talks at the recent Web 2.0 Summit. He begins with a definition of Big Data: Big Data is really about ... the benefits we will gain by cleverly sifting through it to find and exploit new patterns and relationships. You see it now in...


Show me your WAR face!

October 24, 2011

Below is a chart of the top 20 offensive players based on FanGraphs WAR for the 2011 season. The various features and their corresponding metrics are clear in the image. I’ve also included the leader and last place for each …

XLConnect 0.1-7

October 24, 2011

Mirai Solutions GmbH (http://www.mirai-solutions.com) is pleased to announce the availability of XLConnect 0.1-7. This release includes a number of improvements and new features: performance improvements when writing large xlsx files; new workbook data extraction & replacement operators [, [<-, [[, …

Parameter vs. Observation Dimension?

October 24, 2011

Bill Bolstad's response to Xi'an's review of his book Understanding Computational Bayesian Statistics included the following comment, which I found interesting: Frequentist p-values are constructed in the parameter dimension using a probability distribution defined only in the observation dimension. Bayesian credible intervals are constructed in the parameter dimension using a probability distribution in the parameter


R Tutorial Series: Exploratory Factor Analysis

October 24, 2011

Exploratory factor analysis (EFA) is a common technique in the social sciences for explaining the variance between several measured variables as a smaller set of latent variables. EFA is often used to consolidate survey data by revealing the groupings ...


A Simple Example for the Use of Shapefiles in R

October 24, 2011

A simple example for drawing an occurrence map (polygons with species' points) with the R packages maptools and sp using shapefiles. HERE is the example data.

How to compute portfolio returns badly

October 24, 2011

For those who naturally compute portfolio returns correctly, here are some lessons in how to do it wrong. The data: random portfolios were generated from constituents of the S&P 500 with constraints: long-only; exactly 20 assets in the portfolio; no more than 10% weight for any asset; (just for fun) the sum of the 5 …

Machine Learning Ex4 – Logistic Regression

October 24, 2011

Exercise 4 required implementing Logistic Regression using Newton's Method. The dataset covers 80 students and their grades on 2 exams: 40 students were admitted to college and the other 40 were not. We need to implement a binary classification model to estimate college admission based on the student's scores on...
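To make the technique concrete, here is a minimal sketch of logistic regression fitted with Newton's method, in plain Python for illustration (the post works in R). Each iteration computes the gradient and Hessian of the negative log-likelihood and solves a small linear system for the update step. The helper names and the eight toy data points are hypothetical; they are not the exercise's 80-student dataset.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_newton(xs, ys, iterations=15):
    """Fit intercept + coefficients by Newton's method, starting from zero."""
    rows = [[1.0] + list(x) for x in xs]  # prepend the intercept column
    k = len(rows[0])
    theta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, y in zip(rows, ys):
            p = sigmoid(sum(t * v for t, v in zip(theta, row)))
            for i in range(k):
                grad[i] += (p - y) * row[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * row[i] * row[j]
        step = solve(hess, grad)  # Newton step: H^{-1} g
        theta = [t - s for t, s in zip(theta, step)]
    return theta


# Hypothetical toy scores and labels (deliberately not linearly separable).
xs = [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3), (4, 5), (5, 4)]
ys = [0, 1, 0, 0, 1, 1, 0, 1]
theta = logistic_newton(xs, ys)
```

A fixed iteration count keeps the sketch short; a real implementation would stop once the step or the gradient becomes small. If the classes were perfectly separable, the estimates would diverge, which is why the toy labels overlap.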

Isarithmic Maps of Public Opinion Data

October 24, 2011

As a follow-up to my isarithmic maps of county electoral data, I have attempted to experiment with extending the technique in two ways. First, where the electoral maps are based on data aggregated to the county level, I have sought to generalize the method to accept individual responses for which only zip code data is …

Normality tests don’t do what you think they do

October 23, 2011

Last week a question came up on Stack Overflow about determining whether a variable is distributed normally. Some of the answers reminded me of a common and pervasive misconception about how to apply tests against normality. I felt the topic was general enough to reproduce my comments here (with minor edits). Misconception: If your statistical analysis requires normality, it is


understanding computational Bayesian statistics: a reply from Bill Bolstad

October 23, 2011

Bill Bolstad wrote a reply to my review of his book Understanding computational Bayesian statistics last week and here it is, unedited except for the first paragraph where he thanks me for the opportunity to respond, “so readers will see that the book has some good features beyond having a “nice cover”.” (!) I simply processed


The Zipf and Zipf-Mandelbrot distributions


In my last few posts, I have been discussing some of the consequences of the slow decay rate of the tail of the Pareto type I distribution, along with some other, closely related notions, all in the context of continuously distributed data. Today’s post considers the Zipf distribution for discrete data, which has come to be extremely popular as...
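For orientation, a common textbook form of these distributions on ranks 1..N assigns rank k a probability proportional to (k + q)^(-s), where q = 0 gives the plain Zipf case. The sketch below is in Python for illustration; the parameter names s and q and the function name are assumptions, and the post may use a different parametrization.

```python
def zipf_mandelbrot_pmf(n, s, q=0.0):
    """Probabilities for ranks 1..n, proportional to (k + q) ** -s.

    With q = 0 this is the finite Zipf distribution. Requires q > -1 so
    that every base (k + q) is positive.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if q <= -1:
        raise ValueError("q must be greater than -1")
    weights = [(k + q) ** -s for k in range(1, n + 1)]
    total = sum(weights)  # normalizing constant
    return [w / total for w in weights]


# Zipf with exponent 1 over three ranks: weights 1, 1/2, 1/3.
pmf = zipf_mandelbrot_pmf(3, s=1.0)
```

For any positive s the probabilities decrease with rank, and a larger q flattens the head of the distribution.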

Using Sweave with XeLaTeX

October 23, 2011

Using R with LaTeX via Sweave is a great way to create reproducible output. However, using specific fonts, e.g. your corporate fonts, can be painful with pdflatex. Over the last few weeks I have fallen in love with the TeX format XeLaTeX and its XeTeX e...

A Little Webscraping-Exercise…

October 22, 2011

In R it's quite easy to pull out anything from a webpage, and I'll show a little exercise in doing so. Here I retrieve all blog addresses from R-bloggers with the function readLines() and some subsequent data processing.

Principal component analysis : Use extended to Financial economics : Part 2

October 22, 2011

My previous post talked about how we can employ PCA on the data for multiple stock returns to reduce the number of variables in explaining the variance of the underlying data. But the idea was greeted with skepticism by many. A caveat to the applicatio...
