A Simple Example for the Use of Shapefiles in R

October 24, 2011

A simple example of drawing an occurrence map (polygons with species' points) from shapefiles, using the R packages maptools and sp. The post links to the example data.


How to compute portfolio returns badly

October 24, 2011

For those who naturally compute portfolio returns correctly, here are some lessons in how to do it wrong. The data: random portfolios were generated from constituents of the S&P 500 with these constraints: long-only; exactly 20 assets in the portfolio; no more than 10% weight for any asset; (just for fun) the sum of the 5 …


Machine Learning Ex4 – Logistic Regression

October 24, 2011

Exercise 4 required implementing Logistic Regression using Newton's Method. The dataset covers 80 students and their grades on 2 exams; 40 students were admitted to college and the other 40 were not. We need to implement a binary classification model to estimate college admission based on the student's scores on...
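
As a rough illustration of the technique named above, Newton's method for logistic regression can be sketched as follows. This is a Python sketch, not the post's own code; the exam scores are made up rather than taken from the exercise's dataset, and the helper names (sigmoid, solve, fit_logistic_newton, predict) are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


def fit_logistic_newton(X, y, iterations=10):
    """Repeat the Newton update theta <- theta - H^{-1} g."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    k = len(rows[0])
    theta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k                      # gradient of the negative log-likelihood
        hess = [[0.0] * k for _ in range(k)]  # its Hessian
        for row, label in zip(rows, y):
            p = sigmoid(sum(t * v for t, v in zip(theta, row)))
            for i in range(k):
                grad[i] += (p - label) * row[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * row[i] * row[j]
        step = solve(hess, grad)
        theta = [t - s for t, s in zip(theta, step)]
    return theta


def predict(theta, x):
    """Estimated admission probability for one (exam 1, exam 2) pair."""
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))


# Made-up scores for illustration only; the two classes overlap on purpose
# so that the maximum-likelihood estimate is finite.
scores = [(55, 60), (60, 75), (70, 65), (80, 85), (45, 80), (65, 50), (40, 45),
          (30, 40), (40, 55), (50, 45), (60, 40), (35, 70), (75, 45), (70, 75)]
admitted = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

theta = fit_logistic_newton(scores, admitted)
print("theta:", theta)
print("P(admitted | 80, 85):", predict(theta, (80, 85)))
```

A fixed number of iterations keeps the sketch short; a fuller version would stop once the step size falls below a tolerance.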


Isarithmic Maps of Public Opinion Data

October 24, 2011

As a follow-up to my isarithmic maps of county electoral data, I have experimented with extending the technique in two ways. First, where the electoral maps are based on data aggregated to the county level, I have sought to generalize the method to accept individual responses for which only zip code data is …


Normality tests don’t do what you think they do

October 23, 2011

Last week a question came up on Stack Overflow about determining whether a variable is distributed normally. Some of the answers reminded me of a common and pervasive misconception about how to apply tests against normality. I felt the topic was general enough to reproduce my comments here (with minor edits). Misconception: If your statistical analysis requires normality, it is


understanding computational Bayesian statistics: a reply from Bill Bolstad

October 23, 2011

Bill Bolstad wrote a reply to my review of his book Understanding computational Bayesian statistics last week and here it is, unedited except for the first paragraph where he thanks me for the opportunity to respond, “so readers will see that the book has some good features beyond having a “nice cover”.” (!) I simply processed


The Zipf and Zipf-Mandelbrot distributions


In my last few posts, I have been discussing some of the consequences of the slow decay rate of the tail of the Pareto type I distribution, along with some other, closely related notions, all in the context of continuously distributed data. Today’s post considers the Zipf distribution for discrete data, which has come to be extremely popular as...


Using Sweave with XeLaTeX

October 23, 2011

Using R with LaTeX via Sweave is a great way to create reproducible output. However, using specific fonts, e.g. your corporate fonts, can be painful with pdflatex. Over the last few weeks I have fallen in love with the TeX format XeLaTeX and its XeTeX e...


A Little Webscraping-Exercise…

October 22, 2011

In R it's quite easy to pull out anything from a webpage, and I'll show a little exercise in doing so. Here I retrieve all blog addresses from R-bloggers with the function readLines() and some subsequent data processing.


Principal component analysis : Use extended to Financial economics : Part 2

October 22, 2011

My previous post talked about how we can employ PCA on the data for multiple stock returns to reduce the number of variables in explaining the variance of the underlying data. But the idea was greeted with skepticism by many. A caveat to the applicatio...


Support Vector Machine with GPU, Part II

October 21, 2011

In our last tutorial on SVM training with GPU, we mentioned a necessary step to pre-scale the data with rpusvm-scale, and to reverse the scaling of the prediction outcome. This cumbersome procedure is now simplified with the latest RPUSVM.


High-schoolers celebrate World Statistics Day

October 21, 2011

Rose Hoffmann, AP Statistics teacher at Catholic Memorial High School in Waukesha, WI sent the following note to the Revolution Analytics team: In August 2010, my husband who is a statistician attended the American Statistical convention. Your company gave out the flying monkey with a black cape ... He gave me the monkey since it was my first year...


Teaching with R: the switch

October 21, 2011

There are several blog posts, websites (and even books) explaining the transition from using another statistical system (e.g. SAS, SPSS, Stata, etc) to relying on R. Most of that material treats the topic from the point of view of i- …


ggplot2 for big data

October 21, 2011

(Hadley Wickham, author of ggplot2 and several other R packages, guest blogs today about forthcoming big-data improvements to his R graphics package -- ed.) Hi! I'm Hadley Wickham and I'm guest posting on the Revolutions blog to give you a taste of some of the visualisation work that my research team and I worked on this summer. This work...


Backtesting Part 4: random strategies

October 21, 2011

Note: This post is NOT financial advice! This is just a fun way to explore some of the capabilities R has for importing and manipulating data. In part 2, we found that our 200-day high, hold 100 days strategy yielded average annual return...


Predictability of stock returns : Using runs.test()

October 21, 2011

The financial market is an interesting place: you find people taking positions (buying/selling) based on their expectations of what security prices will be, and they are rewarded or penalized according to the accuracy of those expectations. The beauty of financia...


Volume by Price charts with R – first attempt

October 21, 2011

I stumbled upon this chart in the R Graph Gallery, which got me thinking someone could come up with a Volume by Price chart using R. Such charts can be useful for determining support and resistance levels, as they show how much volume was traded in each price range. Below is my first attempt at this. Note
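
The idea behind a Volume by Price profile, summing traded volume within each price range, can be sketched as follows. This is a minimal Python sketch rather than the post's R code; the function name volume_by_price, the bin width, and the prices and volumes are all made-up assumptions for illustration.

```python
import math
from collections import defaultdict


def volume_by_price(prices, volumes, bin_width):
    """Sum volume into price bins of the given width.

    Returns a dict mapping each bin's lower edge to its total volume,
    ordered from the lowest price bin to the highest.
    """
    totals = defaultdict(float)
    for price, volume in zip(prices, volumes):
        lower_edge = math.floor(price / bin_width) * bin_width
        totals[lower_edge] += volume
    return dict(sorted(totals.items()))


# Hypothetical closing prices and daily volumes.
prices = [10.2, 10.8, 11.5, 12.1, 11.9, 10.4]
volumes = [100, 200, 150, 300, 250, 50]

profile = volume_by_price(prices, volumes, bin_width=1.0)
busiest_bin = max(profile, key=profile.get)
print(profile)
print("Price bin with the most volume starts at", busiest_bin)
```

Bins with unusually high totals are the price ranges a chartist would read as candidate support or resistance levels; drawing the totals as horizontal bars beside a price chart gives the usual presentation.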


Generating sets of permutations

October 21, 2011

In previous posts I discussed how to generate a single permutation from a fully-randomised or restricted permutation design using shuffle(). Here I want to briefly mention the shuffleSet() function and illustrate its usage. Every time you call shuffle() it has to interpret the …


le Monde puzzle [#745]

October 20, 2011

The puzzle in Le Monde this weekend is not that clear (for a change!), so I may be confused in the following exposition: Three card players are betting with a certain (and different) number of chips each, between 4 and 9. After each game, the loser doubles the number of chips of the winner (while


Since My Last Trip to Disney

October 20, 2011

My family is off to DisneyWorld for a week, so there will not be any posts while I am there. However, I thought it would be interesting to see how Disney stock has done since my last trip in September 2010. Maybe since Disney has done so poorly, the crowd...


Slides for Revolution R Enterprise: 100% R and more

October 20, 2011

If you haven't yet taken a look at Revolution R Enterprise but wanted to know what it adds to open-source R, the slides below from yesterday's webinar will give you a quick overview. A recorded replay with audio of me giving the presentation is also available at the link below. Revolution Analytics Webinars: Revolution R Enterprise: 100% R...


Spatial correlation in designed experiments

October 20, 2011

Last Wednesday I had a meeting with the folks of the New Zealand Drylands Forest Initiative in Blenheim. In addition to sitting in a conference room and having nice sandwiches, we went to visit one of our progeny trials at …


Shipping Mix

October 20, 2011

With a fresh pile of historical global shipping data, we came back to the flow visualizations that illustrated tangible supply lines that facilitate global trade. This time we've isolated two types of shipping vessels, cargo and tanker, in order ...


Queueing up in R, continued

October 20, 2011

Shown above is a queueing simulation. Each diamond represents a person. The vertical line up is the queue; at the bottom are 5 slots where the people are attended. The size of each diamond is proportional to the log of the time it will take them to be attended. Color is used to tell one


postdoctoral positions in Paris

October 20, 2011

There is a call for postdoctoral positions supported by the Paris Mathematical Sciences Foundation. The deadline is December 13 and the on-line application is available. If you are interested in working with me on Bayesian statistics (model choice, time series model) or computational methods (SMC, MCMC, ABC, &c.) through this call, please contact me at


Does the S&P 500 exhibit seasonality through the year?

October 20, 2011

Are there times of the year when returns are better or worse? Abnormal Returns prompted this question with “SAD and the Halloween indicator”, in which it is claimed that the US market tends to outperform from about Halloween until April. Data: the data consisted of 15,548 daily returns of the S&P 500 starting in 1950. …


Confidence interval diagram in R

October 19, 2011

This code shows how to easily plot a beautiful confidence interval diagram in R. First, let’s input the raw data. We’ll be making two confidence intervals for two samples of 10. In case you are curious, the data represents samples from …


R. I. P. EMA

October 19, 2011

That’s right, I am moving away from exponential moving averages. Originally, I decided to use them somewhat arbitrarily, probably because they tend to swing faster. Last night, after spending two and a half hours debugging an issue which yet again turned out to be a particular property of these averages, I made up my mind. I am


Minimum Investment and Number of Assets Portfolio Cardinality Constraints

October 19, 2011

The Minimum Investment and Number of Assets Portfolio Cardinality Constraints are practical constraints that are not easily incorporated in the standard mean-variance optimization framework. To help us impose these real life constraints, I will introduce extra binary variables and will use mixed binary linear and quadratic programming solvers. Let’s continue with our discussion from Introduction
