Confidence interval diagram in R

October 19, 2011

This code shows how to easily plot a beautiful confidence interval diagram in R. First, let’s input the raw data. We’ll be making two confidence intervals for two samples of 10. In case you are curious, the data represents samples from …

R. I. P. EMA

October 19, 2011

That’s right, I am moving away from exponential moving averages. Originally, I decided to use them somewhat arbitrarily, probably because they tend to swing faster. Last night, after spending two and a half hours debugging an issue which yet again turned out to be a particular property of these averages, I made up my mind. I am

Minimum Investment and Number of Assets Portfolio Cardinality Constraints

October 19, 2011

The Minimum Investment and Number of Assets Portfolio Cardinality Constraints are practical constraints that are not easily incorporated into the standard mean-variance optimization framework. To help us impose these real-life constraints, I will introduce extra binary variables and will use mixed binary linear and quadratic programming solvers. Let’s continue with our discussion from Introduction

the Wang-Landau algorithm reaches the flat histogram in finite time

October 19, 2011

Pierre Jacob and Robin Ryder (from Paris-Dauphine, CREST, and Statisfaction) have just arXived (and submitted to the Annals of Applied Probability) a neat result on the Wang-Landau algorithm. (This algorithm, which modifies the target in a sort of reweighted partitioned sampling to achieve faster convergence, has always been perplexing to me.) They show that some

Support Vector Machines in R (a course by Lutz Hamel)

October 19, 2011

Support vector machines (SVMs) are the “big iron” of the data mining world, especially suited for extremely data-intensive tasks like image classification, biosequence processing, handwriting recognition, etc. Dr. Lutz Hamel, author of “Knowledge Discovery with Support Vector Machines”, presents his online course “Introduction to Support Vector Machines In R” November 18 – December 16. “Support Vector Machines in...

Web-friendly visualizations in R

October 19, 2011

Aleks points me to this new tool from Wojciech Gryc. Right now I save my graphs as pdfs or pngs and then upload them to put them on the web. I expect I’ll still be doing this for a while (I like having full control of what my graphs look like), but Gryc’s default plots might be useful.

On R, bloggers, politics, sex, alcohol and rock & roll

October 19, 2011

Yesterday morning at 7 am I was outside walking the dog before getting a taxi to go to the airport to catch a plane to travel from Christchurch to Blenheim (now I can breathe after reading without a pause). It …

The R-Files: Paul Teetor

October 19, 2011

"The R-Files" is an occasional series from Revolution Analytics, where we profile prominent members of the R Community.

Name: Paul Teetor
Profession: Quantitative developer (freelance)
Nationality: American
Years Using R: 7
Known for: Author of R Cookbook (O’Reilly Media, 2011)

An active member of the R community, Paul Teetor is a quantitative developer and statistical consultant based in the...

Studying market reactions after consecutive gains (losses)

October 19, 2011

Arthur Charpentier used R to denote a broken record of the CAC 40 when it went 11 consecutive days with negative returns. Question: What happens to the market after runs of positive or negative returns? Will the market tank or soar after n days of gains/losses? First, a little dissection of historical data (S&P 500

How does Matt Kemp become Andre Dawson?

October 18, 2011

While reading this article over at Fangraphs I was inspired to ask myself “what would Matt Kemp have to do between now and the end of his career to be seriously considered for the Hall of Fame?” This question comes …

Fusion Tables by Google

October 18, 2011

Google's Fusion Tables look impressive, for those who want to try geo-visualizations of their data. You don't need much programming experience to be able to use them. For those who want to try them out, here's a nice intro that Kathryn Hurley presented at the recent SVCC (Silicon Valley Code Camp). When combined with ShpEscape (note spelling) it becomes...

Generating restricted permutations with permute

October 18, 2011

In a previous post I introduced the permute package and the function shuffle(). In that post I got as far as replicating R’s base function sample(). Here I’ll briefly outline how shuffle() can be used to generate restricted permutations. shuffle() …

130/30 Portfolio Construction

October 18, 2011

The 130/30 funds were getting lots of attention a few years ago. The 130/30 fund is a long/short portfolio that, for each $100 invested, allocates $130 to longs and $30 to shorts. From a portfolio construction perspective, this simple idea is not so simple to implement. Let’s continue with our discussion from Introduction

Large applications of linear mixed models

October 18, 2011

In a previous post I summarily described our options for (generalized to varying degrees) linear mixed models from a frequentist point of view: nlme, lme4 and ASReml-R†, followed by a quick example for a split-plot experiment. But who is really …

ACM Data Mining Camp 2011: Report

October 18, 2011

(By Joseph Rickert.) In San Jose, topics like big data, map reduce, predictive models, mobile analytics and crowdsourcing draw a crowd even on a Saturday. So it turned out that the ACM Data Mining Camp and "un-conference" was a very "happening" way to spend a Saturday. Over 500 people attended the event at the eBay "Town Hall" on North...

Short selling, volatility and bubbles

October 17, 2011

Yesterday, I wrote a post (in French) about short-selling in financial markets, since some journalists claimed that it was well-known that short-selling does increase volatility on financial markets. Not only in French speaking journals actually, sin...

Get the Basics right – Suggestion for R Beginners

October 17, 2011

I am always looking for suggestions on how to get better at R, especially for beginners. So when I see someone who's gotten adept at it, I ask them how they got there. This weekend, at the Bay Area ACM Data Mining Camp, one person gave me what seemed like a g...

Revolution Newsletter: October 2011

October 17, 2011

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full October edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Applications of R Contest: Deadline October 31. Revolution Analytics is offering $20,000 in prizes...

R Tools for FEC Campaign Finance Disclosure Data

October 17, 2011

For my first contribution to the blog, I wanted to make some kind of enlightening visualization of campaign finance disclosure data from the Federal Election Commission’s website. It looks like they’re working on some new, easy-to-use data dumps here, but …

Lattice when modeling, ggplot when publishing

October 17, 2011

When working on research projects I tend to fit several, sometimes quite a few, alternative models. This model fitting is informed by theoretical considerations (e.g. quantitative genetics, the experimental design we used, our understanding of the process under study, etc.) but …

Software for Research, Part 3: [R], RStudio and ggplot2 for Statistics

October 17, 2011

[R] is an excellent open-source statistics language. It's cross-platform and free, and I think it will eventually displace proprietary stats packages due to its rapid development, speed and ease of use. So there's no time like the present to get used...

Colors in R

October 17, 2011

One of my favorite R packages that I use all the time is the RColorBrewer package. The package has been around for a while now and is written/maintained by Erich Neuwirth. The guts of the package are based on Cynthia Brewer’s very cool work on the us...

Example 9.10: more regression trees and recursive partitioning with "partykit"

October 17, 2011

In section 6.7.3 of the book, we discuss recursive partitioning, a technique for classification and regression using a decision tree. Support for these methods is available within the rpart package. Torsten Hothorn and Achim Zeileis have extended the ...

Backtesting a Simple Stock Trading Strategy: Part 3

October 17, 2011

Note: This post is NOT financial advice! This is just a fun way to explore some of the capabilities R has for importing and manipulating data. In a previous post, I examined a simple stock trading strategy: Find the high point over the la...

Tikz Nodes

October 17, 2011

Nodes are used in tikz to place content in a picture as part of a LaTeX document. When creating a tikz picture, the origin is assumed to be at (0,0) and objects are positioned relative to the origin of the picture. If we wanted to add a grid with

Installing rgdal on a Mac

October 16, 2011

So, installing rgdal, which is an important R package for spatial data analysis, can be a bit of a pain on a Mac. Here are two ways to make it happen.

The Easy Way
In R run: install.packages('rgdal',repos="http://www.stats.ox.ac.uk/pub/RWin")

The Hard Way
Download and install GDAL 1.8 Complete and PROJ framework v4.7.0-2 from: http://www.kyngchaos.com/software/frameworks%29
Download the latest version of rgdal from CRAN.

Running SQL Queries in R With the SQLDF Package

October 16, 2011

The sqldf package can be used to run SQL queries on R data frames. The user simply needs to specify a SQL statement enclosed in quotation marks within the sqldf() function. In the following R code, you see various ways of using the sqldf package to run SQL queries on R data frames. The sql

Geo-doodlers – Paul Butler and FlowingData

October 16, 2011

I found this great R-Visualization example via an R-Blogger post that xingmowang made. (One more good reason for why it is important to read lots of field-related blogs!) Here's the image: If this was merely eye-candy, I would have enjoyed it, but not in...
