Super Sam Fuld Needs Your Help (with Foul Ball stats)

July 13, 2011

I was pleasantly surprised to have my recreational reading about baseball in the New Yorker interrupted by a digression on statistics. Sam Fuld of the Tampa Bay Rays was the subject of a Ben McGrath profile, titled Super Sam, in the 4 July 2011 issue. After quoting a minor-league...

A word of warning about grep, which and the like

July 13, 2011

I’ve often selected columns or rows of a data frame using grep or which, based on some property. That is inherently sound, but the trouble comes when you wish to remove rows or columns based on that grep or which call, e.g., which would remove columns with a .1 in the name. This is fine
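
The code from the original post is missing from the excerpt above, so the following is only a minimal sketch of the kind of pitfall it describes, using a made-up data frame and made-up column names. It assumes the problem in question is negative indexing with an empty `grep` result.

```r
# Hypothetical data frame; the column names are invented for illustration.
df <- data.frame(a = 1:3, b = 4:6, a.1 = 7:9)

# With a match, negative indexing drops the matching column as intended.
hits <- grep("\\.1$", names(df))
ncol(df[, -hits, drop = FALSE])     # 2

# With no match, grep() returns integer(0), and -integer(0) is still
# integer(0), so the same expression selects zero columns instead of all.
no_hits <- grep("\\.2$", names(df))
ncol(df[, -no_hits, drop = FALSE])  # 0

# A safer pattern: build a logical mask with grepl() and negate it.
keep <- !grepl("\\.2$", names(df))
safe <- df[, keep, drop = FALSE]
ncol(safe)                          # 3
```

The logical mask behaves the same whether or not anything matches, which is why it is often preferred over negated integer indices.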

Plotting git statistics

July 13, 2011

Here’s a funny story – a friend of mine, an avid gamer at the time, was going downhill on a bicycle when a wonderful idea flashed through his mind: I need to save the current status… Just in case I crash, I will start again from the top of the hill. If you are a developer (quantitative or

SAS, R and categorical variables

July 13, 2011

One of the frustrations of SAS (which I need for PROC MIXED in some analyses) is recoding categorical variables to have a particular reference category. In R, my usual tool, this is rather easy both to set and to modify using the relevel command available in base R (in the stats package). My understanding
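
As a small illustration of the R side of this comparison, here is a sketch of `relevel` on a made-up factor. The variable and level names are assumptions for the example, not taken from the post.

```r
# Hypothetical factor; the level names are invented for illustration.
treatment <- factor(c("placebo", "low", "high", "low", "placebo"))

# Levels are sorted alphabetically by default, so "high" is the reference.
levels(treatment)

# relevel() moves the chosen level to the front, which makes it the
# reference category under the default treatment contrasts.
treatment2 <- relevel(treatment, ref = "placebo")
levels(treatment2)
```

Calling `relevel` again with a different `ref` changes the reference category without touching the underlying data.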

Measuring the EIU Democracy Index (with Polity IV)

July 12, 2011

Yet again, I have conjured up an (academically) unusual dataset on democracy! This time it’s the Economist Intelligence Unit’s Democracy Index, a weird little gem. The dataset is the basis for a paper the Economist publishes every two years. Because of this biennial schedule, there are data estimating the “Democratic-ness” of the world’s countries for 2006,

A surprising(?) prediction about the S&P 500

July 12, 2011

Financial analyst Greg Troccoli was a lone wolf when he predicted in July 2010 that “If the Index held at or above our proprietary support zone (1000.00- 950.00 region), it would eventually trade to a new historical high within 12 - 18 months (July- December 2011 timeframe)”. For reference, the S&P500 all-time high was 1565.15, and it closed...

About Fig. 4 of Fagundes et al. (2007)

July 12, 2011

Yesterday, we had a meeting of our EMILE network on statistics for population genetics (in Montpellier) and we were discussing our respective recent advances in ABC model choice. One of our colleagues mentioned the constant request (from referees) to include the post-ABC processing devised by Fagundes et al. in their 2007 ABC paper. (This paper

I wish I knew everything about R. I wish I could vectorise in my…

July 12, 2011

I wish I knew everything about R. I wish I could vectorise in my sleep. I wish there were perfect R packages out there to solve all my data transformation problems. I wish there were perfect data. If I were Paul Graham, would I ever write code like the...

Yet another reason to avoid loops in R

July 12, 2011

In some previous posts I have mentioned my struggles with the performance of the computations needed to implement the ARMA strategies in practice. Finally I have found a worthy solution, and as usual, there is a programming pattern to learn from it – avoid loops in R. My first approach was to optimize the algorithms.
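
The general pattern can be sketched on a toy problem. This is not the author's ARMA code, which is not shown in the excerpt; the simulated price series and variable names below are assumptions chosen only to contrast a loop with its vectorised equivalent.

```r
set.seed(1)
# Simulated price series standing in for real data.
prices <- cumprod(c(100, 1 + rnorm(999, sd = 0.01)))

# Loop version: compute simple returns one element at a time.
ret_loop <- numeric(length(prices) - 1)
for (i in seq_along(ret_loop)) {
  ret_loop[i] <- prices[i + 1] / prices[i] - 1
}

# Vectorised version: one arithmetic expression over the whole vector.
ret_vec <- prices[-1] / prices[-length(prices)] - 1

# Both approaches give the same result.
all.equal(ret_loop, ret_vec)
```

Wrapping each version in `system.time()` on a longer series is a simple way to see the difference in run time on your own machine.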

What is your favorite R feature?

R (www.r-project.org) is a free, strongly functional language and environment for statistical computing. You can explore data sets, make graphical displays of data, run statistical simulations and much more. If you have never used R, you should give it a try! R beginners: There are many courses, slides and tutorials available for R beginners. We

RTextTools Improvements Underway

Since RTextTools' unveiling at the 2011 Cap Conference in Catania, the development team has been busy working on refinements to the package. These include a number of changes to simplify the API, improve analytics, decrease memory use, and increase functionality. We've added support for another low-memory algorithm (GLMNET) in addition to the

Drawdown Control Can Also Determine Ending Wealth

July 11, 2011

As an extension to yesterday’s post Just Arriving is Not Enough, I wanted to show how minimizing drawdown is a much better technique to help control comfort and potentially increase ending wealth.  CHTTX was one of the best performers of the fou...

R from source

July 11, 2011

The following are notes for myself. I like to use the bleeding edge version of R:

    svn checkout https://svn.r-project.org/R/trunk/ r-devel
    cd r-devel
    ./tools/rsync-recommended
    ## use the following to update sources:
    svn update
    ## pre-reqs
    sudo apt-get build-dep r-base
    #sudo apt-get install gcc g++ gfortran libreadline-dev libx11-dev xorg-dev
    #sudo apt-get install texlive texinfo
    ./configure
    make
    sudo...

In case you missed it: June Roundup

July 11, 2011

In case you missed them, here are some articles from June of particular interest to R users. Highlights of presentations from the R/Finance 2011 conference. Trulia uses R and statistical models to map local crime. Resources for data mining with R. K-means clustering on large data sets with the RevoScaleR package. Revolution Analytics' CTO David Champagne writes on real-time...

The foundations of Statistics: a simulation-based approach

July 11, 2011

“We have seen that a perfect correlation is perfectly linear, so an imperfect correlation will be `imperfectly linear’.” page 128 This book has been written by two linguists, Shravan Vasishth and Michael Broe, in order to teach statistics “in  areas that are traditionally not mathematically demanding” at a deeper level than traditional textbooks “without using

Sir Sun Drop

July 11, 2011

Okay, so one of my best friends, Sir Kris "Wespro" Wesslen, has started a new blog, and I think it's so hilariously decked out with pompous amounts of hilarity that even a blind and brainless mouse would chuckle in amusement. Please check it out here. ...

XLConnect 0.1-5

July 11, 2011

Mirai Solutions GmbH (http://www.mirai-solutions.com) is pleased to announce the availability of XLConnect 0.1-5. This release adds the following new features: Support for setting/getting cell formulas. See methods set/getCellFormula. Support for setting/getting the force formula recalculation flag on worksheets. See methods …

The Road to Default: Debt Ratio Comparisons With Previous Episodes

July 11, 2011

In 2009, Carmen M. Reinhart and Kenneth S. Rogoff wrote a book titled "This Time Is Different" about debt and financial crises. One of their charts will provide a benchmark for our analysis. This chart can be found on page 121 of the book ...

You can scrap it and write something better but let me keep R ;)

July 11, 2011

Ross Ihaka (via Xi'an -- thanks ;) ) gives a nice example to show why R is basically impossible to optimize:

> f = function() {
> if (runif(1) > 0.5) {
> x = 10
> }
> ...

Example 9.2: Transparency and bivariate KDE

July 11, 2011

In Example 9.1, we showed a binning approach to plotting bivariate relationships in a large data set. Here we show more sophisticated approaches: transparent overplotting and formal two-dimensional kernel density estimation. We use the 10,000 simulat...
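
The two techniques named above can be sketched in base R plus the MASS package. This is a rough illustration only: the simulated data below are a stand-in (the post's own data set is not shown in the excerpt), and the choice of `kde2d` from MASS is an assumption rather than necessarily the function the original example uses.

```r
library(MASS)  # recommended package; provides kde2d()

set.seed(42)
n <- 10000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)  # simulated stand-in data, not the post's data set

# Transparent overplotting: a low alpha makes dense regions appear darker.
plot(x, y, pch = 16, col = rgb(0, 0, 1, alpha = 0.05))

# Two-dimensional kernel density estimate on a 50 x 50 grid,
# overlaid on the scatterplot as contour lines.
dens <- kde2d(x, y, n = 50)
contour(dens, add = TRUE)
```

The `alpha` value and the grid size `n` are tuning choices; with fewer points a higher alpha is usually needed for the cloud to remain visible.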

Tamino’s Method: Regional Temperatures

July 11, 2011

Tamino over at  Open Mind has a new post detailing his approach for calculating temperature averages. See his post here. His method is based on the Berkeley method as he notes and he uses it primarily for calculating regional or local temperature averages. Read his post for the math details behind the approach. I got

Creating 3D geographical plots in R using RGL

July 11, 2011

I've been playing around with the rgl package in the last week, as part of an ongoing quest to come up with nice-looking (but more importantly, useful) data visualisations. It's a nice little package, and once you've run through the excell...

Testing an S&P 500 prediction

July 10, 2011

If a particular prediction comes true, how surprised should we be?

The prediction

The page that sparked my curiosity tells of a prediction made a year ago that the S&P 500 would beat its historic high by the end of 2011. It says that at the point the prediction was made, the level of the …

Reproducible blogging

July 10, 2011

As a fact-based blog, the posts here very often contain diagrams and data tables. To enable you to reproduce the results and insights, I include the computations as computer code. Most blog posts I write are markdown text combined (or woven) with computer code written in the R language. I created a small package mdtools that puts the...

Now I’m R-Blogging

July 10, 2011

Today a lot of great mails arrived in my inbox. One of them read: “I’ve just added your feed to the site.” Where did this mail come from? The sender was Tal Galili, a researcher in biostatistics at Tel Aviv University who is very active around the internet.

Migrating from SPSS/Excel to R

July 10, 2011

In this post, I give an outline for those interested in migrating from using SPSS and Excel for data processing/analysis …

Heatmap tables done better, in Sweave and latex

July 10, 2011

I wrote before about using heatmap tables to combine the strengths of tables and graphics for nominal data. Here is a neat approach using Sweave and latex to produce an effect like the one in the picture. This latex code is self-contained. Just save it as myfile.Rnw, run Sweave("myfile.Rnw") from inside R and then pdflatex myfile.tex

The Road to Default: Who’s getting the most screwed?

July 10, 2011

Let's take a look at who gets the most screwed (who loses the most money) when bond prices collapse and the United States defaults. Well, until recently only about 55% of Treasuries were held domestically. The rest was held externally by places like Japa...
