ICD code – search looping

July 15, 2011

Following on from my earlier post on creating a table of ICD codes in R, here is how I am currently counting these codes and storing the codes in a dataframe. Firstly, create a dataframe to store the results in:

    hosp_count <- as.data.frame(matrix(ncol=length(icd_codes)))
    names(hosp_count) <- names(icd_codes)

Counting occurrences: then start to loop through your dataset with

Read more »
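The counting step can be sketched roughly as follows. This is only an illustration, not the post's own loop: it assumes `icd_codes` is a named list with one character vector of codes per grouping, and the `admissions` data frame and its `diag` column are hypothetical stand-ins for the real dataset.

```r
# Hypothetical groupings: a named list of ICD-10 codes per disease group.
icd_codes <- list(
  cholera   = c("A00"),
  typhoid   = c("A01"),
  influenza = c("J09", "J10", "J11")
)

# Hypothetical dataset: one row per hospitalisation, with its primary code.
admissions <- data.frame(
  diag = c("J10", "A00", "J11", "J10", "Z99"),
  stringsAsFactors = FALSE
)

# Results frame as in the excerpt: one column per grouping.
hosp_count <- as.data.frame(matrix(ncol = length(icd_codes)))
names(hosp_count) <- names(icd_codes)

# Loop over the groupings and count the rows whose code falls in each one.
for (group in names(icd_codes)) {
  hosp_count[1, group] <- sum(admissions$diag %in% icd_codes[[group]])
}

print(hosp_count)
```

Codes that belong to no grouping (here `Z99`) are simply not counted.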

One-liners which make me love R: Make your data dance (Hans Rosling style) with googleVis #rstats

July 14, 2011

This inaugural post in my "one-liners which make me love R" series highlights the googleVis package which makes it easy to use the Google Visualization API from R. Thanks to googleVis, just one line of R generates the 165 lines of HTML and (mostly) JavaScript required to create a Hans Rosling-style motion chart for some sample data.

Read more »

MCMC and faster Gibbs Sampling using Rcpp

July 14, 2011

Sanjog Misra, who uses Rcpp for Markov chain Monte Carlo (MCMC) analyses in quantitative marketing, kindly sent me a short example of Rcpp use. The example is based on a blog post by Darren Wilkinson which itself discusses and compares the suitabilit...

Read more »

What is your favorite R feature? (part 2)

This week on our blog we started a list of great R (www.r-project.org) code snippets: http://cloudnumbers.com/what-is-your-favorite-r-feature. We are going to extend this list with several more nice R features. Please feel free to add comments with your favorite R code snippets. Descriptive statistics: A huge set of tools to describe and explore data is available

Read more »

Revolution Newsletter: July 2011

July 14, 2011

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full July edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Beta Test Revolution R Enterprise 5.0. Are you running R in a Microsoft environment? Revolution...

Read more »

Sam Fuld, Bob Carpenter, and Statistical Inference Blog

July 14, 2011

Here is a quick post responding to a request by Bob Carpenter at one of my favorite nerd blogs: Statistical Modeling, Causal Inference and Social Science. While a lot of the Bayesian theory is out of my league, Dr. Gelman really makes you think about ...

Read more »

CRdata vs. Cloudnumbers

July 14, 2011

Cloudnumbers and CRdata are two new cloud computing services. I tested the two services with a very simple script. The script simply creates a dataframe of 10000 numbers via rnorm, and assigns them to a factor of one of two levels (a or b). I then take ...

Read more »
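The data-creation part of such a test script might look like the sketch below. The column names, the seed, and the random assignment of levels via `sample` are assumptions; the excerpt does not say how the levels were assigned or what the script does next.

```r
set.seed(1)  # assumed, only so the sketch is reproducible
n <- 10000

# One column of standard normal draws and a two-level factor (a or b).
dat <- data.frame(
  value = rnorm(n),
  group = factor(sample(c("a", "b"), n, replace = TRUE))
)

str(dat)
```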

R meets XKCD

July 14, 2011

Being a big fan of XKCD and, of course, of the R programming language, I thought that a package that displays my favorite strips would be something (useless) but cool! So, mimicking the approach (and the code) of the fortunes pac...

Read more »

ICD codes – Analysing hospitalisations

July 14, 2011

A brief first post in what I hope will be a series of posts on analysing hospitalisation data, which is recorded using ICD codes (International Statistical Classification of Diseases and Related Health Problems). Initially, here is an R file. This can be read in and will create a list, 218 long, forming groupings using sub

Read more »

More Thoughts on US Death Spiral

July 13, 2011

What troubles me most about today’s environment is the persistent belief that crisis large or small results in a US dollar rally and lower Treasury rates. However, what happens if the US dollar and US Treasury rates are the source of the crisis? Then...

Read more »

Paul Murrell on Incorporating Images in R Charts

July 13, 2011

Thanks to everyone who attended last night's Bay Area R User Group meeting, and a special thanks to our hosts Socialize (a company that makes a mobile SDK for application developers that increases user engagement) who were very generous in letting the group use their San Francisco digs for the meeting. Reflexive thanks also go to the Revolution...

Read more »

tracking Australian election betting markets again (now with sparklines)

July 13, 2011

The header of my blog (above) shows the latest prices on offer in some of Australia’s election betting markets. I convert the prices to an implied probability of ALP win (factoring out the bookie’s profit margin, the so-called “overround”). I’m using some JavaScript by John Resig to make Tufte-ish sparklines, although the Google version of sparklines

Read more »
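For readers unfamiliar with the overround, here is a small sketch of one common way to turn prices into an implied probability. The odds are made up, and proportional normalisation is an assumption: the excerpt does not say which adjustment the author uses.

```r
# Hypothetical decimal odds (payout per $1 staked) for a two-outcome market.
odds <- c(ALP = 3.50, Coalition = 1.28)

# Raw inverse odds sum to more than 1; the excess is the overround.
raw <- 1 / odds
overround <- sum(raw) - 1

# Normalise so the implied probabilities sum to 1.
implied <- raw / sum(raw)

print(round(overround, 3))
print(round(implied, 3))
```

Dividing by the sum spreads the bookie's margin across outcomes in proportion to their raw inverse odds, which is the simplest of several possible adjustments.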

Super Sam Fuld Needs Your Help (with Foul Ball stats)

July 13, 2011

I was pleasantly surprised to have my recreational reading about baseball in the New Yorker interrupted by a digression on statistics. Sam Fuld of the Tampa Bay Rays was the subject of a Ben McGrath profile in the 4 July 2011 issue of the New Yorker, in an article titled Super Sam. After quoting a minor-league...

Read more »

A word of warning about grep, which and the like

July 13, 2011

I’ve often selected columns or rows of a data frame using grep or which, based on some property. That is inherently sound, but the trouble comes when you wish to remove rows or columns based on that grep or which call, e.g., which would remove columns with a .1 in the name. This is fine

Read more »
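A well-known pitfall with this pattern is sketched below. It may or may not be exactly the case the post goes on to describe, and the data frame and patterns are made up for illustration.

```r
df <- data.frame(a = 1:3, b = 4:6, b.1 = 7:9)

# Works when there is a match: drops the column named "b.1".
drop_idx <- grep("\\.1$", names(df))
ok <- df[, -drop_idx, drop = FALSE]

# No match: grep() returns integer(0), and -integer(0) is still integer(0),
# so the subset selects no columns instead of keeping all of them.
none_idx <- grep("\\.2$", names(df))
oops <- df[, -none_idx, drop = FALSE]

# A logical mask from grepl() behaves correctly in both cases.
safe <- df[, !grepl("\\.2$", names(df)), drop = FALSE]

print(names(ok))
print(ncol(oops))
print(names(safe))
```

The same applies to `which()`: negating an empty index vector removes everything, so a logical condition is the safer way to exclude rows or columns.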

Plotting git statistics

July 13, 2011

Here’s a funny story – a friend of mine, an avid gamer at the time, was going downhill on a bicycle when a wonderful idea flashed through his mind: I need to save the current status… Just in case I crash, I will start again from the top of the hill. If you are a developer (quantitative or

Read more »

SAS, R and categorical variables

July 13, 2011

One of the disappointing problems in SAS (as I need PROC MIXED for some analysis) is to recode categorical variables to have a particular reference category. In R, my usual tool, this is rather easy both to set and to modify using the relevel command available in base R (in the stats package). My understanding

Read more »
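As a quick illustration of the R side, here is a minimal `relevel` sketch. The factor and its level names are invented for the example.

```r
# Hypothetical treatment factor; levels default to alphabetical order.
treatment <- factor(c("placebo", "low", "high", "low", "placebo"))
levels(treatment)   # "high" "low" "placebo"

# Make "placebo" the reference category; the other levels keep their order.
treatment2 <- relevel(treatment, ref = "placebo")
levels(treatment2)  # "placebo" "high" "low"
```

Model-fitting functions such as `lm` use the first level as the baseline under the default treatment contrasts, so coefficients would then be reported relative to `placebo`.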

Measuring the EIU Democracy Index (with Polity IV)

July 12, 2011

Yet again, I have conjured up an (academically) unusual dataset on democracy! This time it’s the Economist Intelligence Unit’s Democracy Index, a weird little gem. The dataset is the basis for a paper the Economist publishes every two years. Because it is biennial, there is data estimating the “Democratic-ness” of the world’s countries for 2006,

Read more »

A surprising(?) prediction about the S&P 500

July 12, 2011

Financial analyst Greg Troccoli was a lone wolf when he predicted in July 2010 that “If the Index held at or above our proprietary support zone (1000.00- 950.00 region), it would eventually trade to a new historical high within 12 - 18 months (July- December 2011 timeframe)”. For reference, the S&P500 all-time high was 1565.15, and it closed...

Read more »

About Fig. 4 of Fagundes et al. (2007)

July 12, 2011

Yesterday, we had a meeting of our EMILE network on statistics for population genetics (in Montpellier) and we were discussing our respective recent advances in ABC model choice. One of our colleagues mentioned the constant request (from referees) to include the post-ABC processing devised by Fagundes et al. in their 2007 ABC paper. (This paper

Read more »

I wish I knew everything about R. I wish I could vectorise in my…

July 12, 2011

I wish I knew everything about R. I wish I could vectorise in my sleep. I wish there were perfect R packages out there to solve all my data transformation problems. I wish there were perfect data. If I were Paul Graham, would I ever write code like the...

Read more »

Yet another reason to avoid loops in R

July 12, 2011

In some previous posts I have mentioned my struggles with the performance of the computations needed to implement the ARMA strategies in practice. Finally I have found a worthy solution, and as usual, there is a programming pattern to learn from it – avoid loops in R. My first approach was to optimize the algorithms.

Read more »
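To make the general pattern concrete, here is a generic loop-versus-vectorised comparison. It is not the author's ARMA code; the simulated price series and the log-return calculation are stand-ins chosen only to show the idea.

```r
set.seed(42)
# Hypothetical price series: a random walk in multiplicative steps.
prices <- cumprod(c(100, 1 + rnorm(999, sd = 0.01)))

# Loop version: compute each log return one element at a time.
loop_ret <- numeric(length(prices) - 1)
for (i in seq_along(loop_ret)) {
  loop_ret[i] <- log(prices[i + 1]) - log(prices[i])
}

# Vectorised version: one call does the whole series.
vec_ret <- diff(log(prices))

# Both approaches give the same result.
print(isTRUE(all.equal(loop_ret, vec_ret)))
```

The vectorised form pushes the iteration into compiled code, which is usually where the speed-up comes from.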

What is your favorite R feature?

R (www.r-project.org) is a free and strongly functional language and environment for statistical computing. You can explore data sets, make graphical displays of data, run statistical simulations and much more. If you have never used R you should give it a try! R beginners: There are many courses, slides and tutorials available for R beginners. We

Read more »

RTextTools Improvements Underway

Since RTextTools' unveiling at the 2011 Cap Conference in Catania, the development team has been busy working on refinements to the package. This includes a number of changes to simplify the API, improve analytics, decrease memory use, and increase functionality. We've added support for another low-memory algorithm (GLMNET) in addition to the

Read more »

Drawdown Control Can Also Determine Ending Wealth

July 11, 2011

As an extension to yesterday’s post Just Arriving is Not Enough, I wanted to show how minimizing drawdown is a much better technique to help control comfort and potentially increase ending wealth.  CHTTX was one of the best performers of the fou...

Read more »

R from source

July 11, 2011

The following are notes for myself. I like to use the bleeding edge version of R:

    svn checkout https://svn.r-project.org/R/trunk/ r-devel
    cd r-devel
    ./tools/rsync-recommended
    ## use the following to update sources:
    svn update
    ## pre-reqs
    sudo apt-get build-dep r-base
    #sudo apt-get install gcc g++ gfortran libreadline-dev libx11-dev xorg-dev
    #sudo apt-get install texlive texinfo
    ./configure
    make
    sudo...

Read more »

In case you missed it: June Roundup

July 11, 2011

In case you missed them, here are some articles from June of particular interest to R users. Highlights of presentations from the R/Finance 2011 conference. Trulia uses R and statistical models to map local crime. Resources for data mining with R. K-means clustering on large data sets with the RevoScaleR package. Revolution Analytics' CTO David Champagne writes on real-time...

Read more »

The foundations of Statistics: a simulation-based approach

July 11, 2011

“We have seen that a perfect correlation is perfectly linear, so an imperfect correlation will be ‘imperfectly linear’.” (page 128) This book has been written by two linguists, Shravan Vasishth and Michael Broe, in order to teach statistics “in areas that are traditionally not mathematically demanding” at a deeper level than traditional textbooks “without using

Read more »

Sir Sun Drop

July 11, 2011

Okay so one of my best friends Sir Kris "Wespro" Wesslen has started a new blog and I think it's so hilariously decked out with pompous amounts of hilarity that even a blind and brainless mouse would chuckle out of amusement. Please check it out here. ...

Read more »

XLConnect 0.1-5

July 11, 2011

Mirai Solutions GmbH (http://www.mirai-solutions.com) is pleased to announce the availability of XLConnect 0.1-5. This release adds the following new features: Support for setting/getting cell formulas. See methods set/getCellFormula. Support for setting/getting the force formula recalculation flag on worksheets. See methods …

Read more »