An R interface to the Google Prediction API

December 10, 2010

At the New York R User Group* last night, 100 R users heard Ni Wang and Max Lin explain how "R is one of the important tools used by analysts and engineers at Google for analyzing data". During the talk, Lin revealed that Google plans to make "R more integrated with internal machine learning algorithms and infrastructure", and...

Interesting volatility measurement

December 10, 2010

A long time ago I stumbled across an interesting volatility measurement at quantifiableedges.blogspot.com. The idea is the following: take the 3-day historical volatility of the S&P 500 index and divide it by the 10-day historical volatility. Then mark all points where the ratio is less than 0.25 and measure the volatility of the 3 following days. On average, the volatility of the following 3 days
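The ratio described in this teaser can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the prices are simulated rather than real S&P 500 closes, "historical volatility" is taken to be the rolling standard deviation of daily log returns, and the helper roll_sd is a hypothetical name, not something from the original post.

```r
set.seed(1)

# Simulated daily closes stand in for the S&P 500 (hypothetical data).
close <- 1000 * exp(cumsum(rnorm(500, mean = 0, sd = 0.01)))
ret <- diff(log(close))  # daily log returns

# Rolling standard deviation over the last n returns; NA until n values exist.
roll_sd <- function(x, n) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= n) out[i] <- sd(x[(i - n + 1):i])
  }
  out
}

hv3   <- roll_sd(ret, 3)    # assumed definition of 3-day historical volatility
hv10  <- roll_sd(ret, 10)   # assumed definition of 10-day historical volatility
ratio <- hv3 / hv10

# Days where short-term volatility is below a quarter of the 10-day figure.
signal <- which(ratio < 0.25)
# Keep only signals that have three full days of returns after them.
signal <- signal[signal + 3 <= length(ret)]

# Volatility of the three days following each signal day.
next_vol <- vapply(signal, function(i) sd(ret[(i + 1):(i + 3)]), numeric(1))
```

Comparing mean(next_vol) with mean(hv3, na.rm = TRUE) would then show whether the three days after a signal are calmer or rougher than usual in the simulated series. With random data no particular result should be expected.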

R: Basic R Skills – Splitting and Plotting

December 10, 2010

I am giving a short R course next year, so I am going to make a series of blog posts to help get my thoughts and example code in order. The aim is to introduce people with little or no experience of R to the language with self contained examp...

Once again, chart critics and graph gurus welcome

December 10, 2010

HOW TO DISPLAY A LINE PLOT WITH COUNT INFORMATION? In a previously-mentioned paper Sharad and your DSN editor are writing up, there is the above line plot with points. The area of each point shows the count of observations. It’s done in R with ggplot2 (hooray for Hadley). We generally like this type of plot,

Truly random [again]

December 9, 2010

“The measurement outputs contain at the 99% confidence level 42 new random bits. This is a much stronger statement than passing or not passing statistical tests, which merely indicate that no obvious non-random patterns are present.” arXiv:0911.3427 As often, I bought La Recherche in the station newsagent for the wrong reason! The cover of the

Illustrating CFAs – Graphviz

December 9, 2010

So after yesterday's post you probably ran this fancy new confirmatory factor analysis (CFA) – showed your friends all the cool fit stats and… nothing. As important as doing things right is being able to let others know that. For CFA the method of choice to illustrate the connections between variables are path diagrams these

Choosing colors for your charts with RColorBrewer

December 9, 2010

If you're creating a bar chart in R, how do you decide what colors the bars should be? Or if you're creating an image plot, what range of colors should you use? The colors you choose not only affect the viewer's interpretation of the graphic, they also determine its aesthetic appeal. That's where the RColorBrewer package...

Learning R

December 9, 2010

I have had to be primarily self-taught in R and I still have a long way to go. I like R way better than SAS, but the documentation in SAS is way better (that's what happens when you pay people to do it full time). However, there are innumera...

New version of solaR (0.21)

December 9, 2010

The version 0.21 of the solaR package is now available at CRAN. This package provides a set of calculation methods of solar radiation and performance of photovoltaic systems. The package has been uploaded to CRAN under the GPL-3 license. solaR is now able to calculate from both daily and sub-daily irradiation values. Besides, there are

All together now – Confirmatory Factor Analysis in R

December 8, 2010

Describing multivariate data is not easy, especially if you think that statisticians have not developed any new tools since ANOVA and principal component analysis (PCA). For social and experimental scientists the most important new techniques are structural equation models that combine measurement models (that substitute for reliability analysis and PCA) and structural models (that substitute

Slides from Revolution R: 100% R and More

December 8, 2010

If you missed today's webcast on Revolution R Enterprise: 100% R and more, the slides from the presentation are now available for download, and a replay of the webcast (in WMV format) will be available at that same link very soon. And if you missed some of the links I mentioned in the presentation, here they are for your...

Interesting Posts at Rational Past Time Related to My Previous Strike Zone Map Post

December 8, 2010

J-Doug at Rational Pastime has some cool posts looking at umpire strike zones at his site (and cross-posted at Beyond the Boxscore). I was curious about this issue as well with some work I've been doing here in the office (which I'll refrain from talk...

New paper: Survival analysis

December 8, 2010

Each year I try to carry out some statistical consultancy to give me experience in other areas of statistics and also to provide teaching examples. Last Christmas I was approached by a paediatric consultant from the RVI who wanted to carry out prospective survival analysis. The consultant, Bruce  Jaffray, had performed Nissen fundoplication surgery on

cumsum ( rnorm(50), lend="butt", lwd=12, type="h" ) Cumulative…

December 8, 2010

cumsum ( rnorm(50), lend="butt", lwd=12, type="h" ) Cumulative sum of 50 draws from a normal distribution. File this under mysteries of the Central Limit Theorem.
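As printed, the call hands plotting arguments to cumsum(), which accepts only a single vector, so it would not run. A plausible runnable reading, assuming the intent was to draw the cumulative sum as thick vertical bars, wraps it in plot(). The seed and the variable names draws and walk are additions for illustration, not part of the original post.

```r
set.seed(42)  # added so the picture is reproducible

draws <- rnorm(50)     # 50 draws from a standard normal distribution
walk  <- cumsum(draws) # running total of the draws

# type = "h" draws a vertical line from zero to each value;
# lwd = 12 makes the lines bar-like and lend = "butt" squares off their ends.
plot(walk, type = "h", lwd = 12, lend = "butt",
     xlab = "Draw", ylab = "Cumulative sum")
```

Rerunning without set.seed() gives a different path each time, which is the point of the exercise.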

Fantasy football (oops, soccer)

December 8, 2010

Recently a colleague asked if I could use R/statistics to form a dream soccer team from a pool of soccer players, given basic player information like name, club, cost, points. The idea is to form a team with your preferred configuration of number of def...

R: Using RColorBrewer to colour your figures in R

December 8, 2010

RColorBrewer is an R package that uses the work from http://colorbrewer2.org/ to help you choose sensible colour schemes for figures in R. For example, if you are making a boxplot with eight boxes, what colours would you use, or if you are drawing...
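A minimal sketch of the eight-box case, assuming the RColorBrewer package is installed: "Set2" is one of its qualitative palettes and supports up to eight colours. The fallback to base R's hcl.colors() and the simulated data are assumptions added so the snippet runs even without the package; they are not from the original post.

```r
# Pick eight qualitative colours; fall back to base R if RColorBrewer is absent.
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  cols <- RColorBrewer::brewer.pal(8, "Set2")
} else {
  cols <- hcl.colors(8, "Set 2")  # base R (>= 3.6.0) stand-in, not ColorBrewer
}

# Simulated data: eight groups of 30 observations each (hypothetical).
set.seed(8)
groups <- factor(rep(LETTERS[1:8], each = 30))
values <- rnorm(length(groups), mean = as.integer(groups))

# One colour per box, in group order.
boxplot(values ~ groups, col = cols, xlab = "Group", ylab = "Value")
```

A qualitative palette suits unordered groups like these. For ordered categories, a sequential palette such as "Blues" would be the more natural choice.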

Google AI Challenge: Scores/Rank by Language

December 8, 2010

A quick follow up to the previous post, about the scores in the 2010 Google AI competition relative to programming language. The chart above makes each language visible and discrete - and the scales are the same. library(ggplot2) df <- read.c...

inline 0.3.8

December 7, 2010

Romain pushed version 0.3.8 of inline to CRAN earlier today, and I just updated the Debian package. This version adds an internal performance enhancement which is obtained by making do with fewer reads. The short NEWS file entry follows: 0.3.8 2...

Big Data Logistic Regression with R and ODBC

December 7, 2010

Recently I've been doing a lot of work with predictive models using logistic regression. Logistic regression is great for determining probable outcomes of a binary target variable. R is a great tool for accomplishing this task...
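For readers new to the technique, here is a small sketch of fitting a logistic regression in base R with glm(). It assumes simulated in-memory data rather than a table pulled over ODBC, and the variable names and coefficients are made up for illustration; it is not the workflow from the post itself.

```r
set.seed(123)

# Simulated data: one numeric predictor and a 0/1 target (hypothetical).
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
dat <- data.frame(x = x, y = y)

# family = binomial gives logistic regression (logit link by default).
fit <- glm(y ~ x, data = dat, family = binomial)

# Predicted probabilities of y = 1 for a few new values of x.
new_x <- data.frame(x = c(-1, 0, 1))
p_hat <- predict(fit, newdata = new_x, type = "response")
```

Calling summary(fit) shows the estimated coefficients on the log-odds scale; type = "response" in predict() converts them back to probabilities.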

R Workflow

December 7, 2010

When working with R you end up using a large number of datasets, packages, functions, objects, output files, workspaces, etc.  It can get a bit overwhelming trying to keep everything organized.  That is why a consistent, well-organized workf...

Bayesian model selection

December 7, 2010

Last week, I received a box of books from the International Statistical Review, for reviewing them. I thus grabbed the one whose title was most appealing to me, namely Bayesian Model Selection and Statistical Modeling by Tomohiro Ando. I am indeed interested in both the nature of testing hypotheses or more accurately of assessing models,