We recently created an AMI for Amazon's EC2 cloud computing service. Users with AWS accounts can access the public AMI by searching for ami-817eb8e8. The AMI is based on Drew Conway's excellent AMI, but with R 2.13 loaded and RTextTools and

At the JSM conference last week, I stopped by a great poster by Steve Salaga and Brian Mills, graduate students at University of Michigan's Department of Sport Management. The guys were clearly hockey fans, and had channelled their enthusiasm for a sport into an interesting statistical analysis of game and player data from the NHL. One analysis, based on...

It is well known that linkage disequilibrium (LD) decays with distance. Several functions have been proposed to estimate such decay. Among the most widely used are the Hill and Weir (1) formula for describing the decay of r² and a formula proposed by Abecasis (2) for describing the decay of D’. I wrote R functions
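
The post's own R functions are not shown in this excerpt. As a rough illustration only, here is a Python sketch of the Hill and Weir expectation of r² in the form it is commonly quoted, E(r²) = [(10 + C) / ((2 + C)(11 + C))] · [1 + (3 + C)(12 + 12C + C²) / (n(2 + C)(11 + C))], with C = 4·Ne·c. The function name `expected_r2` and the argument names are assumptions made for this example, not names from the post.

```python
def expected_r2(c, ne, n):
    """Expected r^2 under the commonly quoted Hill and Weir formula.

    c  -- recombination fraction between the two loci (assumed input)
    ne -- effective population size (assumed input)
    n  -- sample size, i.e. number of sampled chromosomes (assumed input)
    """
    rho = 4.0 * ne * c  # population-scaled recombination parameter C
    # Large-sample expectation of r^2.
    base = (10.0 + rho) / ((2.0 + rho) * (11.0 + rho))
    # Finite-sample correction factor.
    correction = 1.0 + ((3.0 + rho) * (12.0 + 12.0 * rho + rho ** 2)) / (
        n * (2.0 + rho) * (11.0 + rho)
    )
    return base * correction


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the decay.
    for c in (0.0, 0.001, 0.01):
        print(c, expected_r2(c, ne=1000, n=100))
```

In practice Ne is usually treated as the unknown and estimated by nonlinear least squares against observed r² values; that fitting step is not shown here.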

The first 6 trading days of August have been bad for the major indices, but how variable is that across portfolios? To answer that, two sets of random portfolios were generated from the constituents of the S&P 500. The trading days are 2011 August 1 — 5 and 8. The returns of the indices for … Continue reading...

For whatever reason, Apple decided not to include OpenMPI in Mac OS X Lion (it was supported in Leopard and Snow Leopard). I found this out the hard way after doing a clean install of Lion. Here are steps to install OpenMPI and get it working with the Rmpi package in R. One benefit of

At last month's R user group meeting in Melbourne, the theme was "Experiences with using SAS and R in insurance and banking". There, Hong Ooi from ANZ (Australia and New Zealand Banking Group) gave a presentation on "Experiences with using R in credit risk". I didn't get to see the presentation myself, but the slides tell a great story...

Following a few entries on sports here and there, I was wondering what kind of law the running records follow with respect to distance. The data are available on Wikipedia, or here for a tidied version. It collects 18 distances, from 100 meters to 100 kilometers. A log-log scale is in order: It is nice
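
A straight line on a log-log scale corresponds to a power law, time = a · distance^b. A minimal Python sketch of that fit follows, using ordinary least squares on the logged values. The function name `fit_power_law` is an assumption, and the demo data are synthetic numbers generated from an exact power law, not the actual running records.

```python
import math


def fit_power_law(distances, times):
    """Fit time = a * distance**b by least squares on the log-log scale."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope on the log-log scale
    a = math.exp(mean_y - b * mean_x)  # intercept, back-transformed
    return a, b


if __name__ == "__main__":
    # Synthetic illustration only: made-up values, NOT real record times.
    demo_distances = [100.0, 1000.0, 10000.0, 100000.0]
    demo_times = [0.08 * d ** 1.1 for d in demo_distances]
    print(fit_power_law(demo_distances, demo_times))
```

An exponent b above 1 would mean that time grows faster than distance, in other words that average speed drops as the race gets longer.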

The Open Governance Index is a new measure developed by VisionMobile that rates open-source projects on their governance process. The index has four facets, described thoroughly in the "Open Governance Index" publication, and briefly below. access - These criteria assess the availability of source code, a permissive license, developer support mechanisms, a roadmap, and openness

We have consistently enjoyed and been impressed by the wealth of R wisdom available on the R-bloggers aggregation site. Therefore Win-Vector LLC is granting the right to reformat and redistribute (with attribution and link) our blog's R content in the R-bloggers site and feeds. We hope to see our R content shared through this network.

When dealing with transaction cost analysis, a stock’s volume is assumed to be stable or foreseeable. However, the picture is different when we are dealing with an illiquid stock. It is relatively easy to forecast the volume of a liquid stock, because trading volume has high autocorrelation – the volumes at t and t+1 are correlated. For
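
To make the autocorrelation claim concrete, here is a small Python sketch of the usual sample autocorrelation at a given lag. The function name `autocorrelation` and the toy series are assumptions for illustration; no real volume data is used.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Covariance between the series and its lagged copy, same normalisation
    # as the variance term in the denominator.
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


if __name__ == "__main__":
    # Toy series only: a smooth trend versus a strictly alternating one.
    print(autocorrelation([1, 2, 3, 4, 5]))
    print(autocorrelation([1, -1, 1, -1]))
```

A value near 1 at lag 1 means the volume at t is a good predictor of the volume at t+1; a value near 0 means it carries little information.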

Wikimania 2011 came to a close yesterday. For those of you unfamiliar with Wikimania it may be described as UseR for Wikipedia, Wikimedia and MediaWiki all rolled into one. The conference brings together staff, volunteer editors, volunteer developers and users of MediaWiki projects. Of specific interest to R Bloggers readers may be the sessions on…

Introduction. Effect estimation is an important task in modern research. An example is the identification of risk factors for disease and the qualification of medical treatments. Usually, researchers are interested in estimating the global, common effect. Since actual effects tend to differ across populations, estimates based on a sample of a particular population seldom generalize well.

Usability. I am not an expert in Human-Computer Interaction (HCI) at all. Worse, I make the crappiest looking interfaces, typically. So, that's said. Usability. Wikipedia writes that "Usability is the ease of use and learnability of a ...

My last two posts have been about mixture models, with examples to illustrate what they are and how they can be useful. Further discussion and more examples can be found in Chapter 10 of Exploring Data in Engineering, the Sciences, and Medicine. One important topic I haven’t covered is how to fit mixture models to datasets like the Old Faithful geyser...

Programmers should definitely know how to use R. I don’t mean they should switch from their current language to R, but they should think of R as a handy tool during development. Again and again I find myself working with Java code like the following.

I got a paper (unavailable online) to referee about testing for the order (i.e. the number of components) of a normal mixture. Although this is an easily spelled problem, namely estimate k in I came to the conclusion that it is a kind of ill-posed problem. Without a clear definition of what a component is,

Revolution Analytics is hosting several hands-on R training classes over the next few months, with in-person instruction from two leading package authors and experts from the R community. Diethelm Würtz from ETH Zurich will give a two-day master class on Portfolio Selection and Optimization in Practice. Prof Würtz leads the Rmetrics project, and will provide in-depth instruction on using...

Several years ago Gerd Gigerenzer wrote: “Statistical rituals largely eliminate statistical thinking in the social sciences. Rituals are indispensable for identification with social groups, but they should be the subject rather than the procedure of science. Statistical rituals largely eliminate … Continue reading →

Ever have a regression model where the coefficients don't make sense? I've been trying to predict electricity and gas consumption from daily activity schedules but a simple linear regression kept saying that demands should go down the more an activity is performed. Fortunately I found the nnls package and show here how you can use it to...
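
The post itself relies on the R nnls package, and its code is not part of this excerpt. To show the idea of constraining coefficients to be non-negative, here is a self-contained Python sketch that solves the same kind of problem with a different, simpler technique: projected gradient descent. The function name `nnls_projected_gradient` and the toy inputs are assumptions for illustration.

```python
def nnls_projected_gradient(A, b, iterations=5000):
    """Approximately minimise ||A x - b||^2 subject to x >= 0.

    Uses projected gradient descent, NOT the algorithm of the R nnls package.
    A is a list of rows, b a list of observations.
    """
    rows, cols = len(A), len(A[0])
    # The sum of squared entries equals trace(A^T A), an upper bound on the
    # largest eigenvalue of A^T A, so 1 / bound is a safe step size.
    bound = sum(v * v for row in A for v in row)
    if bound == 0:
        return [0.0] * cols
    step = 1.0 / bound
    x = [0.0] * cols
    for _ in range(iterations):
        residual = [sum(A[i][j] * x[j] for j in range(cols)) - b[i] for i in range(rows)]
        grad = [sum(A[i][j] * residual[i] for i in range(rows)) for j in range(cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * grad[j]) for j in range(cols)]
    return x


if __name__ == "__main__":
    # Toy problem: plain least squares would give a negative second coefficient.
    print(nnls_projected_gradient([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0]))
```

Coefficients that an unconstrained regression would push below zero are held at zero instead, which is the behaviour wanted when every activity can only add to consumption.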

While my time at the 2011 Joint Statistical Meetings was short--I unfortunately missed some presentations I would have liked to attend--it was a great experience. The collection of academics and professionals is very different from the other con...