Patricia and I have cleaned up some of the R and Bugs code and collected the data for almost all the examples in ARM. See here for links to zip files with the code and data....

Drew Conway, PhD student in NYU's Department of Politics, provides an introduction to mining social graph data from the Internet that focuses on the technical, substantive and ethical concerns related to this type of analysis.

Armadillo is a C++ linear algebra library aiming towards a good balance between speed and ease of use. Integer, floating-point and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matr...

A longer run of yesterday's R code with a million sudokus produced the following qqplot. It does look OK but not perfect. In fact, it looks very much like yesterday's graph, even though it is based on a 100-fold increase in the number of simulations. Now, if I test the goodness of fit with a basic chi-square...
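
A basic chi-square goodness-of-fit test compares the observed count in each category with the count expected under the hypothesised distribution. Here is a minimal sketch in Python. The post itself used R, and its test code is cut off in this excerpt. The counts below are made-up placeholders, not the million-sudoku results. The sketch stops at the statistic and its degrees of freedom and leaves the p-value to a chi-square table or a library routine.

```python
def chi_square_statistic(observed, expected_probs):
    """Pearson chi-square statistic and degrees of freedom for observed
    category counts against hypothesised category probabilities."""
    if len(observed) != len(expected_probs):
        raise ValueError("observed and expected_probs must have the same length")
    total = sum(observed)
    stat = 0.0
    for obs, p in zip(observed, expected_probs):
        expected = total * p  # expected count for this category under the null
        stat += (obs - expected) ** 2 / expected
    dof = len(observed) - 1  # assumes no parameters were estimated from the data
    return stat, dof


# Made-up counts for three categories, tested against a uniform null.
stat, dof = chi_square_statistic([30, 40, 30], [1 / 3, 1 / 3, 1 / 3])
print(f"chi-square = {stat:.3f} on {dof} degrees of freedom")
```

The statistic is then compared with a chi-square distribution on `dof` degrees of freedom. That approximation is usually trusted only when every expected count is reasonably large; a common rule of thumb is at least 5.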

Update x6 (Jul 27): so I guess people want pitch counts. The data at MLB seem to give only the pitch count of the end result and the strikes/balls/outs of the particular pitch. Of course, you can combine them to get the pitch count. Stupid WordPress comments strip out the HTML needed to display code properly, ...

Most embarrassingly, Liaosa Xu from Virginia Tech sent the following email almost a month ago and I forgot to reply: I have a question regarding your example 7.11 in your book Introducing Monte Carlo Methods with R. To further decompose the uniform simulation by sampling a and b step by step, how you determine the...

More playing around with R. To create the graph above, I sampled 100 times from two different normal distributions, then plotted the proportion of times that the first distribution beat the second one on the y-axis. The second distribution always had a mean of 0, while the mean of the first distribution went from 0 to 4, ...
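
The simulation described above can be sketched as follows. This is a Python translation, not the author's R code, and it rests on assumptions the post does not state:

- both distributions have standard deviation 1;
- "beating" means a strictly larger draw;
- the first mean is stepped in increments of 0.5.

The function name `win_proportion` is hypothetical.

```python
import random


def win_proportion(mean_a, mean_b=0.0, sd=1.0, n=100, rng=random):
    """Fraction of n paired draws in which a draw from N(mean_a, sd)
    is strictly larger than a draw from N(mean_b, sd)."""
    wins = 0
    for _ in range(n):
        if rng.gauss(mean_a, sd) > rng.gauss(mean_b, sd):
            wins += 1
    return wins / n


rng = random.Random(1)  # fixed seed so the run is reproducible
means = [i * 0.5 for i in range(9)]  # 0.0, 0.5, ..., 4.0 (assumed step size)
for m in means:
    print(f"mean {m:.1f}: first distribution wins {win_proportion(m, rng=rng):.2f}")
```

With equal standard deviations σ, the exact probability that the first draw is larger is Φ((μ₁ − μ₂)/(σ√2)). This gives a smooth reference curve to overlay on the simulated proportions, which with only 100 draws per point will be noisy.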

I hadn't heard of the CloudAsia 2010 conference before, but from the programme the workshop Master Class on HPC Application For Life Sciences looked like it was interesting. One workshop session in particular caught my eye: Practical Parallel Computing in R by Xie Chao and Tan Tin Wee (from the National University of Singapore). The workshop notes (PDF) provide...

Developing web-friendly data visualizations is not very difficult, though as far as I know, no package exists that lets one do this directly in R (e-mail me if you know of one). Since I develop lots of data-oriented software tools, it's always nice to post visualizations online. To facilitate...

My wife Mary, my dad Wesley and I took a hike this weekend (5/14/10) to the House Mountain state recreation area in Knox County, Tennessee. The hike was about 3.8 miles, with a total elevation gain of around 1000 feet (940.23 ft by GPS). The plot below gives the elevation profile over the course of...
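
One common way to get a total-gain figure from a GPS track is to add up only the climbs between consecutive elevation readings and ignore the descents. The Python sketch below shows that arithmetic on invented readings. The post does not say how the GPS software arrived at 940.23 ft, and real tools often smooth the track first, which this sketch skips. The function name is hypothetical.

```python
def elevation_gain(elevations):
    """Total climb: sum of the positive differences between consecutive readings."""
    return sum(max(b - a, 0.0) for a, b in zip(elevations, elevations[1:]))


# Invented readings in feet, purely for illustration.
profile = [1200.0, 1260.0, 1240.0, 1390.0, 1385.0, 1500.0]
print(elevation_gain(profile))
```

Because small up-and-down jitter in raw GPS altitude adds to the total, a smoothed profile will generally report a smaller gain than the raw one.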

Robin Ryder pointed out to me that 3 is indeed the absolute minimum one could observe because of the block constraint (good grief, of course!). Since the series of 3 digits are independent across blocks, the theoretical distribution under uniformity can easily be simulated:
# uniform distribution on the block diagonal
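
This sketch rests on the independence-across-blocks assumption stated above. Each of the three 3×3 blocks crossed by the main diagonal contributes 3 distinct digits drawn uniformly from 1 to 9, and the quantity of interest is how many different digits appear among the 9 diagonal cells. The Python version below is my own reading of that setup, not the original R code, which is truncated here. The function name and the number of replications are arbitrary choices.

```python
import random
from collections import Counter


def diagonal_distinct_counts(n_sims=100_000, seed=0):
    """Simulate the number of distinct digits on a sudoku's main diagonal,
    assuming each diagonal 3x3 block contributes 3 distinct digits drawn
    uniformly from 1..9, independently of the other blocks."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        diagonal = []
        for _block in range(3):
            # Within one block the digits are all different, hence sampling without replacement.
            diagonal.extend(rng.sample(range(1, 10), 3))
        counts[len(set(diagonal))] += 1
    return counts


counts = diagonal_distinct_counts()
total = sum(counts.values())
for k in sorted(counts):
    print(k, counts[k] / total)
```

The simulated values can only range from 3 (all three blocks reuse the same three digits) to 9. These frequencies are what a chi-square comparison against the observed sudokus would use as expected probabilities.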

Romain and I are happy to announce the release of Rcpp version 0.8.0. It has been uploaded to CRAN. A Debian upload is delayed until the now-required inline package is accepted into Debian. The source package is also available from here. This release ...

(This article was first published on Rmetrics blogs, and kindly contributed to R-bloggers.)

For those not familiar with Major League Baseball in the US (and despite living here for more than 10 years, I still include myself in that category), games are usually played in series: team A visits the home of team B, and the two teams play two or more games against each other on successive days. It's common wisdom...

The R-Sessions are a series of blog entries on using R. A large part consists of an R manual I once wrote. Other posts include some tricks I found out, as well as entries detailing functions and packages I wrote for ...

Long-time readers of the Stubborn Mule will know that charts are a regular feature here. Almost all of these charts were produced using the R statistical software package, which, in my view, produces far superior results to the most commonly used graphing tool: Excel. As a community service to help rid the world of horrible...

After thinking about random sudokus for a few more weeks, I eventually read the paper by Newton and DeSalvo about the entropy of sudoku matrices. As written earlier, if we consider (as Newton and DeSalvo do) a uniform distribution where the sudokus are drawn uniformly over the set of all sudokus, the entropy of...
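
For reference, under a uniform distribution over a finite set S of sudoku grids, the entropy reduces to the logarithm of the number of grids. That value is also the largest entropy any distribution on S can have:

```latex
H(p) = -\sum_{s \in \mathcal{S}} p(s)\,\log p(s), \qquad p(s) = \frac{1}{|\mathcal{S}|}
\;\Longrightarrow\;
H = -\sum_{s \in \mathcal{S}} \frac{1}{|\mathcal{S}|}\,\log\frac{1}{|\mathcal{S}|} = \log |\mathcal{S}|
```

With the known count of 6,670,903,752,021,072,936,960 valid 9×9 grids, this comes to about 72.5 bits (log base 2). Any non-uniform way of generating sudokus has strictly lower entropy.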

In this post I present a 34-minute video on using R. The video is based on an analysis of 1924 to 2006 Winter Olympic Medals that I presented previously in text form. The video aims to show what an interactive session in R might look like using ...

Reza Seirafi from Virginia Tech sent me the following email about Bayesian Core, which, alas, points out a real typo in the reversible jump acceptance probability for the mixture model: With respect to the expression provided on page 178 for the acceptance probability of the split move, I was wondering if the omission of...

There are situations in regression modelling where robust methods could be considered to handle unusual observations that do not follow the general trend of the data set. There are various packages in R that provide robust statistical methods, which are summarised on the CRAN Robust Task View. As an example of using robust statistical estimation in...
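
The sketch below illustrates the general idea. It is not the analysis in the post, which is cut off here, and it does not reproduce any particular CRAN package. It fits a straight line by Huber M-estimation using iteratively reweighted least squares, written in Python. Assumptions of this sketch:

- the tuning constant is k = 1.345, a conventional choice for Huber's weights;
- the residual scale is re-estimated each iteration as median(|r|)/0.6745;
- there is a single predictor.

Both function names are hypothetical.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


def huber_line_fit(x, y, k=1.345, n_iter=50, tol=1e-10):
    """Huber M-estimate of a straight line via iteratively reweighted least squares."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from ordinary least squares
    for _ in range(n_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale, rescaled so it estimates the sd under normal errors.
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0:
            break  # (near-)exact fit: nothing left to reweight
        # Huber weights: full weight for small residuals, k*scale/|r| for large ones.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        a_new, b_new = weighted_line_fit(x, y, w)
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


x = list(range(10))
y = [1.0 + 2.0 * xi for xi in x]
y[9] = 60.0  # a single gross outlier (the clean value would be 19)
print("OLS:  ", weighted_line_fit(x, y, [1.0] * len(x)))
print("Huber:", huber_line_fit(x, y))
```

A single outlier drags the ordinary least-squares slope well away from 2. The Huber fit caps that point's influence, so its estimate stays close to the line followed by the other nine points.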