Updated R code and data for ARM

May 19, 2010

Patricia and I have cleaned up some of the R and Bugs code and collected the data for almost all the examples in ARM. See here for links to zip files with the code and data....

Read more »

Mining and Analyzing Online Social Graph Data

May 19, 2010

Drew Conway, PhD student in NYU's Department of Politics, provides an introduction to mining social graph data from the Internet that focuses on the technical, substantive and ethical concerns related to this type of analysis.

Read more »

Random [uniform?] sudokus [corrected]

May 19, 2010

As the discrepancy in the sum of the nine probabilities seemed too blatant to be attributed to numerical error given the problem scale, I went and checked my R code for the probabilities and found a choose(9,3) instead of a choose(6,3) in the last line… The fit between the true distribution and the

Read more »

RcppArmadillo 0.2.1

May 19, 2010

Armadillo is a C++ linear algebra library aiming towards a good balance between speed and ease of use. Integer, floating point and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matr...

Read more »

Random [uniform?] sudokus

May 19, 2010

A longer run of yesterday's R code with a million sudokus produced the following qqplot. It does look ok but not perfect. Actually, it looks very much like yesterday's graph, although based on a 100-fold increase in the number of simulations. Now, if I test the goodness of fit with a basic chi-square

Read more »

LSPM Joint Probability Tables

May 18, 2010

I've received several requests for methods to create joint probability tables for use in LSPM's portfolio optimization functions.  Rather than continue to email this example to individuals who ask, I post it here in hopes they find it via a Google...

Read more »

MLB Baseball Pitching Matchups ~ downloading pitch f/x data using the XML package in R [updatedx6]

May 18, 2010

Update x6 (Jul 27): so I guess people want pitch counts. The data @ MLB seems to only give the pitch count of the end result and the strikes/balls/outs of the particular pitch. Of course you can combine them to get the pitch count. Stupid WordPress comments strip out necessary HTML to properly display code,

Read more »

robot (SPX) DNA Management Techniques

May 18, 2010

Yes, this is related to trading, but no, it is not my thesis on why the Euro is going to parity. Instead, it is sort of a workshop for robot(SPX) developers on how to organize their digital DNA. As you begin to use programming as a money extraction tool on the markets, you'll soon find...

Read more »

Confusing slice sampler

May 18, 2010

Most embarrassingly, Liaosa Xu from Virginia Tech sent the following email almost a month ago and I forgot to reply: I have a question regarding your example 7.11 in your book Introducing Monte Carlo Methods with R.  To further decompose the uniform simulation by sampling a and b step by step, how you determine the

Read more »

R: Dueling normals

May 18, 2010

More playing around with R. To create the graph above, I sampled 100 times from two different normal distributions, then plotted the ratio of times that the first distribution beat the second one on the y-axis. The second distribution always had a mean of 0; the mean of the first distribution went from 0 to 4,
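
The sampling scheme in this excerpt can be sketched roughly as follows. This is only an illustration, not the post's own code: the unit standard deviations, the 0.5 step between means, and the seed are assumptions, since the excerpt gives only the 100 draws and the range of means.

```r
# Sketch: how often does a draw from N(mu, 1) beat a draw from N(0, 1)?
# Assumed (not stated in the excerpt): sd = 1, step of 0.5, fixed seed.
set.seed(42)

n_draws <- 100
means <- seq(0, 4, by = 0.5)

win_ratio <- sapply(means, function(mu) {
  first  <- rnorm(n_draws, mean = mu, sd = 1)
  second <- rnorm(n_draws, mean = 0,  sd = 1)
  mean(first > second)   # share of pairs where the first draw is larger
})

# Theoretical value under these assumptions:
# X - Y ~ N(mu, 2), so P(X > Y) = pnorm(mu / sqrt(2))
expected <- pnorm(means / sqrt(2))

# Draw the plot only in an interactive session
if (interactive()) {
  plot(means, win_ratio, type = "b",
       xlab = "Mean of first distribution",
       ylab = "Share of draws where first beats second")
  lines(means, expected, lty = 2)
}
```

With only 100 draws per mean, the simulated ratios scatter noticeably around the dashed theoretical curve; increasing `n_draws` tightens them.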

Read more »

Parallel Computing with R for Life Sciences

May 18, 2010

I hadn't heard of the CloudAsia 2010 conference before, but from the programme the workshop Master Class on HPC Application For Life Sciences looked like it was interesting. One workshop session in particular caught my eye: Practical Parallel Computing in R by Xie Chao and Tan Tin Wee (from the National University of Singapore). The workshop notes (PDF) provide...

Read more »

Prototype: Web-Friendly Visualizations in R

May 18, 2010

Developing web-friendly data visualizations is not very difficult, though as far as I know, a package that allows one to do this directly in R does not exist (e-mail me if you know of one). As someone who has been developing lots of data-oriented software tools, it's always nice to post visualizations online. To facilitate

Read more »

JAGS 2.1.0 and rjags 2.1.0 are released

May 17, 2010

JAGS 2.1.0 is now available from Sourceforge.  You will find the source as well as binary packages for Windows and Mac OS X. Binary packages for Debian are available through the usual Debian channels, and packages for RPM-based Linux distributions … Continue reading →

Read more »

House Mountain Hike

May 17, 2010

My wife Mary and my Dad Wesley and I took a hike this weekend (5/14/10) to the House Mountain state recreation area in Knox county, Tennessee. The hike was about 3.8 miles with a total elevation gain of around 1000 feet (940.23ft by GPS). The plot below gives the elevation profile over the course of

Read more »

Random sudokus [test]

May 17, 2010

Robin Ryder pointed out to me that 3 is indeed the absolute minimum one could observe because of the block constraint (bon sang, mais c’est bien sûr !). The distribution of the series of 3 digits being independent over blocks, the theoretical distribution under uniformity can easily be simulated: #uniform distribution on the block diagonal

Read more »

Rcpp 0.8.0

May 17, 2010

Romain and I are happy to announce the release of Rcpp version 0.8.0. It has been uploaded to CRAN. A Debian upload is delayed until the now-required inline package is accepted into Debian. The source package is also available from here. This release ...

Read more »

Lambda Distribution

May 17, 2010

(This article was first published on Rmetrics blogs, and kindly contributed to R-bloggers)

Read more »

Winning the first game in a baseball series: a harbinger, or not?

May 17, 2010

For those not familiar with major-league baseball in the US (and despite living here for more than 10 years, I still include myself in that category), the games are usually played in series: team A visits the home of team B, and the two teams play two or more games against each other on successive days. It's common wisdom...

Read more »

Example 7.37: calculation of Hotelling’s T^2

May 17, 2010

Hotelling's T^2 is a multivariate statistic used to compare two groups, where multiple outcomes are observed for each subject. Here we demonstrate how to calculate Hotelling's T^2 using R and SAS, and test the code using a simulation study, then apply ...
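
For orientation, the two-sample statistic can be computed in a few lines of R from the textbook formula T^2 = (n1 n2 / (n1 + n2)) d' S^-1 d, where d is the difference in mean vectors and S the pooled covariance matrix. The sketch below is not the code from the post; `hotelling_t2` is a hypothetical helper name and the simulated data are made up for illustration.

```r
# Two-sample Hotelling's T^2 with a pooled covariance matrix.
# hotelling_t2 is a hypothetical helper, not a function from the post.
hotelling_t2 <- function(x1, x2) {
  x1 <- as.matrix(x1)
  x2 <- as.matrix(x2)
  n1 <- nrow(x1)
  n2 <- nrow(x2)
  p  <- ncol(x1)

  d <- colMeans(x1) - colMeans(x2)          # difference in mean vectors
  s_pooled <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)

  t2 <- (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(s_pooled, d))

  # Under equal covariances and normality, this rescaling of T^2
  # follows an F distribution with (p, n1 + n2 - p - 1) df
  f_stat  <- (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
  p_value <- pf(f_stat, p, n1 + n2 - p - 1, lower.tail = FALSE)

  list(t2 = t2, f = f_stat, df = c(p, n1 + n2 - p - 1), p.value = p_value)
}

# Illustrative data: 3 outcomes per subject, second group shifted by 0.5
set.seed(1)
g1 <- matrix(rnorm(60), ncol = 3)               # 20 subjects
g2 <- matrix(rnorm(75, mean = 0.5), ncol = 3)   # 25 subjects
res <- hotelling_t2(g1, g2)
```

A quick sanity check is the single-outcome case, where T^2 reduces to the square of the pooled two-sample t statistic from `t.test(..., var.equal = TRUE)`.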

Read more »

Index of the R-Sessions

May 17, 2010

The R-Sessions are a series of blog entries on using R. A large part consists of an R-manual I once wrote. Other posts include some tricks I found out, as well as entries detailing functions and packages I wrote for ...

Read more »

Hitting the Big Data Ceiling in R

May 16, 2010

As a true R fan, I like to believe that R can do anything, no matter how big, how small or how complicated: there is some way to do it in R. I decided to approach my large, sparse matrix problem with this attitude. But here I sit a broken man. There is no “native” big data support built into...

Read more »

Graphing using R

May 16, 2010

Long-time readers of the Stubborn Mule will know that charts are a regular feature here. Almost all of these charts were produced using the R statistical software package which, in my view, produces far superior results to the most commonly used graphing tool: Excel. As a community service to help rid the world of horrible

Read more »

Random sudokus

May 16, 2010

After thinking about random sudokus for a few more weeks, I eventually came to read the paper by Newton and DeSalvo about the entropy of sudoku matrices. As written earlier, if we consider (as Newton and DeSalvo do) a uniform distribution where the sudokus are drawn uniformly over the set of all sudokus, the entropy of

Read more »

A 34 Minute Video on Using R to Analyse Winter Olympic Medal Data

May 16, 2010

In this post I present a 34-minute video on using R. The video is based on an analysis of 1924 to 2006 Winter Olympic Medals that I presented previously in text form. The video aims to show what an interactive session in R might look like using ...

Read more »

Emulating Internet Traffic in Load Tests

May 15, 2010

One of the recurring questions in the GCaP class last week was: How can we make web-application load tests more representative of real Internet traffic? The sticking point is that conventional load-test simulators like LoadRunner, JMeter, and httperf, ...

Read more »

Typo in Bayesian Core [again]

May 15, 2010

Reza Seirafi from Virginia Tech sent me the following email about Bayesian Core, which alas is pointing out a real typo in the reversible jump acceptance probability for the mixture model: With respect to the expression provided on page 178 for the acceptance probability of the split move, I was wondering if the omission of

Read more »

Linear regression models with robust parameter estimation

May 15, 2010

There are situations in regression modelling where robust methods could be considered to handle unusual observations that do not follow the general trend of the data set. There are various packages in R that provide robust statistical methods which are summarised on the CRAN Robust Task View. As an example of using robust statistical estimation in
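
As a rough illustration of the idea, the sketch below compares ordinary least squares with a robust fit on simulated data containing a few gross outliers. The excerpt does not say which package or data set the post uses; `MASS::rlm` (from the recommended MASS package bundled with most R distributions) and the simulated data are assumptions made for this example.

```r
library(MASS)  # provides rlm(); assumed here, the excerpt names no package

# Simulated data (illustrative only): true intercept 2, true slope 0.5
set.seed(123)
x <- 1:30
y <- 2 + 0.5 * x + rnorm(30, sd = 0.5)
y[c(5, 15, 25)] <- y[c(5, 15, 25)] + 20   # three gross outliers

fit_ols <- lm(y ~ x)    # ordinary least squares
fit_rob <- rlm(y ~ x)   # Huber M-estimator by default

# Side-by-side coefficients
round(rbind(ols = coef(fit_ols), robust = coef(fit_rob)), 3)

# Final weights from the iterative fit: outliers are strongly downweighted
round(fit_rob$w[c(5, 15, 25)], 3)
```

The outliers pull the least-squares intercept upwards, while the robust fit stays closer to the line followed by the bulk of the data.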

Read more »