Previously I looked at how to simulate Gaussian processes in R, following the methods in Rasmussen and Williams. But now that Andrew Gelman et al. (of

I was recently contacted by a reader with two very specific questions, and I thought this would be a good topic to respond to publicly. He would like to simulate his data: I have firm level data and the model is discrete choice with the main expla...

Here's what I came up with to compare word counts in two pieces of text. If you have any ideas, I'd love to learn about alternatives!
## a function that compares word counts in two texts
wordcount ...

The most recent edition of the Revolution Newsletter is now available. In case you missed it, the news section is below, and you can read the full August edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. What is R? Has anyone ever asked you,...

Except for maybe the t test, a contender for the title “most used and abused statistical test” is Pearson’s correlation test. Whenever someone wants to check if two variables relate somehow it is a safe bet (at least in psychology) that the first thing to be tested is the strength of a Pearson’s correlation. Only if that doesn’t...

A couple of days ago, I gave a talk to the Chicago R Users Group which is run ever-so-smoothly by Paul Teetor and Chase Carpenter. The talk provided a brief introduction to Rcpp for R and C++ integration. Slides are now up on my talks / presentation...

The CoCo Matrix (correlation coefficient matrix) is a script for R that takes a table headed with multiple variables and calculates the correlation coefficients between each of the variables, determines which are statistically significant, and represents them visually in a grid-plot. I created the CoCo Matrix to cross correlate a table with a large number of

STAN is a new system for Bayesian inference, similar to BUGS and JAGS. I’ve played with it a bit and it’s quite promising: it really has the potential to make MCMC less of a pain (on simple models). I’ve written a short introduction to fitting psychometric functions using STAN and R, in case that’s useful

Over the past number of years, I have noted that spatial econometric methods have been gaining popularity. This is a welcome trend in my opinion, as the spatial structure of data is something that should be explicitly included in the empirical modelling procedure. Omitting spatial effects assumes that the location co-ordinates for observations are unrelated

Maximum-Likelihood Estimation (MLE) is a statistical technique for estimating model parameters. It basically sets out to answer the question: what model parameters are most likely to characterise a given set of data? First you need to select a model for the data. And the model must have one or more (unknown) parameters. As the name
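
The idea in the excerpt above (pick a model, then find the parameters most likely to have produced the data) can be sketched in a few lines of R. This is only an illustration, not the code from the original post: the normal model, the simulated sample `x`, and the helper name `negloglik` are all assumptions made for the example.

```r
# Hypothetical data: 200 draws from a normal distribution (assumed model).
set.seed(1)
x <- rnorm(200, mean = 5, sd = 2)

# Negative log-likelihood of a normal model.
# par[1] is the mean; par[2] is log(sd), so the sd stays positive.
negloglik <- function(par, data) {
  -sum(dnorm(data, mean = par[1], sd = exp(par[2]), log = TRUE))
}

# Minimise the negative log-likelihood (optim defaults to Nelder-Mead).
fit <- optim(c(0, 0), negloglik, data = x)

mle_mean <- fit$par[1]
mle_sd   <- exp(fit$par[2])
```

For this model the estimates should land close to the sample mean and the (biased, divide-by-n) sample standard deviation, which gives a quick sanity check on the optimiser.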

I want to build a bit more experience in REML, so I decided to redo some of the SAS examples in R. This post describes the results of example 59.1 (page 5001, SAS(R)/STAT User guide 12.3 link). Following the list from freshbiostats I will analyze ...

For some reason, authors occasionally present linear model results with vague or unintelligible interaction effects. One way to be vague when presenting interaction effects is to provide only a table of model coefficients, including no information on the range of covariate values observed, and no plots to aid in interpretation. Here’s an example: Suppose you have discovered a statistically significant...

I’ve seen some creative visualisations of issues surrounding the Australian election recently though not as many maps as I expected. ‘ggplot2’ is the go-to package for plotting in R so I thought I’d see if I could plot the Australian electoral divisions with ggplot2. By using the Australian Electoral Commission’s GIS mapping coordinates and mutilating

In anticipation of a new R library from School of Data data diva @mihi_tr that will wrap the OpenSpending API and provide access to OpenSpending.org data directly from within R, I thought I’d start doodling around some ideas raised in Identifying Pieces in the Spending Data Jigsaw. In particular, common payment values, repayments/refunds and “balanced

By popular demand, I updated the Fantasy Football Draft Optimizer shiny app with two changes: The app now takes into account how many teams are in your league when estimating... The post "Update to Fantasy Football Draft Optimizer shiny app" appeared first on Fantasy Football Analytics.

I recently attended ScienceOnline Climate, a conference in Washington, D.C. at AAAS. You may have heard of the ScienceOnline annual meeting in North Carolina - this was one of their topical meetings focused on Climate Change. I moderated a session on working with data from the web in R, focusing on climate data. Search Twitter for...

Hello, today I'm going to show you the difference between two common performance measures (useful not only for Machine Learning purposes, but in every scientific field). Until now, I have come across accuracy values more often than F scores in...
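
To make the contrast between the two measures concrete, here is a minimal sketch in R. The confusion-matrix counts are invented for illustration, and the F score is assumed to be the usual F1 (the harmonic mean of precision and recall); the original post may use a different variant.

```r
# Hypothetical confusion-matrix counts for an imbalanced two-class problem.
tp <- 5    # true positives
fn <- 15   # false negatives
fp <- 5    # false positives
tn <- 975  # true negatives

accuracy  <- (tp + tn) / (tp + fn + fp + tn)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

# F1 score: harmonic mean of precision and recall (assumed variant).
f1 <- 2 * precision * recall / (precision + recall)
```

With these made-up counts the accuracy is 0.98 while F1 is about 0.33, because accuracy is dominated by the large number of true negatives and F1 ignores them.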

If, like me, you've ever had a sandwich from a dubious deli and then been laid up for days afterwards, you know that food poisoning is no trifling matter. In the past, local authorities would only ever learn of such public health issues if they were reported to the authorities by the victim (or the victim's doctor). But that...

In Chapter 11, equivocal zones were briefly discussed. The idea is that some classification errors are close to the probability boundary (i.e. 50% for two-class outcomes). If this is the case, we can create a zone where the samples are predicted as "equivocal" or "indeterminate" instead of one of the class levels. This only works if the...
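
The equivocal-zone idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the post: the probabilities, the zone half-width of 0.1, the class labels, and the helper name `equivocal_class` are all hypothetical.

```r
# Hypothetical predicted probabilities of the first class for six samples.
prob <- c(0.05, 0.45, 0.52, 0.65, 0.90, 0.38)

# Predict a class from the probability, but label a sample "equivocal"
# when its probability lies within `width` of the 0.5 boundary.
equivocal_class <- function(prob, width = 0.1,
                            levels = c("Class1", "Class2")) {
  pred <- ifelse(prob >= 0.5, levels[1], levels[2])
  pred[abs(prob - 0.5) < width] <- "equivocal"
  pred
}

pred <- equivocal_class(prob)

# Share of samples that fall inside the zone and get no class prediction.
equivocal_rate <- mean(pred == "equivocal")
```

Widening the zone trades coverage for reliability: fewer samples receive a class prediction, so the equivocal rate should be reported alongside any performance figure computed on the remaining samples.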

A new minor release 3.910.0 of Armadillo came out a few days ago. A new RcppArmadillo release 0.3.910.0 was provided right away, and after a brief back-and-forth with CRAN (mostly having to do with the non-standard vignette corresponding to our CSD...

To cut a long story short, I always wanted to write professional-looking documents (technical reports and potentially my thesis) with R code. No more copy and paste. No more Microsoft Word. At the same time, I don't feel comfortable with LaTeX. Somehow I found a workaround with knitr, xtable, R Markdown...

The 2011 Census Open Atlas project has been put on hold recently as various other research projects have intervened – more on these soon. However, over the summer Chris Brunsdon and I have taken a research trip to Ritsumeikan University (Japan) where we visited Keiji Yano and Tomoki Nakaya. As part of this trip I began developing a census atlas for