Spearman’s Rho Rank Correlation There are generally three types of correlation that a researcher may encounter: Pearson’s r, Kendall’s Tau, and Spearman’s Rho. They each have their own uses and applications depending on the da...
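
As a rough illustration of how Spearman's Rho differs from Pearson's r, here is a minimal sketch in plain Python that computes rho as the Pearson correlation of the ranks, with tied values sharing their average rank. The function names (`average_ranks`, `pearson`, `spearman_rho`) are made up for this example and are not taken from the post.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only the ranks matter, any strictly increasing relationship (for example `y = x ** 2` on positive `x`) gives a rho of 1 even though the relationship is not linear.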

It's a wonderful thing when people make interesting data sets available to the public. When Thomas Jones wrote a paper in Econometrics about the growth of US retail giant Walmart, he made the data he collected about every Walmart store opening in history (location and date) available to the public. Since then, several people have used different techniques to...

Stan 1.0.0 and RStan 1.0.0 It’s official. The Stan Development Team is happy to announce the first stable versions of Stan and RStan. What is (R)Stan? Stan is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It’s sort of like BUGS, but with a different language...

So I was trying to figure out a fast way to make matrices with randomly allocated 0 or 1 in each cell of the matrix. I reached out on Twitter, and got many responses (thanks tweeps!). Here is the solution I came up with. See if you can tell why it...
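
The teaser cuts off before the author's solution, so the following is not it. It is only a minimal sketch of one generic way to fill a matrix with random 0s and 1s, written in plain Python. The function name `random_binary_matrix` and its parameters (`p_one`, `seed`) are assumptions made for this illustration.

```python
import random


def random_binary_matrix(n_rows, n_cols, p_one=0.5, seed=None):
    """Return an n_rows x n_cols list of lists with independent 0/1 cells.

    p_one is the probability that a cell is 1; seed makes the draw repeatable.
    """
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [
        [1 if rng.random() < p_one else 0 for _ in range(n_cols)]
        for _ in range(n_rows)
    ]
```

Passing a fixed `seed` returns the same matrix on every call, which is handy when comparing the speed of alternative implementations.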

Rather belatedly, I got around to posting a series of posts summarising the Formula One season to date: F1 2012 Mid-Season Review – Grid/Classification Analysis: for example, how do the drivers’ grid and final classifications compare? F1 2012 Mid-Season Review – Pit Stops: for example, how does pit stop performance across the teams compare? F1

I have resisted learning the popular R graphics package, ggplot2. I dismissed ggplot2 as primarily useful for exploratory graphics and rationalized my avoidance of ggplot2 by assuming that it would require just as many (or more) lines of code as the R base package to whip the default plots into publication-quality figures. The few times

By Earl F Glynn | Franklin Center A comparison of US Census voting age population data in Virginia to voter registration data shows only one locality, Surry County, with about 100% of the voting age population registered to vote. Six other localities have about 95% of their voting age population registered: Craig County, Isle of

You've probably heard (or seen in TV shows) how the unique pattern of rifling in a gun barrel generates forensic evidence: microscopic scoring on the bullets left at the scene of the crime can be linked to the shooter by matching the marks to the firearm. What you might not know is that the same technique can be applied to...

After a sunny weekend to unpack and unwind, I am now back to my normal schedule, on my way to Paris-Dauphine for an R (second-chance) exam. Except for confusing my turn signal for my wiper, thanks to two weeks of intensive driving in four Australian states!, things are thus back to “normal”, meaning that I

A proper introduction to the soilDB package is now available here. Installation and basic usage are covered. More detailed, task-specific documentation on aqp and soilDB will be available soon.

I’m sure that Carl Bacon sighs deeply when he reads such headlines, but it is clearly appropriate in this case. Perhaps you remember that I proposed a Google Summer of Code project for 2012 around a considerable code contribution to PerformanceAnalytics from Diethelm Wuertz at ETHZ. That code was focused on adding a large number

I work in an environment dominated by SAS, and I am looking to integrate R into our environment. Why would I want to do such a thing? First, I do not want to get rid of SAS. That would not only take away most of our investment in SAS training and hiring good quality SAS programmers, but...

RStudio and knitr are an excellent combination for generating dynamic reports. But in this post, I will show you how to generate an HTML-style presentation using only R. OK, I confess that we still need something else: deck.js, markdown, and R.utils. ...

R Packages All Well maintained? There are so many R packages; can they all be trusted? Are they well maintained? To answer this question, we just need to take a look at their archive histories. If a package has many versions, we can take that as th...

R Packages growth Curve Why is R so popular? There are a lot of reasons: it is easy to learn and convenient to use, it has an active community, it is open source, etc. Another important reason is the numerous contributed packages. As of yesterday, there were 4033 R...

Most regression methods assume that the response variable follows a distribution from the exponential family, e.g. Gaussian, Poisson, Gamma, etc. However, this assumption is frequently violated in the real world by, for example, the zero-inflated overdispersion problem. A number of methods have been developed to deal with such problems, and among them, Quasi-Poisson and Negative Binomial are the most popular methods, perhaps due to that...

My coworkers at Fred Hutchinson regularly use the development version of R (i.e., R-devel) and have urged me to do the same. This post details how I have set up the development version of R on our Linux server, which I use remotely because it is much faster than my Mac. First, I downloaded the R-devel source into ~/local/, which...

In our article How robust is logistic regression? we pointed out some basic yet deep limitations of the traditional full-step Newton-Raphson or Iteratively Reweighted Least Squares methods of solving logistic regression problems (such as in R’s standard glm() implementation). In fact, in the comments we exhibit a well-posed data fitting problem that cannot

The Quick-and-Dirty Summary I was recently asked to participate in a proposed SXSW panel that will debate the question, “Will Data Scientists Be Replaced by Tools?” This post describes my current thinking on that question as a way of (1) convincing you to go vote for the panel’s inclusion in this year’s SXSW and (2)

RealClimate.org used the R language and data from the National Snow and Ice Data Center to create this chart showing the extent of Arctic sea-ice in each year since satellite observations began in 1978, and the current extent of ice coverage (in red). Even though there are several weeks of annual melting yet to come, the area of ice...