In this short post I take a look at how to use R and ggplot2 to visualize effect sizes (Cohen’s d) and how to shade the overlapping area of two distributions.
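As a rough illustration of the idea, the sketch below draws two unit-variance normal curves whose means differ by Cohen's d and shades the region under both. The value `d <- 0.8`, the column names, and the colours are assumptions chosen for the example, not taken from the post. The plotting step only runs if ggplot2 is installed.

```r
# Assumed effect size for illustration; not a value from the original post.
d <- 0.8

# Evaluate both densities on a common grid.
x <- seq(-4, 4 + d, length.out = 500)
dens <- data.frame(
  x         = x,
  control   = dnorm(x, mean = 0, sd = 1),
  treatment = dnorm(x, mean = d, sd = 1)
)

# The overlapping region lies under the lower of the two curves.
dens$overlap <- pmin(dens$control, dens$treatment)

# Overlap coefficient for two equal-variance normals (closed form).
ovl_exact <- 2 * pnorm(-abs(d) / 2)

# Numerical check: the curves cross at d / 2, so integrate each side separately.
left  <- integrate(function(z) dnorm(z, mean = d, sd = 1), -Inf, d / 2)$value
right <- integrate(function(z) dnorm(z, mean = 0, sd = 1), d / 2, Inf)$value
ovl_numeric <- left + right

# Shade the overlap with geom_area and draw both curves on top.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dens, aes(x = x)) +
    geom_area(aes(y = overlap), fill = "grey60", alpha = 0.6) +
    geom_line(aes(y = control), colour = "steelblue") +
    geom_line(aes(y = treatment), colour = "firebrick") +
    labs(x = "Standardised score", y = "Density")
  print(p)
}
```

With d = 0.8, the two distributions share roughly 69% of their area. Taking `pmin()` of the two densities is one simple way to get the shaded region; building a polygon by hand would work as well.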

The common approach to estimating a binary dependent variable regression model is to use either the logit or probit model. Both are forms of generalized linear models (GLMs), which can be seen as modified linear regressions that allow the dependent variable to originate from non-normal distributions. The coefficients in a linear regression model are marginal
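A minimal sketch of fitting both models with base R's `glm()` follows. The data are simulated, and the sample size and "true" coefficients are assumptions made purely for illustration.

```r
set.seed(1)

# Simulated data: one predictor and a binary outcome drawn from a
# logistic model. The coefficients -0.5 and 1.2 are illustrative assumptions.
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

# Same model formula, two different link functions.
logit_fit  <- glm(y ~ x, family = binomial(link = "logit"))
probit_fit <- glm(y ~ x, family = binomial(link = "probit"))

# Coefficients are on different scales (log-odds vs. z-scores),
# so they are not directly comparable across the two fits.
coef(logit_fit)
coef(probit_fit)

# The fitted probabilities, however, are usually very similar.
p_logit  <- predict(logit_fit,  type = "response")
p_probit <- predict(probit_fit, type = "response")
max(abs(p_logit - p_probit))
```

Because the raw coefficients sit on the link scale, comparing the two models through fitted probabilities is generally more informative than comparing the coefficients themselves.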

We quite regularly use genetic algorithms to optimise over the ad-hoc functions we develop when trying to solve problems in applied mathematics. However, it’s a bit disconcerting to have your algorithm roam through a high-dimensional solution space while not being able to picture what it’s doing or how close one solution is to another.

Today I received confirmation that the delayed fifth volume in the Developments in Palaeoenvironmental Research series has been published. The book is titled Data Handling and Numerical methods, though it covers more of the latter and, IMHO, is far more interesting than the dry title would suggest (who gets excited by Data Handling? Well, one or two people perhaps...

A recent post on the Junkcharts blog looked at US weather data and the importance of explaining scales (which in this case went up to 118). Ultimately, it turns out that 118 is the rank of the data compared to the previous 117 years of data (in ascending order, so that 118 is the highest). At …
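To make the ranking concrete, here is a small sketch using a simulated series of 118 yearly values. The numbers are invented for illustration and are not the weather data from the post.

```r
set.seed(42)

# Hypothetical series: 118 yearly values, simulated purely for illustration.
temps <- rnorm(118, mean = 50, sd = 2)

# Force the most recent year to be the highest on record.
temps[118] <- max(temps[-118]) + 0.5

# rank() assigns 1 to the smallest value, so with 118 years
# the largest value receives rank 118.
ranks <- rank(temps)
latest_rank <- ranks[118]
latest_rank
```

On a scale like this the maximum depends on the number of years in the record, which is why a chart needs to say that it is showing ranks.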

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full April edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Spring Webinar Series. Our Spring Webinar Series features presentations from Revolution Analytics staff and...

For many of my latest data blogs, I needed historical weather data to perform data mash-ups to pin-point the cause. For example, for my continued exploration into the airlines/airports historical data using SAP HANA and R, I wanted to find out wh...