Over on the Clastic Detritus blog, Brian Romans posted a nice introduction to plotting in R. At the end of his post, Brian mentioned he would like to colour in areas under the data curve corresponding to particular ranges of … Continue reading →

A previous post showed how to compute eigenvalues using the Armadillo library via RcppArmadillo. Here, we do the same using Eigen and the RcppEigen package. #include <RcppEigen.h> // ] using Eigen::Map; ...

The Seasonal Trend Decomposition using Loess (STL) is an algorithm that was developed to help to divide up a time series into three components namely: the trend, seasonality and remainder. The methodology was presented by Robert Cleveland, William Cleveland, Jean McRae and Irma Terpenning in the Journal of Official Statistics in 1990. The STL is

In the first two parts of this series, I looked at the basics of the interface I created between rmathlib and kdb+. In this post, I’ll go through some of the convenience functions I wrote to emulate some basic R … Continue reading →

Over on the Clastic Detritus blog, Brian Romans posted a nice introduction to plotting in R. At the end of his post, Brian mentioned he would like to colour in areas under the data curve corresponding to particular ranges of grain sizes. The comment area on a blog isn’t really amenable to giving a full answer to the...

How to replicate a google gauge chart in R? Google charts has several options to produce nice graphics. Most of them have their equivalent function in R and can be quickly replicated, but some of them require a bit of programming. For instance, take the google gauge charts which I really like: A gauge is … Continue reading...

In two weeks (on January 24), Think Big Analytics' Jeffrey Breen will present a new webinar on using R with Hadoop. Here's the webinar description: R and Hadoop are changing the way organizations manage and utilize big data. Think Big Analytics and Revolution Analytics are helping clients plan, build, test and implement innovative solutions based on the two technologies...

Following on from the last post on integrating some rmathlib functionality with kdb+, here is a sample walkthrough of how some of the functionality can be used, including some of the R-style wrappers I wrote to emulate some of the … Continue reading →

Code used: MineClearingSimulationWithKafons.r TRANSCRIPT OF VIDEO: Hello, I’m Matt Asher with StatisticsBlog.com. This video is about my attempt to simulate a landmine clearing device built by Massoud Hassani called the Mine Kafon. I’ve put a link to his webpage at StatisticsBlog.com, I highly recommend checking out the video. Hassani’s device looks like this: It’s a

One of the many things I keep avoiding is statistics. I’ve never really been convinced about the 5% significance level thing; as far as I can tell, hardly anything that’s interesting normally distributes; all the counting that’s involved just confuses me; and I never really got to grips with confidently combining probabilities. I find a

R’s formula interface is sweet but sometimes confusing. ANOVA is seldom sweet and almost always confusing. And random (a.k.a. mixed) versus fixed effects decisions seem to hurt peoples’ heads too. So, let’s dive into the intersection of these three. I’m aware that there are lots of packages for running ANOVA models that make things nicer

As mentioned in the Appendix of Modern Actuarial Risk Theory, “R (and S) is the ‘lingua franca’ of data analysis and statistical computing, used in academia, climate research, computer science, bioinformatics, pharmaceutical industry, customer analytics, data mining, finance and by some insurers. Apart from being stable, fast, always up-to-date and very versatile, the chief advantage of R is that...

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. One of my main motivations to install R is Sweave. The Sweave is a literate programming language which integrates LaTeX and R code. The main idea of the Sweave is to combine data analysis code...

In the natural sciences, it is common to have incomplete or unevenly sampled time series for a given variable. Determining cycles in such series is not directly possible with methods such as Fast Fourier Transform (FFT) and may require some degree of interpolation to fill in gaps. An alternative is the Lomb-Scargle method (or...

Here’s a video how the modFit function from the FME package optimizes parameters for an oscillation. A Nelder-Mead-optimizer (R function optim) finds the best fitting parameters for an undampened oscillator. Minimum was found after 72 iterations, true parameter eta was -.05: Evolution of parameters in optimization process from Felix Schönbrodt on Vimeo. More on estimating

Yesterday’s post started to explore the nice additions which the new C++11 standard is bringing to the language. One particularly interesting feature are lambda functions which resemble the anonymous functions R programmers have enjoyed all along...

One issue I continuously encounter when starting to work with a new dataset is that of the codebook. In general, I prefer to load a codebook into R like any other data source, specifically as a data frame. And ideally, one data frame to provides the variable names with descriptions and any other meta data available, and a separate...

The go-to bible for this data scientist and many others is The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Each of the authors is an expert in machine learning / prediction, and in some cases invented the techniques we turn to today to make sense of big data: ensemble...

At the time of the creation of this blog, Cronbach’s 1951 piece on coefficient alpha has 18,132 citations according to google scholar. The main use of coefficient alpha is to assess internal consistency reliability of a test or survey. Although it may have been forgotten, the proof Cronbach demonstrated established that coefficient alpha is the mean of all split...

Factor Analysis of Baseball's Hall of Fame VotersRecently, Nate Silver wrote a post which analyzed how voters who voted for and against Barry Bonds for Baseball's Hall of Fame differed. Not surprisingly, those who voted for Bonds were more likely to vote for other suspected steroids users (like Roger Clemens). This got...

A trackback from Martin Hawksey’s recent post on Analysing WordPress post velocity and momentum stats with Google Sheets (Spreadsheet), which demonstrates how to pull WordPress stats into a Google Spreadsheet and generates charts and reports therein, reminded me of the WordPress stats API. So here’s a quick function for pulling WordPress reports into R. (Code