I'd like to introduce a package that simulates regression models. This includes both single level and multilevel (i.e. hierarchical or linear mixed) models up to two levels of nesting. The package produces a unified framework to simulate all types of c...

Users new to the Rcpp family of functionality are often impressed with the performance gains that can be realized, but struggle to see how to approach their own computational problems. Many of the most impressive performance gains are demonstrated with seemingly advanced statistical methods, advanced C++–related constructs, or both. Even when users are able to understand how various demonstrated features operate in isolation, examples...

Last week, we released the RMOA package at CRAN (http://cran.r-project.org/web/packages/RMOA). It is an R package to allow building streaming classification and regression models on top of MOA. MOA is the acronym of 'Massive Online Analysis' and it is the most popular open source framework for data stream mining which is being developed at the University of Waikato: http://moa.cms.waikato.ac.nz....

Space, a wise man once said, is the final frontier. Not the Buzz Alrdin/Light Year, Neil deGrasse Tyson kind (but seriously, have you seen Cosmos?). Geographic space. Distances have been finding their way into metrics since the cavemen (probably). GIS seem to make nearly every science way more fun…and accurate! Most of my research deals with

This post is a review of the “GENERALIZED DOUBLE PARETO SHRINKAGE” Statistica Sinica (2012) paper by Armagan, Dunson and Lee. Consider the regression model (Y=Xbeta+varepsilon) where we put a generalized double pareto distribution as the prior on the regression coefficients (beta). The GDP distribution has density $$begin{equation} f(beta|xi,alpha)=frac{1}{2xi}left( 1+frac{|beta|}{alphaxi} right)^{-(alpha+1)}. label{} end{equation}$$ GDP as Scale The post

(skip to the shiny app) Model building is very often an iterative process that involves multiple steps of choosing an algorithm and hyperparameters, evaluating that model / cross validation, and optimizing the hyperparameters. I find a great aid in this process, for classification tasks, is not only to keep track of the accuracy across models, »more

Reader Annisa Mike asked in a comment on an early post about power calculation for logistic regression with an interaction. This is a topic that has come up with increasing frequency in grant proposals and article submissions. We'll begin by showing how to simulate data with the interaction, and in our next post...

When getting started in machine learning, it's often helpful to see a worked example of a real-world problem from start to finish. But it can be hard to find an example with the "right" level of complexity for a novice. Here's what I look for: uses r...