Blog Archives

How long is the average dissertation?

April 15, 2013
By

The best part about writing a dissertation is finding clever ways to procrastinate. The motivation for this blog comes from one of the more creative ways I’ve found to keep myself from writing. I’ve posted about data mining in the past and this post follows up on those ideas using a topic that is relevant

A nifty line plot to visualize multivariate time series

April 1, 2013
By

A few days ago a colleague came to me for advice on the interpretation of some data. The dataset was large and included measurements for twenty-six species at several site-year-plot combinations. A substantial amount of effort had clearly been made to ensure every species at every site over several years was documented. I don’t pretend

Animating neural networks from the nnet package

March 19, 2013
By

My research has allowed me to implement techniques for visualizing multivariate models in R and I wanted to share some additional techniques I’ve developed, in addition to my previous post. For example, I think a primary obstacle towards developing a useful neural network model is an under-appreciation of the effects model parameters have on model

Visualizing neural networks from the nnet package

March 4, 2013
By

Neural networks have received a lot of attention for their abilities to ‘learn’ relationships among variables. They represent an innovative technique for model fitting that doesn’t rely on conventional assumptions necessary for standard models and they can also quite effectively handle multivariate response data. A neural network model is very similar to a non-linear regression

Data fishing: R and XML part 3

February 18, 2013
By

I’ve recently posted two blogs about gathering data from web pages using functions in R. Both examples showed how we can create our own custom functions to gather data about Minnesota lakes from the Lakefinder website. The first post was an example showing the use of R to create our own custom functions to get

Collinearity and stepwise VIF selection

February 5, 2013
By
$Collinearity and stepwise VIF selection$

Collinearity, or excessive correlation among explanatory variables, can complicate or prevent the identification of an optimal set of explanatory variables for a statistical model. For example, forward or backward selection of variables could produce inconsistent results, variance partitioning analyses may be unable to identify unique sources of variation, or parameter estimates may include substantial amounts

Data fishing: R and XML part 2

January 21, 2013
By

I’m constantly amazed at what can be done using free software, such as R, and more importantly, what can be done with data that are available on the internet. In an earlier post, I confessed to my sedentary lifestyle immersed in code, so my opinion regarding the utility of open-source software is perhaps biased. None

Breaking the rules with spatial correlation

January 7, 2013
By

Students in any basic statistics class are taught linear regression, which is one of the simplest forms of a statistical model. The basic idea is that a ‘response’ variable can be mathematically related to one or any number of ‘explanatory’ variables through a linear equation and a normally distributed error term. With any statistical tool,

Stealing from the internet: Part 1

December 20, 2012
By

Well, not stealing but rather some handy tools for data mining… About a year ago I came across the package XML as I was struggling to get some data from various web pages. The purpose of this blog is to describe how this package can be used to quickly gather data from the internet. I’ll