Blog Archives

Data fishing: R and XML part 3

February 18, 2013

I’ve recently written two blog posts about gathering data from web pages using functions in R. Both examples showed how we can create our own custom functions to gather data about Minnesota lakes from the Lakefinder website. The first post showed how to use R to create our own custom functions to get

Read more »

Collinearity and stepwise VIF selection

February 5, 2013

Collinearity, or excessive correlation among explanatory variables, can complicate or prevent the identification of an optimal set of explanatory variables for a statistical model. For example, forward or backward selection of variables could produce inconsistent results, variance partitioning analyses may be unable to identify unique sources of variation, or parameter estimates may include substantial amounts

Read more »
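A common way to carry out stepwise VIF selection is to compute the variance inflation factor for each explanatory variable, drop the variable with the largest VIF if it exceeds a threshold, and repeat until every remaining VIF is below it. For column $j$, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing that column on all the others. The sketch below is a Python/NumPy illustration, not the post's R code. The threshold of 10 and the function names `vif` and `stepwise_vif` are assumptions chosen for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of an n x p matrix X."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def stepwise_vif(X, names, threshold=10.0):
    """Repeatedly drop the variable with the largest VIF until all VIFs <= threshold.

    The threshold of 10 is an assumed rule of thumb, not a value from the post.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names

# Simulated data: x2 is nearly a copy of x1, while x3 is independent of both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)
kept = stepwise_vif(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])
print(kept)  # one of x1/x2 is dropped; x3 is kept
```

Only one variable is removed per pass because dropping a single collinear variable changes the VIFs of all the others. Recomputing after each removal avoids discarding variables whose high VIF was driven entirely by the one just removed.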

Data fishing: R and XML part 2

January 21, 2013

I’m constantly amazed at what can be done using free software, such as R, and, more importantly, at what can be done with data that are available on the internet. In an earlier post, I confessed to my sedentary lifestyle immersed in code, so my opinion regarding the utility of open-source software is perhaps biased. None

Read more »

Breaking the rules with spatial correlation

January 7, 2013

Students in any basic statistics class are taught linear regression, which is one of the simplest forms of a statistical model. The basic idea is that a ‘response’ variable can be mathematically related to one or more ‘explanatory’ variables through a linear equation and a normally distributed error term. With any statistical tool,

Read more »
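The model described above, a response equal to a linear function of the explanatory variables plus normally distributed error, can be simulated and fit by ordinary least squares in a few lines. This is a Python/NumPy sketch rather than the post's R code. The intercept of 2.0, slope of 0.5, error standard deviation of 0.5, and the helper name `fit_ols` are arbitrary illustrative choices. Note that the errors here are drawn independently of one another, which is one of the standard assumptions of this model.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares estimates for y = b0 + X b + e (an intercept column is added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # [b0, b1, ..., bp]

# Simulate one explanatory variable and independent, normally distributed errors.
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=0.5, size=n)

coef = fit_ols(x, y)
residuals = y - np.column_stack([np.ones(n), x]) @ coef
print(coef)  # estimates should be close to [2.0, 0.5]
```

Because the model includes an intercept, the residuals average to zero; checking whether they are also independent (for example, of their neighbours in space) is a separate step.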

Stealing from the internet: Part 1

December 20, 2012

Well, not stealing, but rather some handy tools for data mining. About a year ago I came across the XML package while struggling to get some data from various web pages. The purpose of this post is to describe how this package can be used to quickly gather data from the internet. I’ll

Read more »
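The core task these posts describe is pulling tabular data out of a web page's HTML. The posts do this in R with the XML package; as a rough stand-in, the sketch below uses Python's standard-library `html.parser` to collect the text of each table cell, row by row. The HTML fragment, lake names, and areas are hypothetical placeholders (no real page is fetched), and `TableParser` is an illustrative name, not part of any library.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

# Hypothetical page fragment standing in for a downloaded web page.
html = """
<table>
  <tr><th>Lake</th><th>Area (acres)</th></tr>
  <tr><td>Lake A</td><td>120</td></tr>
  <tr><td>Lake B</td><td>85</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)
parser.close()
print(parser.rows)
```

In practice the HTML string would come from downloading the page, and the first row can be used as column names when converting the result into a data frame.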