In this post I outline how count data may be modelled using a negative binomial distribution in order to more accurately present trends in time series count data than using linear methods. I also show how to...

I’ve been doodling… Following a query about the possible purchase of Twitter followers for various public figure accounts (I need to get my head round what the problem is with that exactly?!), I thought I’d have a quick look at some stats around follower groupings… I started off with a data grab, pulling down the

A new Rcpp master class is scheduled for March 9 in New York. The format will be an updated version of the one-day workshops I have given at the University of Rochester in 2010, in San Francisco in 2011 (organised by Revolution Analytics) and at the UseR...

I’ve recently published two blog posts about gathering data from web pages using functions in R. Both examples showed how we can create our own custom functions to gather data about Minnesota lakes from the Lakefinder website. The first post was an example showing the use of R to create our own custom functions to get

Introduction: Many scientists are concerned about normality or non-normality of variables in statistical analyses. The following and similar sentiments are often expressed, published or taught: "If you want to do statistics, then everything needs to be normally distributed." "We normalized…

In a previous post, I used R to process data from the Lahman database to calculate index values that compare a team's run production to the league average for that year. For the purpose of that exercise, I started the sequence at 1947, but for what follows I re-ran the code with the time period...

One of the topics emphasized in Exploring Data in Engineering, the Sciences and Medicine is the damage outliers can do to traditional data characterizations. Consequently, one of the procedures to be included in the ExploringData package is FindOutliers, described in this post. Given a vector of numeric values, this procedure supports four different methods for identifying possible outliers. Before...

Yesterday's Introduction to R for Data Mining webinar was a record setter, with more than 2000 registrants and more than 700 attending the live session presented by Joe Rickert. If you missed it, I've embedded the video replay below, and Joe's slides (with links to many useful resources) are also available. During the webinar, Joe demoed several examples of...
