# Blog Archives

## Manual variable selection using the dropterm function

May 12, 2010

When fitting a multiple linear regression model to data, a natural question is whether the model can be simplified by excluding some of its variables. There are automatic procedures for undertaking these tests, but some people prefer a more manual approach to variable selection rather than pressing a button and taking what comes

## Book Review – Modern Applied Statistics with S by W. N. Venables and B. D. Ripley (Springer 2003)

May 9, 2010

Modern Applied Statistics with S (Fourth Edition) is one of the oldest and most popular books on applied statistics using R and S-PLUS. The book covers a large number of topics in applied statistics and is certainly not for the faint-hearted. A sound knowledge of

## Using the update function during variable selection

May 9, 2010

When fitting statistical models to data with multiple variables, we are often interested in adding or removing terms from our model. When there are a large number of terms, it can be quicker to use the update function to start with a formula from a model that we have already

## Displaying data using level plots

May 3, 2010

A level plot is a type of graph used to display a surface in two rather than three dimensions: the surface is viewed from above, as if we were looking straight down. It is an alternative to a contour plot. Geographic data is an example of where this type of graph

## Analysis of Covariance – Extending Simple Linear Regression

April 28, 2010

The simple linear regression model considers the relationship between two variables, but in many cases more information is available that can be used to extend the model. For example, there might be a categorical variable (sometimes known as a covariate) that can be used to divide the data set to fit a separate linear

## Summarising data using box and whisker plots

April 25, 2010

A box and whisker plot is a graphical display that summarises a set of data using its five-number summary. The summary statistics used to create a box and whisker plot are the median of the data, the lower and upper quartiles (25% and 75%)

## Simple Linear Regression

April 23, 2010

One of the most frequently used techniques in statistics is linear regression, where we investigate the potential relationship between a variable of interest (often called the response variable, although many other names are in use) and a set of one or more variables (known as the independent variables or some other term). Unsurprisingly there

## Book Review – ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham (Springer 2009)

April 20, 2010

This book is written by the author of the ggplot2 package for R, a package whose design is inspired by the grammar of graphics and which can remove some of the effort required to put together impressive graphs. The book is just under 200 pages and covers a

## R and Tolerance Intervals

April 19, 2010

Confidence intervals and prediction intervals are used by statisticians on a regular basis. Another useful interval is the tolerance interval, which describes the range of values for a distribution with confidence limits calculated to a particular percentile of the distribution. The R package tolerance can be used to create a variety of tolerance intervals of

## Summarising data using scatter plots

April 18, 2010

A scatter plot is a graph used to investigate the relationship between two variables in a data set. The x and y axes are used for the values of the two variables and a symbol on the graph represents the combination for each pair of values in the data set. This type of graph is