# Blog Archives

## Merge by City and State in R

February 20, 2014

Often, you'll need to merge two data frames based on multiple variables. For this example, we'll use the common case of needing to merge by city and state. First, you need to read in both your data sets: `# import city coordinate data:` `coords <- read.cs...`
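Since the excerpt cuts off before the merge itself, here is a minimal sketch of merging on two keys with base R's `merge`. The data frames, column names, and values below are made up for illustration (they are not the post's data sets), and the post's own approach may differ.

```r
# Hypothetical coordinate data (illustrative values, not real coordinates).
coords <- data.frame(
  city  = c("Springfield", "Springfield", "Portland"),
  state = c("IL", "MO", "OR"),
  lat   = c(40, 37, 45),
  lon   = c(-90, -93, -123),
  stringsAsFactors = FALSE
)

# Hypothetical second data set keyed by the same two columns.
sales <- data.frame(
  city  = c("Springfield", "Portland", "Springfield"),
  state = c("MO", "OR", "IL"),
  sales = c(20, 30, 10),
  stringsAsFactors = FALSE
)

# Passing a character vector to `by` matches rows on city AND state together,
# so the two Springfields are kept apart.
merged <- merge(coords, sales, by = c("city", "state"))
print(merged)
```

Merging on `city` alone would pair each Springfield row with both Springfield rows in the other data frame, which is why both keys are needed.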

## ggplot Fit Line and Lattice Fit Line in R

February 13, 2014

Let's add a fit line to a scatterplot! Fit Line in Base Graphics: Here's how to do it in base graphics: `ols <- lm(Temp ~ Solar.R, data = airquality)`, `summary(ols)`, `plot(Temp ~ Solar.R, data = airquality)`, `abline(ols)`. [Figure: Fit line in base graphics in R] Fit Line in...
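The excerpt is cut off before the ggplot and lattice versions, so the following is only a sketch of one common way to draw the same fit line in each system, not necessarily the post's code. It assumes the `lattice` package is available (it ships with standard R installs) and draws the ggplot2 version only if that package is installed.

```r
# Fit the same model the excerpt uses (airquality ships with R).
ols <- lm(Temp ~ Solar.R, data = airquality)

# Base graphics: scatterplot plus the fitted line.
plot(Temp ~ Solar.R, data = airquality)
abline(ols)

# lattice: type "p" draws the points and "r" adds a regression line.
library(lattice)
p_lattice <- xyplot(Temp ~ Solar.R, data = airquality, type = c("p", "r"))
print(p_lattice)

# ggplot2 (only if installed): geom_smooth(method = "lm") adds the fit line.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_gg <- ggplot(airquality, aes(x = Solar.R, y = Temp)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE)
  print(p_gg)
}
```

Setting `se = FALSE` drops the confidence band so the ggplot2 output matches the plain line drawn by `abline`.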

## Compare Regression Results to a Specific Factor Level in R

February 6, 2014

Including a series of dummy variables in a regression in R is very simple. For example, `ols <- lm(weight ~ Time + Diet, data = ChickWeight)`, then `summary(ols)`. The above regression automatically includes a dummy variable for all but the first level of the Diet factor. The output begins: `Call: lm(formula = weight ~ Time...`
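The excerpt ends before showing how to change the comparison level. One common way to do it, offered here as a sketch rather than as the post's own method, is to make the level of interest the first level with `relevel`. The choice of Diet 4 as the reference is just an assumption for illustration.

```r
# Work on a plain data frame copy so the built-in data set is left untouched.
cw <- as.data.frame(ChickWeight)

# Make Diet level "4" the reference level (the first level of the factor).
cw$Diet <- relevel(cw$Diet, ref = "4")

# Diet coefficients are now reported relative to Diet 4 instead of Diet 1.
ols4 <- lm(weight ~ Time + Diet, data = cw)
print(names(coef(ols4)))
```

Because the reference level is the one left out of the dummy variables, the output now has coefficients for Diets 1, 2, and 3 and none for Diet 4.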

## Check if a Variable Exists in R

December 5, 2013

If you use `attach`, it is easy to tell if a variable exists. You can simply use `exists` to check: `> attach(df)`, then `> exists("varName")`, which returns `TRUE`. However, if you don't use attach (and I find you generally don't want to), this simple solution doesn't ...
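The excerpt stops before giving the alternative, so here is one simple way to check for a column without `attach`. It is a sketch, not necessarily the approach the post goes on to describe, and the data frame below is a made-up stand-in for `df`.

```r
# Hypothetical data frame standing in for `df` from the excerpt.
df <- data.frame(varName = 1:3, other = letters[1:3])

# Check the column names directly instead of attaching the data frame.
has_var <- "varName" %in% names(df)
has_missing <- "notThere" %in% names(df)

print(has_var)
print(has_missing)
```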

## Table as an Image in R

October 24, 2013

Usually, it's best to keep tables as text, but if you're making a lot of graphics, it can be helpful to be able to create images of tables. [Figure: PNG table] Creating the Table: After loading the data, let's first use this trick to put line breaks between the leve...

## Line Breaks Between Words in Axis Labels in ggplot in R

October 17, 2013

Sometimes when plotting factor variables in R, the graphics can look pretty messy thanks to long factor levels. If the level attributes have multiple words, there is an easy fix to this that often makes the axis labels look much cleaner. Without Line Br...
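As a rough sketch of the idea (the post's exact code is cut off), you can replace the spaces in the factor levels with newline characters before plotting. The data frame and level names here are invented for illustration.

```r
# Hypothetical data with long, multi-word factor levels.
df <- data.frame(
  group = factor(c("Very Long Level", "Another Long Level", "Short")),
  value = c(3, 5, 2)
)

# Swap each space for a newline so every word sits on its own line
# when the levels are used as axis labels.
levels(df$group) <- gsub(" ", "\n", levels(df$group))

print(levels(df$group))
```

Plotting functions that draw the levels as axis labels then render each word on its own line.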

## Custom Legend in R

October 10, 2013

This particular custom legend was designed with three purposes: to effectively bin values based on a theoretical minimum and maximum value for that variable (e.g. -1 and 1 or 0 and 100); to use a different interval notation than the default; and to handle NA values. Even though this particular legend was designed with those needs, it should be simple to extrapolate from...
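The three goals can be sketched with `cut` and `addNA` from base R. This is an illustration under assumptions, not the post's legend code: the -1 to 1 range, the 0.5 bin width, the "a to b" label style, and the "No data" label are all choices made for the example.

```r
# Hypothetical values on a scale with a theoretical range of -1 to 1.
x <- c(-0.9, -0.2, 0.3, 1, NA)

# Bin on the theoretical range rather than the observed range.
breaks <- seq(-1, 1, by = 0.5)

# Custom labels ("a to b") in place of cut()'s default "(a,b]" notation.
labels <- paste(head(breaks, -1), "to", tail(breaks, -1))

# include.lowest = TRUE keeps a value equal to the minimum in the first bin.
bins <- cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)

# Give NA its own level so it can get a legend entry of its own.
bins <- addNA(bins)
levels(bins)[is.na(levels(bins))] <- "No data"

print(as.character(bins))
```

The resulting factor levels can then be used as the legend labels, with one color per level.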

## Perform a Function on Each File in R

September 26, 2013

Sometimes you might have several data files and want to use R to perform the same function across all of them. Or maybe you have multiple files and want to systematically combine them into one file without having to open each file and manually copy the...
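A minimal sketch of that pattern, assuming the files are CSVs with matching columns: list the files, read each one with `lapply`, and stack the results. The temporary folder and the two small files are created here only so the example runs on its own.

```r
# Create a temporary folder with two small example files (illustrative only).
tmp_dir <- file.path(tempdir(), "example_files")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(tmp_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(tmp_dir, "b.csv"), row.names = FALSE)

# List every CSV in the folder, with full paths so read.csv can find them.
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Apply the same function to each file, then stack the results into one data frame.
combined <- do.call(rbind, lapply(files, read.csv))
print(combined)
```

Swapping `read.csv` for any function that takes a file path applies that function across all the files instead.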

## Truncate by Delimiter in R

September 19, 2013

Sometimes, you only need to analyze part of the data stored as a vector. In this example, there is a list of patents. Each patent has been assigned to one or more patent classes. Let's say that we want to analyze the dataset based on only the first pat...
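One way to keep only the first entry is to delete everything from the first delimiter onward with `sub`. This is a sketch under assumptions: the excerpt does not say which delimiter the data uses, so the semicolon and the class strings below are invented for the example.

```r
# Hypothetical patent class strings; the ";" delimiter is an assumption.
classes <- c("123/45; 67/89", "200/1", "300/2; 400/3; 500/4")

# Remove the first delimiter and everything after it, keeping the first class.
first_class <- sub(";.*$", "", classes)

print(first_class)
```

Entries with no delimiter are left unchanged, since `sub` only replaces text when the pattern matches.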

September 12, 2013

I often find it beneficial to check to see whether or not a dataset is already loaded into R at the beginning of a file. This is particularly helpful when I'm dealing with a large file that I don't want to load repeatedly, and when I might be using the...
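A minimal sketch of that check, using `exists` on the object's name. The object name `big_data` and the temporary CSV are hypothetical stand-ins so the example is self-contained; the post's own code is not shown in the excerpt.

```r
# Write a small example file so the sketch runs on its own (illustrative only).
data_file <- tempfile(fileext = ".csv")
write.csv(mtcars, data_file, row.names = FALSE)

# Only read the file if the object is not already in the workspace.
if (!exists("big_data")) {
  big_data <- read.csv(data_file)
}

print(dim(big_data))
```

Running the script a second time in the same session skips the `read.csv` call, which is the saving that matters with a large file.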