Reading data, and a graph

June 25, 2009

(This article was first published on Learning R, and kindly contributed to R-bloggers)

Using Microsoft Excel, I’m collecting aggregate data, by state, on various social, political, and economic indicators. I export them into a tab-delimited file called ‘states.txt’ (pretty clever, I know). I’ve got data on education expenditures, firearm deaths per capita, median household income, etc. I’d like to do some analysis and graphing of these data to see if there are any patterns of interest.

First, I import the data into R with the following command:

states <- read.table("C:/Data/states.txt", header=TRUE, sep = "\t")

This creates a data frame called ‘states’ and reads the text file into it. The argument ‘header = TRUE’ tells R that the first row of the file contains variable names, and ‘sep = "\t"’ tells it that the file is tab-delimited.
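If you don't have a file handy, here is a minimal sketch of what those two arguments do. It assumes a tiny made-up table passed through the ‘text’ argument of read.table instead of a file path; the state names and numbers are invented for illustration and are not from my data.

```r
# Hypothetical miniature of a tab-delimited file; all values are made up.
sample_text <- "state\tpublicedexp\thincome\nAlpha\t10.5\t48000\nBeta\t12.1\t52000\n"

# header = TRUE takes the first line as column names;
# sep = "\t" splits each line on tab characters.
demo <- read.table(text = sample_text, header = TRUE, sep = "\t")

# Show the resulting columns and their types.
str(demo)
```

The first line becomes the column names, and the two remaining lines become the rows of the data frame.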

Next I ‘attach’ the data frame with the following command:

attach (states)
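For anyone wondering what attaching actually buys you, here is a small sketch on a made-up data frame (the name ‘demo’ and its values are hypothetical). Attaching lets you refer to columns by their bare names; the ‘$’ operator and with() are alternatives that reach the same columns without attaching.

```r
# Hypothetical data frame; the values are invented for illustration.
demo <- data.frame(publicedexp = c(10, 12, 15), hincome = c(40, 50, 65))

# After attach(), columns can be used by bare name.
attach(demo)
m_attached <- mean(hincome)
detach(demo)  # remove it from the search path when finished

# Equivalent ways to reach the same column without attaching.
m_dollar <- mean(demo$hincome)
m_with <- with(demo, mean(hincome))
```

All three calls should give the same mean; which style to use is largely a matter of taste in a short interactive session like this one.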

OK, now that I’ve got my data into R, what can I do with it?

First, I’ll run some correlations and see what’s going on.

cor (read2children, publicedexp)
[1] 0.4211508

This tells me that the correlation between public expenditures on education and the percentage of children below the age of five who are read to daily is 0.42. It is unsurprising that the two are positively related, though at 0.42 the relationship is moderate rather than strong. It’s also likely that a third variable, household income, is related to both.

cor (hincome, publicedexp)
[1] 0.6547179
cor (read2children, hincome)
[1] 0.4094883

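Household income is more strongly correlated with education expenditures (0.65) than either variable is with children read to. Its correlation with children read to (0.41) is about the same as the first result.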
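Running cor on one pair at a time gets tedious. As a sketch, cor also accepts a whole data frame of numeric columns and returns every pairwise correlation at once. The data frame below is a made-up stand-in that reuses my column names; the numbers are invented, so the output will not match the results above.

```r
# Hypothetical stand-in for three columns of the states data; values are made up.
demo <- data.frame(
  read2children = c(45, 50, 55, 48, 60),
  publicedexp   = c(9, 11, 12, 10, 14),
  hincome       = c(40, 47, 52, 45, 61)
)

# cor() on a data frame returns a matrix of pairwise Pearson correlations.
cm <- cor(demo)

# Round for easier reading.
round(cm, 2)
```

The diagonal is always 1 (each variable with itself), and the matrix is symmetric, so each pair appears twice.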

Let’s look at the data a different way, by using scatterplots to see the relationships.

plot is the R command for, well, plotting. It’s very powerful and you can do lots of things with it. First I’ll do a very simple scatterplot of education expenditures and children read to.

plot (publicedexp, read2children)

Figure 1. A very basic scatterplot
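One detail worth spelling out is argument order: the first argument to plot goes on the horizontal axis and the second on the vertical axis. The sketch below uses a made-up data frame (hypothetical values, same column names as mine) and also shows the formula form, which reads as "y against x" and should draw the same picture.

```r
# Hypothetical values for illustration only.
demo <- data.frame(
  publicedexp   = c(9, 11, 12, 10, 14),
  read2children = c(45, 50, 55, 48, 60)
)

# First argument is x (horizontal), second is y (vertical).
plot(demo$publicedexp, demo$read2children)

# Formula interface: y ~ x, with the data frame given separately.
plot(read2children ~ publicedexp, data = demo)
```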

Next time we’ll work on making it better looking.
