Posted on June 25, 2009 by Jim
[This article was first published on Learning R, and kindly contributed to R-bloggers.]
Using Microsoft Excel, I’m collecting aggregate data, by state, on various social, political, and economic indicators. I export them into a tab-delimited file called ‘states.txt’ (pretty clever, I know). I’ve got data on education expenditures, firearm deaths per capita, median household income, etc. I’d like to do some analysis and graphing of these data to see if there are any patterns of interest.
First, I import the data into R with the following command:
states <- read.table("C:/Data/states.txt", header=TRUE, sep = "\t")
This creates a data frame called ‘states’ and reads the text file into it. The argument ‘header = TRUE’ tells R that the first row of the file contains variable names, and ‘sep = "\t"’ tells it that the file is tab-delimited.
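To try the import without the original file, here is a minimal sketch that writes a tiny tab-delimited file to a temporary location and reads it back with the same arguments. The three rows, the ‘state’ column, and the name ‘demo’ are made up for illustration; only ‘publicedexp’ and ‘read2children’ are column names that appear later in this post.

```r
# Write a small tab-delimited file to a temporary path.
# The values are invented for illustration; they are not the real state data.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "state\tpublicedexp\tread2children",
  "A\t9.5\t48.1",
  "B\t11.2\t52.3",
  "C\t8.7\t45.0"
), tmp)

# Same call as for states.txt, pointed at the temporary file.
demo <- read.table(tmp, header = TRUE, sep = "\t")

str(demo)   # column names and types
head(demo)  # first few rows
```

Checking the result with str() right after an import is a quick way to confirm that the header row was picked up and that numeric columns were not read in as text.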
Next I ‘attach’ the data frame, so that its columns can be referred to directly by name, with the following command:
attach(states)
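As a rough illustration of what attaching does, the sketch below compares three ways of reaching a column. It uses a small invented data frame called ‘demo’ (hypothetical, not the real ‘states’ data), and the with() variant is an alternative I’m adding here for comparison.

```r
# A small invented data frame standing in for 'states'.
demo <- data.frame(
  publicedexp   = c(9.5, 11.2, 8.7),
  read2children = c(48.1, 52.3, 45.0)
)

# Without attaching, columns are reached through the data frame.
m_dollar <- mean(demo$publicedexp)

# with() evaluates an expression inside the data frame, for one call only.
m_with <- with(demo, mean(publicedexp))

# After attach(), column names can be used on their own.
attach(demo)
m_attach <- mean(publicedexp)
detach(demo)  # remove it from the search path when finished
```

All three give the same mean. Attaching saves typing in an interactive session, but an existing variable with the same name as a column will take precedence over the column, so it pays to detach when done.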
OK, now that I’ve got my data into R, what can I do with it?
First, I’ll run some correlations and see what’s going on.
cor(read2children, publicedexp)
[1] 0.4211508
This tells me that the correlation between public expenditures on education and the percentage of children under the age of five who are read to daily is 0.42, a moderate positive relationship. It is unsurprising that the two are related. It’s also likely that a third variable, household income, is related to both.
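One way to look into that third variable is sketched below, on simulated data rather than the real state figures. The column name ‘medhhincome’ is a hypothetical stand-in for household income, and the numbers are generated so that income drives both other variables. The partial correlation here is computed by correlating regression residuals, which is one common approach and not necessarily the one I’d settle on.

```r
# Simulated data: income influences both other variables (assumption).
set.seed(1)
n <- 50
income <- rnorm(n, mean = 50, sd = 8)
sim <- data.frame(
  medhhincome   = income,                            # hypothetical column name
  publicedexp   = 0.1 * income + rnorm(n),
  read2children = 0.5 * income + rnorm(n, sd = 4)
)

# Pairwise correlations among all three variables.
round(cor(sim), 2)

# Significance test and confidence interval for one pair.
cor.test(sim$read2children, sim$publicedexp)

# Partial correlation: correlate what remains of each variable
# after regressing it on income.
res_read <- resid(lm(read2children ~ medhhincome, data = sim))
res_exp  <- resid(lm(publicedexp ~ medhhincome, data = sim))
partial  <- cor(res_read, res_exp)
partial
```

If the partial correlation comes out much smaller than the raw one, that suggests income accounts for a good share of the apparent relationship.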