I’m hardly the first person you’d want to talk to about learning statistics in R. But if you’re bent on teaching yourself R, and you’ve ended up at my blog, here are some resources I found useful. (No opinions here about whether R is better or worse than Excel, Minitab, Matlab, Octave, SPSS, Stata, SAS, or others.)
The R Project is the mothership.
RStudio is an IDE for R that provides a friendlier GUI for basic tasks. It has most of what you’d expect from a modern IDE: syntax highlighting, GUI commands for loading and saving data, setting the working directory, and separate panes for help files.
UCLA tutorials are a well-written introduction to basic data entry, functions, and graphics in R. There are similar tutorials for Stata and other languages here as well.
Quick-R is a blog and a book written by a statistician for people switching from SPSS and Stata to R. Excellent and concise website detailing all of the basics: data entry, functions, plots, and how to think about all of the above.
R help list and archives are a way to ask questions of experienced users. You’ll get excellent help here, but it’s important to respect the etiquette. Basically, (1) read the package manual, (2) work up a minimal example that reproduces your question, and (3) be extremely precise about the data you have and the output you want, rather than about the way you’re trying to solve the problem. This will become clearer if you read a few discussions in the archives.
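To give a flavor of that etiquette, here is a sketch of a minimal, reproducible example (the data and the question are made up for illustration): re-create a tiny version of your data, print it with `dput()` so list readers can paste it back into their own session, and state the input and desired output rather than your attempted solution.

```r
# A tiny stand-in for your real data (values invented for this sketch)
df <- data.frame(
  country = c("US", "US", "FR", "FR"),
  year    = c(2000, 2001, 2000, 2001),
  gdp     = c(10.3, 10.6, 1.4, 1.5)
)

# dput() prints a copy-pasteable representation of the object,
# so anyone answering can reconstruct it exactly
dput(df)

# Then state the question in input/output terms, e.g.:
# "I have df as above; I want one row per country with the mean gdp."
aggregate(gdp ~ country, data = df, FUN = mean)
```

Framing the question this way (data in, data out) is exactly point (3) above, and it usually gets a working one-liner back within the hour.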
StackExchange is a glorified bulletin board for programmers exchanging help and (frequently great) advice. Search the archives before posting new questions; the regulars hate duplicate postings. But it’s easier to navigate than the R help archives.
Spoetry explains some of the syntax and style of R. It’s a longish treatise that includes innumerable gems such as the use of subscripts in R.
Generally speaking, you’ll need to get used to reading the manuals for the packages you want to use. Reverse-engineering example code is particularly useful, too, because syntax conventions vary widely across packages.
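In practice that means leaning on R’s built-in help system. These are all standard base-R calls (shown here with `lm` and `mean` purely as examples):

```r
?lm                       # help page for a single function
help(package = "stats")   # index of everything a package exports
example(mean)             # run the worked examples from a help page
vignette()                # list longer-form tutorials shipped with packages
```

`example()` is the underrated one for reverse engineering: it runs a function’s documented examples in your own session, so you can poke at the objects it leaves behind.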
R has thousands of packages. Benefits: lightweight software, highly extensible, possible for anyone to code updates anywhere in the world. Drawbacks: hard to know which libraries you need, and whether your favorites are best in class or obsolete.
What packages should you use in R? That’s sort of a moving target. (It depends what you want to do.) Helpfully, CRAN organizes some curated collections called task views, notably SocialSciences, Econometrics, Spatial, Bayesian, TimeSeries, and Survival.
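If you decide you want everything in a task view at once, the ctv package can install a whole view by name. (This sketch assumes an internet connection and that the view names above are current; installing a full view pulls in a lot of packages.)

```r
# ctv knows how to fetch and install entire CRAN task views
install.packages("ctv")
library(ctv)
install.views("Econometrics")   # or "SocialSciences", "Bayesian", ...
```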
I’ve found myself frequently installing these in addition to the more obvious ones.
- foreign–read/write data formats from SPSS, Stata, Matlab, SAS, dbase, etc.
- vcd–visualizing categorical data
- lattice–trellis plots
- Hmisc–grab bag of useful odds and ends
- reshape–reshapes data between wide format (observations spread across many columns) and long format (one big column)
- ggplot2–simplifies plotting commands
- plyr–split-apply-combine tools that simplify reshape-style data manipulation
- Zelig–grandiose ambitions to unify model specification
- datasets–lots of built-in datasets you can play with
- ISOcodes–uniform country abbreviations
- sp–spatial statistics
- statnet–network statistics, graphs, topology
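To give a flavor of the list above, here’s a minimal sketch of wide-to-long reshaping with the reshape package (assuming it’s installed; its successor, reshape2, works the same way with `melt()`/`dcast()`):

```r
library(reshape)

# Two observations, with one column per year: wide format
wide <- data.frame(id = c(1, 2), y1990 = c(5, 7), y2000 = c(6, 9))

# melt() stacks the year columns into one big column: long format
long <- melt(wide, id = "id")

# cast() pivots it back to one column per variable
cast(long, id ~ variable)
```

Once you see your data as something you can melt and re-cast at will, a lot of tedious copy-paste restructuring disappears.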
Some really key concepts are hard to appreciate at the beginning. For example, why are there different data types in R? They turn out to be very useful, but at first they seem like a pain. The data types include scalars, vectors, matrices, arrays, lists, and data frames. Vectors can take on a number of types: numeric, character, factor, and ordered factor. Matrices and arrays hold a single type of data in two or more dimensions. Data frames are the closest thing to a Stata dataset: the different columns of a data frame can contain different types of data (numbers, strings, qualitative data, etc.).
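Here are those structures side by side, using only base R (the values are arbitrary):

```r
v <- c(1.5, 2, 3)                 # numeric vector (a "scalar" is just length 1)
s <- c("a", "b")                  # character vector
f <- factor(c("low", "high", "low"))              # qualitative data
o <- factor(c("low", "high"),
            levels = c("low", "high"),
            ordered = TRUE)                       # ordered factor
m <- matrix(1:6, nrow = 2)        # one type of data, 2 dimensions
l <- list(nums = v, note = "anything goes")       # mixed types, any shape
df <- data.frame(name = s, score = c(10, 20))     # columns of different types

str(df)   # str() shows the structure of any object, a lifesaver early on
```

When something behaves strangely, `str()` on the offending object is almost always the first diagnostic to run.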