R you ready for this? Statistics for free!

May 7, 2011

(This article was first published on Paleocave Blog » R, and kindly contributed to R-bloggers)

If you’ve listened to the show for a while or if you’ve been reading the Paleocave blog from the beginning (like when we actually used to update it regularly), then you might know that I’m rather fascinated with statistics. Imagine my delight a few years ago when I found out that one of the most powerful statistical tools available (the one that most of the cool kids use) was available for free! That tool is called R. It’s a great tool with a terrible name. R is named both for its developers, Robert Gentleman and Ross Ihaka (Robert and Ross), and as a sort of pun, because it is an open source implementation of the S language. That’s cool, I guess, but as a name R is horrible for search engine optimization. Oh well, keeps out the riff-raff I suppose.

The vast majority of people would call R a programming language. Real computer programmers (the kind of people that argue about Ruby vs Perl) will tell you it’s not really a ‘language,’ it’s a ‘programming environment.’ Whatever, I don’t think I really know the difference.  Don’t get intimidated, because it’s pretty easy to do as much or as little as you want in R.

[Image: R screen shot]

What’s a pirate’s favorite statistical programming language?

I know what you’re thinking. “I don’t want to mess with that. I want something with a point and click interface and dropdown menus.” You probably do for now, but once you see what the possibilities are, your curiosity will be piqued and you’ll learn how to do more than a point and click interface ever could (plus this is free, remember). Think of point and click sort of like public transportation. Right now you just want a way to get to the grocery store because it’s too far to walk. Are you going to learn to drive or just take the bus? You take the bus: less time and fewer resources required. But later, you learn to drive and realize you can go anywhere you want. Maybe you occasionally still take the bus when it’s really convenient, but sometimes you want to go someplace nobody else ever goes.

You’re still skeptical. I know, I was too. Here’s a hook. When I show this to people, many of them start sitting up straight and listening. The hook is the histogram, that old statistical standby. Ever try to make one in Excel? It’s basically impossible. Download R, install it, and open it. There’s some legalish text at the top of the screen and then a prompt that looks like this: >

First let’s assign a data set a name. Type “data”, then “=”, then “c”, then an opening parenthesis “(”. Inside the parentheses, type in your data points separated by commas, then close the parentheses with “)”. You just assigned the name data to that data set. Now make a histogram from it. Type “hist(data)” and hit “return.” Bam! Histogram!

>            data=c(1,3,4,6,7,5,7,8,9,7,8,6,7,4,5,6,4,3,10,11,13,2,3)

>            hist(data)


It’s that easy! (If you cheated and copied and pasted my text, make sure to delete the prompts “>” before hitting return.)
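If you’re curious what R actually did there, here’s a minimal sketch that pokes at the same data. The name h, the bin count of 12, and the title and axis label text are my own picks for illustration, nothing special. As far as I know, hist() chooses its own bins when you don’t ask for any, and treats a single number given to breaks as a suggestion rather than an order.

```r
# Same data as above (<- does the same job as = for assignment)
data <- c(1,3,4,6,7,5,7,8,9,7,8,6,7,4,5,6,4,3,10,11,13,2,3)

# plot = FALSE makes hist() hand back the bins and counts without drawing
h <- hist(data, plot = FALSE)
h$breaks   # edges of the bins R picked
h$counts   # how many data points landed in each bin

# Redraw with a suggested number of bins, plus a title and an axis label
hist(data, breaks = 12, main = "My first histogram", xlab = "Value")
```

Try a few different values for breaks and watch how the shape of the histogram changes.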

If you are starting to like what you see and you want an easy way to get data stored in Excel spreadsheets into R, I recommend Googling the “scan” command. It’s not the most elegant way of getting data into R, but it’s good for your first time out on the road (I still use it probably more than I should).
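Here’s a rough sketch of what scan can look like in practice. It assumes you’ve saved a single column of numbers from your spreadsheet as a plain text file. The temporary file and the name values below are stand-ins I made up so the example runs on its own.

```r
# Hypothetical stand-in for a column of numbers saved from a spreadsheet
# as plain text, one number per line
path <- tempfile(fileext = ".txt")
writeLines(c("1", "3", "4", "6", "7"), path)

# scan() reads whitespace-separated numbers from the file into a vector;
# quiet = TRUE just suppresses the "Read 5 items" message
values <- scan(path, quiet = TRUE)

# The result is an ordinary numeric vector, so hist() works on it directly
hist(values)
```

Calling scan() with nothing in the parentheses reads from the keyboard instead, so you can paste in a column copied from a spreadsheet and press return on a blank line to finish.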

If you are starting to think you might really use R, you might want to invest in some books to show you the ropes.  I have A First Course in Statistical Programming with R. I have also heard reasonably good things about Statistical Analysis with R.  Both of these books are light on statistics and heavy on R. So if you are looking to brush up on stats, you probably need something like Using R for Introductory Statistics, though I really can’t speak to how good or bad this book is because I’ve never used it.

Lastly, I figure some people out there might be looking to learn something about programming languages (or environments as the case may be) and wonder if R is a good place to start. Well, in my opinion it’s a fairly gentle start into learning a programming language. What I don’t know is how well the skills you learn will translate to other, harder-hitting languages later on. You can read other people’s opinions on the matter (much more informed than my own) here: http://www.psychwire.co.uk/2011/05/is-r-an-ideal-language-to-teach-the-fundamentals-of-programming-to-beginners/. Make sure you check out the comments for the back and forth discussion.

Ok, well, enjoy getting started. Shoot me an email at patrick[at]sciencesortof.com if you get a kick out of using R or want me to try to help you with something, and I’ll expose my ignorance. (Though I like R, I’m not particularly great at it. Tweet the once and future paleopal, @jdyeakel, for some real expertise.)

