I’m introducing R to a few colleagues this week and want to share why learning software like R is important… Here are a few articles that explain it well… Other reasons?
The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it.
I think statisticians are part of it, but it’s just a part. You also want to be able to visualize the data, communicate the data, and utilize it effectively. But I do think those skills – of being able to access, understand, and communicate the insights you get from data analysis – are going to be extremely important. Managers need to be able to access and understand the data themselves.
Where does R fit?
R provides an environment with all the tools needed for data science (see the data science process below, from Benjamin Fry’s thesis).
– R is ideal for small-data analysis, i.e. data that fits in a computer’s RAM (e.g. data < 10GB). SQL and search techniques seem good for larger data sets that can still fit on one machine, while techniques like Hadoop are good for BIG data sets that cannot fit on one machine.
– NY Times article: “R You Ready for R?”
– NY Times article on R