R is everywhere
- Learn what R is all about
- Get an overview of why R is useful
- Submit your first code exercise
Introduction to R
The most powerful statistical computing language on the planet.
Norman Nie, Founder of SPSS
R is a programming language and environment for working with data. Statisticians and data scientists love it for its expressive syntax and its wealth of external libraries and tools, and it runs on all major operating systems.
It is the Swiss Army knife of data analysis and statistical computing (and you can make some pretty charts, too!). The R language is easily extended with packages written by a large and growing community of developers around the world. You can find it almost everywhere: academic institutions, start-ups, international corporations and many other organizations use it.
Its adoption reflects this. Here we can see a large increase over the years in both the number of downloads and the number of available packages:
In 2020, R celebrated its 20th birthday with the release of version 4.0. And yes, it’s free and open source 😀
Quiz: R Facts
Which of the following statements about R are correct?
Why Use R?
R is a popular language for solving data analysis problems, and it is also used by people who do not traditionally consider themselves programmers. When creating charts and visualizations, you will find that R offers far more creative possibilities than graphical applications such as Excel.
Here are some of the features R is most famous for:
Visualization: Creating beautiful graphs and visualizations is one of R’s biggest strengths. The core language already provides a rich set of tools for plotting charts and all kinds of other graphics. The sky’s the limit.
Reproducibility: Unlike spreadsheet formulas, R code is not tied to a specific dataset and can easily be reused across projects, even for datasets with more than a million rows. You can build reusable reports and automatically regenerate them as the data changes.
Advanced modelling: R offers one of the largest and most powerful collections of data analysis code in the world. The range and depth of its available statistical models are unmatched and growing by the day, thanks to the huge community of open source package developers and contributors.
Automation: R code can be used to automate reports, data transformations and model computations. It can also be integrated into automated production workflows, cloud computing environments and modern database systems.
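To make the reproducibility and automation points more concrete, here is a minimal base R sketch. The helper `summarise_column()` is hypothetical and written only for this illustration; `mtcars` and `iris` are example datasets that ship with R, and the million-row data frame is simulated with `rnorm()`.

```r
# Hypothetical helper: summarise one numeric column of any data frame
summarise_column <- function(data, column) {
  values <- data[[column]]
  c(n = length(values), mean = mean(values), sd = sd(values))
}

# The same code runs unchanged on two datasets that ship with R
summarise_column(mtcars, "mpg")
summarise_column(iris, "Sepal.Length")

# It also copes with a simulated dataset of one million rows;
# set.seed() makes the random numbers, and therefore the result, reproducible
set.seed(42)
big <- data.frame(x = rnorm(1e6))
summarise_column(big, "x")

# Automation: write one histogram per column to PDF files in a temporary folder
for (column in c("mpg", "hp", "wt")) {
  pdf(file.path(tempdir(), paste0(column, ".pdf")))
  hist(mtcars[[column]], main = column, xlab = column)
  dev.off()
}
```

Because the analysis lives in code rather than in cells, rerunning the script on updated data regenerates every summary and chart in one go.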
Quiz: Using R
What are the main reasons to use R compared to spreadsheet software?
You R in Good Company
R is the de facto standard for statistical computing at academic institutions and companies around the world. Its strong support for literate programming (combining code with human-readable text) enables researchers and data scientists to create publication-ready reports that reviewers can easily reproduce.
The language has been widely adopted across many industries. Here are some examples:
- Microsoft: Microsoft R Open, TrueSkill(TM)
- Google: R for Marketing Research and Analytics, Predicting the Present with Google Trends
- Facebook: Visualizing Friendships, The Formation of Love, Prophet package for time series forecasting
- Others: Airbnb, Uber, Oracle, IBM, Twitter
- Pharma: Merck, Genentech (Roche), Novartis, Pfizer
- Newspapers: The Economist, The New York Times, Financial Times
- Banks: Bank of America, J.P. Morgan, Goldman Sachs, Credit Suisse, UBS, Deutsche Bank
- Insurance: Lloyd’s, Allianz
See also the R Consortium page for further information about industrial partners and initiatives.
The R language consists of three fundamental building blocks, which we will look at in the following chapters:
- Objects: Everything that exists is an object
- Functions: Everything that happens is a function call
- Interfaces: R connects well with many statistical algorithms and libraries
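The first two building blocks can be checked directly in an R session. This small base R sketch shows that values and functions alike are objects, and that even operators such as `+` and `[` are ordinary functions that can be called with regular function syntax by quoting their names in backticks.

```r
# Everything that exists is an object: numbers, vectors and even functions
x <- 42
class(x)           # "numeric"
class(sum)         # "function"
is.function(`+`)   # TRUE: the + operator is itself a function object

# Everything that happens is a function call:
# operators are just functions written in a more convenient form
`+`(1, 2)              # same as 1 + 2
`[`(c(10, 20, 30), 2)  # same as c(10, 20, 30)[2]

# Even assignment is a call to the `<-` function
`<-`(y, x + 1)
y                      # 43
```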
The most important object type in R is the vector. Vectors form the basis for (almost) all R data structures, and this strong vector orientation makes R a very expressive and powerful language.
Functions and operators make it easy to work with vectors and compute results.
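As a quick illustration of this vector orientation, here is a short base R sketch; the temperature values are made up for the example.

```r
# Even a single number is a vector, of length 1
length(42)        # 1

# A numeric vector of made-up daily temperatures in degrees Celsius
celsius <- c(12.5, 18.0, 21.3, 9.8)

# Operators work element-wise on the whole vector, no explicit loop required
fahrenheit <- celsius * 9 / 5 + 32

# Functions summarise a whole vector in a single call
length(celsius)   # 4
mean(celsius)
range(fahrenheit)

# Comparisons also return vectors, here of TRUE/FALSE values ...
celsius > 15      # FALSE TRUE TRUE FALSE

# ... which can be used to select elements
celsius[celsius > 15]   # keeps 18.0 and 21.3
```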
One of R’s greatest strengths is how easily it integrates new algorithms and builds interfaces around them. R’s package ecosystem lets you choose from thousands of open source models and libraries. The main package repository, CRAN (the Comprehensive R Archive Network), hosts these packages and makes it easy to install them and use them in your code.
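Installing and loading a CRAN package usually takes two calls, `install.packages()` and `library()`. The sketch below wraps them in a hypothetical helper, `use_package()`, that installs a package only when it is missing. The repository URL `https://cloud.r-project.org` (the CRAN cloud mirror) is an assumption made here so that the code also works in non-interactive sessions.

```r
# Hypothetical helper: install a package from CRAN if needed, then load it
use_package <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (instead of failing) if pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE makes library() use the name stored in `pkg`
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so this call only loads it and needs no download
use_package("stats")
```

With an internet connection, a call such as `use_package("ggplot2")` would download the ggplot2 plotting package from CRAN the first time and simply load it on later runs.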
Exercise: Submit your first code
This course has code exercises to help you learn and quickly explore new concepts. After entering code in the editor, click the “Submit” button to run it. The editor gives you feedback on your submission and displays any output below it. If you need additional help, use the “Get Hint” button.
To finish your first exercise, press the “Submit” button.