R is the lingua franca of data science and one of the most popular languages for learning it. Once the choice is made, beginners often get lost looking for a learning path and end up facing a signboard like the one below.
In this blog post I would like to lay out a clear, structured approach to learning R for data science. It will help you get started quickly on your data science journey with R.
The Wrong Turn
"OK, I am going to learn and master R by learning all the packages. Then I will get to the data science theory and start doing my projects."
Never head this way! Learning R is similar to learning an API: focus on incremental learning instead of mastery.
First Things First
- Download & Installation: Download a suitable binary distribution of R for your operating system.
- Get RStudio: RStudio is a leading IDE for R development. It helps you code more productively by keeping plots, package management and the editor in one place.
Become a LearneR
Take small steps first by understanding the syntax, data structures and libraries in R.
In the post 5 Steps to Get Started With Data Science I have provided a list of resources to learn R. These resources are a good starting point and will help you learn incrementally. R has a strong user community with an ever-growing list of packages and support. Once you are comfortable with the basics, start exploring the packages for different data science tasks. Learn how to import data sets in R using packages like readr and data.table.
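To give a feel for the syntax and data structures mentioned above, here is a minimal sketch in base R. The names (scores, student, grades, to_percent) and the values are made up for illustration; nothing here needs an extra package.

```r
# Atomic vector: every element has the same type
scores <- c(82, 91, 77)
mean_score <- mean(scores)  # many functions work on whole vectors

# List: elements may have different types
student <- list(name = "Ada", scores = scores)

# Data frame: a table whose columns are equal-length vectors
grades <- data.frame(
  name  = c("Ada", "Ben", "Cy"),
  score = scores,
  stringsAsFactors = FALSE
)

# Subset rows with a logical condition, then pick one column
high <- grades[grades$score > 80, "name"]

# A small function with a default argument
to_percent <- function(x, total = 100) x / total * 100

str(grades)   # compact overview of the data frame
print(high)

# Packages are installed once and then loaded per session, for example:
# install.packages("readr")
# library(readr)
```

Running snippets like this line by line in the RStudio console is a quick way to see what each structure looks like.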
Data Science with R
Now that you are familiar with R, the next step is using R to solve Data Science problems. Below is a list of common data science tasks and how you could use R to achieve them.
Getting the data into R is the first step of the data science process. R offers a wide range of options for importing data in many formats. Below is a list of common packages well suited for data loading.
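As a rough sketch of what loading a file looks like, the snippet below writes a small CSV to a temporary file and reads it back. It uses base R's read.csv first, then readr and data.table (the two packages named earlier in this post) only if they happen to be installed. The file and variable names are placeholders for illustration.

```r
# Write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars), csv_path, row.names = FALSE)

# Base R reader: always available
cars_base <- read.csv(csv_path)

# readr reader, used only if the package is installed
if (requireNamespace("readr", quietly = TRUE)) {
  cars_readr <- suppressMessages(readr::read_csv(csv_path))
}

# data.table reader, used only if the package is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  cars_dt <- data.table::fread(csv_path)
}

dim(cars_base)  # rows and columns that were read
```

With your own data you would replace csv_path with the path to your file.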
Data Analysis & Visualization
After getting the data into the R environment, the next step in the data science workflow is simple exploratory analysis. Below is a list of wonderful R packages that help simplify data analysis and preparation.
- dplyr – Simple and elegant data manipulation
- data.table – Handles large data sets with ease. A great package for faster data manipulation and analysis
- ggplot2/ggvis – Awesome packages for data visualization
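Here is a small sketch of how dplyr and ggplot2 from the list above fit together, assuming both packages are installed. It uses the mtcars data set that ships with R; the object names (mpg_by_cyl, p) are just for illustration.

```r
library(dplyr)
library(ggplot2)

# Average fuel economy per cylinder count, best first
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), cars = n()) %>%
  arrange(desc(mean_mpg))

# Scatter plot of weight against fuel economy, coloured by cylinder count
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

print(mpg_by_cyl)
# print(p) draws the plot on the active graphics device
```

The same group-and-summarise pattern carries over to most exploratory questions: pick a grouping column, pick a summary, and plot the result.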
Data preparation is an important step in the data science workflow. Clean data is really hard to find; data often needs to be transformed and molded into a form on which we can run models.
- reshape2 – Melt and cast the dataset into the shape you want.
- tidyr – Tutorial on tidy data
- Amelia – Missing values imputation
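To make the reshaping idea concrete, here is a minimal sketch using tidyr, assuming the package is installed. pivot_longer and pivot_wider play roughly the roles that melt and dcast play in reshape2. The toy data frame is invented for illustration, and the missing-value check is plain base R rather than Amelia's imputation.

```r
library(tidyr)

# A small "wide" table with one missing value
wide <- data.frame(
  id     = 1:3,
  height = c(150, 160, NA),
  weight = c(50, 65, 70)
)

# Wide to long: one row per id and measure
long <- pivot_longer(wide, cols = c(height, weight),
                     names_to = "measure", values_to = "value")

# Long back to wide: one column per measure
wide_again <- pivot_wider(long, names_from = measure, values_from = value)

# Count missing values per column before deciding how to handle them
missing_per_column <- colSums(is.na(wide))

print(long)
print(missing_per_column)
```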
Modelling & Evaluation
Now the data is ready to hit the machine learning workbench. Below is a set of resources and packages which could help you through the model building process.
- Hands On Data Science – OnePage solutions for data mining challenges
- rattle – Graphical user interface for data mining
- caret – One-stop shop with a common interface to many machine learning algorithms.
- e1071 – A compact library for many algorithms
- Metrics – Ben Hamner’s package for a list of evaluation metrics
- Practical Data Science With R (book)
- Applied Predictive Modeling (book)
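The model building loop itself can be sketched in a few lines of base R: split the data, fit a model, predict on held-out rows and score the predictions. This is only an illustration under simple assumptions (a 70/30 split of mtcars, a linear model, RMSE computed by hand). Packages such as caret and Metrics from the list above provide helpers for these steps.

```r
set.seed(42)  # make the random split reproducible

# Hold out roughly 30% of the rows for evaluation
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Fit a linear model on the training rows only
fit <- lm(mpg ~ wt + hp, data = train)

# Predict on unseen rows and compute the root mean squared error
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))

print(rmse)
```

Evaluating on rows the model has not seen is the key habit here; the choice of model and metric will change from problem to problem.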
Now that you have some insights from the data, remember that they are lost without effective communication. R Markdown is a great tool for reporting your insights and sharing them with fellow data scientists.
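As a rough sketch of what an R Markdown report involves, the snippet below writes a tiny .Rmd file (a YAML header, one sentence and one code chunk) and renders it to HTML only if the rmarkdown package and pandoc are available. The file name report.Rmd and its contents are placeholders for illustration.

```r
# Three backticks open and close a code chunk in R Markdown
fence <- strrep("`", 3)

# Lines of a minimal R Markdown document
rmd_lines <- c(
  "---",
  "title: \"Fuel economy summary\"",
  "output: html_document",
  "---",
  "",
  "Average miles per gallon in the mtcars data:",
  "",
  paste0(fence, "{r}"),
  "mean(mtcars$mpg)",
  fence
)

rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_path)

# Render to HTML only when rmarkdown and pandoc are both available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
}
```

In practice you would write the .Rmd file directly in RStudio and use the Knit button, which performs the same rendering step.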
Start Small .. Build Big
Understanding algorithms and building your first recipes for common data science tasks is the small step. This is where most tutorials, courses and blogs stop. You could achieve this small step in a weekend and then focus on the next big step: building your own repository of small projects. This is how you build up your skills in R and data science.
Deliberate practice on more data sets and different kinds of challenges will take you to the next step – Mastery. Go for it!!