Top 9 R packages (that every Data Scientist must know)

[This article was first published on business-science.io, and kindly contributed to R-bloggers].

Hey guys, welcome back to my R-tips newsletter. In today’s R-tip, I’m sharing the top 9 R packages that I use almost every day… You’re getting the cheat code to learning these R packages. Plus, I’m sharing 200 lines of R code that show how you can use my 9 R code templates for ANY company. Let’s go!

Table of Contents

Today I share how to use my Top 9 R Packages. Here’s what you’re learning today:

  • Top 9 R Packages: We’ll go through each of the top 9 R packages that I use almost every day.
  • 9 Code Templates: How I use each of these R packages to complete business analysis and data science tasks.
  • Shiny App Bonus: I’m sharing my Shiny App: Interactive Store Locator.

Top R Packages

Matt’s Top R Packages

This is the Bonus Shiny App you’re getting today!

Shiny App: Interactive Store Locator

Bonus Shiny App!


SPECIAL ANNOUNCEMENT: ChatGPT for Data Scientists Workshop on October 18th

Inside the workshop I’ll share how I built a Machine Learning Powered Production Shiny App with ChatGPT (extends this data analysis to an insane production app):

ChatGPT for Data Scientists

What: ChatGPT for Data Scientists

When: Wednesday October 18th, 2pm EST

How It Will Help You: Whether you are new to data science or are an expert, ChatGPT is changing the game. There’s a ton of hype. But how can ChatGPT actually help you become a better data scientist and help you stand out in your career? I’ll show you inside my free ChatGPT for Data Scientists workshop.

Price: Does Free sound good?

How To Join: 👉 Register Here


R-Tips Weekly

This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks. Pretty cool, right?

Here are the links to get set up. 👇

This Tutorial is Available in Video

I have a companion video that walks you through all of the code templates for my Top 9 R packages that every Data Scientist must know. 👇

I used to struggle at Data Science

Let’s be honest. I used to suck at Data Science.

In 2014, I was learning R. I was a beginner. I was struggling. I was frustrated. And I was stuck.

So if you’re in the same shoes now, I get it. I’ve been there. But here’s what changed for me.

I found out about an early version of the tidyverse. And it changed everything.

Over the course of the next two years, I went from a struggling Mechanical Engineer to a Director of Sales, Engineering, and Forecasting. And I had a dirty little secret.

R was behind everything.

In fact, I was using R to automate my job. I was using R to automate my team’s jobs. I was using R to automate my boss’s job.

And this led to a promotion. And then another promotion. And then another promotion.

My point is that R is a superpower. And I want to share with you the 9 R packages that I use almost every day.

Tutorial: Top 9 R Packages (With 9 Code Templates)

This tutorial is excellent. You’ll learn how to use my Top 9 R Packages with short code templates that you can use for almost ANY company:

  1. tidyverse – Meta R package for data analysis
  2. dplyr – Data wrangling and manipulation
  3. ggplot2 – Data visualization
  4. tidyr – Data reshaping and tidying
  5. timetk – Time series analysis
  6. readr – Data import
  7. tidymodels – Machine learning
  8. leaflet – Interactive maps and geospatial analysis
  9. shiny – Interactive web apps

1: Tidyverse

The tidyverse is the meta package that loads the core data analysis packages that I use every day (dplyr, ggplot2, tidyr, readr, and more).

Code Template #1: Load the tidyverse

Prompt 1 Code

Get the Code.

Here’s what happens when you run the code:

Loading Tidyverse 2.0.0

It attaches (or loads):

  • dplyr – Data wrangling and manipulation
  • ggplot2 – Data visualization
  • tidyr – Data reshaping and tidying
  • readr – Data import
  • purrr – Functional programming and iteration
  • tibble – Tidy data structure
  • stringr – String manipulation
  • forcats – Factor manipulation
  • lubridate – Date manipulation
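The original code isn't reproduced in the text here, so this is a minimal sketch of what the template boils down to. It assumes tidyverse 2.0.0 or later is installed (earlier versions don't attach lubridate); the `core` vector is just a helper name for this example.

```r
# Attach the core tidyverse packages with a single call
library(tidyverse)

# The packages listed above (helper vector for this sketch)
core <- c(
  "dplyr", "ggplot2", "tidyr", "readr", "purrr",
  "tibble", "stringr", "forcats", "lubridate"
)

# Check which of them are now on the search path
attached <- paste0("package:", core) %in% search()
names(attached) <- core
attached
```

If any entry comes back `FALSE`, the installed tidyverse is probably older than 2.0.0.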

In 1 line of code, now we have most of the R packages that we need to do our day-to-day work. Let’s get started with an example analysis. For that we’ll use the dplyr package.

2: dplyr

The dplyr package is the workhorse of the tidyverse. It’s the package that I use to manipulate data.

  • Purpose: Data manipulation.
  • Features: Enables filtering, grouping, summarizing data, and more.
  • Usefulness: Offers a more readable and concise syntax for data manipulation.

Code Template #2: Group by and summarize

Prompt 2 Code

Get the Code.

First, we make some sample sales data with products and their respective sales numbers.

Dplyr Input Data

Then we use group_by() and summarize() to get the Total Sales by Product.

Dplyr output data
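The original code isn't reproduced in the text here, so this is a minimal sketch of the group-by-and-summarize pattern. The product names, sales values, and column names are made up for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical sample data: one row per sale
sales_tbl <- tibble(
  product = c("A", "B", "A", "C", "B", "C"),
  sales   = c(100, 200, 150, 300, 250, 50)
)

# Total sales by product
total_sales_tbl <- sales_tbl %>%
  group_by(product) %>%
  summarize(total_sales = sum(sales), .groups = "drop")

total_sales_tbl
```

The same two-step pattern works for any grouping column (region, customer, month) and any summary function (`mean()`, `n()`, `max()`).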

3: ggplot2

The ggplot2 package is a data visualization package. It’s the package that I use to visualize data for static plots that go into Executive reports.

  • Purpose: Data visualization.
  • Features: Creates complex multi-plot layouts and produces elegant graphics.
  • Usefulness: It has a consistent syntax and is good for creating high-quality visualizations.

Code Template #3: Create a ggplot2 plot

Prompt 3 Code

Get the Code.

First, we make some sample monthly revenue data. Then we produce a ggplot2 data visualization with revenue by month.

ggplot data visualization
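The original code isn't reproduced in the text here, so this is a minimal sketch of a revenue-by-month plot. The revenue figures are invented, and the bar chart is one reasonable choice of geom, not necessarily the one used in the original template.

```r
library(ggplot2)

# Hypothetical monthly revenue data; the factor keeps months in calendar order
revenue_tbl <- data.frame(
  month   = factor(month.abb, levels = month.abb),
  revenue = c(12, 14, 13, 17, 19, 22, 21, 24, 23, 26, 28, 31) * 1000
)

# Bar chart of revenue by month
revenue_plot <- ggplot(revenue_tbl, aes(x = month, y = revenue)) +
  geom_col(fill = "steelblue") +
  labs(title = "Revenue by Month", x = "Month", y = "Revenue (USD)") +
  theme_minimal()

revenue_plot
```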

4: tidyr

The tidyr package is a data wrangling package. It’s the package that I use to reshape data (also called pivoting).

  • Purpose: Pivoting data (also nesting).
  • Features: Enables data reshaping and tidying.
  • Usefulness: Helps in organizing messy data for easier analysis.

Code Template #4: Pivot data from wide to long format

Prompt 4 Code

Get the Code.

First, we make some sample sales data for 2 products by month. This is in “wide format”.

Tidyr Wide Format

Then we use pivot_longer() to convert the data to “long format”. Long format is needed for most “tidy” data analysis including making plots with ggplot2 and summarizing data with dplyr.

Tidyr Long Format
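The original code isn't reproduced in the text here, so this is a minimal sketch of the wide-to-long pivot. The months, product columns, and values are made up for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical wide-format data: one column per product
wide_tbl <- tibble(
  month     = c("Jan", "Feb", "Mar"),
  product_a = c(100, 120, 140),
  product_b = c(80, 90, 110)
)

# Pivot to long format: one row per month-product combination
long_tbl <- wide_tbl %>%
  pivot_longer(
    cols      = c(product_a, product_b),
    names_to  = "product",
    values_to = "sales"
  )

long_tbl
```

With the data in long format, `product` can be mapped to color in ggplot2 or passed to `group_by()` in dplyr.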

5: timetk

The timetk package is a time series package. I am the creator of this R package. And it’s the package that I use to analyze time series data.

  • Purpose: Time series analysis.
  • Features: Enables time series data wrangling and manipulation.
  • Usefulness: Helps in exploring and manipulating time series data for easier analysis.

Code Template #5: Create a time series trelliscope visualization for multiple time series

Prompt 5 Code

Get the Code.

We’ll use the FANG stock data to create a trelliscope visualization, which is great for visualizing 10+ time series.

Trelliscope Visualization
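The original code isn't reproduced in the text here, so this is a minimal sketch of a trelliscope display built with `plot_time_series()`. It assumes a recent timetk (one that supports the `.trelliscope` argument) and that the trelliscopejs package is installed. Plotting the `adjusted` price column and the 2-column layout are choices made for this sketch.

```r
library(dplyr)
library(timetk)

# FANG ships with timetk: daily stock prices for 4 symbols
fang_display <- FANG %>%
  group_by(symbol) %>%
  plot_time_series(
    .date_var    = date,
    .value       = adjusted,
    .facet_ncol  = 2,     # panels per row in the trelliscope display
    .facet_nrow  = 1,     # rows per page in the trelliscope display
    .trelliscope = TRUE   # requires the trelliscopejs package
  )

# Printing builds the display and opens it in the viewer
if (interactive()) print(fang_display)
```

With only 4 symbols a regular faceted plot would do; the paging in a trelliscope display starts to pay off once you have dozens of series.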

6: readr

The readr package is a data input/output package. It’s the package that I use to read and write data.

  • Purpose: Data input/output.
  • Features: Provides functions to read and write data.
  • Usefulness: Efficiently handles large datasets and supports various flat-file formats (CSV, TSV, fixed-width).

Code Template #6: Read data from a CSV file

Prompt 6 Code

We’ll use the read_csv() function to read data from a CSV file. This produces the following output:

Readr Output
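The original code isn't reproduced in the text here, so this is a minimal sketch of reading a CSV. To keep it self-contained, it first writes a small made-up data set to a temporary file; in practice you would point `read_csv()` at your own file path.

```r
library(readr)

# Hypothetical data, written to a temporary CSV so the example runs anywhere
sales_in_tbl <- data.frame(
  date    = as.Date(c("2023-01-01", "2023-02-01", "2023-03-01")),
  product = c("A", "B", "A"),
  sales   = c(100, 200, 150)
)
csv_path <- tempfile(fileext = ".csv")
write_csv(sales_in_tbl, csv_path)

# Read it back; readr guesses the column types (date, character, double)
sales_tbl <- read_csv(csv_path, show_col_types = FALSE)

sales_tbl
```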

7: tidymodels

The tidymodels package is a machine learning package. It’s the package that I use to build machine learning models.

  • Purpose: Machine learning.
  • Features: Provides a consistent interface for modeling and machine learning.
  • Usefulness: Helps in building and evaluating machine learning models fast.

Code Template #7: Fit and predict sales with a linear regression model

Prompt 7 Code

Get the Code.

We’ll use the linear_reg() function to fit a linear regression model to predict sales. Then we use the predict() function to predict sales for a Marketing_Spend of $4,000. The prediction is $8,000.

Tidymodels Output
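The original code and data aren't reproduced in the text here, so this is a minimal sketch of fitting and predicting with `linear_reg()`. The training data is made up (sales set to exactly twice the marketing spend) so that the fitted line gives roughly 8,000 at a spend of 4,000. The `Sales` column name is an assumption; `Marketing_Spend` comes from the text above.

```r
library(tidymodels)

# Hypothetical training data: sales are exactly 2x marketing spend
sales_tbl <- tibble(
  Marketing_Spend = c(1000, 2000, 3000, 5000, 6000),
  Sales           = c(2000, 4000, 6000, 10000, 12000)
)

# Specify and fit a linear regression using the base R lm engine
model_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(Sales ~ Marketing_Spend, data = sales_tbl)

# Predict sales for a marketing spend of $4,000
new_data_tbl   <- tibble(Marketing_Spend = 4000)
prediction_tbl <- predict(model_fit, new_data = new_data_tbl)

prediction_tbl
```

Real data won't sit on a perfect line, so expect to pair this with a train/test split and an error metric before trusting the predictions.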

8: leaflet

The leaflet package is a geospatial package. It’s the package that I use to create interactive maps for Shiny web apps and Exploratory Data Analysis.

  • Purpose: Interactive maps and geospatial analysis.
  • Features: Provides functions to create interactive maps.
  • Usefulness: Helps in visualizing geospatial data.

Code Template #8: Create an interactive map

Prompt 8 Code

Get the Code.

We’ll use the leaflet() function to create an interactive map for 2 Store Locations. This produces the following output:

Leaflet Output
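The original code isn't reproduced in the text here, so this is a minimal sketch of a 2-marker map. The store names and coordinates are placeholders (roughly New York and Los Angeles), not the locations used in the original template.

```r
library(leaflet)

# Hypothetical store locations
stores_tbl <- data.frame(
  name = c("Store A", "Store B"),
  lat  = c(40.7128, 34.0522),
  lng  = c(-74.0060, -118.2437)
)

# Base map tiles plus one marker per store, with the name as a popup
store_map <- leaflet(stores_tbl) %>%
  addTiles() %>%
  addMarkers(lng = ~lng, lat = ~lat, popup = ~name)

store_map
```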

9: shiny (BIG BONUS)

The shiny package is a web application package. It’s the package that I use to create interactive web apps for use in production.

  • Purpose: Interactive web apps.
  • Features: Allows the creation of interactive web applications directly from R.
  • Usefulness: Good for sharing analyses and visualizations in a user-friendly way. Deploying an app for other people to use is what’s called putting it in “Production”.

Code Template #9: Create a Shiny App

This one is a bonus, and so is the code template. You’ll need to join the R-Tips newsletter to get the code.

Shiny Code Template

Click here to get the Bonus Shiny App.

It produces this Shiny App:

Shiny App: Interactive Store Locator
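The bonus app's code is only available through the newsletter, so this is NOT that app. As a rough stand-in, here is a minimal sketch of how a store-locator style Shiny app could combine shiny with leaflet. The store data, input IDs, and layout are all made up for illustration.

```r
library(shiny)
library(leaflet)

# Hypothetical store locations
stores_tbl <- data.frame(
  name = c("Store A", "Store B"),
  lat  = c(40.7128, 34.0522),
  lng  = c(-74.0060, -118.2437)
)

# UI: a dropdown to pick a store and a map output
ui <- fluidPage(
  titlePanel("Store Locator (sketch)"),
  selectInput("store", "Store", choices = c("All", stores_tbl$name)),
  leafletOutput("store_map")
)

# Server: redraw the map whenever the selection changes
server <- function(input, output, session) {
  output$store_map <- renderLeaflet({
    shown_tbl <- if (input$store == "All") {
      stores_tbl
    } else {
      stores_tbl[stores_tbl$name == input$store, ]
    }
    leaflet(shown_tbl) %>%
      addTiles() %>%
      addMarkers(lng = ~lng, lat = ~lat, popup = ~name)
  })
}

# Build the app object; call runApp(app) to launch it locally
app <- shinyApp(ui, server)
```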


Conclusion

In this article, I shared 9 R packages that have helped me the most.

  • You now have 9 code templates that you can use to perform data analysis and data science tasks for almost any company.
  • This should give you a leg up in your Data Science career.

With that said, if you are struggling to learn data science or need help becoming a Data Scientist, I want to help you. Here’s how I can help you become a 6-Figure Data Scientist.

Struggling to become a data scientist?

You know the feeling. Being unhappy with your current job.

Promotions aren’t happening. You’re stuck. Feeling hopeless. Confused…

And you’re praying that the next job interview will go better than the last 12…

… But you know it won’t. Not unless you take control of your career.

The good news is…

I Can Help You Speed It Up.

I’ve helped 6,107+ students learn data science for business from an elite business consultant’s perspective.

I’ve worked with Fortune 500 companies like S&P Global, Apple, MRM McCann, and more.

And I built a training program that gets my students life-changing data science careers (don’t believe me? see my testimonials here):

6-Figure Data Science Job at CVS Health ($125K)
Senior VP Of Analytics At JP Morgan ($200K)
50%+ Raises & Promotions ($150K)
Lead Data Scientist at Northwestern Mutual ($175K)
2X-ed Salary (From $60K to $120K)
2 Competing ML Job Offers ($150K)
Promotion to Lead Data Scientist ($175K)
Data Scientist Job at Verizon ($125K+)
Data Scientist Job at CitiBank ($100K + Bonus)

Whenever you are ready, here’s the system that has gotten aspiring data scientists, career transitioners, and lifelong learners data science jobs and promotions…

What They're Doing - 5 Course R-Track

Join My 5-Course R-Track Program
(And Become The Data Scientist You Were Meant To Be…)

P.S. – Samantha landed her NEW Data Science R Developer job at CVS Health (Fortune 500). This could be you.

Success Samantha Got The Job
