
R is for Research, Python is for Production

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)



Both R and Python are great. This article highlights some of the strengths of each language by showing where the major development efforts are within each ecosystem.

R is for Research


If I had to describe R in one word, it would be: tidyverse. It has made research tasks – wrangling data, visualizing outcomes, iterating from idea to code – painless. In fact, it’s a joy. I’ll explain why R is for Research using the Ultimate R Cheat Sheet, a one-stop shop for the R-ecosystem.


When starting with R, the tidyverse is an ideal place to begin your journey. It is a formalized set of packages and tools with a consistently structured programming interface, as opposed to base R, which is notably more complex and less user-friendly.

We see many smaller packages that tackle specific problems. The following are the most important packages:

dplyr & ggplot2

Two great R packages that will inform your daily decisions are dplyr and ggplot2, which are great for data manipulation (dplyr) and visualization (ggplot2), among other things. These are the two most important skills a data scientist or data analyst can have.

Rmarkdown

One of the most exceptional aspects of R is without a doubt Rmarkdown, which is a framework for creating reproducible reports, presentations, blogs, journals and more! Imagine having a report that runs itself, and creates an easily shareable HTML page or PDF to share with your team. Definitely a more streamlined approach than hundreds of clicks in Excel every Monday morning.

Shiny

Shiny is another framework within R that is used to create interactive web applications. One of the best features of Shiny is that it gives the non-data-focused members of your team the data science tools they need for decision making through an easy-to-use GUI (graphical user interface). Imagine your team getting together for a Monday afternoon planning session, having already reviewed the previous week’s report created in Rmarkdown, and running simulations in your collaborative Shiny web application to determine where the data is guiding you next.

Where R is Growing

Next, if we scroll through to the “Special Topics Page”, we can see the R ecosystem is growing. This is a key feature that distinguishes the R Ecosystem from the Python Ecosystem.

We can see that R has expanded into:

What is R missing?

There is a noticeable gap in production. R has Shiny (apps) and Plumber (APIs, not shown), but automation tools like Airflow and cloud Software Development Kits (SDKs) are primarily available in Python.

R Overall

R is really something special when doing research because of the tidyverse, which streamlines data wrangling and visualization. Honestly, you’ll be 3-5X more productive doing data wrangling in R once you become proficient with the tidyverse.

Why is Python Great?

Python is amazing too, but for different reasons. Let’s take a Python Package like OpenCV – for Computer Vision.

This is a real strength for the Python language because we can do crazy cool things like Object Detection with OpenCV.


But how much does this apply to my daily life? Around zero. Why? Because I’m a business analyst and data scientist who works with SQL databases. I’m more interested in how Python will help me better mine for information and productionize the results.


Let’s check out the Python Ecosystem using the Ultimate Python Cheat Sheet (note that this is different from the R cheat sheet shown earlier).


We see that there’s Pandas for essentially everything related to import, tidying and data wrangling. So what is Pandas? Pandas is an object-oriented tool for data wrangling in Python.

Pandas vs Tidyverse

While programmers love pandas, business analysts may initially struggle with the object-oriented (pythonic) way of having Data Frames with methods.

customer_counts_df = df.groupby('customer_id').size()

Everything in Python is an object, and we call methods (e.g. groupby and size) on that object. This call doesn’t seem too bad. But we are normally trying to chain many more wrangling operations, and that is where it can get challenging: the code becomes more complex and less readable.
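Here is a minimal runnable sketch of counting rows per customer in pandas. The toy data and the order_id column are made up for illustration; only the customer_id column name comes from the text. Note that the pandas method is spelled groupby (one word), and size() counts the rows in each group.

```python
import pandas as pd

# Hypothetical toy data; only the customer_id column name comes from the article.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_id": [101, 102, 103, 104, 105, 106],
})

# groupby() returns a grouped object; size() counts the rows in each group.
# reset_index() turns the result back into a DataFrame with a named count column.
customer_counts_df = (
    df.groupby("customer_id")
      .size()
      .reset_index(name="count")
)

print(customer_counts_df)
```

The result has one row per customer_id with its row count, which is the same shape of output that a group-and-count step produces in other tools.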

Conversely, in R with the tidyverse we use a different syntax built around a pipe (%>%). It reads much like SQL and follows the order in which a user thinks through a data wrangling task.

customer_counts_tbl <- df %>%
    group_by(customer_id) %>%
    summarize(count = n())

This tidyverse data wrangling workflow often makes it much easier for analysts to expand the set of operations into 10 or more data wrangling commands. Remember, the challenge isn’t typing code, it’s turning your thoughts into code. This is where the tidyverse is really powerful.
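For comparison, here is a sketch of what a longer sequence of wrangling steps can look like in pandas when written as a method chain. The data and the region, order_value, and is_large columns are hypothetical, chosen only to illustrate filtering, adding a column, grouping, summarizing, and sorting in one expression.

```python
import pandas as pd

# Hypothetical order data, made up for this illustration.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "region": ["east", "east", "west", "west", "west", "west"],
    "order_value": [20.0, 35.0, 15.0, 50.0, 5.0, 45.0],
})

summary = (
    orders
    .query("order_value >= 10")                          # keep orders of at least 10
    .assign(is_large=lambda d: d["order_value"] >= 40)   # flag large orders
    .groupby("customer_id", as_index=False)              # one group per customer
    .agg(
        count=("order_value", "size"),                   # number of orders kept
        total_value=("order_value", "sum"),              # summed order value
        large_orders=("is_large", "sum"),                # number of flagged orders
    )
    .sort_values("total_value", ascending=False)         # biggest customers first
    .reset_index(drop=True)
)

print(summary)
```

Each method returns a new object that the next method is called on, which is the object-oriented style described above. Whether it reads as easily as the piped version is the judgment call this section is about.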

Key Strengths of Python lie in Production ML

OK, so why is Python great for business? It turns out that its strengths lie in Machine Learning and Production!

We can see that Python has well-developed Production ML-oriented tools:

These production-oriented tools make it easier to work with colleagues who handle cloud and operations as part of a larger IT team, because those teams already work in Python. There is no need to bring R and its extra dependencies into a production system.

Python Overall

If you can get over the Pandas learning curve, then Python becomes a great tool. Most IT teams know Python, so your code will fit right into their workflow. Just realize that you may be 3X to 5X less productive at Research than your R counterparts due to the tidyverse boost.

Which Language Should You Learn?

The decision can be challenging because both Python and R have clear strengths.


Why Not Learn Both Python and R?

One thing I haven’t mentioned is that I’m building a course that teaches Python from an R user’s perspective. The core idea is that Python can be a tremendous asset, and being able to use tools like R’s reticulate to communicate between R and Python can make you especially valuable to a data science team. Join the R/Python Teams course waitlist.

This waitlist is for:



