Data science is vastly different from programming. Data scientists rely mostly on four languages – R, Python, Julia, and SQL. SQL is non-negotiable, as every data scientist must be proficient in it, and Julia is still the new kid on the block. Many argue about which is better – Python or R. But today, we ask a different question – how can you use R and Python together?
It might seem crazy at first, but hear us out. Both Python and R are stable languages used by many data scientists. Even seasoned package developers, such as Hadley Wickham, borrow from BeautifulSoup (Python) when building web scraping packages such as rvest (R). Reinventing the wheel makes no sense.
Today we’ll explore a couple of options you have if you want to use R and Python together in the same project. Let’s start with options for Python users.
Table of contents:
- Calling R Scripts from Python
- Running R Code from Python with rpy2
- Calling Python Scripts from R
- Running Python Code from R with R Markdown
How to Call R Scripts from Python
Using R and Python together at the same time is incredibly easy if you already have your R scripts prepared. Calling them from Python boils down to a single line of code. Let’s cover the R script before diving further.
It’s really a simple one, as it only prints some dummy text to the console:
On the Python end, you’ll need to use the subprocess module to run a shell command. All R scripts can be run with the Rscript <script-path> call:
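As a rough sketch of this approach, the snippet below writes a one-line R script to a temporary file and runs it with Rscript through subprocess.call(). The script contents, the script.R file name, and the helper function names are assumptions made for illustration, not the original code. The helper returns None when Rscript isn’t available on the PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical stand-in for the R script: it prints one line of dummy text.
R_SOURCE = 'print("Hello from R script!")\n'


def build_command(script_path):
    """Return the shell command that runs an R script: Rscript <script-path>."""
    return ["Rscript", str(script_path)]


def run_r_script(script_path):
    """Run the script and return its exit code (0 means success).

    Returns None if the Rscript executable is not available on the PATH.
    """
    if shutil.which("Rscript") is None:
        return None
    return subprocess.call(build_command(script_path))


# Write the R script to a temporary directory and run it from Python.
with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = Path(tmp_dir) / "script.R"
    script_path.write_text(R_SOURCE)
    exit_code = run_r_script(script_path)
    print("Exit code:", exit_code)
```

Passing the command as a list avoids shell quoting issues with paths that contain spaces.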
Below you’ll see the output:
The line was successfully printed to the console, and a zero was returned, which means the script finished without errors. This approach is excellent if your R script runs as a standalone job, performing its tasks one after the other. It falls short if you want to use the output from R code in Python.
It’s a shortcoming that the next option for using R and Python together addresses.
How to Run R Code from Python with rpy2
Now we’ll dive into the good stuff. You’ll have to install the rpy2 package in Python to follow along. It’s assumed you also have R installed and configured.
To start, we’ll use the robjects submodule to access R objects, such as the number PI:
Here’s what’s stored in the variable:
You can check its type. It’s an R-specific float vector:
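Here’s a minimal sketch of the access and type check described above, assuming rpy2 and R are both installed. The get_r_pi() helper is a hypothetical name, and the try/except guard is our addition so the snippet degrades gracefully when rpy2 or R is missing.

```python
import math

try:
    import rpy2.robjects as robjects
except Exception:  # rpy2 is not installed, or it cannot locate R
    robjects = None


def get_r_pi():
    """Return R's built-in pi as an rpy2 object, or None without rpy2/R."""
    if robjects is None:
        return None
    return robjects.r["pi"]


pi = get_r_pi()
if pi is not None:
    print(pi)            # R-style printout of a length-one vector
    print(type(pi))      # rpy2's float vector class
    print(float(pi[0]))  # indexing gives a regular Python float
```

R has no scalar type, so even a single number comes back as a vector, which is why the value is read with pi[0].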
There’s a lot more you can do than access individual R objects. For example, you can also declare and run R functions. The code snippet below shows you how to declare a function for adding numbers and call it twice. Make sure to surround the R code with triple quotation marks, so it can span multiple lines inside a single Python string:
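One way this could look is sketched below, assuming rpy2 and R are available. The add_numbers R function and the load_add_function() helper are hypothetical names chosen for this example.

```python
try:
    import rpy2.robjects as robjects
except Exception:  # rpy2 is not installed, or it cannot locate R
    robjects = None

# R source code, kept in a triple-quoted string so it can span several lines.
R_ADD_SOURCE = """
add_numbers <- function(x, y) {
    return(x + y)
}
"""


def load_add_function():
    """Define add_numbers in R and return it as a Python callable."""
    if robjects is None:
        return None
    robjects.r(R_ADD_SOURCE)                  # evaluate the R code
    return robjects.globalenv["add_numbers"]  # fetch the R function


add_numbers = load_add_function()
if add_numbers is not None:
    # Each call returns an R vector of length one, so take its first element.
    first = add_numbers(1, 2)[0]
    second = add_numbers(10, 20)[0]
else:
    first = second = None

print(first, second)
```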
Here’s the output from the above code snippet:
Many times you won’t find the built-in R packages enough for your specific use case. You can install additional, external R packages through Python with the
There’s also an option to work with R dataframes in Python. The code snippet below shows you how to import the datasets package and access the well-known mtcars dataset:
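The sketch below shows one way to do this with rpy2’s importr() and data() helpers, assuming rpy2 and R are installed. The load_mtcars() function is a hypothetical wrapper added for this example.

```python
try:
    import rpy2.robjects as robjects
    from rpy2.robjects.packages import importr, data
except Exception:  # rpy2 is not installed, or it cannot locate R
    robjects = None


def load_mtcars():
    """Return the mtcars dataset as an R data frame, or None without rpy2/R."""
    if robjects is None:
        return None
    datasets = importr("datasets")  # import R's datasets package
    return data(datasets).fetch("mtcars")["mtcars"]


mtcars = load_mtcars()
if mtcars is not None:
    print(mtcars.nrow, mtcars.ncol)  # number of rows and columns
    print(list(mtcars.colnames))     # column names as a Python list
```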
Here’s what the dataset looks like when displayed in Python:
And for the last bit, we’ll show you how to visualize the dataset with R’s ggplot2 package. As of now, you can’t display the figures directly in the notebook, so you’ll need to save the figure to a file using the grDevices package. The code responsible for plotting should go between the call that opens the graphics device and the call to grdevices.dev_off(), so keep that in mind for future reference:
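Below is a hedged sketch of that pattern, assuming rpy2, R, and the ggplot2 R package are installed. The PNG device, the wt and mpg columns, the output file name, and the save_mtcars_scatter() helper are all our choices for illustration. The plot itself is written as plain R code and evaluated through robjects.r().

```python
import os
import tempfile

try:
    import rpy2.robjects as robjects
    from rpy2.robjects.packages import importr, isinstalled
except Exception:  # rpy2 is not installed, or it cannot locate R
    robjects = None

# Plotting code in R; mtcars is available because R attaches datasets by default.
R_PLOT_SOURCE = """
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
print(p)
"""


def save_mtcars_scatter(path):
    """Save a scatter plot as a PNG file and return its path.

    Returns None if rpy2, R, or the ggplot2 package is not available.
    """
    if robjects is None or not isinstalled("ggplot2"):
        return None
    grdevices = importr("grDevices")
    grdevices.png(filename=path, width=512, height=512)  # open the device
    try:
        robjects.r(R_PLOT_SOURCE)  # plotting happens while the device is open
    finally:
        grdevices.dev_off()        # close the device, which writes the file
    return path


output_path = os.path.join(tempfile.gettempdir(), "mtcars_scatter.png")
saved_path = save_mtcars_scatter(output_path)
print("Saved to:", saved_path)
```

The try/finally block makes sure the graphics device gets closed even if the plotting code fails.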
And that’s how you can use R and Python together at the same time by running R code from Python. Let’s reverse the roles next and explore options for R users.
How to Call Python Scripts from R
R users have an even easier time running scripts from the other language. You’ll have to install the reticulate package if you want to follow along, as it’s responsible for running Python scripts and configuring Python environments.
First things first, let’s write a Python script. It will be a simple one, as it prints a single line to the console:
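A script along these lines would do. The message and the script.py file name are placeholders we made up for this example.

```python
# script.py (hypothetical file name)

# Placeholder message; any text works here.
MESSAGE = "Hello from Python script!"

# Print a single line to the console.
print(MESSAGE)
```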
In R, you’ll have to import the reticulate package and call the py_run_file() function with a path to the Python script provided:
Here’s the output displayed in the R console:
As you can see, everything works as advertised. You can go one step further and use a specific Python version, virtual environment, or Anaconda environment. Use any of the three function calls below as a reference:
Next, we’ll explore more advanced ways R users can use R and Python at the same time.
How to Run Python Code from R
The reticulate package comes with a Python engine you can use in R Markdown. It allows you to run chunks of Python code, print Python output, access Python objects, and so on.
To start, create a new R Markdown (Rmd) file and do the usual setup – library imports and Python location configuration:
You can now create either an R or a Python block by writing three backticks and specifying the language inside of curly brackets. We’ll start with Python. The code snippet below imports the NumPy library, declares an array, and prints it:
But what if you want to convert Python’s NumPy array to an R vector? As it turns out, you can access Python objects in R by prefixing the variable name with py$. Here’s an example:
As you would imagine, the possibilities from here are endless. We’ll now show you how to import the Pandas library, load in a dataset from GitHub, and print its first five rows:
Easy, right? You can import any Python library and write any Python code you want, and then access the declared variables and functions from R.
Python’s de-facto standard data visualization library is Matplotlib, and it’s also easy to use in R Markdown. Just remember to call the plt.show() function, as the figure won’t be displayed otherwise:
And that’s how you can run Python code in R and R Markdown. That’s all we wanted to cover in today’s article, so let’s make a brief summary next.
Summary of Using R and Python Together
Today you’ve learned how to use R and Python together from the perspectives of both R and Python users. Hopefully, you can now combine the two languages to get the best of both worlds. For example, some R tools, such as the auto.arima() function from the forecast package, arguably have no direct competitor in Python. Reinventing the wheel doesn’t make sense. So don’t. Just preprocess the data with Python and model it with R.
Why don’t you give it a try as a homework assignment? Download the Airline passengers dataset, load and preprocess it in Python, and use R’s auto.arima() function from the forecast package to make the forecasts. Share your results with us on Twitter – @appsilon. We’d love to see what you come up with.