In a recent KDnuggets analytics software poll, Python and R ranked as the top two tools for data science and machine learning. If you want to advance your career in data science, these are the languages to focus on.

RStudio developed a package called **reticulate**, which provides an interface for running Python packages and functions from R.

**Install and Load Reticulate Package**

Run the commands below to install the package and load it into your R session.

# Install reticulate package

install.packages("reticulate")

# Load reticulate package

library(reticulate)

**Check whether Python is available on your system**

py_available()

**Import a Python module within R**

You can use the **import()** function to import a particular Python package or module.

os <- import("os")

os$getcwd()

[1] "C:\\Users\\DELL\\Documents"

You can use the **listdir()** function from the **os** module to see all the files in the working directory.

os$listdir()

[1] ".conda" ".gitignore" ".httr-oauth"

[4] ".matplotlib" ".RData" ".RDataTmp"

[7] ".Rhistory" "1.pdf" "12.pdf"

[10] "122.pdf" "124.pdf" "13.pdf"

[13] "1403.2805.pdf" "2.pdf" "3.pdf"

[16] "AIR.xlsx" "app.r" "Apps"

[19] "articles.csv" "Attrition_Telecom.xlsx" "AUC.R"
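
For comparison, here is a minimal sketch of the same two calls written directly in Python. It is only meant to show that `os$getcwd()` and `os$listdir()` map one-to-one onto `os.getcwd()` and `os.listdir()`; the variable names `cwd` and `entries` are illustrative and not part of the original example.

```python
import os

# Current working directory, the value os$getcwd() returns in R
cwd = os.getcwd()

# Names of the entries in that directory, the listing os$listdir() returns in R
entries = os.listdir(cwd)

print(cwd)
print(entries[:5])  # show only the first few entries
```

The output depends on the directory you run it from, so it will differ from the listing above.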

**Install Python Package**

**Step 1: Create a new environment**

conda_create("r-reticulate")

**Step 2: Install a package within a conda environment**

conda_install("r-reticulate", "numpy")

**Since numpy is already installed, you don’t need to install it again. The above example is just for demonstration.**

**Step 3: Load the package**

numpy <- import("numpy")

**Working with numpy array**

Let’s create a sample numpy array

y <- array(1:4, c(2, 2))

x <- numpy$array(y)

x

[,1] [,2]

[1,] 1 3

[2,] 2 4

**Transpose the above array**

numpy$transpose(x)

[,1] [,2]

[1,] 1 2

[2,] 3 4

**Eigenvalues and eigenvectors**

numpy$linalg$eig(x)

[[1]]

[1] -0.3722813 5.3722813

[[2]]

[,1] [,2]

[1,] -0.9093767 -0.5657675

[2,] 0.4159736 -0.8245648
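
As a quick sanity check on the output above, the eigenvalues of a 2 x 2 matrix can be recomputed by hand from its trace and determinant. The sketch below does this in plain Python for the matrix shown earlier (R fills `array(1:4, c(2, 2))` column by column, so the rows are `1 3` and `2 4`). The helper name `eig_2x2` is hypothetical and is not part of numpy or reticulate.

```python
import math

def eig_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]], assuming they are real."""
    trace = a + d
    det = a * d - b * c
    # Roots of the characteristic polynomial: lambda^2 - trace*lambda + det = 0
    disc = math.sqrt(trace * trace - 4 * det)
    return (trace - disc) / 2, (trace + disc) / 2

# Matrix from the example: rows (1, 3) and (2, 4)
low, high = eig_2x2(1, 3, 2, 4)
print(round(low, 7), round(high, 7))
```

The two rounded values agree with the first element of the list returned by `numpy$linalg$eig(x)`.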

**Mathematical Functions**

numpy$sqrt(x)

numpy$exp(x)

**Working with Python interactively**

Using the `repl_python()` function, you can start an interactive Python session from within R. Download the **dataset** used in the program below.

repl_python()

# Load Pandas package

import pandas as pd

# Importing Dataset

travel = pd.read_excel("AIR.xlsx")

# Number of rows and columns

travel.shape

# Select random no. of rows

travel.sample(n = 10)

# Group By

travel.groupby("Year").AIR.mean()

# Filter

t = travel.loc[(travel.Month >= 6) & (travel.Year >= 1955),:]

# Return to R

exit

Note: You need to enter **exit** to return to the R environment.
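
If you do not have the dataset at hand, the group-by and filter steps can be tried on a small made-up table. The sketch below is plain Python with pandas; the values in `travel` are invented for illustration, and only the column names `Year`, `Month` and `AIR` are taken from the code above.

```python
import pandas as pd

# Made-up stand-in for the AIR.xlsx data (values are illustrative only)
travel = pd.DataFrame({
    "Year":  [1954, 1954, 1955, 1955, 1956, 1956],
    "Month": [5, 7, 3, 6, 8, 12],
    "AIR":   [100, 120, 130, 150, 170, 190],
})

# Mean of AIR for each year
yearly_mean = travel.groupby("Year").AIR.mean()

# Rows from June onwards, in 1955 or later
t = travel.loc[(travel.Month >= 6) & (travel.Year >= 1955), :]

print(yearly_mean)
print(t)
```

Each comparison is wrapped in parentheses because `&` binds more tightly than `>=` in Python; without them the filter raises an error.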

**How to access objects created in Python from R**

You can use the **py** object to access objects created within Python.

summary(py$t)

In this case, I am using R’s **summary()** function to access the dataframe **t**, which was created in Python. Similarly, you can create a line plot using the ggplot2 package.

# Line chart using ggplot2

library(ggplot2)

ggplot(py$t, aes(AIR, Year)) + geom_line()

**How to access objects created in R from Python**

You can use the **r** object to accomplish this task.

**1. Let’s create an object in R**

mydata = head(cars, n=15)

**2. Use the R-created object within the Python REPL**

repl_python()

import pandas as pd

r.mydata.describe()

pd.isnull(r.mydata.speed)

exit

**Building Logistic Regression Model using sklearn package**

repl_python()

# Load libraries

from sklearn import datasets

from sklearn import metrics

from sklearn.linear_model import LogisticRegression

# Load the iris dataset

iris = datasets.load_iris()

# Developing logit model

model = LogisticRegression()

model.fit(iris.data, iris.target)

# Scoring

actual = iris.target

predicted = model.predict(iris.data)

# Performance Metrics

print(metrics.classification_report(actual, predicted))

print(metrics.confusion_matrix(actual, predicted))

**Other Useful Functions**

**To see configuration of python**

Run the **py_config()** command to find the version of Python that reticulate is using on your system. It also shows details about Anaconda and numpy.

py_config()

python: C:\Users\DELL\ANACON~1\python.exe

libpython: C:/Users/DELL/ANACON~1/python36.dll

pythonhome: C:\Users\DELL\ANACON~1

version: 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]

Architecture: 64bit

numpy: C:\Users\DELL\ANACON~1\lib\site-packages\numpy

numpy_version: 1.14.2
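
Much of what `py_config()` reports can also be read from inside Python itself. The sketch below uses only the standard library to collect a few of the same details; the dictionary name `info` and its keys are illustrative and do not correspond to reticulate's own field names.

```python
import platform
import sys

info = {
    "python": sys.executable,                    # path to the interpreter
    "pythonhome": sys.prefix,                    # installation prefix
    "version": sys.version,                      # full version string
    "architecture": platform.architecture()[0],  # for example "64bit"
}

for key, value in info.items():
    print(f"{key}: {value}")
```

Running this inside `repl_python()` is one way to confirm which interpreter the session is attached to.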

**To check whether a particular package is installed**

In the following program, we check whether the **pandas** package is installed.

py_module_available("pandas")
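
A roughly analogous check can be written in Python with the standard library. This is a sketch of the idea, not a description of how reticulate implements `py_module_available()`; the function name `module_available` is hypothetical.

```python
import importlib.util

def module_available(name):
    """Return True if a module called `name` can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for dotted names whose parent package is missing,
        # or for an already-loaded module with no usable __spec__
        return False

print(module_available("os"))
print(module_available("pandas"))
```

The second call prints `True` or `False` depending on whether pandas is installed in the active Python environment.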

