Working on Data-Warehouse (SQL) with R

[This article was first published on DataScience+, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

SQL being the most popular used query language for deep diving from small data to so called big data. However, there are many other languages like R which are growing in the user community for stats and graphical methodology models and plot data to match the data scientists mind.

R as an Analytic Application

R can be used for Data Warehousing and BI applications but R is not commonly being used as a platform for Data Warehouse; instead it works as a mediator for performing focused analysis with database managements.
Experts say that R handles everything in memory, which implies that it might be difficult to use it for expansive volumes of data.

R offers a large number of interfaces to apps such as Excel, SAS etc. which enable users to have a rich experience of the app integrated with the interface. Citing an experts comment on this – “Typically, I see people running their data warehouses in other packages and then using R to perform high-quality, focused analysis”.

R has compatibility with many packages like PL/R, Dplyr, plyr etc. Below we will see how Dplyr works.

Dplyr – what is Dplyr? (Just for the crazy minds)

Dplyr is a package which provides tools for manipulating datasets in R. Dplyr is the next iteration of plyr. Dplyr differs greatly from plyr for its easy, intuitive syntax. The major difference in Dplyr is the usage of the operator (%.%), that allows the code to be read in the opposite direction (left to right). If your usage is more towards explorations or if you want to collaborate your data with other data structures you could choose Dplyr.

Trying data with dplyr – Github

install.packages("dplyr")

For examples have the data packages installed

install.packages(c("nycflights13", "Lahman"))

To learn dyplr please refer to link

vignette("introduction", package = "dplyr")

— Source github

Working on R with PostgreSQL

In terms of Postgresql it’s commonly called PL/R or madlib as a language for many databases.

To install packages (postgres SQL) for R

install.packages("RPostgreSQL")
library(RPostgreSQL)
 package ‘RPostgreSQL’ successfully unpacked and MD5 sums checked 

Connect to DB (DW):

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname="postgres", host="10.172.88.12", user="gpadmin",password="gpadmin",port="5432")

Experimenting with Sample retrieval

data <- dbGetQuery(con, "select datid,datname from pg_catalog.pg_stat_activity;")
data[1,1]

[1] 94888776

Clearing the result

dbClearResult(rs)
Function: dbClearResult (package DBI)
res="PostgreSQLResult"
dbDisconnect(con)
dbUnloadDriver(drv)

What is the Future of R?

When it comes to general data warehousing and BI tasks R as a solution makes it through many companies. Organizations prefer to take R as it’s easy to add existing IT environments or to add them on the current codes which can be used for special analytics in intensive projects. R lacks in state-of-the-art GUI. Hence, users prefer to use more interactive and easy-to-use front ends for R packages.

R has growth potential with more number of packages that help statisticians and analysts use these packages in more interactive ways hence enabling R to answer their research and business goals.

    Related Post

    1. Implementing Apriori Algorithm in R
    2. Handling missing data with MICE package; a simple approach
    3. Best packages for data manipulation in R
    4. Identify, describe, plot, and remove the outliers from the dataset
    5. Learn R By Intensive Practice – Part 2

    To leave a comment for the author, please follow the link and comment on their blog: DataScience+.

    R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

    Never miss an update!
    Subscribe to R-bloggers to receive
    e-mails with the latest R posts.
    (You will not see this message again.)

    Click here to close (This popup will not appear again)