Using SQL for R data.frames with sqldf
There are many R packages for querying SQL databases. Recently, I was looking into the sqldf package (see its CRAN documentation).
It has many advantages (running SQL statements simply; creating, loading and deleting data in data.frames; connectivity to many databases; support for SQL functions and data types; and many more), but the one that was a major win for me is the interaction between data frames and the SQL language.
There are also many great packages for manipulating, wrangling and engineering data frames: tidyverse, dplyr, data.table, purrr, tibble, magrittr and many more. A curated list of relevant packages for data scientists can be found here.
But subsets of a data.frame can also be selected with SQL syntax, which is especially handy for anyone with an SQL background. This blog post shows how simple the package is to use and compares it with base R and dplyr.
Let’s create a data.frame with some sample data. I will use the iris dataset.
iris <- iris
And load both packages:
library(dplyr)
library(sqldf)
Let’s say we want a single column from the dataset, keeping only the rows that pass a filter. In base R:
iris[iris$Sepal.Width >= 3.0,]$Sepal.Width
Using dplyr:
iris %>% select(Sepal.Width) %>% filter(Sepal.Width>=3.0)
And using sqldf:
sqldf("select [Sepal.Width] from iris where [Sepal.Width] >= 3.0")
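The three calls above select the same values but do not return the same kind of object. The following is a minimal sketch to check this yourself; the variable names base_res, dplyr_res and sql_res are only illustrative and not part of the original code. It assumes that dplyr and sqldf are installed and that sqldf uses its default SQLite backend, which accepts the square-bracket quoting of column names containing a dot.

```r
library(dplyr)
library(sqldf)

# Base R: subsetting rows and then using `$` gives a plain numeric vector
base_res <- iris[iris$Sepal.Width >= 3.0, ]$Sepal.Width

# dplyr: select() and filter() keep the result as a one-column data.frame
dplyr_res <- iris %>% select(Sepal.Width) %>% filter(Sepal.Width >= 3.0)

# sqldf: the query result also comes back as a one-column data.frame
sql_res <- sqldf("select [Sepal.Width] from iris where [Sepal.Width] >= 3.0")

# Inspect the type of each result
is.numeric(base_res)
is.data.frame(dplyr_res)
is.data.frame(sql_res)

# Compare the values themselves, extracting the single column with [[1]]
all.equal(base_res, dplyr_res[[1]])
all.equal(base_res, sql_res[[1]])
```

If a plain vector is needed from the dplyr or sqldf result, extracting the first column with [[1]] (as in the comparison above) is one way to get it.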
All in all, which one you use is a matter of taste and convenience.
As always, the code is available on GitHub in the same Useless_R_function repository.