The sqldf package has many great advantages (simply running SQL statements; creating, loading and deleting data in data frames; connectivity to many databases; support for SQL functions and data types; and many more), but the one that is really a major win is the interaction between data frames and the SQL language.
There are also many great packages for manipulating, wrangling and engineering data frames: tidyverse, dplyr, data.table, purrr, tibble, magrittr and many more. A curated list of relevant packages for data scientists can be found here.
But subsetting a data.frame with SQL syntax is also possible, which is especially handy for anyone with an SQL background. This blog post will show how simple this package is to use and compare it with base R and dplyr.
Let’s create a data.frame with some sample data. I will use the built-in iris dataset.
iris <- iris
And load both packages, sqldf and dplyr:
library(sqldf)
library(dplyr)
So let’s say we want to get a single column from the dataset, keeping only the rows where its values pass a filter. In base R:
iris[iris$Sepal.Width >= 3.0,]$Sepal.Width
With dplyr:
iris %>% select(Sepal.Width) %>% filter(Sepal.Width >= 3.0)
and using sqldf:
sqldf("select [Sepal.Width] from iris where [Sepal.Width] >= 3.0")
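To check that the three approaches really return the same values, here is a minimal sketch that stores each result and compares them. The variable names (base_result, dplyr_result, sqldf_result) are my own for illustration, and the values are sorted before comparing because an SQL query without ORDER BY does not formally guarantee row order.

```r
library(sqldf)
library(dplyr)

# Same three queries as above, each stored in an (illustrative) variable
base_result  <- iris[iris$Sepal.Width >= 3.0, ]$Sepal.Width
dplyr_result <- iris %>% select(Sepal.Width) %>% filter(Sepal.Width >= 3.0)
sqldf_result <- sqldf("select [Sepal.Width] from iris where [Sepal.Width] >= 3.0")

# Base R with $ returns a plain numeric vector;
# dplyr and sqldf both return a one-column data.frame
class(base_result)
class(dplyr_result)
class(sqldf_result)

# Compare the values themselves (first column of each data.frame),
# sorted so that row order does not matter
same_dplyr <- isTRUE(all.equal(sort(base_result), sort(dplyr_result[[1]])))
same_sqldf <- isTRUE(all.equal(sort(base_result), sort(sqldf_result[[1]])))
same_dplyr
same_sqldf
```

The one practical difference to keep in mind is the return type: the base R version above gives back a vector, while dplyr and sqldf give back a data.frame, so downstream code may need `[[1]]` or `$Sepal.Width` to get at the values.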
All in all, which one to use comes down to your flavour of choice and what you find most convenient.
As always, the code is available on GitHub in the same Useless_R_function repository.