SQLite is great for R and Shiny. The dbmisc package may help a bit.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Over the years I have written several Shiny apps that rely on SQLite databases. One example is my shiny app to find reproducible economic articles with data supplement: https://ejd.econ.mathematik.uni-ulm.de. The underlying data is stored in a SQLite database and I regularly run some scrapping code that inserts new content. SQLite is also super helpful for applications where multiple users can enter data. An example of mine is a Shiny app where each semester lecturers enter their seminars and then students enter their preferences over seminars in order to perform a centralized matching. For such applications, a database is much more convenient than trying to manage separate files. Given that not too many users enter their data at the same time, in my experience, SQLite performs very well. So far I had no need for a more complicated set-up with a separate database server.
The heavy-lifting for using SQLite in R is done by the great packages DBI and RSQLite. However, I also strongly rely on a bunch of convenience functions from my package dbmisc. For example, they facilitate seamless conversion of data types between R and SQLite. For a detailed description look at the README. Here is just a shorter overview what can be done.
Simple YAML Schemas
The core idea of dbmisc
is to describe the database schema in a simple YAML file with one main entry for each table. Here is an example entry for a table user
describing its columns and indices:
user: table: email: TEXT name: TEXT age: INTEGER female: BOOLEAN birthday: DATE last_edit: DATETIME unique_index: - email index: - age
The function dbCreateSQLiteFromSchema allows to quickly create an SQLite database from such a schema. More importantly, it also allows to update an existing database if the schema changed. This is quite helpful for me since smaller updates, like adding new columns, happen quite regularly in my projects.
dbmisc
has several utility functions to get, insert or update data into a table that can use schemas.
Conveniently inserting data
For example, the following code can be used to insert an entry into our table user
.
new_user = list(last_edit=Sys.time(), female=TRUE, email="[email protected]", name="Jane Doe", birthday = "2000-01-01", gender="female") dbInsert(db,table="user", new_user)
Note that the R object new_user
has several problems:
- Its fields are in a different order than in the database table
user
. - It contains the extra field
gender
that is not in the database table. - It does not contain the field
age
that is specified in the database table. - It has datetime and date variables whose conversion can in general be tricky.
Since I try to avoid writing boilerplate code, it is rather common in my Shiny apps that the R representation of my data has similar problems. Thanks to the information in the database schema, dbInsert can handle these issues automatically: the fields will be arranged in the right order, converted to the correct SQLite data types, additional fields in the R data set will just be ignored, and missing fields will be set to NA
.
Of course, such auto-correction also has risks in so far that certain errors in the entered data may be less likely to be detected. However, in my personal experience from programming smaller to medium sized apps, the convenience not to have to write the boilerplate code to manually fit my R objects to the database definition outweighs the risks of overlooking bugs.
If you set the argument run=FALSE
, dbInsert
performs no database action but just returns the SQL statement that would be run:
dbInsert(db,table="user", new_user,run = FALSE) "insert into user values (:email, :name, :age, :female, :birthday, :last_edit)"
Note that dbmisc
prepares in its default usage parametrized queries to avoid SQL-injections.
Opening a database connection with a schema
For dbInsert
to perform these corrections, it needs to know the database schema. The simplest way is to assign it as attribute to the database connection db
. This can be done, e.g. by opening the connection via the function dbConnectSQLiteWithSchema.
Side Remark: Use dbWithTransaction
Hint: When my Shiny apps insert, update or delete data they often have to modify several tables. Then I always combine the multiple dbInsert, dbUpdate or dbDelete commands inside a call to dbWithTransaction from the DBI
package to reduce the risk that my database ends up in an undesirable state should errors occur in some subset of the commands.
Getting data with dbGet
The function dbGet facilitates data retrieval, in particular my most common task to get entries from a single table. Here is an example that selects a single entry from the user
table:
dbGet(db, "user", list(email="[email protected]"))
(Yeah, Einstein was indeed born in Ulm, but that was long before our university was founded…) Of course, dbGet
can handle also more complex queries, as described in its reference. If a schema is specified, data types should be all correctly converted from SQLite to R.
There are also some other utility functions in the dbmisc package. Just look at the README file and the function documentation.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.