For a long time now I’ve wanted to add the ability to store data from twitteR in an RDBMS. In the past I’ve handled this by concatenating new results onto old results, which quickly becomes unwieldy. I know that many people have doctored up their own solutions for this, but it seemed useful to have it baked in. Unfortunately I never had the time or energy to do this, so the idea languished. But then dplyr happened: it provides some revolutionary tools for interacting with data stored in a database backend. I figured I’d kill two birds with one stone by finally implementing this project, which in turn would give me a lot of data to play with. This is all checked in to master on GitHub.
This is still a work in progress, so please let me know if you have any comments, particularly about how to make it more seamless to use.
First, some basics:
- While theoretically any DBI-based backend will work, currently only RMySQL and RSQLite are supported.
- The only types of data that can be persisted are tweet (status) objects and user objects. Granted, this likely covers 95%+ of use cases.
- Data can be retrieved as either a list of the appropriate object or as a data.frame representing the table. Only the entire table will be retrieved – my expectation is that it will be simpler for users to interact with data via things like dplyr.
To continue, suppose we have a list of tweets we want to persist. Simply call store_tweets_db() with your list and they’ll be written to your database. By default they go into a table named tweets, but you can change this with the table_name argument.
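A minimal sketch of the storing step, wrapped in a small helper. The helper name archive_search and its arguments are made up for illustration. It assumes OAuth has already been set up in the session (for example via setup_twitter_oauth()), that RSQLite is installed, and that the backend is registered with twitteR's register_sqlite_backend(), which is not covered above.

```r
# Hypothetical helper: run a search and persist the results.
# Assumes twitteR is installed and OAuth is already configured.
archive_search <- function(query, n, sqlite_file, table_name = "tweets") {
  # Point twitteR at a SQLite file; sqlite_file is a path of your choosing.
  twitteR::register_sqlite_backend(sqlite_file)

  # searchTwitter() returns a list of status objects.
  tweets <- twitteR::searchTwitter(query, n = n)

  # Persist the list into the chosen table ("tweets" unless overridden).
  twitteR::store_tweets_db(tweets, table_name = table_name)

  # Return the number of tweets handed to the database, invisibly.
  invisible(length(tweets))
}
```

Something like archive_search("#rstats", 100, "tweets.sqlite") would then store up to 100 search results in the default tweets table, while passing table_name = "rstats_tweets" (an arbitrary name) would send them to a separate table.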
Finally, to retrieve your tweets from the database, use load_tweets_db(). By default this returns a list of status objects; specifying as.data.frame=TRUE instead returns a data.frame mirroring the actual table. Like store_tweets_db(), it takes a table_name argument.
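To make the two return forms concrete, here is a rough sketch that loads the same table both ways and reports what came back. The helper name stored_tweet_summary is hypothetical, and it assumes a database backend has already been registered in the current session and that the table exists.

```r
# Hypothetical helper: load a tweets table in both forms and summarise it.
# Assumes a twitteR database backend is already registered.
stored_tweet_summary <- function(table_name = "tweets") {
  # Default form: a list of status objects.
  tweet_list <- twitteR::load_tweets_db(table_name = table_name)

  # Alternative form: a data.frame mirroring the table.
  tweet_df <- twitteR::load_tweets_db(as.data.frame = TRUE,
                                      table_name = table_name)

  list(
    n_objects = length(tweet_list),  # number of status objects
    n_rows    = nrow(tweet_df),      # number of rows in the table
    columns   = names(tweet_df)      # column names as stored
  )
}
```

The data.frame form is the one to hand to dplyr or similar tools; the list form is handy when you want to keep working with twitteR's own object methods.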
Note that for user data there is a parallel set of functions, store_users_db() and load_users_db(), and the default table name is users.
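The user-side functions can be sketched the same way. The helper name archive_users is invented for this example; it assumes OAuth is configured, a backend is already registered, and that the user objects come from twitteR's lookupUsers(), which is only one of several ways to obtain them.

```r
# Hypothetical helper: fetch user objects, persist them, and read them back.
# Assumes OAuth is set up and a twitteR database backend is registered.
archive_users <- function(screen_names, table_name = "users") {
  # lookupUsers() returns a list of user objects for the given names.
  users <- twitteR::lookupUsers(screen_names)

  # Persist the user objects ("users" is the default table).
  twitteR::store_users_db(users, table_name = table_name)

  # Read the table back as a data.frame mirroring what is stored.
  twitteR::load_users_db(as.data.frame = TRUE, table_name = table_name)
}
```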