MySQL Data Type Mapping in R

[This article was first published on RLang.io | R Language Programming, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

There was a recent question in the /r/Rlanguage subreddit which piqued my interest. They asked how to find the right mapping, and with the large number of data types I wondered if there was a good way to dynamically discover how fields are cast.

First step is to decide how to communicate with the database. I used the package RMySQL for this.

library(RMySQL)

Now to connect to the DB. I am running this locally. Make sure to use a password, even locally.

db_sql=dbConnect(MySQL(),user="root",
                 host="127.0.0.1",
                 dbname="test",
                 password="",
                 port=3306)

I included two different queries. The one below grabs all of the rows from the table.

q <- "SELECT * FROM main"
rows <- dbGetQuery(db_sql, q)

We get a couple warnings with the current table.

Warning messages:
1: In .local(conn, statement, ...) :
  Unsigned INTEGER in col 0 imported as numeric
2: In .local(conn, statement, ...) :
  unrecognized MySQL field type 7 in column 5 imported as character

We can use this loop to get the column name and data type for the imported dataframe.

for(column in names(rows)) {
  print(paste0(column,": ",typeof(rows[[column]])))
}

We get this nice output.

[1] "id: double"
[1] "char_test: character"
[1] "float_test: double"
[1] "bool_test: integer"
[1] "double_test: double"
[1] "timestamp_test: character"
[1] "datetime_test: character"
[1] "date_test: character"

The column names in MySQL were the data type followed by _test. As you can see, they were not all imported with a matching data type.

We can get a 1:1 mapping of the column type in MySQL and imported data type in R with the following piece of code.

q <- "SELECT DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'main'"
schema <- dbGetQuery(db_sql, q)
print(data.frame(SQL = schema$DATA_TYPE, R = as.vector(sapply(rows,typeof))))

This gives us a nice printout of a new dataframe created on the fly.

SQL         R
1       int    double
2      char character
3     float    double
4   tinyint   integer
5    double    double
6 timestamp character
7  datetime character
8      date character

And of course we need to close the connection if no more queries need to be made.

dbDisconnect(db_sql)

To leave a comment for the author, please follow the link and comment on their blog: RLang.io | R Language Programming.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)