MySQL Data Type Mapping in R

October 2, 2018
By

(This article was first published on RLang.io | R Language Programming, and kindly contributed to R-bloggers)

There was a recent question in the /r/Rlanguage subreddit which piqued my interest. They asked how to find the right mapping, and with the large number of data types I wondered if there was a good way to dynamically discover how fields are cast.

First step is to decide how to communicate with the database. I used the package RMySQL for this.

library(RMySQL)

Now to connect to the DB. I am running this locally. Make sure to use a password, even locally.

db_sql=dbConnect(MySQL(),user="root",
                 host="127.0.0.1",
                 dbname="test",
                 password="",
                 port=3306)

I included two different queries. The one below grabs all of the rows from the table.

q <- "SELECT * FROM main"
rows <- dbGetQuery(db_sql, q)

We get a couple warnings with the current table.

Warning messages:
1: In .local(conn, statement, ...) :
  Unsigned INTEGER in col 0 imported as numeric
2: In .local(conn, statement, ...) :
  unrecognized MySQL field type 7 in column 5 imported as character

We can use this loop to get the column name and data type for the imported dataframe.

for(column in names(rows)) {
  print(paste0(column,": ",typeof(rows[[column]])))
}

We get this nice output.

[1] "id: double"
[1] "char_test: character"
[1] "float_test: double"
[1] "bool_test: integer"
[1] "double_test: double"
[1] "timestamp_test: character"
[1] "datetime_test: character"
[1] "date_test: character"

The column names in MySQL were the data type followed by _test. As you can see, they were not all imported with a matching data type.

We can get a 1:1 mapping of the column type in MySQL and imported data type in R with the following piece of code.

q <- "SELECT DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'main'"
schema <- dbGetQuery(db_sql, q)
print(data.frame(SQL = schema$DATA_TYPE, R = as.vector(sapply(rows,typeof))))

This gives us a nice printout of a new dataframe created on the fly.

SQL         R
1       int    double
2      char character
3     float    double
4   tinyint   integer
5    double    double
6 timestamp character
7  datetime character
8      date character

And of course we need to close the connection if no more queries need to be made.

dbDisconnect(db_sql)

To leave a comment for the author, please follow the link and comment on their blog: RLang.io | R Language Programming.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)