A new twist on the identifier mapping problem

January 11, 2010

(This article was first published on What You're Doing Is Rather Desperate » R, and kindly contributed to R-bloggers)

Yesterday, Deepak wrote about BridgeDB, a software package to deal with the “identifier mapping problem”. Put simply, biologists can name a biological entity in any way that they like, leading to multiple names for the same object. Easily solved, you might think, by choosing one identifier and sticking to it, but that’s apparently way too much of a challenge.

However, there are times when this situation is forced upon us. Consider this code snippet, which uses the Bioconductor package GEOquery via the RSRuby library to retrieve a sample from the GEO database:

require "rubygems"
require "rsruby"

# RSRuby needs R_HOME to locate the R installation; set a default if it is unset
if ENV['R_HOME'].nil?
  ENV['R_HOME'] = "/usr/lib/R"
end

r = RSRuby.instance
r.library("GEOquery")  # load the Bioconductor package that provides getGEO and Table
sample = r.getGEO("GSM434143")
table  = r.Table(sample)
keys   = table.keys
puts keys

All good so far. What if I try to save the data table, which contains entries such as { "DETECTION.P.VALUE" => "0.000146581" }, to my new favourite database, MongoDB?

key must not contain '.'

So what am I to do, other than modify the key using something like:

newkey = key.gsub(/\./, "_")

Voilà, my own personal contribution to the identifier mapping problem.

What’s the solution? Here are some options – rank them in order of silliness if you like:

  • Biological databases should avoid potentially “troublesome” keys
  • Database designers should allow any symbols in keys
  • Database driver writers should include methods to check keys and alter them if necessary
  • End users should create their own maps by storing the original key with the modified version
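For what it's worth, the last option might look something like the sketch below. The helper names sanitize_keys and restore_keys are my own invention for illustration (they are not part of the MongoDB Ruby driver or RSRuby), and the choice of "_" as the replacement is an assumption. The idea is simply to keep a map from each modified key back to the original, so the change can be undone when the data comes back out of the database.

```ruby
# Hypothetical helpers: neither is part of the MongoDB Ruby driver or RSRuby.

# Replace every "." in each key with a replacement string and return the
# cleaned hash together with a map from each modified key to its original.
def sanitize_keys(record, replacement = "_")
  clean   = {}
  key_map = {}
  record.each do |key, value|
    new_key = key.to_s.gsub(".", replacement)  # a String pattern is matched literally
    # Two different originals can collapse to the same modified key
    # (e.g. "A.B" and "A_B"), so refuse to overwrite silently.
    raise ArgumentError, "key collision on #{new_key}" if clean.key?(new_key)
    clean[new_key]   = value
    key_map[new_key] = key.to_s
  end
  [clean, key_map]
end

# Rebuild the original hash from a cleaned hash and its key map.
def restore_keys(clean, key_map)
  restored = {}
  clean.each { |key, value| restored[key_map.fetch(key, key)] = value }
  restored
end

# The single entry quoted earlier in the post
row = { "DETECTION.P.VALUE" => "0.000146581" }
clean, key_map = sanitize_keys(row)
puts clean.inspect
puts key_map.inspect
puts restore_keys(clean, key_map) == row
```

The key map would have to be stored somewhere too, of course, which is exactly how a private identifier mapping gets started. The collision check is there because "A.B" and "A_B" would otherwise end up as the same key.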

Posted in bioinformatics, programming, R, research diary, ruby Tagged: databases, identifiers, mapping, mongodb
