-omics in 2013

June 24, 2013
(This article was first published on What You're Doing Is Rather Desperate » R, and kindly contributed to R-bloggers)

Just how many (bad) -omics are there anyway? Let’s find out.

1. Get the raw data

It would be nice if we could search PubMed for titles containing all -omics:

*omics[TITL]

However, we cannot, since leading wildcards don’t work in PubMed search. So let’s just grab all articles from 2013:

2013[PDAT]

and save them in a format which includes titles. I went with “Send to…File”, “Format…CSV”, which returns 575 068 records in pubmed_result.csv, around 227 MB in size.

2. Extract the -omics
Titles are in column 1 and we only want the -omics, so:

cut -f1 -d "," pubmed_result.csv | grep -i omics > omics.txt
wc -l omics.txt
# 1770 omics.txt
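
One caveat with the cut approach: if the export quotes titles that contain commas, splitting on "," keeps only the part of the title before the first comma, so an -omics word later in the title may be missed. A minimal sketch of the same step in R follows. It assumes the title column is headed "Title" and uses a tiny made-up file in place of pubmed_result.csv; both are assumptions for illustration, not details from the actual export.

```r
# Hypothetical two-record stand-in for pubmed_result.csv.
# The "Title" and "URL" headers are assumptions, not taken from the real export.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  'Title,URL',
  '"Proteomics, genomics and beyond: a review.",/pubmed/1',
  '"A study of mice.",/pubmed/2'
), csv_path)

# read.csv() honours quoted fields, so commas inside a title are preserved
records <- read.csv(csv_path, stringsAsFactors = FALSE)
titles <- records$Title

# keep titles that mention "omics", ignoring case (the equivalent of grep -i)
omics_titles <- grep("omics", titles, ignore.case = TRUE, value = TRUE)
print(omics_titles)
```

On the real 227 MB file this would be slower than cut and grep, and the counts could differ from the 1770 lines reported above.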

3. Clean, rinse, repeat…
We want just a list of -omics words. Time to break out the R. After much trial and error, I ended up with this. Ugly and far from optimized, but it (mostly) works. I say mostly, because I know of at least one case which is not detected: stain-omics.

library(stringr)

omics <- readLines("omics.txt")
omics <- strsplit(omics, " ")            # split titles on space
omics <- unlist(omics)                   # convert to vector of words
omics <- tolower(omics)                  # lower-case first so upper-case "OMICS" survives the grep
omics <- omics[grep("omics", omics)]     # just the -omics words
omics <- gsub("[\"\'\\.:\\?\\[\\]]", "", omics, perl = TRUE)  # remove symbols, punctuation

m <- data.frame(a = omics, b = str_match(omics, "^(.*?omics)-")[, 2])  # matches e.g. "genomics-based"
omics <- ifelse(is.na(m$b), as.character(m$a), as.character(m$b))                       

m <- data.frame(a = omics, b = str_match(omics, "-{1,}(.*?omics)$")[, 2])  # matches e.g. "phospho-proteomics"
omics <- ifelse(is.na(m$b), as.character(m$a), as.character(m$b))

omics <- unlist(strsplit(omics, "\\/"))  # split e.g. "genomics/proteomics"
omics <- omics[grep("omics", omics)]     # just the -omics words again

# OK we're down to the edge cases now :)
omics <- gsub("applications", "", omics)
omics <- gsub("\\(meta\\)", "meta", omics)
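
To see what the two hyphen rules do, and a plausible reason why stain-omics slips through, here is a small base-R sketch. The helper first_group is a hypothetical stand-in for str_match(x, pattern)[, 2], written with regexec so that it runs without stringr; the three test words are illustrative.

```r
# Hypothetical helper: first capture group of `pattern` in each element of x, or NA.
# Intended to mimic stringr::str_match(x, pattern)[, 2] using base R only.
first_group <- function(x, pattern) {
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  vapply(hits, function(h) if (length(h) >= 2) h[2] else NA_character_, character(1))
}

words <- c("genomics-based", "phospho-proteomics", "stain-omics")

# rule 1: an -omics word followed by a hyphenated suffix keeps only the -omics part
r1 <- first_group(words, "^(.*?omics)-")
words <- ifelse(is.na(r1), words, r1)

# rule 2: a hyphenated prefix is dropped, keeping what follows the hyphen
r2 <- first_group(words, "-{1,}(.*?omics)$")
words <- ifelse(is.na(r2), words, r2)

print(words)
```

Under these rules "stain-omics" is reduced to the bare word "omics", because rule 2 treats "stain-" as a prefix to discard. Rescuing it would need a special case, for example joining the two halves when the part after the hyphen is exactly "omics".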

4. Visualize
The top 20 -omics in 2013 and the less popular:

library(ggplot2)

omics.freq <- as.data.frame(table(omics))
omics.freq <- omics.freq[order(omics.freq$Freq, decreasing = TRUE), ]
# each "+" must end a line; a line that starts with "+" is parsed as a separate expression
ggplot(head(omics.freq, 20)) +
  geom_bar(aes(omics, Freq), stat = "identity", fill = "darkblue") +
  coord_flip() + theme_bw()
# and the less popular
subset(omics.freq, Freq == 1)

The figure shows the top 20. Top of the list so far for 2013 is proteomics, followed by genomics and metabolomics.
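
A note on bar order: ggplot2 draws a discrete axis in factor-level order, and the levels produced by table() are alphabetical, so sorting the rows of the data frame does not by itself sort the bars. The sketch below, which uses a made-up toy vector rather than the real data, shows one way to re-level the factor by count so that the most frequent word ends up at the top after coord_flip(). The names omics_words and freq are illustrative.

```r
# Hypothetical toy vector standing in for the cleaned vector of -omics words
omics_words <- c("proteomics", "genomics", "proteomics", "metabolomics",
                 "genomics", "proteomics", "romics")

# count each word and sort the rows by descending frequency
freq <- as.data.frame(table(omics = omics_words), stringsAsFactors = FALSE)
freq <- freq[order(freq$Freq, decreasing = TRUE), ]

# reverse the sorted order for the factor levels: the last level is drawn
# at the top of a flipped bar chart, so the most frequent word goes last
freq$omics <- factor(freq$omics, levels = rev(freq$omics))

print(freq)
print(subset(freq, Freq == 1))  # the singletons
```

Passing the re-levelled data frame to the ggplot call in place of head(omics.freq, 20) should then give bars sorted by frequency.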

Listed below, those -omics found only once in titles from 2013. Some shockers, I think you’ll agree (paging Jonathan Eisen).

                    omics Freq
          aquaphotomics    1
       biointeractomics    1
             calciomics    1
            cholanomics    1
           cytogenomics    1
           cytokinomics    1
          econogenomics    1
            glcnacomics    1
 glycosaminoglycanomics    1
          interactomics    1
               ionomics    1
         macroeconomics    1
            materiomics    1
      metalloproteomics    1
              metaomics    1
     metaproteogenomics    1
           microbiomics    1
         microeconomics    1
          microgenomics    1
        microproteomics    1
               miromics    1
         mitoproteomics    1
             mobilomics    1
             morphomics    1
              museomics    1
              neuromics    1
       neuropeptidomics    1
        nitroproteomics    1
      nutrimetabonomics    1
           oncogenomics    1
        orthoproteomics    1
            pangenomics    1
           petroleomics    1
   pharmacometabolomics    1
     pharmacoproteomics    1
   phylotranscriptomics    1
              phytomics    1
           postgenomics    1
              pyteomics    1
          radiogenomics    1
           rehabilomics    1
     retrophylogenomics    1
                 romics    1
            secretomics    1
              sensomics    1
         speleogenomics    1
           surfaceomics    1
              surfomics    1
     toxicometabolomics    1
            vaccinomics    1
              variomics    1

Top 20 -omics in PubMed titles, 2013

Never heard of romics? That’s OK. It’s a surname.


