Blog Archives

PubChem 446220 = Yeyo

August 8, 2014

I just updated my R package, CTSgetR, for biological database translation using the Chemical Translation Service (CTS). While making code examples, I came across some humorous chemical name synonyms for the molecule referenced in PubChem as CID = 446220. Below are a few examples; can you guess what this is? Badrock, Bazooka, Bernice, Bernies, Blast, Blizzard, Bouncing Powder, Bump, Burese,

Read more »

Multivariate Data Analysis and Visualization Through Network Mapping

June 27, 2014

Recently I had the pleasure of speaking about one of my favorite topics, Network Mapping. This is a continuation of a general theme I’ve previously discussed and involves the merger of statistical and multivariate data analysis results with a network. Over the past year I’ve been working on two major tools, DeviumWeb and MetaMapR, which

Read more »

Using Repeated Measures to Remove Artifacts from Longitudinal Data

June 4, 2014

Recently I was tasked with evaluating and, most importantly, removing analytical variance from a longitudinal metabolomic analysis carried out over a few years and including >2,5000 measurements for >5,000 patients. Even using state-of-the-art analytical instruments and techniques, long-term biological studies are plagued with unwanted trends which are unrelated to the original experimental design and stem from analytical

Read more »

Enrichment Network

May 10, 2014

Enrichment is occurrence within a category beyond what random chance would produce. Networks can represent relationships among variables. Enrichment networks display relationships among variables that are overrepresented compared to random chance. What follows is a tutorial for making enrichment networks for biological (metabolomic) data in R using the KEGG database.
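
As a rough illustration of what "over represented compared to random chance" can mean, here is a minimal sketch of a one-sided hypergeometric test. It is not the tutorial's code: the tutorial itself uses R, and the function name, arguments, and numbers below are hypothetical, chosen only to show the idea using the Python standard library.

```python
from math import comb


def enrichment_p_value(hits, drawn, category_size, population):
    """Probability of seeing at least `hits` category members by chance.

    Assumed setup (hypothetical, for illustration only):
      population    - total number of variables considered (e.g. all metabolites)
      category_size - how many of them belong to the category (e.g. one pathway)
      drawn         - size of the selected list (e.g. significantly changed variables)
      hits          - how many selected variables fall in the category
    """
    upper = min(drawn, category_size)
    total = comb(population, drawn)
    # Sum the hypergeometric probabilities for k = hits .. upper.
    tail = sum(
        comb(category_size, k) * comb(population - category_size, drawn - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 4 of 5 selected variables fall in a 10-member
    # category out of 50 variables in total.
    print(enrichment_p_value(hits=4, drawn=5, category_size=10, population=50))
```

A small p-value suggests the category appears in the selected list more often than random draws would explain; in practice such values are usually corrected for testing many categories at once.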

Read more »

Choose Your Own Data Adventure

April 5, 2014

The question is: can we automate scientific discovery, and what might an interface to such a tool look like? I’ve been experimenting with automating simple and complex data analysis and report generation tasks for biological data, mostly using R and LaTeX. You can see some of my progress and challenges encountered in the presentation

Read more »

High Dimensional Biological Data Analysis and Visualization

February 22, 2014

High dimensional biological data shares many qualities with other forms of data. Typically it is wide (samples << variables), complicated by experimental design and made up of complex relationships driven by both biological and analytical sources of variance. Luckily, the powerful combination of R, Cytoscape (< v3) and the R package RCytoscape can be used

Read more »

Tutorials- Statistical and Multivariate Analysis for Metabolomics

February 17, 2014

I recently had the pleasure of participating in the 2014 WCMC Statistics for Metabolomics Short Course. The course was hosted by the NIH West Coast Metabolomics Center and focused on statistical and multivariate strategies for metabolomic data analysis. A variety of topics were covered using 8 hands-on tutorials which focused on: data quality overview

Read more »

Classification with O-PLS-DA

September 29, 2013

Partial least squares (PLS) is a versatile algorithm which can be used to predict either continuous or discrete/categorical variables. Classification with PLS is termed PLS-DA, where the DA stands for discriminant analysis. The PLS-DA algorithm has many favorable properties for dealing with multivariate data, one of the most important of which is how variable collinearity is

Read more »

Orthogonal Partial Least Squares (OPLS) in R

July 28, 2013

I often need to analyze and model very wide data (variables >>> samples), and because of this I gravitate to robust yet relatively simple methods. In my opinion partial least squares (PLS) is a particularly useful algorithm. Simply put, PLS is an extension of principal components analysis (PCA), an unsupervised method for maximizing variance explained in X,

Read more »

Interactive Heatmaps (and Dendrograms) – A Shiny App

July 7, 2013

Heatmaps are a great way to visualize data matrices. Heatmap color and organization can be used to encode information about the data and metadata to help learn about the data at hand. An example of this could be looking at the raw data or hierarchically clustering samples and variables based on their similarities or differences.

Read more »