Blog Archives

Orthogonal Partial Least Squares (OPLS) in R

July 28, 2013

I often need to analyze and model very wide data (variables >>> samples), and because of this I gravitate to robust yet relatively simple methods. In my opinion partial least squares (PLS) is a particularly useful algorithm. Simply put, PLS is an extension of principal components analysis (PCA), an unsupervised method for maximizing the variance explained in X,

Read more »

Interactive Heatmaps (and Dendrograms) – A Shiny App

July 7, 2013

Heatmaps are a great way to visualize data matrices. Heatmap color and organization can be used to encode information about the data and metadata to help learn about the data at hand. An example of this could be looking at the raw data or hierarchically clustering samples and variables based on their similarity or differences.

Read more »

Principal Components Analysis Shiny App

June 23, 2013

I’ve recently started experimenting with making Shiny apps, and today I wanted to make a basic app for calculating and visualizing principal components analysis (PCA). Here is the basic interface I came up with. Test drive the app for yourself using the code below or check out the R code HERE. Above is an example of the

Read more »

Dynamic Data Visualizations in the Browser Using Shiny

June 16, 2013

After being busy the last two weeks teaching and attending academic conferences, I finally found some time to do what I love: programming data visualizations in R. After being interested in Shiny for a while, I finally decided to pull the trigger and build my first Shiny app! I wanted to make a proof of

Read more »

Tutorial- Building Biological Networks

April 4, 2013

I love networks! Nothing is better for visualizing complex multivariate relationships, be they social, virtual or biological. I recently gave a hands-on network building tutorial using R and Cytoscape to build large biological networks. In these networks nodes represent metabolites and edges can be many things, but I specifically focused on biochemical relationships and chemical

Read more »

Evaluation of Orthogonal Signal Correction for PLS modeling (OSC-PLS and OPLS)

March 15, 2013

Partial least squares, or projection to latent structures (PLS), is one of my favorite modeling algorithms. PLS is well suited to predictive modeling using wide data, that is, data with rows << variables. While there is a wealth of literature regarding the application of PLS to various tasks, I find it especially useful for biological

Read more »

PCA to PLS modeling analysis strategy for WIDE DATA

March 2, 2013

Working with wide data is already hard enough; add row outliers to this and things can get murky fast. Here is an example of an analysis of a wide data set, 24 rows x 84 columns. Using imDEV, written in R, to calculate and visualize a principal components analysis (PCA) on this data set, we find that

Read more »

Data analysis approaches to modeling changes in primary metabolism

January 31, 2013

Read more »

Power Calculations – relationship between test power, effect size and sample size

January 17, 2013

I was interested in modeling the relationship between power and sample size, while holding the significance level constant (α = 0.05), for the common two-sample t-test. Luckily R has great support for power analysis, and I found the function I was looking for in the package pwr. To calculate the power for the two-sample t-test

Read more »

Anaerobic Stress in Seeds – A Chemical Similarity Network Story

December 31, 2012

The chemical similarity network (CSN) is a great tool for organizing biological data based on known biochemistry or chemical structural similarity. Here is an example CSN for visualizing metabolomic changes (measured via GC/TOF) due to anaerobic stress in germinating seeds. In this network edges are formed for chemical similarity scores > 75. Node color describes

Read more »