In the third and last of the ggplot series, this post will go over interesting ways to visualize the distribution of your data.

I recently had the pleasure in participating in the 2014 WCMC Statistics for Metabolomics Short Course. The course was hosted by the NIH West Coast Metabolomics Center and focused on statistical and multivariate strategies for metabolomic data analysis. A variety of topics were covered using 8 hands on tutorials which focused on: data quality overview

As a data scientist I have seen variations of principal component analysis and factor analysis so often blindly misapplied and abused that I have come to think of the technique as unprincipled component analysis. PCA is a good technique often used to reduce sensitivity to overfitting. But this stated design intent leads many to (falsely) Related posts:

ShareLaTeX (click here to register a free account) is a wonderful and reliable on-line editor for writing and compiling LaTeX documents “in the cloud” as well as working together in real-time (imagine Google Docs supporting LaTeX => you get ShareLaTeX).…Read more ›

I've (passively) been keeping meticulous records of almost every song I've listened to since January of 2008. Since I opened my last.fm account 6 years ago, they've accumulated a massive detailed dataset of the 107,222 songs I've listened to since then. The best thing is that they're willing to share this data with me! I »more

As I have described before, Linear Discriminant Analysis (LDA) can be seen from two different angles. The first classify a given sample of predictors to the class with highest posterior probability . It minimizes the total probability of misclassification. To compute it uses Bayes’ rule and assume that follows a Gaussian distribution with class-specific mean

I always thought that there were some kind of schools in statistics, areas (not to say universities or laboratories) where people had common interest in term of statistical methodology. Like people with strong interest in extreme values, or in Lévy Processes. I wanted to check this point so I did extract information about articles puslished in about 35 journals...

The horseshoe effect is a well known and discussed issue with principal component analysis (PCA) (e.g. Goodall 1954; Swan 1970; Noy-Meir & Austin 1970). Similar geometric artefacts also affect correspondence analysis (CA). In part 1 of this series I looked at the implications of these “artefacts” for the recovery of temporal or single dominant gradients from multivariate palaeoecological data....

Earlier in this series I looked at the ordilabel() and then the orditorp() functions, and most recently the ordipointlabel() function in the vegan package as means to improve labelling in ordination plots. In this, the fourth and final post in the series I take a look at orditkplot(). If you’ve created ordination diagrams before or...

Ordination methods that yield orthogonal axes of variation are often used to summarise the multivariate data obtained from sediment cores. Usually the first or, less often, the first few ordination axes are taken as directions of change or the main patterns of variance in the multivariate data. There is an oft-overlooked issue with this approach that has the potential...