Explore the intersection of concepts like dimension reduction, clustering, data preparation, PCA, HDBSCAN, k-NN, SOM, deep learning... and Carl Sagan!

Principal component methods such as PCA (principal component analysis) or MCA (multiple correspondence analysis) can be used as a pre-processing step before clustering. But principal component methods also provide a framework for visualizing data, so the results of a clustering method can be represented on the map provided by the principal component method. In the figure below, the hierarchical tree
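To make the "PCA first, then cluster, then show the clusters on the PC map" idea concrete, here is a minimal Python/NumPy sketch. It is not FactoMineR's implementation: it standardizes the columns (assuming none is constant), keeps the first two principal components via an SVD, and runs a naive agglomerative (Ward-criterion) clustering on those coordinates, cutting the tree at a chosen number of clusters rather than drawing the full hierarchy. The data are synthetic, made up purely for illustration, and the helper names `pca_scores` and `ward_clusters` are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components):
    """Standardize columns, then project onto the leading principal components (via SVD).
    Assumes no column is constant (otherwise the scaling divides by zero)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T  # coordinates on the PC "map"

def ward_clusters(P, n_clusters):
    """Naive bottom-up clustering using Ward's merge cost; O(n^3), fine for small n."""
    clusters = [[i] for i in range(len(P))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                A, B = P[clusters[a]], P[clusters[b]]
                na, nb = len(A), len(B)
                # Increase in within-cluster sum of squares if A and B were merged
                cost = na * nb / (na + nb) * np.sum((A.mean(axis=0) - B.mean(axis=0)) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the cheapest pair
        del clusters[b]
    labels = np.empty(len(P), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Hypothetical data: two well-separated groups of 10 observations on 5 variables
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 5)), rng.normal(5, 1, (10, 5))])

scores = pca_scores(X, n_components=2)       # pre-processing step
labels = ward_clusters(scores, n_clusters=2)  # clustering on the PC coordinates
print(labels)
```

Plotting `scores[:, 0]` against `scores[:, 1]` and coloring the points by `labels` gives the kind of "clusters on the factor map" view described above.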

This post shows how to perform PCA with R and the package FactoMineR. If you want to learn more about methods such as PCA, you can enroll in this MOOC (everything is free): MOOC on Exploratory Multivariate Data Analysis. Dataset: here is a wine dataset, with 10 wines and 27 sensory attributes (like sweetness, bitterness,

A beautiful graph tells more than a lengthy speech! So it is crucial to improve the graphs obtained by Principal Component Analysis or (Multiple) Correspondence Analysis. The package Factoshiny lets us improve these graphs easily and interactively. Factoshiny makes interacting with R and FactoMineR simpler, which makes it easier to select and add supplementary information. The main advantage

This is a practical tutorial on performing PCA in R. If you would like to understand how PCA works, please see my plain-English explainer here. Reminder: Principal Component Analysis (PCA) is a method used to reduce the number of variables in a dataset. We are using R’s USArrests dataset, a dataset from 1973 showing,
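The "reduce the number of variables" idea can be sketched in a few lines of Python/NumPy. This is an illustration, not the tutorial's R code, and it does not use USArrests: the four variables below are synthetic, deliberately driven by one shared latent factor so that they are strongly correlated. The sketch standardizes the data (roughly what R's `prcomp(x, scale. = TRUE)` does), computes the share of variance carried by each component, and keeps the smallest number of components that reaches an assumed 90% threshold; the threshold and the helper name `pca` are illustrative choices.

```python
import numpy as np

def pca(X):
    """PCA on standardized data via SVD; returns scores, loadings and variance ratios."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s ** 2 / (len(X) - 1)   # eigenvalues of the correlation matrix
    ratio = var / var.sum()       # proportion of variance explained per component
    scores = U * s                # same as Z @ Vt.T
    return scores, Vt, ratio

# Hypothetical data: 50 observations, 4 variables sharing one latent factor plus noise
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 1))
X = latent @ np.array([[1.0, 0.8, 0.6, 0.9]]) + 0.2 * rng.normal(size=(50, 4))

scores, loadings, ratio = pca(X)
# Smallest number of components whose cumulative variance share reaches 90%
k = int(np.searchsorted(np.cumsum(ratio), 0.90) + 1)
reduced = scores[:, :k]  # the dataset described by k variables instead of 4
print(np.round(ratio, 3), k)
```

Because the four variables largely repeat the same information, the first component carries most of the variance here; on real data such as USArrests the split is typically less lopsided.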

This post is the second part of the customer segmentation analysis. The first post focused on k-means clustering in R to segment customers into distinct groups based on purchasing habits. This post takes a different approach, using Principal Component Analysis (PCA) in R as a tool to view customer groups. Because PCA attacks the...

Not all Principal Component Analysis (PCA) approaches (PCA is also called Empirical Orthogonal Function analysis, EOF) are equal when it comes to dealing with a data field that contains missing values (i.e. "gappy" data). The following post compares several methods by assessing how accurately the derived PCs reconstruct the "true" data set, as was similarly...
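As a rough illustration of the evaluation idea (reconstruct a known "true" field from a gappy copy and measure the error), here is a Python/NumPy sketch of one common approach: iterative low-rank PCA imputation, which starts from column means and repeatedly replaces the gaps with a rank-k SVD reconstruction. This is not necessarily any of the methods compared in the post. The synthetic rank-2 data, the 10% missingness, the chosen rank and the helper name `iterative_pca_impute` are all assumptions made for the example.

```python
import numpy as np

def iterative_pca_impute(X, rank, n_iter=200, tol=1e-8):
    """Fill NaNs by alternating a rank-`rank` PCA reconstruction with re-imputation."""
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        recon = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu  # low-rank reconstruction
        new = np.where(mask, recon, X)                     # only overwrite the gaps
        converged = np.max(np.abs(new - filled)) < tol
        filled = new
        if converged:
            break
    return filled

# Hypothetical "true" field: rank 2 plus a little noise, 60 x 8
rng = np.random.default_rng(2)
truth = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(60, 8))
mask = rng.random(truth.shape) < 0.10          # knock out about 10% of the values
X_missing = np.where(mask, np.nan, truth)

imputed = iterative_pca_impute(X_missing, rank=2)
mean_filled = np.where(mask, np.nanmean(X_missing, axis=0), X_missing)

# Accuracy is judged only on the entries that were actually missing
rmse_iter = np.sqrt(np.mean((imputed[mask] - truth[mask]) ** 2))
rmse_mean = np.sqrt(np.mean((mean_filled[mask] - truth[mask]) ** 2))
print(f"iterative PCA RMSE: {rmse_iter:.3f}, mean-fill RMSE: {rmse_mean:.3f}")
```

The rank passed to the function matters: too low and real structure is discarded, too high and noise gets fitted into the gaps, which is one reason different gappy-PCA methods can disagree.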

Introduction: I work in consulting. If you're a consultant at a certain type of company, agency, organization, consultancy, whatever, this can sometimes mean travelling a lot. Many business travellers 'in the know' have heard the old joke that if you want to stay at any type of hotel anywhere in the world and get a great rate, all you have to...

Authors: Jan Smycka, Petr Keil. This post introduces the experimental R package bPCA, which we developed with Jan Smycka, who actually came up with the idea. We do not guarantee that the idea itself is correct, and there are certainly bugs; we invite anyone to prove us wrong or to contribute. …
