Blog Archives

R scripts

May 21, 2014

Here is a little of my recent experience with R scripts. Comments, suggestions and opinions are welcome. Contents: Usefulness of R scripts; Basic R script; Processing command-line arguments; Verbose mode and stderr; stdin in a non-interactive mode. Usefulness of R scripts: Besides being an amazing interactive tool for data analysis, R software commands can also…
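
As a rough illustration of two of the topics listed above (command-line arguments and verbose output on stderr), here is a minimal sketch of a script meant to be run with Rscript. The helper name `parse_args` and the `--verbose` flag are assumptions made for this example, not taken from the original post.

```r
#!/usr/bin/env Rscript
# Hypothetical example: sum the numbers given on the command line.
# The flag name "--verbose" and the helper parse_args() are illustrative only.

parse_args <- function(args) {
  # TRUE if the (assumed) --verbose flag is present
  verbose <- "--verbose" %in% args
  # Everything else is treated as a number
  numbers <- as.numeric(args[args != "--verbose"])
  list(verbose = verbose, numbers = numbers)
}

# trailingOnly = TRUE keeps only the arguments passed after the script name
opts <- parse_args(commandArgs(trailingOnly = TRUE))

if (opts$verbose) {
  # message() writes to stderr, so it does not pollute the script's stdout
  message("Summing ", length(opts$numbers), " value(s)")
}

cat(sum(opts$numbers), "\n")
```

Saved under a hypothetical name such as `sum.R`, it could be called as `Rscript sum.R --verbose 1 2.5`.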

Near-zero variance predictors. Should we remove them?

March 6, 2014

Datasets sometimes come with predictors that take a single value across all samples. Such uninformative predictors are more common than you might think. This kind of predictor is not only non-informative, it can also break some models you may want to fit to your data (see example below). Even more common is the presence of predictors that…
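
A minimal base R sketch of how single-valued predictors could be detected and dropped. The toy data frame and the helper `is_constant` are made up for illustration; packages such as caret offer a more complete check through `nearZeroVar()`, which may be what the full post relies on.

```r
# Hypothetical helper: TRUE when a column takes at most one distinct value
is_constant <- function(x) length(unique(x)) <= 1

# Toy data, invented for this example: column "b" is constant
df <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(5, 5, 5, 5),
  c = c("x", "y", "x", "y")
)

# Names of the columns flagged as constant
constant_cols <- names(df)[vapply(df, is_constant, logical(1))]
print(constant_cols)

# Keep only the informative columns
df_clean <- df[, !(names(df) %in% constant_cols), drop = FALSE]
```

This only covers the zero-variance case; near-zero variance predictors need an additional rule, for example on the frequency of the most common value.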

Character strings in R

February 19, 2014

This post deals with the basics of character strings in R. My main reference has been Gaston Sanchez's ebook, which is excellent and which you should read if you are interested in manipulating text in R. The section on encodings draws on a second source, which is also a nice reference to have nearby. Text analysis will be…

Computing and visualizing LDA in R

January 15, 2014

As I have described before, Linear Discriminant Analysis (LDA) can be seen from two different angles. The first classifies a given sample of predictors to the class with the highest posterior probability, which minimizes the total probability of misclassification. To compute this probability, it uses Bayes' rule and assumes that the predictors, conditional on the class, follow a Gaussian distribution with class-specific mean…
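
The classification rule described above can be sketched as follows, assuming the `lda` function from the MASS package and the built-in `iris` data. Both are choices made for this example and are not confirmed by the excerpt.

```r
library(MASS)  # provides lda(); assumed here, not confirmed by the excerpt

# Fit LDA with Species as the class and the four measurements as predictors
fit <- lda(Species ~ ., data = iris)

# Posterior class probabilities for the training samples
pred <- predict(fit)
post <- pred$posterior

# Assign each sample to the class with the highest posterior probability
manual_class <- colnames(post)[apply(post, 1, which.max)]

# Proportion of samples where this matches the class returned by predict()
agreement <- mean(manual_class == as.character(pred$class))
print(agreement)
```

Each row of `post` sums to one, and picking the column with the largest value reproduces the predicted class.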

Computing and visualizing PCA in R

November 28, 2013

Following my introduction to PCA, I will demonstrate how to apply and visualize PCA in R. There are many packages and functions that can apply PCA in R. In this post I will use the function prcomp from the stats package. I will also show how to visualize PCA in R using Base R graphics.

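
A minimal sketch of what this could look like with `prcomp` and base R graphics. The `iris` data and the decision to center and scale the variables are assumptions made for this example.

```r
# Numeric predictors only; the iris data is an assumption for this example
X <- iris[, 1:4]

# PCA on centered and scaled variables
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores on the first two components, coloured by species (base graphics)
plot(pca$x[, 1], pca$x[, 2],
     col = as.integer(iris$Species), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)
```

Scaling matters when the variables are measured in different units; with `scale. = FALSE` the components would be dominated by the variables with the largest variance.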

Plot matrix with the R package GGally

November 13, 2013

I am glad to have found the R package GGally. GGally is a convenient package built upon ggplot2 that contains templates for different plots to be combined into a plot matrix through the function ggpairs. It is a nice alternative to the more limited pairs function. The package also has functions to deal with parallel…

Unsupervised data pre-processing: individual predictors

November 7, 2013

I just got the excellent book Applied Predictive Modeling, by Max Kuhn and Kjell Johnson. The book is designed for a broad audience and focuses on the construction and application of predictive models. Besides going through the necessary theory in a not-so-technical way, the book provides R code at the end of each chapter.


Reshape and aggregate data with the R package reshape2

October 31, 2013

Creating molten data: Instead of thinking about data in terms of a matrix or a data frame where we have observations in the rows and variables in the columns, we need to think of the variables as divided into two groups: identifier and measured variables. Identifier variables (id) identify the unit that measurements take place…

Numerical computation of quantiles

October 23, 2013

Recently I had to define an R function that would compute a given quantile of a continuous random variable based on a user-defined density function. Since the main objective is to have a general function that computes the quantiles for any user-defined density function, it needs to be done numerically. Problem statement: Suppose we are interested…
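
One possible way to do this numerically, sketched under stated assumptions: integrate the density to obtain the CDF, then solve for the point where the CDF equals the target probability. The function name `quantile_from_density`, its arguments, and the finite search interval are all choices made for this example, not necessarily the approach taken in the full post.

```r
# Hypothetical helper: p-quantile of a density supported on [lower, upper]
quantile_from_density <- function(p, dens, lower, upper) {
  # CDF obtained by numerical integration of the density
  cdf <- function(x) integrate(dens, lower, x)$value
  # Root of cdf(x) - p inside the search interval
  uniroot(function(x) cdf(x) - p,
          lower = lower, upper = upper, tol = 1e-9)$root
}

# Example with an exponential density (rate 2), truncating the support at 10
dens <- function(x) dexp(x, rate = 2)
q_num <- quantile_from_density(0.5, dens, lower = 0, upper = 10)

# Compare with the closed-form quantile available for this distribution
print(c(numerical = q_num, exact = qexp(0.5, rate = 2)))
```

The sketch assumes the density integrates to (almost) one over the chosen interval; for heavy-tailed or unbounded densities the interval would have to be chosen with more care.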

Latent Gaussian Models and INLA

October 16, 2013

If you read my post about Fast Bayesian Inference with INLA, you might wonder which models are included within the class of latent Gaussian models (LGM) and can therefore be fitted with INLA. Next I will give a general definition of LGMs, and later I will describe three completely different examples that belong to this…