Articles by thiagogm

R scripts

May 21, 2014 | thiagogm

Here is a little of my recent experience with R scripts. Comments, suggestions, and opinions are welcome. Contents: Usefulness of R scripts; Basic R script; Processing command-line arguments; Verbose mode and stderr; stdin in a non-interactive mode. Usefulness of R scripts: Besides being an amazing interactive tool for data ... [Read more...]

Near-zero variance predictors. Should we remove them?

March 6, 2014 | thiagogm

Datasets sometimes come with predictors that take a single value across all samples. Such uninformative predictors are more common than you might think. This kind of predictor is not only non-informative, it can also break some models you may want to fit to your data (see example below). Even more common is ... [Read more...]

Character strings in R

February 19, 2014 | thiagogm

This post deals with the basics of character strings in R. My main reference has been Gaston Sanchez’s ebook [1], which is excellent and worth reading if you are interested in manipulating text in R. I got the section on encoding from [2], which is also a nice reference to have ... [Read more...]

Computing and visualizing LDA in R

January 15, 2014 | thiagogm

As I have described before, Linear Discriminant Analysis (LDA) can be seen from two different angles. The first classifies a given sample of predictors to the class with the highest posterior probability. It minimizes the total probability of misclassification. To compute that probability, it uses Bayes’ rule and assumes that the predictors within each class follow a Gaussian ... [Read more...]

Computing and visualizing PCA in R

November 28, 2013 | thiagogm

Following my introduction to PCA, I will demonstrate how to apply and visualize PCA in R. There are many packages and functions that can apply PCA in R. In this post I will use the function prcomp from the stats package. I will also show how to visualize PCA in ... [Read more...]

Plot matrix with the R package GGally

November 13, 2013 | thiagogm

I am glad to have found the R package GGally. GGally is a convenient package built upon ggplot2 that contains templates for different plots to be combined into a plot matrix through the function ggpairs. It is a nice alternative to the more limited pairs function. The package has also ... [Read more...]

Unsupervised data pre-processing: individual predictors

November 7, 2013 | thiagogm

I just got the excellent book Applied Predictive Modeling, by Max Kuhn and Kjell Johnson [1]. The book is designed for a broad audience and focuses on the construction and application of predictive models. Besides going through the necessary theory in a not-so-technical way, the book provides R code at the ... [Read more...]

Reshape and aggregate data with the R package reshape2

October 31, 2013 | thiagogm

Creating molten data: Instead of thinking about data in terms of a matrix or a data frame where we have observations in the rows and variables in the columns, we need to think of the variables as divided into two groups: identifier and measured variables. Identifier variables (id) identify the ... [Read more...]

Numerical computation of quantiles

October 23, 2013 | thiagogm

Recently I had to define an R function that would compute a given quantile of a continuous random variable based on a user-defined density function. Since the main objective is to have a general function that computes the quantiles for any user-defined density function, it needs to be done numerically. Problem ... [Read more...]

Latent Gaussian Models and INLA

October 16, 2013 | thiagogm

If you read my post about Fast Bayesian Inference with INLA, you might wonder which models are included within the class of latent Gaussian models (LGM), and can therefore be fitted with INLA. Next I will give a general definition of LGM and later I will describe three completely different ... [Read more...]

Fast Bayesian Inference with INLA

October 9, 2013 | thiagogm

I am currently a research fellow and 4th-year PhD candidate within the INLA group. If you deal with Bayesian models and have never heard about INLA, I sincerely think you should spend a small portion of your time to at least learn what it is. If you have heard ... [Read more...]

Profiling R code

September 25, 2013 | thiagogm

Profiling R code gives you the chance to identify bottlenecks and pieces of code that need to be implemented more efficiently [1]. Profiling R code is usually the last thing I do in the process of package (or function) development. In my experience we can reduce the amount of time necessary ... [Read more...]
