Articles by Andrés Gutiérrez

Scatter plots in survey sampling

November 27, 2017

You can find this post in Blogdown format by clicking here. When it comes to analyzing survey data, you have to take into account the stochastic structure of the sample that was selected to obtain the data. Plots and graphics should not be an exception. ...

dplyr and the design effect in survey samples

November 21, 2017

Blogdown entry here. For those who, like me, are not such R geeks, this trick could be of interest. The dplyr package can be very useful for data manipulation, and you can use it to extract valuable information from a data frame. For example, when ...

Automatic output format in Rmarkdown

November 19, 2017

I am writing an Rmarkdown document with plenty of tables, and I want them in a decent format, e.g. kable. However, I don't want to format them one by one. For example, I have created the following data frame in dplyr: data2 %>% group_by(uf) %>% sum...

Sampling weights and multilevel modeling in R

June 15, 2017

So many things have been said about weighting, but in my personal view of statistical inference processes, you do have to weight. From a single statistic to a complex model, you have to weight, because of the probability measure that induces the var...

Small Area Estimation 101

April 16, 2017

Small area estimation (SAE) has become a widely used technique in official statistics since the last decade of the past century. When the sample size is not enough to provide reliable estimates at a very particular level, the power of models and auxiliary ...

Gelman’s MrP in R – What is this all about?

January 15, 2017

Multilevel regression with poststratification (MrP) is a useful technique to predict a parameter of interest within small domains through modeling the mean of the variable of interest conditional on poststratification counts. This method (or methods) w...

3PL models viewed through the lens of total probability theorem

January 1, 2017

As I currently am the NPM for PISA in Colombia, I must attend several meetings dealing with the proper implementation of this assessment in my country. Few of them are devoted to the analysis of this kind of data (coming from IRT models). As usual, ...

Computing Sample Size for Variance Estimation

December 24, 2016

The R package samplesize4surveys contains functions that calculate sample sizes for estimating proportions, means, differences of proportions and even differences of two means. It also permits the calculation of sampling error and power level for ...

Highlighting R code for the web

December 3, 2016

When blogging about statistics and R, it is very useful to differentiate the body text from R code. I used to manage this issue by highlighting the code, and pretty-R was a valuable instrument from Revolution Analytics to accomplish this. However, as you...

How important is that variable?

December 3, 2016

When modeling any phenomenon by including explanatory variables that are highly related to the variable of interest, one question arises: which of the auxiliary variables has a higher influence on the response? I am not writing about significance testing or s...

November 20, 2016

In an article called A Paradox in the Interpretation of Group Comparisons published in Psychological Bulletin, Lord (1967) made famous the following controversial story: A university is interested in investigating the effects of the nutritional diet its...

Sublime Text 3: an alternative to RStudio

October 17, 2016

It was a Saturday morning; I was lecturing the students in my Item Response Theory class when I decided to run some R scripts to introduce them to the JAGS syntax and the estimation of parameters in a Bayesian logistic regression setup. As it wa...

Multilevel Modeling of Educational Data using R (Part 1)

October 11, 2016

Linear models fail to accurately recognize the effect of clustering due to intraclass correlation. However, some scenarios force you to take into account that units are clustered into subgroups that are in turn nested within larger group...

#PredictiveCOL – Forecasting Colombia’s peace plebiscite (final update)

October 4, 2016

For sure, this is the most exciting forecast I have ever done. On the one hand, I am a Colombian guy, and I really want to live in a peaceful country, and I do want a better place for raising my children. On the other hand, I am very serious when it ...

Isolating confounding effects – Rankings and residuals

July 8, 2016

In a previous entry, we talked about the meaning and importance of isolating confounding variables. This entry is dedicated to the residuals and their relation to the variable of interest when controlling for some confounding factors. Let's think about ed...

I don’t care about that lost unit

June 4, 2016

Just assume that you have planned a survey along with the necessary sample size to obtain representativity. Let’s suppose the sample size is 100. However, as nonresponse is always present, unfortunately your effective sample size is 99. Consider the...

IRT classic anchoring with R functions

March 16, 2016

The main goal of standardised tests is to produce scores that can be compared not only within subgroups of students (and subpopulations of interest) but also across applications (at different times). In summary, researchers and methodologists must assure t...

IRT equating using R functions – The calibrated pool method

March 5, 2016

In the assessment of education it is very common to use Item Response Theory in order to produce measures of ability for the students who took a standardised test. Moreover, if you want to gain comparability between applications you should know th...

Anchoring estimation or the perfect excuse to become "Bayesian"

September 1, 2015

Anchoring is a usual process when estimating abilities in test equating. This is about analyzing standardized tests, while maintaining a predefined scale. For example, assume that you have a set of 60 items in your test. However, two test forms (named...

Parametric bootstrap

August 7, 2015

Assume we want to know the mean square error (MSE) of the sample median as an estimator of a population mean under normality. As you know, this is not a trivial problem. We may take advantage of the Bootstrap method and solve it by means of simulation....