Blog Archives

Predicting optimal number of iterations and completion time for GBM

November 20, 2013

When choosing the hyperparameters for Generalized Boosted Regression Models, two important choices are the shrinkage and the number of trees. Generally, a smaller shrinkage with more trees produces a better model, but modeling time increases significantly. Building a model with too many trees that are heavily cut back by cross-validation wastes time, while building a model...

Read more »

Binomial confidence intervals: exact vs. approximate

October 30, 2013

This graph and R code compare exact and normal-approximation 95% binomial confidence intervals for n trials with either one success or 50% success.
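
As a rough illustration of that comparison (a minimal sketch, not the post's own code), the exact Clopper-Pearson interval from `binom.test()` can be put next to the normal (Wald) approximation. The helper name `compare_ci` is hypothetical.

```r
# Hypothetical helper: exact vs. normal-approximation binomial intervals.
compare_ci <- function(x, n, conf.level = 0.95) {
  p_hat <- x / n
  # Exact (Clopper-Pearson) interval from base R's binom.test()
  exact <- binom.test(x, n, conf.level = conf.level)$conf.int
  # Normal (Wald) approximation: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
  z <- qnorm(1 - (1 - conf.level) / 2)
  half_width <- z * sqrt(p_hat * (1 - p_hat) / n)
  data.frame(
    method = c("exact", "normal"),
    lower  = c(exact[1], p_hat - half_width),
    upper  = c(exact[2], p_hat + half_width)
  )
}

one_success  <- compare_ci(1, 20)   # one success in 20 trials
half_success <- compare_ci(10, 20)  # 50% success in 20 trials
print(one_success)
print(half_success)
```

With a single success the normal approximation can produce a lower bound below zero, which is one reason to prefer the exact interval when the proportion is near 0 or 1.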

Read more »

Bar plot with error bars in R

October 20, 2013

Here's a simple way to make a bar plot with error bars three ways: standard deviation, standard error of the mean, and a 95% confidence interval. The key step is to precalculate the statistics for ggplot2.
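
The precalculation step might look something like the sketch below. It is an assumption-laden example rather than the post's code: the helper `summarize_groups` is hypothetical, the built-in `mtcars` data stands in for real data, and the confidence interval uses a t quantile.

```r
# Hypothetical helper: per-group mean, SD, SEM, and CI half-width.
summarize_groups <- function(values, groups, conf.level = 0.95) {
  n  <- as.vector(tapply(values, groups, length))
  m  <- tapply(values, groups, mean)
  s  <- as.vector(tapply(values, groups, sd))
  se <- s / sqrt(n)  # standard error of the mean
  # Half-width of the t-based confidence interval
  ci <- qt(1 - (1 - conf.level) / 2, df = n - 1) * se
  data.frame(group = names(m), n = n, mean = as.vector(m),
             sd = s, se = se, ci = ci)
}

stats <- summarize_groups(mtcars$mpg, mtcars$cyl)
print(stats)

# Plot only if ggplot2 is installed; swap `se` for `sd` or `ci` as needed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(stats, aes(x = group, y = mean)) +
    geom_col() +
    geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2)
  # print(p) draws the plot
}
```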

Read more »

Calculate RMSE and MAE in R and SAS

July 12, 2013

Here is code to calculate RMSE and MAE in R and SAS. RMSE (root mean squared error), also called RMSD (root mean squared deviation), and MAE (mean absolute error) are both used to evaluate models. MAE gives equal weight to all errors, while RMSE gives extra weight to large errors.
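
A minimal R sketch of both measures, written from their standard definitions rather than copied from the post (the function names `rmse` and `mae` and the toy vectors are assumptions for illustration):

```r
# Root mean squared error: squaring gives large errors extra weight.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Mean absolute error: every error counts in proportion to its size.
mae <- function(actual, predicted) mean(abs(actual - predicted))

actual    <- c(1, 2, 3, 4)
predicted <- c(1, 2, 3, 8)  # a single large miss of 4

rmse_value <- rmse(actual, predicted)  # sqrt(16 / 4) = 2
mae_value  <- mae(actual, predicted)   # 4 / 4 = 1
print(c(rmse = rmse_value, mae = mae_value))
```

Here one large miss doubles the RMSE relative to the MAE, which shows the extra weight RMSE puts on large errors.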

Read more »

Geolocate IP addresses in R

May 20, 2013

This R function uses the free freegeoip.net geolocation service to resolve an IP address (or a vector of them) into country, region, city, zip, latitude, longitude, area code, and metro code.

Read more »

Popup notification from R on Windows

April 19, 2013

After R finishes a long-running process, you may need to notify the operator to check the R console and provide the next commands. Here is a simple way to create a popup notice in Windows without installing more software or creating batch files or VBS scripts.

Read more »

lag function for data frames

October 29, 2012

When applying the stats::lag() function to a data frame, you probably expect it to pad the missing time periods with NA, but lag() doesn’t: the values come back apparently unchanged. Here is an alternative lag function made for this situation. It pads …
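
To make the idea concrete, here is one way such a padded lag could be sketched. This is not the function from the post; `lag_pad` is a hypothetical name, and the sketch assumes an atomic vector such as a data frame column.

```r
# Hypothetical helper: shift x forward by k positions, padding with NA.
lag_pad <- function(x, k = 1) {
  stopifnot(k >= 0)
  n <- length(x)
  # Prepend k NAs, then keep the first n values so the length is unchanged.
  c(rep(NA, k), x)[seq_len(n)]
}

df <- data.frame(t = 1:5, y = c(10, 20, 30, 40, 50))
df$y_lag1 <- lag_pad(df$y, 1)  # NA 10 20 30 40
df$y_lag2 <- lag_pad(df$y, 2)  # NA NA 10 20 30
print(df)
```

Because the result has the same length as the input, it can be assigned straight back as a new column.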

Read more »

nnet2sas() supports centering and scaling

October 4, 2012

nnet2sas() version 1 introduced a way to export an nnet() model trained in R to Base SAS through metaprogramming, and now nnet2sas() version 2 adds support for variable centering and scaling as implemented in caret::train(). See the link for version …

Read more »

Comparing continuous distributions with R

June 13, 2012

In R we’ll generate similar continuous distributions for two groups and give a brief overview of statistical tests and visualizations to compare the groups. Though the fake data are normally distributed, we use methods for various kinds of continuous distributions. …

Read more »

Plotting individual growth charts

March 14, 2012

This R code draws individual growth plots as shown in “Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence” by Judith D. Singer and John B. Willett, an excellent book on multilevel modeling and survival analysis. This code recreates figure …

Read more »