Articles by heuristicandrew

Predicting optimal number of iterations and completion time for GBM

November 20, 2013 | heuristicandrew

When choosing the hyperparameters for Generalized Boosted Regression Models, two important choices are shrinkage and the number of trees. Generally a smaller shrinkage with more trees produces a better model, but the modeling time significantly increases. Building a model with too many trees that are heavily cut back by cross ... [Read more...]

Bar plot with error bars in R

October 20, 2013 | heuristicandrew

Here's a simple way to make a bar plot with error bars three ways: standard deviation, standard error of the mean, and a 95% confidence interval. The key step is to precalculate the statistics for ggplot2. Continue reading → [Read more...]
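As a rough illustration of the precalculation step, here is a minimal base R sketch. It is not the code from the post: the sample values are made up, and the 95% confidence interval assumes a t-distribution, which may differ from the post's approach.

```r
# Hypothetical measurements for one group (illustration only)
x <- c(4.1, 5.3, 4.8, 6.0, 5.5)
n <- length(x)

# Precalculate the three candidate error-bar half-widths
stats <- data.frame(
  mean = mean(x),
  sd   = sd(x),               # standard deviation
  sem  = sd(x) / sqrt(n)      # standard error of the mean
)

# Half-width of a 95% confidence interval (assumes a t-distribution)
stats$ci95 <- qt(0.975, df = n - 1) * stats$sem

print(stats)
```

A data frame like this, with one row per group, could then supply the lower and upper bounds (mean minus and plus the chosen half-width) to a plotting function.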

Calculate RMSE and MAE in R and SAS

July 12, 2013 | heuristicandrew

Here is code to calculate RMSE and MAE in R and SAS. RMSE (root mean squared error), also called RMSD (root mean squared deviation), and MAE (mean absolute error) are both used to evaluate models. MAE gives equal weight to all errors, while RMSE gives extra weight to large errors. ... [Read more...]
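To make the difference in weighting concrete, here is a minimal R sketch (not the code from the post) using made-up observed and predicted values. One large error pulls RMSE well above MAE.

```r
# Hypothetical observed and predicted values (illustration only)
actual    <- c(10, 12, 15, 20)
predicted <- c(11, 12, 13, 26)

error <- actual - predicted

# RMSE: squaring the errors gives extra weight to large ones
rmse <- sqrt(mean(error^2))

# MAE: each error contributes in proportion to its absolute size
mae <- mean(abs(error))

print(c(RMSE = rmse, MAE = mae))
```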

Geolocate IP addresses in R

May 20, 2013 | heuristicandrew

This R function uses the free freegeoip.net geocoding service to resolve an IP address (or a vector of them) into country, region, city, zip, latitude, longitude, area and metro codes. Continue reading → [Read more...]

Popup notification from R on Windows

April 19, 2013 | heuristicandrew

After R is done running a long process, you may need to notify the operator to check the R console and provide the next commands. Without installing any more software or creating any batch files or VBS scripts, here is a simple way to create the popup notice in Windows ... [Read more...]

lag function for data frames

October 29, 2012 | heuristicandrew

When applying the stats::lag() function to a data frame, you probably expect it will pad the missing time periods with NA, but lag() doesn’t. For example: Nothing happened. Here is an alternative lag function made for this situation. It pads … Continue reading → [Read more...]
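For illustration only, a padded lag can be sketched in base R as follows. The helper name `lag_pad` and the sample data frame are hypothetical and are not the function from the post.

```r
# lag_pad() is a hypothetical helper: shift x forward by k positions
# and fill the first k positions with NA
lag_pad <- function(x, k = 1) {
  stopifnot(k >= 0, k <= length(x))
  c(rep(NA, k), head(x, length(x) - k))
}

# Made-up data frame with one value per time period
df <- data.frame(t = 1:5, y = c(10, 20, 30, 40, 50))
df$y_lag1 <- lag_pad(df$y, 1)

print(df)
```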

nnet2sas() supports centering and scaling

October 4, 2012 | heuristicandrew

nnet2sas() version 1 introduced a way to export a nnet() model trained in R to Base SAS through metaprogramming, and now nnet2sas() version 2 introduces support for variable centering and scaling as implemented in caret::train(). See the link for version … Continue reading → [Read more...]

Comparing continuous distributions with R

June 13, 2012 | heuristicandrew

In R we’ll generate similar continuous distributions for two groups and give a brief overview of statistical tests and visualizations to compare the groups. Though the fake data are normally distributed, we use methods for various kinds of continuous distributions. … Continue reading → [Read more...]

Plotting individual growth charts

March 14, 2012 | heuristicandrew

This R code draws individual growth plots as shown in “Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence” by Judith D. Singer and John B. Willett, an excellent book on multilevel modeling and survival analysis. This code recreates figure … Continue reading → [Read more...]

Scales and transformations in ggplot2 0.9.0

March 14, 2012 | heuristicandrew

Some R code designed for ggplot2 0.8.9 is not compatible with ggplot2 0.9.0, and today the ggplot2 web site has outdated documentation which gives this broken example: Dennis Murphy points to the ggplot2 0.9.0 transition guide from where I derived … Continue reading → [Read more...]

doSMP removed from CRAN

February 17, 2012 | heuristicandrew

If you do parallel processing in R on Windows, then you probably have heard of the doSMP package. However, it was recently removed from the CRAN repository with the terse message: Package ‘doSMP’ was removed from the CRAN repository. Revolution … Continue reading → [Read more...]

Using neural network for regression

November 17, 2011 | heuristicandrew

Artificial neural networks are commonly thought to be used just for classification because of the relationship to logistic regression: neural networks typically use a logistic activation function and output values from 0 to 1 like logistic regression. However, the worth … Continue reading → [Read more...]

Confidence interval diagram in R

October 19, 2011 | heuristicandrew

This code shows how to easily plot a beautiful confidence interval diagram in R. First, let’s input the raw data. We’ll be making two confidence intervals for two samples of 10. In case you are curious, the data represents samples from … Continue reading → [Read more...]

Paired sample t-test in R

September 28, 2011 | heuristicandrew

Let’s walk through using R and Student’s t-test to compare paired sample data. The book Statistics: The Exploration & Analysis of Data (6th edition, p505) presents the longitudinal study “Bone mass is recovered from lactation to postweaning in adolescent mothers … Continue reading → [Read more...]

Basic line chart with ggplot2

September 27, 2011 | heuristicandrew

ggplot2 is a package for R which easily draws plots that are easier on the eyes than R’s built-in plotting functions, though the grammar is different than what is commonly used in R. This code demonstrates how to prepare a … Continue reading → [Read more...]

Two browsers for R help documentation

June 29, 2011 | heuristicandrew

The same excellent documentation for R commands is available through two different help browsers: text and HTML. Let’s see how each looks and works, and how to switch the default. Look and feel: Here is how both look for … Continue reading → [Read more...]