# Part 1 of 3: Building/Loading/Scoring Against Predictive Models in R

[This article was first published on **Advanced Analytics Blog by Scott Mutchler**, and kindly contributed to R-bloggers].

In this first installment, I’m going to focus on:

- Building/evaluating a predictive model with partitioned data
- Saving the predictive model to disk
- Loading the predictive model from disk
- Scoring data against a predictive model (within R)

The example used throughout this three-part series is centered around an eCommerce site. We are going to look at the spend associated with promotions. The mix of promotions (expressed as **percent of total** promotional spend) is the input to the model. The outputs of the model are AOV (average order value), gross margin % and conversion rate. The goal is to maximize AOV, gross margin % and conversion rate with the best mix of promotional spend.

Let’s look at the code:

# load the data from a CSV

SRC_PATH <- '/analytics/margin_model/'

data <- read.csv(file=paste(SRC_PATH,'margin_modeling.csv',sep=''), header=TRUE)

# split the data 80% train/20% test

sample_idx <- sample(nrow(data), nrow(data)*0.8)

data_train <- data[sample_idx, ]

data_test <- data[-sample_idx, ]
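Because `sample()` draws different rows on every run, the split above changes each time the script is executed. A minimal sketch of a reproducible split follows, assuming a toy data frame in place of the article's CSV. The names `toy`, `n_train` and `idx` and the seed value are illustrative only and are not part of the original script.

```r
# Hypothetical stand-in for the article's data; columns are illustrative only
set.seed(42)                               # fix the RNG so the same rows are drawn each run
toy <- data.frame(x = rnorm(100), y = rnorm(100))

n_train <- floor(nrow(toy) * 0.8)          # whole number of training rows (80%)
idx <- sample(nrow(toy), n_train)          # row indices drawn without replacement

toy_train <- toy[idx, ]                    # 80% training partition
toy_test  <- toy[-idx, ]                   # remaining 20% held out for testing

# the two partitions together should account for every row
stopifnot(nrow(toy_train) + nrow(toy_test) == nrow(toy))
```

Calling `set.seed()` before `sample()` is what makes the partition repeatable; any fixed integer would do.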

# create a linear model using the training partition

gm_pct_model <- lm(GROSS_MARGIN_RATE ~ PROMO_AFFILIATE_UNITS + PROMO_COMP_SHOP_ENGINES_UNITS + PROMO_DISPLAY_ADS_UNITS + PROMO_EMAIL_UNITS + PROMO_LOCAL_SEM_UNITS + PROMO_SEARCH_ENG_MKT_UNITS + PROMO_TELESALES_UNITS + PROMO_UNPAID_UNITS, data = data_train)

# save the model to disk

save(gm_pct_model, file=paste(SRC_PATH, 'gm_pct_model.model', sep=''))

# load the model back from disk (prior variable name is restored)

load(paste(SRC_PATH, 'gm_pct_model.model', sep=''))
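As noted in the comment, `load()` restores the object under the name it had when it was saved. If you would rather choose the variable name at load time, base R's `saveRDS()`/`readRDS()` pair is one alternative. The sketch below is not the article's method: it uses a toy model fitted on the built-in `mtcars` data and a temporary file, and the names `toy_model`, `model_path` and `restored_model` are hypothetical.

```r
# Toy model on a built-in dataset, standing in for gm_pct_model
toy_model <- lm(mpg ~ wt + hp, data = mtcars)

model_path <- tempfile(fileext = ".rds")   # temporary file; illustrative location only
saveRDS(toy_model, file = model_path)      # serialize a single R object to disk

# readRDS() returns the object, so the caller decides the variable name
restored_model <- readRDS(model_path)

# the restored model should give the same predictions as the original
same_predictions <- all.equal(predict(toy_model, mtcars),
                              predict(restored_model, mtcars))
```

This avoids silently overwriting an existing variable of the same name in the workspace, which `load()` will do.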

# score the test data and plot pred vs. obs

plot(data.frame('Predicted'=predict(gm_pct_model, data_test), 'Observed'=data_test$GROSS_MARGIN_RATE))
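A predicted-vs-observed plot gives a visual check; a numeric error summary on the held-out rows can complement it. The following is a rough sketch, again on the built-in `mtcars` data rather than the article's dataset. The metrics chosen here (RMSE and MAE) and all variable names are my own assumptions, not something the original script computes.

```r
# Toy split and model on built-in data; stands in for data_train / data_test
set.seed(1)
idx <- sample(nrow(mtcars), floor(nrow(mtcars) * 0.8))
toy_train <- mtcars[idx, ]
toy_test  <- mtcars[-idx, ]
toy_model <- lm(mpg ~ wt + hp, data = toy_train)

pred <- predict(toy_model, toy_test)       # predictions for the held-out rows
obs  <- toy_test$mpg                       # observed values of the response

rmse <- sqrt(mean((obs - pred)^2))         # root mean squared error
mae  <- mean(abs(obs - pred))              # mean absolute error

cat("RMSE:", rmse, " MAE:", mae, "\n")
```

Both numbers are in the units of the response, so for the gross margin model they would read directly as error in the margin rate.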

# score the test data and append it as a new column (for later use)

new_data <- cbind(data_test,'PREDICTED_GROSS_MARGIN_PCT'=predict(gm_pct_model, data_test))

# score an individual row

predicted_gm_rate <- predict(gm_pct_model, data_test[1,])
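Scoring a single row does not require pulling it from the test set: `predict()` accepts any data frame whose column names match the predictors in the model formula. Here is a small sketch with a hand-built row, using a toy `mtcars` model in place of `gm_pct_model`. The input values and the names `new_row`, `single_pred` and `single_pi` are hypothetical.

```r
# Toy model on built-in data, standing in for gm_pct_model
toy_model <- lm(mpg ~ wt + hp, data = mtcars)

# a one-row data frame; column names must match the model's predictors
new_row <- data.frame(wt = 3.0, hp = 120)

# point prediction for the new row
single_pred <- predict(toy_model, newdata = new_row)

# the same prediction with a 95% prediction interval (columns fit, lwr, upr)
single_pi <- predict(toy_model, newdata = new_row,
                     interval = "prediction", level = 0.95)
```

For the margin model, the hand-built row would need one column per `PROMO_*_UNITS` predictor used in the formula.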

It’s amazing how little code it takes to automate the modeling and scoring process. Next, I’ll show you how to perform non-linear optimization of these predictive models to determine the optimal promotional mix.
