(This article was first published on **Advanced Analytics Blog by Scott Mutchler**, and kindly contributed to R-bloggers)

In this first installment, I'm going to focus on:

- Building/evaluating a predictive model with partitioned data
- Saving the predictive model to disk
- Loading the predictive model from disk
- Scoring data against a predictive model (within R)

The example used throughout this three-part series is centered on an eCommerce site. We are going to look at the spend associated with promotions. The mix of promotions (expressed as **percent of total** promotional spend) is the input to the model. The outputs of the model are AOV (average order value), gross margin % and conversion rate. The goal is to maximize AOV, gross margin % and conversion rate with the best mix of promotional spend.

Let's look at the code:

# load the data from a CSV

SRC_PATH <- '/analytics/margin_model/'

data <- read.csv(file=paste(SRC_PATH,'margin_modeling.csv',sep=''), header=TRUE)

# split the data 80% train/20% test

sample_idx <- sample(nrow(data), nrow(data)*0.8)

data_train <- data[sample_idx, ]

data_test <- data[-sample_idx, ]

# create a linear model using the training partition

gm_pct_model <- lm(GROSS_MARGIN_RATE ~ PROMO_AFFILIATE_UNITS + PROMO_COMP_SHOP_ENGINES_UNITS + PROMO_DISPLAY_ADS_UNITS + PROMO_EMAIL_UNITS + PROMO_LOCAL_SEM_UNITS + PROMO_SEARCH_ENG_MKT_UNITS + PROMO_TELESALES_UNITS + PROMO_UNPAID_UNITS,
                   data=data_train)

# save the model to disk

save(gm_pct_model, file=paste(SRC_PATH,'gm_pct_model.model',sep=''))

# load the model back from disk (prior variable name is restored)

load(paste(SRC_PATH,'gm_pct_model.model',sep=''))

# score the test data and plot pred vs. obs

plot(data.frame('Predicted'=predict(gm_pct_model, data_test), 'Observed'=data_test$GROSS_MARGIN_RATE))

# score the test data and append it as a new column (for later use)

new_data <- cbind(data_test,'PREDICTED_GROSS_MARGIN_PCT'=predict(gm_pct_model, data_test))

# score an individual row

predicted_gm_rate <- predict(gm_pct_model, data_test[1,])
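The plot above gives a visual check of predicted vs. observed values. A numeric check on the test partition can sit alongside it. The following is a minimal sketch, not part of the original script: it uses a small synthetic data frame in place of margin_modeling.csv (the values and the reduced set of two predictors are assumptions for illustration), fixes the random seed so the split is repeatable, and computes RMSE and R² on the held-out 20%.

```r
# Synthetic stand-in for margin_modeling.csv (made-up values, not the author's data)
set.seed(42)  # fix the seed so the random 80/20 split is repeatable
n <- 500
data <- data.frame(
  PROMO_EMAIL_UNITS  = runif(n),
  PROMO_UNPAID_UNITS = runif(n)
)
data$GROSS_MARGIN_RATE <- 0.30 + 0.10 * data$PROMO_EMAIL_UNITS -
  0.05 * data$PROMO_UNPAID_UNITS + rnorm(n, sd = 0.02)

# same 80% train / 20% test split as in the script above
sample_idx <- sample(nrow(data), floor(nrow(data) * 0.8))
data_train <- data[sample_idx, ]
data_test  <- data[-sample_idx, ]

# fit on the training partition only
gm_pct_model <- lm(GROSS_MARGIN_RATE ~ PROMO_EMAIL_UNITS + PROMO_UNPAID_UNITS,
                   data = data_train)

# error metrics on the held-out partition
pred <- predict(gm_pct_model, data_test)
obs  <- data_test$GROSS_MARGIN_RATE
rmse    <- sqrt(mean((obs - pred)^2))
r2_test <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

cat(sprintf("test RMSE: %.4f, test R^2: %.3f\n", rmse, r2_test))
```

Computing the metrics on data_test rather than data_train is the point of the partition: the numbers then describe how the model behaves on rows it has not seen.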

It's amazing how little code it takes to automate the modeling and scoring process. Next, I'll show you how to perform non-linear optimization of these predictive models to determine the optimal promotional mix.
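One way to picture the automation described above is a small scoring helper that reads a saved model and scores whatever rows it is given. The sketch below is an assumption-laden illustration rather than the author's method: `score_with_saved_model` is a hypothetical name, the toy model is fit on made-up data, and it swaps in `saveRDS()`/`readRDS()` for `save()`/`load()` and a temporary file for SRC_PATH.

```r
# Hypothetical helper: read a saved model from disk and score a data frame with it
score_with_saved_model <- function(model_path, new_rows) {
  model <- readRDS(model_path)  # returns the object; the caller picks the name
  predict(model, newdata = new_rows)
}

# Toy model on made-up data, standing in for gm_pct_model
set.seed(1)
toy <- data.frame(PROMO_EMAIL_UNITS = runif(50))
toy$GROSS_MARGIN_RATE <- 0.3 + 0.1 * toy$PROMO_EMAIL_UNITS + rnorm(50, sd = 0.01)
toy_model <- lm(GROSS_MARGIN_RATE ~ PROMO_EMAIL_UNITS, data = toy)

# save to a temporary file (the article's script writes under SRC_PATH instead)
model_path <- tempfile(fileext = ".rds")
saveRDS(toy_model, model_path)

# score two new rows using only the file on disk
new_rows <- data.frame(PROMO_EMAIL_UNITS = c(0.2, 0.8))
scores <- score_with_saved_model(model_path, new_rows)
print(scores)
```

The difference from the script above is mostly one of naming: `load()` restores the object under the variable name it was saved with, while `readRDS()` returns the object so it can be assigned to any name, which can be convenient inside a function.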
