Predicting Marketing Campaign with R

[This article was first published on Cloud.. Analytics and CRM, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In my last blog I created a mechanism to fetch data from Salesforce using rJava and SOQL. In this blog I am going to use that mechanism to fetch ad campaign data from salesforce and predict future ad campaign sales using R

Let us assume that Salesforce has campaign data for last eight quarters.  This data is Total Sales generated by Newspaper, TV and Online ad campaigns and associated expenditure as follows:

Sales            Newspaper   TV           Online

1 16850           1000           500           1500

2 12010             500           500             500

3 14740           2000           500             500

4 13890          1000          1000           1000

5 12950          1000            500             500

6 15640            500          1000           1000

7 14960          1000          1000           1000

8 13630            500          1500            500

Thus, quarter# 1 indicates that $1000, $500 and $1500 were spent on Newspaper, TV and Online ad campaigns respectively and total sales during that quarter was $16,850.

First step is find out if there is any relationship with sales and advertising expenditure. The tool I am going to use is Regression Analysis. In order to perform regression analysis, I am going to fetch data using rJava as follows:

library(rJava)  # Load rjava library
.jnit()         # Initialize java

sfObj=.jnew("SalesforceHelper") # Instantiate java object
CampaignVector=sfObj$queryObject("SELECT Sales__c,Newspaper__c,TV__c,Online__c from CampaignData__c") 

Campaigndata<-getSFDataFrame(CampaignVector) # Convert vector to R data frame

(For more information on how to integrate R and Salesforce, please refer my previous blog at:

Let us ask R to perform regression analysis on Campaigndata using lm() function as follows:

Campaignmodel<-lm(Sales~Newspaper+TV+Online) # perform regression

After performing regression analysis, R is going to give me relationship in form of following equation:

Total Sales = (sales with no advertising) + (newspaper contribution per dollar*newspaper expenditure)+(TV contribution per dollar*TV expenditure)+(Online contribution per dollar*Online expenditure)

sales with no advertising is called Intercept while each contribution is called coefficient.

R will also gives information on how meaningful or strong this relationship is, with R^2(R squared).
As you can see that campaign manager will be interested to know per dollar contribution by each adverting medium. In other words, how much sales will be generated for each dollar of expenditure.

Let us find out this information from our model

> Campaignmodel

lm(formula = Sales ~ Newspaper + TV + Online)

(Intercept)    Newspaper           TV       Online  
  9561.4286       1.2465       0.9193       3.5161  

Sales without advertising (Rounded) = $9562
Newspaper returns = $1.25 per $1
TV returns = $0.92 per $1
Online returns““= $3.52 per $1 of expenditure. (Clearly a winner)

But how strong is the model? Let us find out

> summary(Campaignmodel)

lm(formula = Sales ~ Newspaper + TV + Online)

       1        2        3        4        5        6        7        8 
  308.32  -392.36   467.95 -1353.29   -75.59  1019.94  -283.29   308.32 

                    Estimate        Std.   Error     t value           Pr(>|t|)   
(Intercept)        9561.4286         1700.5869        5.622             0.00492 **
Newspaper             1.2465            0.8100        1.539             0.19865   
TV                    0.9193            1.0766        0.854             0.44126   
Online                3.5161            0.9584        3.669             0.02141 * 
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 938.2 on 4 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.7879,     Adjusted R-squared: 0.6289 
F-statistic: 4.954 on 3 and 4 DF,  p-value: 0.0781 

Look at the t-values for each advertising medium. t value more than 2 is strong. Again Online advertising has strongest relationship with sales while TV has the weakest. Also Multiple R-squared of 0.7879 indicates that there is 79% probability with respect to predictability of the model.

Obviously I am not expecting real world campaign or account managers to analyze R output. So, I created one related custome object called Campaign_predictor__c with custom fields as follows:

Sales_without_ad__c NUMERIC initialized to 9562.00,
Newspaper_expenditure__c NUMERIC,
TV_expenditure__c NUMERIC,
Online_expenditure__c NUMERIC
Predicted_sales__c FORMULA = (Sales_without_ad__c)+(3.52*Online_expenditure__c)+(1.25*Newspaper_expenditure__c)+(0.92*TV_expenditure__c)
Prediction_probability__c = 78

Now the managers have to just plug in the values and predict the sales. Suppose the manager has $3000 to spend on ad campaigns and based on model decides to allocate $2000 to Online $500 to Newspaper and $500 to TV. The predicted sales with 78% probability is:

$9562 + (3.53*2000)+(1.25*500)+(0.92*500) = $17,707

Thus we can move complex predictive analytics from the realm of super specialists and statisticians to marketing and sales managers using R and

To leave a comment for the author, please follow the link and comment on their blog: Cloud.. Analytics and CRM. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)