In my last blog I created a mechanism to fetch data from Salesforce using rJava and SOQL. In this blog I am going to use that mechanism to fetch ad campaign data from salesforce and predict future ad campaign sales using R
Let us assume that Salesforce has campaign data for last eight quarters. This data is Total Sales generated by Newspaper, TV and Online ad campaigns and associated expenditure as follows:
Sales Newspaper TV Online
1 16850 1000 500 1500
2 12010 500 500 500
3 14740 2000 500 500
4 13890 1000 1000 1000
5 12950 1000 500 500
6 15640 500 1000 1000
7 14960 1000 1000 1000
8 13630 500 1500 500
Thus, quarter# 1 indicates that $1000, $500 and $1500 were spent on Newspaper, TV and Online ad campaigns respectively and total sales during that quarter was $16,850.
First step is find out if there is any relationship with sales and advertising expenditure. The tool I am going to use is Regression Analysis. In order to perform regression analysis, I am going to fetch data using rJava as follows:
library(rJava) # Load rjava library .jnit() # Initialize java sfObj=.jnew("SalesforceHelper") # Instantiate java object CampaignVector=sfObj$queryObject("SELECT Sales__c,Newspaper__c,TV__c,Online__c from CampaignData__c") Campaigndata<-getSFDataFrame(CampaignVector) # Convert vector to R data frame
(For more information on how to integrate R and Salesforce, please refer my previous blog at: http://www.r-bloggers.com/r-and-salesforce/
Let us ask R to perform regression analysis on Campaigndata using lm() function as follows:
attach(Campaigndata) Campaignmodel<-lm(Sales~Newspaper+TV+Online) # perform regression
After performing regression analysis, R is going to give me relationship in form of following equation:
Total Sales = (sales with no advertising) + (newspaper contribution per dollar*newspaper expenditure)+(TV contribution per dollar*TV expenditure)+(Online contribution per dollar*Online expenditure)
sales with no advertising is called Intercept while each contribution is called coefficient.
R will also gives information on how meaningful or strong this relationship is, with R^2(R squared).
As you can see that campaign manager will be interested to know per dollar contribution by each adverting medium. In other words, how much sales will be generated for each dollar of expenditure.
Let us find out this information from our model
> Campaignmodel Call: lm(formula = Sales ~ Newspaper + TV + Online) Coefficients: (Intercept) Newspaper TV Online 9561.4286 1.2465 0.9193 3.5161 >
Sales without advertising (Rounded) = $9562
Newspaper returns = $1.25 per $1
TV returns = $0.92 per $1
Online returns““= $3.52 per $1 of expenditure. (Clearly a winner)
But how strong is the model? Let us find out
> summary(Campaignmodel) Call: lm(formula = Sales ~ Newspaper + TV + Online) Residuals: 1 2 3 4 5 6 7 8 308.32 -392.36 467.95 -1353.29 -75.59 1019.94 -283.29 308.32 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 9561.4286 1700.5869 5.622 0.00492 ** Newspaper 1.2465 0.8100 1.539 0.19865 TV 0.9193 1.0766 0.854 0.44126 Online 3.5161 0.9584 3.669 0.02141 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 938.2 on 4 degrees of freedom (1 observation deleted due to missingness) Multiple R-squared: 0.7879, Adjusted R-squared: 0.6289 F-statistic: 4.954 on 3 and 4 DF, p-value: 0.0781
Look at the t-values for each advertising medium. t value more than 2 is strong. Again Online advertising has strongest relationship with sales while TV has the weakest. Also Multiple R-squared of 0.7879 indicates that there is 79% probability with respect to predictability of the model.
Obviously I am not expecting real world campaign or account managers to analyze R output. So, I created one related custome object called Campaign_predictor__c with custom fields as follows:
Sales_without_ad__c NUMERIC initialized to 9562.00,
Predicted_sales__c FORMULA = (Sales_without_ad__c)+(3.52*Online_expenditure__c)+(1.25*Newspaper_expenditure__c)+(0.92*TV_expenditure__c)
Prediction_probability__c = 78
Now the managers have to just plug in the values and predict the sales. Suppose the manager has $3000 to spend on ad campaigns and based on model decides to allocate $2000 to Online $500 to Newspaper and $500 to TV. The predicted sales with 78% probability is:
$9562 + (3.53*2000)+(1.25*500)+(0.92*500) = $17,707
Thus we can move complex predictive analytics from the realm of super specialists and statisticians to marketing and sales managers using R and Salesforce.com