# Product revenue prediction with R – part 3

October 8, 2012

(This article was first published on Tatvic Blog » R, and kindly contributed to R-bloggers)

After developing and improving a predictive model with R (as described in the previous post), I focus here on making predictions with the R model (a linear regression model) and comparing it with the Google Prediction API model. In statistical modeling, R calculates an intercept and variable coefficients that describe the relationship between the response variable and the explanatory variables. The model then uses this relationship to make predictions.
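As a minimal sketch of what calculating the intercept and coefficients looks like, the snippet below fits `lm()` to simulated data and reads the estimates back with `coef()`. The data frame `sim_data` and its columns (`views`, `removals`, `revenue`) are hypothetical and are not the dataset used in this series.

```r
# Simulated data (hypothetical, not the article's dataset).
set.seed(42)
n <- 200
sim_data <- data.frame(
  views    = runif(n, 0, 100),  # explanatory variable 1
  removals = rpois(n, 2)        # explanatory variable 2
)

# Assumed true relation: intercept 40, +12 per view, -30 per removal, plus noise.
sim_data$revenue <- 40 + 12 * sim_data$views - 30 * sim_data$removals +
  rnorm(n, sd = 5)

# Fit the linear regression and extract the intercept and coefficients.
fit <- lm(revenue ~ views + removals, data = sim_data)
estimates <- coef(fit)
print(estimates)
```

The estimates should land close to the values used to simulate the data, which is the sense in which the fitted model describes the relationship.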

This statistical relationship can be described by the following formula, which is derived from the model:

Productrevenue = 39.92 – 0.0078 * (xcartaddtotalrs_out) – 34.10 * (xcartremove_out) + 12.480 * (xprodviews_out) – 13.50 * (xuniqprodview_out) + 0.00037 * (xprodviewinrs_out)

R provides the predict() function for making predictions with this relationship. To make a prediction we need input data, which we will store in input_data. Suppose we have an input dataset like this:

• xcartaddtotalrs_out = 0
• xcartremove_out = 0
• xprodviews_out = 47
• xuniqprodview_out = 38
• xprodviewinrs_out = 5828

> input_data <- data.frame(xcartaddtotalrs_out=0, xcartremove_out=0, xprodviews_out=47, xuniqprodview_out=38, xprodviewinrs_out=5828)

This means we want to predict the transactional product revenue on the basis of xcartaddtotalrs_out, xcartremove_out, xprodviews_out, xuniqprodview_out and xprodviewinrs_out. We now make the prediction with the predict() function and the model_out prediction model (which we developed in Product revenue prediction with R – part 2).

> predict(model_out,input_data,type="response")
output
115.8346013
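For a linear model, predict() amounts to plugging the inputs into the fitted formula. As a quick hand check, the sketch below applies the rounded coefficients from the formula above to the same input values. The result is about 115.64, close to the 115.83 returned by predict(). The small gap presumably comes from the coefficients being rounded in the printed formula; that is an assumption, since the unrounded coefficients are not shown here.

```r
# Rounded coefficients copied from the formula in this article.
coefs <- c(
  intercept           = 39.92,
  xcartaddtotalrs_out = -0.0078,
  xcartremove_out     = -34.10,
  xprodviews_out      = 12.480,
  xuniqprodview_out   = -13.50,
  xprodviewinrs_out   = 0.00037
)

# Same values as input_data, with a leading 1 that multiplies the intercept.
x <- c(1, 0, 0, 47, 38, 5828)

# Linear prediction: intercept + sum of (coefficient * input value).
manual_prediction <- sum(coefs * x)
print(manual_prediction)  # about 115.64
```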

We can make the same prediction with the Google Prediction API with less effort. When we processed the same dataset with the Google Prediction API for predictive modeling, it returned a model summary with the attributes described below.

Let’s identify the attributes in this result. The id is the unique identifier of the model. The model information contains numberInstances, the total number of rows in the dataset (4061); modelType, the type of model (regression or classification), which here is regression; and meanSquaredError, which is 1606123.17. The root of the mean squared error, 1267.33, is the cost of this regression model in the Google Prediction API.
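The cost quoted here is simply the square root of the reported meanSquaredError, which can be checked in R:

```r
# meanSquaredError as reported in the Google Prediction API model summary.
mse <- 1606123.17

# Root mean squared error, in the same units as the response variable.
rmse <- sqrt(mse)
print(round(rmse, 2))
```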

In R, our predictive model has a cost of 656.4, which is lower than that of the Google Prediction API model. The reason for the lower cost is variable selection and the removal of outliers from our dataset. In R, we can improve prediction accuracy by improving the model as well as by improving the quality of the dataset to suit the model type. With the Google Prediction API, we can improve prediction accuracy only through dataset quality; we can’t modify the model itself.
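This post does not show how the R-side cost was computed. One common way to get a comparable figure, assuming the cost is the root mean squared error on the training data, is to take it from the model's residuals. The sketch below uses R's built-in mtcars data purely as an illustration, not the dataset from this series.

```r
# Illustration on R's built-in mtcars data, not the article's dataset.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Root mean squared error of the fitted model on its training data.
rmse <- sqrt(mean(residuals(fit)^2))
print(rmse)
```

Note that the "Residual standard error" printed by summary() divides by the residual degrees of freedom rather than the number of rows, so it is slightly larger than this value.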

Don’t think of this as overly complex; it is quite interesting once you get used to it. To start learning predictive modeling, begin with a rough implementation and improve it step by step as your requirements dictate. If you want to try it yourself, you can download this R code + sample dataset. In my next post, Product revenue prediction with Prediction API, I will discuss generating predictions with the Google Prediction API in more detail.

### Vignesh Prajapati

Vignesh is a Data Engineer at Tatvic. He loves to play in the open-source playground, building predictive solutions on big data with R, Hadoop and Google Prediction API.
