Product revenue prediction with R – part 3

October 8, 2012

(This article was first published on Tatvic Blog » R, and kindly contributed to R-bloggers)

After developing and improving a predictive model with R (as described in the previous post), I focus here on making predictions with the R model (a linear regression model) and comparing it with the Google Prediction API model. In statistical modeling, R calculates an intercept and variable coefficients that describe the relationship between a response variable and the explanatory variables. The model then uses this relationship to make predictions.
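To show what "calculating the intercept and coefficients" looks like in practice, here is a minimal sketch on simulated data. The variables `x1` and `x2` and their true effects are made up for illustration and are not taken from the Tatvic dataset. `lm()` estimates one intercept plus one coefficient per explanatory variable, and the estimates land close to the values used to generate the data.

```r
# Simulated data only: two hypothetical predictors with known true effects
set.seed(42)
n  <- 200
x1 <- runif(n, 0, 100)
x2 <- runif(n, 0, 50)
y  <- 40 + 12 * x1 - 13 * x2 + rnorm(n, sd = 5)
sim <- data.frame(y, x1, x2)

# lm() estimates the intercept and one coefficient per explanatory variable
fit <- lm(y ~ x1 + x2, data = sim)
coef(fit)  # roughly 40, 12 and -13
```

Calling `coef(model_out)` on the model from part 2 should likewise return the (unrounded) coefficients that appear in the formula below.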

The fitted relationship can be written as the following formula (derived from the model built in part 2):

Productrevenue = 39.92 - 0.0078 * (xcartaddtotalrs_out) - 34.10 * (xcartremove_out) + 12.480 * (xprodviews_out) - 13.50 * (xuniqprodview_out) + 0.00037 * (xprodviewinrs_out)

In R, the predict() function makes predictions with this relationship, but it needs input data. We will store the request data in input_data. Suppose we have the following input dataset:

  • xcartaddtotalrs_out = 0
  • xcartremove_out = 0
  • xprodviews_out = 47
  • xuniqprodview_out = 38
  • xprodviewinrs_out = 5828
> input_data <- data.frame(xcartaddtotalrs_out=0, xcartremove_out=0, xprodviews_out=47,  xuniqprodview_out=38, xprodviewinrs_out=5828)
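Before calling predict(), it can help to evaluate the formula by hand for this input. This is a minimal sketch: the coefficient vector is copied from the rounded formula above, so the result only approximates what predict() returns for model_out. Written out, it is 39.92 + 12.480 × 47 - 13.50 × 38 + 0.00037 × 5828 (the two zero-valued terms drop out).

```r
# Coefficients copied from the printed (rounded) formula, so the result is approximate
coefs <- c(
  "(Intercept)"       = 39.92,
  xcartaddtotalrs_out = -0.0078,
  xcartremove_out     = -34.10,
  xprodviews_out      = 12.480,
  xuniqprodview_out   = -13.50,
  xprodviewinrs_out   = 0.00037
)

input_data <- data.frame(xcartaddtotalrs_out = 0, xcartremove_out = 0,
                         xprodviews_out = 47, xuniqprodview_out = 38,
                         xprodviewinrs_out = 5828)

# Intercept plus the sum of (coefficient * value) over the explanatory variables,
# taking the input columns in the same order as the coefficient names
x <- unlist(input_data[1, names(coefs)[-1]])
manual_prediction <- unname(coefs[1] + sum(coefs[-1] * x))
manual_prediction  # [1] 115.6364
```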

This means we want to predict the transactional product revenue on the basis of xcartaddtotalrs, xcartremove, xprodviews, xuniqprodview and xprodviewinrs. Now we make a prediction with the predict() function and the model_out prediction model (which we developed in Product revenue prediction with R – part 2).

> predict(model_out,input_data,type="response")
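Since model_out and its training data are not reproduced here, the following sketch uses a small model fitted on simulated data (the column names `views`, `uniq_views` and `revenue` are hypothetical). It shows the two things predict() relies on: the new data frame must have columns named like the model's predictors, and the prediction equals the intercept plus each fitted coefficient times the matching input value.

```r
# Simulated training data with hypothetical column names
set.seed(1)
train <- data.frame(views = rpois(100, 40), uniq_views = rpois(100, 30))
train$revenue <- 50 + 10 * train$views - 8 * train$uniq_views + rnorm(100, sd = 20)

toy_model <- lm(revenue ~ views + uniq_views, data = train)

# newdata must contain columns with the same names as the model's predictors
new_obs <- data.frame(views = 47, uniq_views = 38)
pred <- predict(toy_model, newdata = new_obs, type = "response")

# The same value, computed by applying the fitted coefficients by hand
b <- coef(toy_model)
manual <- b[["(Intercept)"]] + b[["views"]] * 47 + b[["uniq_views"]] * 38
pred
```

For an `lm` fit, `type = "response"` is already the default, so it can be omitted; it matters more for `glm` models, where the default is the link scale.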

We can do the same prediction with the Google Prediction API with less effort. When we processed the same dataset with the Google Prediction API for predictive modeling, its model summary contained the attributes described below.

Let's identify these attributes in the model summary. The id is a unique identifier for the model. In the model information, numberInstances gives the total number of rows in the dataset (4061), modelType gives the type of model (either regression or categorical; here it is regression), and meanSquaredError is 1606123.17. The square root of the mean squared error (the root mean squared error, or RMSE) serves as the cost of the Google Prediction API model, which is 1267.33.
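The cost quoted above is just the square root of the reported meanSquaredError. A one-line check in R, using the value from the model summary:

```r
# meanSquaredError as reported in the Google Prediction API model summary
mse_google  <- 1606123.17
rmse_google <- sqrt(mse_google)
round(rmse_google, 2)  # [1] 1267.33
```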

In R, our predictive model has a cost of 656.4, which is lower than that of the Google Prediction API model. The lower cost comes from variable selection and the removal of outliers from our dataset. In R, we can improve prediction accuracy by improving the model as well as by improving the quality of the dataset for the chosen model type. With the Google Prediction API, however, we can improve prediction accuracy only through dataset quality; we cannot change the model.
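To compare an R model against the API's figure on the same scale, one option is to compute the RMSE of the fitted model's residuals. The sketch below assumes that the R cost is also an RMSE (not confirmed here), uses simulated data with a few injected outliers, and filters them with the common 1.5 × IQR rule. That rule is an illustrative assumption, not necessarily the outlier method used in part 2. The `rmse()` helper is hypothetical.

```r
# Hypothetical helper: root mean squared error of a fitted model's residuals
rmse <- function(model) sqrt(mean(residuals(model)^2))

# Simulated data with five injected outliers (illustrative only)
set.seed(7)
d <- data.frame(x = runif(300, 0, 100))
d$y <- 20 + 5 * d$x + rnorm(300, sd = 10)
d$y[1:5] <- d$y[1:5] + 2000

full_fit <- lm(y ~ x, data = d)

# Assumed outlier rule: keep rows whose response lies within 1.5 * IQR of the quartiles
q    <- quantile(d$y, c(0.25, 0.75))
iqr  <- unname(diff(q))
keep <- d$y >= q[[1]] - 1.5 * iqr & d$y <= q[[2]] + 1.5 * iqr
clean_fit <- lm(y ~ x, data = d[keep, ])

# The cleaned model's RMSE is far smaller than the model fitted with outliers
c(full = rmse(full_fit), cleaned = rmse(clean_fit))
```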

Don't think this is too complex; it's quite interesting once you get used to developing it. To start learning predictive modeling, begin with a rough implementation and improve it step by step as your requirements dictate. If you want to try it yourself, you can download this R code + sample dataset. In my next post, Product revenue prediction with Prediction API, I will discuss generating predictions with the Google Prediction API in more detail.

Want us to help you implement this or analyze the data for your visitors? Contact us.

Vignesh Prajapati

Vignesh is a Data Engineer at Tatvic. He loves working with open-source tools to build predictive solutions on big data with R, Hadoop and the Google Prediction API.
Google Plus profile: Vignesh Prajapati
