Product revenue prediction with R – part 3

October 8, 2012

(This article was first published on Tatvic Blog » R, and kindly contributed to R-bloggers)

After developing and improving the predictive model with R (as in the previous blog post), I focus here on making a prediction with the R model (a linear regression model) and comparing it with the Google Prediction API model. In statistical modeling, R calculates an intercept and variable coefficients that describe the relationship between a response variable and the explanatory variables. The model then uses this relationship to make predictions.

The statistical relationship can be described by the following formula (derived from the fitted model):

Productrevenue = 39.92 - 0.0078 * (xcartaddtotalrs_out) - 34.10 * (xcartremove_out) + 12.480 * (xprodviews_out) - 13.50 * (xuniqprodview_out) + 0.00037 * (xprodviewinrs_out)
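To see what this equation does, here is a minimal sketch that evaluates it by hand in R. The coefficients are the rounded values printed above, so the result may differ slightly from what the fitted model returns. The names predict_by_hand and example_row are hypothetical and introduced only for illustration; the input values are the ones used in the example later in this post.

```r
# Coefficients copied from the equation above (rounded as published).
coefs <- c(
  intercept           = 39.92,
  xcartaddtotalrs_out = -0.0078,
  xcartremove_out     = -34.10,
  xprodviews_out      = 12.480,
  xuniqprodview_out   = -13.50,
  xprodviewinrs_out   = 0.00037
)

# Hypothetical helper: evaluate the linear equation for one set of inputs.
predict_by_hand <- function(row) {
  predictors <- setdiff(names(coefs), "intercept")
  unname(coefs["intercept"] + sum(coefs[predictors] * unlist(row[predictors])))
}

# Example inputs (the same values as the input dataset used in this post).
example_row <- list(xcartaddtotalrs_out = 0, xcartremove_out = 0,
                    xprodviews_out = 47, xuniqprodview_out = 38,
                    xprodviewinrs_out = 5828)

revenue <- predict_by_hand(example_row)
print(round(revenue, 2))
```

With these inputs the equation gives roughly 115.64, which is what we should expect predict() to return, up to the rounding of the coefficients.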

In R, the predict() function makes a prediction from this relationship. To make a prediction we need input data, which we will store in input_data. Suppose we have the following input values:

  • xcartaddtotalrs_out = 0
  • xcartremove_out = 0
  • xprodviews_out = 47
  • xuniqprodview_out = 38
  • xprodviewinrs_out = 5828
> input_data <- data.frame(xcartaddtotalrs_out=0, xcartremove_out=0, xprodviews_out=47,  xuniqprodview_out=38, xprodviewinrs_out=5828)

This means we want to predict the transactional product revenue on the basis of xcartaddtotalrs_out, xcartremove_out, xprodviews_out, xuniqprodview_out and xprodviewinrs_out. Now we make the prediction with the predict() function and the model_out prediction model (which we developed in Product revenue prediction with R – part 2).

> predict(model_out,input_data,type="response")
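Since model_out comes from part 2, the following is only a self-contained sketch of what predict() does for a linear model. It fits a toy model on synthetic data (toy_model, train and new_data are hypothetical names, and the numbers are made up, not taken from the real dataset) and checks that predict() returns the same value as applying the intercept and coefficients by hand.

```r
set.seed(42)  # reproducible synthetic data

# Synthetic stand-in for the training data; the real dataset is not used here.
n <- 200
train <- data.frame(
  xprodviews_out    = runif(n, 0, 100),
  xuniqprodview_out = runif(n, 0, 80)
)
train$productrevenue <- 40 + 12 * train$xprodviews_out -
  13 * train$xuniqprodview_out + rnorm(n, sd = 5)

# Fit a linear regression (toy_model is a hypothetical name, not model_out).
toy_model <- lm(productrevenue ~ xprodviews_out + xuniqprodview_out,
                data = train)

new_data <- data.frame(xprodviews_out = 47, xuniqprodview_out = 38)

# predict() applies the fitted intercept and coefficients to new_data.
pred <- predict(toy_model, new_data, type = "response")

# The same value computed by hand from the coefficients.
b <- coef(toy_model)
by_hand <- b[["(Intercept)"]] +
  b[["xprodviews_out"]] * new_data$xprodviews_out +
  b[["xuniqprodview_out"]] * new_data$xuniqprodview_out

print(all.equal(unname(pred), by_hand))
```

For a linear regression fitted with lm(), type="response" is the default, so the argument can also be left out.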

We can do the same prediction on the Google Prediction API with less effort. When we processed the same dataset with the Google Prediction API for predictive modeling, its model summary looked like this:

[Image: model summary returned by the Google Prediction API]

Let’s identify the attributes in this prediction result. The id uniquely identifies the model. In the model information, numberInstances gives the total number of rows in the dataset (4061), modelType gives the type of model (either regression or categorical), which here is regression, and meanSquaredError is 1606123.17. The square root of the mean squared error, 1267.33, is the cost of this regression model in the Google Prediction API.
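The cost quoted above is simply the square root of the reported meanSquaredError. As a quick check in R (the variable names here are only illustrative):

```r
# meanSquaredError as reported in the Google Prediction API model summary.
mean_squared_error <- 1606123.17

# Root mean squared error: the cost figure quoted in the text.
rmse <- sqrt(mean_squared_error)
print(round(rmse, 2))
```

This prints 1267.33, in the same units as the product revenue.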

In R, our predictive model has a cost of 656.4, which is lower than that of the Google Prediction API model. The reason for the lower cost is variable selection and the removal of outliers from our dataset. In R, we can improve prediction accuracy by improving the model as well as by improving the dataset quality to suit the model type. In the Google Prediction API, however, we can improve prediction accuracy only through dataset quality; we cannot update the model.
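For readers who want to compute a comparable cost for their own R model, here is a minimal sketch on synthetic data. It is an assumption that the cost is a root mean squared error of the residuals; this post does not say whether the 656.4 figure is that or the residual standard error from summary(), so both are shown. The data and the names d and fit are made up for illustration.

```r
set.seed(1)  # reproducible synthetic data

# Synthetic data standing in for the real dataset (an assumption).
n <- 100
d <- data.frame(x = runif(n, 0, 50))
d$y <- 10 + 3 * d$x + rnorm(n, sd = 4)

fit <- lm(y ~ x, data = d)

# Root mean squared error of the fitted values on the training data.
rmse <- sqrt(mean(residuals(fit)^2))

# Residual standard error reported by summary(); it divides by the residual
# degrees of freedom (n - 2 here) instead of n, so it is slightly larger.
rse <- summary(fit)$sigma

print(c(rmse = rmse, rse = rse))
```

Either number is in the units of the response variable, so it can be compared directly with the root mean squared error from the Google Prediction API as long as both are computed on the same data.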

Don’t think this is too complex; it’s pretty interesting once you are used to developing it. To start learning predictive modeling, begin with a rough implementation and improve it step by step as your requirements dictate. If you want to try it yourself, you can download the R code and sample dataset. In my next post, Product revenue prediction with Prediction API, I will discuss generating predictions with the Google Prediction API in more detail.


Vignesh Prajapati

Vignesh is a Data Engineer at Tatvic. He loves to play in the open-source playground, building predictive solutions on big data with R, Hadoop and the Google Prediction API.