# Prediction model with HANA and R

January 14, 2012

These days, I have been reading about and playing a lot with R, and I've really come to love it…Of course, I don’t have a clue about those weird statistics formulas, but that doesn’t mean I can’t use R and try to do some awesome stuff with it.

So, yesterday I was thinking about doing another integration between HANA and R, my newly adopted kids, and I came up with the idea of building a prediction model for an airline. I followed these steps.

1.- First, I needed to choose a table, so I picked SNVOICE.

This table gives us the carrier ID, the flight date and the booking ID, so counting the booking IDs tells us how many tickets were sold on a particular day. From there we can do some calculations and determine how many tickets were sold in each month of a particular year.
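To make the idea concrete, here is a minimal sketch in R of the same per-month count on a tiny made-up data frame. The column names follow the ones used in the procedure (CARRID, FLDATE, BOOKID), but the rows are invented for illustration and this is not how the post itself does the aggregation (that happens in HANA).

```r
# Toy stand-in for SNVOICE: one row per booking (all values are made up).
bookings = data.frame(
  CARRID = c("AA", "AA", "AA", "AA", "LH"),
  FLDATE = as.Date(c("2011-01-05", "2011-01-05", "2011-01-20",
                     "2011-02-11", "2011-01-05")),
  BOOKID = c(1, 2, 3, 4, 5)
)

# Keep one carrier and one year, then count bookings per month.
aa_2011 = subset(bookings, CARRID == "AA" & format(FLDATE, "%Y") == "2011")
tickets_by_month = table(format(aa_2011$FLDATE, "%Y%m"))
print(tickets_by_month)
```

Each booking row counts as one ticket, so the monthly total is just the number of rows that fall in that month.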

2.- I needed a table to store my new information, so I created the table TICKETS_BY_YEAR.

3.- I needed a SQLScript procedure to analyse the table, work out the number of tickets sold per day, and then give a grand total per month.

```sql
CREATE PROCEDURE GetTicketsByMonth (IN var_year NVARCHAR(4), IN var_carrid NVARCHAR(2))
LANGUAGE SQLSCRIPT AS
  v_found NVARCHAR(2) := 1;     -- month counter, 1 to 12
  sum_bookid INT;
  v_date NVARCHAR(8) := '';
BEGIN
  -- Tickets sold per flight date for the given year and carrier
  TT_MONTH = select fldate, count(bookid) as "BOOKID"
             from sflight.snvoice
             where year(fldate) = :var_year and carrid = :var_carrid
             group by fldate
             order by fldate asc;

  -- First period of the year, e.g. 20110101
  v_date := (:var_year * 10000) + 101;

  while :v_found <= 12 do
    -- Grand total for the current month
    select sum(bookid) into sum_bookid
    from :TT_MONTH
    where month(fldate) = :v_found;

    insert into TICKETS_BY_YEAR
    values(:v_date, :sum_bookid);

    v_date := :v_date + 100;    -- first day of the next month
    v_found := :v_found + 1;
  end while;
END;
```

4.- Of course…I needed to call my Procedure…

`CALL P075400.GetTicketsByMonth('2011','AA');`

5.- Once finished, I checked my table to see if everything worked as expected…

6.- After confirming that my data was nice and clean, I exported it to a .CSV file (Sorry…no pics this time…I already posted them in a previous blog)

7.- I went to my R Studio and started coding…
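The R code in the next step reads two columns, PERIOD and TICKETS, so the exported file presumably has roughly the shape below. This is only a sketch: the ticket counts are made up and the sample is written to a temporary file rather than to the real Flight_Tickets.csv.

```r
# Hypothetical sample of the exported file; the ticket counts are made up.
csv_path = tempfile(fileext = ".csv")
writeLines(c("PERIOD,TICKETS",
             "20110101,120",
             "20110201,95",
             "20110301,143"), csv_path)

# Read it back the same way the real file is read.
sample_tickets = read.csv(file = csv_path, header = TRUE)
str(sample_tickets)
```

Note that `read.csv` picks up PERIOD as a plain integer such as 20110101, not as a date.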

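```r
# Load the monthly totals exported from HANA
Flight_Tickets=read.csv(file="Flight_Tickets.csv",header=TRUE)
period=Flight_Tickets$PERIOD
tickets=Flight_Tickets$TICKETS

# Build next year's periods by swapping the leading year (e.g. 2011 -> 2012)
var_year=substr(period[1],1,4)
var_year=as.integer(var_year)
var_year=var_year+1
var_year=as.character(var_year)
new_period=gsub("^\\d{4}",var_year,period)

# predict() looks up the predictor by name, so the column must be called
# "period" and must be numeric like the data the model was fitted on
next_year=data.frame(period=as.numeric(new_period))

# Fit a linear model on this year's sales and predict next year's
prt.lm=lm(tickets ~ period)
pred=predict(prt.lm,next_year,interval="none")

# Real sales in red, predicted sales in blue
plot(tickets,type="b",
     col="red",
     main="Annual Tickets Sale",
     xlab="Months",ylab="Tickets")
lines(pred,type="b",col="blue")
legend("bottomleft",inset=.05,title="Real vs. Predicted",
       c("Real","Predicted"),
       lty=c(1,1),col=c("red","blue"))
```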
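Two details here are easy to trip over: how the year gets shifted, and how `predict()` finds the new values. Below is a small self-contained sketch on made-up numbers. The names `demo`, `demo_next`, `demo_lm` and `demo_pred` are hypothetical and only used for this illustration.

```r
# Made-up monthly totals for 2011, in the same shape as the exported CSV.
demo = data.frame(
  PERIOD  = 20110101L + 100L * (0:11),
  TICKETS = c(110, 98, 120, 131, 125, 140, 152, 149, 133, 127, 118, 160)
)

# Swap the leading year for the next one: 20110101 becomes 20120101, and so on.
demo_next = as.integer(gsub("^\\d{4}", "2012", demo$PERIOD))

# Fit on 2011, then predict 2012. The newdata column has to carry the same
# name as the predictor in the formula (PERIOD here); otherwise predict()
# does not use it.
demo_lm = lm(TICKETS ~ PERIOD, data = demo)
demo_pred = predict(demo_lm, newdata = data.frame(PERIOD = demo_next))
print(round(demo_pred))
```

Since the model is a straight line through the period values, the 2012 predictions simply continue the 2011 trend.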

8.- I looked at the generated chart showing real ticket sales vs. predicted ticket sales. The real values are for every month of 2011 and the predicted ones for every month of 2012.

9.- Nothing to do here…it’s done -:)

10.- See you next time with more HANA, R or another nice technology.

Greetings,

Blag.
