Positional bias, the tendency for users to preferentially select results in the first few positions of a search, is a big issue for all kinds of search engines. But for online travel site Orbitz the stakes are higher than for a traditional Web search engine: if a customer chooses the first-listed hotel in a search for accommodations but is then dissatisfied with the stay, Orbitz ends up with an unhappy customer. So for Orbitz, a key problem was to optimize its hotel search results for customer satisfaction.
As Orbitz's Jonathan Seidman (Lead Engineer on the Intelligent Marketplace/Machine Learning Team) and Ramesh Venkataramaiah (Principal Engineer on the Operations and Engineering Team) revealed in presentations at the WindyCityDB and Hadoop World NYC conferences, Orbitz solves this problem by using R to perform statistical analysis on data stored in Hadoop and extracted with Hive.
After extracting data including customer hotel booking records and user ratings of hotels from Hive, the Orbitz team used statistical analysis to identify the best hotel to promote to the top of the list for each new booking. Ramesh reports that the statistical techniques included linear filtering of time series (via the filter function) and moving averages with equal weights. These models even allowed for seasonal trends to be incorporated into the recommendations. One example is the fact that longer hotel stays tend to be booked in the summer months, which shows up as red days in the team's calendar heat map.
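To make the equal-weight moving average concrete, here is a minimal sketch in Python rather than R. It mimics what a centered linear filter with equal weights does to a time series, which is roughly the behavior of R's filter function when given a vector of identical weights. The function name, the window size, and the sample numbers are all made up for illustration; none of them come from the Orbitz presentation.

```python
def moving_average(series, window):
    """Centered moving average with equal weights (a simple linear filter).

    Positions where the window does not fit entirely inside the series
    are returned as None, similar to the missing values a centered
    filter leaves at both ends.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    weights = [1.0 / window] * window  # equal weights that sum to 1
    smoothed = []
    for i in range(len(series)):
        if i < half or i + half >= len(series):
            smoothed.append(None)
        else:
            segment = series[i - half:i + half + 1]
            smoothed.append(sum(w * x for w, x in zip(weights, segment)))
    return smoothed


# Hypothetical daily average length of stay (in nights), for illustration only.
stay_nights = [2.0, 3.0, 4.0, 3.0, 2.0, 4.0, 6.0]
smoothed_nights = moving_average(stay_nights, 3)
print(smoothed_nights)
```

A wider window smooths more aggressively, so a seasonal pattern such as longer summer stays becomes easier to see, at the cost of losing more points at each end of the series.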
This is another great example of applying advanced statistical and visualization techniques in R to large and complex data sets stored in a Hadoop environment. See the full slide deck for other analyses employed by the Orbitz team, including hexagonal binning charts to identify positional bias and kernel density estimation to model hotel ratings. As Ramesh says in the presentation, R has a "steep learning curve, but worth it!"
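For readers unfamiliar with kernel density estimation, the idea can be sketched in a few lines of Python. This is only an illustration of the general technique with a Gaussian kernel and a hand-picked bandwidth. The ratings, the bandwidth, and the function name are assumptions, and the presentation does not say which kernel or bandwidth the Orbitz team used.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples`.

    Each sample contributes a Gaussian bump of width `bandwidth`;
    the estimate at x is the average of those bumps.
    """
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical user ratings for one hotel on a 1-to-5 scale.
ratings = [3.5, 4.0, 4.0, 4.5, 5.0, 2.0, 4.0]
rating_density = gaussian_kde(ratings, bandwidth=0.5)

for r in (2.0, 3.0, 4.0, 5.0):
    print(f"rating {r:.1f}: estimated density {rating_density(r):.3f}")
```

The smooth curve this produces gives a fuller picture of how opinions are spread than a single average rating does, for instance whether a hotel's ratings cluster tightly or split into satisfied and dissatisfied groups.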
Slideshare.net: Using Hadoop and Hive to Optimize Travel Search