AutoComplete Using Google App Engine and YUI in two parts (part 2)

[This article was first published on novyden, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Part 2: Autocomplete with YUI3 and GAE RPC Handlers


This is second and final part of 2-part series on implementing AutoComplete with GAE. Recall that in part 1 we built a foundation for keyword autocomplete lookup service for our document search application. Both the service itself and its HTML/JavaScript client will materialize below.

Let’s for a moment switch to JavaScript side where I intend to use YUI3 AutoComplete. It supports variety of sources to query for available auto-complete choices including XHR (XMLHttpRequest) style and JSONP style URL sources. While working within bounds of the same application XHR URL will suffice the simplicity of both YUI widget and GAE RPC service support will let us do both with almost no extra work (JSONP service allows access from third-party web site pages which should not be taken lightly for security concerns).

The choice of YUI3 widget versus other libraries such as jQuery with its Autocomplete plugin is not important as one can swap plugins with few lines of JavaScript. YUI3 Library offers rich set of built-in options, wide variety of other compatible widgets, utilities, infrastructure and its API resembles jQuery now (I believe Yahoo stole – not copied – according to Picasso).

Great article by Paul Peavyhouse contains building blocks for the RPC handlers in GAE. We begin with RPCHandler class:

class RPCHandler(webapp.RequestHandler):
    """ Allows the functions defined in the RPCMethods class to be RPCed."""

    def __init__(self):
        webapp.RequestHandler.__init__(self)
        self.methods = RPCMethods()

    def get(self):
        func = None

        action = self.request.get('action')
        if action:
            if action[0] == '_':
                self.error(403) # access denied
                return
            else:
                func = getattr(self.methods, action, None)

        if not func:
            self.error(404) # file not found
            return        

        args = ()
        while True:
            key = 'arg%d' % len(args)
            val = self.request.get(key)
            if val:
                args += (val,)
            else:
                break
            
        # Checking if result is cached
        cache_key = action + ';' + ';'.join(args)
        result = memcache.get(cache_key) #@UndefinedVariable
        
        # Query if it's not
        if result is None:
            result = func(*args)
            memcache.add(cache_key, result, 900) #@UndefinedVariable
            
        return_data = self.prepare_result(result)
        self.response.out.write(return_data)
        self.response.headers['Content-Type'] = "application/json"

Actually RPCHandler takes care of roughly 90% of the job:
  • it retrieves action from request and matches it to appropriate RPC method from RPCMethods class via reflection (lines: 4-6 and 9-21)
  • it extracts service parameters from the request (parameter names matching argN) to pass to RPC method (lines: 23-30)
  • it forms a key to cache this call and checks if it’s already available from memcache (lines: 32-43)
  • it calls RPC method and saves results in cache (lines: 36-39)
  • it formats results and sends them back to a client (lines: 41-43)

RPCHandler is an abstract class – concrete handlers extend using template method pattern : single abstract method prepare_result lets us have both XHR and JSONP style handlers:

class JSONPHandler(RPCHandler):
    
    def prepare_result(self, result):
        callback_name = self.request.get('callback')
        json_data = simplejson.dumps(result)
        return_data = callback_name + '(' + json_data + ');'
        return return_data
    
class XHRHandler(RPCHandler):
    
    def prepare_result(self, result):
        json_data = simplejson.dumps(result)
        return json_data

While XHRHandler formats data in JSON, JSONPHandler adds callback function to reply as expected by JSONP client (on top of generated JSON). Django provided simplejson encoder implementation imported from django.utils is part of App Engine environment.

With RPC plumbing done class RPCMethods does actual work: its method for keyword autocomplete action is ac_keywords (later you can offer more services by adding methods in RPCMethods):

class RPCMethods:
    
    def ac_keywords(self, *args):
        prefix = args[0]
        limit = int(args[1])
        
        query = Query(Keyword, keys_only=True)        
        query.filter('words >=', prefix)
        query.filter('words <=', unicode(prefix) + u"\ufffd")
        
        keyword_keys = query.fetch(limit, 0)
        result = map(lambda x: x.name(), keyword_keys) 
        return result
The method ac_keywords executes a search that matches all keywords starting with prefix and returns normalized version of corresponding keyword using retrieved key. In first part we called this approach embedded RIE exactly for this reason: retrieving key as data using search over string list property.
Now that everything is ready on GAE side (well, almost: last piece of code I left for the very end), we can implement take care of the html inside the browser. I start with defining a form containing input field to enter keywords:
<form action="https://r-bloggers.com/phplist/?p=subscribe&id=1" method="post" target="popupwindow">
<p>
<input id="search_field" class="search" type="text" name="search_field" 
               value="Enter keywords..." 
               onfocus="if(!this._haschanged){this.value=''};this._haschanged=true;"/>
        <input name="search" type="image" src="images/search.png" 
               alt="Search" title="Search" />
    </p>
</form>
New empty input box will contain phrase Enter keywords... that disappears as soon as user focuses on the filed:

With auto-complete enabled it will look like this:


Adding YUI3 AutoComplete plugin is just few lines of JavaScript that also include extra customization to control highlighting, filtering, and delimiter for matching words (far from all options available to tune this plugin for one's needs):

...

    //Create a new YUI instance and populate it with the required modules.
    YUI().use('autocomplete', 'autocomplete-highlighters', 'autocomplete-filters', function (Y) {
      // Add the yui3-skin-sam class to the body so the default
      // AutoComplete widget skin will be applied.
      Y.one('body').addClass('yui3-skin-sam');
      
      // AutoComplete on search input field
      Y.one('#search_field').plug(Y.Plugin.AutoComplete, {
        resultHighlighter: 'phraseMatch',
        resultFilters: 'charMatch',
        queryDelimiter: ' ',
        // source: '/rpc.jsonp?action=ac_keywords&arg0={query}&arg1=20&callback={callback}',
        source: '/rpc.xhr?action=ac_keywords&arg0={query}&arg1=20',
        minQueryLength: 1
      }); 
    });

I used queryDelimiter to activate autocomplete for each word user enters: feel free to play and change these and other attributes plugin offers. The line with source that is commented out defines source URL for JSONP style service, while the line with source left active is for XHR URL.

Finally, last piece of server-side Python code that enables both URLs using webapp framework for Python (in main.py):

application = webapp.WSGIApplication(
            [( '/',                 MainHandler),
             ...
             ( '/rpc.jsonp',        JSONPHandler),
             ( '/rpc.xhr',          XHRHandler)
            ])
    
run_wsgi_app(application)
 

I wanted to finish emphasizing how efficient this solution is. YUI3 AutoComplete plugin caches responses from JSONP and XHR URL sources automatically based on the query value for the duration of the pageview. Python RPC services implemented in GAE use memcache automatically and transparently to cache results for each action type and query value. Finally, querying for matching keywords in the datastore uses key only queries which are least expensive. Given that autocomplete feature on a busy site must be quite popular all these will contribute to both performance and savings on GAE.

To leave a comment for the author, please follow the link and comment on their blog: novyden.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)