didYouMean() Function: Using Google to correct errors in Strings

May 22, 2014

(This article was first published on sweissblaug, and kindly contributed to R-bloggers)

The didYouMean() function takes a string as input and returns the suggestion that google.com displays under "Did you mean..." or "Showing results for...". It is handy for correcting misspelled names or locations.


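library(RCurl)

didYouMean=function(input){
  # Build the search URL: spaces become "+", and the trailing "/" pairs
  # with the "/&" search used below to find the end of the query
  input=gsub(" ", "+", input)
  doc=getURL(paste("https://www.google.com/search?q=",input,"/", sep=""))

  # Positions of the two phrases looked for in the returned page
  dym=gregexpr(pattern ='Did you mean',doc)
  srf=gregexpr(pattern ='Showing results for',doc)

  if(length(dym[[1]])>1){
    # "Did you mean" occurs more than once: read the suggested query from
    # the 1000 characters that follow its first occurrence
    doc2=substring(doc,dym[[1]][1],dym[[1]][1]+1000)
    s1=gregexpr("?q=",doc2)
    s2=gregexpr("/&",doc2)
    new.text=substring(doc2,s1[[1]][1]+2,s2[[1]][1]-1)
    return(gsub("[+]"," ",new.text))
  }
  else if(srf[[1]][1]!=-1){
    # "Showing results for" was found: same extraction, starting there
    doc2=substring(doc,srf[[1]][1],srf[[1]][1]+1000)
    s1=gregexpr("?q=",doc2)
    s2=gregexpr("/&",doc2)
    new.text=substring(doc2,s1[[1]][1]+2,s2[[1]][1]-1)
    return(gsub("[+]"," ",new.text))
  }
  else {
    # No suggestion found: return the input with its spaces restored
    return(gsub("[+]"," ",input))
  }
}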

So didYouMean("gorecge washington") returns "george washington".
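
The parsing step can be tried without hitting Google at all. Below is a minimal sketch, assuming the results page contains a link shaped like `/search?q=corrected+query/&...` after the marker phrase. The helper name `extractSuggestion` and the sample HTML are made up for illustration, and unlike the function above it uses fixed (literal) string matching, so it skips the three characters of "?q=" explicitly.

```r
# Hypothetical helper, not part of the original function: pulls the
# suggested query out of a chunk of HTML using literal string matching.
extractSuggestion <- function(html, marker) {
  start <- as.integer(regexpr(marker, html, fixed = TRUE))
  if (start == -1) return(NA_character_)

  # Only look at the 1000 characters that follow the marker phrase
  chunk <- substring(html, start, start + 1000)
  q.start <- as.integer(regexpr("?q=", chunk, fixed = TRUE))
  q.end <- as.integer(regexpr("/&", chunk, fixed = TRUE))
  if (q.start == -1 || q.end == -1 || q.end <= q.start) return(NA_character_)

  # "?q=" is three characters long, so the query starts 3 positions later
  query <- substring(chunk, q.start + 3, q.end - 1)
  gsub("+", " ", query, fixed = TRUE)
}

# Made-up HTML fragment, assumed to resemble the markup being parsed
sample.html <- 'Did you mean: <a href="/search?q=george+washington/&amp;spell=1">george washington</a>'
extractSuggestion(sample.html, "Did you mean")
```

On the sample fragment this returns "george washington", and it returns NA when the marker or the link is missing, which makes a failed parse easier to spot than an empty string.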


It works well with misspelled company names, nouns, or phrases. For example, say you're doing text analysis on Twitter and a customer raves about Carlsberg beer. The only problem is that he's enjoying their product while tweeting (something that happens only rarely, I'm sure) and wrote "clarsburg gprou". Not to worry!

> didYouMean("clarsburg gprou")
[1] "carlsberg group"
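
For a text-analysis job you would usually want to correct a whole vector of strings rather than one at a time. The sketch below is one way to do that, assuming the corrector returns a single string per input. `correctAll` and `fakeCorrector` are hypothetical names introduced here; `fakeCorrector` is an offline stand-in so the example runs without a network connection, and in practice you would pass `didYouMean` instead.

```r
# Hypothetical wrapper, not from the original post: looks up each distinct
# string once and maps the results back onto the full vector.
correctAll <- function(x, corrector, pause = 0) {
  uniq <- unique(x)
  fixed <- vapply(uniq, function(s) {
    Sys.sleep(pause)  # optional delay between lookups
    # Fall back to the original string if the lookup throws an error
    tryCatch(corrector(s), error = function(e) s)
  }, character(1), USE.NAMES = FALSE)
  fixed[match(x, uniq)]
}

# Offline stand-in for didYouMean(), used only to make the sketch runnable
fakeCorrector <- function(s) {
  if (s == "clarsburg gprou") "carlsberg group" else s
}

tweets <- c("clarsburg gprou", "beer", "clarsburg gprou")
correctAll(tweets, fakeCorrector)
```

Looking up only the distinct values keeps the number of requests down when the same misspelling shows up many times, and the `pause` argument leaves room to space the requests out.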

Or suppose you have a three-phase plan for profits. This can help you get there!

> didYouMean("clletc nuderpants")
[1] "collect underpants"
