Google Insights and RCurl

December 20, 2010
(This article was first published on Dan Knoepfle's Blog, and kindly contributed to R-bloggers)

Google Insights is nifty. If you’re logged in to your Google account, you can download the results of a search as a CSV file. This is straightforward in a browser; retrieving the results from R is more complicated, because the script has to log in and keep the session cookies itself.

The following code retrieves the results of a Google Insights search for “Sarah Palin” as a data.frame. It uses the RCurl package to do all of the hard work.

username <- "[email protected]"
password <- "password_here"

loginURL <- "https://accounts.google.com/accounts/ServiceLogin"
authenticateURL <- "https://accounts.google.com/accounts/ServiceLoginAuth"

library(RCurl)

ch <- getCurlHandle()

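curlSetOpt(curl = ch,
            ssl.verifypeer = FALSE,   # skips SSL certificate verification
            useragent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13",
            timeout = 60,
            followlocation = TRUE,    # follow HTTP redirects
            cookiejar = "./cookies",  # write session cookies to this file
            cookiefile = "./cookies") # read cookies back from the same file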


## do Google Account login
loginPage <- getURL(loginURL, curl = ch)

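library(stringr)
## the login page embeds a GALX value that is sent back with the credentials
## regex(ignore_case = TRUE) is the current stringr way to match case-insensitively
galx.match <- str_match(string = loginPage,
                        pattern = regex('name="GALX"\\s*value="([^"]+)"', ignore_case = TRUE))
## column 1 is the whole match, column 2 is the captured value
galx <- galx.match[, 2]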

authenticatePage <- postForm(authenticateURL, .params = list(Email = username, Passwd = password, GALX = galx), curl = ch)


## get Google Insights results CSV
insightsURL <- "http://www.google.com/insights/search/overviewReport"
resultsText <- getForm(insightsURL, .params = list(q = "Sarah Palin", cmpt = "q", content = 1, export = 1), curl = ch)

if(isTRUE(unname(attr(resultsText, "Content-Type")[1] == "text/csv"))) {
  ## got CSV file

  ## create temporary connection from results
  tt <- textConnection(resultsText)

  resultsCSV <- read.csv(tt, header = FALSE)

  ## close connection
  close(tt)
} else {
  ## something went wrong; probably need to log in again?
  warning("Google Insights did not return a CSV file; you may need to log in again")
}
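
The GALX extraction in the login step can be tried offline, without a Google account. The following is a minimal sketch in base R; the `sample_page` string is a made-up stand-in for the login page, whose real markup may differ.

```r
## Hypothetical fragment of a login page; the real page is much longer
## and its markup may differ.
sample_page <- '<input type="hidden" name="GALX" value="abc123XYZ">'

## regexec() locates the full match and each parenthesised group;
## regmatches() turns those positions into strings.
pattern <- 'name="GALX"\\s*value="([^"]+)"'
m <- regmatches(sample_page,
                regexec(pattern, sample_page, ignore.case = TRUE))[[1]]

## m[1] is the whole match and m[2] is the captured value;
## m has length zero if the pattern is not found.
galx <- if (length(m) >= 2) m[2] else NA_character_
print(galx)
```

Returning `NA` when nothing matches makes a changed login page easier to spot than a silently malformed token.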

The script is available as ‘Google Insights.R’ on gist.github.com.
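
The content-type check and the CSV parsing can also be wrapped in one function that fails loudly. This is only a sketch: `parse_csv_response` is a hypothetical helper, and `fakeResponse` is a made-up stand-in for what `getForm` returns, assuming the "Content-Type" attribute looks the way the script above expects.

```r
## Hypothetical helper: parse a response only if it is labelled as CSV.
parse_csv_response <- function(resultsText) {
  contentType <- attr(resultsText, "Content-Type")
  if (!isTRUE(unname(contentType[1] == "text/csv"))) {
    stop("response is not CSV; the login may have failed or expired")
  }
  ## read.csv(text = ...) opens and closes the text connection itself
  read.csv(text = resultsText, header = FALSE, stringsAsFactors = FALSE)
}

## Made-up stand-in for a response, for illustration only
fakeResponse <- "2010-12-05,42\n2010-12-12,57\n"
attr(fakeResponse, "Content-Type") <- c("text/csv", charset = "utf-8")

resultsCSV <- parse_csv_response(fakeResponse)
print(resultsCSV)
```

Using `read.csv(text = ...)` avoids having to create and close a `textConnection` by hand.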

I don’t have much else to say about this, but I hope that it will be helpful to someone.

You can restrict the query geographically, or in other ways, by adding the parameters that appear in the URL when you change your search in the Google Insights web interface. For instance, a basic search for “QUERY” gives the URL http://www.google.com/insights/search/#q=QUERY&cmpt=q, whereas the same search restricted to the state of New York gives http://www.google.com/insights/search/#q=QUERY&geo=US-NY&cmpt=q (the added parameter is “geo=US-NY”). To incorporate this into the script, change

resultsText <- getForm(insightsURL, .params = list(q = "Sarah Palin", cmpt = "q", content = 1, export = 1), curl = ch)

to have the additional parameter in the .params list:

resultsText <- getForm(insightsURL, .params = list(q = "Sarah Palin", cmpt = "q", geo = "US-NY", content = 1, export = 1), curl = ch)
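
Rather than copying parameters by hand, the part of the browser URL after the “#” can be split into a `.params` list. The sketch below uses base R only; `insights_params` is a hypothetical helper, and it assumes every parameter has the simple `key=value` form shown in the URLs above.

```r
## Hypothetical helper: turn the fragment of a Google Insights URL
## (everything after "#") into a named list suitable for .params.
insights_params <- function(url) {
  fragment <- sub("^[^#]*#", "", url)   # drop everything up to and including "#"
  pairs <- strsplit(strsplit(fragment, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
  keys <- vapply(pairs, function(p) p[1], character(1))
  values <- vapply(pairs,
                   function(p) if (length(p) > 1) URLdecode(p[2]) else "",
                   character(1))
  params <- as.list(setNames(values, keys))
  ## content = 1 and export = 1 are the extra parameters the script
  ## passes to request the CSV export
  c(params, list(content = 1, export = 1))
}

params <- insights_params("http://www.google.com/insights/search/#q=QUERY&geo=US-NY&cmpt=q")
str(params)
```

The resulting list could then be passed as `.params = params` in the `getForm` call.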

[Updated 2012-04-24]
