Friday fun with: Google Trends

May 19, 2011

(This article was first published on What You're Doing Is Rather Desperate » R, and kindly contributed to R-bloggers)

Some years ago, Google discovered that when people are concerned about influenza, they search for flu-related information, and that, to some extent, search traffic is an indicator of flu activity. Google Flu Trends was born.


Google Trends: bronchitis

Illness is sweeping through our department this week and I have succumbed. It’s not flu, but at one point I did wonder whether my symptoms were those of bronchitis. Remembering Google Flu Trends, I thought I’d try the query “bronchitis” at Google Trends, where I saw the chart shown at right.

Interesting. Clearly seasonal, peaking around the last and first months of each year: winter, for those of you in the northern hemisphere.
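
One way to put a number on "clearly seasonal" is to average the weekly search index by calendar month and see which month comes out on top. The sketch below is not from the original post: `monthly_profile()` is a hypothetical helper, and the series it is demonstrated on is synthetic (a cosine that peaks around 1 January), not Google Trends data.

```r
# HYPOTHETICAL helper, not from the original post: average a weekly
# search-index series by calendar month to expose its seasonal shape.
monthly_profile <- function(dates, values) {
  # format(..., "%m") gives "01".."12"; convert to integer month numbers
  month <- as.integer(format(dates, "%m"))
  # tapply() returns one mean per month present in the data, named "1".."12"
  profile <- tapply(values, month, mean, na.rm = TRUE)
  setNames(as.numeric(profile), names(profile))
}

# SYNTHETIC data for illustration only: a weekly series that is highest
# around 1 January and lowest around 1 July.
weeks <- seq(as.Date("2005-01-02"), as.Date("2010-12-26"), by = "week")
day_of_year <- as.integer(format(weeks, "%j"))
toy <- 1 + cos(2 * pi * (day_of_year - 1) / 365)

prof <- monthly_profile(weeks, toy)
peak_month <- as.integer(names(which.max(prof)))
print(round(prof, 2))
```

With the real files, the same helper could be given the parsed week dates and the `bronchitis` column; for a northern-hemisphere region a December or January peak would match the chart.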

To compare the two hemispheres, search for “bronchitis” at Google Trends, then:

  • select USA and Australia as regions
  • download the data for each region in CSV format (I chose fixed scaling) and rename the files “us.csv” and “aus.csv”
  • edit the files a little to retain only the “Week, bronchitis, bronchitis (std error)” section
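
If you would rather not trim the files by hand, the last step can be scripted. This is a minimal sketch under an assumption I have not checked against a Google Trends export: that the weekly block begins with a header line starting `Week,` and ends at the next blank line. `read_trends_section()` is a hypothetical name.

```r
# HYPOTHETICAL helper, not from the original post. ASSUMPTION: in the
# downloaded CSV the weekly block starts with a header line beginning
# "Week," and ends at the next blank line (or at the end of the file).
read_trends_section <- function(path, header_prefix = "Week,") {
  lines <- readLines(path, warn = FALSE)
  start <- which(startsWith(lines, header_prefix))[1]
  if (is.na(start)) {
    stop("no line starting with '", header_prefix, "' in ", path)
  }
  # the first blank line after the header marks the end of the block
  blanks <- which(trimws(lines) == "")
  end <- blanks[blanks > start][1]
  if (is.na(end)) end <- length(lines) + 1
  # parse just that block; check.names turns
  # "bronchitis (std error)" into "bronchitis..std.error."
  read.csv(text = lines[start:(end - 1)], stringsAsFactors = FALSE)
}

# Usage, with the files named as in the steps above:
# us  <- read_trends_section("us.csv")
# aus <- read_trends_section("aus.csv")
```

If the real export is laid out differently, only the start and end rules need to change; everything downstream still receives an ordinary data frame.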

Fire up your R console and try this:

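library(ggplot2)  # provides ggplot(), aes(), geom_line(), xlab()

# read the trimmed CSV files
us  <- read.csv("us.csv", header = TRUE, stringsAsFactors = FALSE)
aus <- read.csv("aus.csv", header = TRUE, stringsAsFactors = FALSE)
# add a region column
us$region  <- "usa"
aus$region <- "aus"
# combine data (both files must have the same columns)
alldata <- rbind(us, aus)
# add a date column; as.Date() avoids storing a POSIXlt object in a data frame
# ("%b" month abbreviations are locale-dependent, so an English locale is assumed)
alldata$week <- as.Date(alldata$Week, format = "%b %d %Y")
# and plot the non-zero values
ggplot(alldata[alldata$bronchitis > 0, ], aes(week, bronchitis)) +
  geom_line(aes(color = region)) +
  xlab("Date")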


Google Trends: bronchitis, USA + Australia

The result is shown at right.

That’s not unexpected, but it’s rather nice. Peaks in US searches for “bronchitis” coincide with troughs in Australia, and vice versa. The reason, of course, is that searches peak during winter in both regions, but winter in the USA (northern hemisphere) falls during the Australian (southern hemisphere) summer, and again vice versa.
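
The see-saw can also be summarised as a single number: line the two regions up week by week and compute their correlation, where a clearly negative value is consistent with peaks in one region matching troughs in the other. This is a rough sketch, not the author's method; `region_correlation()` is a hypothetical helper, and it is demonstrated on synthetic series in exact anti-phase rather than on the downloaded files.

```r
# HYPOTHETICAL helper, not from the original post: align two regional
# data frames on their week column and return the Pearson correlation of
# the search index. Zero values are dropped first, as in the plot above.
region_correlation <- function(a, b, week = "Week", value = "bronchitis") {
  a <- a[a[[value]] > 0, c(week, value)]
  b <- b[b[[value]] > 0, c(week, value)]
  # inner join on the week; the shared value column gets suffixes .a / .b
  both <- merge(a, b, by = week, suffixes = c(".a", ".b"))
  cor(both[[paste0(value, ".a")]], both[[paste0(value, ".b")]])
}

# SYNTHETIC data for illustration only: two seasonal series in exact
# anti-phase, standing in for the USA and Australia files.
wk <- seq(as.Date("2005-01-02"), by = "week", length.out = 260)
phase <- 2 * pi * as.numeric(wk - wk[1]) / 365.25
toy_us  <- data.frame(Week = wk, bronchitis = 2 + cos(phase))
toy_aus <- data.frame(Week = wk, bronchitis = 2 - cos(phase))

r <- region_correlation(toy_us, toy_aus)
print(r)
```

On real data the correlation will be weaker than this idealised case. Running `stats::ccf()` on the aligned pair of columns would additionally show the lag, in weeks, at which the two regions match best.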

There must be all sorts of interesting and potentially useful information buried away in web usage data. I guess that’s why so many companies are investing in it. However, for those of us more interested in analysing data than marketing – what else is “out there”? Can we “do science” with it? How many papers are published using data gathered only from the Web?

Filed under: google, R, statistics. Tagged: google trends, health
