Friday fun with: Google Trends

[This article was first published on What You're Doing Is Rather Desperate » R, and kindly contributed to R-bloggers.]

Some years ago, Google discovered that when people are concerned about influenza, they search for flu-related information, and that search traffic is, to some extent, an indicator of flu activity. Google Flu Trends was born.

Figure: Google Trends: bronchitis

Illness is sweeping through our department this week and I have succumbed. It’s not flu, but at one point I did wonder whether my symptoms were those of bronchitis. Remembering Google Flu Trends, I thought I’d try a query for “bronchitis” at Google Trends, where I saw the chart shown above.

Interesting. Clearly seasonal, peaking around the last and first months of each year. Winter, for those of you in the northern hemisphere.

Next:

  • select USA and Australia as regions
  • download the data in CSV format (I chose fixed scaling) and rename the files to “us.csv” and “aus.csv”
  • edit the files a little to retain only the “Week, bronchitis, bronchitis (std error)” section
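
The manual trimming in the last step could also be done in R. What follows is only a sketch under assumptions: it supposes that the wanted section begins at a line starting with “Week,” and ends at the next blank line, which may not match the real export. The helper name extract_weekly is hypothetical, and the demo lines are made up to mimic that assumed layout; they are not real Google Trends data.

```r
# Hypothetical helper: keep only the block that starts at the "Week," header
# and runs to the next blank line (or to the end of the input).
extract_weekly <- function(lines) {
  start <- grep("^Week,", lines)[1]
  if (is.na(start)) stop("no line starting with 'Week,' found")
  blank <- which(trimws(lines) == "")
  # last line before the first blank line after the header, else the last line
  end <- c(blank[blank > start] - 1, length(lines))[1]
  read.csv(text = paste(lines[start:end], collapse = "\n"),
           stringsAsFactors = FALSE)
}

# Invented lines in the assumed layout, for illustration only
demo <- c("Some preamble line",
          "",
          "Week,bronchitis,bronchitis (std error)",
          "Jan 4 2004,1.10,5%",
          "Jan 11 2004,1.05,5%",
          "",
          "Region,bronchitis",
          "Somewhere,1.00")
weekly <- extract_weekly(demo)
print(weekly)

# With a real download (file name is an assumption):
# us <- extract_weekly(readLines("us_raw.csv"))
```

Note that read.csv rewrites the third column name into a syntactically valid one, so it is easiest to refer to the first two columns, Week and bronchitis, by name.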

Fire up your R console and try this:

library(ggplot2)
# read the trimmed CSV files; keep Week as text rather than a factor
us <- read.csv("us.csv", header = TRUE, stringsAsFactors = FALSE)
aus <- read.csv("aus.csv", header = TRUE, stringsAsFactors = FALSE)
# add a region column
us$region <- "usa"
aus$region <- "aus"
# combine data
alldata <- rbind(us, aus)
# add a date column (Date class, so it can be stored and plotted directly)
alldata$week <- as.Date(alldata$Week, format = "%b %d %Y")
# and plot the non-zero values
ggplot(alldata[alldata$bronchitis > 0, ], aes(week, bronchitis)) +
  geom_line(aes(color = region)) +
  xlab("Date")

Figure: Google Trends: bronchitis, USA + Australia

That’s not unexpected, but it’s rather nice. In the USA, peak searches for “bronchitis” coincide with troughs in Australia, and vice versa. The reason, of course, is that searches peak in both regions during winter, but winter in the USA (northern hemisphere) falls during the southern summer (and again, vice versa).
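
One way to put a number on this “peaks meet troughs” pattern is to line the two regions up week by week and correlate them. The sketch below is an illustration, not part of the original analysis: it builds a synthetic data frame with the same column names as alldata (week, bronchitis, region), using two invented seasonal curves half a year out of phase. On the real data, the same merge and cor() calls should apply, provided both regions share the same week dates.

```r
# Synthetic stand-in for the combined data frame. These values are invented
# for illustration and are not Google Trends data.
weeks <- seq(as.Date("2004-01-04"), by = "week", length.out = 208)
phase <- 2 * pi * as.numeric(format(weeks, "%j")) / 365.25
demo <- rbind(
  data.frame(week = weeks, bronchitis = 1 + 0.5 * cos(phase), region = "usa"),
  data.frame(week = weeks, bronchitis = 1 - 0.5 * cos(phase), region = "aus")
)

# Put the two regions side by side, matched on week
wide <- merge(demo[demo$region == "usa", c("week", "bronchitis")],
              demo[demo$region == "aus", c("week", "bronchitis")],
              by = "week", suffixes = c(".usa", ".aus"))

# Pearson correlation between the two weekly series
r <- cor(wide$bronchitis.usa, wide$bronchitis.aus)
print(r)
```

For these mirror-image synthetic curves the correlation is strongly negative by construction; real search data would be noisier, so a weaker negative value would be the thing to look for.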

There must be all sorts of interesting and potentially useful information buried away in web usage data. I guess that’s why so many companies are investing in it. However, for those of us more interested in analysing data than marketing – what else is “out there”? Can we “do science” with it? How many papers are published using data gathered only from the Web?


