GTrendsR Package to Explore Google Trends for Field-Dependent Terms

November 24, 2014

(This article was first published on TRinker's R Blog » R, and kindly contributed to R-bloggers)

My friend, Steve Simpson, introduced me to Philippe Massicotte and Dirk Eddelbuettel’s GTrendsR GitHub package this week. It’s a pretty nifty wrapper to the Google Trends API that enables one to search phrase trends over time. The trend indices that are given are explained in more detail here: https://support.google.com/trends/answer/4355164?hl=en

Ever have a toy you know is super cool but don’t know what to use it for yet? That’s GTrendsR for me. So I made up an activity to use it for, one related to my own interests (click HERE to download just the R code for this post). I decided to choose the first 10 phrases I could think of that relate to my field, literacy, and then used GTrendsR to see how Google search trends have changed for these terms. Here are the 10 biased terms I chose:

  1. reading assessment
  2. common core
  3. reading standards
  4. phonics
  5. whole language
  6. lexile score
  7. balanced approach
  8. literacy research association
  9. international reading association
  10. multimodal

The last term did not receive enough hits to register a trend. That is telling: the field is talking about multimodality, but search behavior doesn’t seem to be affected enough to register with Google Trends.


Getting Started

The GTrendsR package provides great tools for grabbing the information from Google; however, for my own task I wanted simpler tools to grab certain chunks of information easily and format them in a tidy way. So I built a small wrapper package, mostly for myself, that will likely remain a GitHub-only package: https://github.com/trinker/gtrend

You can install it yourself (we’ll use it in this post) and load all the necessary packages via:

# Both packages are installed from GitHub, so devtools must already be installed
devtools::install_github("dvanclev/GTrendsR")
devtools::install_github("trinker/gtrend")
library(gtrend); library(dplyr); library(ggplot2); library(scales)

The Initial Search

When you perform the search with gtrend_scraper, you will need to enter your Google user name and password.
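Rather than typing credentials directly into a script, one option is to read them from environment variables. The following is a minimal sketch under that assumption: the helper `google_credentials()` and the variable names `GTRENDS_USER` and `GTRENDS_PASSWORD` are hypothetical and are not part of gtrend or GTrendsR; you would define the variables yourself (for example in `~/.Renviron`).

```r
# Hypothetical helper: read Google credentials from environment variables
# so they never appear in the script itself. GTRENDS_USER and
# GTRENDS_PASSWORD are assumed names that you define yourself.
google_credentials <- function(user_var = "GTRENDS_USER",
                               pass_var = "GTRENDS_PASSWORD") {
    user <- Sys.getenv(user_var)
    pass <- Sys.getenv(pass_var)
    # Sys.getenv() returns "" for unset variables; fail early with a clear message
    if (!nzchar(user) || !nzchar(pass)) {
        stop("Set ", user_var, " and ", pass_var, " before scraping.")
    }
    list(user = user, password = pass)
}
```

With the variables set, the search could then be run as `creds <- google_credentials()` followed by `gtrend_scraper(creds$user, creds$password, terms)`, keeping the password out of any shared code.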

I did an initial search and plotted the trends for the 9 remaining terms. It was a big, colorful, clustery mess.

terms <- c("reading assessment", "common core", "reading standards",
    "phonics", "whole language", "lexile score", "balanced approach",
    "literacy research association", "international reading association"
)

# Replace the placeholders with your own Google user name and password
out <- gtrend_scraper("your_email@gmail.com", "password", terms)

out %>%
    trend2long() %>%
    plot() 

plot of chunk trend_mess

So I faceted each of the terms out to look at the trends.

out %>%
    trend2long() %>%
    ggplot(aes(x=start, y=trend, color=term)) +
        geom_line() +
        facet_wrap(~term) +
        guides(color="none")

plot of chunk trend_facet

Some interesting patterns began to emerge. I noticed a repeated pattern in almost all of the educational terms, which I found interesting, so we’ll explore that first. The basic shape wasn’t yet discernible, so I took a small subset of one term, reading+assessment, to explore the trend line by year:

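names(out)[1]
## [1] "reading+assessment"
# Pull the trend data for the first term and give the index column a simple name
dat <- out[[1]][["trend"]]
colnames(dat)[3] <- "trend"

# Keep only the weeks starting after 2011-01-01
dat2 <- dat[dat[["start"]] > as.Date("2011-01-01"), ]

# One shaded rectangle per year, from the first week's start to the last week's end
rects <- dat2 %>%
    mutate(year=format(as.Date(start), "%y")) %>%
    group_by(year) %>%
    summarize(xstart = as.Date(min(start)), xend = as.Date(max(end)))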

ggplot() +
    geom_rect(data = rects, aes(xmin = xstart, xmax = xend, ymin = -Inf, 
        ymax = Inf, fill = factor(year)), alpha = 0.4) +
    geom_line(data=dat2, aes(x=start, y=trend), size=.9) + 
    scale_x_date(labels = date_format("%m/%y"), 
        breaks = date_breaks("month"),
        expand = c(0,0), 
        limits = c(as.Date("2011-01-02"), as.Date("2014-12-31"))) +
    theme(axis.text.x = element_text(angle = -45, hjust = 0)) 

plot of chunk trend_iso

What I noticed was that for each year there was a general double-hump pattern that looked something like this:

This pattern holds consistently across the educational terms. I added some context to a smaller subset to help with the narrative:

dat3 <- dat[dat[["start"]] > as.Date("2010-12-21") & 
        dat[["start"]] < as.Date("2012-01-01"), ]

ggplot() +
    geom_line(data=dat3, aes(x=start, y=trend), size=1.2) + 
    scale_x_date(labels = date_format("%b %y"), 
        breaks = date_breaks("month"),
        expand = c(0,0)) +
    # theme_bw() is a complete theme, so add it before any theme() tweaks
    # or it would discard the rotated axis labels
    theme_bw() +
    theme(axis.text.x = element_text(angle = -45, hjust = 0),
        panel.grid.major.y=element_blank(),
        panel.grid.minor.y=element_blank()) + 
    ggplot2::annotate("text", x = as.Date("2011-01-15"), y = 50, 
        label = "Winter\nBreak Ends") +
    ggplot2::annotate("text", x = as.Date("2011-05-08"), y = 70, 
        label = "Summer\nBreak\nAcademia") +
    ggplot2::annotate("text", x = as.Date("2011-06-15"), y = 76, 
        label = "Summer\nBreak\nTeachers") +
    ggplot2::annotate("text", x = as.Date("2011-08-18"), y = 63, 
        label = "Academia\nReturns") +
    ggplot2::annotate("text", x = as.Date("2011-08-17"), y = 78, 
        label = "Teachers\nReturn") +
    ggplot2::annotate("text", x = as.Date("2011-11-17"), y = 61, 
        label = "Thanksgiving")

plot of chunk narrative

Of course, this is all me trying to line up dates with educational search terms in a logical sense; it’s a hypothesis rather than a firm conclusion. If this visual model is correct, though (that is, if these events do affect Google searches for educational terms), and if a Google search indicates work to advance one’s understanding of a concept, then folks aren’t too interested in advancing educational knowledge around Thanksgiving and Christmas. These are, of course, big assumptions. But if they hold, the implications extend further: perhaps the most fertile time to engage educators, education students, and educational researchers is the first month after summer break.
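One way to check whether the double-hump shape recurs every year, rather than only in the 2011 window annotated above, is to average the index by calendar month across all years. This is a minimal sketch, assuming `dat` is the data frame built earlier with a `start` date column and a numeric `trend` column; `monthly_profile()` is a hypothetical helper, not part of gtrend.

```r
# Hypothetical helper: mean search index for each calendar month,
# pooled across all years in `dat`. Assumes columns `start` (coercible
# with as.Date) and `trend` (numeric).
monthly_profile <- function(dat) {
    month <- as.integer(format(as.Date(dat[["start"]]), "%m"))
    # tapply() groups the trend values by month number and averages them
    prof <- tapply(dat[["trend"]], month, mean, na.rm = TRUE)
    months <- as.integer(names(prof))
    data.frame(month = months,
               label = month.abb[months],
               mean_trend = as.numeric(prof),
               stringsAsFactors = FALSE)
}
```

Calling `monthly_profile(dat)` gives one row per month. If the hypothesis holds, the summer months and December should show lower means than the months just after teachers and academics return.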


Second Noticing

I also noticed that the two major literacy organizations are on a downward trend.

out %>%
    trend2long() %>%
    filter(term %in% c("literacy+research+association", 
        "international+reading+association")) %>%
    as.trend2long() %>%
    plot() + 
    guides(color="none") +
    ggplot2::annotate("text", x = as.Date("2011-08-17"), y = 60, 
        label = "International\nReading\nAssociation", color="#F8766D") +
    ggplot2::annotate("text", x = as.Date("2006-01-17"), y = 38, 
        label = "Literacy\nResearch\nAssociation", color="#00BFC4") +
    theme_bw() +
    stat_smooth()

plot of chunk downward_trend

I wonder what might be causing the downward trend. I also notice that the two trends are growing apart, with the International Reading Association being affected less. Can this downward trend be reversed?
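To put a number on “downward,” one could fit a straight line to each association’s series and compare slopes. This is a rough sketch, assuming the long-format data from `trend2long()` has `term`, `start` (date) and `trend` columns; `trend_slopes()` is a hypothetical helper, and a linear fit is a much cruder summary than the smoother drawn in the plot.

```r
# Hypothetical helper: ordinary least-squares slope of the search index
# against time for each term, expressed as index points per year.
# Assumes columns `term`, `start` (coercible with as.Date) and `trend`.
trend_slopes <- function(long) {
    long <- as.data.frame(long)
    per_term <- split(long, as.character(long[["term"]]))
    vapply(per_term, function(d) {
        days <- as.numeric(as.Date(d[["start"]]))  # days since 1970-01-01
        fit <- lm(d[["trend"]] ~ days)
        unname(coef(fit)[2]) * 365.25              # per-day slope to per-year
    }, numeric(1))
}
```

Piping the filtered long data from the plot above into `trend_slopes()` would return one slope per association: negative values indicate decline, and the gap between the two values gives a rough measure of how fast the trends are diverging.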


Associated Terms

Lastly, I want to look at how some terms have been used over time and see whether that use corresponds with what I know to be historical events around literacy in education.

out %>%
    trend2long() %>%
    filter(term %in% names(out)[1:7]) %>%
    as.trend2long() %>%
    plot() + scale_colour_brewer(palette="Set1") +
    facet_wrap(~term, ncol=2) +
    guides(color="none")

plot of chunk terms

This made me want to group the following four terms together, as their trends overlap almost perfectly. I don’t have a plausible historical explanation for this. Hopefully, a more knowledgeable other can fill in the blanks.

out %>%
    trend2long() %>%
    filter(term %in% c("reading+assessment", "reading+standards",
        "whole+language", "balanced+approach")) %>%
    as.trend2long() %>%
    plot() 

plot of chunk overlap
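The “near perfect overlap” could also be checked numerically by correlating the series pairwise. This is a minimal sketch, assuming the same long-format columns (`term`, `start`, `trend`); `term_correlations()` is a hypothetical helper. Correlations near 1 would support grouping the terms, although a shared school-year cycle alone can push correlations up.

```r
# Hypothetical helper: Pearson correlations between every pair of terms.
# Assumes `long` has columns `term`, `start` and `trend`.
term_correlations <- function(long) {
    long <- as.data.frame(long)
    # Reshape to a date-by-term matrix; each cell holds that date's index
    # (averaged if a date appears more than once for a term)
    wide <- tapply(long[["trend"]],
                   list(as.character(long[["start"]]), as.character(long[["term"]])),
                   mean)
    cor(wide, use = "pairwise.complete.obs")
}
```

Passing the four-term long data from the previous plot to `term_correlations()` returns a 4 x 4 matrix whose off-diagonal entries summarize how closely each pair moves together.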

I explored the three remaining terms in the graph below. As expected, “common core” and “lexile” (Lexile scores are quantitative measures of text complexity) are on an upward trend. “Phonics,” on the other hand, is on a downward trend.

out %>%
    trend2long() %>%
    filter(term %in% c("common+core", "lexile+score", "phonics")) %>%
    as.trend2long() %>%
    plot() 

plot of chunk overlap2

This was a fun exploratory use of the GTrendsR package. Thanks to Steve Simpson for the introduction to GTrendsR, and to Philippe Massicotte and Dirk Eddelbuettel for sharing their work.


*Created using the reports package
