Working With SEM Keywords in R

September 20, 2015

(This article was first published on Mathew Analytics » R, and kindly contributed to R-bloggers)

This post combines two earlier posts from an older blog of mine that is no longer available. They date from several years ago and address two questions that I ran into. First, how can I automatically generate hundreds of thousands of keywords for a search engine marketing campaign? Second, how can I develop an effective system for examining keywords based on their characteristics?

Generating PPC Keywords in R

Paid search marketing refers to the process of driving traffic to a website by purchasing ads on search engines. Advertisers bid on keywords that users might search for, and those bids determine when and where their ads appear. For example, someone who owns an auto dealership would want to bid on automobile-related keywords that a reasonable person would type into a search engine. In both Google and Bing, advertisers can specify which keywords they would like to bid on and at what amount. An advertiser who bids on just a small number of keywords can type them in and specify a bid for each. But what if you want to bid on a very large number of keywords? Instead of typing each and every keyword into the Google or Bing dashboard, you could programmatically generate the keywords in R.

Let’s say that I run an online retail establishment that sells men’s and women’s streetwear and I want to drive more traffic to my online store by placing ads on both Google and Bing. I want to bid on a large number of keywords related to fashion and have created several sets of ‘root’ words that will make up the majority of these keywords. To generate my desired keywords, I have written a function that pairs every word in one set of roots with every word in another.

root1 = c("fashion", "streetwear")
root2 = c("karmaloop", "crooks and castles", "swag")
root3 = c("urban clothing", "fitted hats", "snapbacks")
root4 = c("best", "authentic", "low cost") 

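myfunc <- function(){
      # Collect the root vectors in a named list so they can be looked up by name
      lst <- list(root1=root1, root2=root2, root3=root3, root4=root4)
      # Paste together every pairing of the words in lst[[x]] and lst[[y]]
      myone <- function(x, y){
            m1 <- do.call(paste, expand.grid(lst[[x]], lst[[y]]))
            data.frame(keyword=m1)
      }
      # Stack the keyword sets into a single data frame
      rbind(myone("root4","root1"), myone("root2","root1"))
}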

mydat <- myfunc()

write.table(mydat, "adppc.txt", quote=FALSE, row.names=FALSE)

This isn’t the prettiest code in the world, but it gets the job done. In fact, the same kind of result could have been achieved with the following code, which is much more concise; here it builds the keywords that pair each root2 word with "fashion" and "streetwear".

root5 = c("%s fashion")
root6 = c("%s streetwear")
adcam1 = sprintf(root5, root2)
adcam2 = sprintf(root6, root2)
df = data.frame(keywords=c(adcam1, adcam2))

write.table(df, "adppc.txt", quote=FALSE, row.names=FALSE)
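One caveat with the sprintf() approach is that sprintf() recycles its arguments instead of crossing them, so each template needs its own call. As a rough sketch, outer() can pair every template with every brand in one step. The `templates` vector and its third entry ("buy %s online") are made up here for illustration and are not part of the original campaign.

```r
brands    <- c("karmaloop", "crooks and castles", "swag")
# Hypothetical templates; only the first two mirror root5 and root6 above
templates <- c("%s fashion", "%s streetwear", "buy %s online")

# outer() calls sprintf(template, brand) for every template/brand pair;
# as.vector() flattens the resulting matrix column by column (brand by brand)
keywords <- as.vector(outer(templates, brands, sprintf))

length(keywords)  # 3 templates x 3 brands = 9 keywords
```

The result is a plain character vector, so it can be wrapped in data.frame() and written out with write.table() exactly as above.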

If you have any suggestions for improving my R code, please mention them in the comment section below.
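One possible refinement of the expand.grid() idea is a helper that accepts any number of root vectors rather than exactly two. The sketch below is only an illustration; `combine_roots` is a hypothetical name, not something from the original posts.

```r
# Hypothetical helper: combine any number of word vectors into
# space-separated keyword phrases.
combine_roots <- function(...) {
  # expand.grid() builds one row per combination; keep the words as strings
  grid <- expand.grid(..., stringsAsFactors = FALSE)
  # paste the columns of each row together, separated by a single space
  do.call(paste, grid)
}

modifiers <- c("best", "authentic", "low cost")
products  <- c("fashion", "streetwear")

two_part   <- combine_roots(modifiers, products)
three_part <- combine_roots(modifiers, "karmaloop", products)

length(two_part)  # 3 modifiers x 2 products = 6 phrases
```

Because expand.grid() varies its first argument fastest, the phrases come out grouped by the last vector supplied.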

Creating Tags For PPC Keywords

When performing search engine marketing, it is usually beneficial to construct a system for making sense of keywords and their performance. While one could construct Bayesian Belief Networks to model the process of consumers clicking on ads, I have found that using ‘tags’ to categorize keywords is just as useful for conducting post-hoc analysis on the effectiveness of marketing campaigns. By ‘tags,’ I mean identifiers which categorize keywords according to their characteristics. For example, in the following data frame, we have six keywords, their average bids, numbers of clicks and conversions, and tags for state, model, car, auto, save, and cheap. What we want to do now is set each tag to 1 if and only if one of that tag’s words appears in the keyword.

df = data.frame(keyword=c("best car insurance",
                          "honda auto insurance",
                          "florida car insurance",
                          "cheap insurance online",
                          "free insurance quotes",
                          "iowa drivers save money"),
                average_bid=c(3.12, 2.55, 2.38, 5.99, 4.75, 4.59),
                clicks=c(15, 20, 30, 50, 10, 25),
                conversions=c(5, 2, 10, 15, 3, 5),
                state=0, model=0, car=0, auto=0, save=0, cheap=0)

main <- function(df) {
  state <- c("michigan", "missouri", "florida", "iowa", "kansas")
  model <- c("honda", "toyota", "ford", "acura", "audi")
  car <- c("car")
  auto <- c("auto")
  save <- c("save")
  cheap <- c("cheap")
  for (i in seq_len(nrow(df))) {
    # Split the keyword into words and set a tag when any word is in its list
    Words = strsplit(as.character(df[i, 'keyword']), " ")[[1]]
    if(any(Words %in% state)) df[i, 'state'] <- 1
    if(any(Words %in% model)) df[i, 'model'] <- 1
    if(any(Words %in% car)) df[i, 'car'] <- 1
    if(any(Words %in% auto)) df[i, 'auto'] <- 1
    if(any(Words %in% save)) df[i, 'save'] <- 1
    if(any(Words %in% cheap)) df[i, 'cheap'] <- 1
  }
  return(df)
}

one = main(df)

subset(one, state==TRUE | model==TRUE | auto==TRUE)




# Same tagging with stringr; str_detect() returns TRUE/FALSE rather than 1/0
library(stringr)

state <- c("michigan", "missouri", "florida", "iowa", "kansas")
model <- c("honda", "toyota", "ford", "acura", "audi")
car <- c("car")
auto <- c("auto")
save <- c("save")
cheap <- c("cheap")

state_match <- str_c(state, collapse = "|")
model_match <- str_c(model, collapse = "|")
car_match <- str_c(car, collapse = "|")
auto_match <- str_c(auto, collapse = "|")
save_match <- str_c(save, collapse = "|")
cheap_match <- str_c(cheap, collapse = "|")

main <- function(df) {
  df$state <- str_detect(df$keyword, state_match)
  df$model <- str_detect(df$keyword, model_match)
  df$car <- str_detect(df$keyword, car_match)
  df$auto <- str_detect(df$keyword, auto_match)
  df$save <- str_detect(df$keyword, save_match)
  df$cheap <- str_detect(df$keyword, cheap_match)
  df  # return the tagged data frame
}

two = main(df)

subset(two, state==TRUE | model==TRUE | auto==TRUE)
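One thing to keep in mind with the pattern-based version is that a pattern such as "car" also matches inside longer words, whereas the strsplit() version only matches whole words. A minimal base R sketch that keeps the whole-word behaviour is shown below. It assumes the tag word lists from above, stored in a single named list called `tags` (a name chosen here for illustration), and wraps each pattern in `\\b` word boundaries.

```r
keywords <- c("best car insurance", "honda auto insurance",
              "florida car insurance", "cheap insurance online",
              "free insurance quotes", "iowa drivers save money")

# Tag word lists from the post, kept together in one named list
tags <- list(
  state = c("michigan", "missouri", "florida", "iowa", "kansas"),
  model = c("honda", "toyota", "ford", "acura", "audi"),
  car   = "car",
  auto  = "auto",
  save  = "save",
  cheap = "cheap"
)

# For each tag, build a pattern like "\\b(honda|toyota|...)\\b" and test every
# keyword against it; the result is a logical matrix with one column per tag
tag_matrix <- sapply(tags, function(words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  grepl(pattern, keywords, perl = TRUE)
})

tagged <- data.frame(keyword = keywords, tag_matrix)
```

Adding a new tag then only requires a new entry in `tags`, with no further changes to the tagging code.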

By now, some of you are probably wondering why we don’t just select the keywords directly from the original data frame based on the desired characteristic. Well, that works too, although I’ve found that the marketing professionals I’ve worked with have preferred the ‘tagging’ method.

## Alternate approach - SELECT DIRECTLY


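main <- function(df) {
  model <- c("honda", "toyota", "ford", "acura", "audi")
  keep <- logical(nrow(df))
  for (i in seq_len(nrow(df))) {
    Words = strsplit(as.character(df[i, 'keyword']), " ")[[1]]
    # Flag the row rather than returning early, so every match is kept
    if(any(Words %in% model)) keep[i] <- TRUE
  }
  df[keep, 1:4]
}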

three = main(df)
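The direct selection can also be written without an explicit loop. The following is a small sketch under the same assumptions as the function above (space-separated keywords and a fixed list of model names), using a trimmed-down copy of the example data.

```r
df <- data.frame(
  keyword = c("best car insurance", "honda auto insurance",
              "florida car insurance", "cheap insurance online",
              "free insurance quotes", "iowa drivers save money"),
  clicks = c(15, 20, 30, 50, 10, 25),
  stringsAsFactors = FALSE
)
model <- c("honda", "toyota", "ford", "acura", "audi")

# Split each keyword into words and check whether any word is a model name
has_model <- vapply(strsplit(df$keyword, " "),
                    function(words) any(words %in% model),
                    logical(1))

selected <- df[has_model, ]
```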

So there you have it, a method of ‘tagging’ strings according to a certain set of specified characteristics. The benefit of using ‘tags’ is that it provides you with a systematic way to document how the presence of certain words or phrases impacts performance.
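To illustrate how tags can feed a performance summary, here is a minimal sketch that totals clicks and conversions for each tag. It assumes a tagged data frame like the one built earlier (only three of the tags are reproduced here), and the conversion rate is taken to be conversions divided by clicks, which is an assumption of this example rather than something defined in the original posts.

```r
tagged <- data.frame(
  keyword = c("best car insurance", "honda auto insurance",
              "florida car insurance", "cheap insurance online",
              "free insurance quotes", "iowa drivers save money"),
  clicks      = c(15, 20, 30, 50, 10, 25),
  conversions = c(5, 2, 10, 15, 3, 5),
  state = c(0, 0, 1, 0, 0, 1),
  model = c(0, 1, 0, 0, 0, 0),
  car   = c(1, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

tag_cols <- c("state", "model", "car")

# Build one summary row per tag from the keywords carrying that tag
summary_by_tag <- do.call(rbind, lapply(tag_cols, function(tag) {
  hit <- tagged[[tag]] == 1
  data.frame(
    tag             = tag,
    keywords        = sum(hit),
    clicks          = sum(tagged$clicks[hit]),
    conversions     = sum(tagged$conversions[hit]),
    conversion_rate = sum(tagged$conversions[hit]) / sum(tagged$clicks[hit])
  )
}))

summary_by_tag
```

A keyword with several tags counts toward each of them, so the rows of the summary overlap and should not be added together.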


