The Road To Detecting Metaphors – Part 1

[This article was first published on R – Gradient Metrics, and kindly contributed to R-bloggers.]

In a 2011 experiment, half of the participants read about a crime-ridden city where crime was described as a beast ravaging the city of Addison. The other half read the same description, except that crime was described as a disease ravaging the city. When participants were asked how to solve the city's crime problem, those who read the beast metaphor mainly suggested control strategies (increasing police presence, imposing stricter penalties). Those who read the disease metaphor instead suggested diagnostic/treatment strategies (seeking out the primary cause of the crime wave, bolstering the economy).

Metaphors, which apply a concept from a source (in this case, animals or disease) to a target (here, crime), are a key component of how people construct ideas in their mind. As Will Saletan writes in a 2007 review of Steven Pinker’s The Stuff of Thought:

“Metaphor turns out to be our crucial talent. It parlays crude animal knowledge into human advancement. From physical destinations, we extrapolate a conception of goals. From physical journeys, we build an understanding of relationships. Metaphors structure even our most advanced ideas: heat works like fluid, atoms like solar systems, genes like code, evolution like design. In each case, language has fossilized the construction process: ‘heat flow,’ ‘genetic code,’ ‘natural selection.’”

Changing a single metaphorical word in a news excerpt changed the political response of the participants. This shows how powerful metaphors can be, and how they can influence political views.

Some scholars (like George Lakoff, a linguist who has written several books urging progressives to make better use of metaphors) believe that choosing alternate frames and metaphors is a panacea for political persuasion. Others (like Pinker) are more skeptical. Either way, there is no denying that people with different conceptualizations of an issue will have different political views. Is immigration a wave crashing over our country? Or is our country a family that is open to new members? In foreign affairs, is our country a shining light on a hill? Or a schoolyard bully?

OK, but how do we find the metaphors that people use? We could, in theory, recruit a group of people to interview and pay attention to the metaphors they use, but (1) that is very costly, and (2) even if we were listening carefully, we might miss most of the material! Most metaphors pass without notice in everyday speech. They are so fundamental that most people don't explicitly notice that they are applying the logic of one domain to another. They operate under the hood, or behind the scenes.

We wanted to develop a different approach — one that could leverage the enormous amount of text and natural speech available on the internet. We wanted to see if we could build a taxonomy of conceptualizations around a single issue, and we started with immigration.

Our methodology roadmap

To find and tag immigration-related metaphors, we needed text. Lots of it. We generated a large corpus by scraping news articles and TV transcripts from different news channels, comments from Reddit, and a mix of popular and recent posts on Twitter. Online content can often be very messy, which is why we carefully developed code to keep sentence structure intact.
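As a rough illustration of what "keeping sentence structure intact" can involve (a sketch under simple assumptions, not the pipeline's actual code), the snippet below strips URLs, collapses stray whitespace and splits the text on sentence-ending punctuation. The function name `clean_and_split` and both regular expressions are hypothetical, and real scraped text needs far more care with abbreviations, emoji and markup.

```python
import re
from typing import List

# Assumed, deliberately simple patterns; real web text needs more robust rules.
URL_RE = re.compile(r"https?://\S+")
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def clean_and_split(raw: str) -> List[str]:
    """Remove URLs, normalize whitespace, and split into sentences."""
    text = URL_RE.sub("", raw)                 # drop links
    text = re.sub(r"\s+", " ", text).strip()   # collapse newlines, tabs, runs of spaces
    if not text:
        return []
    # Split only at whitespace that follows ., ! or ? so punctuation stays attached.
    return SENTENCE_END_RE.split(text)


if __name__ == "__main__":
    messy = "A wave of immigration hit the city.\n\n  Is our country a family?   See https://example.com/post now!"
    for sentence in clean_and_split(messy):
        print(sentence)
```

Splitting after cleaning, rather than before, keeps a removed link from leaving a gap that looks like a sentence boundary.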

When we were satisfied with our corpus, we had to perform numerous Natural Language Processing (NLP) tasks before we could start extracting metaphors. We started by parsing the structure of the sentences in our corpus, tagging each individual word with its part-of-speech (POS) value and adding the lemmatized form of each word.
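To make the annotation step concrete, here is a toy sketch of what "a word plus its POS tag plus its lemma" looks like as data. It is not the tooling used for the project: the `TOY_LEXICON` lookup table, the `Token` type and the `annotate` function are all hypothetical stand-ins for a trained tagger and lemmatizer from an NLP library.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str   # word as it appears in the sentence
    pos: str    # part-of-speech tag
    lemma: str  # dictionary (lemmatized) form


# Hypothetical toy lexicon: surface form -> (POS tag, lemma).
# A real pipeline would get these from a trained model instead.
TOY_LEXICON = {
    "waves": ("NOUN", "wave"),
    "of": ("ADP", "of"),
    "immigrants": ("NOUN", "immigrant"),
    "crashed": ("VERB", "crash"),
    "over": ("ADP", "over"),
    "the": ("DET", "the"),
    "country": ("NOUN", "country"),
}


def annotate(sentence: str) -> List[Token]:
    """Tag each word with a POS value and a lemma using the toy lexicon."""
    tokens = []
    for word in re.findall(r"[A-Za-z']+", sentence):
        # Unknown words get the placeholder tag "X" and their lowercase form as lemma.
        pos, lemma = TOY_LEXICON.get(word.lower(), ("X", word.lower()))
        tokens.append(Token(word, pos, lemma))
    return tokens


if __name__ == "__main__":
    for token in annotate("Waves of immigrants crashed over the country."):
        print(token)
```

Working with lemmas means "waves of immigrants" and "wave of immigration" can later be counted under closely related entries instead of as unrelated strings.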

Now, as mentioned before, a metaphor applies a concept from a source to a target, but never in a literal sense. We know this, and most of the time (when we're paying attention), we are capable of correctly identifying metaphors in a given text. But how do you teach a computer to distinguish between the literal and metaphorical meanings of words? Many researchers have tried to answer this question over the years, and it is still a developing research field.

In a paper published by the Association for Computational Linguistics, researchers proposed a set of constructional patterns most likely to be associated with metaphors. We identified these patterns in our annotated corpus and extracted every possible combination (still millions!). Although it was a start, there were still countless word combinations that had no metaphorical value whatsoever.
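The post does not list the constructional patterns themselves, so the sketch below assumes just one for illustration: a noun, the word "of", and another noun (as in "wave of immigration"). The function name `noun_of_noun_pairs` and the tag names are hypothetical. The input is assumed to be a sentence already reduced to (lemma, POS) pairs.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (lemma, POS tag)


def noun_of_noun_pairs(tokens: List[Tagged]) -> List[Tuple[str, str]]:
    """Return (first_noun, second_noun) for every 'NOUN of NOUN' window."""
    pairs = []
    # Slide a three-token window over the sentence.
    for i in range(len(tokens) - 2):
        (first, first_pos), (middle, _), (last, last_pos) = tokens[i:i + 3]
        if first_pos == "NOUN" and middle == "of" and last_pos == "NOUN":
            pairs.append((first, last))
    return pairs


if __name__ == "__main__":
    sentence = [
        ("a", "DET"), ("wave", "NOUN"), ("of", "ADP"), ("immigration", "NOUN"),
        ("bring", "VERB"),
        ("a", "DET"), ("sliver", "NOUN"), ("of", "ADP"), ("hope", "NOUN"),
    ]
    print(noun_of_noun_pairs(sentence))
```

A purely structural pattern like this also matches literal phrases such as "city of Addison", which is why pattern matching alone still leaves many combinations with no metaphorical value.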

We found a way to trim our results using an approach from a recent publication. The researchers scored word pairs with a type of neural network they call a Supervised Similarity Network (SSN) and classified a word pair as metaphorical if its score was above a certain threshold.
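The network itself is beyond a short example, but the classification step described here is simple to sketch: given any function that scores a word pair, keep the pairs that score above a threshold. Everything below is an assumption for illustration: the name `keep_metaphorical`, the default threshold of 0.5, and especially the placeholder scores, which are made-up numbers and not output from any model.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]


def keep_metaphorical(
    pairs: Iterable[Pair],
    score: Callable[[Pair], float],
    threshold: float = 0.5,  # assumed value; the post does not state the threshold
) -> List[Pair]:
    """Keep only the pairs whose score is strictly above the threshold."""
    return [pair for pair in pairs if score(pair) > threshold]


if __name__ == "__main__":
    # Placeholder scores, invented for this example (NOT real model output).
    placeholder_scores = {
        ("wave", "immigration"): 0.9,
        ("city", "Addison"): 0.1,
        ("seed", "violence"): 0.7,
    }
    kept = keep_metaphorical(placeholder_scores, placeholder_scores.get)
    print(kept)
```

Passing the scorer in as a function keeps the filtering logic independent of whichever model produces the scores.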

Our results finally took form! After removing named entities, identifying negations (e.g. immigration is NOT a wave) and counting occurrences, we are now eager to show our results.
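These three clean-up steps can be sketched together. The version below is a simplified illustration, not the project's code: it assumes named entities arrive as a ready-made set from an upstream recognizer, and it treats a pair as negated whenever a cue word appears anywhere in the same sentence, which is much cruder than real negation scoping. The names `NEGATION_CUES` and `summarize` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]
Candidate = Tuple[Pair, List[str]]  # (word pair, tokens of the sentence it came from)

# Assumed cue list; a fuller system would use a parser to find the negation's scope.
NEGATION_CUES = {"not", "no", "never", "n't"}


def summarize(candidates: Iterable[Candidate], named_entities: Set[str]) -> Counter:
    """Count (source, target, negated) occurrences, skipping named entities."""
    counts: Counter = Counter()
    for (source, target), sentence_tokens in candidates:
        # Drop pairs that involve a named entity such as a place or person.
        if source in named_entities or target in named_entities:
            continue
        negated = any(tok.lower() in NEGATION_CUES for tok in sentence_tokens)
        counts[(source, target, negated)] += 1
    return counts


if __name__ == "__main__":
    candidates = [
        (("wave", "immigration"), ["a", "wave", "of", "immigration"]),
        (("wave", "immigration"), ["immigration", "is", "NOT", "a", "wave"]),
        (("city", "Addison"), ["the", "city", "of", "Addison"]),
    ]
    for key, count in summarize(candidates, {"Addison"}).items():
        print(key, count)
```

Keeping the negation flag in the count key means "immigration is a wave" and "immigration is NOT a wave" are tallied separately rather than merged.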

Immigration metaphors

Our goal was to build a pipeline to identify metaphors in a large corpus of text around a single issue: immigration. And we succeeded.

In our very first iteration, we had over 1.5 million metaphor-eligible word combinations. Of these, approximately half were scored as being metaphorical by our SSN. Below, we highlight a small sample of our findings.

Describing current immigration patterns

  • A wave of immigration / immigrants
  • Immigration flow
  • Surge of immigration
  • Spike of immigration
  • Tide of immigration
  • Immigrants are in limbo

Political associations with immigration

  • Abuse of power
  • Creating a layer of mistrust
  • Creating layers of uncertainty
  • Take on the mantle of leadership
  • Shielding immigrants
  • We are behind a veil of ignorance

Positive associations with immigration

  • Sliver of hope
  • Illegal immigration is the lifeblood of our country
  • Immigrants are on a pursuit to happiness

Negative associations with immigration

  • Seed of violence
  • People are on edge
  • Surge of violence
  • Poison of immigration

These are all clean, well-formed examples, and such examples are definitely not the majority of what the pipeline returns.

What makes or breaks this ambitious project is whether we can correctly separate the signal from the noise and put the signals into context. Which metaphors belong to which topics? Does abuse of power relate to statements about building a wall, or to something completely unrelated to the immigration debate? How often do these metaphors occur? From which news sources or public domains do they originate? Do they change over time? So many questions, but we are answering them all.


We will be sharing our results and detailed analyses of our metaphor project over the coming weeks, which will:

  • Cover why metaphors are important & discuss current state-of-the-art research
  • Explain our metaphor detection process through constructional patterns in the corpus
  • Present results of identified metaphors by news source & topic (e.g. which metaphors are associated with the NY Times? And which metaphors are associated with certain topics within our corpus?)
  • Present results of applying clustering techniques to identified metaphors
  • Discuss our methodology in depth by discussing the choices we made, examples of our written code & the open-source libraries that we used
  • Explore and explain potential business opportunities that researching metaphors allows, such as brand tracking, message phrasing, slogan generation and more

We hope you enjoyed this unintentionally not-so-brief introduction. We would love to hear your thoughts and opinions, and we highly encourage discussion in the comments below!
