On our road to detecting metaphors, we got stuck on a seemingly simple problem: compound nouns.
Take the phrase:
series of immigration policy changes
Here “series” modifies “changes”, and “immigration policy” is a compound noun that specifies which changes are meant.
“Series of changes” is not what we would consider metaphorical usage, but our detector would label “series of immigration” as potentially metaphorical because the pairing is so unusual. Identifying compound nouns, and identifying which specific word is being modified (and is thus the “target” concept in the metaphor), are both critical to improving performance.
But we realized we didn’t want to throw away the extra information that compound nouns carry. Enter collocations:
“a sequence of words or terms that co-occur more often than would be expected by chance”
Using the same corpus that we’ve been using (which contains news articles, social media posts, and TV transcripts), we calculated the most prominent collocations containing “immigrant”, “immigration”, and “migration”.
| prefix | suffix | prefix frequency | suffix frequency | co-occurrence |
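A minimal sketch of how a table with these columns could be computed from a tokenized corpus. The function name `key_collocations`, the `min_count` threshold, and the ranking by pointwise mutual information (PMI) are assumptions made for illustration; the calculation behind the table above may have used a different association measure. The toy sentence at the bottom is made up and is not drawn from our corpus.

```python
import math
from collections import Counter

# Key words from the post; everything else here is an illustrative assumption.
KEY_WORDS = {"immigrant", "immigration", "migration"}


def key_collocations(tokens, key_words=KEY_WORDS, min_count=2):
    """Return adjacent word pairs containing a key word, ranked by PMI.

    Each row is (prefix, suffix, prefix frequency, suffix frequency,
    co-occurrence count, PMI), where prefix and suffix are the first and
    second word of the pair.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    rows = []
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        if first not in key_words and second not in key_words:
            continue  # keep only pairs that involve a key word
        # PMI = log2( P(first, second) / (P(first) * P(second)) ),
        # approximating every probability as count / total.
        pmi = math.log2(count * total / (unigrams[first] * unigrams[second]))
        rows.append((first, second, unigrams[first], unigrams[second], count, pmi))

    # A higher PMI means the pair co-occurs more often than chance predicts.
    return sorted(rows, key=lambda row: row[-1], reverse=True)


if __name__ == "__main__":
    toy = ("the immigration policy changed and immigration policy stalled "
           "while chain migration and chain migration grew").split()
    for row in key_collocations(toy):
        print(row)
```

A positive PMI corresponds to the definition quoted above: the pair appears together more often than its individual word frequencies would predict.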
In graphical form, here’s what the information looks like:
The blue lines indicate that the word is a prefix to one of the key source words (e.g. “chain migration”, “unaccompanied immigrant”), and the green lines indicate that it is a suffix (e.g. “immigration policy”, “immigration reform”).
What stands out to us is how little the collocations of the three words overlap (except for “illegal”, which is strongly associated with all three). This is especially surprising for “migration” and “immigration”, which are both abstract nouns.
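One way to put a number on this overlap is the Jaccard similarity between the sets of collocates for each pair of key words. The sketch below is an illustration only: the helper names `collocate_sets` and `pairwise_jaccard` are hypothetical, and the example pairs are a hand-picked toy list based on the phrases mentioned in this post, not our measured results.

```python
from itertools import combinations


def collocate_sets(bigrams, key_words):
    """Map each key word to the set of words seen directly next to it."""
    sets = {key: set() for key in key_words}
    for first, second in bigrams:
        if first in sets:
            sets[first].add(second)  # second word is a suffix of the key word
        if second in sets:
            sets[second].add(first)  # first word is a prefix of the key word
    return sets


def pairwise_jaccard(sets):
    """Jaccard similarity (shared / combined collocates) for each pair of keys."""
    scores = {}
    for a, b in combinations(sorted(sets), 2):
        union = sets[a] | sets[b]
        scores[(a, b)] = len(sets[a] & sets[b]) / len(union) if union else 0.0
    return scores


if __name__ == "__main__":
    # Toy input for illustration; not taken from the corpus.
    toy_bigrams = [
        ("illegal", "immigration"), ("illegal", "immigrant"),
        ("illegal", "migration"), ("immigration", "policy"),
        ("chain", "migration"), ("unaccompanied", "immigrant"),
    ]
    sets = collocate_sets(toy_bigrams, {"immigrant", "immigration", "migration"})
    for pair, score in pairwise_jaccard(sets).items():
        print(pair, round(score, 2))
```

A score near 0 means two key words share almost none of their collocates, and a score of 1 means they share all of them.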