Referring to American Presidents

[This article was first published on Jason Timm, and kindly contributed to R-bloggers].


In a previous post, I looked at how House Representatives referred to the 45th US President on Twitter during the 116th Congress. The variation was super interesting, and demonstrated (in part) that party affiliation influenced how House members referred to 45. Namely, Republicans more frequently used referring expressions that highlighted the status of the presidency, e.g., “POTUS” and “President Trump”, while Democrats more frequently used referring expressions that downplayed the status of the presidency, e.g., “Trump”, the “Trump Administration”, and “Donald Trump”.

So, is this pattern a Trump-presidency anomaly, or is it par for the course? In other words, do lawmakers (or parties in the aggregate) change how they refer to sitting US Presidents depending on whether or not they are supportive of the President and the President’s policies (i.e., share the President’s party affiliation)? To investigate, we consider a collection of e-newsletters from US lawmakers from 2009 to the present day, spanning seven congresses and three presidencies. Results are intuitive, and fairly comparable to the findings of the previous post.

Data set

The data set used here is a collection of e-newsletters sent out by US lawmakers from 2009 onward, made available at DCinbox: roughly 150K newsletters amounting to ~100 million words.

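library(dplyr)  # %>%, mutate(), left_join(), etc. are used throughout

# DCinbox export: one row per e-newsletter
cpe <- read.csv('dcinbox_export.csv')
cpe <- janitor::clean_names(cpe)   # snake_case column names
cpe$doc_id <- seq_len(nrow(cpe))   # unique id per newsletter
cpe$text <- cpe$body               # newsletter text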

Referring to American Presidents

So, there are lots of ways to refer to American presidents. Example (1) below presents a generic cline based on American English honorifics and political convention: expressions to the left of the cline are more deferential to the status of the office; those to the right, less so. We can abide by these conventions or flout them; both communicate. Example (2) illustrates this cline using America’s current president as an example.

  1. President_FIRST_LAST > President_LAST > the president > FIRST_LAST > LAST_Administration > LAST

  2. President Joe Biden > President Biden > the president > Joe Biden > the Biden Administration > Biden
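As a small illustration of the cline, the hypothetical helper `cline_forms()` below (not part of the original workflow) generates the six forms for any “FIRST LAST” name, ordered from most to least deferential. It assumes the surname is everything after the last space, mirroring the `gsub('^.* ', '', ...)` trick used in the dictionary code.

```r
# Hypothetical helper (not from the original post): the six forms on the
# cline for one president, most to least deferential
cline_forms <- function(full_name) {
  last <- sub('^.* ', '', full_name)  # surname = text after the last space
  c(President_FIRST_LAST = paste('President', full_name),
    President_LAST       = paste('President', last),
    the_president        = 'the president',
    FIRST_LAST           = full_name,
    LAST_Administration  = paste('the', last, 'Administration'),
    LAST                 = last)
}

cline_forms('Joe Biden')
```

The output reproduces Example (2); `cline_forms('Barack Obama')` gives the same cline for the 44th President.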

The inclusion of “Biden Administration” (generically, “LAST_Administration”) is perhaps odd, as it more generally refers to the executive branch. But it is an interesting variant, and one that has become more frequent among US lawmakers. Note: “POTUS”, “Mr. President”, and (e.g.) “Mr. Biden” are not especially well attested in the e-newsletter corpus, so they are not included here.

Below, we construct a dictionary of referring expressions for the last three American presidents. This allows us to make referring expressions generic across administrations, as well as to filter references to sitting presidents only. In other words, we want to ignore references to former President Obama during the Trump presidency, for example.
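The core of the filtering idea can be sketched in base R on toy data: a reference only counts if its (token, congress) pair appears in the dictionary. Here `merge()` performs an inner join, swapped in for the dplyr joins used in the post; the documents and the single-form `lookup` table are hypothetical, while the congress ranges follow the post’s `congs` list.

```r
# Dictionary rows for the LAST form only; congress ranges follow `congs`
lookup <- data.frame(
  token    = rep(c('OBAMA', 'TRUMP', 'BIDEN'), times = c(4, 2, 1)),
  congress = c(111:114, 115:116, 117),
  gen      = 'LAST'
)

# Hypothetical extracted references: one token per document
toy_refs <- data.frame(
  doc_id   = 1:4,
  congress = c(112, 116, 116, 117),
  token    = c('OBAMA', 'TRUMP', 'OBAMA', 'BIDEN')
)

# Inner join on token AND congress: doc 3 ("Obama" during the 116th
# Congress) has no match, so references to a former president drop out
sitting <- merge(toy_refs, lookup, by = c('token', 'congress'))
sitting[order(sitting$doc_id), c('doc_id', 'congress', 'token', 'gen')]
```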

presidents <- c('Barack Obama', 'Donald Trump', 'Joe Biden')
lnames <- gsub('^.* ', '', presidents)  # surnames

# one list element per generic form on the cline
refs <- list(President_FIRST_LAST = paste0('President ', presidents),
             President_LAST = paste0('President ', lnames),
             FIRST_LAST = presidents,
             LAST = lnames,
             LAST_Administration = c(paste0(lnames, ' Administration'),
                                     paste0(lnames, "'s Administration")),
             the_president = 'the president')

# congresses during which each president was sitting
congs <- list(OBAMA = c(111:114),
              TRUMP = c(115:116),
              BIDEN = c(117),
              GENERIC = c(111:117))

# president label for each of the 19 dictionary entries, in stack() order;
# 'the president' (GENERIC) applies to whoever is sitting
pres <- c(rep(toupper(refs$LAST), 6), 'GENERIC')

# long-format dictionary: one row per token x congress in which it is valid
pres_forms <- cbind(pres, stack(refs)) %>%
  rename(token = values, gen = ind) %>%
  left_join(stack(congs), by = c('pres' = 'ind')) %>%
  rename(congress = values) %>%
  mutate(token = toupper(gsub(' ', '_', token)))

# dictionary entries with spaces restored, for multi-word matching
presx0 <- gsub('_', ' ', unique(pres_forms$token))

A portion of the dictionary:

pres_forms %>% head() %>% knitr::kable()
pres token gen congress

Term extraction

The workflow for identifying/extracting relevant referring expressions is summarized as follows: tokenize each newsletter, combine tokens comprising multi-word referring expressions, filter tokens to only those included in our dictionary, and cast tokens to a data frame.
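A self-contained, base-R sketch of those four steps on two made-up sentences follows. It stands in for the text2df and quanteda calls in the post and differs in one respect: it joins multi-word expressions by string replacement before splitting into tokens, replacing longer expressions first (a simple heuristic for overlapping phrases). Because tokens are compared whole, a merged token such as PRESIDENT_BIDEN is not also counted as BIDEN. The helper name `extract_refs()` and the toy data are hypothetical.

```r
# Hypothetical toy newsletters (doc_id, text), mirroring the post's columns
toy_docs <- data.frame(
  doc_id = c('1', '2'),
  text   = c('President Biden signed it. The Biden Administration agreed.',
             'Biden said so, and the president agreed.')
)
# Hypothetical upper-case dictionary, in the style of the post's `presx0`
dict <- c('PRESIDENT BIDEN', 'BIDEN ADMINISTRATION', 'THE PRESIDENT', 'BIDEN')

extract_refs <- function(docs, dict) {
  # multi-word expressions, longest first
  mwe <- dict[grepl(' ', dict)]
  mwe <- mwe[order(-nchar(mwe))]
  keep <- gsub(' ', '_', dict)
  out <- lapply(seq_len(nrow(docs)), function(i) {
    txt <- toupper(docs$text[i])
    # 1. join each multi-word expression into one underscore token
    for (m in mwe) txt <- gsub(m, gsub(' ', '_', m), txt, fixed = TRUE)
    # 2. tokenize on runs of characters other than letters, digits, _ and '
    toks <- unlist(strsplit(txt, "[^A-Z0-9_']+"))
    # 3. keep only dictionary tokens (whole-token matches)
    toks <- toks[toks %in% keep]
    # 4. cast to a data frame with one row per token
    data.frame(doc_id = rep(docs$doc_id[i], length(toks)), token = toks)
  })
  do.call(rbind, out)
}

ref_df <- extract_refs(toy_docs, dict)
# count each expression per document, as in the post's `pdata` step
counts <- aggregate(n ~ doc_id + token, data = transform(ref_df, n = 1), FUN = sum)
counts
```

Each toy document yields two references, each counted once.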

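text_fn <- function(x){
    # tokenize each newsletter; x is a data frame with doc_id and text columns
    x2 <- text2df::tif2token(x)
    # combine tokens comprising multi-word referring expressions
    x3 <- text2df::token2mwe(tok = x2,
                             mwe = subset(presx0, grepl(' ', presx0)))
    x3 <- quanteda::as.tokens(x3)
    # keep only tokens in the dictionary (case-insensitive by default)
    x4 <- quanteda::tokens_select(x3,
                                  pattern = gsub(' ', '_', presx0),
                                  selection = 'keep')
    # cast to a data frame; assumes document names carry the doc_id
    data.frame(doc_id = rep(quanteda::docnames(x4), quanteda::ntoken(x4)),
               token = unlist(as.list(x4), use.names = FALSE))
}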

We can then distribute this workflow across multiple cores:

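# split documents into batches; the batch size of 1,000 documents is an
# assumption, as the original grouping argument is cut off
batches <- split(cpe[, c('doc_id', 'text')],
                 ceiling(seq_len(nrow(cpe)) / 1000))

clust <- parallel::makeCluster(7)
parallel::clusterExport(cl = clust,
                        varlist = c('batches', 'presx0'),
                        envir = environment())
anno2 <- pbapply::pblapply(X = batches,
                           FUN = text_fn,
                           cl = clust)
parallel::stopCluster(clust)  # release the worker processes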

pdata <- anno2 %>%
  bind_rows() %>%
  mutate(token = toupper(token)) %>%
  group_by(doc_id, token) %>%
  count() %>% ungroup()

Historical reference to the US President

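pdata0 <- pdata %>%
  mutate(doc_id = as.integer(doc_id)) %>%
  # attach lawmaker metadata; this column list is inferred from the
  # columns used downstream, since part of the original call is missing
  left_join(subset(cpe,
                   select = c(doc_id, bio_guide_id, chamber, congress,
                              party))) %>%
  # joins on token AND congress, so only references to the sitting
  # president receive a `gen` label
  left_join(pres_forms) %>%
  filter(!gen %in% c('POTUS', 'Mr_LAST')) %>%
  filter(!is.na(gen))  # drop references to non-sitting presidents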

Here, we compute the relative frequency with which each US lawmaker uses each referring expression per congress.

pdata1 <- pdata0 %>%
  group_by_at(vars(-doc_id, -n, -token)) %>% 
  summarize(n = sum(n)) %>%
  group_by(chamber, congress, bio_guide_id) %>% 
  mutate(pres_ref = sum(n)) %>%
  ungroup() %>%
  mutate(per = round(n/pres_ref, 3)) %>%
  left_join(pres_forms %>% filter(pres != 'GENERIC') %>%
              select(congress, pres) %>% distinct() %>% rename(label = pres)) %>%
  filter(party != 'Independent') 

As an example, relative frequencies are presented below for Marsha Blackburn, Republican Senator from Tennessee, during the 117th Congress. The Senator has referenced the 46th POTUS 157 times in her e-newsletters. She has referred to 46 as simply “Biden” a plurality (33.1%) of the time, and uses the expression “Biden Administration” in 24.2% of total references. In only 17.8% of references does the Senator call 46 “President Biden”.

pdata1 %>%
  filter(bio_guide_id == 'B001243' & congress == 117) %>%
  select(gen:per) %>%
  knitr::kable()
gen n pres_ref per
President_FIRST_LAST 10 157 0.064
President_LAST 28 157 0.178
FIRST_LAST 26 157 0.166
LAST 52 157 0.331
LAST_Administration 38 157 0.242
the_president 3 157 0.019

In contrast, Senator Blackburn referred to the 45th POTUS as “President Trump” in 62.7% of references-to-president during the 116th Congress.

pdata1 %>%
  filter(bio_guide_id == 'B001243' & congress == 116) %>%
  select(gen:per) %>%
  knitr::kable()
gen n pres_ref per
the_president 21 102 0.206
President_FIRST_LAST 3 102 0.029
President_LAST 64 102 0.627
LAST 8 102 0.078
LAST_Administration 6 102 0.059
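The `per` column is simply each count divided by the lawmaker’s total references to the sitting president in that congress. As a check, the base-R sketch below recomputes both of Senator Blackburn’s tables from their `n` columns with `ave()`; the post’s own code does the same thing with dplyr, grouping by chamber, congress, and lawmaker.

```r
# Counts copied from the two tables for Senator Blackburn (117th, 116th)
bb <- data.frame(
  congress = rep(c(117, 116), times = c(6, 5)),
  gen = c('President_FIRST_LAST', 'President_LAST', 'FIRST_LAST', 'LAST',
          'LAST_Administration', 'the_president',
          'the_president', 'President_FIRST_LAST', 'President_LAST', 'LAST',
          'LAST_Administration'),
  n = c(10, 28, 26, 52, 38, 3,
        21, 3, 64, 8, 6)
)

# pres_ref: total references to the sitting president within each congress
bb$pres_ref <- ave(bb$n, bb$congress, FUN = sum)
# per: share of those references made with each expression
bb$per <- round(bb$n / bb$pres_ref, 3)
bb
```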
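
# median relative frequency across lawmakers, by chamber, congress,
# party, and referring expression
pdata2 <- pdata1 %>%
  group_by(chamber, congress, party, gen, label) %>%
  summarize(per = median(per)) %>% ungroup()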

The plot below illustrates the median relative frequency for each referring expression by congress (111-117) and party – for members of the House.

library(ggplot2)

pdata2 %>%
  filter(chamber == 'House') %>%
  ggplot(aes(x = congress, y = per, color = party)) +
  ggthemes::scale_color_stata() +
  # point shape marks the sitting president (label) in each congress
  geom_point(aes(shape = label), size = 2.5) +
  geom_line() +
  scale_x_continuous(breaks = seq(111, 117, 1)) +
  ggtitle('Referring to the sitting US President: House Representatives') +
  facet_wrap(~gen)


So, we see some intuitive variation in the prevalence of referring expressions by party, as a function of the sitting President’s party affiliation. When House Reps are supportive of the President (i.e., when a House Rep’s party is the President’s party), referring expressions that highlight the status of the presidential office are more prevalent, most notably the “President_LAST” variant.

House Reps that are aligned politically with the President have used “President_LAST” at a clip of roughly 50% over the last seven congresses (and three presidencies); this compares to ~25-30% prevalence of “President_LAST” for House Reps that are not aligned politically with the President. A similar pattern is attested (albeit to a lesser degree) in the case of “President_FIRST_LAST”.

When a House Rep’s party is not the party of the President, House members more frequently use “LAST_Administration” and “LAST” to refer to the President. The latter is the least deferential form, lacking any reference to title or office. The “LAST_Administration” pattern is curious, and its prevalence is increasing in both parties.

The most neutral referring expression investigated here, “the president”, has fallen out of use quite a bit in both parties since the 111th Congress, with the exception of a notable peak in usage among Dems in the 116th. One explanation is that, as a referring expression, “the president” is a wasted opportunity to communicate stance to one’s audience.

A quick look at the same data for the Senate reveals strikingly similar patterns over the last seven congresses.

An aggregate perspective

A slightly different perspective is presented in the plot below, which summarizes overall prevalences (relative frequencies pooled across all US lawmakers) by party and congress. Perhaps not surprisingly, a quick glance at these profiles sheds light on the political affiliation of the sitting President in each congress.

pdata0 %>%
  group_by(congress, party, gen) %>% 
  summarize(n = sum(n)) %>%
  group_by(congress, party) %>% 
  mutate(pres_ref = sum(n)) %>%
  ungroup() %>%
  mutate(per = round(n/pres_ref, 3)) %>%
  mutate(gen = ordered(gen, levels = c('President_FIRST_LAST',
                                       'President_LAST',
                                       'the_president',
                                       'FIRST_LAST',
                                       'LAST_Administration',
                                       'LAST'))) %>%
  filter(party != 'Independent') %>%
  ggplot(aes(x = congress, 
             y = per, 
             group = gen,
             fill = gen)) +
  geom_col(show.legend = TRUE,
           alpha = 0.85,
           width = .7) +
  scale_x_continuous(breaks=seq(111, 117, 1)) +
  facet_wrap(~party, ncol = 1)  +
  ggtitle('Referencing the sitting US President: US Lawmakers')
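Note that the House plot and this aggregate plot summarize slightly different quantities. The House plot takes the median of per-lawmaker shares, so every lawmaker counts equally; the aggregate code above sums counts by party and congress before dividing, which is equivalent to a mean of per-lawmaker shares weighted by each lawmaker’s total references. A toy base-R sketch with three hypothetical lawmakers shows the difference:

```r
# Hypothetical counts for three lawmakers of one party in one congress:
# n = uses of "President_LAST", pres_ref = all references to the president
toy <- data.frame(lawmaker = c('A', 'B', 'C'),
                  n        = c(5, 30, 1),
                  pres_ref = c(10, 40, 20))
toy$per <- toy$n / toy$pres_ref          # per-lawmaker shares: 0.50, 0.75, 0.05

# Median of per-lawmaker shares (the House plot's statistic)
median(toy$per)                          # 0.5
# Pooled share (the aggregate plot's statistic)
sum(toy$n) / sum(toy$pres_ref)           # 36/70, about 0.514
# The pooled share equals the mean of shares weighted by pres_ref
weighted.mean(toy$per, toy$pres_ref)
```

Prolific newsletter writers therefore pull the aggregate profiles toward their own usage.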


So, there are some interesting patterns in how lawmakers have referred to US Presidents over the last seven congresses. Most lawmakers use all of the variants investigated here, and the exigencies of text constraints and information flow will often dictate, to some extent, which referring expression is used; we have not considered these factors here. That said, when viewed in the aggregate, the patterns described here demonstrate how lawmakers exploit lexical variation (in the honorific system) to purposefully communicate stance with readers of their newsletters. And, importantly, as lawmaker support for (or stance towards) a President changes (per an election), so too does their choice of referring expressions.
