Getting Your Comparison Groups Right

[This article was first published on R on Sam Portnow's Blog, and kindly contributed to R-bloggers.]

When presented with evidence of racism, critics often respond that racism does not exist, at least not in a meaningful way, because many immigrant groups have higher household incomes than White Americans. These critics often point to the median household income of Nigerian Americans as irrefutable proof that racism is a thing of the past. However, comparing the income of Nigerian Americans with that of White Americans is not the right comparison. The correct comparison is between Nigerian Americans and other immigrant groups: do Nigerian Americans make more than other immigrant groups, including White immigrant groups? Fortunately, we can investigate these claims easily with Census data. The Census Bureau publishes a Selected Population Profile as part of the American Community Survey; the table includes information on household income for a variety of population groups. I downloaded the data to a csv file, which I’ll use to make the appropriate comparisons.

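# tidyverse supplies read_csv, dplyr, stringr and ggplot2; here builds the file path
library(tidyverse)
library(here)

census = read_csv(here('content', 'post_data', 'ACS_17_1YR_S0201_with_ann.csv'))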
## Parsed with column specification:
## cols(
##   .default = col_character()
## )
## See spec(...) for full column specifications.
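# Keep the group label, median household income (VC319) and percent with at least a bachelor's degree (VC147), plus their margins of error
census = census %>% select(`POPGROUP.display-label`, EST_VC319, MOE_VC319, EST_VC147, MOE_VC147)
# Drop the first row, which holds descriptive column labels rather than data
census = census[-1,]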
census %>% glimpse()
## Observations: 77
## Variables: 5
## $ `POPGROUP.display-label`  "Somali (568)", "Cajun (936-938)", "Afghan (…
## $ EST_VC319                 "26802", "59272", "40954", "70881", "58581",…
## $ MOE_VC319                 "1804", "6213", "5304", "5889", "1817", "275…
## $ EST_VC147                 "10.0", "26.8", "30.7", "38.2", "46.7", "39.…
## $ MOE_VC147                 "3.3", "3.8", "5.6", "3.3", "1.1", "3.3", "3…
census = census %>% mutate_at(vars(EST_VC319:MOE_VC147), as.numeric)
census = census %>% rename('Population Group' = `POPGROUP.display-label`)
census = census %>% mutate(`Population Group` = str_replace_all(`Population Group`, ' \\(.*\\).*', ''))
census = census %>%
  mutate(
    ymin = EST_VC319 - MOE_VC319,
    ymax = EST_VC319 + MOE_VC319
  )


# Remove observations with very large median incomes (over $250,000)
census = census %>% filter(EST_VC319 <= 250000)

# xlab() and ylab() label the x and y aesthetics, even after coord_flip() swaps where they are drawn
ggplot(census, aes(x = reorder(`Population Group`, EST_VC319), y = EST_VC319)) + 
  geom_pointrange(aes(ymin = ymin, ymax = ymax)) + 
  coord_flip() + 
  xlab('Population Group') + 
  ylab('Median Household Income') + 
  scale_y_continuous(labels = scales::dollar)

From this graph, we see that Nigerian Americans are actually in the bottom half of immigrant groups with regard to household income!
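The error bars in the graph matter when comparing two groups. As a minimal sketch, the snippet below rebuilds the intervals for the first three rows shown in the glimpse() output above and checks whether two of them overlap. The helper `intervals_overlap` is a hypothetical name introduced here for illustration, and non-overlap is only a rough check, not a formal test. The Census Bureau publishes ACS margins of error at the 90% confidence level.

```r
# First three rows shown in the glimpse() output above (2017 ACS 1-year, S0201).
groups <- data.frame(
  group  = c("Somali", "Cajun", "Afghan"),
  income = c(26802, 59272, 40954),  # EST_VC319: median household income
  moe    = c(1804, 6213, 5304)      # MOE_VC319: published margin of error
)

# Lower and upper bounds, built the same way as ymin and ymax in the post.
groups$lower <- groups$income - groups$moe
groups$upper <- groups$income + groups$moe

# Hypothetical helper: two intervals overlap when each starts before the other ends.
intervals_overlap <- function(a, b) {
  idx <- match(c(a, b), groups$group)
  lo <- groups$lower[idx]
  hi <- groups$upper[idx]
  lo[1] <= hi[2] && lo[2] <= hi[1]
}

cajun_vs_afghan <- intervals_overlap("Cajun", "Afghan")
print(groups)
print(cajun_vs_afghan)
```

When two intervals do not overlap, the difference between the two estimates is unlikely to be sampling noise alone; when they do overlap, the ordering in the graph should be read with more caution.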

Another point that critics make is that it’s all about education; Nigerian Americans have a higher income than White Americans because they have more education. It is true that Nigerian Americans do have more education. But again, we need the right comparison groups: do Nigerian Americans make more or less than other immigrant groups with similar levels of education?

First, we will build a model predicting income from percent of population with at least a bachelor’s degree.

lm.mod = lm(EST_VC319 ~ EST_VC147, data = census)
summary(lm.mod)
## 
## Call:
## lm(formula = EST_VC319 ~ EST_VC147, data = census)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -30052  -3484   1855   5200  18489 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  33229.8     4023.1   8.260 3.82e-12 ***
## EST_VC147      810.1       93.1   8.702 5.50e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8831 on 75 degrees of freedom
## Multiple R-squared:  0.5024, Adjusted R-squared:  0.4957 
## F-statistic: 75.72 on 1 and 75 DF,  p-value: 5.504e-13

We see that education explains about 50% of the variance in income (R-squared = 0.50), so education is clearly an important predictor of income. We can look at the residuals from this model, which will tell us whether Nigerian Americans make more or less than other immigrant groups with similar levels of education.
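To make the idea of a residual concrete, here is a rough sketch that applies the rounded coefficients from the summary(lm.mod) output above to the first three rows shown in the glimpse() output. Because the coefficients are rounded, the values will differ slightly from what resid(lm.mod) returns; the data frame `groups` and its column names are introduced here only for illustration.

```r
# Rounded coefficients from the summary(lm.mod) output above.
intercept <- 33229.8  # predicted income when 0% hold a bachelor's degree
slope     <- 810.1    # dollars per percentage point with a bachelor's degree

# First three rows shown in the glimpse() output earlier in the post.
groups <- data.frame(
  group    = c("Somali", "Cajun", "Afghan"),
  income   = c(26802, 59272, 40954),  # EST_VC319: median household income
  bachelor = c(10.0, 26.8, 30.7)      # EST_VC147: percent with a bachelor's degree
)

# Predicted income given education, and the gap between observed and predicted.
groups$predicted <- intercept + slope * groups$bachelor
groups$residual  <- groups$income - groups$predicted

print(groups)
```

A negative residual means the group's median household income is lower than the model predicts for its level of education; a positive residual means it is higher.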

census$residual = resid(lm.mod)

# Axis labels follow the aesthetics: x is the population group, y is the residual
ggplot(census, aes(x = reorder(`Population Group`, residual), y = residual)) + 
  geom_point() + coord_flip() +
  xlab('Population Group') + 
  ylab('Residual') + 
  scale_y_continuous(labels = scales::dollar)

From the chart above, we see that Nigerian Americans make approximately $15,000 less than expected based on their level of education! With the appropriate comparison groups, the success of Nigerian Americans is a lot less convincing as evidence that racism is no longer meaningful in America.
