Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Exploring Explanations Of the Black-White wealth gap

The black-white wealth gap is nearly 15:1; the mean of black household wealth is $138,200; for white houesholds it is$933,700.

A frequent explanation has been single-parent families in the Black community (1, 2 3) or culture (1, 2, 3, 4, 5)

Fortunately, we can test these claims. I am going to use the Add Health to investigate them.

The National Longitudinal Study of Adolescent to Adult Health (Add Health) is a longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the United States during the 1994-95 school year.

# get wave files

The first thing I’ll do is get the files from each wave.

dirs = dir_ls(here('content','data'), type = 'directory')

wave1_dirs = dirs[1:4]
wave2_dirs = dirs[5:7]
wave3_dirs = dirs[c(8, 16:18, 20:21)]
wave4_dirs = dirs[c(22, 21)]

wave1_files = dir_ls(wave1_dirs, glob = '*.rda')
wave2_files = dir_ls(wave2_dirs, glob = '*.rda')
wave3_files = dir_ls(wave3_dirs, glob = '*.rda')
wave4_files = dir_ls(wave4_dirs, glob = '*.rda')

# function to read rda to list

rda2list <- function(file) {
e <- new.env()
as.list(e)
}

wave1_dfs <- map(wave1_files, rda2list)
wave1_dfs <- map(wave1_dfs, ~ as_tibble(.[]))
wave1_df = wave1_dfs %>% reduce(full_join)
## Joining, by = "AID"
## Joining, by = "AID"
## Joining, by = "AID"
wave2_dfs <- map(wave2_files, rda2list)
wave2_dfs <- map(wave2_dfs, ~ as_tibble(.[]))
wave2_df = wave2_dfs %>% reduce(full_join)
## Joining, by = "AID"
## Joining, by = "AID"
wave3_dfs <- map(wave3_files, rda2list)
wave3_dfs <- map(wave3_dfs, function(x){
d = as_tibble(x[])
if ('RRELNO' %in% names(d)){
d$RRELNO = as.factor(d$RRELNO)
}
if ('value' %in% names(d)){
d$value = as.character(d$value)
}
d
})

wave3_df = wave3_dfs %>% reduce(full_join)
## Joining, by = "AID"
## Warning: Column AID joining factors with different levels, coercing to
## character vector
## Joining, by = "AID"
## Warning: Column AID joining character vector and factor, coercing into
## character vector
## Joining, by = "AID"
## Warning: Column AID joining character vector and factor, coercing into
## character vector
## Joining, by = "AID"
## Warning: Column AID joining character vector and factor, coercing into
## character vector
## Joining, by = "AID"
## Warning: Column AID joining character vector and factor, coercing into
## character vector
wave4_dfs <- map(wave4_files, rda2list)
wave4_dfs <- map(wave4_dfs, ~ as_tibble(.[]))
wave4_df = wave4_dfs %>% reduce(full_join)
## Joining, by = "AID"
## Warning: Column AID joining factors with different levels, coercing to
## character vector

# bind them all together

wave1_df$wave = 1 wave2_df$wave = 2
wave3_df$wave = 3 wave4_df$wave = 4

df = bind_rows(wave1_df, wave2_df, wave3_df, wave4_df)
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector, coercing
## into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector, coercing
## into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector, coercing
## into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector, coercing
## into character vector

# Variables to Look at

## Race/Ethnicity

Which ONE category best describes your racial background?

H1GI6A - H1GI6E

## Income

Now think about your personal earnings. In {2006/2007/2008}, how much income did you receive from personal earnings before taxes, that is, wages or salaries, including tips, bonuses, and overtime pay, and income from self-employment?

H4EC2 (W4)

I’m going to look at variables that best approximate the reasons for racial disparities mentioned above. I’ll focus on Impulsivity, Conscientiousness, Intelligence, and Parental Marital Status. I chose Parental Marital Status because it’s so frequently offered as the reason for racial disparities. I choses impulsivity, conscientiousness, and intelligence because I think they approximate the cultural explanations – that if one works hard and is smart, then there won’t be any racial disparities.

## Impulsivity:

When making decisions, you usually go with your “gut feeling” without thinking too much about the consequences of each alternative.

• H1PF16 (W1)
• H2PF15 (W2)
• H3SP18 (W3)
• H4PE34 (W4)

## Conscientiousnes:

These variables are taken (from the following article)[https://www.frontiersin.org/articles/10.3389/fpsyg.2011.00158/full]

• H1PF18 (Pay attention to details)
• H1PF19 (Come up with good solutions)
• H1PF20 (Do things according to a plan)
• H1PF21 (Do more than what’s expected of me)

AH_PVT (W1)

C4WD90_1 (W4)

PA10

# Data Manipulation

Now, I’ll move onto cleaning up the dataset.

## Construct Race

df = df %>%
mutate(
race = case_when(
H1GI6B == '(1) (1) Marked' ~ 'black',
H1GI6C == '(1) (1) Marked' ~ 'native american',
H1GI6D == '(1) (1) Marked' ~ 'asian or pacific islander',
H1GI6E == '(1) (1) Marked' ~ 'other',
H1GI6A == '(1) (1) Marked' ~ 'white',
)
)

# fill in race for missing waves
df = df %>% group_by(AID) %>%
mutate(
race = if_else(any(! is.na(race)), paste0(unique(race), collapse = '|'), NA_character_)
)

# for some reason the na detection wasnt working correctly above, so im just str_replacing the NAs
df$race = str_replace(df$race, '\\|.*', '')

## construct conscientiousness

df = df %>%
mutate_at(vars(H1PF18:H1PF21), as.numeric)

df$conscientiousness = 5 - rowMeans(df[,c('H1PF18', 'H1PF19', 'H1PF20', 'H1PF21')], na.rm = T) Below, I make a function to plot the data. For each plot, I am going to plot the distribution of White and Black people’s earnings, and I am going to break those distribution out by categories (e.g., parents married/not married, or the 0, 10%, 30%, 50%, 70%, 90%, and 100% quantiles of a continuous variable ) plot_it <- function(value){ probs <- c(0, .1, .3, .5, .7, .9, 1) df.plot$quartile <- with(df.plot, cut(value, breaks = unique(quantile(value, probs, na.rm=TRUE)),
include.lowest=TRUE))
### getting rid of outliers
df.plot = df.plot %>% filter(H4EC2 < 100000)
df.plot$H4EC2 = df.plot$H4EC2/10000
ggplot(df.plot, aes(x = H4EC2, y = race, fill = race)) +
stat_density_ridges(quantile_lines = TRUE, quantiles = 2) +
scale_x_continuous(breaks = seq(0, 10, by = .5)) +
facet_wrap(~ quartile, scales = 'free', ncol = 1)

}

## get wave 1 variables joined in with wave 4 variables

df = df %>%
mutate(
married = if_else(PA10 == '(2) (2) Married (skip to A12)', 'married', 'not married')
)

df.wave1 = df %>%
filter(wave == 1 &race %in% c('black', 'white') & ! is.na(married)) %>%
select(AID, race, married)
df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2)
df.plot = left_join(df.wave1, df.outcome)
## Joining, by = "AID"

# Data Analysis

look at earnings in Wave 4 (H4EC2)

For each graph, income is measured in 10,000’s. so 2.5 = $25,000 ## by marital status df.plot = df.plot %>% filter(H4EC2 < 100000) df.plot$H4EC2 = df.plot$H4EC2/10000 ggplot(df.plot, aes(x = H4EC2, y = race, fill = race)) + stat_density_ridges(quantile_lines = TRUE, quantiles = 2) + scale_x_continuous(breaks = seq(0, 10, by = .5)) + facet_wrap(~ married, scales = 'free', ncol = 1) ## Picking joint bandwidth of 0.46 ## Picking joint bandwidth of 0.509 ## by conscientiousness df.wave1 = df %>% filter(wave == 1 &race %in% c('black', 'white') & ! is.na(conscientiousness)) %>% select(AID, race, conscientiousness) df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2) df.plot = left_join(df.wave1, df.outcome) ## Joining, by = "AID" plot_it(df.plot$conscientiousness)
## Picking joint bandwidth of 0.606
## Picking joint bandwidth of 0.566
## Picking joint bandwidth of 0.475
## Picking joint bandwidth of 0.557
## Picking joint bandwidth of 0.752 ## by peabody picture vocabulary

df.wave1 = df %>%
filter(wave == 1 &race %in% c('black', 'white') & ! is.na(AH_PVT)) %>%
select(AID, race, AH_PVT)
df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2)
df.plot = left_join(df.wave1, df.outcome)
## Joining, by = "AID"
plot_it(df.plot$AH_PVT) ## Picking joint bandwidth of 0.594 ## Picking joint bandwidth of 0.498 ## Picking joint bandwidth of 0.518 ## Picking joint bandwidth of 0.538 ## Picking joint bandwidth of 0.544 ## Picking joint bandwidth of 0.684 ## word recall df.wave4 <- df %>% filter(wave == 4 &race %in% c('black', 'white') & ! is.na(C4WD90_1)) %>% select(AID, race, C4WD90_1) df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2) df.plot = left_join(df.wave4, df.outcome) ## Joining, by = "AID" plot_it(df.plot$C4WD90_1)
## Picking joint bandwidth of 0.545
## Picking joint bandwidth of 0.481
## Picking joint bandwidth of 0.534
## Picking joint bandwidth of 0.614
## Picking joint bandwidth of 0.686
## Picking joint bandwidth of 0.708 ## by impulsivity

df.wave1 = df %>%
filter(wave == 1 &race %in% c('black', 'white') & ! is.na(H1PF16)) %>%
select(AID, race, H1PF16)
df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2)
df.plot = left_join(df.wave1, df.outcome)
## Joining, by = "AID"
df.plot$H1PF16 = as.numeric(df.plot$H1PF16)
plot_it(df.plot$H1PF16) ## Picking joint bandwidth of 0.475 ## Picking joint bandwidth of 0.531 ## Picking joint bandwidth of 0.509 ## Picking joint bandwidth of 0.745 In summary, it’s clear that none of the variables come close to explaining the racial disparities in income. While it is true that some variables (particularly intelligence), reduce disparities at the higher ends of the distributions, White participants in ADD Health at the middle and lower ends of the disribution outpaced their Black counterparts by approximately$5,000 - \$10,000. Another important point to add is that I’m only looking at income; to eliminate racial wealth gaps, Black participants in the study would have to earn more than their White counterparts to have any hope of eliminating those disparities.