Exploring the Gender Pay Gap in Malaysia

[This article was first published on Zahier Nasrudin, and kindly contributed to R-bloggers].

Purpose/Objective

This post explores the gender pay gap in Malaysia across several dimensions: state, strata, ethnicity, education, occupation, age group, and industry. Using data from the Department of Statistics Malaysia (DOSM), we analyze and visualize the differences in median salaries between male and female workers.

It is vital to note that this blog post is not intended to be political and does not take into account other factors that may contribute to the gender pay gap. While our analysis may show a clear disparity between men’s and women’s pay in certain dimensions, many other complex factors contribute to the gap.

Note that each graph is interactive, so you can hover over each bar/line chart to see the exact values and other details.

Load libraries

Show code
library(tidyverse)
library(ggiraph)
library(glue)

theme_set(theme_minimal(base_size = 7))

Read datasets

This blog post is based on publicly available data from the Department of Statistics Malaysia (DOSM).

Show code
## Links of dataset
link_salary <- c("https://storage.googleapis.com/dosm-public-economy/salaries_state_sex.csv",
                 "https://storage.googleapis.com/dosm-public-economy/salaries_industry_sex.csv",
                 "https://storage.googleapis.com/dosm-public-economy/salaries_ethnicity_sex.csv",
                 "https://storage.googleapis.com/dosm-public-economy/salaries_education_sex.csv",
                 "https://storage.googleapis.com/dosm-public-economy/salaries_occupation_sex.csv",
                 "https://storage.googleapis.com/dosm-public-economy/salaries_strata_sex.csv",
                 "https://storage.googleapis.com/dosm-public-economy/salaries_age_sex.csv")


## Read all
median_salary_all <- map_df(link_salary, ~ read.csv(.x) %>%
                              ## Remove overall calculation from datasets
                              filter(sex != "overall", variable_en != "Overall") %>%
                              ## Select only necessary columns
                              select(-c(variable_bm, variable,recipients)) %>%
                              ## Label each dataset with its category (State, Industry, etc.)
                              mutate(Remark = str_to_title(str_extract(.x, "(?<=salaries_)[a-z]+"))))
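
As a quick check of how the Remark label is derived, the sketch below applies the same regular expression to two of the URLs above. The lookbehind `(?<=salaries_)` matches the position right after `salaries_`, and `[a-z]+` then captures the lowercase letters up to the next underscore. The vector name `example_links` is only for illustration and is not part of the original code.

```r
library(stringr)

# Two of the dataset URLs used above (example_links is an illustrative name)
example_links <- c(
  "https://storage.googleapis.com/dosm-public-economy/salaries_state_sex.csv",
  "https://storage.googleapis.com/dosm-public-economy/salaries_age_sex.csv"
)

# Extract the word that follows "salaries_" and capitalise its first letter
remarks <- str_to_title(str_extract(example_links, "(?<=salaries_)[a-z]+"))
remarks
```

This yields "State" and "Age", which are the values later passed as `remark_var` to the plotting function.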

Create function for graph

This function simplifies filtering and plotting the datasets, which all share the same format (State, Ethnicity, etc.):

Show code
plot_median_salary_ratio <- function(data, remark_var, ncol) {
  
  # Create an interactive ggplot, faceted and colored by category (variable_en)
  data %>%
    ## Filter based on state, industry etc
    filter(Remark == remark_var) %>%
    ## Plot graph
    ggplot(aes(x = year, y = `Female/Male`, color = variable_en)) +
    ## Point chart
    geom_point_interactive(aes(tooltip = `Label Median`, 
                               data_id = variable_en)) +
    ## Line chart
    geom_line(size = 0.2) +
    facet_wrap(variable_en ~ ., scales = "free_x", ncol = ncol) +
    theme(legend.position = "none",
          title = element_text(face = "bold"),
          strip.text = element_text(face = "bold")) +
    xlab("Year") +
    ylab("Female to Male Median Salary Ratio") +
    labs(caption = "Data from DOSM. Graph by Zahier Nasrudin") +
    # Title
    ggtitle(paste0("Ratio of Female to Male Median Salary by Year and by ", remark_var, " in Malaysia")) +
    # add a horizontal line at the ratio of 1
    geom_hline(yintercept = 1, linetype = "dashed", color = "black", size= 0.2)
}

Analysis

In the first step of the analysis, the ratio of median salaries between female and male workers in Malaysia was calculated. This was done by dividing the median salary for female workers by the median salary for male workers in each year (2010-2021). The resulting ratio provides a measure of the gender pay gap in Malaysia, where values below 1 indicate that women earn less than men.

Show code
### Make it wider
median_salary_all_wider <- median_salary_all %>%
  pivot_wider(names_from = c(sex),
              values_from = c(mean, median))

### Calculate ratio
median_salary_all_wider <- median_salary_all_wider %>%
  mutate(`Female/Male` = round(median_female / median_male, 2),
         variable_en = str_to_upper(variable_en),
         `Label Median` = glue("\nMedian Female: {median_female}\nMedian Male: {median_male}\nRatio: {`Female/Male`} ({year})"))
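
To illustrate the reshaping and ratio steps on a small scale, here is a minimal sketch using made-up figures (they are not DOSM values). The table `toy_salary` and its numbers are assumptions for illustration; only the column layout mirrors the datasets read above.

```r
library(dplyr)
library(tidyr)

# Made-up figures for illustration only; these are not DOSM values
toy_salary <- tibble(
  variable_en = c("Urban", "Urban", "Rural", "Rural"),
  year        = c(2021, 2021, 2021, 2021),
  sex         = c("female", "male", "female", "male"),
  mean        = c(3000, 3400, 2000, 2300),
  median      = c(2400, 2500, 1500, 1600)
)

toy_ratio <- toy_salary %>%
  # One row per group and year, with columns such as median_female and median_male
  pivot_wider(names_from = sex, values_from = c(mean, median)) %>%
  # A ratio below 1 means the female median is lower than the male median
  mutate(`Female/Male` = round(median_female / median_male, 2))

toy_ratio
```

With these numbers the Urban ratio is 2400 / 2500 = 0.96 and the Rural ratio is 1500 / 1600, which rounds to 0.94.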

By State

The graph below displays the ratio of female to male median salary in Malaysia by state from 2010 to 2021. It shows how the pay gap between men and women has changed across the different states, which helps us identify the states with a wider or narrower gender pay gap and how that gap has evolved over the years.

Show code
median_graph_state <- plot_median_salary_ratio(data = median_salary_all_wider,
                                               remark_var = "State",
                                               ncol = 3)
  
girafe(ggobj = median_graph_state,
       ## width_svg and height_svg are arguments of girafe(), not rendering options
       width_svg = 8, height_svg = 6,
       options = list(
         opts_hover_inv(css = "opacity:0.1;"),
         opts_hover(css = "stroke-width:2;"),
         opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-state-download")
       ))

Strata

In this section, we analyze the gender pay gap by strata, which refers to the level of urbanization in Malaysia (urban and rural).

Show code
median_graph_strata <- plot_median_salary_ratio(data = median_salary_all_wider,
                                               remark_var = "Strata",
                                               ncol = 3)



girafe(ggobj = median_graph_strata,
       width_svg = 8, height_svg = 6,
       options = list(
         opts_hover_inv(css = "opacity:0.1;"),
         opts_hover(css = "stroke-width:2;"),
         opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-strata-download")
       ))

It is interesting to note that the ratio of median female to male salary in Malaysia is consistently below 1 for both strata and in every year. To make this easier to see, we also created a bar chart that shows the ratio for each stratum side by side over the years.

Show code
median_graph_strata2 <- median_salary_all_wider %>%
  filter(Remark == "Strata") %>%
  ggplot(aes(x = year, y = `Female/Male`, fill = variable_en)) +
  geom_bar_interactive(stat = "identity", position = "dodge",
                       aes(tooltip = `Label Median`, 
                               data_id = year)) +
  labs(x = "Year", y = "Ratio (Median Salary Female / Median Salary Male)",
       title = "Gender Pay Ratio by Strata in Malaysia",
       subtitle = "Values below 1 indicate women are paid less than men",
       fill = "Strata") +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold"),
        plot.subtitle = element_text(face = "italic")) +
  scale_x_continuous(breaks = unique(median_salary_all_wider$year))


girafe(ggobj = median_graph_strata2,
       width_svg = 8, height_svg = 6,
       options = list(
         opts_hover_inv(css = "opacity:0.1;"),
         opts_hover(css = "stroke-width:2;"),
         opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-strata-download2")
       ))

Ethnicity

Next, we explored the gender pay ratio by ethnic group in Malaysia. The dataset prepared by DOSM categorizes ethnicity into six levels: Bumiputera, Chinese, Citizen, Indian, Non-citizen, and Others. It is vital to note that these levels come from the data provided by DOSM and may not necessarily align with individuals’ self-identified ethnicities.

Show code
median_graph_ethnic <- plot_median_salary_ratio(data = median_salary_all_wider,
                                               remark_var = "Ethnicity",
                                               ncol = 3)


girafe(ggobj = median_graph_ethnic,
       width_svg = 8, height_svg = 6,
       options = list(
         opts_hover_inv(css = "opacity:0.1;"),
         opts_hover(css = "stroke-width:2;"),
         opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-ethnic-download")
       ))

Education

We also examined the gender pay gap by education level. There are four categories: no formal education, primary education, secondary education, and tertiary education.

Show code
median_graph_education <- plot_median_salary_ratio(data = median_salary_all_wider,
                                               remark_var = "Education",
                                               ncol = 2)

girafe(ggobj = median_graph_education,
       width_svg = 8, height_svg = 6,
       options = list(
         opts_hover_inv(css = "opacity:0.1;"),
         opts_hover(css = "stroke-width:2;"),
         opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-education-download")
       ))

Occupation

For the occupation analysis, we take a slightly different approach. Instead of using the ratio of median salaries, we calculate the pay gap as the difference between the median salary for women and the median salary for men, divided by the median salary for men. In simpler words, we calculate (median salary female / median salary male) - 1. This gives us a percentage difference, with negative values indicating that women earn less than men, and positive values indicating the opposite. We then plotted this pay gap by occupation over the years:

Show code
median_occupation <- median_salary_all_wider %>%
  filter(Remark == "Occupation") %>%
  mutate(pay_gap = round(median_female/median_male-1, 3),
         `Label Median Diff` = glue("\nMedian Female: {median_female}\nMedian Male: {median_male}\nDiff: {pay_gap * 100}% ({year})"))

median_graph_occupation1 <- median_occupation %>%
  ggplot(aes(x = year, y = pay_gap, fill = factor(sign(pay_gap)))) +
  geom_col_interactive(position = "dodge", aes(tooltip = `Label Median Diff`, 
                               data_id = variable_en)) +
  ## Name the colors by sign so a zero gap cannot shift the mapping
  scale_fill_manual(values = c("-1" = "red", "0" = "grey50", "1" = "blue")) +
  labs(x = "Year", y = "Pay Gap",
       title = "Gender Pay Gap by Occupation in Malaysia",
       subtitle = "Negative values indicate women are paid less than men",
       fill = "Pay Gap") +
  facet_wrap(variable_en ~., ncol = 2) +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 90, vjust = 0.5),
        plot.title = element_text(face = "bold"),
        plot.subtitle = element_text(face = "italic")) +
  scale_x_continuous(breaks = unique(median_occupation$year)) +
  scale_y_continuous(labels = scales::percent)


girafe(ggobj = median_graph_occupation1,
       width_svg = 8, height_svg = 6,
       options = list(
         opts_hover_inv(css = "opacity:0.1;"),
         opts_hover(css = "stroke-width:2;"),
         opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-occupation-download1")
       ))
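
To make the pay-gap definition used in this section concrete, the sketch below computes it both ways for made-up medians (not DOSM figures) and confirms that (female - male) / male gives the same result as female / male - 1. The variable names `gap_difference` and `gap_ratio` are for illustration only.

```r
# Made-up median salaries for illustration only; these are not DOSM values
median_female <- c(2000, 3300)
median_male   <- c(2500, 3000)

# Pay gap as the difference relative to the male median
gap_difference <- (median_female - median_male) / median_male

# The same quantity written as the ratio minus 1
gap_ratio <- median_female / median_male - 1

round(gap_difference, 3)
round(gap_ratio, 3)
```

Here the first value is negative (the female median is 20% below the male median) and the second is positive (10% above), matching the sign convention used in the chart.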

For those who favor the line chart (Ratio):

Show code
median_graph_occupation <- plot_median_salary_ratio(data = median_salary_all_wider,
                                               remark_var = "Occupation",
                                               ncol = 2)





girafe(ggobj = median_graph_occupation,
       width_svg = 8, height_svg = 6,
       options = list(
         opts_hover_inv(css = "opacity:0.1;"),
         opts_hover(css = "stroke-width:2;"),
         opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-occupation-download")
       ))

Age Group

For the age group analysis, we look at the median salaries for both genders across different age groups. The data is divided into nine age groups: 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, and 55-59. The ratio values are plotted on the graph below to visualize the changes in the pay gap over time. The aim is to identify any trends or patterns in the pay gap across age groups over the years.

Show code
median_graph_age <- plot_median_salary_ratio(data = median_salary_all_wider,
                                               remark_var = "Age",
                                             ncol = 3)

girafe(ggobj = median_graph_age,
       width_svg = 8, height_svg = 6,
       options = list(
         opts_hover_inv(css = "opacity:0.1;"),
         opts_hover(css = "stroke-width:2;"),
         opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-age-download")
       ))

By Industry

Lastly, the chart below displays the ratio by industry over the years. Again, a ratio of 1 indicates that men and women are paid equally, while a ratio below 1 indicates that women are paid less than men.

Show code
median_graph_industry <- plot_median_salary_ratio(data = median_salary_all_wider,
                                               remark_var = "Industry",
                                               ncol = 2)
  
girafe(ggobj = median_graph_industry,
       width_svg = 8, height_svg = 6,
       options = list(
         opts_hover_inv(css = "opacity:0.1;"),
         opts_hover(css = "stroke-width:2;"),
         opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-industry-download")
       ))

Conclusion

The interactive visualizations presented in this blog post provide a comprehensive overview of the pay gap trends by various factors. While this analysis sheds light on the extent of the gender pay gap in Malaysia, it is important to note that other factors beyond the scope of this analysis may contribute to the pay gap, such as differences in work experience and job preferences. Nonetheless, this analysis serves as a starting point for further exploration and discussion on how to close the gender pay gap in Malaysia.
