The significance of education on the salary in Sweden, a comparison between different occupational groups

[This article was first published on R Analystatistics Sweden, and kindly contributed to R-bloggers.]

In my last post, I found that education has a significant impact on the salary of engineers. Is the significance of education on wages unique to engineers or are there similar correlations in other occupational groups?

I will use the same model in principle as in my previous post to calculate the significance of education. I will not use sex as an explanatory variable since some occupational groups do not have enough data for both sexes. For each occupational group I will fit education with a polynomial of degree one. I am interested in occupational groups where a longer education also results in a higher salary. For that reason I will use the numerical approximation of education length from my last post instead of the categorical predictor. A polynomial of higher degree than one would give a better fit, but the problems with oscillation and overfitting made me settle for degree one. A straight line also has the advantage that the average increase in salary for each year of education is given directly by the model.
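As a minimal sketch of how the slope of such a log-linear model turns into a percentage increase per year of education, consider the simulated example below. The data frame `sim`, the 4 % education premium and the 30 000 SEK base salary are assumptions made up for illustration; they are not taken from Statistics Sweden.

```r
# Simulated data: salary grows 4 % per education year and 2 % per calendar year.
set.seed(1)
sim <- expand.grid(year_n = 2014:2018, eduyears = c(9, 11, 12, 14, 15, 19))
sim$salary <- 30000 * 1.04^(sim$eduyears - 9) * 1.02^(sim$year_n - 2014) *
  exp(rnorm(nrow(sim), sd = 0.01))

# Same model form as in the post: log salary, linear in year and education.
fit <- lm(log(salary) ~ year_n + eduyears, data = sim)

# Convert the education coefficient from the log scale to percent per year.
pct_per_eduyear <- (exp(coef(fit)[["eduyears"]]) - 1) * 100
print(round(pct_per_eduyear, 1))
```

Because the response is `log(salary)`, the coefficient is a multiplicative effect, which is why it is passed through `exp()` before being read as a percentage.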

There are still occupational groups with too little data for regression analysis. I require more than 30 observations in a group before fitting both education and year.

The F-value from the Anova table is used as the single value to judge how strongly education and salary are related. For exploratory analysis, the F-value seems good enough.
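For a term with a single degree of freedom, such as the linear education term here, the partial F-value is the square of the t statistic for the same coefficient. The sketch below illustrates this on simulated data. It uses base R's `drop1()` rather than `car::Anova()`; for a model without interactions the two should give the same partial F tests, but that substitution is my assumption and not something the post itself does.

```r
set.seed(2)
sim <- expand.grid(year_n = 2014:2018, eduyears = c(9, 11, 12, 14, 15, 19))
sim$salary <- 30000 * 1.03^(sim$eduyears - 9) * exp(rnorm(nrow(sim), sd = 0.05))

fit <- lm(log(salary) ~ year_n + eduyears, data = sim)

# F test for dropping each term while keeping the other one in the model.
f_edu <- drop1(fit, test = "F")["eduyears", "F value"]

# t statistic for the same coefficient from the regression summary.
t_edu <- coef(summary(fit))["eduyears", "t value"]

print(c(F = f_edu, t_squared = t_edu^2))
```

Since the F-value grows with the number of observations as well as with the size of the effect, it is best read together with the estimated increase in salary rather than on its own.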

In the figure below I also use the estimate for education to show how much salaries rise with each year of education in the different occupational groups, holding year constant.

library (tidyverse)
## -- Attaching packages ------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.0     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   0.8.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library (broom)
library (car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
library (splines)
#install_github("ZheyuanLi/SplinesUtils")
library (SplinesUtils)

readfile <- function (file1){read_csv (file1, col_types = cols(), locale = readr::locale (encoding = "latin1"), na = c("..", "NA")) %>%
  gather (starts_with("19"), starts_with("20"), key = "year", value = salary) %>%
  drop_na() %>%
  mutate (year_n = parse_number (year))
}
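The reshaping step in `readfile` turns one column per year into one row per year. As a rough sketch of the same idea with the newer `pivot_longer()` from tidyr (1.0.0 or later), here is a hypothetical two-row miniature of the wide table. The column names and values are invented for illustration. Note that `values_drop_na = TRUE` only drops rows where the salary is missing, whereas `drop_na()` in `readfile` drops rows with a missing value in any column.

```r
library(tidyr)

# Hypothetical miniature of the wide table: one column per year.
wide <- data.frame(
  sector = c("all", "all"),
  `level of education` = c("A", "B"),
  `2014` = c(30000, NA),
  `2015` = c(31000, 35000),
  check.names = FALSE
)

# Gather the year columns into a year/salary pair of columns.
long <- pivot_longer(
  wide,
  cols = c(starts_with("19"), starts_with("20")),
  names_to = "year",
  values_to = "salary",
  values_drop_na = TRUE
)

# Numeric year for use as a regression predictor.
long$year_n <- as.numeric(long$year)
print(long)
```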

The data table is downloaded from Statistics Sweden, http://www.statistikdatabasen.scb.se/pxweb/en/ssd/, and saved as a comma-delimited file without heading, 000000CY.csv.

The table: Average basic salary, monthly salary and women´s salary as a percentage of men´s salary by sector, occupational group (SSYK 2012), sex and educational level (SUN). Year 2014 – 2018 Monthly salary All sectors

tb <- readfile("000000CY.csv")

numedulevel <- read.csv("edulevel.csv") 

numedulevel %>%
  knitr::kable(
  booktabs = TRUE,
  caption = 'Initial approach, length of education') 
Table 1: Initial approach, length of education
| level.of.education | eduyears |
|---|---|
| primary and secondary education 9-10 years (ISCED97 2) | 9 |
| upper secondary education, 2 years or less (ISCED97 3C) | 11 |
| upper secondary education 3 years (ISCED97 3A) | 12 |
| post-secondary education, less than 3 years (ISCED97 4+5B) | 14 |
| post-secondary education 3 years or more (ISCED97 5A) | 15 |
| post-graduate education (ISCED97 6) | 19 |
| no information about level of educational attainment | NA |

tbnum <- tb %>% 
  right_join(numedulevel, by = c("level of education" = "level.of.education")) %>%
  filter(!is.na(eduyears))
## Warning: Column `level of education`/`level.of.education` joining character
## vector and factor, coercing into character vector
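The warning above appears because `read.csv` in R versions before 4.0.0 converts text columns to factors by default, while `read_csv` keeps them as character. A small sketch of one way to avoid the mismatch is to read the lookup table with `stringsAsFactors = FALSE`. The inline CSV text and the two-row `tb` below are hypothetical stand-ins for the real files, and the base R `merge()` shown is an inner join, a simplification of the `right_join()` followed by `filter()` used in the post.

```r
# Hypothetical two-row stand-in for edulevel.csv, read from inline text.
csv_text <- "level.of.education,eduyears
upper secondary education 3 years (ISCED97 3A),12
post-graduate education (ISCED97 6),19"

# Keep the key column as character (this is the default from R 4.0.0).
numedulevel <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Hypothetical salary rows with the key column named as in the SCB table.
tb <- data.frame(
  `level of education` = c("post-graduate education (ISCED97 6)",
                           "no information about level of educational attainment"),
  salary = c(50000, 30000),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Inner join on the differently named key columns.
tbnum <- merge(tb, numedulevel,
               by.x = "level of education", by.y = "level.of.education")
print(tbnum)
```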
summary_table = vector()
anova_table = vector()

# Fit one model per occupational group that has more than 30 observations
for (i in unique(tbnum$`occuptional  (SSYK 2012)`)){
  temp <- filter(tbnum, `occuptional  (SSYK 2012)` == i)
  if (nrow(temp) > 30){
    model <- lm (log(salary) ~ year_n + eduyears, data = temp)
    # Store coefficient estimates and the type II Anova table, tagged with the group
    summary_table <- rbind (summary_table, mutate (tidy (summary (model)), ssyk = i))
    anova_table <- rbind (anova_table, mutate (tidy (Anova (model, type = 2)), ssyk = i))
  }
}
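The loop above can also be written without growing tables row by row. The following is only a sketch in base R on simulated data: the group labels "a" and "b", the helper name `min_rows` and the result column names are hypothetical, and `drop1()` stands in for `car::Anova()` under the assumption that both give the same partial F test for this model.

```r
set.seed(3)
# Hypothetical data: group "a" has 40 rows, group "b" only 10.
sim <- data.frame(
  ssyk = rep(c("a", "b"), times = c(40, 10)),
  year_n = sample(2014:2018, 50, replace = TRUE),
  eduyears = sample(c(9, 11, 12, 14, 15, 19), 50, replace = TRUE)
)
sim$salary <- 30000 * 1.03^(sim$eduyears - 9) * exp(rnorm(50, sd = 0.05))

min_rows <- 30  # same threshold as in the loop

# Split by group and keep only groups with enough observations.
groups <- split(sim, sim$ssyk)
groups <- Filter(function(d) nrow(d) > min_rows, groups)

# Fit the model per group and collect one result row per group.
results <- do.call(rbind, lapply(names(groups), function(g) {
  fit <- lm(log(salary) ~ year_n + eduyears, data = groups[[g]])
  data.frame(
    ssyk = g,
    pct_per_eduyear = (exp(coef(fit)[["eduyears"]]) - 1) * 100,
    f_education = drop1(fit, test = "F")["eduyears", "F value"]
  )
}))
print(results)
```

Only group "a" passes the threshold here, so `results` has a single row.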

merge(summary_table, anova_table, by = "ssyk", all = TRUE) %>%
  filter (term.y == "eduyears") %>%
  filter (term.x == "eduyears") %>%
  mutate (estimate = (exp(estimate) - 1) * 100) %>%
  ggplot () +
    geom_point (mapping = aes(x = estimate, y = statistic.y)) +
    labs(
      x = "Increase in salaries (% / year of education)",
      y = "F-value for education"
    ) 

Figure 1: The significance of education on the salary in Sweden, a comparison between different occupational groups, Year 2014 – 2018

The table below lists all occupational groups sorted by increase in salary in descending order.

merge(summary_table, anova_table, by = "ssyk", all = TRUE) %>%
  filter (term.y == "eduyears") %>%
  filter (term.x == "eduyears") %>%
  select (ssyk, estimate, statistic.y) %>%
  mutate (estimate = (exp(estimate) - 1) * 100) %>%
  rename (`F-value for education` = statistic.y) %>%
  rename (`Increase in salary` = estimate) %>%
  arrange (desc (`Increase in salary`)) %>%
  knitr::kable(
    booktabs = TRUE,
    caption = 'Correlation for F-value (education) and the increase in salaries for each year of education')
Table 2: Correlation for F-value (education) and the increase in salaries for each year of education
| ssyk | Increase in salary | F-value for education |
|---|---|---|
| 151 Health care managers | 11.3222562 | 615.8861538 |
| 221 Medical doctors | 10.2650113 | 70.5835974 |
| 121 Finance managers | 9.9172689 | 32.7350353 |
| 261 Legal professionals | 9.7834120 | 48.1004817 |
| 122 Human resource managers | 9.0394770 | 103.8231602 |
| 161 Financial and insurance managers | 8.6603822 | 19.2527280 |
| 231 University and higher education teachers | 7.5276294 | 141.4796497 |
| 132 Supply, logistics and transport managers | 6.6969576 | 87.6384397 |
| 137 Production managers in manufacturing | 6.5378753 | 65.2842719 |
| 123 Administration and planning managers | 6.2977859 | 129.4982036 |
| 136 Production managers in construction and mining | 6.2192630 | 57.8361291 |
| 129 Administration and service managers not elsewhere classified | 5.6522431 | 74.9238199 |
| 131 Information and communications technology service managers | 5.1937185 | 95.3092758 |
| 159 Other social services managers | 5.1920288 | 355.8557670 |
| 134 Architectural and engineering managers | 4.6458339 | 186.5137309 |
| 262 Museum curators and librarians and related professionals | 4.4638502 | 294.6004878 |
| 332 Insurance advisers, sales and purchasing agents | 4.2505985 | 57.1999444 |
| 179 Other services managers not elsewhere classified | 4.0931013 | 44.9824826 |
| 235 Teaching professionals not elsewhere classified | 3.9957394 | 189.3385229 |
| 311 Physical and engineering science technicians | 3.9729301 | 134.6441603 |
| 234 Primary- and pre-school teachers | 3.8498852 | 140.3499938 |
| 233 Secondary education teachers | 3.7146273 | 167.3564935 |
| 125 Sales and marketing managers | 3.6489071 | 17.5675213 |
| 241 Accountants, financial analysts and fund managers | 3.3820710 | 49.5445027 |
| 242 Organisation analysts, policy administrators and human resource specialists | 3.1833018 | 94.6038730 |
| 213 Biologists, pharmacologists and specialists in agriculture and forestry | 2.9242020 | 93.6360138 |
| 321 Medical and pharmaceutical technicians | 2.7274994 | 130.1793959 |
| 133 Research and development managers | 2.5377568 | 28.8355719 |
| 173 Retail and wholesale trade managers | 2.4736675 | 5.9078862 |
| 335 Tax and related government associate professionals | 2.3658837 | 41.8703194 |
| 214 Engineering professionals | 2.3546977 | 140.9367261 |
| 243 Marketing and public relations professionals | 2.3458276 | 31.1353096 |
| 232 Vocational education teachers | 2.3303514 | 27.2589020 |
| 334 Administrative and specialized secretaries | 2.2765069 | 20.8078772 |
| 342 Athletes, fitness instructors and recreational workers | 2.1279403 | 24.0465329 |
| 266 Social work and counselling professionals | 2.0818202 | 46.6894573 |
| 331 Financial and accounting associate professionals | 2.0726916 | 8.1402279 |
| 411 Office assistants and other secretaries | 2.0115047 | 60.8893100 |
| 523 Cashiers and related clerks | 1.8137124 | 13.9599411 |
| 524 Event seller and telemarketers | 1.7261688 | 13.0208742 |
| 251 ICT architects, systems analysts and test managers | 1.5958484 | 81.4664709 |
| 819 Process control technicians | 1.4963794 | 40.1366829 |
| 962 Newspaper distributors, janitors and other service workers | 1.3940863 | 38.9982657 |
| 812 Metal processing and finishing plant operators | 1.3878292 | 12.9570916 |
| 432 Stores and transport clerks | 1.2376483 | 43.6773646 |
| 341 Social work and religious associate professionals | 1.1195539 | 60.5012820 |
| 531 Child care workers and teachers aides | 1.0815506 | 16.6066709 |
| 264 Authors, journalists and linguists | 1.0636388 | 5.4839869 |
| 333 Business services agents | 1.0019699 | 3.5614940 |
| 522 Shop staff | 0.9835325 | 9.4858527 |
| 351 ICT operations and user support technicians | 0.9821077 | 7.4180769 |
| 441 Library and filing clerks | 0.9770931 | 17.5562894 |
| 941 Fast-food workers, food preparation assistants | 0.8675015 | 15.3917476 |
| 611 Market gardeners and crop growers | 0.8014313 | 6.0302572 |
| 817 Wood processing and papermaking plant operators | 0.7291227 | 7.4060017 |
| 217 Designers | 0.7088900 | 1.8475401 |
| 831 Train operators and related workers | 0.5548934 | 3.3520151 |
| 932 Manufacturing labourers | 0.5533498 | 4.2142587 |
| 343 Photographers, interior decorators and entertainers | 0.5498463 | 0.6786136 |
| 513 Waiters and bartenders | 0.5489340 | 2.0997794 |
| 312 Construction and manufacturing supervisors | 0.5475032 | 1.3980603 |
| 534 Attendants, personal assistants and related workers | 0.5264678 | 28.8067668 |
| 515 Building caretakers and related workers | 0.5221112 | 6.8499176 |
| 815 Machine operators, textile, fur and leather products | 0.4691193 | 2.1375421 |
| 422 Client information clerks | 0.3886323 | 1.7115048 |
| 533 Health care assistants | 0.3239149 | 4.1630503 |
| 818 Other stationary plant and machine operators | 0.2957261 | 1.0364217 |
| 512 Cooks and cold-buffet managers | 0.2548134 | 0.9265992 |
| 711 Carpenters, bricklayers and construction workers | 0.2417011 | 0.1572365 |
| 218 Specialists within environmental and health protection | 0.2397413 | 0.4222318 |
| 961 Recycling collectors | 0.1921605 | 0.8568597 |
| 541 Other surveillance and security workers | 0.1812197 | 0.4153472 |
| 511 Cabin crew, guides and related workers | 0.1658380 | 0.2150479 |
| 723 Machinery mechanics and fitters | 0.1410149 | 0.2170183 |
| 821 Assemblers | 0.0644892 | 0.0858323 |
| 813 Machine operators, chemical and pharmaceutical products | 0.0326476 | 0.0217158 |
| 833 Heavy truck and bus drivers | 0.0309875 | 0.0482259 |
| 816 Machine operators, food and related products | 0.0069780 | 0.0017000 |
| 532 Personal care workers in health services | -0.0383003 | 0.1216850 |
| 352 Broadcasting and audio-visual technicians | -0.0705932 | 0.0171995 |
| 814 Machine operators, rubber, plastic and paper products | -0.1428982 | 0.2504506 |
| 722 Blacksmiths, toolmakers and related trades workers | -0.1569686 | 0.4841203 |
| 732 Printing trades workers | -0.3543764 | 0.5477171 |
| 911 Cleaners and helpers | -0.3859718 | 4.5299051 |
| 834 Mobile plant operators | -0.3979938 | 5.1952942 |
| 216 Architects and surveyors | -0.6970606 | 1.2794827 |
| 516 Other service related workers | -0.7106422 | 1.0123452 |
| 265 Creative and performing artists | -1.4326841 | 6.0250927 |

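Let’s check what we have found by looking more closely at the occupational groups with the highest and the lowest increase in salary.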

temp <- tbnum %>%
  filter(`occuptional  (SSYK 2012)` == "151 Health care managers")

temp %>%
  ggplot () +
    geom_point (mapping = aes(x = year_n,y = salary, colour = eduyears)) +
    facet_grid(. ~ sex) +   
    labs(
      x = "Year",
      y = "Salary (SEK/month)"
    ) 

Figure 2: Highest increase in salary, 151 Health care managers

modelcont <- lm (log(salary) ~ year_n + bs(eduyears, degree = 1), data = temp)

contspline <- RegBsplineAsPiecePoly(modelcont, "bs(eduyears, degree = 1)")

tibble(eduyears = seq(11, 19, by=0.1)) %>%
  ggplot () + 
    geom_point (mapping = aes(x = eduyears,y = predict(contspline, eduyears))) +
  labs(
    x = "Years of education",
    y = "Salary"
  )

Figure 3: Model fit, Health care managers, Correlation between education and salary
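The spline term `bs(eduyears, degree = 1)` without interior knots describes the same straight line as a plain `eduyears` term, which is consistent with the degree-one polynomial used in the search over occupational groups. The sketch below checks this on simulated data; the data frame and the 5 % premium are assumptions for illustration only.

```r
library(splines)

set.seed(4)
sim <- data.frame(
  year_n = rep(2014:2018, each = 6),
  eduyears = rep(c(9, 11, 12, 14, 15, 19), times = 5)
)
sim$salary <- 30000 * 1.05^(sim$eduyears - 9) * exp(rnorm(nrow(sim), sd = 0.03))

fit_line <- lm(log(salary) ~ year_n + eduyears, data = sim)
fit_bs   <- lm(log(salary) ~ year_n + bs(eduyears, degree = 1), data = sim)

# Both models span the same straight line in eduyears, so the fits coincide.
same_fit <- isTRUE(all.equal(fitted(fit_line), fitted(fit_bs)))

# The single spline coefficient is the total rise over the observed range,
# so dividing by the range recovers the per-year slope.
slope_from_bs <- coef(fit_bs)[[3]] / diff(range(sim$eduyears))
print(c(slope_line = coef(fit_line)[["eduyears"]], slope_bs = slope_from_bs))
```

The spline form becomes useful once a higher degree or interior knots are added, since the same plotting code can then be reused unchanged.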

tibble(eduyears = seq(11, 19, by=0.1)) %>%
  ggplot () + 
    geom_point (mapping = aes(x = eduyears,y = (exp(predict(contspline, eduyears, deriv = 1)) - 1) * 100)) +
  labs(
    x = "Years of education",
    y = "Salary difference (%)"
  )

Figure 4: Model fit, Health care managers, The derivative for education

temp <- tbnum %>% 
  filter(`occuptional  (SSYK 2012)` == "265 Creative and performing artists")

temp %>%
  ggplot () +
    geom_point (mapping = aes(x = year_n,y = salary, colour = eduyears)) +
    facet_grid(. ~ sex) +   
    labs(
      x = "Year",
      y = "Salary (SEK/month)"
    ) 

Figure 5: Lowest increase in salary, 265 Creative and performing artists

modelcont <- lm (log(salary) ~ year_n + bs(eduyears, degree = 1), data = temp)

contspline <- RegBsplineAsPiecePoly(modelcont, "bs(eduyears, degree = 1)")

tibble(eduyears = seq(11, 15, by=0.1)) %>%
  ggplot () + 
    geom_point (mapping = aes(x = eduyears,y = predict(contspline, eduyears))) +
  labs(
    x = "Years of education",
    y = "Salary"
  )

Figure 6: Model fit, Creative and performing artists, Correlation between education and salary

tibble(eduyears = seq(11, 15, by=0.1)) %>%
  ggplot () + 
    geom_point (mapping = aes(x = eduyears,y = (exp(predict(contspline, eduyears, deriv = 1)) - 1) * 100)) +
  labs(
    x = "Years of education",
    y = "Salary difference (%)"
  )

Figure 7: Model fit, Creative and performing artists, The derivative for education
