The significance of experience on the salary in Sweden, a comparison between different occupational groups

[This article was first published on R Analystatistics Sweden, and kindly contributed to R-bloggers.]

In my last post, I found that experience has a significant impact on the salary of engineers. Is the significance of experience on wages unique to engineers or are there similar correlations in other occupational groups?

I will use the same model in principle as in my previous post to estimate the significance of age. I will not use sex as an explanatory variable, since some occupational groups do not have enough data for both genders. I will also use a third-degree polynomial for age, since this gives a significant model fit for some occupational groups.

There are still occupational groups with too little data for regression analysis. A group needs more than 30 observations (rows) to be included in the fit on both age and year.

The F-value from the Anova table is used as the single value to measure how strongly age and salary are related. For exploratory analysis, the F-value seems good enough.
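To make the F-value concrete, here is a minimal sketch on simulated data (the numbers are made up and are not the Statistics Sweden table). It compares a model with only the year against one that also contains the age polynomial. For a model with only main effects like this one, that F-test should agree with what Anova (model, type = 2) reports for the age term. The data frame d and the coefficients used to simulate it are assumptions for illustration only.

```r
# Simulated data, purely illustrative (not the Statistics Sweden table)
set.seed(1)
d <- expand.grid(year_n = 2014:2018,
                 age_n = c(21, 29.5, 39.5, 49.5, 59.5, 65.5))
d$salary <- exp(10 + 0.02 * (d$year_n - 2014) +
                0.03 * d$age_n - 0.0003 * d$age_n^2 +
                rnorm(nrow(d), sd = 0.01))

# Same model form as in this post, and a reduced model without age
full    <- lm(log(salary) ~ year_n + poly(age_n, 3), data = d)
reduced <- lm(log(salary) ~ year_n, data = d)

# F-test for adding the age polynomial to a model that already contains year
f_test <- anova(reduced, full)
f_age  <- f_test$F[2]
f_age
```

A large F-value here means that the age terms explain much more of the variation in log salary than would be expected from noise alone.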

In the figure below I will also use the estimate for the year to see how much salaries are raised each year for the different occupational groups, holding age constant.
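Since the model is fitted on log(salary), the estimate for the year is a change in log salary per year. Here is a small sketch of how such an estimate translates into a percentage, using the estimate for primary- and pre-school teachers from Table 1 as input. The variable names are only illustrative.

```r
# Year coefficient for "234 Primary- and pre-school teachers" from Table 1
estimate <- 0.0345563

# In a log-linear model a coefficient b means salary is multiplied by exp(b) per year
pct_increase <- (exp(estimate) - 1) * 100
round(pct_increase, 2)
```

For estimates of a few percent, the coefficient multiplied by 100 is already a close approximation of the yearly percentage increase.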

library (tidyverse)
## -- Attaching packages -------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.0     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   0.8.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library (broom)
library (car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
library (polynom)

readfile <- function (file1){read_csv (file1, col_types = cols(), locale = readr::locale (encoding = "latin1"), na = c("..", "NA")) %>%
  gather (starts_with("19"), starts_with("20"), key = "year", value = salary) %>%
  drop_na() %>%
  mutate (year_n = parse_number (year))
}
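
As an illustration of what readfile does after reading the file, here is a sketch on a tiny made-up table. It assumes that the downloaded file has one column per year (2014, 2015, ...), which is what the starts_with selections suggest. The values are invented.

```r
library(tidyr)
library(dplyr)
library(readr)

# Made-up wide table mimicking the assumed layout: one salary column per year
wide <- tibble(
  age    = c("18-24 years", "25-34 years"),
  `2014` = c(22000, NA),
  `2015` = c(22500, 27000)
)

# Same steps as in readfile: stack the year columns, drop missing values,
# and add the year as a number
long <- wide %>%
  gather(starts_with("19"), starts_with("20"), key = "year", value = salary) %>%
  drop_na() %>%
  mutate(year_n = parse_number(year))

long
```

Each row of the result holds one age group, one year and one salary, which is the shape that lm expects.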

The data table is downloaded from Statistics Sweden, http://www.statistikdatabasen.scb.se/pxweb/en/ssd/, and saved as a comma-delimited file without heading, 000000D2.csv.

The table: Average basic salary, monthly salary and women's salary as a percentage of men's salary by sector, occupational group (SSYK 2012), sex and age. Year 2014 – 2018, monthly salary, all sectors.

tb <- readfile("000000D2.csv") %>%
  rowwise() %>%
  mutate(age_l = unlist(lapply(strsplit(substr(age, 1, 5), "-"), strtoi))[1]) %>%
  rowwise() %>%
  mutate(age_h = unlist(lapply(strsplit(substr(age, 1, 5), "-"), strtoi))[2]) %>%
  mutate(age_n = (age_l + age_h) / 2)
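
The age groups are given as text, and the code above uses the midpoint of each interval as a numeric age. Here is a vectorised sketch of the same idea in base R, assuming labels of the form "18-24 years" (the example labels are assumptions about the file content):

```r
# Hypothetical age group labels, assumed to start with "low-high"
age <- c("18-24 years", "25-34 years", "65-66 years")

# Split the first five characters on "-" to get the lower and upper bounds
bounds <- strsplit(substr(age, 1, 5), "-")
age_l  <- as.integer(sapply(bounds, `[`, 1))
age_h  <- as.integer(sapply(bounds, `[`, 2))

# Midpoint of each interval, used as the numeric age in the regression
age_n <- (age_l + age_h) / 2
age_n
```

Working on whole vectors avoids the need for rowwise(), but the result is the same midpoint per row.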

# Collect the coefficient and Anova tables for every occupational group
summary_table <- NULL
anova_table <- NULL

for (i in unique(tb$`occuptional  (SSYK 2012)`)){
  temp <- filter(tb, `occuptional  (SSYK 2012)` == i)
  # Only fit groups with more than 30 observations
  if (nrow(temp) > 30){
    model <- lm (log(salary) ~ year_n + poly(age_n, 3), data = temp)
    summary_table <- rbind (summary_table, mutate (tidy (summary (model)), ssyk = i))
    anova_table <- rbind (anova_table, mutate (tidy (Anova (model, type = 2)), ssyk = i))
  }
}

merge(summary_table, anova_table, by = "ssyk", all = TRUE) %>%
  filter (term.y == "poly(age_n, 3)") %>%
  filter (term.x == "year_n") %>%
  ggplot () +
    geom_point (mapping = aes(x = estimate, y = statistic.y)) +
    labs(
      x = "Increase in salaries (% / year)",
      y = "F-value for age"
    )


Figure 1: The significance of experience on the salary in Sweden, a comparison between different occupational groups, Year 2014 – 2018

The table with all occupational groups sorted by F-value in descending order.

merge(summary_table, anova_table, by = "ssyk", all = TRUE) %>%
  filter (term.y == "poly(age_n, 3)") %>%
  filter (term.x == "year_n") %>%
  select (ssyk, estimate, statistic.y) %>%
  rename (`F-value for age` = statistic.y) %>%
  rename (`Increase in salary` = estimate) %>%
  arrange (desc (`F-value for age`)) %>%
  knitr::kable(
    booktabs = TRUE,
    caption = 'Correlation for F-value (age) and the yearly increase in salaries with age held as constant')

Table 1: Correlation for F-value (age) and the yearly increase in salaries with age held as constant
| ssyk | Increase in salary | F-value for age |
|:---|---:|---:|
| 234 Primary- and pre-school teachers | 0.0345563 | 1349.859088 |
| 233 Secondary education teachers | 0.0294574 | 861.331070 |
| 532 Personal care workers in health services | 0.0285338 | 800.259659 |
| 336 Police officers | 0.0284911 | 675.571576 |
| 223 Nursing professionals (cont.) | 0.0303955 | 625.404523 |
| 214 Engineering professionals | 0.0192393 | 612.414362 |
| 235 Teaching professionals not elsewhere classified | 0.0245885 | 578.686817 |
| 266 Social work and counselling professionals | 0.0316617 | 551.888399 |
| 221 Medical doctors | 0.0150176 | 449.792500 |
| 251 ICT architects, systems analysts and test managers | 0.0249600 | 415.590103 |
| 534 Attendants, personal assistants and related workers | 0.0191811 | 406.258604 |
| 231 University and higher education teachers | 0.0254827 | 404.602202 |
| 222 Nursing professionals | 0.0414071 | 371.107319 |
| 533 Health care assistants | 0.0205813 | 345.075594 |
| 531 Child care workers and teachers aides | 0.0219044 | 291.049608 |
| 351 ICT operations and user support technicians | 0.0211211 | 271.091961 |
| 159 Other social services managers | 0.0251218 | 191.570380 |
| 211 Physicists and chemists | 0.0207272 | 186.366824 |
| 321 Medical and pharmaceutical technicians | 0.0288946 | 177.137635 |
| 152 Managers in social and curative care | 0.0387001 | 164.636802 |
| 243 Marketing and public relations professionals | 0.0150173 | 154.310784 |
| 723 Machinery mechanics and fitters | 0.0204993 | 146.299981 |
| 125 Sales and marketing managers | 0.0187356 | 145.732333 |
| 141 Primary and secondary schools and adult education managers | 0.0346753 | 142.578762 |
| 341 Social work and religious associate professionals | 0.0255830 | 137.073911 |
| 133 Research and development managers | 0.0137728 | 135.323107 |
| 153 Elderly care managers | 0.0331514 | 132.163025 |
| 242 Organisation analysts, policy administrators and human resource specialists | 0.0223881 | 132.013557 |
| 332 Insurance advisers, sales and purchasing agents | 0.0176134 | 128.196288 |
| 218 Specialists within environmental and health protection | 0.0258110 | 120.206634 |
| 311 Physical and engineering science technicians | 0.0213202 | 119.371812 |
| 422 Client information clerks | 0.0175877 | 117.057208 |
| 411 Office assistants and other secretaries | 0.0250406 | 115.401389 |
| 264 Authors, journalists and linguists | 0.0158766 | 107.667527 |
| 226 Dentists | 0.0230213 | 99.061845 |
| 232 Vocational education teachers | 0.0298647 | 93.534293 |
| 122 Human resource managers | 0.0365348 | 86.595103 |
| 342 Athletes, fitness instructors and recreational workers | 0.0162825 | 86.085107 |
| 515 Building caretakers and related workers | 0.0188443 | 85.346469 |
| 123 Administration and planning managers | 0.0423650 | 81.886461 |
| 137 Production managers in manufacturing | 0.0267995 | 80.958767 |
| 227 Naprapaths, physiotherapists, occupational therapists | 0.0212967 | 78.930141 |
| 132 Supply, logistics and transport managers | 0.0135557 | 78.186301 |
| 817 Wood processing and papermaking plant operators | 0.0289197 | 75.983376 |
| 441 Library and filing clerks | 0.0210449 | 75.872685 |
| 131 Information and communications technology service managers | 0.0431537 | 75.423080 |
| 343 Photographers, interior decorators and entertainers | 0.0339142 | 75.132287 |
| 241 Accountants, financial analysts and fund managers | 0.0270620 | 71.204029 |
| 216 Architects and surveyors | 0.0241267 | 68.945982 |
| 134 Architectural and engineering managers | 0.0236760 | 68.279874 |
| 228 Specialists in health care not elsewhere classified | 0.0272838 | 64.426085 |
| 213 Biologists, pharmacologists and specialists in agriculture and forestry | 0.0144849 | 63.378555 |
| 831 Train operators and related workers | 0.0177987 | 55.404356 |
| 334 Administrative and specialized secretaries | 0.0292702 | 52.477105 |
| 335 Tax and related government associate professionals | 0.0227003 | 49.850281 |
| 224 Psychologists and psychotherapists | 0.0270655 | 47.653074 |
| 511 Cabin crew, guides and related workers | 0.0069736 | 47.413185 |
| 812 Metal processing and finishing plant operators | 0.0176743 | 47.395879 |
| 331 Financial and accounting associate professionals | 0.0229113 | 45.186053 |
| 261 Legal professionals | 0.0292942 | 44.569161 |
| 819 Process control technicians | 0.0232825 | 43.919550 |
| 333 Business services agents | 0.0263028 | 43.327180 |
| 961 Recycling collectors | 0.0225031 | 42.772133 |
| 312 Construction and manufacturing supervisors | 0.0322029 | 41.767797 |
| 516 Other service related workers | 0.0202784 | 41.325733 |
| 262 Museum curators and librarians and related professionals | 0.0228651 | 40.378111 |
| 265 Creative and performing artists | 0.0252235 | 39.119906 |
| 741 Electrical equipment installers and repairers | 0.0221901 | 38.176541 |
| 524 Event seller and telemarketers | 0.0203373 | 36.349688 |
| 941 Fast-food workers, food preparation assistants | 0.0199578 | 35.998201 |
| 815 Machine operators, textile, fur and leather products | 0.0128372 | 33.582965 |
| 962 Newspaper distributors, janitors and other service workers | 0.0141958 | 32.540073 |
| 136 Production managers in construction and mining | 0.0264825 | 31.006282 |
| 834 Mobile plant operators | 0.0251599 | 30.439935 |
| 816 Machine operators, food and related products | 0.0198706 | 29.543569 |
| 129 Administration and service managers not elsewhere classified | 0.0171682 | 29.032377 |
| 212 Mathematicians, actuaries and statisticians | 0.0240773 | 28.949679 |
| 352 Broadcasting and audio-visual technicians | 0.0067079 | 28.725776 |
| 513 Waiters and bartenders | 0.0214795 | 28.455515 |
| 813 Machine operators, chemical and pharmaceutical products | 0.0254550 | 26.563325 |
| 151 Health care managers | 0.0211530 | 24.870942 |
| 611 Market gardeners and crop growers | 0.0089573 | 23.602904 |
| 732 Printing trades workers | 0.0191704 | 23.581610 |
| 432 Stores and transport clerks | 0.0217702 | 22.969527 |
| 217 Designers | 0.0252062 | 22.823943 |
| 161 Financial and insurance managers | 0.0518758 | 21.908728 |
| 711 Carpenters, bricklayers and construction workers | 0.0136555 | 20.268520 |
| 541 Other surveillance and security workers | 0.0239438 | 19.245270 |
| 179 Other services managers not elsewhere classified | 0.0272448 | 17.108091 |
| 911 Cleaners and helpers | 0.0176513 | 16.284355 |
| 512 Cooks and cold-buffet managers | 0.0278549 | 15.787404 |
| 814 Machine operators, rubber, plastic and paper products | 0.0245275 | 15.256042 |
| 267 Religious professionals and deacons | 0.0268407 | 11.266331 |
| 761 Butchers, bakers and food processors | 0.0153660 | 11.168879 |
| 722 Blacksmiths, toolmakers and related trades workers | 0.0192713 | 10.890741 |
| 121 Finance managers | 0.0276643 | 9.785317 |
| 752 Wood treaters, cabinet-makers and related trades workers | 0.0269102 | 9.779896 |
| 713 Painters, Lacquerers, Chimney-sweepers and related trades workers | 0.0259098 | 9.415854 |
| 932 Manufacturing labourers | 0.0266336 | 9.113769 |
| 522 Shop staff | 0.0267679 | 8.247675 |
| 818 Other stationary plant and machine operators | 0.0237780 | 6.983074 |
| 344 Driving instructors and other instructors | 0.0286480 | 6.971261 |
| 523 Cashiers and related clerks | 0.0041737 | 4.970851 |
| 833 Heavy truck and bus drivers | 0.0188392 | 4.786235 |
| 912 Washers, window cleaners and other cleaning workers | 0.0382761 | 4.701424 |
| 821 Assemblers | 0.0286219 | 1.405402 |

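Let’s check what we have found by looking at the occupational groups with the highest and lowest F-value for age and with the highest and lowest yearly salary increase.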

temp <- tb %>%
  filter(`occuptional  (SSYK 2012)` == "234 Primary- and pre-school teachers")
 
temp %>%
  ggplot () +
    geom_point (mapping = aes(x = year_n,y = salary, colour = age)) +
    facet_grid(. ~ sex) +   
    labs(
      x = "Year",
      y = "Salary (SEK/month)"
    )


Figure 2: Highest F-value, Primary- and pre-school teachers

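# raw = TRUE gives coefficients for age, age^2 and age^3 directly, which the plots below rely on
model <- lm (log(salary) ~ year_n + poly(age_n, 3, raw = TRUE), data = temp)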

summod <- tidy(summary (model))

temp %>%
  ggplot () +
    geom_point (mapping = aes(x = age_n,y = age_n * summod$estimate[3] + summod$estimate[4] * age_n^2 + summod$estimate[5] * age_n^3)) +
    labs(
      x = "Age",
      y = "Salary"
    )


Figure 3: Model fit, Primary- and pre-school teachers, Correlation between age and salary

pdx <- deriv(as.polynomial(c(0, summod$estimate[3], summod$estimate[4], summod$estimate[5])))

temp %>%
  ggplot () + 
    geom_point (mapping = aes(x = age_n, y = summod$estimate[2] + pdx[1] + pdx[2] * age_n + pdx[3] * age_n^2)) +
    labs(
      x = "Age",
      y = "Salary raise (%)"
    )


Figure 4: Model fit, Primary- and pre-school teachers, The derivative for age
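
The derivative plotted above is the year estimate plus the slope of the age polynomial. Because the model is fitted on log(salary), this slope approximates the relative change in salary per year of age. Here is a minimal sketch of the calculation in base R, using hypothetical coefficients (the vector b is made up and is not the fitted model), with a numerical check of the analytic derivative:

```r
# Hypothetical raw-polynomial coefficients for age, age^2 and age^3 (not fitted values)
b <- c(0.08, -0.0015, 0.00001)

# Age part of the fitted log salary and its analytic derivative
age_effect <- function(a) b[1] * a + b[2] * a^2 + b[3] * a^3
age_slope  <- function(a) b[1] + 2 * b[2] * a + 3 * b[3] * a^2

# Check the analytic derivative against a central finite difference
a <- c(21, 29.5, 39.5, 49.5, 59.5, 65.5)
h <- 1e-4
numeric_slope <- (age_effect(a + h) - age_effect(a - h)) / (2 * h)
max_diff <- max(abs(numeric_slope - age_slope(a)))
max_diff
```

The difference between the two slopes should be negligible, which confirms that the derivative of b1 * age + b2 * age^2 + b3 * age^3 is b1 + 2 * b2 * age + 3 * b3 * age^2.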

temp <- tb %>%
  filter(`occuptional  (SSYK 2012)` == "821 Assemblers")

temp %>%
  ggplot () +
    geom_point (mapping = aes(x = year_n,y = salary, colour = age)) +
    facet_grid(. ~ sex) +   
    labs(
      x = "Year",
      y = "Salary (SEK/month)"
    )


Figure 5: Lowest F-value, Assemblers

model <- lm (log(salary) ~ year_n + poly(age_n, 3, raw = TRUE), data = temp)

summod <- tidy(summary (model))

temp %>%
  ggplot () +
    geom_point (mapping = aes(x = age_n,y = age_n * summod$estimate[3] + summod$estimate[4] * age_n^2 + summod$estimate[5] * age_n^3)) +
    labs(
      x = "Age",
      y = "Salary"
    )


Figure 6: Model fit, Assemblers, Correlation between age and salary

pdx <- deriv(as.polynomial(c(0, summod$estimate[3], summod$estimate[4], summod$estimate[5])))

temp %>%
  ggplot () + 
    geom_point (mapping = aes(x = age_n, y = summod$estimate[2] + pdx[1] + pdx[2] * age_n + pdx[3] * age_n^2)) +
    labs(
      x = "Age",
      y = "Salary raise (%)"
    )


Figure 7: Model fit, Assemblers, The derivative for age

temp <- tb %>%
  filter(`occuptional  (SSYK 2012)` == "161 Financial and insurance managers")

temp %>%
  ggplot () +
    geom_point (mapping = aes(x = year_n,y = salary, colour = age)) +
    facet_grid(. ~ sex) + 
      labs(
        x = "Year",
        y = "Salary (SEK/month)"
      )


Figure 8: Highest yearly salary increase, Financial and insurance managers

model <- lm (log(salary) ~ year_n + poly(age_n, 3, raw = TRUE), data = temp)

summod <- tidy(summary (model))

temp %>%
  ggplot () +
    geom_point (mapping = aes(x = age_n,y = age_n * summod$estimate[3] + summod$estimate[4] * age_n^2 + summod$estimate[5] * age_n^3)) +
    labs(
      x = "Age",
      y = "Salary"
    )


Figure 9: Model fit, Financial and insurance managers, Correlation between age and salary

pdx <- deriv(as.polynomial(c(0, summod$estimate[3], summod$estimate[4], summod$estimate[5])))

temp %>%
  ggplot () + 
    geom_point (mapping = aes(x = age_n, y = summod$estimate[2] + pdx[1] + pdx[2] * age_n + pdx[3] * age_n^2)) +
    labs(
      x = "Age",
      y = "Salary raise (%)"
    )


Figure 10: Model fit, Financial and insurance managers, The derivative for age

temp <- tb %>%
  filter(`occuptional  (SSYK 2012)` == "523 Cashiers and related clerks")
temp %>%
  ggplot () +
    geom_point (mapping = aes(x = year_n,y = salary, colour = age)) +
    facet_grid(. ~ sex) + 
    labs(
      x = "Year",
      y = "Salary (SEK/month)"
  )


Figure 11: Lowest yearly salary increase, Cashiers and related clerks

model <- lm (log(salary) ~ year_n + poly(age_n, 3, raw = TRUE), data = temp)

summod <- tidy(summary (model))

temp %>%
  ggplot () +
    geom_point (mapping = aes(x = age_n,y = age_n * summod$estimate[3] + summod$estimate[4] * age_n^2 + summod$estimate[5] * age_n^3)) +
    labs(
      x = "Age",
      y = "Salary"
    )


Figure 12: Model fit, Cashiers and related clerks, Correlation between age and salary

pdx <- deriv(as.polynomial(c(0, summod$estimate[3], summod$estimate[4], summod$estimate[5])))

temp %>%
  ggplot () + 
    geom_point (mapping = aes(x = age_n, y = summod$estimate[2] + pdx[1] + pdx[2] * age_n + pdx[3] * age_n^2)) +
    labs(
      x = "Age",
      y = "Salary raise (%)"
    )


Figure 13: Model fit, Cashiers and related clerks, The derivative for age
