The significance of gender on the salary of engineers in Sweden

[This article was first published on R Analystatistics Sweden , and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In my previous posts, I have shown that there exists a correlation between salaries and experience as well as between salaries and education within many occupational groups. Within the data I used from Statistics Sweden there was also information about gender. Some of the models and graphs also showed that there is a difference in salaries between the different genders.

In this post, I will examine the interaction between the predictors. Apart from gender the interaction between year also needs to be investigated.

It would have been very interesting to examine the interaction between experience and education. However, this information is not available at Statistics Sweden, at least not for free.

First, define libraries and functions.

library (tidyverse) 
## -- Attaching packages --------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.0     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library (broom) 
library (car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
library (splines)
#install_github("ZheyuanLi/SplinesUtils")
library(SplinesUtils)
library(sjPlot)

readfile <- function (file1){  
  read_csv (file1, col_types = cols(), locale = readr::locale (encoding = "latin1"), na = c("..", "NA")) %>%  
    gather (starts_with("19"), starts_with("20"), key = "year", value = salary) %>%  
    drop_na() %>%  
    mutate (year_n = parse_number (year))
}

The data table is downloaded from Statistics Sweden. It is saved as a comma-delimited file without heading, 00000031.csv, http://www.statistikdatabasen.scb.se/pxweb/en/ssd/.

Average monthly pay (total pay), non-manual workers private sector (SLP), SEK by occupational group (SSYK), age, sex and year. Year 2014 – 2018

There are no women in the age group 65-66 years which explains the uncertainty for this group.

We are interested to see if the interaction terms between gender, age and year are important factors in salaries. As a null hypothesis, we assume that the interaction terms between gender, age and year is not related to the salary and examine if we can reject this hypothesis with the data from Statistics Sweden.

tb <- readfile("00000031.csv") %>% 
  rowwise() %>% 
  mutate(age_l = unlist(lapply(strsplit(substr(age, 1, 5), "-"), strtoi))[1]) %>%
  rowwise() %>% 
  mutate(age_h = unlist(lapply(strsplit(substr(age, 1, 5), "-"), strtoi))[2]) %>%
  mutate(age_n = (age_l + age_h) / 2) 
   
tb %>%
  ggplot () +  
    geom_point (mapping = aes(x = year_n,y = salary, colour = age, shape = sex)) + 
  labs(
    x = "Year",
    y = "Salary (SEK/month)"
  )

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 1: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

model <- lm (log(salary) ~ year_n + sex + age, data = tb)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 1: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-28.18183273.5502758-7.9379280
year_n0.01908960.001761010.8399360
sexwomen-0.03387020.0051006-6.6404200
age25-29 years0.11946460.010820011.0410480
age30-34 years0.26701570.010820024.6778850
age35-39 years0.37980430.010820035.1019290
age40-44 years0.45072980.010820041.6569420
age45-49 years0.48260820.010820044.6031780
age50-54 years0.49578580.010820045.8210640
age55-59 years0.49993860.010820046.2048700
age60-64 years0.49268520.010820045.5345010
age65-66 years0.48505400.014545733.3469950
summary(model)$adj.r.squared  
## [1] 0.9817476
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##            Sum Sq Df F value    Pr(>F)    
## year_n    0.06878  1 117.504 < 2.2e-16 ***
## sex       0.02581  1  44.095 3.166e-09 ***
## age       2.81366  9 534.074 < 2.2e-16 ***
## Residuals 0.04800 82                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 2: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

tb[38,]
## Source: local data frame [1 x 9]
## Groups: 
## 
## # A tibble: 1 x 9
##   `occuptional  (SSYK 2012~ age      sex   year  salary year_n age_l age_h age_n
##                                    
## 1 214 Engineering professi~ 18-24 y~ men   2016   27300   2016    18    24    21
tb[56,]
## Source: local data frame [1 x 9]
## Groups: 
## 
## # A tibble: 1 x 9
##   `occuptional  (SSYK 2012~ age      sex   year  salary year_n age_l age_h age_n
##                                    
## 1 214 Engineering professi~ 65-66 y~ men   2016   51500   2016    65    66  65.5
tb[72,]
## Source: local data frame [1 x 9]
## Groups: 
## 
## # A tibble: 1 x 9
##   `occuptional  (SSYK 2012~ age      sex   year  salary year_n age_l age_h age_n
##                                    
## 1 214 Engineering professi~ 55-59 y~ women 2017   45000   2017    55    59    57

I will examine models with both the categorical and continuous predictors to see if they can make the same predictions. The models with continuous predictors show approximately the same as the models with categorical predictors as you can see below.

We start by examining the interaction between year and age. We find that the only significant interaction between age and year is in the age group 65-66 years old.

model <- lm (log(salary) ~ sex + year_n * age, data = tb)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 2: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-41.845379410.9079056-3.83624330.0002631
sexwomen-0.03387020.0051012-6.63962840.0000000
year_n0.02586720.00541074.78077330.0000088
age25-29 years14.362157915.42610770.93102930.3549073
age30-34 years27.517635915.42610771.78383530.0786064
age35-39 years20.993410015.42610771.36090130.1777316
age40-44 years16.249163115.42610771.05335470.2956526
age45-49 years5.753771715.42610770.37298920.7102373
age50-54 years0.690614715.42610770.04476920.9644135
age55-59 years14.203975915.42610770.92077510.3602008
age60-64 years13.302650015.42610770.86234650.3913210
age65-66 years44.064478119.77396372.22840900.0289332
year_n:age25-29 years-0.00706480.0076518-0.92328520.3589003
year_n:age30-34 years-0.01351720.0076518-1.76652640.0814879
year_n:age35-39 years-0.01022500.0076518-1.33628080.1856080
year_n:age40-44 years-0.00783650.0076518-1.02413630.3091527
year_n:age45-49 years-0.00261470.0076518-0.34170410.7335553
year_n:age50-54 years-0.00009660.0076518-0.01262980.9899576
year_n:age55-59 years-0.00679760.0076518-0.88836670.3772614
year_n:age60-64 years-0.00635410.0076518-0.83040830.4090156
year_n:age65-66 years-0.02161490.0098077-2.20388220.0306865
summary(model)$adj.r.squared  
## [1] 0.9817432
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##             Sum Sq Df  F value    Pr(>F)    
## sex        0.02581  1  44.0847 4.817e-09 ***
## year_n     0.06878  1 117.4762 < 2.2e-16 ***
## age        2.81366  9 533.9464 < 2.2e-16 ***
## year_n:age 0.00526  9   0.9978    0.4497    
## Residuals  0.04274 73                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("year_n", "age"), colors = "Set3")
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 3: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 4: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

model <- lm (log(salary) ~ sex + year_n * bs(age_n, knots = c(35, 50)), data = tb)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 2: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-42.216909410.4483489-4.04053400.0001206
sexwomen-0.03399070.0048836-6.96016770.0000000
year_n0.02605130.00518275.02658340.0000029
bs(age_n, knots = c(35, 50))115.946933523.89377000.66740970.5064075
bs(age_n, knots = c(35, 50))237.385589623.75879811.57354720.1194903
bs(age_n, knots = c(35, 50))3-8.149359325.1608613-0.32389030.7468559
bs(age_n, knots = c(35, 50))48.614235122.33286120.38572020.7007151
bs(age_n, knots = c(35, 50))537.815397618.02851622.09753240.0390644
year_n:bs(age_n, knots = c(35, 50))1-0.00788540.0118521-0.66531590.5077385
year_n:bs(age_n, knots = c(35, 50))2-0.01833020.0117851-1.55537430.1237557
year_n:bs(age_n, knots = c(35, 50))30.00429390.01248050.34405190.7316989
year_n:bs(age_n, knots = c(35, 50))4-0.00402480.0110776-0.36332400.7173094
year_n:bs(age_n, knots = c(35, 50))5-0.01851610.0089421-2.07067130.0415743
summary(model)$adj.r.squared  
## [1] 0.98315
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##                                      Sum Sq Df   F value    Pr(>F)    
## sex                                 0.02618  1   48.4439 8.049e-10 ***
## year_n                              0.06883  1  127.3747 < 2.2e-16 ***
## bs(age_n, knots = c(35, 50))        2.81344  5 1041.2680 < 2.2e-16 ***
## year_n:bs(age_n, knots = c(35, 50)) 0.00445  5    1.6454    0.1574    
## Residuals                           0.04377 81                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("age_n", "year_n"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 5: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 6: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

Examine the interaction between gender and age. We can see that women have a significantly lower salary for the age groups 35-39 years, 45-49 years, 50-54 years, 55-59 years and 60-64 years.

The F-value from the Anova table for the interaction term between gender and age is 3,2 (Pr(>F) < 0.003564), sufficient for rejecting the null hypothesis that the interaction between gender and age has no effect on the salary holding year as constant.

model <- lm (log(salary) ~ year_n + sex * age, data = tb)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 3: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-28.20222083.2212924-8.75493960.0000000
year_n0.01908960.001597911.94701900.0000000
sexwomen0.00690600.01388390.49741270.6203728
age25-29 years0.12876350.01388399.27431480.0000000
age30-34 years0.28413570.013883920.46513810.0000000
age35-39 years0.39940820.013883928.76775120.0000000
age40-44 years0.46526560.013883933.51118900.0000000
age45-49 years0.50763840.013883936.56313480.0000000
age50-54 years0.51757340.013883937.27870980.0000000
age55-59 years0.53892530.013883938.81660090.0000000
age60-64 years0.52981530.013883938.16044490.0000000
age65-66 years0.50544210.014731534.31028810.0000000
sexwomen:age25-29 years-0.01859790.0196348-0.94719130.3466242
sexwomen:age30-34 years-0.03423990.0196348-1.74383720.0853402
sexwomen:age35-39 years-0.03920780.0196348-1.99685330.0495192
sexwomen:age40-44 years-0.02907150.0196348-1.48061070.1429556
sexwomen:age45-49 years-0.05006040.0196348-2.54957860.0128556
sexwomen:age50-54 years-0.04357520.0196348-2.21928660.0295322
sexwomen:age55-59 years-0.07797340.0196348-3.97118880.0001643
sexwomen:age60-64 years-0.07426030.0196348-3.78207790.0003131
summary(model)$adj.r.squared  
## [1] 0.9849736
Anova(model, type = 2)
## Note: model has aliased coefficients
##       sums of squares computed by model comparison
## Anova Table (Type II tests)
## 
## Response: log(salary)
##            Sum Sq Df  F value    Pr(>F)    
## year_n    0.06878  1 142.7313 < 2.2e-16 ***
## sex       0.02581  1  53.5620 2.492e-10 ***
## age       2.81366  9 648.7344 < 2.2e-16 ***
## sex:age   0.01234  8   3.2006  0.003564 ** 
## Residuals 0.03566 74                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("age", "sex"))
## Warning in predict.lm(model, newdata = fitfram, type = "response", se.fit =
## se, : prediction from a rank-deficient fit may be misleading
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 7: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 8: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

tb[75,]
## Source: local data frame [1 x 9]
## Groups: 
## 
## # A tibble: 1 x 9
##   `occuptional  (SSYK 2012~ age      sex   year  salary year_n age_l age_h age_n
##                                    
## 1 214 Engineering professi~ 65-66 y~ men   2017   46500   2017    65    66  65.5
model <- lm (log(salary) ~ year_n + sex * bs(age_n, knots = c(35, 50)), data = tb)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 3: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-28.20385363.1195378-9.04103590.0000000
year_n0.01909020.001547412.33710710.0000000
sexwomen0.00710070.01340780.52959580.5978408
bs(age_n, knots = c(35, 50))10.05691460.02164532.62941480.0102324
bs(age_n, knots = c(35, 50))20.45957990.021446521.42913440.0000000
bs(age_n, knots = c(35, 50))30.51317880.022554522.75281800.0000000
bs(age_n, knots = c(35, 50))40.55859900.019544328.58119880.0000000
bs(age_n, knots = c(35, 50))50.50569930.013945936.26137320.0000000
sexwomen:bs(age_n, knots = c(35, 50))1-0.01560190.0309691-0.50378990.6157765
sexwomen:bs(age_n, knots = c(35, 50))2-0.05206440.0314890-1.65341510.1021172
sexwomen:bs(age_n, knots = c(35, 50))3-0.01734550.0347256-0.49950180.6187800
sexwomen:bs(age_n, knots = c(35, 50))4-0.10831420.0346022-3.13026740.0024289
sexwomen:bs(age_n, knots = c(35, 50))5-0.05090370.0386819-1.31595630.1919009
summary(model)$adj.r.squared  
## [1] 0.9859063
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##                                   Sum Sq Df   F value    Pr(>F)    
## year_n                           0.06880  1  152.2042 < 2.2e-16 ***
## sex                              0.02612  1   57.7814 4.558e-11 ***
## bs(age_n, knots = c(35, 50))     2.81344  5 1244.9030 < 2.2e-16 ***
## sex:bs(age_n, knots = c(35, 50)) 0.01161  5    5.1354 0.0003844 ***
## Residuals                        0.03661 81                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("age_n", "sex"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 9: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 10: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

Examine the interaction between gender and year. Both the model with categorical predictors and the model with continuous predictors indicates that women’s salaries increase more than men’s salaries.

The F-value from the Anova table for the interaction term between gender and year is 3,1 (Pr(>F) < 0.08053), with 90 % significance we can reject the null hypothesis that the interaction between gender and year has no effect on the salary holding age as constant.

model <- lm (log(salary) ~ year_n * sex + age, data = tb)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 4: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-22.26002784.8457687-4.5937040.0000158
year_n0.01615220.00240376.7198630.0000000
sexwomen-12.45321107.0175325-1.7745850.0797236
age25-29 years0.11946460.010682111.1836640.0000000
age30-34 years0.26701570.010682124.9966460.0000000
age35-39 years0.37980430.010682135.5553360.0000000
age40-44 years0.45072980.010682142.1950200.0000000
age45-49 years0.48260820.010682145.1793120.0000000
age50-54 years0.49578580.010682146.4129280.0000000
age55-59 years0.49993860.010682146.8016920.0000000
age60-64 years0.49268520.010682146.1226640.0000000
age65-66 years0.48578830.014366233.8147590.0000000
year_n:sexwomen0.00616040.00348091.7697590.0805306
summary(model)$adj.r.squared  
## [1] 0.9822101
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##             Sum Sq Df F value    Pr(>F)    
## year_n     0.06878  1 120.559 < 2.2e-16 ***
## sex        0.02581  1  45.242 2.266e-09 ***
## age        2.81430  9 548.084 < 2.2e-16 ***
## year_n:sex 0.00179  1   3.132   0.08053 .  
## Residuals  0.04621 81                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("year_n", "sex"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 11: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

model <- lm (log(salary) ~ year_n * sex + bs(age_n, knots = c(35, 50)), data = tb)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 4: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-22.28101634.7408239-4.6998190.0000099
year_n0.01616250.00235166.8729900.0000000
sexwomen-12.43257406.8662746-1.8106720.0737248
bs(age_n, knots = c(35, 50))10.04978870.01686082.9529200.0040694
bs(age_n, knots = c(35, 50))20.43233070.016783625.7590710.0000000
bs(age_n, knots = c(35, 50))30.50647970.017811028.4363850.0000000
bs(age_n, knots = c(35, 50))40.50144880.015918231.5016920.0000000
bs(age_n, knots = c(35, 50))50.48503170.013234336.6495260.0000000
year_n:sexwomen0.00615010.00340591.8057280.0745008
summary(model)$adj.r.squared  
## [1] 0.9829655
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##                               Sum Sq Df   F value    Pr(>F)    
## year_n                       0.06883  1  125.9950 < 2.2e-16 ***
## sex                          0.02612  1   47.8063 8.182e-10 ***
## bs(age_n, knots = c(35, 50)) 2.81407  5 1030.2198 < 2.2e-16 ***
## year_n:sex                   0.00178  1    3.2607    0.0745 .  
## Residuals                    0.04644 85                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("year_n", "sex"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 12: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

Examine the interaction between year, gender and age. It’s becoming a bit difficult to make any significant predictions about the interactions from the model, the data is not sufficiently large, the degrees of freedom has been reduced from 82 in the model with no interaction terms to 56.

The following interaction terms are significant to 90 %.

year_n:sexwomen

sexwomen:age35-39 years

year_n:sexwomen:age35-39 years

In the graph below you can that the salaries for men in the age group 35-39 years has increased more than for women.

The F-value from the Anova table for the interaction term between gender and age is 3,3 (Pr(>F) < 0.00373), sufficient for rejecting the null hypothesis that the interaction between gender and age has no effect on the salary holding year as constant.

model <- lm (log(salary) ~ year_n * sex * age, data = tb)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 5: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-23.050025013.7844076-1.67218100.1000676
year_n0.01653400.00683752.41812900.0188783
sexwomen-37.624578919.4940962-1.93004990.0586714
age25-29 years0.305724719.49409620.01568290.9875431
age30-34 years13.421232319.49409620.68847680.4939950
age35-39 years-6.483521919.4940962-0.33258900.7406864
age40-44 years-4.355924919.4940962-0.22344840.8239997
age45-49 years-8.769029219.4940962-0.44983000.6545695
age50-54 years-0.878726319.4940962-0.04507650.9642067
age55-59 years-6.366465319.4940962-0.32658430.7451998
age60-64 years2.433150519.49409620.12481470.9011172
age65-66 years25.269123820.17964141.25220880.2156985
year_n:sexwomen0.01866640.00966971.93040470.0586265
year_n:age25-29 years-0.00008780.0096697-0.00907770.9927894
year_n:age30-34 years-0.00651640.0096697-0.67390150.5031462
year_n:age35-39 years0.00341420.00966970.35307780.7253558
year_n:age40-44 years0.00239150.00966970.24731550.8055686
year_n:age45-49 years0.00460150.00966970.47587070.6360177
year_n:age50-54 years0.00069260.00966970.07162680.9431541
year_n:age55-59 years0.00342530.00966970.35422990.7244970
year_n:age60-64 years-0.00094410.0096697-0.09763650.9225696
year_n:age65-66 years-0.01228170.0100091-1.22705930.2249341
sexwomen:age25-29 years28.112866527.56881521.01973430.3122398
sexwomen:age30-34 years28.192807327.56881521.02263400.3108784
sexwomen:age35-39 years54.953863927.56881521.99333430.0511051
sexwomen:age40-44 years41.210176027.56881521.49481130.1405778
sexwomen:age45-49 years29.045601927.56881521.05356730.2966058
sexwomen:age50-54 years3.138682027.56881520.11384900.9097647
sexwomen:age55-59 years41.140882327.56881521.49229780.1412337
sexwomen:age60-64 years21.738998927.56881520.78853580.4337089
year_n:sexwomen:age25-29 years-0.01395410.0136750-1.02040920.3119226
year_n:sexwomen:age30-34 years-0.01400150.0136750-1.02387620.3102964
year_n:sexwomen:age35-39 years-0.02727830.0136750-1.99475690.0509451
year_n:sexwomen:age40-44 years-0.02045600.0136750-1.49586620.1403033
year_n:sexwomen:age45-49 years-0.01443240.0136750-1.05538340.2957821
year_n:sexwomen:age50-54 years-0.00157850.0136750-0.11542960.9085175
year_n:sexwomen:age55-59 years-0.02044590.0136750-1.49512650.1404958
year_n:sexwomen:age60-64 years-0.01082010.0136750-0.79122970.4321485
summary(model)$adj.r.squared  
## [1] 0.9854223
Anova(model, type = 2)
## Note: model has aliased coefficients
##       sums of squares computed by model comparison
## Anova Table (Type II tests)
## 
## Response: log(salary)
##                 Sum Sq Df  F value    Pr(>F)    
## year_n         0.06878  1 147.1250 < 2.2e-16 ***
## sex            0.02581  1  55.2108 6.727e-10 ***
## age            2.81430  9 668.8557 < 2.2e-16 ***
## year_n:sex     0.00113  1   2.4097   0.12622    
## year_n:age     0.00460  9   1.0927   0.38295    
## sex:age        0.01234  8   3.2991   0.00373 ** 
## year_n:sex:age 0.00310  8   0.8277   0.58192    
## Residuals      0.02618 56                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("year_n", "age", "sex"))
## Warning in predict.lm(model, newdata = fitfram, type = "response", se.fit =
## se, : prediction from a rank-deficient fit may be misleading
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.
## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
## Warning: Removed 10 rows containing missing values (geom_path).

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 13: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 14: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

model <- lm (log(salary) ~ year_n * sex * bs(age_n, knots = c(35, 50)), data = tb)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 5: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-24.086894212.7931370-1.88279810.0638836
year_n0.01704810.00634582.68651470.0090115
sexwomen-35.959392218.0942402-1.98733920.0507987
bs(age_n, knots = c(35, 50))113.547102129.20578080.46385000.6441944
bs(age_n, knots = c(35, 50))2-3.530862228.9253132-0.12206820.9031947
bs(age_n, knots = c(35, 50))3-3.000375030.3944067-0.09871470.9216469
bs(age_n, knots = c(35, 50))4-10.681835526.2596260-0.40677790.6854122
bs(age_n, knots = c(35, 50))524.695664918.40127531.34206270.1839126
year_n:sexwomen0.01784050.00897531.98773320.0507540
year_n:bs(age_n, knots = c(35, 50))1-0.00669130.0144870-0.46188600.6455954
year_n:bs(age_n, knots = c(35, 50))20.00197900.01434780.13792860.8906928
year_n:bs(age_n, knots = c(35, 50))30.00174350.01507650.11564290.9082666
year_n:bs(age_n, knots = c(35, 50))40.00557460.01302550.42797810.6699809
year_n:bs(age_n, knots = c(35, 50))5-0.01199740.0091271-1.31448290.1929745
sexwomen:bs(age_n, knots = c(35, 50))1-3.088243341.7901518-0.07389880.9413017
sexwomen:bs(age_n, knots = c(35, 50))296.053315842.48355882.26095270.0268729
sexwomen:bs(age_n, knots = c(35, 50))3-33.360444746.8350731-0.71229620.4786489
sexwomen:bs(age_n, knots = c(35, 50))473.564170046.63142601.57756640.1191741
sexwomen:bs(age_n, knots = c(35, 50))5-29.663172752.0527929-0.56986710.5705923
year_n:sexwomen:bs(age_n, knots = c(35, 50))10.00152390.02072920.07351460.9416063
year_n:sexwomen:bs(age_n, knots = c(35, 50))2-0.04767090.0210732-2.26216100.0267943
year_n:sexwomen:bs(age_n, knots = c(35, 50))30.01653860.02323160.71189870.4788935
year_n:sexwomen:bs(age_n, knots = c(35, 50))4-0.03654290.0231306-1.57985190.1186493
year_n:sexwomen:bs(age_n, knots = c(35, 50))50.01468700.02581960.56883150.5712911
summary(model)$adj.r.squared  
## [1] 0.9873689
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##                                          Sum Sq Df  F value    Pr(>F)    
## year_n                                  0.06880  1 169.8286 < 2.2e-16 ***
## sex                                     0.03780  6  15.5511 3.011e-11 ***
## bs(age_n, knots = c(35, 50))            2.82569 10 697.5506 < 2.2e-16 ***
## year_n:sex                              0.00126  1   3.1090 0.0822248 .  
## year_n:bs(age_n, knots = c(35, 50))     0.00394  5   1.9452 0.0977419 .  
## sex:bs(age_n, knots = c(35, 50))        0.01163  5   5.7405 0.0001702 ***
## year_n:sex:bs(age_n, knots = c(35, 50)) 0.00253  5   1.2489 0.2958798    
## Residuals                               0.02836 70                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("age_n", "year_n", "sex"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 15: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 16: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

Average basic salary, monthly salary and women´s salary as a percentage of men´s salary by sector, occupational group (SSYK 2012), gender and educational level (SUN). Year 2014 – 2018 Monthly salary All sectors

We expect that gender is an important factor in salaries. As a null hypothesis, we assume that the interaction between gender and education is not related to the salary and examine if we can reject this hypothesis with the data from Statistics Sweden.

tb <- readfile("000000CY.csv") %>% 
  filter(`occuptional  (SSYK 2012)` == "214 Engineering professionals") %>%    
  mutate(edulevel = `level of education`)

numedulevel <- read.csv("edulevel.csv") 

tbnum <- tb %>% 
  right_join(numedulevel, by = c("level of education" = "level.of.education")) %>%
  filter(!is.na(eduyears))
## Warning: Column `level of education`/`level.of.education` joining character
## vector and factor, coercing into character vector
numedulevel %>%
  knitr::kable(
  booktabs = TRUE,
  caption = 'Initial approach, length of education') 
Table 6: Initial approach, length of education
level.of.educationeduyears
primary and secondary education 9-10 years (ISCED97 2)9
upper secondary education, 2 years or less (ISCED97 3C)11
upper secondary education 3 years (ISCED97 3A)12
post-secondary education, less than 3 years (ISCED97 4+5B)14
post-secondary education 3 years or more (ISCED97 5A)15
post-graduate education (ISCED97 6)19
no information about level of educational attainmentNA
tb %>%
  ggplot () +  
    geom_point (mapping = aes(x = year_n,y = salary, colour = `level of education`, shape = sex)) + 
  labs(
    x = "Year",
    y = "Salary (SEK/month)"
  )

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 17: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

model <- lm (log(salary) ~ year_n + sex + edulevel, data = tbnum)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 6: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-34.87418184.5648337-7.6397490
year_n0.02267380.002264310.0134130
sexwomen-0.07431140.0063438-11.7139900
edulevelpost-secondary education 3 years or more (ISCED97 5A)-0.11043560.0108816-10.1487960
edulevelpost-secondary education, less than 3 years (ISCED97 4+5B)-0.14465730.0108816-13.2937000
edulevelprimary and secondary education 9-10 years (ISCED97 2)-0.22075000.0111971-19.7148920
edulevelupper secondary education 3 years (ISCED97 3A)-0.18988660.0108816-17.4501720
edulevelupper secondary education, 2 years or less (ISCED97 3C)-0.21550650.0108816-19.8045830
summary(model)$adj.r.squared  
## [1] 0.9311952
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##            Sum Sq Df F value    Pr(>F)    
## year_n    0.05936  1  100.27 1.232e-13 ***
## sex       0.08124  1  137.22 4.476e-16 ***
## edulevel  0.34338  5  116.00 < 2.2e-16 ***
## Residuals 0.03019 51                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 18: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

tb[20,]
## # A tibble: 1 x 8
##   sector  `occuptional  (~ sex   `level of educat~ year  salary year_n edulevel 
##                                         
## 1 0 all ~ 214 Engineering~ men   no information a~ 2015   48400   2015 no infor~
tb[41,]
## # A tibble: 1 x 8
##   sector  `occuptional  (~ sex   `level of educat~ year  salary year_n edulevel 
##                                         
## 1 0 all ~ 214 Engineering~ women no information a~ 2016   39400   2016 no infor~
tb[55,]
## # A tibble: 1 x 8
##   sector  `occuptional  (~ sex   `level of educat~ year  salary year_n edulevel 
##                                         
## 1 0 all ~ 214 Engineering~ women no information a~ 2017   35300   2017 no infor~

We start by examining the interaction between year and education. The salaries for the education upper secondary education, 2 years or less increase more than other educations during the time period 2014-2018, however, this can not be shown with 95% significance from this data.

model <- lm(log(salary) ~ sex + year_n * edulevel, data = tbnum)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 7: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-29.304115310.9645273-2.67262910.0103775
sexwomen-0.07434990.0063563-11.69712410.0000000
year_n0.01991090.00543883.66092450.0006464
edulevelpost-secondary education 3 years or more (ISCED97 5A)3.509216515.50618260.22631080.8219623
edulevelpost-secondary education, less than 3 years (ISCED97 4+5B)-3.749852315.5061826-0.24182950.8099871
edulevelprimary and secondary education 9-10 years (ISCED97 2)-6.819656716.6031078-0.41074580.6831661
edulevelupper secondary education 3 years (ISCED97 3A)-0.331111515.5061826-0.02135350.9830560
edulevelupper secondary education, 2 years or less (ISCED97 3C)-27.143154315.5061826-1.75047300.0867042
year_n:edulevelpost-secondary education 3 years or more (ISCED97 5A)-0.00179550.0076916-0.23343290.8164610
year_n:edulevelpost-secondary education, less than 3 years (ISCED97 4+5B)0.00178830.00769160.23250050.8171806
year_n:edulevelprimary and secondary education 9-10 years (ISCED97 2)0.00327320.00823510.39746840.6928605
year_n:edulevelupper secondary education 3 years (ISCED97 3A)0.00007010.00769160.00910770.9927726
year_n:edulevelupper secondary education, 2 years or less (ISCED97 3C)0.01335700.00769161.73657530.0891557
summary(model)$adj.r.squared  
## [1] 0.9312476
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##                  Sum Sq Df  F value    Pr(>F)    
## sex             0.08094  1 136.8227 2.212e-15 ***
## year_n          0.05936  1 100.3448 3.853e-13 ***
## edulevel        0.34338  5 116.0852 < 2.2e-16 ***
## year_n:edulevel 0.00298  5   1.0078     0.424    
## Residuals       0.02721 46                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("year_n", "edulevel"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 19: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 20: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

tb[10,]
## # A tibble: 1 x 8
##   sector  `occuptional  (~ sex   `level of educat~ year  salary year_n edulevel 
##                                         
## 1 0 all ~ 214 Engineering~ women post-secondary e~ 2014   38800   2014 post-sec~
tb[25,]
## # A tibble: 1 x 8
##   sector  `occuptional  (~ sex   `level of educat~ year  salary year_n edulevel 
##                                         
## 1 0 all ~ 214 Engineering~ women post-secondary e~ 2015   41200   2015 post-sec~
tb[29,]
## # A tibble: 1 x 8
##   sector  `occuptional  (~ sex   `level of educat~ year  salary year_n edulevel 
##                                         
## 1 0 all ~ 214 Engineering~ men   upper secondary ~ 2016   42100   2016 upper se~
model <- lm(log(salary) ~ sex + year_n * bs(eduyears, knots = c(14)), data = tbnum)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 7: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-37.563885212.5188542-3.00058490.0042648
sexwomen-0.07441920.0064006-11.62693430.0000000
year_n0.02389810.00620923.84884370.0003504
bs(eduyears, knots = c(14))1-33.512994734.8826450-0.96073550.3415000
bs(eduyears, knots = c(14))252.974379667.55048920.78421900.4367647
bs(eduyears, knots = c(14))3-26.148533182.0008933-0.31888110.7512002
bs(eduyears, knots = c(14))48.284905916.69140800.49635750.6219077
year_n:bs(eduyears, knots = c(14))10.01661810.01730240.96044990.3416422
year_n:bs(eduyears, knots = c(14))2-0.02624120.0335071-0.78315290.4373844
year_n:bs(eduyears, knots = c(14))30.01306250.04067480.32114390.7494955
year_n:bs(eduyears, knots = c(14))4-0.00399970.0082789-0.48311820.6312082
summary(model)$adj.r.squared  
## [1] 0.9302821
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##                                     Sum Sq Df  F value    Pr(>F)    
## sex                                0.08110  1 135.1856 1.453e-15 ***
## year_n                             0.05941  1  99.0272 2.972e-13 ***
## bs(eduyears, knots = c(14))        0.34299  4 142.9343 < 2.2e-16 ***
## year_n:bs(eduyears, knots = c(14)) 0.00179  4   0.7457    0.5657    
## Residuals                          0.02880 48                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("year_n", "eduyears"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 21: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 22: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

tb[25,]
## # A tibble: 1 x 8
##   sector  `occuptional  (~ sex   `level of educat~ year  salary year_n edulevel 
##                                         
## 1 0 all ~ 214 Engineering~ women post-secondary e~ 2015   41200   2015 post-sec~

Examine the interaction between gender and education. The only significant difference in salaries for the length in education is for the group upper secondary education, 2 years or less.

The F-value from the Anova table for the interaction term between gender and education is 3,3 (Pr(>F) < 0.01195), sufficient for rejecting the null hypothesis that the interaction between gender and education has no effect on the salary holding year as constant.

model <- lm(log(salary) ~ sex * edulevel + year_n, data = tbnum)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 8: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-34.74304604.1272331-8.41799940.0000000
sexwomen-0.07710830.0138850-5.55335080.0000013
edulevelpost-secondary education 3 years or more (ISCED97 5A)-0.11680220.0138850-8.41210580.0000000
edulevelpost-secondary education, less than 3 years (ISCED97 4+5B)-0.15971280.0138850-11.50253410.0000000
edulevelprimary and secondary education 9-10 years (ISCED97 2)-0.22510880.0138850-16.21236150.0000000
edulevelupper secondary education 3 years (ISCED97 3A)-0.19489420.0138850-14.03630720.0000000
edulevelupper secondary education, 2 years or less (ISCED97 3C)-0.19310890.0138850-13.90772380.0000000
year_n0.02260940.002047211.04389610.0000000
sexwomen:edulevelpost-secondary education 3 years or more (ISCED97 5A)0.01273310.01963640.64844500.5199213
sexwomen:edulevelpost-secondary education, less than 3 years (ISCED97 4+5B)0.03011090.01963641.53342430.1320210
sexwomen:edulevelprimary and secondary education 9-10 years (ISCED97 2)0.00948970.02026660.46824540.6418203
sexwomen:edulevelupper secondary education 3 years (ISCED97 3A)0.01001530.01963640.51003850.6124632
sexwomen:edulevelupper secondary education, 2 years or less (ISCED97 3C)-0.04479520.0196364-2.28123680.0272117
summary(model)$adj.r.squared  
## [1] 0.9439866
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##               Sum Sq Df  F value    Pr(>F)    
## sex          0.08124  1 168.5530 < 2.2e-16 ***
## edulevel     0.34338  5 142.4863 < 2.2e-16 ***
## year_n       0.05879  1 121.9676 1.576e-14 ***
## sex:edulevel 0.00802  5   3.3293   0.01195 *  
## Residuals    0.02217 46                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("edulevel", "sex"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 23: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 24: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

tb[12,]
## # A tibble: 1 x 8
##   sector  `occuptional  (S~ sex   `level of educa~ year  salary year_n edulevel 
##                                         
## 1 0 all ~ 214 Engineering ~ women post-graduate e~ 2014   45000   2014 post-gra~
model <- lm(log(salary) ~ sex * bs(eduyears, knots = c(14)) + year_n, data = tbnum)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 8: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-35.01379874.1599657-8.41684790.0000000
sexwomen-0.06916300.0148436-4.65946320.0000254
bs(eduyears, knots = c(14))10.05635050.03030351.85953840.0690857
bs(eduyears, knots = c(14))2-0.04517020.0604863-0.74678420.4588363
bs(eduyears, knots = c(14))30.28845460.07310103.94597250.0002585
bs(eduyears, knots = c(14))40.22487850.013979616.08613240.0000000
year_n0.02263220.002063510.96802360.0000000
sexwomen:bs(eduyears, knots = c(14))1-0.13629890.0434669-3.13569350.0029251
sexwomen:bs(eduyears, knots = c(14))20.23393280.08557792.73356540.0087471
sexwomen:bs(eduyears, knots = c(14))3-0.20772230.1036306-2.00444830.0506812
sexwomen:bs(eduyears, knots = c(14))4-0.00791400.0203999-0.38794510.6997718
summary(model)$adj.r.squared  
## [1] 0.943092
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##                                  Sum Sq Df  F value    Pr(>F)    
## sex                             0.08128  1 165.9766 < 2.2e-16 ***
## bs(eduyears, knots = c(14))     0.34299  4 175.1086 < 2.2e-16 ***
## year_n                          0.05891  1 120.2975 1.128e-14 ***
## sex:bs(eduyears, knots = c(14)) 0.00708  4   3.6147   0.01181 *  
## Residuals                       0.02350 48                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("eduyears", "sex"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 25: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 26: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

Examine the interaction between gender and year There is no evidence that the salaries for men and women are interacting with year, holding education constant.

model <- lm(log(salary) ~ sex * year_n + edulevel, data = tbnum)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 9: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-33.23920456.3870956-5.20411880.0000037
sexwomen-3.47773099.2153884-0.37738300.7074862
year_n0.02186280.00316826.90070090.0000000
edulevelpost-secondary education 3 years or more (ISCED97 5A)-0.11043560.0109750-10.06250260.0000000
edulevelpost-secondary education, less than 3 years (ISCED97 4+5B)-0.14465730.0109750-13.18066600.0000000
edulevelprimary and secondary education 9-10 years (ISCED97 2)-0.22094860.0113059-19.54270070.0000000
edulevelupper secondary education 3 years (ISCED97 3A)-0.18988660.0109750-17.30179550.0000000
edulevelupper secondary education, 2 years or less (ISCED97 3C)-0.21550650.0109750-19.63618780.0000000
sexwomen:year_n0.00168820.00457100.36931930.7134493
summary(model)$adj.r.squared  
## [1] 0.93001
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##             Sum Sq Df  F value    Pr(>F)    
## sex        0.08124  1 134.8940 8.258e-16 ***
## year_n     0.05936  1  98.5705 2.047e-13 ***
## edulevel   0.34343  5 114.0493 < 2.2e-16 ***
## sex:year_n 0.00008  1   0.1364    0.7134    
## Residuals  0.03011 50                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("year_n", "sex"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 27: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 28: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

tb[18,]
## # A tibble: 1 x 8
##   sector  `occuptional  (~ sex   `level of educat~ year  salary year_n edulevel 
##                                         
## 1 0 all ~ 214 Engineering~ men   post-secondary e~ 2015   44800   2015 post-sec~
model <- lm(log(salary) ~ sex * year_n + bs(eduyears, knots = c(14)), data = tbnum)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 9: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-33.46060366.3648511-5.25709140.0000029
sexwomen-3.51138369.1831982-0.38237040.7037755
year_n0.02186280.00315726.92481860.0000000
bs(eduyears, knots = c(14))1-0.01108250.0239981-0.46180720.6461835
bs(eduyears, knots = c(14))20.07204560.04728641.52360200.1337855
bs(eduyears, knots = c(14))30.18530330.05725463.23647790.0021284
bs(eduyears, knots = c(14))40.22141750.011251519.67885740.0000000
sexwomen:year_n0.00170490.00455510.37427670.7097502
summary(model)$adj.r.squared  
## [1] 0.9304967
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##                              Sum Sq Df  F value    Pr(>F)    
## sex                         0.08128  1 135.8986 5.361e-16 ***
## year_n                      0.05941  1  99.3330 1.446e-13 ***
## bs(eduyears, knots = c(14)) 0.34304  4 143.3973 < 2.2e-16 ***
## sex:year_n                  0.00008  1   0.1401    0.7098    
## Residuals                   0.03050 51                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("eduyears", "sex"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 29: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 30: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

Examine the interaction between year, gender and education It’s becoming a bit difficult to make any significant predictions from the model, the data is not sufficiently large, the degrees of freedom has been reduced from 51 in the model with no interaction terms to 35.

There are no significant interaction terms in the summary.

The F-value from the Anova table for the interaction term between gender and education is 3,0 (Pr(>F) < 0.02377), sufficient for rejecting the null hypothesis that the interaction between gender and education has no effect on the salary holding year as constant.

model <- lm(log(salary) ~ sex * year_n * edulevel, data = tbnum)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 10: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-28.659871614.7502103-1.94301440.0600924
sexwomen-1.362837320.8599474-0.06533270.9482808
year_n0.01959200.00731662.67775440.0112090
edulevelpost-secondary education 3 years or more (ISCED97 5A)5.706353120.85994740.27355550.7860341
edulevelpost-secondary education, less than 3 years (ISCED97 4+5B)1.892675720.85994740.09073250.9282224
edulevelprimary and secondary education 9-10 years (ISCED97 2)-6.522890120.8599474-0.31269930.7563648
edulevelupper secondary education 3 years (ISCED97 3A)3.404109320.85994740.16318880.8713084
edulevelupper secondary education, 2 years or less (ISCED97 3C)-32.837679920.8599474-1.57419760.1244390
sexwomen:year_n0.00063780.01034720.06163630.9512031
sexwomen:edulevelpost-secondary education 3 years or more (ISCED97 5A)-4.394273129.5004205-0.14895630.8824431
sexwomen:edulevelpost-secondary education, less than 3 years (ISCED97 4+5B)-11.285056129.5004205-0.38253880.7043750
sexwomen:edulevelprimary and secondary education 9-10 years (ISCED97 2)1.350464832.98574390.04094090.9675757
sexwomen:edulevelupper secondary education 3 years (ISCED97 3A)-7.470441629.5004205-0.25323170.8015707
sexwomen:edulevelupper secondary education, 2 years or less (ISCED97 3C)11.389051329.50042050.38606400.7017862
year_n:edulevelpost-secondary education 3 years or more (ISCED97 5A)-0.00288850.0103472-0.27915490.7817689
year_n:edulevelpost-secondary education, less than 3 years (ISCED97 4+5B)-0.00101800.0103472-0.09838900.9221848
year_n:edulevelprimary and secondary education 9-10 years (ISCED97 2)0.00312390.01034720.30190790.7645090
year_n:edulevelupper secondary education 3 years (ISCED97 3A)-0.00178520.0103472-0.17253180.8640133
year_n:edulevelupper secondary education, 2 years or less (ISCED97 3C)0.01619270.01034721.56494060.1265947
sexwomen:year_n:edulevelpost-secondary education 3 years or more (ISCED97 5A)0.00218600.01463310.14938790.8821050
sexwomen:year_n:edulevelpost-secondary education, less than 3 years (ISCED97 4+5B)0.00561270.01463310.38355960.7036250
sexwomen:year_n:edulevelprimary and secondary education 9-10 years (ISCED97 2)-0.00066520.0163603-0.04065840.9677993
sexwomen:year_n:edulevelupper secondary education 3 years (ISCED97 3A)0.00371050.01463310.25357130.8013105
sexwomen:year_n:edulevelupper secondary education, 2 years or less (ISCED97 3C)-0.00567160.0146331-0.38758260.7006721
summary(model)$adj.r.squared  
## [1] 0.9377879
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##                      Sum Sq Df  F value    Pr(>F)    
## sex                 0.08897  6  27.6985 6.428e-12 ***
## year_n              0.05879  1 109.8150 2.456e-12 ***
## edulevel            0.35144 10  65.6498 < 2.2e-16 ***
## sex:year_n          0.00007  1   0.1338   0.71677    
## sex:edulevel        0.00800  5   2.9902   0.02377 *  
## year_n:edulevel     0.00298  5   1.1127   0.37150    
## sex:year_n:edulevel 0.00039  5   0.1445   0.98038    
## Residuals           0.01874 35                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("year_n", "edulevel","sex"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 31: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 32: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

tb[29,]
## # A tibble: 1 x 8
##   sector  `occuptional  (~ sex   `level of educat~ year  salary year_n edulevel 
##                                         
## 1 0 all ~ 214 Engineering~ men   upper secondary ~ 2016   42100   2016 upper se~
model <- lm(log(salary) ~ sex * year_n * bs(eduyears, knots = c(14)), data = tbnum)

summary(model) %>%  
  tidy() %>%
  knitr::kable( 
  booktabs = TRUE,
  caption = 'Summary from linear model fit')
Table 10: Summary from linear model fit
termestimatestd.errorstatisticp.value
(Intercept)-36.429707214.8747312-2.44910020.0189181
sexwomen-1.414602425.7220917-0.05499560.9564227
year_n0.02333450.00737833.16257100.0030250
bs(eduyears, knots = c(14))1-52.025513845.6462798-1.13975360.2613408
bs(eduyears, knots = c(14))293.320708491.11082781.02425490.3120264
bs(eduyears, knots = c(14))3-61.4602190110.1124840-0.55815850.5799266
bs(eduyears, knots = c(14))47.800498221.05762280.37043580.7130621
sexwomen:year_n0.00066700.01275690.05228890.9585654
sexwomen:bs(eduyears, knots = c(14))135.684538268.00140050.52476180.6027192
sexwomen:bs(eduyears, knots = c(14))2-81.1592876129.0648894-0.62882550.5331293
sexwomen:bs(eduyears, knots = c(14))369.2953946157.15611310.44093350.6616959
sexwomen:bs(eduyears, knots = c(14))40.040741233.25524250.00122510.9990288
year_n:bs(eduyears, knots = c(14))10.02583430.02264201.14098840.2608328
year_n:bs(eduyears, knots = c(14))2-0.04631240.0451939-1.02475090.3117952
year_n:bs(eduyears, knots = c(14))30.03062930.05461930.56077830.5781565
year_n:bs(eduyears, knots = c(14))4-0.00375770.0104452-0.35975670.7209699
sexwomen:year_n:bs(eduyears, knots = c(14))1-0.01776780.0337292-0.52677760.6013316
sexwomen:year_n:bs(eduyears, knots = c(14))20.04037380.06402020.63064180.5319531
sexwomen:year_n:bs(eduyears, knots = c(14))3-0.03447530.0779537-0.44225310.6607490
sexwomen:year_n:bs(eduyears, knots = c(14))4-0.00002380.0164940-0.00144280.9988562
summary(model)$adj.r.squared  
## [1] 0.9364598
Anova(model, type = 2)
## Anova Table (Type II tests)
## 
## Response: log(salary)
##                                         Sum Sq Df  F value    Pr(>F)    
## sex                                    0.08817  5  32.2521 7.295e-13 ***
## year_n                                 0.05891  1 107.7412 8.809e-13 ***
## bs(eduyears, knots = c(14))            0.35011  8  80.0434 < 2.2e-16 ***
## sex:year_n                             0.00008  1   0.1517   0.69908    
## sex:bs(eduyears, knots = c(14))        0.00706  4   3.2268   0.02218 *  
## year_n:bs(eduyears, knots = c(14))     0.00179  4   0.8168   0.52231    
## sex:year_n:bs(eduyears, knots = c(14)) 0.00032  4   0.1468   0.96335    
## Residuals                              0.02132 39                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot_model(model, type = "pred", terms = c("eduyears", "year_n","sex"))
## Model has log-transformed response. Back-transforming predictions to original response scale. Standard errors are still on the log-scale.

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 33: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

plot(model, which = 1)

SSYK 214, Architects, engineers and related professionals, Year 2014 - 2018

Figure 34: SSYK 214, Architects, engineers and related professionals, Year 2014 – 2018

To leave a comment for the author, please follow the link and comment on their blog: R Analystatistics Sweden .

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)