The Hitchhiker’s Guide to Linear Models is now complete
[This article was first published on pacha.dev/blog, and kindly contributed to R-bloggers].
The book can be downloaded for free, but you will need a Leanpub account whether you download it for free or buy it.
The Hitchhiker’s Guide to Linear Models is finally complete. It took me a while to finish it, but I’m happy with the result. I hope you enjoy reading it as much as I enjoyed writing it.
The GitHub repository contains the code for the book, so that readers can avoid copying and pasting from the PDF.
Table of contents:



Preface  i 


1 R Setup  1 
1.1 R and RStudio  1
1.2 Installing R  1 
1.2.1 Windows and Mac  1 
1.2.2 Linux  1 
1.3 Installing RStudio  2 
1.3.1 Windows and Mac  2 
1.3.2 Linux  2 
1.4 Installing R Packages  2
1.5 Changing RStudio colors and font  4 
1.6 Installing Quarto  4 
1.6.1 Windows and Mac  4 
1.6.2 Linux  4


2 Linear algebra review  5 
2.1 Using R as a calculator  5 
2.2 System of linear equations  5 
2.3 Matrix  5 
2.4 Transpose matrix  6 
2.5 Matrix multiplication  6 
2.6 Matrix representation of a system of linear equations  6 
2.7 Identity matrix  7 
2.8 Inverse matrix  7 
2.9 Solving systems of linear equations  7 


3 Statistics review  11
3.1 Using R as a calculator  11 
3.2 Data and dataset  11 
3.3 Summation  11 
3.4 Probability  11 
3.5 Descriptive statistics  13 
3.5.1 Mean  13 
3.5.2 Variance  13 
3.5.3 Standard deviation  14 
3.5.4 Covariance  15 
3.5.5 Correlation  16 
3.6 Distributions  20 
3.6.1 Normal distribution  20 
3.6.2 Poisson distribution  22 
3.6.3 Student’s t-distribution  23
3.6.4 Computing probabilities with the normal distribution  24 
3.6.5 Computing probabilities with the Poisson distribution  27 
3.6.6 Computing probabilities with the t-distribution  28
3.7 Sample size  29 


4 Recommended workflow  30 
4.1 Creating projects  30 
4.2 Creating scripts  30 
4.3 Creating notebooks  32 
4.4 Organizing code sections  33 
4.5 Customizing notebooks’ output  34 


5 Read, Manipulate, and Plot Data  35 
5.1 The datasauRus dataset in R format  35
5.2 The Quality of Government dataset in CSV format  40
5.3 The Quality of Government dataset in SAV (SPSS) format  44 
5.4 The Quality of Government dataset in DTA (Stata) format  48 
5.5 The Freedom House dataset in XLSX (Excel) format  50 


6 Linear Model with One Explanatory Variable  60 
6.1 Model specification  60 
6.2 The Galton dataset  64 
6.3 A word of caution about Galton’s work  64 
6.4 Loading the Galton dataset  65 
6.5 Estimating linear models’ coefficients  66 
6.5.1 Linear model as correlation  66 
6.5.2 Linear model as matrix multiplication  67 
6.5.3 Relation between correlation and matrix multiplication  71 
6.5.4 Computational note  75 
6.6 Logarithmic transformations  75 
6.7 Plotting model results  76 
6.8 Linear model does not equal straight line  81 
6.9 Transforming variables  85 
6.10 Regression with weights  89 


7 Linear Model with Multiple Explanatory Variables  91 
7.1 Model specification  91 
7.2 Life expectancy, GDP and wellbeing in the Quality of Government dataset  94 
7.3 Estimating linear models’ coefficients  96 
7.4 Model accuracy  103 
7.4.1 Root Mean Squared Error and Mean Absolute Error  103 
7.4.2 RMSE and MAE interpretation  104 
7.5 Model summary  107 
7.5.1 Coefficient’s standard error  107 
7.5.2 Coefficient’s t-statistic  108
7.5.3 Coefficient’s p-value  108
7.5.4 Residual standard error  109 
7.5.5 Model’s multiple R-squared (or unadjusted R-squared)  109
7.5.6 Model’s adjusted R-squared  110
7.5.7 Model’s F-statistic  111
7.6 Error’s assumptions  111 
7.6.1 Error’s normality  112 
7.6.2 Error’s homoscedasticity (homogeneous variance)  113 


8 Linear Model with Binary and Categorical Explanatory Variables  114 
8.1 Model specification with binary variables  114 
8.1.1 ANOVA is a particular case of a linear model with binary variables  114 
8.1.2 Corruption and popular vote in the Quality of Government dataset  114 
8.1.3 Estimating a linear model and ANOVA with one predictor and two categories  116 
8.1.4 Corruption and regime type in the Quality of Government dataset  118 
8.1.5 Estimating a linear model and ANOVA with one predictor and multiple categories  120 
8.1.6 Estimating a linear model with continuous and categorical predictors  126 
8.2 Model specification with binary interactions  128 
8.2.1 Corruption and interaction variables in the Quality of Government dataset  128 
8.2.2 Estimating a linear model with binary interactions  131 
8.2.3 Confidence intervals with binary interactions  133 
8.3 Model specification with categorical interactions  136 
8.3.1 Estimating a linear model with categorical interactions  136 
8.3.2 Confidence intervals with categorical interactions  137 


9 Linear Model with Fixed Effects  140 
9.1 Year fixed effects  140 
9.1.1 Model specification  140 
9.1.2 Corruption and popular vote in the Quality of Government dataset  140 
9.1.3 Estimating year fixed effects’ coefficients  142 
9.2 Country fixed effects  145 
9.2.1 Model specification  145 
9.2.2 Corruption and popular vote in the Quality of Government dataset  145 
9.2.3 Estimating country-time fixed effects’ coefficients  145
9.3 Country-year fixed effects  148
9.3.1 Model specification  148 
9.3.2 Corruption and popular vote in the Quality of Government dataset  149 
9.3.3 Estimating country-time fixed effects’ coefficients  149


10 Generalized Linear Model with One Explanatory Variable  152 
10.1 Model specification  152 
10.2 Model families  152
10.2.1 Gaussian model  153 
10.2.2 Poisson model  153 
10.2.3 Quasi-Poisson model  154
10.2.4 Binomial model (or logit model)  157 


11 Generalized Linear Model with Multiple Explanatory Variables  165 
11.1 Obtaining the original codes and data  165 
11.2 Loading the original data  165 
11.3 Ordinary Least Squares  166 
11.4 Poisson Pseudo Maximum Likelihood  167 
11.5 Tobit  169 
11.6 Reporting multiple models  170 


References  172 
Don’t panic!