# Microeconomic Theory and Linear Regression (Part 2)

April 21, 2017
By

(This article was first published on Pachá (Batteries Included), and kindly contributed to R-bloggers)

# Introduction

In the first part of this article I explained and compared four different functional forms to model apples’ production.

This is what I need to retake the previously explained examples:

# Libraries

#install.packages(c("micEcon","lmtest","bbmle","miscTools"))
library(micEcon)
library(lmtest)
library(stats4) #this is a base package so I don't install this
library(bbmle)
library(miscTools)

data("appleProdFr86", package = "micEcon")
data = appleProdFr86
rm(appleProdFr86)

data$qCap = data$vCap/data$pCap data$qLab = data$vLab/data$pLab
data$qMat = data$vMat/data$pMat # Fit models ## Linear specification prodLin = lm(qOut ~ qCap + qLab + qMat, data = data) data$qOutLin = fitted(prodLin)

prodQuad = lm(qOut ~ qCap + qLab + qMat + I(0.5*qCap^2) + I(0.5*qLab^2) + I(0.5*qMat^2)
+ I(qCap*qLab) + I(qCap*qMat) + I(qLab*qMat), data = data)
data$qOutQuad = fitted(prodQuad) ## Cobb-Douglas specification prodCD = lm(log(qOut) ~ log(qCap) + log(qLab) + log(qMat), data = data) data$qOutCD = fitted(prodCD)

## Translog specification
prodTL = lm(log(qOut) ~ log(qCap) + log(qLab) + log(qMat) + I(0.5*log(qCap)^2)
+ I(0.5*log(qLab)^2) + I(0.5*log(qMat)^2) + I(log(qCap)*log(qLab))
+ I(log(qCap)*log(qMat)) + I(log(qLab)*log(qMat)), data = data)
dataqOutTL = fitted(prodTL) These four specifications presented different problems that suggested model misspecification. Ideally, I should find a model with positive predicted output and positive factor elasticity to make sense of the context of production I’m working with. # Model misspectification RESET test (the name stands for Regression Equation Specification Error Test) is a test for functional form misspecification. As a general test, RESET helps detecting ommited variables. Package lmtest provides resettest, a function that provides one of the many versions of the test, and it fits this model by default: $$y_i = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + e_i$$ The key idea behind fitting another model that includes the output is to detect if the model ignores an important nonlinearity. The hypothesis testing is: $$H_0:\: \gamma_i = 0 \text{ for any } i \\ H_1:\: \gamma_i \neq 0 \text{ for some } i$$ If there is evidence to reject the null hypthosis, the test does not provide an alternative model and it just states that the model is misspecified. The statistic will be a t statistic if we just include $$\gamma_1 \hat{y}_i^2$$, and it will change to a F statistic if we include $$\gamma_2 \hat{y}_i^3$$ or a higher power. In the context of the worked specifications: resettest(prodLin) RESET test data: prodLin RESET = 17.639, df1 = 2, df2 = 134, p-value = 1.584e-07 resettest(prodQuad) RESET test data: prodQuad RESET = 7.3663, df1 = 2, df2 = 128, p-value = 0.0009374 resettest(prodCD) RESET test data: prodCD RESET = 2.9224, df1 = 2, df2 = 134, p-value = 0.05724 resettest(prodTL) RESET test data: prodTL RESET = 1.2811, df1 = 2, df2 = 128, p-value = 0.2813 $$\Longrightarrow$$ The null hypothesis is rejected both for Linear and Quadratic specifications while Cobb-Douglas specification is rejected at 10% of significance. The null hypothesis can’t be rejected for the Translog Specification. I can also modify the powers in the model: resettest(prodTL, power = 2) RESET test data: prodTL RESET = 2.358, df1 = 1, df2 = 129, p-value = 0.1271 resettest(prodTL, power = 2:4) RESET test data: prodTL RESET = 1.3475, df1 = 3, df2 = 127, p-value = 0.262 $$\Longrightarrow$$ The null hypothesis can’t be rejected for Translog Specification, not even by adding a higher power. # Quasiconcavity A function \begin{align*} f: \mathbb{R}^n &\to \mathbb{R} \cr x &\mapsto f(x) \end{align*} is quasiconcave if for $$0 \leq t \leq 1$$ $$f(tx_1 + (1-t)x_2) \geq \min\{f(x_1),f(x_2)\}$$ In particular, any concave and twice continuously differentiable function is quasiconcave. In really raw terms, quasiconcavity guarantees there is a production plan where the company is minimizing cost. Adding differentiability, you can find an optimal production plan by using derivatives. To cehck the details please check Advanced Microeconomic Theory by Geoffrey A. Jehle. I’ll check the quasiconcavity in an indirect way, by checking the concavity at each observation using the hessian method: # Quadratic Specification ## Regression coefficients b1 = coef(prodQuad)["qCap"] b2 = coef(prodQuad)["qLab"] b3 = coef(prodQuad)["qMat"] b11 = coef(prodQuad)["I(0.5 * qCap^2)"] b22 = coef(prodQuad)["I(0.5 * qLab^2)"] b33 = coef(prodQuad)["I(0.5 * qMat^2)"] b12 = b21 = coef(prodQuad)["I(qCap * qLab)"] b13 = b31 = coef(prodQuad)["I(qCap * qMat)"] b23 = b32 = coef(prodQuad)["I(qLab * qMat)"] ## Marginal productivity datampCapQuad = with(data, b1 + b11*qCap + b12*qLab + b13*qMat)
data$mpLabQuad = with(data, b2 + b21*qCap + b22*qLab + b23*qMat) data$mpMatQuad = with(data, b3 + b31*qCap + b32*qLab + b33*qMat)

## Quasiconcavity
data$quasiConcQuad = NA for(obs in 1:nrow(data)) { hmLoop = matrix(0, nrow = 3, ncol = 3) hmLoop[1,1] = b11 hmLoop[1,2] = hmLoop[2,1] = b12 hmLoop[1,3] = hmLoop[3,1] = b13 hmLoop[2,2] = b22 hmLoop[2,3] = hmLoop[3,2] = b23 hmLoop[3,3] = b33 data$quasiConcQuad[obs] = hmLoop[1,1] < 0 & det(hmLoop[1:2,1:2] > 0) & det(hmLoop < 0)
}

sum(!data$quasiConcQuad) [1] 140 # Translog Specification ## Regression coefficients b1 = coef(prodTL)["log(qCap)"] b2 = coef(prodTL)["log(qLab)"] b3 = coef(prodTL)["log(qMat)"] b11 = coef(prodTL)["I(0.5 * log(qCap)^2)"] b22 = coef(prodTL)["I(0.5 * log(qLab)^2)"] b33 = coef(prodTL)["I(0.5 * log(qMat)^2)"] b12 = b21 = coef(prodTL)["I(log(qCap) * log(qLab))"] b13 = b31 = coef(prodTL)["I(log(qCap) * log(qMat))"] b23 = b32 = coef(prodTL)["I(log(qLab) * log(qMat))"] ## Scale elasticity data$eCapTL = with(data, b1 + b11*log(qCap) + b12*log(qLab) + b13*log(qMat))
data$eLabTL = with(data, b2 + b21*log(qCap) + b22*log(qLab) + b23*log(qMat)) data$eMatTL = with(data, b3 + b31*log(qCap) + b32*log(qLab) + b33*log(qMat))

## Marginal productivity
data$mpCapTL = with(data, eCapTL * qOutTL / qCap) data$mpLabTL = with(data, eLabTL * qOutTL / qLab)
data$mpMatTL = with(data, eMatTL * qOutTL / qMat) ## Derivatives of marginal productivity data$dmpCapCapTL = with(data, (qOutTL / qCap^2) * (b11 + eCapTL^2 - eCapTL))
data$dmpLabLabTL = with(data, (qOutTL / qLab^2) * (b22 + eLabTL^2 - eLabTL)) data$dmpMatMatTL = with(data, (qOutTL / qLab^2) * (b33 + eMatTL^2 - eMatTL))

data$dmpCapLabTL = with(data, (qOutTL / (qCap * qLab)) * (b12 + eCapTL * eLabTL)) data$dmpCapMatTL = with(data, (qOutTL / (qCap * qMat)) * (b13 + eCapTL * eMatTL))

data$dmpLabMatTL = with(data, (qOutTL / qLab * qMat) * (b23 + eLabTL * qMat)) ## Quasiconcavity data$quasiConcTL = NA

for(obs in 1:nrow(data)) {
hmLoop = matrix(0, nrow = 3, ncol = 3)
hmLoop[1,1] = data$dmpCapCapTL[obs] hmLoop[1,2] = hmLoop[2,1] = data$dmpCapLabTL[obs]
hmLoop[1,3] = hmLoop[3,1] = data$dmpCapMatTL[obs] hmLoop[2,2] = data$dmpLabLabTL[obs]
hmLoop[2,3] = hmLoop[3,2] = data$dmpLabMatTL[obs] hmLoop[3,3] = data$dmpMatMatTL[obs]

data$quasiConcTL[obs] = hmLoop[1,1] < 0 & det(hmLoop[1:2,1:2] > 0) & det(hmLoop < 0) } sum(!data$quasiConcTL)
[1] 126

With this information I can extend the information from the last table in the first part of the post:

$$R^2$$ on $$y$$ 0.85 0.77
$$R^2$$ on $$ln(y)$$ 0.55 0.63
Obs. with negative predicted output 0 0
Obs. that violate the monotonicity condition 39 48
Obs. with negative scale elasticity 22 0
Obs. that violate the quasiconcavity condition 140 126

$$\Longrightarrow$$ Now I have more evidence in favour of a different approach as it was mentioned at the end of the first part of the post.

# Non-parametric regression

A non-parametric model is of the form:

$$Y_i = m(X_i) + \varepsilon_i$$

The functional form is not defined beforehand
Obtains a best fit function $$m$$ from the data
* Obtains marginal effects

Higher computational cost
There are no regression coefficients (the focus is the regression function)
* Hard to interpret

## Single variable non-parametric regression

I can predict the output by using capital as the only input:

library(KernSmooth)

qCap = as.vector(data$qCap) qOut = as.vector(data$qOut)

bw = dpill(qCap,qOut)
fit1 = locpoly(qCap, qOut, bandwidth = bw, degree = 1) # 1st degree polynomial
plot(fit1$x, fit1$y, type = "l")

Here are different specifications:

# 2nd degree polynomial
fit2.1 <- locpoly(qCap, qOut, bandwidth = bw, degree = 2)
fit2.2 <- locpoly(qCap, qOut, bandwidth = bw*2, degree = 2) # less variance / more bias
fit2.3 <- locpoly(qCap, qOut, bandwidth = bw/2, degree = 2) # more variance / less bias

# linear regression
fit.lm <- lm(qOut ~ qCap)

# plots
plot(fit1$x, fit1$y, type = "l");par(mfrow=c(2,2))
plot(fit1$x, fit1$y, type = "l");lines(fit2.1, col = "blue")
plot(fit1$x, fit1$y, type = "l");lines(fit2.2, col = "red")
plot(fit1$x, fit1$y, type = "l");lines(fit2.3, col = "green")
plot(fit1$x, fit1$y, type = "l");abline(fit.lm, col = "purple")

Checking the models:

# model 1
sum(is.na(fit1$y)) [1] 50 # models 2, 3 y 4 sum(is.na(fit2.1$y))
[1] 50
sum(is.na(fit2.2$y)) [1] 9 sum(is.na(fit2.3$y))
[1] 84

$$\Longrightarrow$$ I should use model 2.2 and adjust it.

I’ll try with a neoclassical function with diminishing marginal returns:

# se ajusta la banda y el grado
fit.sqrt = locpoly(qCap, qOut, bandwidth = bw*3, degree = 0.5)
sum(is.na(fit.sqrt$y)) [1] 0 sum(fit.sqrt$y < 0)
[1] 0

$$\Longrightarrow$$ In this model there is no indetermined or negative predicted output.

plot(fit.sqrt$x, fit.sqrt$y, type = "l")

I can obtain the median productivity:

median(fit.sqrt$y)/median(fit.sqrt$x)
[1] 18.05637

$$\Longrightarrow$$ In median term each capital unit increases the output in 18 units.

If I adjust the bandwidth the function will look more and more like a neoclassical function from textbooks, with an initial part of increasing marginal returns and from the point where marginal and median productivities are equal (the same point where marginal productivity is at maximum) the return becomes increasing at decreasing scale.

This is an example of the last paragraph:

fit.sqrt.2 = locpoly(qCap, qOut, bandwidth = bw*5, degree = 0.5)
plot(fit.sqrt.2$x, fit.sqrt.2$y, type = "l")

## Multivariable non-parametric regression

Among many alternatives, Epanechnikov’s method is a non-parametric method that weights the contribution of each observation and iterating obtains the best fit function.

I’ll fit a regression applying logs to the variables:

library(np)
options(np.messages=FALSE)

prodNP = npreg(log(qOut) ~ log(qCap) + log(qLab) + log(qMat), regtype = "ll",
bwmethod = "cv.aic", ckertype = "epanechnikov",  data = data,
summary(prodNP)
Regression Data: 140 training points, in 3 variable(s)
log(qCap) log(qLab) log(qMat)
Bandwidth(s):  1.039647    332644 0.8418465

Kernel Regression Estimator: Local-Linear
Bandwidth Type: Fixed
Residual standard error: 0.6227669
R-squared: 0.6237078

Continuous Kernel Type: Second-Order Epanechnikov
No. Continuous Explanatory Vars.: 3

These are the regression plots:

the variables that do not appear in a plot are considered to be fixed to a value equal to their median.

I can test the significance for this model:

options(np.messages=FALSE)
npsigtest(prodNP)
Kernel Regression Significance Test
Type I Test with IID Bootstrap (399 replications, Pivot = TRUE, joint = FALSE)
Explanatory variables tested for significance:
log(qCap) (1), log(qLab) (2), log(qMat) (3)

log(qCap) log(qLab) log(qMat)
Bandwidth(s):  1.039647    332644 0.8418465

Individual Significance Tests
P Value:
log(qCap) 0.11779
log(qLab) < 2e-16 ***
log(qMat) < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$$\Longrightarrow$$ at 10% of significance the only variable with non-statistical significant effects is capital.

Finally I can obtain information about Product-Factor Elasticity and Scale Elasticity:

Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.10300  0.08598  0.14360  0.14500  0.18810  0.30060
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.02208 0.63050 0.69880 0.70970 0.80670 0.97890
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.3873  0.4546  0.5669  0.5850  0.6502  1.3480
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1.267   1.363   1.416   1.440   1.479   1.880
par(mfrow=c(2,2))

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...