**Pachá (Batteries Included)**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Introduction

In the first part of this article I explained and compared four different functional forms to model apples’ production.

This is what I need to retake the previously explained examples:

```
# Libraries
#install.packages(c("micEcon","lmtest","bbmle","miscTools"))
library(micEcon)
library(lmtest)
library(stats4) #this is a base package so I don't install this
library(bbmle)
library(miscTools)
# Load data
data("appleProdFr86", package = "micEcon")
data = appleProdFr86
rm(appleProdFr86)
data$qCap = data$vCap/data$pCap
data$qLab = data$vLab/data$pLab
data$qMat = data$vMat/data$pMat
# Fit models
## Linear specification
prodLin = lm(qOut ~ qCap + qLab + qMat, data = data)
data$qOutLin = fitted(prodLin)
## Quadratic specification
prodQuad = lm(qOut ~ qCap + qLab + qMat + I(0.5*qCap^2) + I(0.5*qLab^2) + I(0.5*qMat^2)
+ I(qCap*qLab) + I(qCap*qMat) + I(qLab*qMat), data = data)
data$qOutQuad = fitted(prodQuad)
## Cobb-Douglas specification
prodCD = lm(log(qOut) ~ log(qCap) + log(qLab) + log(qMat), data = data)
data$qOutCD = fitted(prodCD)
## Translog specification
prodTL = lm(log(qOut) ~ log(qCap) + log(qLab) + log(qMat) + I(0.5*log(qCap)^2)
+ I(0.5*log(qLab)^2) + I(0.5*log(qMat)^2) + I(log(qCap)*log(qLab))
+ I(log(qCap)*log(qMat)) + I(log(qLab)*log(qMat)), data = data)
data$qOutTL = fitted(prodTL)
```

These four specifications presented different problems that suggested model misspecification. Ideally, I should find a model with positive predicted output and positive factor elasticity to make sense of the context of production I’m working with.

# Model misspectification

RESET test (the name stands for *Regression Equation Specification Error Test*) is a test for functional form misspecification.

As a general test, RESET helps detecting ommited variables. Package `lmtest`

provides `resettest`

, a function that provides one of the many versions of the test, and it fits this model by default:

The key idea behind fitting another model that includes the output is to detect if the model ignores an important nonlinearity.

The hypothesis testing is:

H_1:\: \gamma_i \neq 0 \text{ for some } i$$

If there is evidence to reject the null hypthosis, the test does not provide an alternative model and it just states that the model is misspecified.

The statistic will be a *t* statistic if we just include \(\gamma_1 \hat{y}_i^2\), and it will change to a *F* statistic if we include \(\gamma_2 \hat{y}_i^3\) or a higher power.

In the context of the worked specifications:

```
resettest(prodLin)
```

```
RESET test
data: prodLin
RESET = 17.639, df1 = 2, df2 = 134, p-value = 1.584e-07
```

```
resettest(prodQuad)
```

```
RESET test
data: prodQuad
RESET = 7.3663, df1 = 2, df2 = 128, p-value = 0.0009374
```

```
resettest(prodCD)
```

```
RESET test
data: prodCD
RESET = 2.9224, df1 = 2, df2 = 134, p-value = 0.05724
```

```
resettest(prodTL)
```

```
RESET test
data: prodTL
RESET = 1.2811, df1 = 2, df2 = 128, p-value = 0.2813
```

\(\Longrightarrow\) The null hypothesis is rejected both for Linear and Quadratic specifications while Cobb-Douglas specification is rejected at 10% of significance. The null hypothesis can’t be rejected for the Translog Specification.

I can also modify the powers in the model:

```
resettest(prodTL, power = 2)
```

```
RESET test
data: prodTL
RESET = 2.358, df1 = 1, df2 = 129, p-value = 0.1271
```

```
resettest(prodTL, power = 2:4)
```

```
RESET test
data: prodTL
RESET = 1.3475, df1 = 3, df2 = 127, p-value = 0.262
```

\(\Longrightarrow\) The null hypothesis can’t be rejected for Translog Specification, not even by adding a higher power.

# Quasiconcavity

A function

\begin{align*}

f: \mathbb{R}^n &\to \mathbb{R} \cr

x &\mapsto f(x)

\end{align*}

$$

is quasiconcave if for \(0 \leq t \leq 1\)

In particular, any concave and twice continuously differentiable function is quasiconcave.

In really raw terms, quasiconcavity guarantees there is a production plan where the company is minimizing cost. Adding differentiability, you can find an optimal production plan by using derivatives.

To cehck the details please check Advanced Microeconomic Theory by Geoffrey A. Jehle.

I’ll check the quasiconcavity in an indirect way, by checking the concavity at each observation using the hessian method:

```
# Quadratic Specification
## Regression coefficients
b1 = coef(prodQuad)["qCap"]
b2 = coef(prodQuad)["qLab"]
b3 = coef(prodQuad)["qMat"]
b11 = coef(prodQuad)["I(0.5 * qCap^2)"]
b22 = coef(prodQuad)["I(0.5 * qLab^2)"]
b33 = coef(prodQuad)["I(0.5 * qMat^2)"]
b12 = b21 = coef(prodQuad)["I(qCap * qLab)"]
b13 = b31 = coef(prodQuad)["I(qCap * qMat)"]
b23 = b32 = coef(prodQuad)["I(qLab * qMat)"]
## Marginal productivity
data$mpCapQuad = with(data, b1 + b11*qCap + b12*qLab + b13*qMat)
data$mpLabQuad = with(data, b2 + b21*qCap + b22*qLab + b23*qMat)
data$mpMatQuad = with(data, b3 + b31*qCap + b32*qLab + b33*qMat)
## Quasiconcavity
data$quasiConcQuad = NA
for(obs in 1:nrow(data)) {
hmLoop = matrix(0, nrow = 3, ncol = 3)
hmLoop[1,1] = b11
hmLoop[1,2] = hmLoop[2,1] = b12
hmLoop[1,3] = hmLoop[3,1] = b13
hmLoop[2,2] = b22
hmLoop[2,3] = hmLoop[3,2] = b23
hmLoop[3,3] = b33
data$quasiConcQuad[obs] = hmLoop[1,1] < 0 & det(hmLoop[1:2,1:2] > 0) & det(hmLoop < 0)
}
sum(!data$quasiConcQuad)
```

```
[1] 140
```

```
# Translog Specification
## Regression coefficients
b1 = coef(prodTL)["log(qCap)"]
b2 = coef(prodTL)["log(qLab)"]
b3 = coef(prodTL)["log(qMat)"]
b11 = coef(prodTL)["I(0.5 * log(qCap)^2)"]
b22 = coef(prodTL)["I(0.5 * log(qLab)^2)"]
b33 = coef(prodTL)["I(0.5 * log(qMat)^2)"]
b12 = b21 = coef(prodTL)["I(log(qCap) * log(qLab))"]
b13 = b31 = coef(prodTL)["I(log(qCap) * log(qMat))"]
b23 = b32 = coef(prodTL)["I(log(qLab) * log(qMat))"]
## Scale elasticity
data$eCapTL = with(data, b1 + b11*log(qCap) + b12*log(qLab) + b13*log(qMat))
data$eLabTL = with(data, b2 + b21*log(qCap) + b22*log(qLab) + b23*log(qMat))
data$eMatTL = with(data, b3 + b31*log(qCap) + b32*log(qLab) + b33*log(qMat))
## Marginal productivity
data$mpCapTL = with(data, eCapTL * qOutTL / qCap)
data$mpLabTL = with(data, eLabTL * qOutTL / qLab)
data$mpMatTL = with(data, eMatTL * qOutTL / qMat)
## Derivatives of marginal productivity
data$dmpCapCapTL = with(data, (qOutTL / qCap^2) * (b11 + eCapTL^2 - eCapTL))
data$dmpLabLabTL = with(data, (qOutTL / qLab^2) * (b22 + eLabTL^2 - eLabTL))
data$dmpMatMatTL = with(data, (qOutTL / qLab^2) * (b33 + eMatTL^2 - eMatTL))
data$dmpCapLabTL = with(data, (qOutTL / (qCap * qLab)) * (b12 + eCapTL * eLabTL))
data$dmpCapMatTL = with(data, (qOutTL / (qCap * qMat)) * (b13 + eCapTL * eMatTL))
data$dmpLabMatTL = with(data, (qOutTL / qLab * qMat) * (b23 + eLabTL * qMat))
## Quasiconcavity
data$quasiConcTL = NA
for(obs in 1:nrow(data)) {
hmLoop = matrix(0, nrow = 3, ncol = 3)
hmLoop[1,1] = data$dmpCapCapTL[obs]
hmLoop[1,2] = hmLoop[2,1] = data$dmpCapLabTL[obs]
hmLoop[1,3] = hmLoop[3,1] = data$dmpCapMatTL[obs]
hmLoop[2,2] = data$dmpLabLabTL[obs]
hmLoop[2,3] = hmLoop[3,2] = data$dmpLabMatTL[obs]
hmLoop[3,3] = data$dmpMatMatTL[obs]
data$quasiConcTL[obs] = hmLoop[1,1] < 0 & det(hmLoop[1:2,1:2] > 0) & det(hmLoop < 0)
}
sum(!data$quasiConcTL)
```

```
[1] 126
```

With this information I can extend the information from the last table in the first part of the post:

Quadratic | Translog | |
---|---|---|

\(R^2\) on \(y\) | 0.85 | 0.77 |

\(R^2\) on \(ln(y)\) | 0.55 | 0.63 |

Obs. with negative predicted output | 0 | 0 |

Obs. that violate the monotonicity condition | 39 | 48 |

Obs. with negative scale elasticity | 22 | 0 |

Obs. that violate the quasiconcavity condition | 140 | 126 |

\(\Longrightarrow\) Now I have more evidence in favour of a different approach as it was mentioned at the end of the first part of the post.

# Non-parametric regression

A non-parametric model is of the form:

The advantages are:

* The functional form is not defined beforehand
* Obtains a best fit function \(m\) from the data

* Obtains marginal effects

The disadvantages are:

* Higher computational cost
* There are no regression coefficients (the focus is the regression function)

* Hard to interpret

## Single variable non-parametric regression

I can predict the output by using capital as the only input:

```
library(KernSmooth)
qCap = as.vector(data$qCap)
qOut = as.vector(data$qOut)
bw = dpill(qCap,qOut)
fit1 = locpoly(qCap, qOut, bandwidth = bw, degree = 1) # 1st degree polynomial
plot(fit1$x, fit1$y, type = "l")
```

Here are different specifications:

```
# 2nd degree polynomial
fit2.1 <- locpoly(qCap, qOut, bandwidth = bw, degree = 2)
fit2.2 <- locpoly(qCap, qOut, bandwidth = bw*2, degree = 2) # less variance / more bias
fit2.3 <- locpoly(qCap, qOut, bandwidth = bw/2, degree = 2) # more variance / less bias
# linear regression
fit.lm <- lm(qOut ~ qCap)
# plots
plot(fit1$x, fit1$y, type = "l");par(mfrow=c(2,2))
plot(fit1$x, fit1$y, type = "l");lines(fit2.1, col = "blue")
plot(fit1$x, fit1$y, type = "l");lines(fit2.2, col = "red")
plot(fit1$x, fit1$y, type = "l");lines(fit2.3, col = "green")
plot(fit1$x, fit1$y, type = "l");abline(fit.lm, col = "purple")
```

Checking the models:

```
# model 1
sum(is.na(fit1$y))
```

```
[1] 50
```

```
# models 2, 3 y 4
sum(is.na(fit2.1$y))
```

```
[1] 50
```

```
sum(is.na(fit2.2$y))
```

```
[1] 9
```

```
sum(is.na(fit2.3$y))
```

```
[1] 84
```

\(\Longrightarrow\) I should use model 2.2 and adjust it.

I’ll try with a neoclassical function with diminishing marginal returns:

```
# se ajusta la banda y el grado
fit.sqrt = locpoly(qCap, qOut, bandwidth = bw*3, degree = 0.5)
sum(is.na(fit.sqrt$y))
```

```
[1] 0
```

```
sum(fit.sqrt$y < 0)
```

```
[1] 0
```

\(\Longrightarrow\) In this model there is no indetermined or negative predicted output.

```
plot(fit.sqrt$x, fit.sqrt$y, type = "l")
```

I can obtain the median productivity:

```
median(fit.sqrt$y)/median(fit.sqrt$x)
```

```
[1] 18.05637
```

\(\Longrightarrow\) In median term each capital unit increases the output in 18 units.

If I adjust the bandwidth the function will look more and more like a neoclassical function from textbooks, with an initial part of increasing marginal returns and from the point where marginal and median productivities are equal (the same point where marginal productivity is at maximum) the return becomes increasing at decreasing scale.

This is an example of the last paragraph:

```
fit.sqrt.2 = locpoly(qCap, qOut, bandwidth = bw*5, degree = 0.5)
plot(fit.sqrt.2$x, fit.sqrt.2$y, type = "l")
```

## Multivariable non-parametric regression

Among many alternatives, *Epanechnikov’s method* is a non-parametric method that weights the contribution of each observation and iterating obtains the best fit function.

I’ll fit a regression applying logs to the variables:

```
library(np)
options(np.messages=FALSE)
prodNP = npreg(log(qOut) ~ log(qCap) + log(qLab) + log(qMat), regtype = "ll",
bwmethod = "cv.aic", ckertype = "epanechnikov", data = data,
gradients = TRUE)
summary(prodNP)
```

```
Regression Data: 140 training points, in 3 variable(s)
log(qCap) log(qLab) log(qMat)
Bandwidth(s): 1.039647 332644 0.8418465
Kernel Regression Estimator: Local-Linear
Bandwidth Type: Fixed
Residual standard error: 0.6227669
R-squared: 0.6237078
Continuous Kernel Type: Second-Order Epanechnikov
No. Continuous Explanatory Vars.: 3
```

These are the regression plots:

the variables that do not appear in a plot are considered to be fixed to a value equal to their median.

I can test the significance for this model:

```
options(np.messages=FALSE)
npsigtest(prodNP)
```

```
Kernel Regression Significance Test
Type I Test with IID Bootstrap (399 replications, Pivot = TRUE, joint = FALSE)
Explanatory variables tested for significance:
log(qCap) (1), log(qLab) (2), log(qMat) (3)
log(qCap) log(qLab) log(qMat)
Bandwidth(s): 1.039647 332644 0.8418465
Individual Significance Tests
P Value:
log(qCap) 0.11779
log(qLab) < 2e-16 ***
log(qMat) < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

\(\Longrightarrow\) at 10% of significance the only variable with non-statistical significant effects is capital.

Finally I can obtain information about Product-Factor Elasticity and Scale Elasticity:

```
summary(gradients(prodNP)[,1]) # capital elasticity
```

```
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.10300 0.08598 0.14360 0.14500 0.18810 0.30060
```

```
summary(gradients(prodNP)[,2]) # labour elasticity
```

```
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.02208 0.63050 0.69880 0.70970 0.80670 0.97890
```

```
summary(gradients(prodNP)[,3]) # materials elasticity
```

```
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3873 0.4546 0.5669 0.5850 0.6502 1.3480
```

```
summary(rowSums(gradients(prodNP))) # scale elasticity
```

```
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.267 1.363 1.416 1.440 1.479 1.880
```

```
par(mfrow=c(2,2))
hist(gradients(prodNP)[,1], main = "Capital")
hist(gradients(prodNP)[,2], main = "Labour")
hist(gradients(prodNP)[,3], main = "Materials")
hist(rowSums(gradients(prodNP)), main = "Scale")
```

**leave a comment**for the author, please follow the link and comment on their blog:

**Pachá (Batteries Included)**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.