This article was first published on **R Programming – DataScience+**, and kindly contributed to R-bloggers.


Regression analysis was among the most in-demand machine learning methods in 2018. One family of regression methods for measuring economic efficiency is stochastic frontier analysis (SFA). This method is well suited for benchmarking and for finding room for optimization in companies and organizations. It can therefore be used to help companies generate more value for employees and customers.

In this article, you will learn how to program an SFA model in R and also use this model in a benchmarking evaluation setting – before and after changes in organizations or companies.

### Introduction to Benchmarking with SFA Machine Learning

One of the main advantages of SFA models is that they can predict how you can create more value in companies – either for employees or for customers. This is due to the models’ ability to measure efficiency improvements that leave room for optimization. Basically, an SFA model consists of a parameter you want to explain and optimize on (y) and a number of variables that you know will affect this outcome in either a positive or a negative way (the x’s). In this respect, it is like a multiple OLS regression model. The difference is that the explanatory variables (the x’s) are considered inputs that generate the outcome variable (the y). These inputs can be either production inputs or cost inputs, which means that the SFA model can be either a production function or a cost function. So you add value by finding room for optimization in either the cost or the production inputs of the company. This can be illustrated with the following graph:
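The intuition behind a production frontier can be sketched in a few lines of base R (a minimal illustration with made-up numbers, not part of the article's model): observed output lies on or below the frontier, and the gap between the two is the inefficiency an SFA model estimates.

```r
# Minimal production-frontier sketch (hypothetical values).
# The frontier is the maximum output attainable at each input level;
# observed output falls below it by a non-negative inefficiency term u.
x <- c(0.2, 0.5, 0.9)          # hypothetical input levels
frontier_y <- x^(1/2)          # assumed frontier: y = x^(1/2)
u <- c(0.10, 0.30, 0.05)       # hypothetical inefficiency terms (u >= 0)
y <- frontier_y / (1 + u)      # observed output, on or below the frontier
efficiency <- y / frontier_y   # technical efficiency, in (0, 1]
print(round(efficiency, 3))
```

An efficiency of 1 means the observation lies exactly on the frontier; values below 1 indicate room for optimization.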

### Benchmarking with SFA Machine Learning in R

This makes the SFA model very suitable for commercial machine learning analysis. Some really efficient and great SFA packages have already been published for R – these are `sfa` and `frontier`. The `sfa` package focuses on implementing stochastic frontier analysis in R, and you can build both production and cost models with it. The `frontier` package focuses on more specialised SFA models. In this article, we will mainly work with `sfa`. The code below creates an SFA model in R.

First, we load the required packages:

```r
# Stochastic frontier analysis in R
# SFA packages
library(sfa)
library(frontier)
# Other packages
library(foreign)
library(readxl)
library(tmvtnorm)
library(htmlTable)
```

The next step is to generate a dataset in R and plot it:

```r
# Generation of random data for the dataframe
ns = 15
set.seed(121181)
x = runif(ns, 0, 1)
ybar = x^(1/2)
set.seed(121181)
u = rtmvnorm(n = ns, mean = c(0.25), sigma = c(0.2), lower = c(0))
y = ybar/(1 + u)
plot(y ~ x, type = "p", col = "red", ylim = c(0, 1))
```

The above code gives the following plot:

The third step is the creation of the SFA model:

```r
# Representation of the efficiency frontier
x.seq <- seq(0, 1, by = 0.01)
t.fr <- x.seq^(1/2)
lines(t.fr ~ x.seq, col = "blue")
# The frontier efficiency measures
lambda = y/sqrt(x)
theta = y^2/x
delta = 1/theta
# Table for the frontier efficiency measures
tab1 <- data.frame(lambda, theta, delta)
htmlTable(tab1)
```

|    | lambda    | theta     | delta    |
|----|-----------|-----------|----------|
| 1  | 0.6475214 | 0.4192839 | 2.385019 |
| 2  | 0.7918257 | 0.6269879 | 1.594927 |
| 3  | 0.9576361 | 0.9170669 | 1.090433 |
| 4  | 0.7704356 | 0.5935711 | 1.684718 |
| 5  | 0.7528086 | 0.5667209 | 1.764537 |
| 6  | 0.6541730 | 0.4279423 | 2.336763 |
| 7  | 0.8588736 | 0.7376639 | 1.355631 |
| 8  | 0.7548694 | 0.5698277 | 1.754916 |
| 9  | 0.8929326 | 0.7973287 | 1.254188 |
| 10 | 0.5870898 | 0.3446745 | 2.901288 |
| 11 | 0.6966463 | 0.4853160 | 2.060513 |
| 12 | 0.5375163 | 0.2889238 | 3.461120 |
| 13 | 0.6908752 | 0.4773085 | 2.095081 |
| 14 | 0.5192932 | 0.2696654 | 3.708299 |
| 15 | 0.6621521 | 0.4384453 | 2.280786 |

The above code also generates the following SFA graph:
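The relationships between the three measures computed above can be verified on a single hypothetical observation in base R (the numbers below are chosen for easy arithmetic and are not from the generated dataset): lambda is output relative to the frontier, theta equals lambda squared, and delta is the reciprocal of theta.

```r
# Sanity check on the frontier efficiency measures (hypothetical observation).
x <- 0.49
y <- 0.56
lambda <- y / sqrt(x)   # output relative to the frontier y = sqrt(x): 0.8
theta  <- y^2 / x       # equals lambda^2: 0.64
delta  <- 1 / theta     # reciprocal of theta: 1.5625
```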

### Introduction to SFA Machine Learning Analysis for Benchmarking Evaluation

In this section, I will show how you can use SFA analysis for benchmarking evaluation of changes in companies or organizations, before and after these changes. In the first part of the analysis, the SFA estimates are calculated using the method described in the section above. In order to find the average treatment effect before and after the intervention, the estimates are divided into a before category and an after category. Thereafter, the estimates are divided into a treatment and a control category. This creates four different categories of SFA estimates, depicted in Table I:

As the table shows, the first difference is calculated for the treatment group as the post-intervention estimate minus the pre-intervention estimate (B-A). The second difference is calculated in the same way for the control group (D-C). Subtracting the two differences forms the DiD design and gives the DiD formula: (D-C) – (B-A).
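The formula can be checked with a few lines of base R, using hypothetical group means (the values below are illustrative and not taken from any dataset in this article):

```r
# DiD sketch with hypothetical group means, following the labels used here:
# A = (pre, treatment), B = (post, treatment),
# C = (pre, control),   D = (post, control).
A <- 0.90; B <- 0.95   # hypothetical treatment-group means, pre and post
C <- 0.88; D <- 0.91   # hypothetical control-group means, pre and post
did <- (D - C) - (B - A)
print(did)
```

Here the treatment group improved by 0.05 and the control group by 0.03, so the DiD estimate as defined above is -0.02.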

### SFA Machine Learning for Benchmarking Evaluation in R

Now it is time to code the SFA benchmarking evaluation model in R. The code below uses a DRG healthcare dataset that is not open source. It can be applied to any dataset with a time variable (Post) and a treatment variable (Treatment) coded in the above table format. The code below performs the data management and builds an SFA difference-in-difference model for benchmarking evaluation:

```r
# Load dataset into R
drg_data <- read_excel("healthcaredata.xlsx")
```

```r
# Create data for each Benchmarking Evaluation group
drg.data.A <- subset(drg_data, drg_data$Post < 1 & drg_data$Treatment < 1)
drg.data.B <- subset(drg_data, drg_data$Post > 0 & drg_data$Treatment < 1)
drg.data.C <- subset(drg_data, drg_data$Post < 1 & drg_data$Treatment > 0)
drg.data.D <- subset(drg_data, drg_data$Post > 0 & drg_data$Treatment > 0)
```

Now it is time to estimate the SFA difference-in-difference model for benchmarking evaluation:

```r
# SFA - group A
SFA_A <- sfa(log(drg) ~ log(los) + log(readmis) + log(agecat) + log(sex), data = drg.data.A)
# SFA - group B
SFA_B <- sfa(log(drg) ~ log(los) + log(readmis) + log(agecat) + log(sex), data = drg.data.B)
# SFA - group C
SFA_C <- sfa(log(drg) ~ log(los) + log(readmis) + log(agecat) + log(sex), data = drg.data.C)
# SFA - group D
SFA_D <- sfa(log(drg) ~ log(los) + log(readmis) + log(agecat) + log(sex), data = drg.data.D)
# Create a table for the models (requires the stargazer package)
library(stargazer)
stargazer(SFA_A, SFA_B, SFA_C, SFA_D, type = "text", out = "/SFADiDModel.doc")
```

Table I – Estimated results for the various groups

| VARIABLES    | (1) Group A        | (2) Group B       | (3) Group C       | (4) Group D       |
|--------------|--------------------|-------------------|-------------------|-------------------|
| LOS          | -0.174*** (0.0297) | -0.123*** (0.0369)| -0.310*** (0.0305)| -0.342*** (0.0437)|
| Sex          | -0.0571*** (0.0221)| -0.0537* (0.0282) | 0.236*** (0.0248) | 0.00239 (0.0380)  |
| Age category | 0.221*** (0.0417)  | 0.170*** (0.0527) | 0.542*** (0.0338) | 0.335*** (0.0529) |
| Readmissions | 0.504*** (0.0134)  | 0.477*** (0.0180) | 0.556*** (0.0144) | 0.714*** (0.0206) |
| Constant     | 2.689*** (0.674)   | 2.778*** (0.841)  | 2.197*** (0.779)  | 2.632** (1.255)   |
| Observations | 8,461              | 5,327             | 8,594             | 4,523             |

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

Now it is time to predict the technical efficiency measures for the pre- and post-period estimates, and display them in a table:

```r
# Technical efficiency - group A
eff_A <- efficiencies(SFA_A, margEff = TRUE)
# Technical efficiency - group B
eff_B <- efficiencies(SFA_B, margEff = TRUE)
# Technical efficiency - group C
eff_C <- efficiencies(SFA_C, margEff = TRUE)
# Technical efficiency - group D
eff_D <- efficiencies(SFA_D, margEff = TRUE)
# Create a table for the TE model
stargazer(eff_A, eff_B, eff_C, eff_D, type = "text", out = "/SFADiDTEModel.doc")
```

Table II – Technical Efficiency estimates: DRG as outcome

| VARIABLES     | Mean    |
|---------------|---------|
| Group A       | 0.967   |
| Group B       | 0.982   |
| Group C       | 0.966   |
| Group D       | 0.970   |
| Aggregate DiD | 0.00969 |

As described above: Group A is (pre, treat), Group B is (post, treat), Group C is (pre, comp) and Group D is (post, comp). Aggregate DiD is DiD estimate = (D-C) – (B-A).

### References

- Using Stargazer in R – CRAN.R-project.org
- Using sfa in R – CRAN.R-project.org
- Using frontier in R – CRAN.R-project.org
- Using tmvtnorm in R – CRAN.R-project.org
- Using htmlTable in R – CRAN.R-project.org
- Using readxl in R – CRAN.R-project.org
- Using foreign in R – CRAN.R-project.org

