Part II – Using R in Excel – Linear Regression

[This article was first published on Adam's Software Lab, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

In the first part of this series, I looked at using R in Excel to obtain descriptive statistics. In this second part of the series I am going to look at using R in Excel to perform linear regression, specifically using the lm() function. lm() is a real workhorse function. It can be used to carry out both single and multiple regression and different types of analysis of variance. For this demonstration I will only focus on single and multiple regression.

The workbook for this part of the series is: “Part II – R in Excel – Linear Regression.xlsx”. As before, the ‘References’ worksheet lists links to external references. The ‘Libraries’ worksheet loads additional (non-default) packages. In this demonstration, I use the datarium and broom packages. The ‘Datasets’ worksheet contains the data referenced in the worksheets.

The Data

The dataset is the marketing dataset from the datarium package. It consists of a data frame containing the impact of three advertising medias (youtube, facebook and newspaper) on sales. Data are the advertising budget in thousands of dollars along with the sales. The advertising experiment has been repeated 200 times. The dataset is described here.

Marketing Dataset

If the dataset is not available, check that the datarium library is loaded by going to the Libraries worksheet and evaluating:

=RScript.Evaluate(library(datarium), TRUE)

Simple Linear Regression

In the Simple Linear Regression worksheet we create a simple linear model.

smodel <- lm(sales ~ youtube, data = marketing)

As we did in the first part of this series, we unpack the ‘smodel’. Mostly this is simply a question of evaluating the corresponding R objects, for example the residuals and the coefficients. We also request the 95% confidence intervals for the coefficients.

Residuals and Coefficients

At this point, we can compare the results to the Summary Output provided by the Analysis ToolPak’s regression function (including the 95% confidence intervals).

It’s worth emphasising that the output that is returned to Excel via the add-in is not ‘formatted’ as it is in R/RStudio. If we call summary(smodel) in R we are presented with a familiar tabular output summarising the main features of the model. However, what is returned to Excel via the add-in is more basic. So it is worth spending some time looking at this. We can see from the add-in environment, that the smodel_summary is a list of 11 items. To use this we need to unpack some of the individual list items.

Model Summary

Here we can see the actual call made and the details of the model including the statistics (sigma, r-squared and so on). It needs some extra work to unpack the summary data from the model residuals in order to produce a tabular output.

as.data.frame(as.array(summary(smodel$residuals)))

This formula summarises the model residuals and coerces the data into an array first (to get the names) and then as a data frame. We perform a similar operation to get the coefficients into a tabular format.

With all the data available in this form, and with a little effort, we can now construct a summary output similar to R.

Summary Output

Multiple Linear Regression

The next worksheet demonstrates multiple linear regression using the same marketing data. This is similar to the previous worksheet and illustrates how to extract relevant values from the returned model data.

However, in this case we make use of the package broom. This can help tidy up the output data.

Tidy Output

In previous examples, we have extracted model results by concatenating the returned model name with labels from the model. To make it easier to retrieve results, the add-in provides some additional functions for querying models: Model.Results outputs a list of results from the model. Model.Result outputs the result obtained from one item of the list of model results. Optionally, the result can be formatted as a data frame. This is somewhat more convenient than having to evaluate scripts of the form model name'$coeffcients, etc. By extracting the model data and the summary, we can construct an output that is similar to that provided by R.

Logistic Regression

The final example demonstrates logistic regression. At this point you might be thinking that this massaging and extracting output data is not entirely satisfactory. It’s a lot of work and it can be quite brittle (if the cell references change, for example). Furthermore, as the models become more complicated (see Part III in the series) this approach potentially becomes messier. To mitigate this, the add-in provides a couple of wrapper functions around regression. These help with both the setup and the output of more complex models.

Regression.GLM

The model inputs are specified as a block of parameters, and the function Regression.GLM is used instead of the R script equivalent using glm. The model outputs can be ‘queried’ using the Model.Results and the Model.Result formulas. However, even with this, it takes some effort to output a summary similar to the one provided by R.

Wrap Up

In this post, we have looked at how to use R in Excel to perform linear regression, and we have spent some time demonstrating how to extract the different components from the model data. This will be useful in the next two parts of this series.

To leave a comment for the author, please follow the link and comment on their blog: Adam's Software Lab.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)