My Own R Function and Script for Simple Linear Regression – An Illustration with Exponential Decay of DDT in Trout

[This article was first published on The Chemical Statistician » R programming, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Here is the function that I wrote for doing simple linear regression, as alluded to in my blog post about simple linear regression on log-transformed data on the decay of DDT concentration in trout in Lake Michigan.  My goal was to replicate the 4 columns of the output from applying summary() to the output of lm().

To use this file and this script,

1) I saved this file as “simple linear regression.r”.

2) In the same folder, I saved a script called “DDT trout regression.r” that used this function to implement simple linear regression on the log-transformed DDT data.

3) I changed the working directory to this folder containing the function and the script.

4) I made sure “DDT trout regression.r” used the source() function to call my user-defined function for simple linear regression.

5) I ran “DDT trout regression.r”.

##### A function for simple linear regression
##### Written by Eric Cai - The Chemical Statistician
simple.linear.regression = function(predictor, target)
{
     sample.size = length(target)

     mean.predictor = mean(predictor)
     mean.target = mean(target)

     sxx = sum((predictor - mean.predictor)^2)
     sxy = sum((predictor - mean.predictor)*(target - mean.target))

     beta1.hat = sxy/sxx
     beta0.hat = mean.target - beta1.hat*mean.predictor

     target.fitted = beta0.hat + beta1.hat*predictor
     target.residuals = target - target.fitted
     sum.squared.residuals = sum(target.residuals^2)

     degree.freedom = sample.size - 2
     mean.squared.error = sum.squared.residuals/degree.freedom
     standard.error = sqrt(mean.squared.error)

     beta0.hat.sample.variance = mean.squared.error*(1/sample.size + mean.predictor^2/sxx)
     beta0.hat.standard.error = sqrt(beta0.hat.sample.variance)

     beta1.hat.sample.variance = mean.squared.error/sxx
     beta1.hat.standard.error = sqrt(beta1.hat.sample.variance)

     t.beta0.hat = beta0.hat/beta0.hat.standard.error
     t.beta1.hat = beta1.hat/beta1.hat.standard.error

     p.value.beta0.hat = 2*pt(abs(t.beta0.hat), degree.freedom, lower.tail = F)
     p.value.beta1.hat = 2*pt(abs(t.beta1.hat), degree.freedom, lower.tail = F)

     # aggregate inferential results into a table, similar to output of summary(lm(y~x))
     beta0.inference = c(beta0.hat, beta0.hat.standard.error, t.beta0.hat, p.value.beta0.hat)
     beta1.inference = c(beta1.hat, beta1.hat.standard.error, t.beta1.hat, p.value.beta1.hat)

     # combine inferential results of beta0 and beta1 into a matrix
     regression.inference = rbind(beta0.inference, beta1.inference)

     # provide names to columns and rows for table of inferential results
     colnames(regression.inference) = c('Estimated Coefficient', 'Standard Error', 't-statistic', 'p-value')
     rownames(regression.inference) = c('beta0', 'beta1')

     # display table
     regression.inference
}

Here is “DDT in trout regression.r”.  In this script, I entered the data and conducted simple linear regression.

##### Linear Regression of DDT Decay in Michigan Trout
##### Data from "Elements of Environmental Chemistry" by Ronald A. Hites, Page 49
##### Script written by Eric Cai - The Chemical Statistician

# clear all variables in the workspace
rm(list=ls(all=TRUE))

# call my user-defined function, 'simple linear regression.r'
source('simple linear regression.r')

year = c(1970:1982, seq(1984, 1992, by=2), 1995, 1998) 
DDT = c(19.19, 13.00, 11.31, 9.96, 8.42, 7.50, 5.65, 6.34, 4.58, 6.91, 4.74, 3.22, 2.74, 2.22, 1.10, 1.44, 1.39, 1.16, 0.98, 0.85)

DDT.data = cbind(year, DDT)
colnames(DDT.data) = c('Time (Year)', 'DDT Concentration')
DDT.data

# let's plot the untransformed DDT concentration data with respect to time 
png("INSERT YOUR DIRECTORY HERE/DDT in trout.png")
plot(year, DDT, xlab = 'Year', ylab = 'DDT Concentration in Trout (ppm)', main = 'DDT Concentration in Trout in Lake Michigan')
dev.off()

# let's conduct simple linear regression on the untransformed data
DDT.regression = simple.linear.regression(year, DDT)
DDT.regression

# now transform the data using natural logarithm to linearize the DDT-time relationship 
ln.DDT = log(DDT)

# let's plot the log-transformed DDT concentration data with respect to time 
 png("INSERT YOUR DIRECTORY HERE/ln(DDT) in trout.png")
plot(year, ln.DDT, xlab = 'Year', ylab = 'Natural Logarithm of DDT Concentration in Trout in Lake', main = 'Natural Logarithm of DDT Concentration in Trout in Lake Michigan')
dev.off()

# conduct simple linear regression on the log-transformed data 
ln.DDT.regression = simple.linear.regression(year, ln.DDT)
ln.DDT.regression

Filed under: Applied Statistics, Environmental Chemistry, Plots, R programming Tagged: data analysis, linear regression, R, R function, regression, simple linear regression

To leave a comment for the author, please follow the link and comment on their blog: The Chemical Statistician » R programming.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)