My Own R Function and Script for Simple Linear Regression – An Illustration with Exponential Decay of DDT in Trout

(This article was first published on The Chemical Statistician » R programming, and kindly contributed to R-bloggers)

Here is the function that I wrote for doing simple linear regression, as alluded to in my blog post about simple linear regression on log-transformed data on the decay of DDT concentration in trout in Lake Michigan.  My goal was to replicate the 4 columns of the output from applying summary() to the output of lm().

To use this file and this script,

1) I saved this file as “simple linear regression.r”.

2) In the same folder, I saved a script called “DDT trout regression.r” that used this function to implement simple linear regression on the log-transformed DDT data.

3) I changed the working directory to this folder containing the function and the script.

4) I made sure “DDT trout regression.r” used the source() function to call my user-defined function for simple linear regression.

5) I ran “DDT trout regression.r”.

##### A function for simple linear regression
##### Written by Eric Cai - The Chemical Statistician
simple.linear.regression = function(predictor, target)
{
     sample.size = length(target)

     mean.predictor = mean(predictor)
     mean.target = mean(target)

     sxx = sum((predictor - mean.predictor)^2)
     sxy = sum((predictor - mean.predictor)*(target - mean.target))

     beta1.hat = sxy/sxx
     beta0.hat = mean.target - beta1.hat*mean.predictor

     target.fitted = beta0.hat + beta1.hat*predictor
     target.residuals = target - target.fitted
     sum.squared.residuals = sum(target.residuals^2)

     degree.freedom = sample.size - 2
     mean.squared.error = sum.squared.residuals/degree.freedom
     standard.error = sqrt(mean.squared.error)

     beta0.hat.sample.variance = mean.squared.error*(1/sample.size + mean.predictor^2/sxx)
     beta0.hat.standard.error = sqrt(beta0.hat.sample.variance)

     beta1.hat.sample.variance = mean.squared.error/sxx
     beta1.hat.standard.error = sqrt(beta1.hat.sample.variance)

     t.beta0.hat = beta0.hat/beta0.hat.standard.error
     t.beta1.hat = beta1.hat/beta1.hat.standard.error

     p.value.beta0.hat = 2*pt(abs(t.beta0.hat), degree.freedom, lower.tail = F)
     p.value.beta1.hat = 2*pt(abs(t.beta1.hat), degree.freedom, lower.tail = F)

     # aggregate inferential results into a table, similar to output of summary(lm(y~x))
     beta0.inference = c(beta0.hat, beta0.hat.standard.error, t.beta0.hat, p.value.beta0.hat)
     beta1.inference = c(beta1.hat, beta1.hat.standard.error, t.beta1.hat, p.value.beta1.hat)

     # combine inferential results of beta0 and beta1 into a matrix
     regression.inference = rbind(beta0.inference, beta1.inference)

     # provide names to columns and rows for table of inferential results
     colnames(regression.inference) = c('Estimated Coefficient', 'Standard Error', 't-statistic', 'p-value')
     rownames(regression.inference) = c('beta0', 'beta1')

     # display table
     regression.inference
}

Here is “DDT in trout regression.r”.  In this script, I entered the data and conducted simple linear regression.

##### Linear Regression of DDT Decay in Michigan Trout
##### Data from "Elements of Environmental Chemistry" by Ronald A. Hites, Page 49
##### Script written by Eric Cai - The Chemical Statistician

# clear all variables in the workspace
rm(list=ls(all=TRUE))

# call my user-defined function, 'simple linear regression.r'
source('simple linear regression.r')

year = c(1970:1982, seq(1984, 1992, by=2), 1995, 1998) 
DDT = c(19.19, 13.00, 11.31, 9.96, 8.42, 7.50, 5.65, 6.34, 4.58, 6.91, 4.74, 3.22, 2.74, 2.22, 1.10, 1.44, 1.39, 1.16, 0.98, 0.85)

DDT.data = cbind(year, DDT)
colnames(DDT.data) = c('Time (Year)', 'DDT Concentration')
DDT.data

# let's plot the untransformed DDT concentration data with respect to time 
png("INSERT YOUR DIRECTORY HERE/DDT in trout.png")
plot(year, DDT, xlab = 'Year', ylab = 'DDT Concentration in Trout (ppm)', main = 'DDT Concentration in Trout in Lake Michigan')
dev.off()

# let's conduct simple linear regression on the untransformed data
DDT.regression = simple.linear.regression(year, DDT)
DDT.regression

# now transform the data using natural logarithm to linearize the DDT-time relationship 
ln.DDT = log(DDT)

# let's plot the log-transformed DDT concentration data with respect to time 
 png("INSERT YOUR DIRECTORY HERE/ln(DDT) in trout.png")
plot(year, ln.DDT, xlab = 'Year', ylab = 'Natural Logarithm of DDT Concentration in Trout in Lake', main = 'Natural Logarithm of DDT Concentration in Trout in Lake Michigan')
dev.off()

# conduct simple linear regression on the log-transformed data 
ln.DDT.regression = simple.linear.regression(year, ln.DDT)
ln.DDT.regression

Filed under: Applied Statistics, Environmental Chemistry, Plots, R programming Tagged: data analysis, linear regression, R, R function, regression, simple linear regression

To leave a comment for the author, please follow the link and comment on their blog: The Chemical Statistician » R programming.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)