[This article was first published on The Chemical Statistician » R programming, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

#### Introduction

Today, I will begin a series of posts on numerical integration, which has a wide range of applications in many fields, including statistics.  I will introduce with trapezoidal integration by discussing its conceptual foundations, write my own R function to implement trapezoidal integration, and use it to check that the Beta(2, 5) probability density function actually integrates to 1 over its support set.  Fully commented and readily usable R code will be provided at the end. Given a probability density function (PDF) and its support set as vectors in an array programming language like R, how do you integrate the PDF over its support set to ensure that it equals to 1?  Read the rest of this post to view my own R function to implement trapezoidal integration and learn how to use it to numerically approximate integrals.

#### The Conceptual Intuition Behind Trapezoidal Integration

Trapezoidal integration uses the area of a trapezoid to approximate the definite integral of a function over a given interval of integration.  To calculate the area of a trapezoid, it’s best if you visualize 2 trapezoids of the same shape and size stacked like this: It is clear from this image that these 2 trapezoids stack to make a rectangle of width $w$ and length $h_1 + h_2$.  Thus, the area of the bottom trapezoid is simply half the area of the rectangle: $\text{Area of Trapezoid} = 0.5 \times w \times (h_1 + h_2)$

Now, consider the following function: Image courtesy of DieBuche via Wikimedia

Following what we just learned above, the area of the trapezoid that approximates this integral is $\text{Area of Trapezoid} = 0.5 \times (b-a) \times (f(a) + f(b))$

Of course, the approximation becomes much better when the interval of integration is split into many smaller intervals, each of which acts as the base of a smaller trapezoid.  A sufficiently small width will lead to more trapezoids being used, resulting in a more accurate approximation. Image Courtesy of Cdang via Wikimedia

To approximate the integral, simply add the areas of the smaller trapezoids that are fit under the function over the interval of integration. $\int_{a}^{b} f(x)\, dx \approx 0.5 \sum_{k=1}^{n} \left( x_{k+1} - x_{k} \right) \left( f(x_{k+1}) + f(x_{k}) \right)$

As you can see in both of the above images, trapezoidal integration is not perfect; depending on the integrand, the trapezoids could underestimate or overestimate the integral.  Error analysis of trapezoidal integration and other numerical integration techniques will be covered in a later post.

#### Implementation in R

Here is a function that I wrote in R to implement trapezoidal integration.

• 2 inputs are needed: the variable of integration and the integrand
• 1 output is given: the definite integral over the interval of integration
##### Trapezoidal Integration
##### By Eric Cai - The Chemical Statistician
##### define the function for trapezoidal integration

trapezoidal.integration = function(x, f)
{
### 3 checks to ensure that the arguments are numeric and of equal lengths
# check if the variable of integration is numeric
if (!is.numeric(x))
{
stop('The variable of integration "x" is not numeric.')
}

# check if the integrand is numeric
if (!is.numeric(f))
{
stop('The integrand "f" is not numeric.')
}

# check if the variable of integration and the integrand have equal lengths
if (length(x) != length(f))
{
stop('The lengths of the variable of integration and the integrand do not match.')
}

### finish checks

# obtain length of variable of integration and integrand
n = length(x)

# integrate using the trapezoidal rule
integral = 0.5*sum((x[2:n] - x[1:(n-1)]) * (f[2:n] + f[1:(n-1)]))

# print the definite integral
return(integral)
}

Let’s test this on a probability density function (PDF) for a continuous random variable.  The integral of a PDF over its support set is equal to 1.  For the distribution $Beta(2, 5)$, the support set is the interval $[0, 1]$.  The image produced by the plot() command and exported by the png() and dev.off() commands is the one shown in the introduction of this post.

# define the support set
beta.support = seq(0, 1, by = 0.005)

# obtain the Beta(2, 5) PDF based on the given support set
beta.pdf = dbeta(beta.support, 2, 5)

# export image as PNG file to a user-defined folder
png('INSERT YOUR DIRECTORY PATH HERE/beta pdf.png')

# plot the PDF
plot(beta.support, beta.pdf, main = 'Probability Density Function\nBeta(2, 5)', xlab = 'x', ylab = 'f(x)')

dev.off()

# use trapezoidal integration to integrate the PDF over its support set
trapezoidal.integration(beta.support, beta.pdf)
 0.9999607

The definite integral obtained is close to but not exactly 1.  This error is caused by

• the finite number of points that I used to define the Beta(2, 5) PDF, which leads to the finite number of trapezoids that can be used to approximate the integral
• the inherent error in trapezoidal integration

As mentioned previously, I will discuss the error analysis of trapezoidal integration and other numerical integration techniques in a later post.

Filed under: Applied Mathematics, Mathematical Statistics, Mathematics, Numerical Analysis, Plots, R programming, Statistical Computing Tagged: applied math, applied mathematics, beta distribution, integrand, integration, math, mathematical statistics, mathematics, numerical analysis, numerical integration, pdf, plot, plots, plotting, probability density function, R, R programming, statistics, support set, trapezoid, trapezoidal integration, trapezoidal rule  