(This article was first published on **Cartesian Faith » R**, and kindly contributed to R-bloggers)

*This is a lecture post for my students in the CUNY MS Data Analytics program. In this series of lectures I discuss mathematical concepts from different perspectives. The goal is to ask questions and challenge standard ways of thinking about what are generally considered basic concepts. Consequently these lectures will not always be as rigorous as they could be.*

This week let’s take a closer look at integration. People often describe integration as area under the curve. This is indeed true, yet I always found it a bit difficult to understand how you get from area under the curve to the Fundamental Theorem of Calculus. This theorem can be cast two different ways, and I’m referring to it as $\int_a^b f(x)\,dx = F(b) - F(a)$, where $F$ is the antiderivative of $f$.

I like starting with simple examples since it’s a lot easier to understand the behavior of something when you minimize the variables introduced. Hence, let’s start by looking at a line.

    xs <- seq(-1,5, by=0.02)
    f <- function(x) x - 1
    plot(xs, f(xs), type='l', col='green')
    abline(h=0, v=0, col='grey')

There’s nothing particularly remarkable here, so let’s change that. What happens if we add to this graph the cumulative Riemann sum of $f$ for the interval $[-1, 5]$? In other words, let’s graph $F(x) = \sum_{x_i \le x} f(x_i)\,\Delta x$ with $\Delta x = 0.02$.

    lines(xs, cumsum(f(xs)*.02), col='blue')

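Well, this looks kind of like a parabola, and in the limit it is one, since the antiderivative of $x - 1$ is the quadratic $\frac{1}{2}x^2 - x + C$. But what’s the intuition around it? The simplest thing to do is to look at the first few values of the cumulative sum of $f$.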

    head(cumsum(f(xs)*.02), 20)
     [1] -0.0400 -0.0796 -0.1188 -0.1576 -0.1960 -0.2340 -0.2716 -0.3088 -0.3456
    [10] -0.3820 -0.4180 -0.4536 -0.4888 -0.5236 -0.5580 -0.5920 -0.6256 -0.6588
    [19] -0.6916 -0.7240

This is telling us that the area of a thin strip is rather small. It’s also telling us that since the slope of the line is positive, a little bit less negative area is being added each time. Eventually something interesting happens as $x \to 1$. The value of the original function starts to get really small, eventually reaching 0 when $x = 1$. Consequently, there isn’t much contribution to the area from these parts of the line. At $x = 1$ the slope of the cumulative sum is exactly 0, which is where $f(x) = 0$. Once $x > 1$ the area contribution becomes positive, and consequently the value of the Riemann sum begins to increase.
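To make the turning point concrete, here is a minimal sketch that compares the cumulative Riemann sum with the exact parabola and locates where the sum bottoms out. The name `exact` and the choice of constant (so that the antiderivative is 0 at $x = -1$) are assumptions made for illustration, not part of the original lecture code.

```r
dx <- 0.02
xs <- seq(-1, 5, by = dx)
f <- function(x) x - 1

# Cumulative Riemann sum, the same quantity plotted in blue above
riemann <- cumsum(f(xs) * dx)

# Antiderivative of x - 1 with the constant chosen so that it vanishes at x = -1
# (`exact` and this choice of constant are assumptions for this sketch)
exact <- function(x) x^2 / 2 - x - 3 / 2

# Largest distance between the discrete sum and the exact parabola
max_gap <- max(abs(riemann - exact(xs)))

# Grid point at which the cumulative sum is smallest
x_min <- xs[which.min(riemann)]

cat("largest gap to the exact parabola:", max_gap, "\n")
cat("cumulative sum bottoms out near x =", x_min, "\n")
```

The gap should stay within a couple of strip widths, and the minimum should land within one grid step of $x = 1$, which is where $f$ changes sign.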

The second form of the Fundamental Theorem of Calculus is similar to our construction of the Riemann sum. It states that $F(x) = \int_a^x f(t)\,dt$ satisfies $F'(x) = f(x)$. This alternate construction gives the integral as a function of $x$ such that the derivative yields $f(x)$. The graph above confirms this, since at $x = 1$ the slope is $F'(1) = 0$ and this is exactly the value of $f(1)$.
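The claim that the slope of the cumulative sum recovers the original function can be checked numerically at every grid point, not just at $x = 1$. The following is a rough sketch; `F_cum`, `slope`, and `err` are hypothetical names introduced here.

```r
dx <- 0.02
xs <- seq(-1, 5, by = dx)
f <- function(x) x - 1

# Discrete stand-in for F(x): the cumulative Riemann sum
F_cum <- cumsum(f(xs) * dx)

# Finite-difference slope between consecutive grid points
slope <- diff(F_cum) / dx

# diff() drops one element, so compare against f at every point but the first
err <- max(abs(slope - f(xs[-1])))
cat("largest mismatch between slope and f:", err, "\n")
```

Up to floating point noise the mismatch should be zero, which is the discrete analogue of $F'(x) = f(x)$.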

Let’s explore the relationship between this version of the Fundamental Theorem of Calculus and the Riemann sum further. Both formulations describe a function in terms of a starting point $a$ up to some value $x$. Consider the interval $[-1, 1]$, where $a = -1$. At $x = -1$, the value of the Riemann sum is $-0.04$. This initial value is always going to be close to 0, since we take the limit of $\Delta x$ to 0. Getting back to our function $F$, at $x = 1$ the total area is $-2$. We can verify this with some geometry, since this is a triangle below the axis with area $\frac{1}{2} \cdot 2 \cdot 2 = 2$. In R this looks like

    F <- function(x) sum(f(x) * .02)
    > F(xs[xs <= 1])
    [1] -2.02

Hence it seems reasonable that the integral for this special case is F(1), or $\int_{-1}^{1} f(x)\,dx = -2$. As shown above, the value computed in R is -2.02. I’ll leave it as an exercise to explain why this is so. Another useful point to look at is $x = 3$. Visually we can see that the area from -1 to 3 is represented by two congruent triangles with opposite sign, so the value must surely be 0.

    > F(xs[xs <= 3])
    [1] 1.665335e-16

Indeed, this value is close to 0. We’ve successfully illustrated the relationship between area under the curve and the Fundamental Theorem of Calculus. However, this is the second version of the theorem and we started with the first. This second version relies on some constant point $a$ with a starting value, whereas the first version uses two arbitrary points. Remember that with the Riemann sum we need to start with an initial starting point, and the area will be close to 0 with small $\Delta x$. Suppose we want the value of $\int_1^3 f(x)\,dx$. By shifting the starting point to $a = 1$, we could use the same technique so that F(3) gives us the right value.

    lines(xs[xs >= 1], cumsum(f(xs[xs >= 1])*.02), col='brown')

This has the effect of shifting the parabola up by 2, which is essentially $F(x) - F(1)$ since $F(1) = -2$. Of course we don’t need to shift the starting point at all. Instead we can simply compute the difference of the two Riemann sums. This has the effect of cancelling any fixed starting point and gives us the two arbitrary end points of the interval.

    > F(xs[xs <= 3]) - F(xs[xs <= 1])
    [1] 2.02

This gives us that $\int_1^3 f(x)\,dx \approx F(3) - F(1)$. Taking the limit as $\Delta x \to 0$ then gets us to the familiar $\int_a^b f(x)\,dx = F(b) - F(a)$.
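One way to watch that limit happen is to repeat the difference of Riemann sums for several strip widths and compare against base R’s `integrate()`. This is only a sketch: `riemann_sum` is a hypothetical helper that mirrors the post’s convention of summing over every grid point from the start up to and including the end point, and it assumes the strip width divides the interval evenly.

```r
f <- function(x) x - 1

# Sum f over the grid a, a + dx, ..., b, including both end points
# (`riemann_sum` is a hypothetical helper, not part of the original code)
riemann_sum <- function(a, b, dx) {
  n <- round((b - a) / dx)   # number of steps; assumes dx divides b - a
  x <- a + (0:n) * dx
  sum(f(x) * dx)
}

steps <- c(0.5, 0.02, 0.001)

# Difference of two sums that share the fixed starting point -1
estimates <- sapply(steps, function(dx) {
  riemann_sum(-1, 3, dx) - riemann_sum(-1, 1, dx)
})

# Numerical quadrature from base R as a reference value
reference <- integrate(f, lower = 1, upper = 3)$value

print(data.frame(dx = steps, estimate = estimates, error = estimates - reference))
```

The error column should shrink as the strips get thinner, with the estimates approaching the exact value of 2.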

#### Exercises

- Why is F(xs[xs <= 1]) = -2.02 and not -2?
- What happens when you use an interval of 0.5 instead of 0.02?
- Draw the Riemann sum so that its value is consistent with the interval [1,3].
- Is it necessary for the initial area to be small for this approach to be correct?
