
Here, I demonstrate how to find the point where two histograms (more precisely, their estimated density curves) intersect. The result is an approximation, and its resolution is set by the spacing of the grid on which the densities are evaluated.

### Prepare simulated data

I created two data sets, `gamma_dist` and `norm_dist`, which contain different numbers of values sampled randomly from a gamma distribution and a normal distribution, respectively. I specifically made the data sets different sizes to show that the method still applies when the sample sizes differ.

```r
library(tibble)

set.seed(0)
gamma_dist <- rgamma(1e5, shape = 2, scale = 2)
norm_dist <- rnorm(5e5, mean = 20, sd = 5)

# Combine both samples into one long data frame, labelled by source
df <- tibble(
  x = c(gamma_dist, norm_dist),
  original_dataset = c(rep("gamma_dist", 1e5), rep("norm_dist", 5e5))
)
df
#> # A tibble: 600,000 x 2
#>        x original_dataset
#>    <dbl> <chr>
#>  1  6.89 gamma_dist
#>  2  2.25 gamma_dist
#>  3  1.30 gamma_dist
#>  4  4.10 gamma_dist
#>  5  7.77 gamma_dist
#>  6  5.08 gamma_dist
#>  7  4.58 gamma_dist
#>  8  2.30 gamma_dist
#>  9  1.36 gamma_dist
#> 10  1.67 gamma_dist
#> # … with 599,990 more rows
```

I used ‘ggplot2’ to plot the densities of the two data sets. The gamma distribution is in red and the normal distribution is in blue. I broke the creation of the plot into two steps: the essential step to create the density curves, and the styling step to make the plot look nice. Of course, these could be combined into a single long ggplot statement.

```r
library(ggplot2)

# Essential step: one density curve per data set
p <- ggplot(df) +
  geom_density(aes(x = x, color = original_dataset))

# Styling step
# `expansion()` replaces the deprecated `expand_scale()` in ggplot2 >= 3.3.0
p <- p +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  scale_color_manual(values = c("tomato", "dodgerblue")) +
  theme_minimal() +
  theme(
    legend.title = element_blank(),
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(x = "values",
       title = "Two density curves")
p  # display the plot
```

### Finding the point of intersection

To find the point of intersection, I first estimated the density of each data set using `density`. It is essential to use the same `from` and `to` values for each data set. By default, `density` evaluates the kernel density estimate at 512 equally spaced points (`n = 512`), so providing the same starting and ending values makes `density` use the same grid of x values for each data set.

```r
from <- 0
to <- 40
gamma_density <- density(gamma_dist, from = from, to = to)
norm_density <- density(norm_dist, from = from, to = to)
```
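As a quick sanity check on the shared-grid claim, the sketch below regenerates the same simulated data and confirms that both `density` results use identical x values. The names `same_grid`, `n_points`, and `grid_step` are only illustrative and are not part of the original analysis.

```r
# Recreate the simulated data so this block runs on its own
set.seed(0)
gamma_dist <- rgamma(1e5, shape = 2, scale = 2)
norm_dist <- rnorm(5e5, mean = 20, sd = 5)

gamma_density <- density(gamma_dist, from = 0, to = 40)
norm_density <- density(norm_dist, from = 0, to = 40)

# Both estimates should be evaluated at exactly the same x values
same_grid <- isTRUE(all.equal(gamma_density$x, norm_density$x))

# Number of evaluation points (the default is n = 512)
n_points <- length(gamma_density$x)

# Spacing between neighbouring points: (to - from) / (n - 1)
grid_step <- diff(gamma_density$x)[1]

same_grid
n_points
grid_step
```

The grid spacing matters later: any intersection read directly off the grid can only be resolved to within one `grid_step`.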

The final step was to find where the estimated density of the gamma data first drops below that of the normal data. I applied this logic to create the logical vector `idx`. I also included two other filters to restrict the result to between 5 and 20 because, from the plot above, I can see that the intersection falls within this range.

```r
idx <- (gamma_density$y < norm_density$y) &
  (gamma_density$x > 5) &
  (gamma_density$x < 20)
poi <- min(gamma_density$x[idx])
poi
#> [1] 10.64579
```
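If reading the range off the plot is inconvenient, one possible variation (not the approach used above) is to look for sign changes in the difference between the two density estimates and linearly interpolate between the neighbouring grid points. This is a minimal sketch that regenerates the same data; `diff_y`, `flips`, and `crossings` are hypothetical names introduced for illustration.

```r
# Recreate the simulated data and density estimates so this block runs on its own
set.seed(0)
gamma_dist <- rgamma(1e5, shape = 2, scale = 2)
norm_dist <- rnorm(5e5, mean = 20, sd = 5)
gamma_density <- density(gamma_dist, from = 0, to = 40)
norm_density <- density(norm_dist, from = 0, to = 40)

# Difference between the two curves on the shared grid
diff_y <- gamma_density$y - norm_density$y

# Indices i where the sign of the difference flips between point i and i + 1
flips <- which(diff(sign(diff_y)) != 0)

# Linearly interpolate the zero crossing inside each flagged grid interval
x0 <- gamma_density$x[flips]
x1 <- gamma_density$x[flips + 1]
y0 <- diff_y[flips]
y1 <- diff_y[flips + 1]
crossings <- x0 - y0 * (x1 - x0) / (y1 - y0)
crossings
```

This returns every crossing found on the grid, so it may report more than one value when the curves cross several times or when the tails are noisy. The range filter used above remains a simple way to pick the crossing of interest.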

That’s it: the point where the two estimated curves cross has been located to within one grid step of `density`, which is (`to` - `from`) / 511, or about 0.08, here. A vertical line was added to the plot below at `poi`.

```r
# `linewidth` replaces `size` for lines in ggplot2 >= 3.4.0
p <- p +
  geom_vline(xintercept = poi, linetype = 2, linewidth = 0.3, color = "black") +
  annotate(geom = "text", label = round(poi, 3),
           x = poi - 1, y = 0.1, size = 4, angle = 90)
p  # display the plot
```
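Because the data here were simulated from known distributions, the estimate can be compared against the crossing of the theoretical density functions. The sketch below is only a check that is possible for simulated data, not part of the method itself. It uses `uniroot` on the difference between `dgamma` and `dnorm` over the same 5 to 20 range; `density_gap` and `exact` are illustrative names.

```r
# Difference between the theoretical densities used to simulate the data
density_gap <- function(x) {
  dgamma(x, shape = 2, scale = 2) - dnorm(x, mean = 20, sd = 5)
}

# The difference is positive at 5 and negative at 20, so a root lies in between
exact <- uniroot(density_gap, interval = c(5, 20), tol = 1e-9)$root
exact
```

Any gap between `exact` and `poi` reflects sampling noise, kernel smoothing, and the grid spacing of `density`, so the two values are not expected to agree exactly.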