# Find the intersection of overlapping histograms in R

First published on **Posts | Joshua Cook** and kindly contributed to R-bloggers.


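Here, I demonstrate how to find the point where the density curves of two overlapping histograms intersect. While this is an approximation, its resolution is limited only by the spacing of the points at which the densities are evaluated (about 0.08 in the example below).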

### Prepare simulated data

I created two data sets, `gamma_dist` and `norm_dist`, which are made up of different numbers of values sampled randomly from a gamma distribution and a normal distribution, respectively. I specifically made the data sets different sizes to show that this method still applies in that case.

    library(tibble)

    set.seed(0)

    gamma_dist <- rgamma(1e5, shape = 2, scale = 2)
    norm_dist <- rnorm(5e5, mean = 20, sd = 5)

    df <- tibble(
      x = c(gamma_dist, norm_dist),
      original_dataset = c(rep("gamma_dist", 1e5), rep("norm_dist", 5e5))
    )

    df
    #> # A tibble: 600,000 x 2
    #>        x original_dataset
    #>    <dbl> <chr>
    #>  1  6.89 gamma_dist
    #>  2  2.25 gamma_dist
    #>  3  1.30 gamma_dist
    #>  4  4.10 gamma_dist
    #>  5  7.77 gamma_dist
    #>  6  5.08 gamma_dist
    #>  7  4.58 gamma_dist
    #>  8  2.30 gamma_dist
    #>  9  1.36 gamma_dist
    #> 10  1.67 gamma_dist
    #> # … with 599,990 more rows

I used ‘ggplot2’ to plot the densities of the two data sets. The gamma distribution is in red and the normal distribution is in blue. I broke the creation of the plot into two steps: the essential step to create the density curves, and the styling step to make the plot look nice. Of course, these could be combined into a single long ggplot statement.

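    library(ggplot2)

    # Essential step: one density curve per data set.
    p <- ggplot(df) +
      geom_density(aes(x = x, color = original_dataset))

    # Styling step.
    # Note: expand_scale() is deprecated as of ggplot2 3.3.0;
    # expansion() is its replacement and takes the same `mult` argument.
    p <- p +
      scale_y_continuous(expand = expand_scale(mult = c(0, 0.05))) +
      scale_color_manual(values = c("tomato", "dodgerblue")) +
      theme_minimal() +
      theme(
        legend.title = element_blank(),
        plot.title = element_text(hjust = 0.5)
      ) +
      labs(x = "values", title = "Two density curves")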
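Because the data here are simulated from known distributions, the crossing of the underlying probability density functions can be computed directly and used as a reference value for any estimate. The sketch below is not part of the original method. It assumes that comparing the two curves is equivalent to comparing the two pdfs, which holds because `geom_density` normalizes each group to unit area regardless of sample size. The names `pdf_diff` and `true_poi` are made up for illustration.

```r
# Difference between the theoretical gamma and normal densities,
# using the same parameters as the simulation above.
pdf_diff <- function(x) {
  dgamma(x, shape = 2, scale = 2) - dnorm(x, mean = 20, sd = 5)
}

# The difference is positive at 5 and negative at 20,
# so uniroot() can bracket a root in that interval.
true_poi <- uniroot(pdf_diff, interval = c(5, 20))$root
true_poi
```

With real data the generating distributions are unknown, so this check is only available for simulations like this one.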

### Finding the point of intersection

To find the point of intersection, I first estimated the density of each data set using `density`. It is essential to use the same `from` and `to` values for each data set. By default, the `density` function evaluates the estimate at 512 equally spaced points; thus, providing the same starting and ending parameters makes `density` use the same grid of `x` values for each data set.

    from <- 0
    to <- 40

    gamma_density <- density(gamma_dist, from = from, to = to)
    norm_density <- density(norm_dist, from = from, to = to)
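As a quick check on the claim that both calls share one grid, the sketch below regenerates the data and compares the `x` components of the two density objects. The variable names `same_grid`, `n_points`, and `grid_step` are only illustrative.

```r
set.seed(0)
gamma_dist <- rgamma(1e5, shape = 2, scale = 2)
norm_dist <- rnorm(5e5, mean = 20, sd = 5)

from <- 0
to <- 40
gamma_density <- density(gamma_dist, from = from, to = to)
norm_density <- density(norm_dist, from = from, to = to)

# Both objects should hold the same evaluation points.
same_grid <- identical(gamma_density$x, norm_density$x)

# Number of evaluation points (the default of density() is n = 512).
n_points <- length(gamma_density$x)

# Distance between neighbouring points: (to - from) / (n - 1).
grid_step <- diff(gamma_density$x)[1]

same_grid
n_points
grid_step
```

The grid step matters because it bounds how finely a search over these points can locate the crossing. Passing a larger `n` to `density` would shrink it.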

The final step was to find where the density of the gamma distribution fell below that of the normal distribution. I applied this logic to create the logical vector `idx`. I also included two other filters to restrict the result to between 5 and 20 because, from the plot above, I can see that the intersection falls within this range.

    idx <- (gamma_density$y < norm_density$y) &
      (gamma_density$x > 5) &
      (gamma_density$x < 20)

    poi <- min(gamma_density$x[idx])
    poi
    #> 10.64579

That’s it: the point of intersection has been approximated to a high precision. A vertical line was added to the plot below at `poi`.

    p <- p +
      geom_vline(xintercept = poi, linetype = 2, size = 0.3, color = "black") +
      annotate(
        geom = "text", label = round(poi, 3),
        x = poi - 1, y = 0.1, size = 4, angle = 90
      )
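The search above returns the first grid point at which the gamma curve lies below the normal curve, so the crossing of the two estimated curves sits somewhere in the grid interval just before it. One possible refinement, sketched here as a variation and not as the original method, is to find the sign change in the difference of the two curves and interpolate linearly within that interval. The names `d`, `crossings`, `poi_grid`, and `poi_interp` are illustrative.

```r
set.seed(0)
gamma_dist <- rgamma(1e5, shape = 2, scale = 2)
norm_dist <- rnorm(5e5, mean = 20, sd = 5)

gamma_density <- density(gamma_dist, from = 0, to = 40)
norm_density <- density(norm_dist, from = 0, to = 40)

x <- gamma_density$x
# Difference between the two estimated curves on the shared grid.
d <- gamma_density$y - norm_density$y

# Indices i where the difference changes sign between x[i] and x[i + 1].
crossings <- which(diff(sign(d)) != 0)

# Keep only sign changes inside the range read off the plot.
crossings <- crossings[x[crossings] > 5 & x[crossings] < 20]
i <- crossings[1]

# Grid-based answer: the first grid point after the sign change.
poi_grid <- x[i + 1]

# Linear interpolation for the zero of d between x[i] and x[i + 1].
poi_interp <- x[i] - d[i] * (x[i + 1] - x[i]) / (d[i + 1] - d[i])

c(grid = poi_grid, interpolated = poi_interp)
```

The interpolated value can differ from the grid value by at most one grid step. Neither removes the error that comes from estimating the densities from a finite sample in the first place.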
