# Wanted: A Perfect Scatterplot (with Marginals)

[This article was first published on **Win-Vector Blog » R**, and kindly contributed to R-bloggers.]


We saw this scatterplot with marginal densities the other day, in a blog post by Thomas Wiecki:

The graph was produced in Python, using the seaborn package. Seaborn calls it a “jointplot;” it’s called a “scatterhist” in Matlab, apparently. The seaborn version also shows the strength of the linear relationship between the x and y variables. Nice.

I like this plot a lot, but we’re mostly an R shop here at Win-Vector. So we asked: can we make this plot in ggplot2? Natively, ggplot2 can add rugs to a scatterplot, but doesn’t immediately offer marginals, as above.

However, you can use Dean Attali’s ggExtra package. Here’s an example using the same data as the seaborn jointplot above; you can download the dataset here.

```
library(ggplot2)
library(ggExtra)

frm = read.csv("tips.csv")

# central scatterplot with a linear fit
plot_center = ggplot(frm, aes(x=total_bill, y=tip)) +
  geom_point() +
  geom_smooth(method="lm")

# default: type="density"
ggMarginal(plot_center, type="histogram")
```

I didn’t bother to add the internal annotation for the goodness of the linear fit, though I could.
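For the curious, here is a minimal sketch of how that annotation might be added with plain ggplot2. It is not the code behind the figure above. The data frame is simulated as a stand-in for `tips.csv` (same column names, made-up values), and the label text and placement are our own assumptions.

```r
library(ggplot2)

# Stand-in data with the same column names as tips.csv
# (assumption: simulated values, not the real tips data).
set.seed(2015)
frm = data.frame(total_bill = runif(200, min = 5, max = 50))
frm$tip = 1 + 0.1 * frm$total_bill + rnorm(200, sd = 1)

# Fit the same straight line that geom_smooth(method="lm") draws,
# and pull out its R-squared for the label.
fit = lm(tip ~ total_bill, data = frm)
rsq = summary(fit)$r.squared
fit_label = sprintf("R-squared = %.2f", rsq)

plot_center = ggplot(frm, aes(x = total_bill, y = tip)) +
  geom_point() +
  geom_smooth(method = "lm") +
  # x = -Inf, y = Inf pins the text to the top-left corner of the panel;
  # hjust/vjust nudge it inside the panel border.
  annotate("text", x = -Inf, y = Inf, label = fit_label,
           hjust = -0.1, vjust = 1.5, size = 3)

# print(plot_center), or hand plot_center to ggMarginal() as before
```

We would expect `ggMarginal()` to carry the annotation along, since it is just another layer of the central plot, but we have not checked every version of the package.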

The `ggMarginal()` function goes to heroic effort to line up the coordinate axes of all the graphs, and is probably the best way to do a scatterplot-plus-marginals in ggplot (you can also do it in base graphics, of course). Still, we were curious how close we could get to the seaborn version: marginal density and histograms together, along with annotations. Below is our version of the graph; we report the linear fit’s R-squared, rather than the Pearson correlation.

```
# our own (very beta) plot package: details later
library(WVPlots)

frm = read.csv("tips.csv")

ScatterHist(frm, "total_bill", "tip",
            smoothmethod="lm",
            annot_size=3,
            title="Tips vs. Total Bill")
```
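A side note on the annotation: for a straight-line fit with a single predictor, the R-squared is simply the square of the Pearson correlation, so our number and seaborn's carry the same information (minus the sign). A quick base-R check on simulated data (the variables `x` and `y` are made up for illustration):

```r
# Simulated data (assumption: illustration only, not the tips data).
set.seed(42)
x = runif(100, min = 0, max = 50)
y = 1 + 0.1 * x + rnorm(100)

r = cor(x, y)                        # Pearson correlation
rsq = summary(lm(y ~ x))$r.squared   # R-squared of the linear fit

# The two agree up to floating-point error.
same = isTRUE(all.equal(r^2, rsq))
print(same)
```

This identity only holds for the one-variable linear fit; with a lowess or spline smooth the two quantities are no longer interchangeable.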

You can see that (at the moment) we’ve resorted to padding the axis labels with underbars to force the x-coordinates of the top marginal plot and the scatterplot to align; white space gets trimmed. This is profoundly unsatisfying, and less robust than the `ggMarginal` version. If you’re curious, the code is here. It relies on some functions in the file `sharedFunctions.R` in the same repository. Our more general version will do either a linear or lowess/spline smooth, and you can also adjust the histogram and density plot parameters.
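To give a flavor of the underbar trick, here is a rough sketch of a padding helper. It is not the code from our repository: `pad_labels` is a hypothetical name, and we assume it is the tick labels that get padded to a common width.

```r
# Hypothetical helper (not the WVPlots implementation): left-pad labels
# with underscores so that every label has the same number of characters.
pad_labels = function(labels, width) {
  labels = as.character(labels)
  n_pad = pmax(width - nchar(labels), 0)  # never pad by a negative amount
  paste0(strrep("_", n_pad), labels)
}

padded = pad_labels(c("0", "2.5", "10"), width = 5)
print(padded)
# [1] "____0" "__2.5" "___10"
```

In ggplot2 a function like this could be supplied to both plots through the `labels` argument of `scale_y_continuous()`, for example `scale_y_continuous(labels = function(b) pad_labels(b, 5))`. Equal character counts only give equal pixel widths in a monospaced font, which is part of why the approach is fragile.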

Thanks to Slawa Rokicki’s excellent *ggplot2: Cheatsheet for Visualizing Distributions* for our basic approach. Check out the graph at the bottom of her post — and while you’re at it, check out the rest of her blog too.
