In Example 9.1, we showed a binning approach to plotting bivariate relationships in a large data set. Here we show more sophisticated approaches: transparent overplotting and formal two-dimensional kernel density estimation. We use the 10,000 simulated bivariate normals shown in Example 9.1.
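In SAS, transparency is available through the transparency= option of the scatter statement in proc sgplot, with results shown above. The markerattrs= option sets the symbol and its size, and transparency=0.85 makes each marker 85% transparent (0 is fully opaque, 1 is fully transparent).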
proc sgplot data=mvnorms;
  scatter x=x1 y=x2 / markerattrs=(symbol=CircleFilled size=.05in) transparency=0.85;
run;
The image gives a good sense of the overall density, with the darker (overplotted) areas reflecting more observations. Overplotting was the problem we sought to avoid with the binning, but here it becomes an advantage.
Another approach is to use bivariate kernel density estimation. This is perhaps more similar to the binning shown previously, but without the stricture of regular polygons. It also offers some default values for smoothing, though whether or not these are good default values could be debated.
proc kde data=mvnorms;
  bivar x1 x2 / plots=contour;
run;
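In R, the basic plot() function supports transparency through the alpha channel of the color specification. In an eight-digit hex color such as "#00000022", the final two digits set the opacity, so you must choose a suitably low value for the overplotting to show up as shading. The pch, col, and cex parameters govern the shape, color, and size of the plotted symbols, respectively.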
plot(xvals[,1], xvals[,2], pch=19, col="#00000022", cex=0.1)
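To make the color string less cryptic, here is a minimal sketch that builds the same semi-transparent black with rgb() instead of typing the hex code. The data are a hypothetical stand-in for the Example 9.1 simulation: the seed, the correlation rho, and the base-R construction of xvals are assumptions, not the original code.

```r
# Hypothetical stand-in for the Example 9.1 data: 10,000 correlated
# bivariate normals built with base R only (rho = 0.5 is an assumption).
set.seed(42)
n <- 10000
rho <- 0.5
x1 <- rnorm(n)
x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)
xvals <- cbind(x1, x2)

# Black with alpha 0x22 (34 out of 255, roughly 13% opaque).
# The resulting string is the same "#00000022" used in the plot() call.
pale <- rgb(0, 0, 0, alpha = 0x22, maxColorValue = 255)

# Small filled circles; dense regions darken as markers stack up.
plot(xvals[, 1], xvals[, 2], pch = 19, col = pale, cex = 0.1)
```

Raising the alpha value makes each point darker, which helps with smaller data sets but saturates quickly with many points.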
Bivariate kernel density estimation is available in the smoothScatter() function, which is included in the R distribution as part of the graphics package.
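A minimal sketch of calling smoothScatter() follows. As before, the simulated xvals matrix is a hypothetical stand-in for the Example 9.1 data (the seed and rho are assumptions), and the nbin value in the second call is an arbitrary illustration rather than a recommendation. Note that smoothScatter() relies on the KernSmooth package, which normally ships with R as a recommended package.

```r
# Hypothetical stand-in for the Example 9.1 data (rho = 0.5 is an assumption).
set.seed(42)
n <- 10000
rho <- 0.5
x1 <- rnorm(n)
x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)
xvals <- cbind(x1, x2)

# Default call: shaded density estimate, with the 100 points from the
# lowest-density regions drawn on top (nrpoints = 100 by default).
smoothScatter(xvals[, 1], xvals[, 2])

# nrpoints = 0 suppresses the overlaid points; nbin sets the number of
# grid cells per axis used for the density estimate.
smoothScatter(xvals[, 1], xvals[, 2], nrpoints = 0, nbin = 200)
```

The bandwidth argument can be supplied to override the default smoothing, which is worth trying given that the defaults may not suit every data set.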