R’s plot() function is probably the most used visualization function in R. It’s simple, easy to use and gets the job done, and it’s also highly customizable. Adding unnecessary styling and information to a plot is generally not recommended because it can distract from what the plot is meant to show, but there are times when you just have to. Whether it’s for pure aesthetics, to convey several things in one plot, or any other reason, here are some of the options you can use in R’s base plot() function.
The Data Points
We’re going to use the cars dataset that is built into R (car speed in mph against stopping distance in feet). To follow along with real code, here’s an interactive R Notebook. Feel free to copy it and play around with the code as you read along.
If we simply plot the dataset by passing it as the only argument, plot(cars), R draws a scatter plot with the first column (speed) on the x-axis and the second column (dist) on the y-axis.
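A minimal sketch of that default call, using only base R: cars is loaded automatically from the datasets package, and the second call should produce the same scatter plot with the columns and axis labels spelled out explicitly.

```r
# cars ships with R (datasets package): 50 rows, columns speed (mph) and dist (ft)
str(cars)

# Passing the whole data frame: first column on x, second column on y
plot(cars)

# Equivalent call with the columns and default labels given explicitly
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
```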
By default, the data points are drawn as open circles (circles with no fill). To change the symbol used for the data points, use the ‘pch’ (plotting character) option.
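A short sketch of ‘pch’ in use. Code 19 (a solid circle) is just one choice; any other code, or a single character, works the same way.

```r
# pch = 19 draws solid circles instead of the default open circles (pch = 1)
plot(cars, pch = 19)

# pch also accepts a single character, which is drawn as the symbol itself
plot(cars, pch = "+")
```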
‘pch’ accepts integer codes from 0 to 25, each drawing a different symbol; the default is 1 (the open circle). It also accepts a single character, such as "+", which is then drawn as the symbol.
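If you want a reference grid of every symbol, you can draw one yourself. The sketch below plots each code from 0 to 25 in a 6-column layout with its number underneath. The variable names (codes, x, y) are just illustrative, and ‘bg’ (the fill color) only affects symbols 21 to 25.

```r
# One point per pch code, laid out in a grid of 6 columns
codes <- 0:25
x <- codes %% 6        # column position: 0..5
y <- -(codes %/% 6)    # row position: 0, -1, ..., -4 (top to bottom)

# bg fills the filled-outline symbols 21 to 25; other codes ignore it
plot(x, y, pch = codes, cex = 2, bg = "gray",
     xlim = c(-0.5, 5.5), ylim = c(-5, 0.5),
     axes = FALSE, xlab = "", ylab = "", main = "pch codes 0 to 25")

# Print each code number just below its symbol
text(x, y, labels = codes, pos = 1, offset = 1.2)
```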
Data Point Size
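To change the size of the data points, use the ‘cex’ option. It is a numeric multiplier relative to the default size of 1, so cex = 2 doubles the symbol size and cex = 0.5 halves it.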
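A sketch of ‘cex’ with a fixed value and with one value per point. Scaling by speed / 10 is an arbitrary choice made only to show that ‘cex’ can be a vector.

```r
# Every point drawn at twice the default size
plot(cars, pch = 19, cex = 2)

# cex can also be a vector with one value per point:
# here point size grows with speed (0.4 for the slowest car, 2.5 for the fastest)
plot(cars, pch = 19, cex = cars$speed / 10)
```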
Data Point Color
The default color for the data points is black. To change it, use the ‘col’ option.
The ‘col’ option accepts color names such as "green", "wheat" or "red", hexadecimal codes such as "#FF0000", and integers, which pick a color from the current palette(). To see every named color, run colors(); it returns a character vector of all 657 built-in color names. If you know the position of your favorite color in that vector, you can pass it by index, for example plot(cars, col = colors()[n]) where n is that position, or save the color in a variable somewhere in the document if you want to keep reusing it.
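A sketch of the different ways to pass ‘col’. The particular colors ("wheat", "#1B9E77", "steelblue", "darkorange") are arbitrary picks, and my_color is a hypothetical variable name for a reusable favorite.

```r
# By name: any entry of colors() works
plot(cars, pch = 19, col = "wheat")

# By hexadecimal RGB code
plot(cars, pch = 19, col = "#1B9E77")

# By position in colors(); which() looks up the index of a known name
idx <- which(colors() == "steelblue")
plot(cars, pch = 19, col = colors()[idx])

# An integer indexes the current palette(), not colors()
plot(cars, pch = 19, col = 2)

# Save a favorite color once and reuse it across plots
my_color <- "darkorange"
plot(cars, pch = 19, col = my_color)
```

Note that the plain-integer form depends on whatever palette() is active, so the name or hex forms are the safer choice when the exact color matters.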
Axis Labels
Whether you work alone or, especially, with a team, always label your axes. If the dataset you’re using has column names, plot() automatically uses them as the axis labels (for cars, that means "speed" and "dist"). To add your own, set the xlab and ylab arguments:
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
Legend
To add a legend, use the legend() function. Its first two arguments are the x and y coordinates of the legend’s top-left corner, in the plot’s data units. Call it right after plot(), since it draws on top of the existing plot.
You want the legend symbols to match the symbols used in the plot. legend() takes the same pch codes as plot(), and the symbols in the legend should of course use the same colors as the points they describe. Here are some of the options you can play around with in legend():
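legend(x, y, legend, col, pch, cex)
Here x and y are numeric coordinates on the plot (x can instead be a keyword such as "topright"), legend is a character vector of labels, col takes color names or codes, pch takes the same symbol codes as plot(), and cex is a numeric size multiplier.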
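The sketch below puts the pieces together. The cars data has no grouping column, so the split at 15 mph is a hypothetical grouping made up purely to have two series to label; the colors, symbols and legend position are also arbitrary choices.

```r
# Hypothetical grouping for illustration: split cars at 15 mph
fast <- cars$speed >= 15

# One symbol and one color per group (index 1 = slower, index 2 = faster)
group_pch <- c(1, 19)
group_col <- c("steelblue", "darkorange")

# fast + 1 turns FALSE/TRUE into 1/2, picking each point's group style
plot(cars, pch = group_pch[fast + 1], col = group_col[fast + 1],
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Drawn on top of the plot above; (4, 120) is the legend's top-left corner
# in data units. The same pch and col vectors keep legend and points in sync.
legend(4, 120, legend = c("Slower than 15 mph", "15 mph or faster"),
       col = group_col, pch = group_pch, cex = 0.8)

# Alternative: position by keyword instead of coordinates
# legend("topleft", legend = c("Slower than 15 mph", "15 mph or faster"),
#        col = group_col, pch = group_pch, cex = 0.8)
```

Reusing the same group_pch and group_col vectors in both calls is what keeps the legend symbols matching the plotted points.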
These are just what I call the essentials; there are many more in the documentation (see below).
And that’s it. As I said at the start, plot() is highly customizable, and there are several other options you can use, such as regression or trend lines, plot sizing and so on. These are just the essentials for when you want a little something extra on your visualization. At certain stages of the data analysis process, the less you add to your plots, the better.
Reference and Documentation