I recently learnt how to build basic R Shiny apps. To practice, I created a simple app for basic exploratory data analysis. You can use the app here to play around with the diamonds dataset from the ggplot2 package. To use the app with your own dataset, download the raw R code here (just the app.R file) and assign your dataset to raw_df. In the rest of this post, I outline how to use the app.
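As a rough illustration of swapping in your own data, the sketch below assigns a built-in dataset to raw_df. It assumes that the only change needed in app.R is the line that defines raw_df; the choice of mtcars and the factor conversions are illustrative assumptions, not part of the app.

```r
# Illustrative replacement for the line in app.R that defines raw_df.
# mtcars ships with base R. For a file of your own, something like
# raw_df <- read.csv("path/to/your_data.csv")  (placeholder path)
# would go here instead.
raw_df <- mtcars

# Numeric codes with few distinct values are often better treated as
# categories. Converting them to factors makes is.numeric() return FALSE,
# so they would behave as non-numeric variables.
raw_df$cyl <- factor(raw_df$cyl)
raw_df$am  <- factor(raw_df$am, labels = c("automatic", "manual"))

str(raw_df)
```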
(Credits: I worked off the source code for the “Diamonds Explorer” app. There are a few versions of this app out there and I can’t find the exact source I used, but it was very close to the source code of this version.)
As you can see from the screenshot below, the main panel (on the right) has four tabs. The last two tabs simply give the output of calling the summary and str functions on the entire dataset; they are not affected by the controls in the side panel. The “Data Snippet” tab prints up to 15 rows of the dataset for a peek at what the data looks like. (These are the first 15 rows of the dataset used to create the plot on the “Plot” tab.)
The most interesting tab is probably the “Plot” tab. First, let me describe how the app selects the data it plots. By default, it picks 1,000 random observations, or all the observations if the dataset has fewer than 1,000 rows. The user can input the random seed for reproducibility. The user can also control the number of observations with the slider, and choose whether to sample the observations randomly or take them from the top of the dataset.
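The row selection described above could be sketched roughly as follows. This is not the app's actual code: select_rows and its arguments are names made up for illustration, and the sketch assumes sampling without replacement.

```r
library(ggplot2)  # loaded here only for the diamonds dataset

raw_df <- as.data.frame(diamonds)

# Hypothetical helper: choose the rows that the plot is built from.
select_rows <- function(df, n = 1000, random = TRUE, seed = 1) {
  n <- min(n, nrow(df))           # use every row if the dataset is small
  if (random) {
    set.seed(seed)                # user-supplied seed for reproducibility
    idx <- sample(nrow(df), n)    # n distinct row indices
  } else {
    idx <- seq_len(n)             # take the top n rows
  }
  df[idx, , drop = FALSE]
}

plot_df <- select_rows(raw_df, n = 1000, seed = 42)
head(plot_df, 15)  # roughly what the "Data Snippet" tab would display
```

Calling select_rows again with the same seed returns the same rows, which is the point of exposing the seed to the user.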
The type of plot the app makes depends on the types of the variables given to it. In the screenshot above, one numeric variable and one non-numeric variable are given, so the app makes a boxplot. If two numeric variables are given, it makes a scatterplot:
For scatterplots, the user has the option to jitter the points and/or to add a smoothing line:
If two non-numeric variables are given, the app makes a heatmap depicting how often each combination is present in the data:
The plots above depict the joint distribution of two variables in the dataset. For a visualization of just one variable, the user can set the “Y” variable to “None”. If the “X” variable is numeric, the app plots a histogram:
If the “X” variable is non-numeric, the app plots a bar plot showing counts:
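To summarize the rules for choosing a plot type, here is a minimal sketch of how the dispatch could be written with ggplot2. It is an illustration under my own assumptions rather than the code in app.R: make_plot and its arguments are hypothetical names, y = NULL stands in for the “None” choice, and the specific geoms (for example geom_jitter for jittering and geom_tile for the heatmap) are one reasonable choice among several.

```r
library(ggplot2)

# Hypothetical helper: pick a plot type from the types of x and y.
make_plot <- function(df, x, y = NULL, jitter = FALSE, smooth = FALSE) {
  x_num <- is.numeric(df[[x]])

  # One variable: histogram if numeric, bar plot of counts otherwise
  if (is.null(y)) {
    p <- ggplot(df, aes(x = .data[[x]]))
    if (x_num) return(p + geom_histogram(bins = 30))
    return(p + geom_bar())
  }

  y_num <- is.numeric(df[[y]])

  # Two non-numeric variables: heatmap of how often each combination occurs
  if (!x_num && !y_num) {
    counts <- as.data.frame(table(df[[x]], df[[y]]))
    names(counts) <- c(x, y, "n")
    return(ggplot(counts, aes(.data[[x]], .data[[y]], fill = n)) + geom_tile())
  }

  p <- ggplot(df, aes(x = .data[[x]], y = .data[[y]]))

  # Two numeric variables: scatterplot, optionally jittered and smoothed
  if (x_num && y_num) {
    p <- p + (if (jitter) geom_jitter() else geom_point())
    if (smooth) p <- p + geom_smooth()
    return(p)
  }

  # One numeric and one non-numeric variable: boxplot
  # (recent ggplot2 versions infer the orientation automatically)
  p + geom_boxplot()
}

df <- as.data.frame(diamonds)[1:1000, ]
make_plot(df, "cut", "price")                    # boxplot
make_plot(df, "carat", "price", smooth = TRUE)   # scatterplot with smoother
make_plot(df, "cut", "clarity")                  # heatmap
make_plot(df, "price")                           # histogram
make_plot(df, "cut")                             # bar plot
```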
Finally, let’s talk about color. For simplicity, the app only allows plots to be colored by non-numeric variables. Below is an example of a colored scatterplot:
As the screenshot below shows, color works for boxplots too. (Color does not work for heatmaps.)
Color can be used for the one-dimensional plots as well:
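In ggplot2 terms, coloring by a non-numeric variable amounts to mapping it to the colour or fill aesthetic. The sketch below shows one way this could look for the plot types above; whether the app maps to colour or fill in each case is an assumption on my part.

```r
library(ggplot2)

df <- as.data.frame(diamonds)[1:1000, ]

# Scatterplot: points colored by a non-numeric variable
p_scatter <- ggplot(df, aes(carat, price, colour = cut)) +
  geom_point()

# Boxplot: one box per combination of x category and color category
p_box <- ggplot(df, aes(cut, price, fill = color)) +
  geom_boxplot()

# Histogram: bars split (stacked by default) by the color variable
p_hist <- ggplot(df, aes(price, fill = cut)) +
  geom_histogram(bins = 30)

p_scatter
```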
There are certainly several improvements that could be made to this app. For one, it would be nice if the user could upload their dataset through the app (instead of downloading my source code and assigning their dataset to the raw_df variable). There could also be more controls to let the user change certain aspects of the plot (e.g. transparency of points), but at some point the UI might become too overwhelming for the user.
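For the upload idea, Shiny's fileInput is the natural starting point. Below is a minimal sketch of how it might be wired up, assuming CSV uploads and a fallback to a built-in dataset; the input and output IDs ("upload", "snippet") are made up for illustration, and none of this is in the current app.

```r
library(shiny)

ui <- fluidPage(
  fileInput("upload", "Upload a CSV file", accept = ".csv"),
  tableOutput("snippet")
)

server <- function(input, output, session) {
  # Reactive dataset: a default until the user uploads a file
  raw_df <- reactive({
    if (is.null(input$upload)) return(mtcars)
    read.csv(input$upload$datapath, stringsAsFactors = TRUE)
  })

  # Show the first 15 rows, as in the "Data Snippet" tab
  output$snippet <- renderTable(head(raw_df(), 15))
}

# shinyApp(ui, server)  # uncomment to launch the app
```

Because raw_df becomes a reactive here, the rest of the server code would need to call raw_df() instead of referring to a fixed data frame.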