Visualize Univariate Distribution of a Dataset (using Plotly)

[This article was first published on R – My Software Learning & Experiences, and kindly contributed to R-bloggers].

The plotly library produces interactive graphs. Using this library, a function named ddist has been written to visualize the distribution of each variable within a dataset. It can be quite handy when exploring a new dataset.

Because the function relies on plotly, you must install and load that library before calling ddist.


library(plotly)

The ddist function takes a dataset (of type data.frame) as its input parameter and returns a list containing one plotly plot object per variable. See below:


#Function for generating a plot object for each variable within the dataset (using plotly library).
#For numeric variables (including integers) a histogram is generated; for all others a bar plot is generated.
#param: data.frame
#returns: list
ddist <- function(dataset) {

  #create an empty list with one slot per variable to hold the plot objects
  plots <- vector("list", length(dataset))

  #iterate through each variable
  for(i in seq_along(dataset)) {

    #extract the column as a vector (with [[ ]] this also works for tibbles)
    column <- dataset[[i]]

    #for numeric variables (is.numeric is also TRUE for integers) plot a histogram
    if(is.numeric(column)) {
      plots[[i]] <- plot_ly(x=column) %>% add_histogram(name=names(dataset)[i])
    }
    #for the remaining variables plot a bar plot of category counts
    else {
      tbl <- table(column)
      plots[[i]] <- plot_ly(x=names(tbl), y=as.vector(tbl), name=names(dataset)[i], type='bar')
    }
  }
  #return list of plots
  return(plots)
}
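
The rule described above (histogram for numeric variables, bar plot for everything else) can be previewed without drawing anything. The following is only a minimal sketch in base R; plot_kind is a hypothetical helper that is not part of the original function, and it assumes each column is tested on its own with is.numeric.

```r
# Hypothetical helper (not part of the original post): report the plot type
# that the numeric / non-numeric rule would select for each column.
plot_kind <- function(dataset) {
  vapply(dataset,
         function(column) if (is.numeric(column)) "histogram" else "bar",
         character(1))
}

# Named character vector with one entry per column of iris
kinds <- plot_kind(iris)
print(kinds)

# Category counts that a bar plot of a non-numeric column would display
species_counts <- table(iris$Species)
print(species_counts)
```

For iris, the four measurement columns fall into the histogram branch and Species falls into the bar plot branch.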

Using the above function, we can easily explore the distribution of each variable within any dataset. For example, the code below passes the iris dataset to the ddist function and then calls the subplot function (of the plotly library) to display the resulting plots on two rows:

#generate plotly plot objects
plots <- ddist(iris)

#display plots on two rows
subplot(plots, nrows=2)


Click to view interactive plots of univariate distributions of iris dataset
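
The number of rows passed to subplot was picked by hand above. As a rough sketch, it could also be derived from the number of variables; grid_rows and its ncols argument are hypothetical names introduced here for illustration, not part of plotly or of the original post.

```r
# Hypothetical helper: number of rows needed to place n_plots panels
# in a grid that is ncols panels wide.
grid_rows <- function(n_plots, ncols = 3) {
  ceiling(n_plots / ncols)
}

# iris has 5 variables, so a grid 3 panels wide needs 2 rows
n_rows <- grid_rows(length(iris), ncols = 3)
print(n_rows)
```

The result could then be used as subplot(plots, nrows = n_rows), which for iris gives the same two-row layout as above.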

Similarly, passing the diamonds dataset (shipped with the ggplot2 package) to the ddist function results in:


Click to view interactive plots of univariate distributions of diamonds dataset
