parcats 0.0.1 released

December 4, 2019

[This article was first published on R on datistics, and kindly contributed to R-bloggers.]



parcats 0.0.1 has been released on CRAN. It is an htmlwidget that provides R bindings to the plotly.js parcats trace, which is not supported by the plotly R package. It also adds marginal histograms for numerical variables.

[Demo animation (demogif)]

What it can do

I had wanted to add interactivity to easyalluvial plots for a while and found that the parcats trace of plotly.js would be perfect: brushing with the mouse highlights the entire flow, not just everything flowing in and out of a specific node as in most D3 Sankey chart implementations. Unfortunately, the parcats trace was not available in the plotly R package, so I decided to build a new htmlwidget that creates R bindings specifically for this trace.

  • converts any easyalluvial plot to an interactive parallel categories diagram
  • interactive marginal histograms
  • multidimensional partial dependency and model response plots

easyalluvial

parcats takes an alluvial plot created with easyalluvial as input and turns it into an interactive parallel categories diagram.

Demo

Examples

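suppressPackageStartupMessages( library(tidyverse) )     # dplyr::select() is used below
suppressPackageStartupMessages( library(easyalluvial) )  # alluvial plots and the mtcars2 data set
suppressPackageStartupMessages( library(parcats) )       # interactive parallel categories widget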

Parcats from an alluvial plot of data in wide format

p = alluvial_wide(mtcars2, max_variables = 5)
parcats(p, marginal_histograms = TRUE, data_input = mtcars2)

Parcats from a model response alluvial plot

Machine learning models operate in a multidimensional space, and their response is hard to visualise. Model response and partial dependency plots attempt to visualise ML models in a two-dimensional space. Using alluvial plots or parallel categories diagrams, we can increase the number of dimensions shown at once.

Here we see the response of a random forest model when we vary the three variables with the highest importance while keeping all other features at their median (numeric) or mode (categorical) value.

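df = select(mtcars2, -ids)                               # drop the ID column before modelling
m = randomForest::randomForest(disp ~ ., data = df)      # model displacement from all other variables
imp = m$importance                                       # variable importance ranks the features
dspace = get_data_space(df, imp, degree = 3)             # vary the top 3 features, fix the rest
pred = predict(m, newdata = dspace)                      # model response over that data space
p = alluvial_model_response(pred, dspace, imp, degree = 3)
parcats(p, marginal_histograms = TRUE, imp = TRUE, data_input = df)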
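To make the idea of a "data space" more concrete, here is a minimal base R sketch of the kind of grid the model is evaluated on: a few values for each of the varied features, crossed with each other, while every other feature is held at its median or mode. Note the assumptions: `median_mode_grid()` is a hypothetical helper written only for this illustration, it is not part of easyalluvial, and it simply uses `n` evenly spaced quantiles for numeric variables, which may differ from how `get_data_space()` actually chooses its values.

```r
# Hypothetical helper for illustration only; NOT easyalluvial's
# get_data_space(), whose actual value-selection rules may differ.
# Columns named in `vary` take a few values each; every other column is
# held at its median (numeric) or its most frequent value (otherwise).
median_mode_grid <- function(df, vary, n = 3) {
  # most frequent value; keeps the class (and factor levels) of x
  mode_value <- function(x) {
    ux <- unique(x)
    ux[which.max(tabulate(match(x, ux)))]
  }

  # values for the varied columns: n quantiles for numerics,
  # every observed value otherwise
  varied <- lapply(df[vary], function(x) {
    if (is.numeric(x)) {
      unique(quantile(x, probs = seq(0, 1, length.out = n),
                      names = FALSE, na.rm = TRUE))
    } else {
      unique(x)
    }
  })

  # all combinations of the varied values
  grid <- expand.grid(varied, stringsAsFactors = FALSE)

  # hold every remaining column at a single typical value
  for (nm in setdiff(names(df), vary)) {
    x <- df[[nm]]
    typical <- if (is.numeric(x)) median(x, na.rm = TRUE) else mode_value(x)
    grid[[nm]] <- rep(typical, nrow(grid))
  }

  # restore the original column order of df
  grid[names(df)]
}
```

With the objects from the example above, a call such as `median_mode_grid(df, vary = names(sort(imp[, 1], decreasing = TRUE))[1:3])` would give a grid of the same general shape that can be passed to `predict(m, newdata = ...)`; the number of rows grows multiplicatively with each varied feature, which is why only the few most important ones are varied.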
