Are R ecosystems the future?
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Some random thoughts…
Over the past 6 months I’ve been creating, refining, and delivering a variety of ‘Introduction to R’ training courses. The more I do this, the more I come to the view that not nearly enough is made of taking an ecosystem-oriented view to packages.
A good way of talking about #rstats functionality is in terms of ecosystems, rather than individual packages. Tidyverse, tidymodels, RMarkdown & Co, and HTML widgets are all worth highlighting…any others?
— Jamie Lendrum (@jamie_lendrum) January 30, 2019
The CRAN task views go some way to try to address this, but these are not recommendations, but more of a list of packages that broadly do the same thing with no indication as to whether they will work well with the other package you’re using to do another task. It’s like giving someone a 100 page cook book listing various ingredients, with no recipes.
Aside from the language syntax itself, I’ve discovered that when people go off to try to apply R in their own work, the first stumbling block is often their choice of package. Whilst there is not necessarily a ‘wrong’ choice, I do find instances where they pick packages that have either been superseded or are not the easiest of packages to use. I talk to people at an R User Group, and they tell me of these outdated packages they’re using, not realising that something much more effective and user friendly came out years ago. Wouldn’t it be good if people could narrow their search down to an ecosystem of packages? Even better, wouldn’t some interactive visualisation showing the composition of these ecosystems be useful?
So far I’ve identified the following ecosystems of packages (leave a comment if you know of any more!):
- Tidyverse
- Tidymodels
- Tidyverts
- ggplot2 extensions
- HTML widgets
- Bioconductor
- RMarkdown and extensions
- lib-r
- cloudyr
Some of these are well-defined ecosystems, however I think that others could be brought together more coherently under the same umbrella (e.g. HTML widgets and RMarkdown and co.)
Furthermore, there are some really great packages, such as DataExplorer and janitor, which currently don’t seem to have a broader ‘home’. I’d really like to see these integrated into existing ecosystems, or baked into new ecosystems.
Then there’s the issue of who gets to decide how packages converge into ecosystems? Package maintainers have a good understanding of the intent of their package, and are probably well placed to know what differentiates their package from others like it, it possibly being borne out of necessity as the alternatives didn’t do the job well enough. However, package maintainers may not have pitched their package at the right level for the average user, and therefore may be missing that all important independent view.
I dare say there will be some who are strongly opposed to the idea. In fact I remember reading a blog some time ago about it being difficult for packages to make a name for themselves when faced with the competition of ‘de-facto’ ecosystems. I don’t dispute that. However, I think that good packages should strive to find their place in the grander scheme of things and avoid duplicating existing functionality, which can only lead to more collaboration and cross-compatibility…and that’s got to be a good thing, right?
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.