A Major Contribution to Learning R

[This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Prominent statistician Frank Harrell has come out with a radically new R tutorial, rflow. The name is short for “R workflow,” but I call it “R in a box”: everything one needs to begin serious use of R, starting from little or no background.

By serious usage I mean real applications in which the user has a substantial computational need. This could be a grad student researcher, a person who needs to write data reports for her job, or simply a person who is doing personal analysis such as stock picking.

Like other tutorials and books, rflow covers data manipulation, generation of tables and graphics, and so on. But UNLIKE many others, rflow empowers the user to handle general issues as they inevitably pop up, rather than just teaching a few basic, largely ungeneralizable operations. I’ve criticized the tidyverse in particular for that latter problem, but really no other tutorial, including my own, has this key “R in a box” quality.

The tutorial is arranged into 19 short “chapters,” running from R Basics all the way through such advanced topics as Manipulating Longitudinal Data and Parallel Computing. RStudio’s exciting new Quarto presentation tool is featured, as is the data.table package.

Note carefully that this tutorial is the product of Frank’s long experience “in the trenches,” conducting intensive data analysis in biomedical applications. (The specific field of application is irrelevant; rflow is just as useful to, say, marketing analysts as it is to medical researchers.) His famous monograph, Regression Modeling Strategies, is a standard reference in the field. Even I, as the author of my own regression book, often find myself checking what Frank has to say in his book about various topics.

This point, that rflow arises from Frank’s long experience with real data, is absolutely key, in my view. His choice of topics, and especially their ordering, reflects that experience. For instance, he brings in the topic of missing data early in the tutorial.

Anyone who teaches R, or is learning R, should check out rflow.
