Equipping Your Data Science Team to Work from Home

[This article was first published on RStudio | Open source & professional software for data science teams on RStudio, and kindly contributed to R-bloggers.]

If your data science team experienced an abrupt transition to working at home, it may be a good time to rethink their development tools. In this post, I’ll talk about why laptop-centric data science gets in the way of strong data science teams and why you should consider deploying development and publishing servers.

Working from Home Has Affected Both People and Data

Like tigers and koalas, we data scientists are fairly solitary creatures. We typically eschew meetings, embrace focus time, and block out distractions so we can concentrate on our work. And on those rare occasions when we need help, our typical reaction is to walk over to a colleague’s desk and brainstorm an answer.

Enter COVID-19 and the new work-from-home environment. At first glance, it would appear nothing really has to change for the typical data science workflow; team members armed with laptops appear well-equipped to continue their data science work. However, many data science teams are now struggling with:

Serious Data Science Requires Collaborative Tools

To be able to do their work collaboratively and repeatably, data science teams need infrastructure that encourages it and is supported by the organization and IT. That typically means shared servers for:

Which Servers Should You Choose?

Which server-based tools you choose depends on factors such as team size, workload, and company software policies. RStudio offers both open source and commercial alternatives, allowing organizations to choose whichever best satisfies their needs. Table 1 summarizes both approaches.

In addition to providing enhanced security, auditing, and usage monitoring, Pro solutions add other benefits that are less quantifiable. Specifically:

Open Source Solution: RStudio Server
Value:
  • Broadens access to development tools
  • Boosts compute and memory resources available
  • Ensures common development environment
Pro Solution: RStudio Server Pro*
Added Value in Pro:
  • Adds collaborative editing and projects
  • Supports multiple R versions and sessions
  • Provides Launcher support for back end execution clusters
  • Supports bilingual data science teams with Jupyter

Open Source Solution: Shiny Server, Homegrown Web Servers
Value:
  • Eases publishing of Shiny applications
  • Allows broad access to data science results
Pro Solution: RStudio Connect
Added Value in Pro:
  • Consolidates many types of content on one server
  • Allows scheduled production and emails
  • Hosts R- and Python-based APIs

Open Source Solution: miniCRAN Mirror
Value:
  • Maintains a local copy of packages from approved sources
Pro Solution: RStudio Package Manager
Added Value in Pro:
  • Speeds installs using binaries
  • Allows use of multiple package versions and checkpoints for roll back
  • Provides package use insights for IT

Table 1: Open Source and Professional Server Options To Support Data Scientists.

*RStudio Server Pro, RStudio Connect, and RStudio Package Manager are also available bundled as RStudio Team.
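
To make the package management row of Table 1 a little more concrete, here is a minimal sketch of how an R session might be pointed at a shared package repository, whether that is a miniCRAN mirror or RStudio Package Manager. The URL is a hypothetical placeholder, not a real endpoint; your IT team would supply the actual address.

```r
# Hypothetical internal repository URL (an assumption for illustration);
# replace it with the address of your organization's own repository.
internal_repo <- "https://packages.example.com/cran/latest"

# Point this R session at the shared repository instead of a public CRAN mirror.
options(repos = c(CRAN = internal_repo))

# Confirm which repository install.packages() will now use by default.
getOption("repos")
```

Teams often put a setting like this in a project-level `.Rprofile` or a site-wide `Rprofile.site`, so that every data scientist installs packages from the same approved source.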

Don’t Be Afraid To Mix and Match Servers As Your Needs Dictate

The collaboration processes data science teams have used for years have already been disrupted by COVID-19 and work-from-home mandates. The question for data science leaders is what they can do to provide new ways of working that are as good as or better than what went before. Centralizing your data science development and production processes is one way to do that.

Emily Riederer, an Analytics Manager at Capital One, summarized some of the benefits she’s seen from this centralized approach at RStudio::conf 2020.

With that said, using servers to make your work-from-home data science team more productive doesn’t have to be an all-or-nothing, Manhattan Project-scale proposition. If your data scientists are comfortable developing code on their laptops, you may want to begin by installing a publishing platform like RStudio Connect and put off development and package management servers for another day. Similarly, some teams start by installing RStudio Server for centralized development and defer publishing and package management. But teams doing serious data science have to start somewhere.

We’ll be posting additional commentary and case studies on equipping data science teams to work from home in the coming weeks. In the meantime, we recommend a recent post about how Appsilon has used Connect to create a remote work-friendly culture.

For More Information

If you’d like to learn more about how to better equip your data science team to work from home, we recommend:
