The Team Data Science Process

[This article was first published on Revolutions, and kindly contributed to R-bloggers].

As more and more organizations set up teams of data scientists to make sense of the massive amounts of data they collect, the need grows for a standardized process for managing the work of those teams. To help with this, the data science team at Microsoft has drawn on its experience with large-scale data science projects to develop the Team Data Science Process. The process is built around this data science lifecycle:

[Figure: the data science lifecycle]

The Team Data Science Process proposes a standardized directory structure for managing the data, code and documents of a data science project, and provides for tracking those artifacts with a version control system such as Git. It also proposes a shared, distributed analytics infrastructure to provide the computational and storage resources that data scientists' tools rely on, and provides two open-source utilities to support data scientists.
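
As a rough illustration of the directory-structure idea described above, here is a minimal Python sketch that scaffolds a project with one folder per artifact type. The folder names (`data`, `code`, `docs`), the `scaffold_project` function and the `.gitkeep` placeholder are assumptions made for this example; they simply mirror the "data, code and documents" wording and are not the layout published in the actual Team Data Science Process templates.

```python
from pathlib import Path
import tempfile

# Hypothetical folder names, taken from the "data, code and documents" wording;
# the actual Team Data Science Process template may use different names.
PROJECT_DIRS = ("data", "code", "docs")


def scaffold_project(root: Path) -> list[Path]:
    """Create one subdirectory per artifact type and return the created paths."""
    created = []
    for name in PROJECT_DIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        # Git does not track empty directories, so add a conventional
        # placeholder file to keep each folder under version control.
        (path / ".gitkeep").touch()
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold_project(Path(tmp) / "my-project"):
            print(path.name)
```

Running `git init` in the project root and committing the result would then put the skeleton under version control, so every team member starts from the same layout.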

You can find more background on the Team Data Science Process in this blog post, and you can also watch this presentation by the developers of the process at the Data Science Summit.

You can download the various artifacts of the Team Data Science Process (and even suggest improvements via a pull request) at the GitHub repository linked below.

GitHub (Azure): Team Data Science Process from Microsoft
