I really like teaching people. My parents were both teachers, and I very seriously considered following in their footsteps. While I ultimately didn’t pursue a career in teaching, I’ve been fortunate to fill that void through teaching both formally and informally as a part of my career in tech.
I’ve also written a lot of tutorials. I’d like to think that over the years I’ve learned a thing or two about making content that is easy to follow. In this blog post, I will share my bag of tips and tricks in the hopes that they may help other tutorial makers and their audiences. Please let me know if you have any additions to this list!
Note that I’ve tried to make this blog generalized to any tech, but it does contain a lot of specific tips for data science and R tutorials.
Help your reader with dependencies
Tutorials very often rely on the participant having accounts created, software installed or data downloaded. While some of these things may seem obvious to the tutorial creator, please remember that this may be your readers first interaction with the particular product or package. It’s not always obvious to see how the parts work together or how to get started on the pre-reqs.
Below is an example from my R package installation tutorial. It shows how a simple package install may end up being difficult for the end reader. For example; if we want to install a R package that is only available on GitHub, we first need to install and load the devtools package. We then use that package to install the desired package from GitHub. This isn’t especially hard, but your audience may not be aware of this technique. Providing the code to install every package from scratch can be kindness to new users.
Install devtools, a requirement for installing packages from GitHub
Install the package of choice from GitHub
install_github("hadley/emo") # OR # devtools::install_github("hadley/emo")
File or data dependencies
Sometimes you will want to make an example file available for your participant to work with. Specifically in data tutorials, we often share a data file. There are a few things that we can do to make our assets more readily available.
To start, you can provide your readers with the code to perform a direct data download. Where possible, also try to host your own data file for the participant to download. You don’t want to point to someone else’s data file and then have it become unavailable or change over time. Previously I wrote a tutorial on screen scraping and then performed additional analysis on the gathered data. I should have realized that the scraped page structure would change and that my tutorial would break. It’s always a good idea to provide your users with a static data file to protect against these issues.
If you are looking for easy ways to host your data files, you can upload data to your GitHub repository. With most languages, you are able to directly download from GitHub by pointing to the URL for the “raw” file, as below.
In R, the commands to download directly from a URL are below.
#install.packages("readr") library(readr) df= read_csv('https://raw.githubusercontent.com/lgellis/STEM/master/DATA-ART-1/Data/FinalData.csv', col_names = TRUE)
Create a package! When I asked my Twitter data friends, how they handle reproducibility, David Meza suggested an excellent best practice. He creates a package that loads all dependencies and includes the data required. Talk about a seamless user experience! I haven’t gone this far with my tutorials yet, but it is definitely in my future. If you are using R and need help figuring out how to create a package, please check out Corinne Leopold’s blog post on package creation.
make your code more accessible
Access to your code
There really is nothing worse than a tutorial with only screenshots of the code! If you are going to show your participants code, please make sure that it can be copy and pasted at a minimum. Better yet, host the code in GitHub! Even better would be to host it in GitHub with a markdown file to display your code and output in-line. This was a suggestion that Alison Hill had made in my thread about reproducibility. Since getting it set up, I’m hooked!
To get a .md file displayable in GitHub, simply create a R markdown file and include the following code in the header. Create your R notebook as you would normally, and then knit to md_document. Upload the markdown document and all other created folders to GitHub.
--- title: "Replace with Title" author: Replace with Name date: Replace with Date output: md_document: variant: markdown_github ---
It took me a while to understand the impact on comprehension that proper code style can have. I’ve found that using a code styler can make it much easier for your participants to digest the parts of code. Have a look at how much easier it is to understand the parts of the function call when the code is spaced out with a styling package.
To implement this example you can use the styler package
After the package is installed, you can simply highlight the text and use CMD+SHIFT+A to style the code.
Where possible, try to only include code that is necessary to understand the subject of the blog. For data science blogs in particular, we often have a lot of code to format the data for the given subject area. It can feel tempting to include all of the data transformation code. I would warn you not to do this. Including code that is not core to the subject area makes it harder for the reader to navigate to the code which is relevant to the tutorial subject.
Include screenshots for UI based activities
If your blog does depend on some portion of UI work (such as account or API key creation), please include some screenshots for your users. I understand that including screenshots in our tutorials are hard because the user interface can change. Minimally, even if your screenshots are out of date your users will be able to confirm that the button or other UI object you reference has been moved!
Supply environment information
Include your session info. It will help your users understand the differences between your environments. In R, the following command will print version information about R, the OS and attached or loaded packages.
If you are writing a tutorial, chances are that you have learned some of the material from other people or places. Please make sure to pass along the credit to all of those whose material you have built upon or who have inspired you!
Thank you for reading through my blog on creating more reproducible technical content. Please post below or tweet at me if you have opinions on this content, or would like to add to it!