Mark Sellors, Head of Data Engineering
If you use RStudio Connect to publish your Shiny app (and even if you don’t) take care with how your arrange your projects. If you have a single project that includes both your data prep and your Shiny app, packrat (which RSConnect uses to resolve package dependencies for your project) will assume the packages you used for both parts are required on the RSConnect server and will try to install them all.
This means that if your Shiny app uses three packages and your data prep uses six, packrat and RSconnect will attempt to install all nine on the server. This can be time consuming as packages are often built from source in Connect-based environments, so this will increase the deployment time considerably. Furthermore, some packages may require your server admin to resolve system-level package dependency issues, which may even be for packages that your app doesn’t use while it’s running.
Keeping data prep and your app within a single project can also confuse people who come on to your project as collaborators later in the development process, since the scope of the project will be less clear. Plus, documenting the pieces separately also helps to improve clarity.
Lastly, separating the two will make your life easier if you ever get to the stage where you want to start automating parts of your workflow as the data prep stage will already be separate from the rest of the project.
Clear separation of individual projects (and by extension, source code repositories) may cause some short term pain, but the long term benefits are hard to understate:
- Smoother and faster RStudio Connect deployments
- Easier collaboration
- More straightforward automation (easier to build out into a pipeline)
- Simpler to document – one set for the app, another for your data prep
Of course, if your Shiny app actually does data prep as part of the apps internal processing, then all bets are off!