Introduction to LabKey and R Integration
How and What to deliver are the two main themes of my search for an effective way of developing data products. For the former, solid web technologies such as HTML, CSS and JavaScript are important. However, as I’m not interested in a CRUD-only application, how well R integrates with an application can matter even more. Shiny may be a good choice, but not in its open source edition. Even the enterprise edition or shinyapps might not be suitable: the per-server cost model of the former could be too restrictive in an AWS environment, and I’m not sure how to handle processes unrelated to R in the latter.
If Shiny is considered inappropriate, another R-specific option is OpenCPU. In spite of the interesting examples on the project website, I’m not sure it keeps up with recent development trends, especially in reactivity. The remaining option is typical web development, where the How part can take much more time and effort than the What part. In some cases this is a good way to go but, if a project doesn’t involve a team of web developers, it can be close to mission impossible in practice.
What is LabKey server?
On this journey, I happened to come across an oasis: LabKey Server. Wikipedia introduces it as follows.
LabKey Server is free, open source software available for scientists to integrate, analyze, and share biomedical research data. The platform provides a secure data repository that allows web-based querying, reporting, and collaborating across a range of data sources. Specific scientific applications and workflows can be added on top of the basic platform and leverage a data processing pipeline.
Although it mainly targets biomedical research, I think it can be applied to other fields without much trouble. Some of its key features are listed below.
LabKey Server supports/includes
- popular RDBMS (MS SQL, MySQL, PostgreSQL, Oracle) and SAS Data as well as Excel, text and AWS S3
- built-in web parts for UI development
  - data grid and charts can be useful for quick analysis – see this example
- scripting engines including R, Java, Perl as well as SQL queries
- tight integration with R outputs (charts …) as reports and even R Markdown documents
- custom web page/application development via JavaScript API
  - Note it may incur an additional one-off investment in Ext.js to fully utilize the API
- pipeline server that can handle heavy/long computing/processing
- modules to package certain functionality such as Workflow or analysis
- good set of authentication options
- useful extra features
  - Message Board
  - Wiki
  - Issue Tracking
Thanks to these features, I think LabKey Server can be used as an internal collaboration tool as well as a framework for delivering products to external clients, with more focus on the What part. In the rest of this post, some basic features of LabKey Server are introduced by creating a project and generating a report that consists of a data grid, a built-in chart and an R report.
Quick example
At the time of writing, the latest stable version of LabKey Server is 16.2. I installed it on my Windows laptop after downloading the Windows Graphical Installer (LabKey16.2-45209.14-community-Setup.exe) from the product page. During installation it sets up an SMTP connection and adds the following components to C:\Program Files (x86)\LabKey Server: Java Runtime Environment 1.8.0_92, Apache Tomcat 7.0.69 and PostgreSQL 9.5. Note that installation may fail due to a port conflict if Tomcat or PostgreSQL is already installed. In that case, it is necessary to uninstall the existing versions or use the manual installation option.
After installation, the server can be accessed at http://localhost:8080 (8080 is the default port of the Tomcat server). A user account has to be set up first, and then the server is ready to use.
Project setup
I set up a Collaboration project named LabkeyIntro through the following steps.
The start page shows Wiki and Messages web parts in the main panel and a Pages web part in the right side bar. It is also possible to add another web part by selecting one and clicking a button.
Add a list
I removed the existing web parts and added a Lists web part in the side bar. Clicking the MANAGE LISTS link makes it possible to add a new list. Note that a list is the simplest data structure: it is tabular and has a primary key but doesn’t require participant ids or time/visit information. Check this to see other data structures.
I imported the simulated customer data set from R for Marketing Research and Analytics, which can be downloaded from here.
The imported data is shown in a grid view by clicking the name of the list. By default a grid view provides the following features, which are quite useful for investigating and managing data. A customized view can also be shown as a report.
- sort/filter in Customize Grid
- insert or delete a row
- export to Excel, text or script
- print and paging
- add or import fields in Design
Built-in chart
In addition to the above features, it provides three built-in chart types: box plot, scatter plot and time series plot. I made a scatter plot of age versus credit score, grouped by a boolean variable for email status.
R report
What’s more interesting and useful is its integration with R. While a more sophisticated report can be generated with R Markdown, I just added a scatter plot matrix as an R report in this trial. The R script engine is not enabled by default, so it is necessary to turn it on as follows.
Note that I only changed the program path and pandoc/rmarkdown is not set up. For further details, see this article.
Also note that packages are loaded from the library under the R installation (e.g. C:\Program Files\R\R-3.3.1\library), so a package that was not installed by an administrator is not loaded. For example, I needed the car package but, as it was installed in my user account’s library (i.e. C:\Users\jaehyeon\Documents\R\win-library\3.3), it was not loaded. To resolve this, I opened the R terminal (R.exe) as administrator and installed the package as follows.
install.packages("car", lib="C:\\Program Files\\R\\R-3.3.1\\library", dependencies = TRUE)
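Before installing, it can help to confirm where a package currently lives. The following is a minimal sketch and not part of the original setup: `installed_in_system_library()` is a hypothetical helper name, and it assumes the R engine used by LabKey only sees the system library that base R reports in `.Library`.

```r
# Hypothetical helper: TRUE if `pkg` is installed in R's system library,
# i.e. the library folder under the R installation itself.
installed_in_system_library <- function(pkg) {
  system_pkgs <- rownames(installed.packages(lib.loc = .Library))
  pkg %in% system_pkgs
}

# All library folders visible to the current session (system and per-user)
print(.libPaths())

# Base packages such as stats live in the system library
print(installed_in_system_library("stats"))

# If this prints FALSE, car is not in the system library
print(installed_in_system_library("car"))
```

If the last call prints FALSE while `library(car)` still works in a normal session, the package is probably sitting in a per-user library, which matches the situation described above.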
Once it is ready, it is relatively straightforward to add an R output as a report – see the screen shots below.
Note that a graphics device (png()) is explicitly set up.
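As a rough sketch of what such a report script can look like: the example below runs outside LabKey, so it simulates a small data frame instead of using the imported list, and it uses base R’s `pairs()` rather than the car package. The column names, the simulated values and the output file are all assumptions for illustration. Inside a LabKey R report, the data is normally exposed as `labkey.data` and the file name is given through an output substitution such as `${imgout:...}`; check the LabKey documentation for the exact form.

```r
# Simulated stand-in for the imported customer list (values are made up)
set.seed(1)
customers <- data.frame(
  age          = round(rnorm(100, mean = 35, sd = 5)),
  credit_score = round(rnorm(100, mean = 720, sd = 50)),
  online_spend = round(rexp(100, rate = 1 / 200), 2)
)

# Open the graphics device explicitly: a server-side script has no screen
# to draw on, so the plot has to be written to a file
out_file <- tempfile(fileext = ".png")
png(filename = out_file, width = 800, height = 800)

# Scatter plot matrix of all columns
pairs(customers, main = "Simulated customer data")

# Close the device so the image is flushed to disk
dev.off()

print(file.exists(out_file))
```

Forgetting `dev.off()` is a common reason for an empty or missing image, so the open and close calls are best kept as a pair around the plotting code.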
Organize project page
Now there are three sections to deliver – data, built-in chart and R report. In order to organize them, I created 3 tabs: Example Data, Built-in Chart and R Intro as shown below.
A List – Single web part is added to Example Data while Report web parts are included in the remaining two tabs.
This is everything I did within several hours. Although only a few basic features are implemented, I think it provides a good amount of information for internal collaboration.
It’s early days, but do you think LabKey Server could serve as an effective tool for the How part? Please share your thoughts.