Is Your Company Giving You What You Need to Build Great R Shiny Apps?


Is your R Shiny app slow, but you doubt Shiny itself is the bottleneck? This is a likely scenario, especially if you’re given a slow database with no indexes or normalization. Data retrieval and database communication in general become an issue, and it has nothing to do with R Shiny.

Today you’ll learn how to build great R Shiny apps by solving the most common slow database problem – indexes and normalization. We’ll first provision a free Postgres database on AWS, import some data, and then work extensively on this issue. There’s even a section on preparing your data in advance, so the resulting Shiny app doesn’t have to implement complex SQL statements. Let’s get started!

Looking to build a production-ready Shiny app? Try Rhino – An R package from Appsilon.

Database Setup, Connection, and Data Import

Setting up databases is now easier than ever, and most cloud providers will allow you to use one database instance free of charge. AWS is one such cloud provider, and we’ll use it to run a free-tier PostgreSQL database.

Concerned with your data quality? Automate data quality reports with the R package: data.validator.

It’s not an entirely straightforward process, but we’ll walk you through each step. The only prerequisite is that you have an AWS account registered.

AWS Security Group for Database Access

Before even provisioning a database, you have to create a new Security Group. Put simply, this will allow you to tag the database, so specific inbound/outbound rules apply to it. You want to allow all traffic from anywhere, just as shown in the image below:

Image 1 – AWS security group that allows all traffic from anywhere

As you can see, we named our security group r-postgres, and this name is a vital piece of information to remember for the following step.

You could be more restrictive if this is a production application, but this simple rule will serve us for now. Up next, you can start provisioning the database.

Provisioning a Free-Tier PostgreSQL Database

Under RDS, provision a new database and make sure to select PostgreSQL as a database vendor, and Free tier under templates:

Image 2 – Provisioning a new PostgreSQL database (1)

If you scroll down a bit, you’ll see two essential things to configure. The first one is Public access. You must select “Yes” here, otherwise the database won’t get a public IP address you can use for the connection. Also, make sure to select r-postgres as the VPC security group:

Image 3 – Provisioning a new PostgreSQL database (2)

You’ll also have to specify a username/password combination for the database instance, so make sure to go with something you’ll remember.

That’s everything you need to change, so scroll to the bottom and click on the big orange button to create the database instance. You’ll see a screen like this one after a couple of minutes:

Image 4 – Provisioning a new PostgreSQL database (3)

Note the Endpoint and Port values – these are essential for establishing connections either from R or from GUI tools. Let’s see how next.

Database Connection from a GUI Tool

We’re using a free version of TablePlus as a GUI tool for database communication and SQL execution. You are free to use any other tool, of course.

In TablePlus, create a new connection to PostgreSQL and paste your specific Host/Port/User/Password values – you know these from the previous section:

Image 5 – TablePlus PostgreSQL connection

The fields will light up green when you click on “Test” if the connection succeeds. If that’s the case, click on “Connect” and you’re ready to go. If the fields light up red, you entered some of the values incorrectly, or the database instance hasn’t finished provisioning yet. Make sure to double-check.

Up next, let’s see how to establish a database connection from R.

Database Connection from R

R needs the DBI package to establish a connection with various databases, along with a database-specific driver – RPostgres for PostgreSQL. Install both by running install.packages(c("DBI", "RPostgres")) and you’ll be good to go.

Regarding the connection itself, refer to the snippet below. It establishes a connection to a remote AWS RDS Database, runs a simple query, prints the result, and disconnects from the database.

Of course, don’t forget to fill out the connection details first:

library(DBI)

conn <- dbConnect(
  RPostgres::Postgres(),
  dbname = "postgres",
  host = "",
  port = "",
  user = "",
  password = ""
)

q <- dbGetQuery(
  conn = conn,
  statement = "SELECT 1 AS myid"
)

print(q)

dbDisconnect(conn = conn)

If everything went well, you’ll see the following printed to the R console:

Image 6 – PostgreSQL connection through R

We now have the connection established both through R and a GUI tool, so there’s only one more thing needed before exploring how to build great R Shiny apps – the data itself. Let’s take care of it next.
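Before we do, one optional tip that isn’t part of the original setup steps: rather than typing credentials into every script, you can keep them in environment variables (for example in an .Renviron file) and read them with Sys.getenv(). The variable names below are our own choice – a minimal sketch:

library(DBI)

# Hypothetical variable names - set them in ~/.Renviron or your shell
conn <- dbConnect(
  RPostgres::Postgres(),
  dbname   = Sys.getenv("PG_DBNAME", "postgres"),
  host     = Sys.getenv("PG_HOST"),
  port     = Sys.getenv("PG_PORT", "5432"),
  user     = Sys.getenv("PG_USER"),
  password = Sys.getenv("PG_PASSWORD")
)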

Data Creation and Import to PostgreSQL Database

Most real-world applications summarize huge volumes of data and display summary statistics through text, tables, and charts. We’re somewhat limited in data volume on a free-tier database, but we can still create and transfer reasonably large datasets.

We’ll mix things up technology-wise by using Python to create the data. If you don’t have Python installed, or don’t want to use it to create the dataset, simply download it from our servers.

The following Python script will create a 1M row dataset containing dummy, random employee data, parse it with pandas and save it locally in CSV file format:

import numpy as np
import pandas as pd
import names  # third-party package, install with: pip install names


department_options = ["Sales", "Marketing", "IT", "Management", "Accounting"]

def get_salary(department: str) -> float:
    mappings = {
        "Sales": 1.0,
        "Marketing": 1.25,
        "IT": 2,
        "Management": 3,
        "Accounting": 1.5
    }
    
    base_sal = np.random.randint(low=3000, high=5000)
    
    try:
        return base_sal * mappings[department]
    except Exception as e:
        return base_sal
      
      
if __name__ == "__main__":
    _ids = np.arange(1, 1000001)
    _first_names = [names.get_first_name() for name in range(len(_ids))]
    _last_names = [names.get_last_name() for name in range(len(_ids))]
    _departments = [np.random.choice(department_options) for dept in range(len(_ids))]
    _salaries = [get_salary(dept) for dept in _departments]
    _years_experience = [int(sal // 1000) for sal in _salaries]
    
    df = pd.DataFrame({
        "id": _ids,
        "first_name": _first_names,
        "last_name": _last_names,
        "department": _departments,
        "salary": _salaries,
        "years_experience": _years_experience
    })
    df.to_csv("/path/to/Desktop/employee_data.csv", index=False)

Here’s what the first couple of rows look like:

Image 7 – Head of the employees dataset

The next step is to create a database table and import the dataset. Run the following SQL code for table creation:

CREATE TABLE employees(
	emp_id INTEGER NOT NULL,
	emp_first_name VARCHAR(64),
	emp_last_name VARCHAR(64),
	emp_department VARCHAR(32),
	emp_salary REAL,
	emp_years_experience INTEGER
);

And now right-click on the table from the left menu, and click on Import. Provide a path to the CSV file, and map the table columns to the columns found in the CSV file:

Image 8 – CSV to database table column mappings

Once done, click on “Import” – it will take a couple of minutes:

Image 9 – Importing local data to a PostgreSQL database
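As a side note, if you’d rather stay in R instead of using the GUI importer, DBI can perform the same load with dbWriteTable() – a minimal sketch, assuming the employees table created above, the connection object from earlier, and the CSV path used in the Python script:

library(DBI)

# Read the generated CSV and rename the columns to match the employees table
emp <- read.csv("/path/to/Desktop/employee_data.csv")
names(emp) <- c("emp_id", "emp_first_name", "emp_last_name",
                "emp_department", "emp_salary", "emp_years_experience")

# Append the rows to the existing table (conn is the connection created earlier)
dbWriteTable(conn, name = "employees", value = emp, append = TRUE)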

And that’s it – the dataset is now loaded into the table and we can start optimizing it for better R Shiny performance.

Database Indexes – Build Great R Shiny Apps By Speeding up Your Queries up to 5000 Times

You can think of SQL table indexes like the index pages in a book. Key topics are listed in one place, together with their locations, so you can jump straight to what you need instead of flipping through every page.

The same idea applies to database table indexes. If a table is indexed properly, the query planner can alter the execution plan to retrieve the data far more quickly. This is one of the most reliable ways to build great R Shiny apps and improve overall app performance.

But what about the downsides? SQL indexes aren’t free: they take up additional storage space in your database and slow down insert/update operations, since the index has to be maintained on every write. Keep these trade-offs in mind.
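If you want to put a number on the storage cost, PostgreSQL can report the size of any index through pg_relation_size(). Here's a small sketch you can run from R once an index exists – we use the index_employees_last_name index created later in this section:

# Check how much disk space a single index consumes
dbGetQuery(
  conn,
  "SELECT pg_size_pretty(pg_relation_size('index_employees_last_name')) AS index_size"
)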

There are many types of SQL table indexes, and we’ll explore four of them in this section. But first, let’s duplicate our employees table so we can easily compare performances between queries on indexed and non-indexed tables:

CREATE TABLE employees_indexed AS SELECT * FROM employees;

Once done, proceed to the unique index section.

Unique Index

This type of index guarantees that the indexed column contains no duplicate values, and it’s typically associated with primary keys. Here’s how you can add one to our ID column:

CREATE UNIQUE INDEX
ON employees_indexed(emp_id);

PostgreSQL allows us to compare performance and execution plans by putting the EXPLAIN ANALYZE keywords in front of a SELECT statement. We’ll use it to compare performances when fetching an employee with a specific emp_id value.

First, for the non-indexed table:

EXPLAIN ANALYZE SELECT * FROM employees WHERE emp_id = 578652;
Image 10 – Unique index (1)

In absolute terms, 181 ms is next to nothing, but gathering data from dozens of SQL queries quickly adds up.

Let’s run the same on the indexed table and compare the relative scores:

EXPLAIN ANALYZE SELECT * FROM employees_indexed WHERE emp_id = 578652;
Image 11 – Unique index (2)

Put simply, the indexed version is thousands of times faster, which is an amazing performance optimization considering it only took us one line of SQL code to create the index. This is because the non-indexed implementation requires a full-table scan to find the record, while the indexed implementation uses a much faster index scan.
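By the way, you don’t need a GUI tool to inspect these execution plans – the same EXPLAIN ANALYZE statement can be issued through DBI, which returns the plan as rows of text. A minimal sketch, reusing the connection from earlier:

# Fetch the execution plan as plain-text rows directly from R
plan <- dbGetQuery(
  conn,
  "EXPLAIN ANALYZE SELECT * FROM employees_indexed WHERE emp_id = 578652"
)
print(plan)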

Let’s go over non-unique indexes next.

Non-Unique Index

This type of index is ideal when the data in a given column isn’t unique but is often used in the WHERE clause. It’s the default index type in PostgreSQL, so to create one, just drop the UNIQUE keyword.

Let’s create one on the last name column:

CREATE INDEX index_employees_last_name
ON employees_indexed(emp_last_name);

Let’s now fetch some records and put the emp_last_name as a filter condition in the WHERE clause:

EXPLAIN ANALYZE SELECT emp_last_name, emp_department, emp_salary FROM employees WHERE emp_last_name = 'Anderson';
Image 12 – Non-unique index (1)

With the index in place, PostgreSQL no longer needs to scan the entire table – the execution plan changes to use the index, and the results come back much faster:

EXPLAIN ANALYZE SELECT emp_last_name, emp_department, emp_salary FROM employees_indexed WHERE emp_last_name = 'Anderson';
Image 13 – Non-unique index (2)

It’s not thousands of times faster this time, but the difference is still significant.

Multi-Column Index

No one is stopping you from adding a single index that spans multiple table columns. This is standard practice when several columns are regularly used together to filter records in a WHERE clause. If your queries combine multiple filter conditions, consider a multi-column index on those columns.

Let’s create one by combining emp_last_name and emp_department:

CREATE INDEX index_employees_last_name_dept
ON employees_indexed(emp_last_name, emp_department);

And now let’s fetch some records by filtering down on these two conditions – last name and department:

EXPLAIN ANALYZE SELECT * FROM employees WHERE emp_last_name = 'Anderson' AND emp_department = 'Management';
Image 14 – Multi-column index (1)

Not too bad at 232 ms, but the indexed implementation takes this to a whole new level:

EXPLAIN ANALYZE SELECT * FROM employees_indexed WHERE emp_last_name = 'Anderson' AND emp_department = 'Management';
Image 15 – Multi-column index (2)

Under a single millisecond, that’s impressive! If you think these results are great, wait until you see what a sorted index can do.

Sorted Index

By default, an index is sorted in ascending order. But what if you’re typically looking for the last records, like employee names that start with a “Z”? Well, adding a sorted index with descending sorting is the way to go.

Let’s add one on the emp_first_name column:

CREATE INDEX index_employees_first_name_desc
ON employees_indexed(emp_first_name DESC NULLS LAST);

Now we’ll run a query that keeps only the employees named “Zack”:

EXPLAIN ANALYZE SELECT * FROM employees WHERE emp_first_name = 'Zack';
Image 16 – Sorted index (1)

Not too bad considering we have 1M records in the table. But the indexed implementation makes this 175 ms look like a joke:

EXPLAIN ANALYZE SELECT * FROM employees_indexed WHERE emp_first_name = 'Zack';
Image 17 – Sorted index (2)

These are the four essential types of SQL indexes you must know. Next, let’s see how they help you build great R Shiny apps.
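Before we do, one quick check that’s handy in practice (our own addition): PostgreSQL lists every index defined on a table in the pg_indexes system view, so you can confirm what you’ve created so far:

# List all indexes currently defined on the employees_indexed table
dbGetQuery(
  conn,
  "SELECT indexname, indexdef
   FROM pg_indexes
   WHERE tablename = 'employees_indexed'"
)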

Leveraging Indexed Data in R Shiny

It goes without saying, but indexes are made on SQL tables and have nothing to do with R and Shiny. This is the whole idea behind fixing the data problems at the source, and not blaming the dashboard framework.

The dashboard you’re about to see lets the user select a minimum and maximum employee ID and refresh the table data with a press of an action button. We’re using the DT package for table visualization, but there are other options:

Need to render a table in R Shiny? Consider these top R packages for visualizing table data.

The Shiny app establishes a database connection when it starts and closes it when the session ends. This way, you don’t have to worry about accidentally leaving the connection open.
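If you want a small safety net on top of that (a hedged addition of our own, not part of the original app), DBI’s dbIsValid() lets you verify the connection is still alive and reconnect if it has dropped:

# Optional guard: reconnect if the global connection is no longer valid
if (!dbIsValid(conn)) {
  conn <- dbConnect(
    RPostgres::Postgres(),
    dbname = "postgres",
    host = "",
    port = "",
    user = "",
    password = ""
  )
}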

Anyhow, here’s the code:

library(DBI)
library(DT)
library(shiny)

conn <- dbConnect(
  RPostgres::Postgres(),
  dbname = "postgres",
  host = "",
  port = "",
  user = "",
  password = ""
)


ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      tags$h3("Shiny PostgreSQL"),
      tags$br(),
      numericInput(inputId = "minId", label = "Min ID:", value = 100000, min = 1, max = 1000000, step = 1),
      numericInput(inputId = "maxId", label = "Max ID:", value = 110000, min = 1, max = 1000000, step = 1),
      actionButton(inputId = "btn", label = "Fetch")
    ),
    mainPanel(
      textOutput(outputId = "outDuration"),
      DTOutput(outputId = "outTable")
    )
  )
)


server <- function(input, output, session) {
  v <- reactiveValues(
    minId = NULL,
    maxId = NULL,
    tableName = NULL,
    timeStart = NULL,
    timeEnd = NULL
  )
  observeEvent(input$btn, {
    v$minId <- input$minId
    v$maxId <- input$maxId
  })
  
  data <- reactive({
    v$timeStart <- Sys.time()
    
    if (is.null(v$minId) || is.null(v$maxId)) return()
    q <- dbGetQuery(
      conn = conn,
      statement = paste(
        "SELECT * FROM employees_indexed WHERE emp_id BETWEEN", v$minId, "AND", v$maxId
      )
    )
    
    v$timeEnd <- Sys.time()
    
    data.frame(q)
  })
  
  output$outDuration <- renderText({
    if (is.null(v$timeStart) || is.null(v$timeEnd)) return()
    duration <- as.numeric(difftime(v$timeEnd, v$timeStart, units = "secs"))
    paste("Duration:", duration, "seconds.")
  })
  
  output$outTable <- renderDT({
    datatable(
      data = data(),
      filter = "top"
    )
  })

  
  session$onSessionEnded(function() {
    dbDisconnect(conn = conn)
    print("Database connection closed.")
  })
}


shinyApp(ui = ui, server = server)

Let’s run the app to see what we’re dealing with:

Image 18 – R Shiny application with indexed SQL table data

Overall, the app is fast, which nicely demonstrates that R Shiny itself is a fast framework for building apps and dashboards. What slows it down is usually an unoptimized database or data source, so make sure to keep yours in order.
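One more hedged suggestion about the app above: the query is assembled with paste(), which splices user input straight into the SQL string. DBI and RPostgres support parameterized queries with $1-style placeholders, so the fetch inside the data reactive could instead look like this sketch:

# Same query as in the data reactive, but with bound parameters
q <- dbGetQuery(
  conn = conn,
  statement = "SELECT * FROM employees_indexed WHERE emp_id BETWEEN $1 AND $2",
  params = list(v$minId, v$maxId)
)

This keeps the SQL readable and avoids any surprises if the inputs ever come from a free-text field.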

Database Normalization – From Theory to Implementation

Database normalization is a technique used when designing databases to reduce data redundancy, split large (flat) tables into smaller ones, and link them through relationships. Normalization is usually described in terms of Normal Forms (NF). They range from 1 to 6, but in practice, normalization is typically taken up to the Third Normal Form (3NF).

The third normal form assumes the data is already in the second normal form – you have to work through the normal forms in order. Here’s what that means in a practical sense:

  • First Normal Form (1NF)
    • Each column must contain only a single value
    • Each record must be unique
  • Second Normal Form (2NF)
    • Must be in 1NF
    • No non-key column may depend on only part of a candidate key (no partial dependencies)
  • Third Normal Form (3NF)
    • Must be in 2NF
    • The table must not have transitive functional dependencies

In short, each table should represent a single entity (sale, customer, employee), with the tables linked through foreign keys. In the denormalized sales dataset we create below, for instance, branch_manager depends on office_branch rather than on the individual sale record – exactly the kind of transitive dependency 3NF removes. This isn’t an in-depth course on database normalization, so we’ll stop here. If you want to learn more, this article from guru99.com might help.

We’ll now focus on creating and storing a denormalized data representation to see in practical terms what’s wrong with it.

Create and Load Denormalized Sales Data

In this section, we’ll once again use Python to create a denormalized dataset. It will contain details about each employee, their salary, their office branch, and every sale they made. You can download the dataset from the Appsilon server if you don’t want to create it manually.

Here’s the dataset creation code:

import random
import numpy as np
import pandas as pd
from datetime import datetime, timedelta


def get_branch_details(branch_name: str) -> dict:
    mappings = {
        "New York": {"branch_manager": "John Johnson", "n_employees": 32},
        "San Francisco": {"branch_manager": "Jane Janeson", "n_employees": 26}
    }
    return mappings[branch_name]


def generate_sale() -> dict:
    customers = ["Company A", "Company B", "Company C"]
    quantities = np.arange(1, 101)
    unit_prices = [5, 10, 15, 20, 25]
    
    def get_random_date(min_date: datetime = datetime.now() - timedelta(days=365), max_date: datetime = datetime.now()) -> datetime:
        delta = max_date - min_date
        int_delta = (delta.days * 24 * 60 * 60) + delta.seconds
        random_second = random.randrange(int_delta)
        return min_date + timedelta(seconds=random_second)
    
    c_customer = random.choice(customers)
    c_quantity = random.choice(quantities)
    c_unit_price = random.choice(unit_prices)
    c_total_price = c_quantity * c_unit_price
    c_sale_date = get_random_date()
    
    return {
        "sale_customer": c_customer,
        "sale_quantity": c_quantity,
        "sale_unit_price": c_unit_price,
        "sale_total_price": c_total_price,
        "sale_date": c_sale_date
    }


if __name__ == "__main__":
    employees = [
      {"employee_id": 1, "first_name": "Bob", "last_name": "Scott", "hire_date": "2020-01-05", "email": "[email protected]", "salary": 4000, "office_branch": "New York"},
      {"employee_id": 2, "first_name": "Mark", "last_name": "Statson", "hire_date": "2018-09-01", "email": "[email protected]", "salary": 6000, "office_branch": "San Francisco"},
      {"employee_id": 3, "first_name": "Marry", "last_name": "June", "hire_date": "2015-01-15", "email": "[email protected]", "salary": 7000, "office_branch": "New York"},
      {"employee_id": 4, "first_name": "Dakota", "last_name": "Press", "hire_date": "2022-06-01", "email": "[email protected]", "salary": 2500, "office_branch": "New York"},
      {"employee_id": 5, "first_name": "Dylan", "last_name": "Star", "hire_date": "2022-11-15", "email": "[email protected]", "salary": 2000, "office_branch": "San Francisco"}
    ]
  
    records_per_employee = {1: 100, 2: 225, 3: 550, 4: 80, 5: 65}
    
    sales_data = []
    for i, emp in enumerate(employees):
        n_records = records_per_employee[emp["employee_id"]]
        
        for record in range(n_records):
            sales_data.append({
                **emp,
                **get_branch_details(branch_name=emp["office_branch"]),
                **generate_sale()
            })
            
    sales_data_df = pd.DataFrame(sales_data)
    sales_data_df.to_csv("/path/to/Desktop/sales_data_denormalized.csv", index=False)

Here’s what a sample of 15 records looks like:

Image 19 – Sample of our denormalized dataset

This dataset is denormalized. Why? Because it’s a single table containing information on multiple entities – employee, customer, office branch, and sale.

You can load the data into the database by importing the CSV file, just like before, so we won’t go over the process again. Let’s work on its normalization next.

Database Normalization in Action

So far, this is what our denormalized table looks like:

Image 20 – Denormalized table structure

We want to normalize it by creating a table for each entity and giving each an auto-incremented ID column that the other tables can reference through foreign keys. In other words, this is the normalized structure we’re aiming for:

Image 21 – Normalized table structure

To achieve this, let’s first create the tables and connect them via the FOREIGN KEY constraints:

CREATE TABLE customers(
	cst_id SERIAL PRIMARY KEY,
	cst_name VARCHAR(64)
);

CREATE TABLE office_branches(
	obr_id SERIAL PRIMARY KEY,
	obr_name VARCHAR(64),
	obr_branch_manager VARCHAR(64),
	obr_n_employees INTEGER
);

CREATE TABLE employees(
	emp_id SERIAL PRIMARY KEY,
	emp_first_name VARCHAR(32),
	emp_last_name VARCHAR(32),
	emp_hire_date DATE,
	emp_email VARCHAR(64),
	emp_salary REAL,
	emp_obr_id INTEGER,
	CONSTRAINT fk_emp_office_branches FOREIGN KEY(emp_obr_id) REFERENCES office_branches(obr_id)
);

CREATE TABLE sales(
	sal_id SERIAL PRIMARY KEY,
	sal_emp_id INTEGER,
	sal_cst_id INTEGER,
	sal_quantity INTEGER,
	sal_unit_price REAL,
	sal_total_price REAL,
	sal_date DATE,
	CONSTRAINT fk_sal_employees FOREIGN KEY(sal_emp_id) REFERENCES employees(emp_id),
	CONSTRAINT fk_sal_customers FOREIGN KEY(sal_cst_id) REFERENCES customers(cst_id)
);

The tables are now created but are empty at the moment. The goal is for each table to contain only the unique elements, referenced from the main sales table via foreign keys.

We have to populate the simplest lookup tables first, because the tables with foreign keys expect the referenced rows to already exist at insertion time:

INSERT INTO customers(cst_name)
SELECT 
	DISTINCT(sale_customer)
FROM sales_data_denormalized
ORDER BY 1;


INSERT INTO office_branches(obr_name, obr_branch_manager, obr_n_employees)
SELECT DISTINCT
	office_branch,
	branch_manager,
	n_employees
FROM sales_data_denormalized
ORDER BY 1;


INSERT INTO employees(emp_first_name, emp_last_name, emp_hire_date, emp_email, emp_salary, emp_obr_id)
SELECT DISTINCT
	first_name,
	last_name,
	hire_date,
	email,
	salary,
	(SELECT obr_id FROM office_branches WHERE obr_name = office_branch) AS branch_id
FROM sales_data_denormalized;


INSERT INTO sales(sal_emp_id, sal_cst_id, sal_quantity, sal_unit_price, sal_total_price, sal_date)
SELECT
	(SELECT emp_id FROM employees WHERE emp_email = email) AS employee_id,
	(SELECT cst_id FROM customers WHERE cst_name = sale_customer) AS customer_id,
	sale_quantity,
	sale_unit_price,
	sale_total_price,
	sale_date
FROM sales_data_denormalized
ORDER BY 1;

This wasn’t so difficult, was it? We’ll now issue a couple of queries to verify everything is working correctly.
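Before switching to the GUI, here’s a quick sanity check of our own: the normalized sales table should contain exactly as many rows as the flat sales_data_denormalized table it was built from.

# Compare row counts between the normalized and denormalized tables
dbGetQuery(
  conn,
  "SELECT
     (SELECT COUNT(*) FROM sales) AS normalized_rows,
     (SELECT COUNT(*) FROM sales_data_denormalized) AS denormalized_rows"
)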

Querying Normalized Database Tables from a GUI

Running SQL queries on normalized tables isn’t as convenient for a developer as fetching rows from a single table. You now have to join multiple tables with foreign keys, and the entire SQL statement will be bulkier for that reason.

Nevertheless, it’s the way to go if you want to follow industry standards and conventions and optimize database performance.

The first query we’ll run will return the employee’s full name, office branch, and the total amount of sales (USD) after January 1st, 2023. We’re only outputting employees in the New York branch for easier integration with Shiny later on:

SELECT
	e.emp_first_name || ' ' || e.emp_last_name AS employee_name,
	o.obr_name AS office_branch,
	SUM(s.sal_total_price) AS amount_sold
FROM sales s
JOIN employees e ON s.sal_emp_id = e.emp_id
JOIN office_branches o ON e.emp_obr_id = o.obr_id
WHERE 
	s.sal_date >= TO_DATE('2023-01-01', 'YYYY-MM-DD')
AND o.obr_name = 'New York'
GROUP BY e.emp_first_name || ' ' || e.emp_last_name, o.obr_name
ORDER BY 3 DESC;

Here’s the output:

Image 22 – Total sales by employee

Everything looks right, so let’s proceed with the second SQL query. It will provide a summary of sales by month in the New York branch:

SELECT
	DATE_TRUNC('MONTH', s.sal_date)::DATE AS period,
	SUM(s.sal_total_price) AS total_sold
FROM sales s
JOIN employees e ON s.sal_emp_id = e.emp_id
JOIN office_branches o ON e.emp_obr_id = o.obr_id
WHERE o.obr_name = 'New York'
GROUP BY DATE_TRUNC('MONTH', s.sal_date)::DATE
ORDER BY 1;
Image 23 – Sales total over time for the New York branch

This will make for a great-looking line chart in the following section.

Querying Normalized Database Tables from R Shiny

Now it’s time to see the results of database normalization in R Shiny. Normalization is mandatory if you want to build great R Shiny apps, but it doesn’t necessarily have to come from your end. If you’re an R developer, you can just ask your database manager or database engineer whether the database tables are properly normalized.

Back to the topic now. The dashboard code below lets the user filter the records by office branch (either New York or San Francisco). As soon as the selection changes, the two queries from the previous section are run. The results of the first query are shown as a data table, and those of the second as a line chart.

Wait, what is a line chart? Read our complete guide to making and styling line charts with R and ggplot2.

The server logic mostly boils down to writing SQL statements, which is something we’ll optimize in the following section. For now, bear with us and run the following code:

library(DBI)
library(DT)
library(ggplot2)
library(shiny)

# Global database connection
conn <- dbConnect(
  RPostgres::Postgres(),
  dbname = "postgres",
  host = "",
  port = "",
  user = "",
  password = ""
)


ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      tags$h3("Shiny PostgreSQL"),
      tags$br(),
      selectInput(inputId = "selBranch", label = "Branch:", choices = c("New York", "San Francisco"), selected = "New York")
    ),
    mainPanel(
      # Table for the first query
      tags$div(
        tags$h4("2023 Statistics for a Selected Branch"),
        DTOutput(outputId = "table2023Stats"),
      ),
      # Chart for the second query
      tags$div(
        tags$h4("Sales by Month for a Selected Branch"),
        plotOutput(outputId = "plotTimeSeries")
      )
    )
  )
)


server <- function(input, output, session) {
  # Get table data
  data_stats <- reactive({
    q <- dbGetQuery(
      conn = conn,
      statement = paste0(
        "SELECT ",
        "e.emp_first_name || ' ' || e.emp_last_name AS employee_name, ",
        "o.obr_name AS office_branch, ",
        "SUM(s.sal_total_price) AS amount_sold ",
        "FROM sales s ",
        "JOIN employees e ON s.sal_emp_id = e.emp_id ",
        "JOIN office_branches o ON e.emp_obr_id = o.obr_id ",
        "WHERE s.sal_date >= TO_DATE('2023-01-01', 'YYYY-MM-DD') ",
        "AND o.obr_name = '",
        input$selBranch,
        "' GROUP BY e.emp_first_name || ' ' || e.emp_last_name, o.obr_name ",
        "ORDER BY 3 DESC"
      )
    )
    data.frame(q)
  })
  
  # Get chart data
  data_graph <- reactive({
    q <- dbGetQuery(
      conn = conn,
      statement = paste0(
        "SELECT ",
        "DATE_TRUNC('MONTH', s.sal_date)::DATE AS period, ",
        "SUM(s.sal_total_price) AS total_sold ",
        "FROM sales s ",
        "JOIN employees e ON s.sal_emp_id = e.emp_id ",
        "JOIN office_branches o ON e.emp_obr_id = o.obr_id ",
        "WHERE o.obr_name = '",
        input$selBranch,
        "' GROUP BY DATE_TRUNC('MONTH', s.sal_date)::DATE ",
        "ORDER BY 1"
      )
    )
    data.frame(q)
  })
  
  
  # Display table data
  output$table2023Stats <- renderDT({
    datatable(
      data = data_stats(),
      filter = "top"
    )
  })
  
  # Display chart data
  output$plotTimeSeries <- renderPlot({
    ggplot(data_graph(), aes(x = period, y = total_sold)) + 
      geom_line(color = "#0099f9", size = 2) +
      geom_point(color = "#0099f9", size = 5) +
      geom_label(
        aes(label = total_sold),
        nudge_x = 0.25,
        nudge_y = 0.25,
        check_overlap = TRUE
      ) + 
      theme_classic()
  })

  
  # Close connection on app exit
  session$onSessionEnded(function() {
    dbDisconnect(conn = conn)
    print("Database connection closed.")
  })
}


shinyApp(ui = ui, server = server)

Here’s the R Shiny app you should see:

Image 24 – R Shiny dashboard leveraging normalized database tables

That was easy, right? Up next, you’ll learn how to get rid of the sheer code volume produced by the SQL statements.

Preparing Data in Advance – A Must If You Want to Build Great R Shiny Apps

SQL Views will allow us to drastically reduce the amount of SQL code we have to write in R/R Shiny. They’re also a great way for data engineers to prepare data for you, especially if SQL isn’t where you shine.

But first, what are Views? You can think of views as virtual tables that aggregate data from a table or set of tables and update automatically as the data in the mentioned tables changes. For example, you can wrap the entire logic of the previous two SQL statements in a view, and then run a SELECT command on that view.

Let’s see how it works in practice.

Creating Views in PostgreSQL

Our first query summarizes sales statistics for all employees in a branch from January 1st, 2023 onwards. We’ll remove the branch filter condition from the SQL code, since we want the view to contain data for all branches, and then filter by branch in R Shiny.

You can create a view (or replace an existing one) in PostgreSQL by running CREATE OR REPLACE VIEW <view-name> AS, followed by a SELECT statement. For example, here’s the code required to create a view named v_emp_stats_2023:

CREATE OR REPLACE VIEW v_emp_stats_2023 AS (
	SELECT
		e.emp_first_name || ' ' || e.emp_last_name AS employee_name,
		o.obr_name AS office_branch,
		SUM(s.sal_total_price) AS amount_sold
	FROM sales s
	JOIN employees e ON s.sal_emp_id = e.emp_id
	JOIN office_branches o ON e.emp_obr_id = o.obr_id
	WHERE 
		s.sal_date >= TO_DATE('2023-01-01', 'YYYY-MM-DD')
	GROUP BY e.emp_first_name || ' ' || e.emp_last_name, o.obr_name
	ORDER BY 3 DESC
);

SELECT * FROM v_emp_stats_2023;

And here’s what it contains:

Image 25 – Contents of a SQL View (1)

Try creating the second view – v_total_sales_by_month – without looking at the code snippet below. It’s a nice, simple challenge to make sure you’ve grasped the concept.

Can’t figure it out or just want to verify? Here’s our code:

CREATE OR REPLACE VIEW v_total_sales_by_month AS (
	SELECT
		DATE_TRUNC('MONTH', s.sal_date)::DATE AS period,
		o.obr_name AS office_branch,
		SUM(s.sal_total_price) AS total_sold
	FROM sales s
	JOIN employees e ON s.sal_emp_id = e.emp_id
	JOIN office_branches o ON e.emp_obr_id = o.obr_id
	GROUP BY DATE_TRUNC('MONTH', s.sal_date)::DATE, o.obr_name
	ORDER BY 1, 2
);

SELECT * FROM v_total_sales_by_month;
Image 26 – Contents of a SQL View (2)

We now have the views created, so let’s make appropriate changes to the R Shiny application.

Changes in R Shiny App Structure

You only have to change bits of the server logic: select everything from each view and add a filter condition so that only records from the currently selected branch are displayed.

Feel free to copy the code, or try to implement the changes on your own – it’ll be a good SQL practice:

library(DBI)
library(DT)
library(ggplot2)
library(shiny)

# Global database connection
conn <- dbConnect(
  RPostgres::Postgres(),
  dbname = "postgres",
  host = "",
  port = "",
  user = "",
  password = ""
)


ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      tags$h3("Shiny PostgreSQL"),
      tags$br(),
      selectInput(inputId = "selBranch", label = "Branch:", choices = c("New York", "San Francisco"), selected = "New York")
    ),
    mainPanel(
      # Table for the first query
      tags$div(
        tags$h4("2023 Statistics for a Selected Branch"),
        DTOutput(outputId = "table2023Stats"),
      ),
      # Chart for the second query
      tags$div(
        tags$h4("Sales by Month for a Selected Branch"),
        plotOutput(outputId = "plotTimeSeries")
      )
    )
  )
)


server <- function(input, output, session) {
  # Get table data
  data_stats <- reactive({
    q <- dbGetQuery(
      conn = conn,
      statement = paste0(
        "SELECT * FROM v_emp_stats_2023 WHERE office_branch = '",
        input$selBranch,
        "'"
      )
    )
    data.frame(q)
  })
  
  # Get chart data
  data_graph <- reactive({
    q <- dbGetQuery(
      conn = conn,
      statement = paste0(
        "SELECT period, total_sold FROM v_total_sales_by_month WHERE office_branch = '",
        input$selBranch,
        "'"
      )
    )
    data.frame(q)
  })
  
  
  # Display table data
  output$table2023Stats <- renderDT({
    datatable(
      data = data_stats(),
      filter = "top"
    )
  })
  
  # Display chart data
  output$plotTimeSeries <- renderPlot({
    ggplot(data_graph(), aes(x = period, y = total_sold)) + 
      geom_line(color = "#0099f9", size = 2) +
      geom_point(color = "#0099f9", size = 5) +
      geom_label(
        aes(label = total_sold),
        nudge_x = 0.25,
        nudge_y = 0.25,
        check_overlap = TRUE
      ) + 
      theme_classic()
  })
  
  
  # Close connection on app exit
  session$onSessionEnded(function() {
    dbDisconnect(conn = conn)
    print("Database connection closed.")
  })
}


shinyApp(ui = ui, server = server)

The dashboard looks identical to what we had previously, so there’s no need to show it again. The code blocks are much shorter and easier to read now, though.

That’s all for the tips we have for you today. Let’s make a short recap next.


Summing Up How to Build Great R Shiny Apps

To summarize, applications and dashboards can be slow. If that’s the case, your users won’t like it, and you’re likely to lose a good number of them. Often, though, the slowness has nothing to do with the programming language, tool, or technology the dashboard is written in – it comes down to poor database performance and management.

We hope this article has shed some light on what can go wrong on the database end. It’s easy to dump everything into a single flat table and call it a day, but that approach is likely to have a huge negative impact on your app’s performance. Spend some time learning concepts such as table indexes and database normalization, or at least talk to the database expert at your company.

Have you experienced major bottlenecks in R Shiny applications? If so, were they related to the quality of the data storage solution? Please let us know how you’ve fixed the problem – @appsilon.

Want to learn more about R Shiny and make a career out of it? Here’s everything you need to go from zero to hired.
