Basic Shiny Application

November 27, 2016

(This article was first published on R – Steffen Ruefer, and kindly contributed to R-bloggers)

In this article I will describe how I built a basic web application with R and Shiny. Shiny is a web application framework for R and can be used to build and deploy interactive web applications.

Final Product Requirements

Recently I did some research on real estate prices through a property website. Going through each of the displayed properties was time-consuming and showed me many details that I did not need at that stage. I was also worried about errors in the data: for example, if I filtered my search by number of bedrooms, I might miss out on a house listed with zero bedrooms when in reality that was just a typo by the real estate agent who entered the data. I decided to write a quick Shiny application to get a better idea of the pricing data.

First, I decided what the application needs to be able to do:

  • Display a Price vs. Area Plot
  • Filter data by Price, Area and number of bedrooms
  • Color code the data points by number of bedrooms
  • Fit a basic model to the data and display it in the plot
  • Update the Plot after each input change

For the layout I chose a simple page with a sidebar on the left for the user input. The plot takes up the majority of the page space, as shown below.

web app layout

Data Sourcing

I downloaded a dataset through web scraping (there will be a tutorial about that topic later) and cleaned the data to make it easier to use in the web application. After cleaning the data, it consisted of 77 observations (rows) and 4 features (columns). The features are:

  • yearly_price: Yearly Rent in U.A.E. Dirham (AED)
  • num_checks: Number of checks per year
  • num_bedrooms: Number of bedrooms
  • area_sqft: Area in square feet

The data is saved as a CSV file.

Reading and Exploring the Data

I put the data file, called re_data.csv, into the app's main directory. The data can then be loaded and the first few rows displayed as follows:

mydat <- read.csv("re_data.csv", stringsAsFactors = FALSE)
head(mydat, 5)

This will show the below output:

  yearly_price num_checks num_bedrooms area_sqft
1       199999          1            5      3168
2       210000          2            3      3400
3       210000          2            4      2788
4       185000         NA            3      2540
5       210000          4            4      3162
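
Before building anything on top of the data, it can help to check for missing values, such as the NA in the num_checks column above. The following is only a sketch: it retypes the five rows shown above into a data frame (called sample_dat here purely for illustration) so that it runs without re_data.csv. With the real file, the same checks would be run on mydat.

```r
# Five rows retyped from the head() output above (not the full 77-row dataset)
sample_dat <- data.frame(
  yearly_price = c(199999, 210000, 210000, 185000, 210000),
  num_checks   = c(1, 2, 2, NA, 4),
  num_bedrooms = c(5, 3, 4, 3, 4),
  area_sqft    = c(3168, 3400, 2788, 2540, 3162)
)

# Count the missing values in each column
na_counts <- colSums(is.na(sample_dat))
print(na_counts)

# Minimum and maximum of each column, ignoring missing values
col_ranges <- sapply(sample_dat, range, na.rm = TRUE)
print(col_ranges)
```

The column ranges are also a quick way to spot implausible values, such as a minimum of zero bedrooms.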

The actual data loading will happen in the app code. I did some exploratory data analysis, which I will not discuss further here because the topic is how to build the web application. To give an idea of what the data looks like, the scatter plot of rental price vs. area is shown below.

scatterplot

While the plot gives some idea of the price vs. area relationship, it is not very helpful. One data point in the top-right corner appears to be an outlier or a special case, and it causes the rest of the data points to be squeezed into the bottom-left corner. With a web application, such data points can be filtered out and the data re-plotted instantly.

Building the Shiny App

To create a new Shiny app from RStudio, go to “File” > “New File” and select “Shiny Web App…”.

new shiny app step 1
new shiny app step 2

After you have selected a name and a directory for the app, RStudio opens a new file called app.R that already contains some code. It is a fully functional example application; I usually use this code as a template for my own applications. In the code snippet below I have removed the example code and left only the template structure.

# PART 1 - Load Libraries
library(shiny)

# PART 2 - Define User Interface for application
ui <- fluidPage(
   
   # Application title
   titlePanel("Application Title"),
   
   # Sidebar with user input elements
   sidebarLayout(
      sidebarPanel(

            # Input Elements here, e.g.
            
              # - sliders, checkboxes
              # - Radio buttons, text input
              # - etc.
            
      ),
      
      # Show a plot
      mainPanel(
         plotOutput("distPlot")
      )
   )
)

# PART 3 - Define server logic required to run calculations and draw plots
server <- function(input, output) {
   
   output$distPlot <- renderPlot({
      
      # calculations for the plot

      # draw the plot
      
   })
}

# PART 4 - Run the application 
shinyApp(ui = ui, server = server)

The code is divided into four parts: 1) load the required libraries, 2) define the user interface, 3) define the server logic and 4) run the app. I will discuss each part separately next.

Loading Libraries

In Part 1, I load the libraries required to build the program logic and the web application; afterwards, I load the data into R from the CSV file.

# PART 1 - Load Libraries and Data
library(dplyr)           # For data manipulation
library(ggplot2)         # For drawing plots
library(shiny)           # For running the app

# Read data from CSV file
mydat <- read.csv("re_data.csv", stringsAsFactors = FALSE)

Additional code could be run here, for example initializing data, adding features, doing data manipulation etc. For this simple application it is not required.
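
As an illustration of what such additional code might look like, here is a minimal sketch that adds a derived feature. The column name price_per_sqft is my own invention and is not used anywhere in the app; the data frame again retypes the five rows from the head() output so the snippet runs on its own.

```r
# Five rows retyped from the head() output earlier (not the full dataset)
sample_dat <- data.frame(
  yearly_price = c(199999, 210000, 210000, 185000, 210000),
  num_checks   = c(1, 2, 2, NA, 4),
  num_bedrooms = c(5, 3, 4, 3, 4),
  area_sqft    = c(3168, 3400, 2788, 2540, 3162)
)

# Hypothetical extra feature: yearly rent per square foot
sample_dat$price_per_sqft <- sample_dat$yearly_price / sample_dat$area_sqft

print(round(sample_dat$price_per_sqft, 1))
```

Because this code runs once when the app starts, and not on every input change, it is a reasonable place for one-off preparation steps like this.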

User Inputs

In Part 2 I define the user interface. It contains three sliders to select ranges of values, as well as a single checkbox and a dropdown menu.

user interface

Here is the code to create this user interface:

# PART 2 - Define User Interface
ui <- fluidPage(
   
   # Application title
   titlePanel("Real Estate Demo App"),
   
   # Sidebar with input options
   sidebarLayout(
      sidebarPanel(
            
            # Filter Input for Rental Price Range
            sliderInput("priceInput",                  # Name of input
                        "Price Range",                 # Display Label
                        min = 135000,                  # Lowest Value of Range
                        max = 450000,                  # Highest Value of Range
                        value = c(135000, 450000),     # Pre-selected values
                        pre = "AED ",                  # Unit to display
                        step = 5000),                  # Size per step change
            
            # Filter Input for Area in sqft
            sliderInput("areaInput",                   # Name of input
                        "Area Range",                  # Display Label
                        min = 2000,                    # Lowest Value of Range
                        max = 15000,                   # Highest Value of Range
                        value = c(2000, 15000),        # Pre-selected values
                        step = 100),                   # Size per step change
            
            # Filter Input for Number of Bedrooms
            sliderInput("bedsInput",                   # Name of input
                        "Bedroom Range",               # Display Label
                        min = 0,                       # Lowest Value of Range
                        max = 5,                       # Highest Value of Range
                        value = c(0, 5)),              # Pre-selected values
            
            # Select if number of beds should be color coded
            checkboxInput("bedsColorInput",            # Name of input
                          "Show Bedrooms",             # Display Label
                          value = TRUE),               # Pre-selected value
            
            # Choose Model to fit from Dropdown Menu
            selectInput("model",                       # Name of input
                        "Model Type",                  # Display Label
                        choices = c("None" = "none",   # Available choices in the dropdown
                                    "Linear" = "lm",
                                    "Smooth" = "smooth"))
      ),
      
      # Items to show in the Main Panel
      mainPanel(
            
            # Show Scatterplot
            plotOutput("scatterPlot")
      )
   )
)

The ui object is created by fluidPage(), a function that builds the page layout from the elements passed to it. Here the layout has two components: the application title panel and the sidebar layout. The sidebar layout in turn contains two components: the sidebar panel, with the user input elements (sliders, etc.), and the main panel, where the plot will be displayed.

Each input element is created in a similar way: it takes a name (the input ID), a label and parameters that set values such as the minimum and maximum of a range, the pre-selected values, etc. Notice that the input elements must be separated from one another by commas; a missing comma is probably the first thing to check if the app does not run.
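
The slider limits in the code above are hard-coded. One alternative, shown here only as a sketch, is to derive them from the data so that the sliders always cover the full range. The helper slider_range() is hypothetical (it is not part of Shiny or of the app), and the prices are the five values from the head() output earlier.

```r
# Hypothetical helper: widen the data range outwards to the nearest
# multiple of `step`, so the slider limits fall on clean values
slider_range <- function(x, step) {
  c(floor(min(x, na.rm = TRUE) / step) * step,
    ceiling(max(x, na.rm = TRUE) / step) * step)
}

# Yearly prices from the five sample rows shown earlier
prices <- c(199999, 210000, 210000, 185000, 210000)

price_range <- slider_range(prices, step = 5000)
print(price_range)
```

The result could then be passed to sliderInput() as min = price_range[1], max = price_range[2] and value = price_range, with the data loaded before the ui object is defined.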

The main panel has only one element, the plot to display. There is no code here that shows how the plot is created, because the program logic for the plot is not part of the user interface design. In this part, only a placeholder is defined with a name for the plot (in this case I simply called it “scatterPlot”). The actual plot will be defined in the server part (Part 3).

Outputs and Calculations

Part 3 contains the actual program logic, which runs in the background and uses the user input data. Every time the user makes a change, e.g. selects a new price range of interest or chooses a new model fit, the code inside renderPlot() is re-run and the plot is updated accordingly.

# PART 3 - Define server logic required to draw the plot
server <- function(input, output) {
   
   # Define the Plot UI output
   output$scatterPlot <- renderPlot({
         
         # Define my own variables
         minPrice <- input$priceInput[1]
         maxPrice <- input$priceInput[2]
         
         # Filter data based on user input
         filtered <- mydat %>%
               filter(yearly_price >= input$priceInput[1],
                      yearly_price <= input$priceInput[2],
                      area_sqft >= input$areaInput[1],
                      area_sqft <= input$areaInput[2],
                      num_bedrooms >= input$bedsInput[1],
                      num_bedrooms <= input$bedsInput[2]
                      )
         
         # XY Scatter Plot, X = Area, Y = Price
         ## Color Code the bedroom numbers
         if (input$bedsColorInput == TRUE) {
               g <- ggplot(filtered, aes(x = area_sqft, y = yearly_price, color = num_bedrooms)) +
                     geom_point(size = 5, alpha = 0.5) +
                     theme(legend.position="bottom")
         }
         
         ## without bedroom number color coding
         else {
               g <- ggplot(filtered, aes(x = area_sqft, y = yearly_price)) +
                     geom_point(size = 5, alpha = 0.5)
         }
         
         # Plot design elements: title, scale labels etc.
         g <- g + labs(
                     title = "Real Estate Data",
                     subtitle = paste0("Prices from ", formatC(minPrice, big.mark = ","), 
                                       " to ", formatC(maxPrice, big.mark = ","), " AED"),
                     caption = "Source: various real estate websites"
               ) +
               xlab("Area in sqft") + ylab("Yearly Rent in AED") +
               scale_y_continuous(labels = scales::comma) +
               scale_x_continuous(labels = scales::comma)
         
         # Display Model Fit (Line through data)
         ## Linear Model Fit
         if (input$model == "lm") {
               g <- g + geom_smooth(method = "lm")
         }
         
         ## Smooth Model Fit
         else if (input$model == "smooth") {
               g <- g + geom_smooth(method = "loess")
         }
         
         # Display the Plot
         g
   })
}

First, the output is defined with the renderPlot() function; within this function, all the magic happens. I define two variables, minPrice and maxPrice, only to save me from typing the lengthy input expressions again in the plot subtitle. Afterwards I use dplyr to filter the data according to the user inputs and assign the result to a variable called filtered. This variable is used to create the plot. With some if…else logic I create one of two different plots, depending on whether the number of bedrooms should be color coded or not.
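
The filtering step can also be tried outside of Shiny. The sketch below wraps the same range conditions in a plain function, using base R's subset() instead of dplyr so that it has no dependencies. The function name filter_listings() and its arguments are hypothetical; each argument is a two-element vector, just like the value of a range slider.

```r
# Five rows retyped from the head() output earlier (not the full dataset)
sample_dat <- data.frame(
  yearly_price = c(199999, 210000, 210000, 185000, 210000),
  num_checks   = c(1, 2, 2, NA, 4),
  num_bedrooms = c(5, 3, 4, 3, 4),
  area_sqft    = c(3168, 3400, 2788, 2540, 3162)
)

# Hypothetical helper: keep rows inside the (min, max) ranges for
# price, area and bedrooms
filter_listings <- function(dat, price, area, beds) {
  subset(dat,
         yearly_price >= price[1] & yearly_price <= price[2] &
         area_sqft    >= area[1]  & area_sqft    <= area[2] &
         num_bedrooms >= beds[1]  & num_bedrooms <= beds[2])
}

# Same idea as moving the price slider to 185,000 - 200,000 AED
cheap <- filter_listings(sample_dat,
                         price = c(185000, 200000),
                         area  = c(2000, 15000),
                         beds  = c(0, 5))
print(cheap)
```

Like dplyr's filter(), subset() drops rows for which the condition evaluates to NA, so listings with a missing value in one of the filtered columns disappear from the plot.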

Next come the cosmetics: to make the plot look better than the default, I added a title, subtitle and caption (you need at least ggplot2 version 2.2.0 for that to work). I also labeled the axes properly and changed the number format of the axes to comma-separated, since large numbers are easier to read that way.

In a final step I add the model that should be drawn through the data points. Recall from the dropdown menu input earlier that there are three options: None, Linear or Smooth. By checking the input value of the dropdown menu I add the appropriate model (or no model).
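
With more model types, the chain of if…else branches would keep growing. One possible alternative, sketched here with a hypothetical helper called smooth_method(), is to map the dropdown value to the method name with switch().

```r
# Hypothetical helper: translate the dropdown value into the method
# name that geom_smooth() expects; NULL means "draw no model"
smooth_method <- function(model) {
  switch(model,
         lm     = "lm",
         smooth = "loess",
         NULL)  # "none" or any unexpected value
}

print(smooth_method("lm"))
print(smooth_method("smooth"))
print(is.null(smooth_method("none")))
```

In the server code this could be used as method <- smooth_method(input$model), followed by adding geom_smooth(method = method) to the plot only if method is not NULL.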

Final Product Code

Finally, all that is left is to run the app with the last line of code:

# PART 4 - Run the application 
shinyApp(ui = ui, server = server)

If you followed along you can now run the app by clicking on the “Run App” Icon. The app will open in a new RStudio window and you can test its functionality within it. Alternatively, you can click on “Open in Browser” and the same app will be displayed in your default web browser.

You can find the full code on Github for download.

Using the App

Now let’s look at the functionality of the web app. When running the app, it should look like this:

app default view

Initial Data Display

In the left side panel are the sliders and the other user options, set to their defaults. I usually set the sliders so that no filtering occurs, as it is best to see the full data at first. The “Show Bedrooms” checkbox is ticked, which colors the data points by the number of bedrooms; the darker the color, the lower the number of bedrooms. A legend below the plot shows the scale for the color coding. No model has been fit to the data yet, as “Model Type” is set to “None” by default.

Take some time to play around with the sliders and the other options – you will see that the plot gets updated every time the input changes.
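
The automatic update comes from Shiny's reactive dependency tracking: code that reads an input value is re-run when that value changes. The console sketch below tries to show the idea without starting an app. It rests on a few assumptions of mine: reactiveValues() is used as a stand-in for the input object that Shiny normally passes to the server function, isolate() is used to read reactive values outside of a running app, and the data frame retypes the five rows from the head() output earlier.

```r
library(shiny)

# Five rows retyped from the head() output earlier (not the full dataset)
sample_dat <- data.frame(
  yearly_price = c(199999, 210000, 210000, 185000, 210000),
  num_bedrooms = c(5, 3, 4, 3, 4),
  area_sqft    = c(3168, 3400, 2788, 2540, 3162)
)

# Stand-in for the `input` object of a real server function
input <- reactiveValues(priceInput = c(185000, 200000))

# Reactive expression: depends on input$priceInput and is re-evaluated
# the next time it is read after that value has changed
filtered <- reactive({
  subset(sample_dat,
         yearly_price >= input$priceInput[1] &
         yearly_price <= input$priceInput[2])
})

# isolate() allows reading reactive values outside a reactive context
n_before <- isolate(nrow(filtered()))

# Simulate the user moving the upper end of the price slider
input$priceInput <- c(185000, 210000)
n_after <- isolate(nrow(filtered()))

print(c(before = n_before, after = n_after))
```

In the app itself, renderPlot() plays the role of the reactive expression: it reads several input values, so a change to any of them triggers a redraw.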

Outliers and Typos

The initial plot shows the same data as the scatter plot earlier. In the top-right corner is the possible outlier mentioned before. To see the rest of the data better, let us filter out this data point. You can do so by reducing either the maximum price or the maximum area until it is below this point. The plot is then redrawn and the rest of the data becomes easier to see.

outlier filtered

Notice that the slider price range is not the same as the plot scale. Although I only reduced the maximum price to 365,000 AED, the y-axis maximum is now at 280,000 AED. The reason is that the axes are re-scaled automatically based on the data that remains after filtering. The high-priced data point is gone, so the next highest-priced data point now determines the upper end of the axis.

Next I look for data points with wrong values. As real estate data is usually entered manually, typos or mistakes are common. This need not be a problem, but by filtering too early you might miss interesting data. While browsing real estate websites I noticed that the number of bedrooms was sometimes entered incorrectly. If I filtered by bedroom number (e.g. only looked at 3-4 bedroom villas), I might miss perfectly fine offers simply because listings with an incorrect bedroom number no longer show up in the list.

To test this, I filter out data points that show zero bedrooms. For better visibility, I also switch off the bedroom color coding. Below are the before and after plots. Notice that a couple of data points disappear when the 0-bedroom points are filtered out; villas with zero bedrooms obviously do not exist, so these must be typos made when the web forms were filled in.

bedroom filter before

bedroom filter after

Were these data points important? Maybe not, but most of them are right in the bulk of the data, which suggests they could be candidates when looking for a property of this type. They will also influence the price-area model, which we will look into next.
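
One way to avoid losing such listings, shown here only as a sketch, is to flag suspicious rows instead of filtering them out. The three listings below are made up for illustration (the zero-bedroom row stands in for a data-entry typo), and the column name suspect_bedrooms is hypothetical.

```r
# Made-up listings; the third row imitates a zero-bedroom typo
listings <- data.frame(
  yearly_price = c(210000, 185000, 205000),
  num_bedrooms = c(3, 3, 0),
  area_sqft    = c(3400, 2540, 3100)
)

# Flag implausible bedroom counts instead of dropping the rows
listings$suspect_bedrooms <- listings$num_bedrooms == 0

# A strict 3-4 bedroom filter loses the flagged listing ...
strict <- subset(listings, num_bedrooms >= 3 & num_bedrooms <= 4)

# ... while a lenient filter keeps it for manual inspection
lenient <- subset(listings,
                  (num_bedrooms >= 3 & num_bedrooms <= 4) | suspect_bedrooms)

print(nrow(strict))
print(nrow(lenient))
```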

Fitting a Model

Now I want to understand the relation between price and area, i.e. size of the property. If I see new data in the future (for example while looking through other websites, or while talking to a broker), using my model I can quickly understand if the asking price is in the usual range, or if it is on the cheap or expensive side.

The easiest model is a simple linear regression model – making a “best fit” straight line through the available data points. For this example I chose 2-5 bedrooms as filter and then selected the linear model option. The model fits a straight line through the data points.

linear model fit

The plot also shows a shaded band around the line: by default, geom_smooth() draws the 95% confidence interval of the fitted line, which gives an idea of how uncertain the fit is. The regression model also suggests a trend, i.e. a linear relation between price and size of the property. It might not be the best fit (the data points are spread quite widely on each side of the trend line), but it is a very easy model to understand. It is also easy to explain, i.e. it is not a “black-box” type of model (like neural networks and many others).
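
To use such a model for a quick price check on a new listing, the same kind of fit can be reproduced with lm() directly. The sketch below uses only the five rows from the head() output earlier, which is far too little data for a meaningful model; it is meant to show the mechanics, not to reproduce the fit in the plot. The 3,000 sqft listing is a made-up example.

```r
# Five rows retyped from the head() output earlier (not the full dataset)
sample_dat <- data.frame(
  yearly_price = c(199999, 210000, 210000, 185000, 210000),
  area_sqft    = c(3168, 3400, 2788, 2540, 3162)
)

# Simple linear regression of price on area
fit <- lm(yearly_price ~ area_sqft, data = sample_dat)
print(coef(fit))

# Predicted rent for a hypothetical 3,000 sqft property, with the
# 95% confidence interval of the fitted line at that point
new_listing <- data.frame(area_sqft = 3000)
pred <- predict(fit, newdata = new_listing,
                interval = "confidence", level = 0.95)
print(pred)
```

The columns lwr and upr are the lower and upper bounds of the confidence interval at that area. An asking price far outside this range would be worth a closer look, although an interval for a single new listing (interval = "prediction") would be wider.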

If the data suggests that the relation between price and property area is not linear, we can use a model that fits the curve locally, giving nearby data points more weight than data points that are further away. In the user interface it is called the “Smooth” model; it is actually a local regression model, or LOESS.

local regression model

With this model, there is no straight line but a curve that is smoothed along the local data trend. In this case it looks like an S-shaped curve. While it might provide better forecasts than the linear model, it is not as easy to explain.
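
The difference between the two models can be explored outside the app with the loess() function from base R. The sketch below uses synthetic, S-shaped data generated on the spot (it is not the real estate dataset) and compares how closely each model follows the points. The span of 0.75 is the loess() default and is written out only to make the setting visible.

```r
set.seed(1)

# Synthetic S-shaped price curve with noise (NOT the real dataset)
area  <- seq(2000, 6000, length.out = 60)
price <- 200000 + 40000 * tanh((area - 4000) / 500) + rnorm(60, sd = 3000)
synthetic <- data.frame(area_sqft = area, yearly_price = price)

# Straight-line fit vs. local regression
lin_fit   <- lm(yearly_price ~ area_sqft, data = synthetic)
loess_fit <- loess(yearly_price ~ area_sqft, data = synthetic, span = 0.75)

# Root mean squared residual of each fit on the same data
rmse <- function(fit) sqrt(mean(residuals(fit)^2))
print(c(linear = rmse(lin_fit), loess = rmse(loess_fit)))

# Smoothed values along the curve, one per data point
smoothed <- predict(loess_fit)
```

A smaller span makes the curve follow the data more closely, at the risk of chasing noise. A closer fit on the data used for fitting does not by itself mean better forecasts for new listings.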

Share your Shiny App

There are various ways to share your web application. The easiest is to publish it on Shinyapps.io; but there are others, too. Here is the list:

  • Publish on Shinyapps.io
  • Run your own shiny (web) server and publish it there
  • Share the code on Github for others to run it on their own computer

Publishing on shinyapps.io requires signing up on their website. There is a limited free plan that is enough to get you started. Running your own Shiny server might be the way to go when you want to publish more applications and have a lot of traffic to your site. Sharing the code on Github is convenient, but not everyone will want to install R and RStudio on their computer, so whether that approach is sufficient depends on your target audience.

Conclusion

This is the end of the tutorial; it only covered briefly how to get started with building your own shiny web application. The example app has many things that could be improved, for example:

  • Add a currency converter
  • Use a more useful bedroom color display
  • Display data point information when hovering over a point with the mouse
  • Calculate and display a forecast

There are many well written tutorials about this (see links below), so if you want to dive in deeper I suggest you have a look at them, and keep writing your own, ever improving apps. Happy coding!

Resources & Links

Web App on my own server: Real Estate Demo App
Web App Example on Shinyapps.io: Real Estate Demo App
Source Code on Github: Real Estate Demo App Source Code
Extensive, well written tutorial about writing Shiny Apps: Dean Attali’s Shiny Tutorial
Getting started with shinyapps.io
From the Shiny Website: Teach yourself Shiny
Regression Analysis on Wikipedia
