Iris Data Set Visualization Web App in < 100 LOC

August 7, 2010

(This article was first published on R-Chart, and kindly contributed to R-bloggers)




The iris data set pops up pretty regularly in statistical literature.  It consists of 150 records, 50 from each of three species of Iris flowers (Iris setosa, Iris virginica and Iris versicolor).  I came across it recently while reading Introduction to Data Mining, where it comes up in several places to demonstrate techniques for visualization and classification.  There have been a number of articles, posts and videos about R and the web recently.  This post presents a way of creating some plots for the data set using R and a Ruby web application.

R and the Web
There are a number of situations where it would make sense to expose a data set and wrap a certain amount of R functionality within a web application.  Non-R users might need access to the data.  You might want to make a presentation of your findings available through the web.  You might even want to collaborate with other R developers by posting an HTML table that they can read in using the XML package.  In time, I expect that some standard web application frameworks will emerge to fill this gap.  The current application is some “thinking out loud” on my part in that direction.

Prerequisites
In order to run this application on your machine, R and Ruby must be installed and working.  The R packages ggplot2, R2HTML and Rserve are used, along with the built-in iris data set.  This web app was written and tested on Windows, but should run on *nix with small modifications.

Ruby Configuration
Install the gem that allows Ruby to communicate with Rserve:

    gem install rserve-client

On *nix systems you may need to run this with sudo.

If for some reason you do not install the gem the normal way (e.g. you downloaded it from GitHub at http://github.com/clbustos/Rserve-Ruby-client), make sure that the library is on Ruby's $LOAD_PATH when you run the program.

You can even do this within the Ruby program itself by adding a line like the following at the beginning:

$LOAD_PATH << 'C:\clbustos-Rserve-Ruby-client-v0.2.4-2-g47b0da7\lib'

The application itself is available on GitHub.

Run the Web Application

Start Rserve:
  C:\Program Files\R\R-2.10.1\library\Rserve> Rserve

You should see output like this if it started successfully.
 Rserve: Ok, ready to answer queries.

Start the Ruby web application - specify a port if you like with the -p option.
 iris_data_set_webapp.rb -p 4445

With these steps complete, you should be able to reach the application at http://localhost:4445.  Three links are available.  The r_version link simply demonstrates that Ruby (through the Sinatra framework and the Rserve client) can communicate with R.  Clicking this link displays the version of R in the browser.

The second link leads to the iris data itself.  This page displays a formatted HTML table rendered using the R2HTML package.  Admittedly, this is a bit of a mixing of concerns (view information is being produced by R), but it provided a convenient mechanism to convert a data frame to an HTML table.

The third link allows you to modify the aesthetics of the plot.  Specifically, the x, y and color aesthetics can each be mapped to any of the available variables.  The result is a "grammatically correct" chart, in the grammar-of-graphics sense that ggplot2 implements.

Code Walkthrough and Commentary
The require statements could be written out one per line:
   require 'rubygems'
   require 'sinatra'
    ...

Instead, all of the package names are placed in an array (surrounded by brackets), and a require is issued for each element as we iterate through the array.

 ['rubygems', 'sinatra', 'rserve','fileutils','haml'].each{|r|require r}
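The same idiom can be written a little more compactly with Ruby's %w[] word-array literal, which builds an array of strings without quotes and commas.  The sketch below is only an illustration: it swaps in standard-library names (fileutils, json, securerandom) for the app's gems so that it runs on a bare Ruby install.

```ruby
# Illustrative stand-ins for the app's gems: all three ship with Ruby,
# so this runs without installing anything.
LIBS = %w[fileutils json securerandom]

# Issue one require per array element, exactly like the one-liner above.
LIBS.each { |lib| require lib }

# Each library's main constant is now available.
puts [FileUtils, JSON, SecureRandom].map(&:name).join(", ")
```

In the real app you would put 'rubygems', 'sinatra', 'rserve', 'fileutils' and 'haml' in the array instead.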

The packages being used are

rubygems - the Ruby packaging system itself
sinatra - a minimal web app DSL
rserve - to integrate with R
fileutils - some convenience methods for file system access
haml - well, this one requires some explanation...

HAML is one of the many Ruby markup/templating languages in vogue among Rubyists today.  It seems to save a few keystrokes compared with writing straight HTML, but it slows me down, since I think in HTML and end up working backwards to write the HAML.  I do kind of like the Python-like treatment of indentation as meaningful, and that the generated HTML is nicely formatted.

Anyway, it is used here but I am still on the fence about it.

To experiment with haml using irb, just require haml, create an engine and output the results to HTML.

require 'haml'
Haml::Engine.new('%h3 hello world').to_html

Back to the web app.  Create a global connection to Rserve.

include Rserve
$c = Connection.new

The following lines kinda-sorta reload Sinatra most of the time, which allows you to change code and view the changes without stopping and restarting the server.  It does not always work :) but it works well enough that I included it, and I just restart the server if things are not updating the way I expect.

configure do
  Sinatra::Application.reset!
  use Rack::Reloader
end

The anchor method looks at a single line from the web app's own source file.  Yep, kind of wild (echoes of Camping).  If the line matches the regexp, meaning it is one of the get routes below (other than the index itself), we pull out the URL path and slap it in an HTML anchor.  This is a convenient way to have a home screen during development from which each get URL can be invoked.

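def anchor(line)
  # Match route definitions whose path starts with a letter, which skips the index route
  if line =~ /get '\/([a-zA-Z])/
    l = line.split[1].gsub("'", '')
    # Render a link to the path followed by a line break
    haml "%a{:href => '#{l}'}> #{l}\n%br"
  end
end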

Returns a string of HTML with a heading that says “Links” and a link to each “get” URL available in the web application.  The list of links is generated by reading the contents of this file and creating a hyperlink (where applicable) using the anchor method above.

get '/' do
  html = haml '%h3 Links'
  # File.readlines closes the file for us once the lines are read
  File.readlines(__FILE__).each { |l| html += anchor(l).to_s }
  html
end
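To see what anchor and the '/' route produce without Sinatra, HAML or Rserve, here is a minimal plain-Ruby sketch.  The helper names route_paths and link_list are hypothetical, and the HTML is built with ordinary strings rather than HAML.  One deliberate difference from the app: the pattern is anchored to the start of the stripped line, so comment lines that merely mention a route are ignored.

```ruby
# Hypothetical helpers (not part of the original app) that mimic anchor()
# and the '/' route using plain strings instead of HAML.
ROUTE = /\Aget\s+'(\/[A-Za-z][^']*)'/

# Return the path of every "get" route except the index ('/'),
# since the pattern requires a letter right after the slash.
def route_paths(lines)
  lines.map { |line| line.strip[ROUTE, 1] }.compact
end

# Build the same kind of link list that the index page shows.
def link_list(lines)
  links = route_paths(lines).map { |path| "<a href='#{path}'>#{path}</a><br>" }
  (["<h3>Links</h3>"] + links).join("\n")
end

# A few lines standing in for the app's own source file.
source = [
  "get '/' do",
  "get '/r_version' do",
  "post '/plot' do",
  "get '/iris_data' do"
]

puts link_list(source)
```

The post route and the index route are both skipped, which matches the behaviour described above.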

This is a simple example of how integration with R works.  The Rserve connection $c is sent a string of R code to evaluate.  We expect a single result, which we interpret as a string.

get '/r_version' do
  $c.eval("R.version.string").as_string  
end


This method creates an R script, evaluates it and returns an HTML img tag pointing at the image that the script writes to the public directory (which sits in the same directory as this file, and whose contents Sinatra serves as static files).  The <<SCRIPT syntax is called a heredoc.  It is just a convenient way to create multi-line strings; you could use double quotes in this context as well.  The variables x, y and color that are passed in are substituted where you see #{x}, #{y} and #{color}.

def irisplot(x, y, color)
  script = <<SCRIPT
library(ggplot2)
p <- ggplot(
       data = iris,
       aes(x = #{x}, y = #{y}, color = #{color})
     ) + geom_point()
ggsave('#{FileUtils.pwd}/public/irisplot.png', plot = p)
SCRIPT

  # Run the script in R; it writes the plot to public/irisplot.png
  $c.eval(script)

  # Return markup that displays the freshly saved image
  "<img src='irisplot.png' width='600' height='600'>"
end
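As a quick check of the claim that double quotes would work just as well as the heredoc, the sketch below builds the same R script both ways and compares the results.  The function names and the out_dir parameter are illustrative only, not part of the original app, and the script text is a simplified version of the one in irisplot.

```ruby
# Build the R plotting script with an unquoted heredoc, which interpolates
# #{...} exactly like a double-quoted string.
def build_script_heredoc(x, y, color, out_dir)
  <<SCRIPT
library(ggplot2)
p <- ggplot(data = iris, aes(x = #{x}, y = #{y}, color = #{color})) + geom_point()
ggsave('#{out_dir}/public/irisplot.png', plot = p)
SCRIPT
end

# The same text as an ordinary multi-line double-quoted string; the
# closing quote sits on its own line so the trailing newline matches.
def build_script_quoted(x, y, color, out_dir)
  "library(ggplot2)
p <- ggplot(data = iris, aes(x = #{x}, y = #{y}, color = #{color})) + geom_point()
ggsave('#{out_dir}/public/irisplot.png', plot = p)
"
end

heredoc = build_script_heredoc('Sepal.Length', 'Sepal.Width', 'Species', '/tmp')
quoted  = build_script_quoted('Sepal.Length', 'Sepal.Width', 'Species', '/tmp')
puts heredoc == quoted
```

The heredoc mostly wins on readability: the R code stays free of escaped quotes and the terminator makes the end of the script obvious.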

This route demonstrates how to read in an independent R script and run it.  See the iris.R script itself for details of what is going on.  In general, the R2HTML package is used to create a file whose path is returned.  We then read in the contents of that file and return them as HTML.  The contents of the file are an HTML table representing the iris data frame.

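get '/iris_data' do
  # Evaluate iris.R in R; it returns the path of the HTML file it wrote
  url = $c.eval(File.read('iris.R')).as_string
  # Send that file's contents (the HTML table) back to the browser
  File.read(url)
end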

This page can be scraped using R and two lines of code.  You could read this data into R running on another computer on the network (note that readHTMLTable returns a list of data frames, one per table on the page):

library(XML)
df=readHTMLTable('http://nameofmachine:4445/iris_data')
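For readers who have not seen the kind of table that readHTMLTable parses, the sketch below renders a couple of iris rows as a plain HTML table in Ruby.  It is a rough stand-in, not what R2HTML actually emits (R2HTML's markup is more elaborate); the helper name html_table is hypothetical, and ERB::Util.html_escape from the standard library is used to escape cell values.

```ruby
require 'erb'

# Hypothetical helper: render column names and rows as a minimal HTML table,
# one header row followed by one <tr> per record.
def html_table(columns, rows)
  esc = ->(v) { ERB::Util.html_escape(v.to_s) }
  header = columns.map { |c| "<th>#{esc.(c)}</th>" }.join
  body = rows.map do |row|
    "<tr>#{row.map { |v| "<td>#{esc.(v)}</td>" }.join}</tr>"
  end
  (["<table>", "<tr>#{header}</tr>"] + body + ["</table>"]).join("\n")
end

# Two records from the iris data set (rows 1 and 51).
columns = %w[Sepal.Length Sepal.Width Petal.Length Petal.Width Species]
rows = [
  [5.1, 3.5, 1.4, 0.2, "setosa"],
  [7.0, 3.2, 4.7, 1.4, "versicolor"]
]

puts html_table(columns, rows)
```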

This route creates an iris plot using the form parameters posted to it.

post '/plot' do
  irisplot(params['x'],params['y'],params['color'])
end

Finally, this route returns the form that lets you choose which fields are used to create a plot of the iris data set.

get '/iris_plot_input' do
  # Retrieve the iris data set column names into a Ruby instance variable.
  # These will be used to populate the dropdowns.
  @colnames = $c.eval('data(iris);colnames(iris);').as_strings

  # WARNING: Hard-coded defaults below.  I used these so that
  # we would have reasonable values by default.

  # Put all of the HAML markup in a string
  html = <<HAML
%form{ :action => "/plot", :method => "post"}
  %table
    %tr
      %td
        %label{:for => "x"} x:
      %td
        %select{:name => 'x', :id => 'x'}
          - @colnames.each do |col|
            %option{:value => col, :selected => (col == 'Sepal.Length')}
              = col
    %tr
      %td
        %label{:for => "y"} y:
      %td
        %select{:name => 'y', :id => 'y'}
          - @colnames.each do |col|
            %option{:value => col, :selected => (col == 'Sepal.Width')}
              = col
    %tr
      %td
        %label{:for => "color"} color:
      %td
        %select{:name => 'color', :id => 'color'}
          - @colnames.each do |col|
            %option{:value => col, :selected => (col == 'Species')}
              = col
    %tr
      %td
        %input{:type => "submit", :value => "Create Plot"}
HAML

  # Render it with the HAML engine
  haml html
end
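Each HAML select block above expands into a list of option elements, with the hard-coded default marked as selected.  The plain-Ruby sketch below approximates that output so you can see the shape of the generated form without running HAML; select_html is a hypothetical helper, and HAML's exact attribute formatting may differ.

```ruby
# Hypothetical helper approximating what one HAML <select> block renders:
# one <option> per column name, with the default marked as selected.
def select_html(name, colnames, default)
  options = colnames.map do |col|
    selected = col == default ? " selected" : ""
    "  <option value='#{col}'#{selected}>#{col}</option>"
  end
  (["<select name='#{name}' id='#{name}'>"] + options + ["</select>"]).join("\n")
end

# The iris column names, as returned by colnames(iris).
colnames = %w[Sepal.Length Sepal.Width Petal.Length Petal.Width Species]

puts select_html('color', colnames, 'Species')
```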

Examples produced by the app:



What I like about this application is that it simply pulls together some of the best programming resources around.  At well under 100 lines of code, it is simple and easy to maintain.  With that in mind, I have been thinking about other directions that could be used to generalize this approach.  One is simply to bundle Sinatra with R (perhaps using JRuby).  Sinatra web apps could then be built dynamically around data sets (kind of like the current app) or around R functions (kind of like the fgui package).  It seems that Hadley Wickham had a similar idea first and has a related project on GitHub.  His approach is to port Sinatra to R so that web apps could be developed in R without the use of another language such as Ruby.

