Integrate machine learning and big data into real-time business intelligence with Snowflake and Plotly’s Dash

[This article was first published on R – Modern Data, and kindly contributed to R-bloggers].

Business intelligence (BI) is an indispensable tool for many, if not most, modern organizations. BI covers an entire gamut of end-to-end activities from data mining to reporting, all carried out with the core goal of assisting critical business decision-making.

How significant has BI become? One indication of its popularity can be gleaned from this Google Trends chart showing its search popularity over the last five years.

Google Trends Data — BI vs Machine Learning

This chart shows a steady and sizable increase in search volume for BI throughout the last five years. In fact, it has consistently remained above even the volume for machine learning, another critical business capability that often supports BI activities.

Modern BI activities have evolved into almost unrecognizably complex forms even since the 1990s and 2000s, never mind BI's nascent days on the mainframe computers of the 1950s and 1960s.

Take a look at customer reaction or competitor activity monitoring for instance.

These days, natural language processing (NLP) tools might be deployed to parse and analyze millions, if not billions, of social media posts across multiple platforms, not to mention press releases, websites, and online fora. Or, internal systems might be built to search and analyze corpora of internal text data comprising tens of millions of documents, e-mails, internal chat logs, and pieces of customer feedback.

Quite simply, business intelligence is here to stay, and it is well and truly intertwined with the domain of big data. A necessary consequence is that machine learning and AI tools have become inseparable from BI, because they are needed to analyze such enormous volumes of data.

One side effect of the growing complexity of BI activities has been increased demands on those building the underlying infrastructure, such as data management platforms. In fact, many organizations these days have eschewed building their own solutions in favor of contracting external service providers such as Snowflake to fill their data warehousing needs.

Snowflake

Snowflake — Overview

Snowflake is one of the leading data warehousing service providers, offering a ‘near zero-maintenance’ service as well as uniquely providing decoupled, ‘near-instant’ scalability to its clients. This means that Snowflake’s compute power and storage are independently scalable, allowing the user to scale one or the other up or down for as long (or as short) as needed.

For these reasons and more, Snowflake is a massively popular solution in the world of data management and warehousing.

But collecting data and having fast access to it is only one part of the puzzle. To meet BI’s goal of aiding business decision-making, the requisite systems must effectively analyze the latest and greatest datasets, and subsequently deliver their key findings to the relevant stakeholders. In other words, BI requires tight integration between the underlying data, the analysis layer, and the user interface.

Pairing Plotly’s Dash with Snowflake

Dash was designed with these goals in mind, and that’s why it is a natural partner to a premium data service provider such as Snowflake for delivering not just vanilla, static BI, but integrated, responsive BI systems that incorporate machine learning analysis layers. Dash is a lot more than a simple tool for visualizing existing data; it is an integrated user interface layer for machine learning and data science models.

An example of a successful marriage between Dash and Snowflake can be seen in this demo Dash app, designed to search and analyze over half a million user reviews from Amazon.

Screenshot of Dash / Snowflake driven BI app

This app allows the user to perform a search of the underlying dataset, as well as to analyze the text of a review, whether it is from the search results or manually typed in by the user.

When the user updates a filter or performs a search, Dash sends the query through to Snowflake, which returns search results from half a million records in less time than a blink of an eye (within tens of milliseconds). Dash then takes the returned result set to generate a dashboard report with not only macro-level statistics, but also natural language processing analysis outputs that are generated on the fly.

In other words, a Dash-powered BI dashboard can incorporate not only live data, but also live machine learning analysis layers under the hood.
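To make the query step more concrete, here is a minimal sketch of how a filter change might be turned into a parameterized search. The article does not show the demo app's schema or query code, so the table name, column names, and both function names below are hypothetical. The functions accept any Python DB-API connection; the `%s` placeholders assume the `pyformat` parameter style, which is the default for the Snowflake Python connector.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical table name; the article does not show the real schema.
REVIEWS_TABLE = "amazon_reviews"


def build_review_query(search_term: str, min_rating: int, max_rating: int,
                       limit: int = 500) -> Tuple[str, Tuple[Any, ...]]:
    """Return a parameterized SQL string and its bind values."""
    # Column names (review_id, rating, review_text) are assumptions.
    sql = (
        f"SELECT review_id, rating, review_text FROM {REVIEWS_TABLE} "
        "WHERE review_text ILIKE %s AND rating BETWEEN %s AND %s "
        "LIMIT %s"
    )
    # The search term is passed as a bind value, never spliced into the SQL.
    params = (f"%{search_term}%", min_rating, max_rating, limit)
    return sql, params


def fetch_reviews(conn: Any, search_term: str, min_rating: int = 1,
                  max_rating: int = 5, limit: int = 500) -> List[Sequence[Any]]:
    """Run the search on a DB-API connection and return all matching rows."""
    sql, params = build_review_query(search_term, min_rating, max_rating, limit)
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        cur.close()  # release the cursor even if the query fails
```

In a real app, `conn` would typically come from `snowflake.connector.connect(...)` with your own account credentials, and a Dash callback would call `fetch_reviews` each time the filters change.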

With traditional systems, database, data analytics, and dashboard outputs must be separately updated by disparate individuals or departments. Combining these components to set up an equivalent system to Dash and Snowflake would require more time and cost, not to mention that it would be slow and prone to errors or inconsistencies as some pipelines are updated faster than others.

Dash can help to integrate these components and automate intermediate tasks, pulling the displayed data from the primary database and running required analysis or ML models in real-time. This ensures that the entire organization is aligned and working with common ground truths from the one dataset.

By connecting Dash with Snowflake, the analytics outputs will always be up to date and in sync with the primary data; there is no need for further intermediate processing, for passing data back and forth between departments, or for updating the app separately. Take a look here at the app in action:

Dash in action — responsive filtering & NLP outputs (app)

As the animation shows, the Dash app reacts to a user’s inputs by triggering a series of processes, starting from passing a query to Snowflake and processing the returned data set.

The app carries out statistical analysis in the background, updates corresponding graphs, and triggers the NLP analyses for sentiment analysis and named entity recognition (NER).
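As a rough illustration of the statistical step, the sketch below computes a few macro-level figures from a returned result set. It is not the demo app's code: the function name and the assumed row layout of `(review_id, rating, review_text)` with 1 to 5 star ratings are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Any, Dict, Iterable, Sequence


def summarize_ratings(rows: Iterable[Sequence[Any]]) -> Dict[str, Any]:
    """Compute macro-level statistics from (review_id, rating, review_text) rows."""
    ratings = [int(row[1]) for row in rows]  # rating is assumed to be column 1
    if not ratings:
        return {"count": 0, "mean_rating": None, "distribution": {}}
    counts = Counter(ratings)
    return {
        "count": len(ratings),
        "mean_rating": round(mean(ratings), 2),
        # Report every star level, including those with no reviews.
        "distribution": {star: counts.get(star, 0) for star in range(1, 6)},
    }
```

A summary like this can feed the headline numbers and the rating histogram on the dashboard, while the review text is passed on to the NLP models.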

Dash is, of course, easily customizable. While the above animation shows multiple outputs being updated simultaneously, you can decide how much, or how little, of the app is triggered by any specific input.

The next figure shows a user fetching a random review from the filtered result set, or clicking through to one of the named entities from a review to perform a new search. Once again, a new review is populated, automatically triggering Dash to run the NLP engine, which looks for named entities before displaying them in the results.

Fetching data & triggering NLP analyses (app)

Beyond simple BI-style controls like sliders, dropdowns and buttons, Dash supports much more advanced interaction possibilities, such as free-form text input for on-the-fly processing with real-time model execution. To demonstrate this, the app even lets the user type in their own review, as shown in the animation below. Once the user finishes typing and clicks away from the text box, Dash once again runs the sentiment analysis and NER and updates the results.

Triggering NLP analyses on user-entered data (app)

Building these repeatable analysis layers with Dash to streamline analysis and reporting is not only possible, it is remarkably easy. The secret is to leverage Dash’s callback functions, which bind a function to inputs and outputs: for example, binding a change in the filter parameters to an output graph.

Writing a callback function to update a dashboard element takes just a few lines of code; here is an example:

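from dash.dependencies import Input, Output

# `app` is the dash.Dash instance that owns the dashboard layout.
# Re-render the NER frequency chart whenever the filter parameters change.
@app.callback(Output('filt-ner-count', 'figure'),
              [Input('filt-params', 'children')])
def update_ner_freq_chart(filter_params):
    fig = ...  # build the figure from the data matching filter_params
    return fig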

This, and a reference to the output element, is all it takes for a Dash function to detect an update to the search parameters and update the relevant chart on the dashboard.
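The body of such a callback is left elided above. As one possible way to fill it in, the sketch below counts named entities and returns a figure as a plain dictionary with `data` and `layout` keys, a form that a Dash graph component accepts for its `figure` property. The function name, the `(text, label)` entity format, and the chart title are assumptions made for illustration, not the demo app's actual code.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def build_ner_freq_figure(entities: Iterable[Tuple[str, str]],
                          top_n: int = 10) -> Dict[str, Any]:
    """Return a Plotly-style figure dict of the most frequent named entities.

    `entities` is assumed to be (text, label) pairs produced by an NER model,
    for example ("Amazon", "ORG").
    """
    counts = Counter(text for text, _label in entities)
    top = counts.most_common(top_n)  # highest counts first
    return {
        "data": [{
            "type": "bar",
            "x": [name for name, _ in top],
            "y": [n for _, n in top],
        }],
        "layout": {"title": {"text": "Most frequent named entities"}},
    }
```

Keeping the figure-building logic in a plain function like this makes it easy to test without a running Dash server; the callback then only needs to fetch the filtered data and return the function's result.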

The fact that Dash allows data scientists to code the analysis modules in their preferred language (such as Python, R, or Julia), as well as the front end, is simply gravy. Not a trivial one, mind you.

We’ve seen Dash empower many data science teams to take control of the entire data dashboard, instead of building analysis or machine learning layers and handing the outputs over to separate front end engineers.

By pairing Dash with powerful services such as Snowflake, you and your organization can take advantage of all that big data offers, while minimizing the headaches, inconsistencies and labor involved in analysis and communication.

If you’ve gotten this far, and haven’t looked at the app in action — what’re you waiting for? Go and take a look.

We are excited to see what you build with these tools and look forward to seeing the amazing creations from our community of incredible, creative Dash users. If you would like to learn more about Dash and its capabilities, check out our weekly live demo!
