Many companies are still largely managed with spreadsheets. While working with data we have seen the full spectrum of use cases of Excel, including some truly scary ones. There is, however, a clear trend of companies switching to alternatives after losing money and time due to errors and chaos. In this post, I am going to explain my take on when it is proper to use spreadsheets and when it is desirable to switch to a more advanced technology.
We are going to take a closer look at some of the root causes of the spectacular success of spreadsheets. After that, we’ll verify which of these are still true and which have changed with shifting technology.
In particular, I want to take a closer look at Shiny, which is a quite unique piece of technology. Shiny is a web framework created for R which is very quick to learn. It is not unusual to see people familiar with R build something on their first day of playing around with Shiny. Learning R is significantly harder than learning Excel. Possible backgrounds of R users include life science, economy, psychology, sociology, mathematics, physics and many others. This talk from UseR Conference by David Robinson can give more insights about many industries using R and its fast growth within each. Still, it is orders of magnitude easier than learning any other programming language. This is clearly visible from the number of people with different backgrounds using R.
Following that trend, Dash was introduced for Python with a very similar concept in mind. It allows its users to build analytical web applications from prebuilt components, solely in Python. Dash is built on field-proven technologies like Flask and React. It is probably a little more difficult to learn Python and Dash from scratch when compared to R and Shiny, but this might be the right choice for someone familiar with Python.
Reasons for success
Spreadsheets are an excellent tool that proved their usefulness for decades in different contexts. The main advantages are flexibility paired with a simple and powerful reactive computational model. A reactive model means that everything is recalculated in real time and when you add new formulas you don’t need to think about the order of computations as in most programming languages.
Spreadsheet formulas are easy to create with only several more complex concepts like VLOOKUP or pivot tables. Part of the success is also the fact that What You See Is What You Get, which means a reduced level of abstraction.
And the story could end here. But the reality is that there are tons of problems with spreadsheets.
Countless articles have been written about these problems. If you look for a good read just read about CFOs discussing alternatives or story about JPMC losing more than $6 billion. More materials can be found online (try this or even this).
The recurring top problems are always errors and chaos.
What has changed while Excel remained essentially the same?
User interface got better, way better. Since VisiCalc, we no longer have small, monochromatic displays but powerful devices with graphical frameworks. With accelerating development, people have gotten used to better-looking designs from Facebook, Google and many SaaS services.
The gap between the 2-dimensional grid and custom made applications gets larger and larger. A lot of business folks expect more than cells.
AJAX and WebSockets are technologies that allow for background communication from the browser to the server, making todays website interactive. AJAX allows for one way communication – browser sending request to server, while WebSocket works both ways but scales with more difficulty. The way we use the web has changed as well. Nowadays it is a real-time, collaborative environment. Sometimes it is hard to imagine what the web had looked like without technologies like AJAX and WebSockets. These technologies sparked the change to real-time experience, but today, we take them for granted.
The size of data has increased. With digital revolution moving forward we grew our datasets at exponential speed. In my career, I’ve seen spreadsheets with tons of data and formulas and hundreds of sheets. The question is if spreadsheets were intended to be used this way or maybe, just maybe, this is a job for a database?
VBA or Visual Basic for Applications is a programming language that allows for control of many aspects of the host application, including manipulating user interface features, such as menus and toolbars, and working with custom user forms or dialog boxes. The language itself is narrow and limited. Spreadsheets are bigger and much more complex. Today, they are used to build applications, with sophisticated logic or even VBA scripts communicating with external services. Applications = Software. Ask any engineer how much effort is needed to test software. The amount of work needed to take care of a large spreadsheet and test it is an order of magnitude larger than that of an application of similar size – so nobody does it. It is a non-obvious example of technical debt because there is no source code.
There are also regulations changing what can be done with spreadsheets, especially in finance and insurance context. It is very hard to document how logic in the spreadsheet is exactly implemented and even harder to prove its correctness. Keeping track of changes, requests and approvals in one or many distributed files can also be a hassle. All these things have already been solved for traditional software decades ago.
What hasn’t changed is the need to use reliable software, as obvious as this sounds. Most organizations use data to make data driven decisions and get an edge on their competitors. All mistakes and errors have their costs. Spreadsheet errors are devious as most of them go unseen. We believe they are not there, but in truth, they are.
When are spreadsheets still a good fit?
We can take a look at the list above and look for counterexamples. The following use cases come to mind:
- Correctness is not mission critical.
- Your data is small and formulas are not extremely complicated.
- You don’t need real-time updates or collaboration, or you can use online spreadsheets (Office 365 or Google Docs) and you don’t need a scalable solution. Microsoft suggests less than 30 concurrent users and Google also has its official limit of 100. You should probably have less if you want everything to run smoothly.
- Optimizing user experience and the interface is not important.
- Solution is temporary, and the spreadsheet will not need maintenance.
Hint: Beware as many of the temporal solutions evolve to something used more often.
In practice that means that it is a good idea to stay with spreadsheets, when:
- The thing is not going to live within the company or its only role is being a scratchpad. Excel reports can be instantly generated by R or Python (and others) to allow business people to play with data.
- When you are at the very early stage it might be good to crystalize your vision using spreadsheets and then consider if something more stable is needed.
- When you don’t have resources for something better.
- When it’s still cheaper than building software. Possibly when you need lots of manual work. E.g. some data gathering tasks, done by entry level workers, tedious and time-consuming.
An easy fix
An easy fix would be to just buy software. And this might be the right solution for standard, well defined, universal scenarios like CRM or Business Intelligence. Of course, these have some other consequences like sacrificing custom functionalities and ability to make quick adjustments or experiments.
For selected groups of problems buying Salesforce or Anaplan might be a solution. But still, most companies are going to keep some custom spreadsheets they use as tools. Tableau or Power BI won’t do the trick as one of the most crucial functionalities of these spreadsheet tools is adding and modifying the data. Another problem with BI tools is that they mean similar chaos as spreadsheets for larger deployments.
The sweet spot
We need something allowing for rapid development, validation of correctness, extendibility and adjustability while keeping the same reactive model as Excel.
As for now it looks that Shiny and Dash are the most promising tool to fill this gap. They allow for quick iteration cycles and are much less heavy than other programming frameworks. Both require a person developing the tool, but so does Business Intelligence software.
The most important is that Shiny and Dash share with spreadsheets flexibility and powerful reactive computational model. They are slightly more complex but in exchange, you get the solution for most critical problems with Excel.
Source code, which Shiny and Dash apps have, is the only known way to avoid ambiguity and chaos in the long run. Writing code allows for encapsulating of abstraction and dividing working parts into modules.
Tests are the only effective way to minimize errors. At Appsilon we proved that both logic and the interface of Shiny and Dash applications can be tested automatically, catching mission critical errors faster and increasing iterations speed. Those tests can be easily automated and plugged in into continuous integration. As a result bugs are caught sooner, they don’t go into production and they don’t face final users. Fixing those types of errors is several orders of magnitude faster and cheaper.
Shiny and Dash are web frameworks. And the web is where you want to be. This gives you all the innovations at hand and independence from huge vendors. The web also means scale – while scaling is not one of the Shiny strengths, we have managed to scale applications to hundreds of users for our clients. Scaling Dash seems even more promising as we’ve learned from our deployments.
User experience is no less important. Both frameworks allow you to build beautiful business applications. There is no comparison between spreadsheets and these. Thanks to this, you get easier and faster adoption within your company. Don’t take my word for granted and take a look at some shiny.semantic demos.
We already have ongoing Dash projects as well. Get in touch if you want to assess your current technology practices and make your company even better! Shiny and Dash are not tools for everyone, so not everyone can be a creator, but this group is much, much larger than you think. This person doesn’t even have to be an Engineer – it can be an ambitious and curious Excel user or someone tired of the limitations of traditional Business Intelligence. You need a team of people with at least one Shiny or Dash developer. Needless to say, once the tool is created, anyone can use it and get insights.
Even though it seems that Shiny is a sweet spot now, we are excited to see upcoming innovations in this space. Delivery time for Shiny is a huge advantage, but performance and scale are important issues for corporate solutions. It takes some practice to write Shiny apps that scale well, and even for well written applications, their scalability has limits.
Dash from Plotly is already entering this space and they work on their
dasher package for R as well! We are excited to see competition in these space as this is going to be beneficial for all of us. We expect to see improvements in Shiny both in terms of efficiency and capabilities in the near future. We hope for some movement in the Julia community as well. Escher for Julia proved that reactive model can be simply implemented in Julia – sadly the project seems to be no longer maintained.
At Appsilon we are committed to deliver top-notch decision support systems for our clients – follow us on social media to stay updated about innovations in that space.
Follow Appsilon Data Science
- Follow us on Twitter!
- Check us out on Facebook page!
- Follow us on LinkedIn!
- Star our GitHub packages, including
- Sign up for our newsletter and get free e-book!
Read the original post at Appsilon Data Science Blog.