The following blog post is based on a classical reporting task most of the people working in BI got frequently: Besides all the nice dashboards you create in Tableau, from time to time people will approach you with a request for a “data extract” – reports that typically looking like cross tables. Ideally, the extract comes as an Excel file with each crosstab in a separate sheet ready to be analyzed with the favorite tool of the requester. How will you do that in a programmatic way with Tableau?
Extracting data out of the Tableau “ecosystem” to process them in another tool is clearly not part of the philosophy of the Tableau guys. They would argue that you should give those people access to Tableau so that they can do their analysis within their tools. Although this is a good point they miss some important things – first Tableau is expensive compared to solutions that come for free or solutions that are already part of the company’s software portfolio (for example Excel with free add-ons like Power Query, Power View & Power Pivot). Given that often there is a no extra or only a small fixed budget, not everybody will get a full copy of Tableau. Second, changing the standard tool for data analysis and visualization (in a broad sense) within an organization needs a lot of budget, time and power. Procedures and templates needs to be changed, people need to be convinced, trained and accompanied. Even with an impressive and sometimes magical tool like Tableau it is not for granted that this project will be a success down to the smallest corner of your company. And that’s why sometimes we need to use Tableau for purposes that are not in its focus – and extracting data out of a workbook into an excel file seems to be such a thing.
Imagine you have a workbook containing of different sheets all in the form of cross tables. You can see a simple example in the screenshot below (and find a link for downloading the workbook at the end of the posting):
If you want to extract all those crosstabs into one Excel file, you have to download them manually (as “Crosstab”) and the move them into one Excel file one-by-one. This is time-consuming & error-prone (e.g. typing mistakes when naming sheets, order of sheets in excel, default size of cells may result in unreadable content) especially if you have to do it multiple times and/or for several reports. Sadly ‘tabcmd’ is not a help here as it exports csv in key-value format instead of maintaining the crosstab layout (see this tableau forum post for what is going on – and this one for voting up an idea convincing Tableau to integrate the crosstab export into tabcmd). That said, I will show a way using free third-party tools to export all three views into on Excel (plus: basic formatting of values is preserved).
I got the basic idea for how to extract crosstabs from a forum posting given by Christopher Cannata “Hello everyone, I was tasked with a project to allow these crosstab CSV files to be generated automatically each morning for our company. Unfortunately tabcmd was not capable of allowing us to automate the process but I was able to develop a test suite in Selenium IDE call it from a batch script on a daily basis.”. Using a software testing framework for web applications like Selenium to automate the csv extraction process is a hell of a good idea for the first part of our question.
And it turned out that PowerShell is a good approach for the second part. PowerShell is simple and yet powerful and avoids a lot of problems with encoding, formats and data types. I ran into these problems when I tried reading the raw csv and exporting it as excel using several custom libraries for Python.
But as I use Python for most tasks regarding scripting and automation around Tableau, I took it again in this example to orchestrate the whole workflow.
Before we go deeper into the different steps, the picture below shows the whole story:
Extracting Crosstabs from Tableau Server using Selenium
Selenium is a software testing framework for web applications. As such, it can record user actions and afterwards “replay” those activities again and again on new versions of the app, testing if those action still lead to the expected result. We just need the capabilities to record and automatically run those macros (called ‘test cases’ – with a set of test cases called ‘test suite’). Here, a test case mimics user actions to download a single csv file as crosstab with all the necessary steps like – login into Tableau Server, select and view the right workbook and downloading the csv file.
The installation is easy and requires two tools – the Selenium IDE for recording test cases (installed as a plugin for Firefox) and the Selenium server for running test suites. The later one is just a jar file, which is started from command line. Now start your Firefox browser and you should see a new icon for starting the developer GUI.
The small red icon on the right side shows that Selenium is now recording every step you are doing in Firefox. Type in your Tableau Server URL, login into Tableau Server, open your report and download the crosstab as csv. As you interact with Tableau Server you will see that every step show up in the central window of your Selenium IDE. After you finished downloading the first csv, log out and stop the recoding process. Then save you test case. You can now test if everything works fine by starting the automatic “replay”. If not, change commands or add new ones.
It is difficult to give a general template about how to download a crosstab from Tableau Server because the whole workflow and order of commands depends heavily on your profile details like start page, page layout, language and so on. Nevertheless I attached an example test case at the end of the post that should give some guidance on how to fine-tune a test case and provide some general tips:
- It helps, if you change the configurations of your Firefox profile or use dedicated one for this task – see here for a list of things that need to be adjusted.
- Set a dedicated download folder in your Firefox profile – I found no way to tell Selenium how to go through the download manager dialog.
- Selenium will not recognize that you sometimes wait for Tableau to load a workbook. You have to insert separate “pause” commands later within your test case to give Tableau time for processing before Selenium will take the next step. The length of the pause needs to be set by you depending on how long Tableau needs for loading plus some buffer.
- Create a separate test case for every view in Tableau that should be downloaded – that means your last step within a test case should be to sign out from Tableau Server.
- Test cases and test suites are stored as plain html files and can therefore easily edited using a text editor.
After you have confirmed that everything works as expected, save all test cases as a test suite.
Because we don’t want to start the test suite manually each time we need to download data from Tableau, the next step is to automate the whole process. For this we can use the Selenium server that comes as a standalone jar file. The server is started from command line calling ‘java –jar
- -htmlSuite “*firefox”
- -t trustAllSSLCertificates (to avoid certificate checking errors)
In my case the command looks as follows:
A final test by running the command should then confirm that there are nothing but a couple of csv files in the download directory.
Using PowerShell to combine several csv files into one Excel file
Before explaining the PowerShell script, I need to say that my first approach was to use one of the existing Python libraries (for example openpyxl, xlrd and XlsxWrite – just to name a few). But with none of them I got a satisfying result within a reasonable time. Biggest problems were the csv file encoding and that the downloaded csv files contain formatted numbers which seem to be easy for Excel, but difficult to read in correctly with Python.
So after spending about an hour with each one of the libraries, I ended up with searching for an alternative approach using windows build-in capabilities. If just a double click on one of those csv files opens up Excel and gives me a nicely formatted sheet, there should be a way to automate this – and there is: use MS PowerShell. The script is straightforward – create a “hidden” workbook and add several sheets. Then go to the first sheet, open the first csv within a temporary Excel file, select all within the temporary file and copy-paste it into the sheet. Then go to the next sheet and repeat the process with the second csv. Finally save it as xlsx file (the new Excel workbook file format). Here is the code that will do that:
Workflow control with Python
Finally I wrote a small python script to have a central place for setting parameter values and starting the Selenium server as well as the PowerShell script. It also cleans up the download directory by dropping all existing files (therefore make sure that there is no other content within the download directory of the Firefox profile used for extracting data). The latter is important because otherwise PowerShell will use your old files instead of the new ones. The script looks as follows:
One can easily add additional features to the python script like renaming old files, using dynamically created file names, sending the final Excel as email and so on.
That’s all – the python script can be started with a simple double click. After waiting for it to finish, the excel extract should be ready for further use.
Whether Tableau likes it or not – extracting crosstab data out of the Tableau ecosystem is something that may be necessary if you run Tableau inside a BI infrastructure. If this only needs to be done once, you can use the Tableau Server front-end. But you will need a more efficient way if it becomes one of your regular tasks. This post presented a method that automates the whole process and creates well formatted Excel files out of multiple Tableau workbook crosstabs. If you setup the workflow once, it is easy to adapt it for extracting data from another workbook using the old scripts as templates.
To give you an example, I uploaded a zipped folder here. It contains a Tableau workbook with three crosstabs as views that can be used as example report, a test suite for this Tableau workbook, and the PowerShell script as well as the Python script. Please make sure to set your Tableau Server URL, username and password and all other parameters correctly. Pre-test the Selenium Script within the Selenium IDE Firefox plugin. If it doesn’t work check that Selenium is doing the right steps in the correct order and change the test cases accordingly.
As always any feedback on this content is welcome.