Post 9 - Metabase and Sankey Diagrams

In the past week, I tried Metabase, an open-source data analysis tool, which can be started as a single Docker container and used in the browser. My final goal is to turn it into a setup where I can upload the CSV export of my GnuCash file and have preconfigured dashboards show me data that I am interested in; maybe even a Sankey chart of my cash flow report, so I don’t have to manually go through the numbers. I have talked about GnuCash twice on this blog before. Both of those were steps leading up to this attempt, where I am trying to plug GnuCash into a “real” analytics tool, rather than making do with GnuCash’s in-built reports functionality or the solution with converting GnuCash into Ledger, and using Ledger’s CSV output feature and pasting the output into a spreadsheet. I am certain that there is nothing out there which does this already. (If you know of something, please e-mail me!) I have no idea how long this will take or whether it will even work. As always, it has already proven to be an interesting exploration.

Metabase is an open-source data analytics application. It uses the browser as its “interface” and it is one of the new-age products which attempts to make dashboarding easier for business metrics. A product in this family that I have a passing acquantaince with is Looker, and its cousin, Looker Studio. I built dashboards using Looker Studio which would run a query on BigQuery on anonymized user data that was imported from MySQL, which was the main data store for the backend application. I have been hearing about Metabase lately1 and I wanted to test it out. It has two features that I like a lot:

Both of these are requirements for me because I don’t want to upload a CSV export to online platforms like Looker Studio or BigQuery, and because I don’t want to go through the extra step of exporting GnuCash to CSV first, and then importing that CSV into a live database. The second step is something I can live with, but I will probably stop using any setup which requires too much work. (I posted about a duct-taped setup with several tools to convert online articles into ePub files about two years ago. That setup lasted about 2 months before I gave up on it completely, because there were too many manual steps before I could connect my Kindle and sync the Calibre library with it.)

I tested out Metabase with a sample GnuCash book that I created with completely made up transactions and it works very well. The product is very polished; it can be started easily using a simple Docker Compose file, which is there in the documentation. It requires a Postgres container, which is where all the state is stored. For the kind of analysis that I am looking to do, I am going to use the same Postgres container for the Metabase application DB (i.e. Metabase’ state) and the data that I want to analyze.

Importing CSV files worked after a few changes to it: The CSV file needs to be encoded in UTF-8, which is reasonable. Once uploaded, columns are automatically detected, and a “Model” is created (which is the equivalent of a Postgres table for our rudimentary use-case). It becomes possible to make some simple “Questions”, which are automatically updating calculations based on a Model. As it is possible to replace all the data in a Model with a new CSV file, the Questions will also update and show the result of applying some aggregations on the new data. Neat!

I imported a file with dates in it, but I did not see Metabase treat those dates as a special type (by default). This might be something that I would have to look into, if I start thinking seriously about building a setup. The Metabase documentation does mention that being able to filter by time periods is a feature. I wonder whether the problem was that the dates in my input file were in the YYYY/MM/DD format rather than the ISO YYYY-MM-DD format.

All-in-all, I was very happy with this. I toyed with the tool for about 2 hours, and this was enough for me to get a hang of the basics, and be able to make some easy aggregations. Exploring the documentation though, I hit on gold: Metabase can create Sankey charts from CSV data.


Sankey charts are a great way to display pie-charts, where one can “drill down” several times into each chunk in the pie, without an interactive interface. This is what they look:

img

Source: Data from 2021 – Understanding the current energy situation in Japan (Part 1). Built using SankeyMATIC: Build a Sankey Diagram. You can edit the input data for this diagram here.

These kind of Sankey charts exist for financial data as well. Most commonly, they exist for public companies, such as this one. This simplifies a (possibly) huge spreadsheet into a single diagram which is easy to understand at first glance; no explanation is required. It shows both sides of the “sales” diagram: the products that sales come from, as well as what happens to the proceeds. Sankey diagrams are great!

I have wanted a similar visualization for my (rather simple) GnuCash data as well. GnuCash is a great piece of software; however, its Reports functionality is pretty feature incomplete. The in-built reports are fairly limited in type. Even those that exist don’t have enough flexibility to make a report look the way that one would want it to: some reports have the ability to convert raw numbers into percentages, while others (like the Cash Flow report) don’t. It is very hit-and-miss, and to the extent that I do any “analysis” of my GnuCash data I do it either after converting the file to Ledger, or by manually copying a table from a report into a spreadsheet and working with it there. There is no way to even export reports as CSV files, so that they may be plugged into another data analytics platform. Ultimately, this all adds to the “manual” nature of the whole task; causing me to do it less often than I would actually like to.

I’m hoping that Metabase (or Duck DB, which is one alternative which I have not looked into yet) will help me do analytics directly on the CSV export, or better still, the GnuCash file itself2, and that one (or both of them together) will help me visualize Cash Flow as a Sankey diagram.

  1. Mainly, thanks to [Minusx Metabase AI](https://minusx.ai/).

  2. The GnuCash file is a gzipped XML file. So, gzip -d will give you the plain XML file.