Handling Ragged and Unbalanced Data Hierarchies
Business Intelligence tools are designed to ingest data that is organized in standard hierarchies: hierarchies that have a field on each level of the hierarchy. Complex data hierarchies, such as ragged and unbalanced hierarchies, can be difficult to model and query. If your data is ragged and unbalanced, you will have to get it into a form that BI tools can ingest before you can use it in a dashboard.
Depending on the structure of the data, Sisense offers several solutions for resolving complex hierarchies. These solutions range from simple table modelling using a dashboard hierarchy to transforming the data with SQL. This page focuses on several specific types of complex hierarchies. Use the approach delineated below to resolve complex hierarchies.
Types of Data Hierarchies
A balanced hierarchy has an equal number of levels in each branch and is easiest to handle. The example shown below is a natural hierarchy of years > quarters > months.
An unbalanced hierarchy has at least one branch which does not reach down to the lowest level. The below example of a company org chart is typical of an unbalanced hierarchy. Some divisions have more levels than others.
A ragged hierarchy is characterized by having entities whose parents 'skip a level'. The below screenshot exemplifies this. The city Washington DC and the Greek cities of Athens and Thessaloniki roll directly to the Country instead of to a State.
Hierarchies can also be both ragged and unbalanced, adding an additional level of complexity.
The Scenario
In the below scenario, data hierarchy is represented by the following diagram. Note that it is both unbalanced (some branches do not reach down to the lowest level) and ragged (some branches skip a level).
The initial hierarchy is defined recursively, as shown below:
The Solution
-
To address this scenario, in Sisense, create placeholder values where a level is skipped in a ragged hierarchy.
To start with, you create a new table to identify skipped entities. A new table 'Hierarchy1' has been created for this purpose using the SQL below. For simplicity, numbers are hard-coded to correspond to the levels, and only single levels are skipped.
This SQL can be easily modified for other scenarios using the same core logic.
- Consolidate all the branching entities that roll to the same placeholder parent to the same entity.
- Union the placeholder values back into the main hierarchy table, and join and replace the appropriate parent IDs.
This can be done in either order, but the screenshot below shows the union, then the join.
The non-ragged hierarchy now logically looks as below. It is ready now to resolve using a standard flattening approach. The Entity Name field is used in place of IDs for simplicity.
The SQL to do so is as follows:
Note:
The N/As in the table account for the unbalanced branches where the placeholder values account for the ragged hierarchy.
-
Now it is a simple matter of mapping the new resolved hierarchy table back into the data model. If the facts all roll to the lowest level of the hierarchy, the above logic and resulting table will work as-is. If the facts need to be attributed across different levels of the hierarchy, you would split the structure into different tables. The resolved parent IDs can be used to link across the hierarchy (see example below).
Roll-ups should now work seamlessly when using these new dimensions, and this should be easily consumed on a dashboard.