When asked how to aggregate data in Qlik products in the quickest way, the answer is “it depends”. While the key factor is the uniqueness/ cardinality of the aggregating dimensions, there are other elements at play.
In general, though, the fastest way to aggregate in the data load script (after loading the data into memory) is:
When aggregating by a low cardinality dimension in a small data set, resident load and run a group by immediately (this is also the fewest lines of script)
When aggregating by a higher cardinality dimension, or on one that requires a lot of sorting prior to aggregation, resident load and sort the table by the high cardinality dimension as the first step. Then resident load this table and run your group by as a second step.
The short version: use approach 2 as the default, unless your data is very simple.
The main benefits of this script are that it requires a single row addition per log, with no manual increment of the version number, and the version control information can be surfaced by the engine either to the UI or via the APIs.
There’s often a discussion about what each of these autonumber/hash functions does in Qlik. We commonly see these functions used for creating integer key fields, anonymising data (for example, names in demo apps), and maintaining long string fields for later comparison (as the hashes are shorter than the strings they replace).
To do this, I’m using the script below. I’m also keen to show outputs from QlikView vs Qlik Sense, and results of running the same script on another machine.
My observations are the following:
– AutoNumber/AutoNumberHash128/256 – different output per load as the value is typically based on the load order of the source data
– Hash128/160/256 – the same output, across every load. Stays the same between Qlik Sense and QlikView, and also between different machines
I recently spotted an unexpected slow-down in a load script, which was caused by using one of these functions. In summary:
– Using RowNo() in a simple load script is considerably slower than RecNo()
– If you must use RecNo(), it may be faster to do this in a direct load
– If you must use RowNo(), it may be faster to do this in a resident load
In this post I explore the outputs of RecNo() and RowNo() to demonstrate the difference visually.
These two fields are often used interchangeably, but they provide different output. In summary:
– RecNo() is a record number based on source table(s)
– RowNo() is a record number based on the resulting table
As a result, RowNo will always provide a unique integer per row in an output table, while RecNo is only guaranteed to be unique when a single source is loaded (RecNo is based on each single source table, interpreted individually rather than collectively).
In the absence of a source control system like SVN or GIT, a quick and easy way of capturing changes in a app is to update a version control tab before passing the app back to other developers, or onto testers.
This is a very low-tech solution, but the format below works well in shared environments. The first two tabs of your application should be:
– Version (explain the app and list changes you’ve made)
– Config (set the configuration values like paths, dates, etc used in your application’s data load and UI)