When asked for the quickest way to aggregate data in Qlik products, the answer is “it depends”. While the key factor is the uniqueness (cardinality) of the aggregating dimensions, there are other elements at play.
In general, though, the fastest way to aggregate in the data load script (after loading the data into memory) is:
1. When aggregating by a low cardinality dimension in a small data set, resident load and run a group by immediately (this also takes the fewest lines of script).
2. When aggregating by a higher cardinality dimension, or by one that requires a lot of sorting prior to aggregation, resident load and sort the table by the high cardinality dimension as the first step. Then resident load this sorted table and run your group by as a second step.
The short version: use approach 2 as the default, unless your data is very simple.
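As a rough illustration, the two approaches might look like the load script sketch below. The table names (Sales, SortedSales), the field names (Region, Customer, Amount) and the inline sample data are all made up for the example; it assumes the source data has already been loaded into a resident table.

```qlik
// Hypothetical sample data, standing in for a table already loaded into memory
Sales:
LOAD * INLINE [
Region, Customer, Amount
North, C001, 100
North, C002, 250
South, C003, 75
South, C001, 120
];

// Approach 1: low cardinality dimension, small data set.
// Resident load and group by in a single step.
AggByRegion:
LOAD
    Region,
    Sum(Amount) AS RegionAmount
RESIDENT Sales
GROUP BY Region;

// Approach 2: higher cardinality dimension.
// Step 1: resident load, sorted by the dimension to be aggregated.
SortedSales:
NoConcatenate
LOAD
    Customer,
    Amount
RESIDENT Sales
ORDER BY Customer;

// Step 2: resident load the sorted table and run the group by.
AggByCustomer:
LOAD
    Customer,
    Sum(Amount) AS CustomerAmount
RESIDENT SortedSales
GROUP BY Customer;

// The sorted copy is only a staging table, so drop it afterwards
DROP TABLE SortedSales;
```

With only four rows there is nothing to measure here; any difference between the two approaches would only show up on a realistically large table, so it is worth timing both against your own data.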
I recently spotted an unexpected slow-down in a load script, which was caused by using one of the record numbering functions, RecNo() or RowNo(). In summary:
– Using RowNo() in a simple load script is considerably slower than RecNo()
– If you must use RecNo(), it may be faster to do this in a direct load
– If you must use RowNo(), it may be faster to do this in a resident load
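A minimal sketch of where each function would sit, following the points above. The table and field names (Orders, OrdersNumbered, OrderID, Amount) and the inline data are hypothetical, and the inline load stands in for whatever direct source load (file or database) the script really uses.

```qlik
// RecNo() applied in the direct load from the source.
// The inline table is a stand-in for a file or database load.
Orders:
LOAD
    RecNo() AS SourceRecNo,   // number of the record as read from the source
    OrderID,
    Amount
INLINE [
OrderID, Amount
A1, 10
A2, 20
A3, 30
];

// RowNo() applied in a later resident load.
OrdersNumbered:
NoConcatenate
LOAD
    RowNo() AS OutputRowNo,   // position of the row in the resulting table
    SourceRecNo,
    OrderID,
    Amount
RESIDENT Orders;

// Keep only the numbered table
DROP TABLE Orders;
```

Whether this placement is actually faster will depend on the data and the environment, so treat it as a starting point for your own timing tests rather than a rule.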
When you’re building a new website you often code it ‘on-the-fly’. That’s to say, strapping in new features here and there until it does exactly what you want – but leaving a mess that needs to be optimised.
One of the best ways of testing your site for scalability (other than getting huge traffic straight away) is to test how long your PHP scripts take to run. We can do this by recording the time at the start and at the end of a script; the difference between the two gives us the processing time.