Group by performance in Qlik Sense & QlikView with order by sorting

When asked how to aggregate data in Qlik products in the quickest way, the answer is “it depends”. While the key factor is the uniqueness/ cardinality of the aggregating dimensions, there are other elements at play.

In general, though, the fastest way to aggregate in the data load script (after loading the data into memory) is:

  1. When aggregating by a low cardinality dimension in a small data set, resident load and run a group by immediately (this is also the fewest lines of script)
  2. When aggregating by a higher cardinality dimension, or on one that requires a lot of sorting prior to aggregation, resident load and sort the table by the high cardinality dimension as the first step. Then resident load this table and run your group by as a second step.

The short version: use approach 2 as the default, unless your data is very simple.

For Dimension1 (low cardinality), a direct group by (G) was fastest. For Dimension3 (high cardinality) and Number (low cardinality) an order by, then group by (O&G) was fastest by a large margin
Continue reading “Group by performance in Qlik Sense & QlikView with order by sorting”

A quick performance comparison with Qlik Sense – AWS EC2 vs Azure Virtual Machines

Previously, I tested the performance of a load script while using RecNo() and RowNo() functions. This conveniently gave me a script which consumes up to 25GB of RAM, along with considerable CPU power.

So, what about testing it on two cloud boxes? I’ve chosen a machine from both AWS and Azure, loaded them with Qlik Sense September 2018 and run the load script.

Total Test Duration by Host

The summary: The AWS box was approx 8% faster than the Azure box.

Continue reading “A quick performance comparison with Qlik Sense – AWS EC2 vs Azure Virtual Machines”

Qlik load performance with RecNo() and RowNo()

Using RecNo() or RowNo() will impart a performance impact on your load script. I discussed these functions in a previous post where I looked at the output of RecNo vs RowNo.

I recently spotted an unexpected slow-down in a load script, which was caused by using one of these functions. In summary:
– Using RowNo() in a simple load script is considerably slower than RecNo()
– If you must use RecNo(), it may be faster to do this in a direct load
– If you must use RowNo(), it may be faster to do this in a resident load

Example script for one of the tests – load data from disk and add the RowNo

Continue reading “Qlik load performance with RecNo() and RowNo()”

Benchmarking and testing your PHP script with microtime

When you’re building a new website you often code it ‘on-the-fly’. That’s to say, strapping in new features here and there until it does exactly what you want – but leaving a mess that needs to be optimised.

One of the best ways of testing your site for scalability (other than getting huge traffic straight away) is to test how long your PHP scripts take to parse. We can do this by comparing the time at the start and end of a script to give us the processing time.

Continue reading “Benchmarking and testing your PHP script with microtime”