Using RecNo() or RowNo() will impart a performance impact on your load script. I discussed these functions in a previous post where I looked at the output of RecNo vs RowNo.
I recently spotted an unexpected slow-down in a load script, which was caused by using one of these functions. In summary:
- Using RowNo() in a simple load script is considerably slower than RecNo()
- If you must use RecNo(), it may be faster to do this in a direct load
- If you must use RowNo(), it may be faster to do this in a resident load
Example script for one of the tests - load data from disk and add the RowNo
I'm using the comments source file from the stackexchange archive for this post. This file is fairly chunky when saved into a QVD (~14GB) and will act as a larger (~70M rows) source data set.
We're using comments data from StackOverflow
I'll load all fields except for the "Text" field as that's not required for this example.
I'm interested in the speed of RecNo and RowNo on a direct load from a file, vs on a resident load from a file. There are six tests:
- Load: Load directly from the QVD (optimised load) - to provide a reference point
- RecNo: Load directly from QVD while creating a new field with RecNo
- RowNo: Load directly from QVD while creating a new field with RowNo
- Load_Resident: Load directly from QVD (optimised), then resident this into a new table*
- RecNo_Resident: Load directly from QVD (optimised), then resident this into a new table, creating a new field with RecNo on the resident load*
- RowNo_Resident: Load directly from QVD (optimised), then resident this into a new table, creating a new field with RowNo on the resident load*
*The resident load tests will exclude the time taken to load the original table, and only measure the duration of the RESIDENT operation.
Each test will be repeated five times to ensure a consistent result. The Qlik Sense version tested is September 2018.
The results demonstrated large differences between RecNo and RowNo in both the straight load and in the following resident load tests.
The results show large variation between RecNo and RowNo
On the direct load from QVDs, the following was found:
- Load: Optimised load without transformation - 56.2s
- RecNo: Unoptimised load, adding RecNo field - 84.6s (+28.4s)
- RowNo: Unoptimised load, adding RowNo field - 441.4s (+385.2s)
RowNo takes a huge 6.5 minutes longer than an optimised load - almost 6 minutes longer than the RecNo test.
On the resident load tests (which exclude the initial optimised load time):
- Load_Resident: Straight resident load - 34.0s
- RecNo_Resident: Resident load adding RecNo field - 66.6s (+32.6s)
- RowNo_Resident: Resident load adding RowNo field - 267.4s (+233.4s)
Here, RecNo remains about the 30s mark, and although RowNo is faster than on the direct load, it still takes almost 4 minutes longer than the straight resident.
If we factor in the 56.2s for the initial load and add the 267.4s for the resident load then this brings us to 323.6s - still considerably faster than the direct load option above. RecNo however, would be faster to do in the direct load (84.6s direct vs 56.2s+66.6s=122.8s using resident loads.
Variation and limitations
Once repeated five times (the script runs a single test five times, then moves onto the next test), there was limited variation in duration.
Despite small variations between iterations, the test results highlight clear differences between the two functions
These tests are limited by the omission of any other functions in the load process, and will be influenced by the data set and infrastructure used. The results will therefore vary in a "real world" complex load script, but they do hold true against my observations of deployments seen in the wild.
A copy of the app used for this test is available here on github.