QVD read/write performance comparison in Qlik Sense Enterprise with QVD and QVF encryption

Following on from my previous post, I had a look at the performance impact of enabling QVD and QVF encryption in Qlik Sense.

In this test, I’m using Qlik Sense Enterprise November 2019 release on an Azure B4ms (4 vCPU, 16GB RAM) VM running Windows Server 2019. Qlik Sense encryption settings were left at default.

A sneak peek of the results for the largest data set tested

The questions

I’ll prepare a follow-up post running through the questions and findings; this post summarises the test structure and high-level findings.

The tests & source data

The data I’m loading is one of the freely available data sets on archive.org from StackExchange (in this case, a 618MB serverfault 7z archive).

Stack Exchange makes a huge amount of anonymized data available via Archive.org

Uncompressed, it’s 3.13GB, or 2.5GB for just the XML files I’m running tests against.

The three test subjects totalled around 2.5GB uncompressed

Each of the tests below was run a minimum of three times, on XML based data sets of three different sizes (PostHistory, Posts and Badges – in order of decreasing size).

The following tests were run:

  1. Load from XML (no transformation)
  2. Store loaded XML data into QVD (no transformation)
  3. Load from QVD using optimised load
  4. Store loaded QVD data into a second QVD (no transformation)
  5. Load from QVD using unoptimised load and perform transformations (using a wide range of functions)
  6. Store transformed QVD data into a third QVD
  7. Load from QVD using unoptimised load and perform transformation/where reduction (only two functions)
  8. Store transformed QVD data into a fourth QVD
  9. Load from QVD using optimised load, then resident to perform matching transformation to #5
  10. Store transformed QVD data into a fifth QVD

The QVF file and load scripts to run these tests are available on GitHub.

Test results

Looking ONLY at PostHistory (the largest input file), the results show that enabling encryption for QVDs increases load time, and that enabling both QVD and QVF encryption increases it further. The exceptions are tests 6, 8 and 10, all of which are store operations on data originally loaded from a QVD.

No surprises there.

Average test duration grouped by test and test mode

I’ll look into this in more depth in a follow-up post.
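For anyone wanting to reproduce the grouping behind the chart above, here is a minimal Python sketch of averaging repeated run durations by test and test mode. It is not the method used for this post; the function name, the mode labels and the timings are all assumptions made up for illustration, and the numbers are placeholders rather than measured results.

```python
from collections import defaultdict
from statistics import mean


def average_durations(runs):
    """Return the mean duration in seconds for each (test, mode) pair.

    `runs` is an iterable of (test_number, mode, seconds) tuples,
    one tuple per individual run of a test.
    """
    grouped = defaultdict(list)
    for test, mode, seconds in runs:
        grouped[(test, mode)].append(seconds)
    return {key: mean(values) for key, values in grouped.items()}


# Placeholder timings only: these are NOT measurements from this post.
# The mode labels are hypothetical names for the test modes.
sample_runs = [
    (1, "unencrypted", 10.0),
    (1, "unencrypted", 12.0),
    (1, "unencrypted", 11.0),
    (1, "qvd_encrypted", 13.0),
    (1, "qvd_encrypted", 14.0),
    (1, "qvd_encrypted", 15.0),
]

averages = average_durations(sample_runs)
for (test, mode), avg in sorted(averages.items()):
    print(f"test {test:>2}  {mode:<15} {avg:.1f}s")
```

Keeping every individual run, rather than only the average, also makes it easy to look at the spread between runs later on.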

Observation on QVD file size

There was no noticeable increase in QVD file size following encryption – see screenshots of before and after below.

File sizes without QVD encryption (excluding the 0 value for the first file as this was being written while the screenshot was taken)
File sizes with encryption – at most a few KB out
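The comparison above was done by eye from screenshots. A scripted version could look something like the sketch below, which assumes the unencrypted and encrypted QVDs have been copied into two separate folders (not something done in this test, where the files were overwritten in place). The function names are hypothetical.

```python
from pathlib import Path


def qvd_sizes(folder):
    """Map each QVD file name in `folder` to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(folder).glob("*.qvd")}


def size_deltas(before, after):
    """Size change in bytes (after minus before) for files present in both."""
    shared = before.keys() & after.keys()
    return {name: after[name] - before[name] for name in sorted(shared)}
```

With two hypothetical folders, `size_deltas(qvd_sizes("unencrypted"), qvd_sizes("encrypted"))` would return the per-file difference in bytes, which should come out at no more than a few KB if the observation above holds.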

Considerations for next time

  • Instead of using a burstable instance (B4ms), I should have used a general purpose instance such as a DS3 to ensure a baseline level of performance
  • The server size was likely too large for the smallest data set I used, meaning that operations completed too quickly for any variation to be meaningful, while Posts and PostHistory were more suitable
  • This time, I used Azure Files for the primary read/write location. While its performance should in theory remain consistent over time, testing with a provisioned disk attached to the VM would remove that potential source of variability
  • Services were not restarted between every test, only between test modes (i.e. encrypted, unencrypted) – it would be a better control to begin all tests following a restart of at least the engine service
  • The initial test created the QVD files which were then overwritten by all following tests – ideally these would have been deleted between tests (incidentally, no obvious variation appeared between tests 1 and 2)
  • There was no system monitoring set up – this would provide insights as to CPU and IO utilisation throughout and would be a useful addition to the time statistics

A quick performance comparison with Qlik Sense – AWS EC2 vs Azure Virtual Machines

Previously, I tested the performance of a load script while using RecNo() and RowNo() functions. This conveniently gave me a script which consumes up to 25GB of RAM, along with considerable CPU power.

So, what about testing it on two cloud boxes? I’ve chosen a machine from both AWS and Azure, loaded them with Qlik Sense September 2018 and run the load script.

Total Test Duration by Host

The summary: The AWS box was approx 8% faster than the Azure box.
