Structured data is familiar to most people, as it’s what most user-facing business systems capture. You’ve got columns and rows, and each value is stored against a field name and an identifying key of some type, with a clear data type.
Generally, getting access to vehicle data is a pay-for-data experience – but snapshots are released each year to the public.
If you head to the Vehicle Statistics page on gov.uk you get statistics about vehicle licensing, excise duty evasion, vehicle roadworthiness testing and more. You’ll probably want to check out the index, as there are 76 files in the 2016 download alone, at various levels of granularity…
The one I’m going to look at today, though, is:
Licensed Cars, Motorcycles, Light Goods Vehicles, Heavy Goods Vehicles, Buses and Other vehicles by make and model, Great Britain, annually from 1994, quarterly from 2008 Quarter 3; also United Kingdom from 2014 Quarter 4
Also known as table VEH0120. Interestingly, Qlik Sense throws an “Unknown Error” message when trying to load “.ods” files, so I converted the file to Excel before loading it.
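If you would rather script the conversion than open the file in a spreadsheet program, here is a minimal sketch of one way to do it, using only the Python standard library. It is not the route taken above: it writes a CSV rather than an Excel workbook (the standard library cannot write .xlsx), on the assumption that a CSV loads just as happily. An .ods file is a zip archive whose `content.xml` holds the sheets, so the sketch reads that directly. The function names are made up for illustration, and the sheet name you pass in is whatever your download actually contains.

```python
import csv
import xml.etree.ElementTree as ET
import zipfile

# OpenDocument XML namespaces used inside content.xml.
NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
T = "{" + NS["table"] + "}"  # prefix for fully qualified table:* names


def _row_values(row):
    """Return the displayed cell texts of one row, without trailing blanks."""
    values = []
    pending_blanks = 0  # blank cells seen since the last non-blank cell
    for cell in row:
        if cell.tag not in (T + "table-cell", T + "covered-table-cell"):
            continue
        # A cell holds zero or more text:p paragraphs; join them with newlines.
        # This is the text as displayed, so numbers keep their formatting.
        text = "\n".join("".join(p.itertext()) for p in cell.findall("text:p", NS))
        # Runs of identical cells are stored once with a repeat count.
        repeat = int(cell.get(T + "number-columns-repeated", "1"))
        if text == "":
            pending_blanks += repeat
        else:
            # Only materialise blanks that sit between two filled cells.
            values.extend([""] * pending_blanks)
            pending_blanks = 0
            values.extend([text] * repeat)
    return values


def read_ods_sheet(ods_path, sheet_name):
    """Read one sheet of an .ods file as a list of rows of strings.

    Simplification: table:number-rows-repeated is ignored, so a run of
    identical rows (normally blank padding) comes back as a single row.
    """
    with zipfile.ZipFile(ods_path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    for table in root.iter(T + "table"):
        if table.get(T + "name") == sheet_name:
            rows = [_row_values(row) for row in table.iter(T + "table-row")]
            # Drop blank rows left at the end of the sheet.
            while rows and not rows[-1]:
                rows.pop()
            return rows
    raise KeyError(f"no sheet named {sheet_name!r} in {ods_path}")


def ods_sheet_to_csv(ods_path, sheet_name, csv_path):
    """Write one sheet of an .ods file to a UTF-8 CSV; return the row count."""
    rows = read_ods_sheet(ods_path, sheet_name)
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return len(rows)
```

A call would look something like `ods_sheet_to_csv("veh0120.ods", "Cars", "veh0120_cars.csv")`, where both file names and the sheet name are placeholders. Because the sketch keeps the displayed text of each cell, a figure formatted with thousands separators will arrive as a string such as `1,234`, so expect to clean the numbers up in the load script.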
I recently worked on an app that was loading from several hundred gigabytes of CSVs, and attempting to perform expensive transformations on these files in Qlik Sense. Normally this isn’t a problem, but due to the way the transformation was written, the result was a saturated server… and I found myself reflecting on what “Big Data” means to different people (and to myself).
A recruiter’s post on LinkedIn also made me chuckle, as it highlights the disparity between definitions well. From my view of big data, I doubt that someone working in that field would be interested in a role where one of the three core job skills is Excel…
My observation is that there are two camps: one that classifies big data using the “V’s”, and another that uses an altogether simpler definition!