Quo Vadis QVD? No need for Qlik Sense data files in 2020?

I recently received a message on LinkedIn congratulating me on 5 years in my current job. This led me to reflect on the developments I've witnessed, and I have some thoughts I want to share with you. In this first article, I want to reflect on QVD files and what they’re worth in the current Qlik and data world. I will describe my perception of current developments in data storage, and how and why QVD files could become redundant - and why they probably won't. I'd be happy to hear your thoughts in the comments! Full disclaimer: it might get technical.

For the uninitiated (read: non-Qlik developers)

In case you’re new or foreign to the Qlik ecosystem, here’s the primer: Qlik Sense is a business intelligence tool that evolved from QlikView. Both systems share an engine (the QIX engine, a.k.a. The Beast) built on in-memory technology, which allows rapid queries and - as a convincing argument to heads of IT and infrastructure - doesn’t require an underlying database or data warehouse containing transformed, aggregated or consolidated data. Qlik Sense (and QlikView, which I will omit from now on, although most considerations apply to both) comes with a built-in Extract, Transform and Load (ETL) processor - mostly referred to as Qlik Script. Combined with a large number of connectors, it’s possible to mount Qlik Sense directly on many source systems and mix and mangle the data to an architect’s heart’s content. Qlik developers then have the option to store the memory image of a table as a binary file on the system storage. Since the data structure in this binary file - in layman’s terms - resembles the data structure in memory, storing the data and loading it back into Qlik are lightning-fast, and - not to forget - the files are highly compressed: depending on cardinality, compression factors range from 10x to 100x. If you want to know more about this, or I forgot something, please leave a comment.
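For illustration, persisting a table and loading it back looks roughly like this in Qlik Script (table, field and library path names are made up for the example):

```
// Extract a table from the source system into memory
Orders:
LOAD OrderID,
     CustomerID,
     Amount;
SQL SELECT OrderID, CustomerID, Amount FROM dbo.Orders;

// Persist the in-memory image as a compressed binary QVD file
STORE Orders INTO [lib://Data/Orders.qvd] (qvd);
DROP TABLE Orders;

// Loading it back later is close to a straight memory copy
Orders:
LOAD * FROM [lib://Data/Orders.qvd] (qvd);
```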

What I've used QVD files for

When I started working with Qlik, one of the first things I learnt was how to layer my data pipeline within Qlik: source systems were replicated as QVD files to create a copy of the source tables to work and load from. This raw data could then, still using Qlik Sense, be transformed to create a logical layer which in some cases already resembles the final data model. Back in the day, these aggregated files were then mixed in a QlikView document to compose the data model (or data mart). Finally, these data models were loaded into one or more front-end applications to facilitate analytics. In our practice, we still do this - except for the data model layer, which has become somewhat awkward in Qlik Sense. There are approaches to still do this, or something equally efficient, but I don’t want to get into too much detail here.
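A minimal sketch of this layering in Qlik Script might look like the following (the folder structure, field names and the transformation itself are invented for the example):

```
// Layer 1: raw extract - a 1:1 copy of the source table
Raw_Orders:
LOAD * FROM [lib://QVD/1_Extract/Orders.qvd] (qvd);

// Layer 2: transformed table carrying the business logic
Orders:
LOAD OrderID,
     CustomerID,
     OrderDate,
     Year(OrderDate)  as OrderYear,
     NetAmount * 1.19 as GrossAmount
RESIDENT Raw_Orders;
DROP TABLE Raw_Orders;

STORE Orders INTO [lib://QVD/2_Transform/Orders.qvd] (qvd);
// Front-end applications then load only from the transform
// (or data model) layer, never from the source directly
```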


Why did we pretty much build data warehouses even though there were data warehouses readily available? Speed, cost and convenience! A data warehousing project used to be long (months to years of creating pipelines, defining storage, modelling data and running batch jobs), expensive (six- to seven-figure projects and insane maintenance costs) and inflexible. No parentheses for the last point, because I want to elaborate a bit: getting a new dimension, measure or field into a cube in e.g. SAP Business Warehouse is a multi-stage effort that starts in the ECC and needs to be manually modelled all the way into the MultiProvider or BEx query it is supposed to be used in. This is far from a business-user skillset (and probably one of the reasons SAP is a DAX company...). So when I say convenience, I mean: flexibility, ease of access and time to deploy.

Sounds like a pretty good deal Boris, why am I reading this?

Good question, and Qlik Sense sales would be easier if I didn’t ask myself the same thing. All the praise for QVDs is offset by some relevant downsides:

  1. Vendor lock-in. This may sound odd, but you cannot easily use QVD data in external applications. That was fine when the main purpose of the data was Qlik analytics, but times have changed. With AI, machine learning and other data science projects being lined up, the question of central data storage becomes relevant again. When you use the layered model above, you lock in not only raw data but also business knowledge and logic by modelling the data in Qlik Sense, which leads to redundant work should that logic be required outside the Qlik ecosystem as well.
  2. Storage on hard drives is a data integrity nightmare. I’m not talking about access, even though that is a problem as well. I’m talking about file corruption, backup and restore, and all the other reasons companies moved from MS Access to MS SQL Server.

These are my two main gripes with an otherwise pretty solid system. The first point has certainly become more relevant these days, while the second one has always been relevant but was trumped (Not my favourite word in 2020) by the benefits outlined above. 

What has also changed is the availability of manageable, affordable and easy-to-deploy cloud storage solutions. 

Most of you will have come across Snowflake, Microsoft Azure Synapse, Google BigQuery, Amazon Redshift, and all the other names that are selling convenience in data warehousing at an unprecedented level.

Your data in the cloud


At an entry point of $23 per month for a terabyte of storage (e.g. Snowflake as of the writing of this article), we cannot really speak of expensive data storage anymore. Where the cost comes in is the compute layer. Any database operation that is run as a query - SELECT (data retrieval), DELETE (well… delete), but also UPDATE (I’m not doing this) or INSERT (...) - is charged by resource consumption in a serverless architecture, or is run by a dedicated data warehouse virtual machine in the cloud. The available resources (read: query performance) are highly elastic, so you can tailor response times to the business requirements - at a cost. I don’t want to get into too much detail about cloud data warehouses and how they work; for further reading, refer to an article by e.g. Julian Schulte - he posted a free version here.

These data warehouses come with substantial connectivity options to a multitude of data sources. They are well documented and user-friendly - still not low-tech, but not a shell prompt either - and they come with built-in features for security and data governance. User access to data is very straightforward, well organised and intuitive, to name just a few benefits. Most importantly, they are fully agnostic to how and where you are going to use your data.

This leads me to wonder whether data staging as we (in the Qlik world) know it is still the best way to go. While in the past we often told our customers to get rid of their data warehouses because they no longer needed them for analytics, I would be hesitant to make that suggestion today, with so many business and use cases for data now readily available.

So, no more QVD files? Ever?


Don't be sad, but I think Qlik developers will have to take data consumption by other use cases into consideration. When designing data pipelines for analytics, it is important to ask oneself whether business logic is required only in Qlik Sense or whether it could be required elsewhere as well. Whenever the answer is “yes”, modelling the use case on the data warehouse is probably the better choice. Qlik Script is still very powerful, and sometimes an immensely quicker time to value can be achieved by starting in Qlik and transitioning when necessary - but this is a decision analysts and architects will have to make.

What about the data in general? To be honest, I think having data only in source systems and QVDs is a risk when it comes to deploying future use cases and serving data to business users and data science teams in the organisation. This risk is compounded if the data in the source systems is subject to change or is regularly discarded - QVDs can buffer this, but see above.

I’ve mentioned above that every procedure in a modern cloud data warehouse will cost you money, so there is life in QVDs yet. Especially during development, QVD files can save you a ton of requests to e.g. Azure Synapse. Data that isn’t highly volatile can also be “cached” as QVDs to save money - as can very complex calculations that the QIX engine can handle more easily (and thus more cost-efficiently).
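A simple way to sketch such a QVD "cache" in front of a cloud warehouse (the connection name, table and paths here are purely illustrative):

```
LET vCache = 'lib://QVD/Cache/Sales.qvd';

// FileSize() returns null if the file does not exist yet
IF Not IsNull(FileSize('$(vCache)')) THEN
    // Cheap path: read from the local QVD cache
    Sales:
    LOAD * FROM [$(vCache)] (qvd);
ELSE
    // Expensive path: query the warehouse once, then refresh the cache
    LIB CONNECT TO 'Azure_Synapse';
    Sales:
    SQL SELECT * FROM dbo.Sales;
    STORE Sales INTO [$(vCache)] (qvd);
END IF
```

During development, every reload after the first then hits the QVD instead of billing compute on the warehouse; for production, you would add an expiry or incremental logic on top.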

What about other options?

Cloud data warehouses are certainly not the only way data can be stored these days. Data lakes with potent query engines, like Dremio, allow an even more price-efficient way to store data and make it available to the business.

I'm not talking about on-premises data warehouses, because they seldom offer the benefits of low maintenance, elasticity and flexibility, although technology has evolved in this sector as well.

Where does this leave us?

I think it gives us - as Qlik developers - more things to consider when designing our architecture. We will have to align the way we model our pipelines with our customers' data strategy. Cloud data warehousing gives even smaller companies access to technology that was formerly exclusive to those who could afford high maintenance contracts and had engineers to take care of data processing and provisioning.

I think the standard (if there ever was such a thing) data processing for Qlik will become more similar to that of its competitors, since more and more companies will want their data (raw and ready for analytics) to be accessed via centralised data storage - and to be honest, that's the key concept of data democratisation!

I would be interested to learn whether you're experiencing the same shift. How do you see this development? How do you see QVD files in current times, and how do you use them? Have I been using QVD files wrong this entire time? Let me know - I'm looking forward to the discussion.

Topaz Rosen

BI Developer at SolarEdge Technologies

2y

Hi Boris Michel, I had to return to your (excellent) article while contemplating something. This pre-dated the emergence of “dbt”. And I think the role “dbt” took in the MDS resembles very much what .qvd and n-tier architecture are for Qlik. Would love to hear your thoughts on this.

Martina Bobková

Online marketing specialist | Content Marketing | Performance marketing | Social media coach

4y

Hi Boris, QVD is part of big data, and nowadays there is more and more data. Please check this website www.evltool.com - there is a tool for fast data generation with a low implementation cost.

Nicolas Kaiser

Senior BI Consultant | Data Architect | ESG | Climate

4y

Qvd files are still very important while developing big apps sitting on millions/billions of records. How would you manage an incremental load without them? Btw using qvds without implementing an incremental logic has always been useless in my opinion. I think that we are still going to use it as long as the In-Memory technologies are faster than any databases for OLAP queries (even columnar ones), then we could do only live queries.

Jonas Valleskog

Helping you unlock value from your data

4y

QVD files are still a thing when designing efficient ETL processes for exclusively Qlik consumption. It’s not that big a thing though. Their primary USP being that they’re really fast to load from. It’s just a barebones file-based data cache at the end of the day. Useful for decoupling source systems to accelerate your iterative development processes, i.e. by caching what you need and working against the cache instead of a slow source. If your source is a high-performance DW, the point of using a QVD layer to cache data diminishes significantly. Most clients I work with have their data in a multitude of places and sometimes not warehoused at all, so for them QVD files still serve a function. Another way of looking at it is to consider the QVD file format as a tool in the Qlik toolkit enabling the platform to go deeper and cover a greater degree of the data-source-to-dashboard journey than some of the competitor products can. Without them, you would likely require more tooling to do the same job, which means you do less in Qlik and more somewhere else. Not a good or bad thing per se, but it would limit the agility and the use cases which you could solve exclusively with Qlik.

Qlik Data Catalog can help to read data from QVDs and expose it to other systems. It's a great way to enable true self-service.
