Out with the Old ETL: Navigating the Upgrade Maze

Here is something I find fascinating lately.

The more data professionals I talk to and the more data integration projects we do, the more I realise just how archaic data and analytics (D&A) still are in most established organisations. And I mean, wow!

  • Key analytical reports living on someone’s PC (not even a power backup!) – talk about single points of failure
  • Shared spreadsheets used to enter crucial client data without any quality checks or governance – cells get overwritten regularly
  • Paper forms that are then manually entered into a database (a full-time job, mind) – data gets into the wrong fields
  • Data pipelines built for a singular custom purpose and optimised to run on-prem, then dumped into the cloud without understanding the implications (cost and performance are all over the place)
  • Businesses are terrified to touch legacy logic. So, they build new rules on top, creating monstrous pipelines that few people understand
  • Maintaining multiple redundant systems because the business is not sure of the implications of switching off the old one

I can go on and on. Feel free to share your own horror stories in the comments.

What baffles me most is that, as a data industry, we are at the top of our game in terms of tools and capabilities. Today, we can solve pretty much any data challenge. We have the knowledge, tools and experience to make the transition to modern ways of working like never before. Yet, most organisations continue to cling to their antiquated data systems, processes and analytics. Why?


Lost in the data maze

I find it extremely curious. IOblend’s core focus is on data migrations, building new or replacing old pipelines with modern ones (ETL), and synchronising data among multiple systems (on-prem and cloud-based). The majority of our hands-on experience naturally stems from working on those types of projects. But I know the issues span the entirety of the digital transformation landscape.

We encounter mind-boggling complexity when upgrading legacy data pipelines. Legacy systems often have highly customised configurations, deeply embedded within an organisation's operations. These systems were developed over years, decades even. They are tailored to specific business needs and intricately linked with other enterprise processes. The shift to modern architectures means disentangling these connections and re-establishing them in a new, fundamentally different environment. That’s very scary to most data teams.

Legacy systems always contain inconsistencies, data quality issues and undocumented data-handling practices, which lead to challenges when aligning them with modern cloud-based systems. What looks like a straightforward migration job on the surface quickly turns into a nightmare. It’s often simpler to just build another ETL pipeline on top of the existing one. Take the existing feed and iterate from that. So, what businesses end up with is a spaghetti of pipelines of various vintages and dubious quality, all interdependent. The sprawl keeps growing over time. Sounds familiar?


Legacy ETL to the cloud

One of the most formidable challenges is the migration of legacy ETL processes. The business often doesn’t realise what’s involved. They just want what they consider a “lift and shift” job. Just move it to the cloud. Everyone does it. Shouldn’t take long, right? Well, no.

Cloud architectures are fundamentally different from on-prem ones. To take full advantage of the performance and lower operating costs, the business must rebuild its ETL and associated processes to work in the cloud. You must optimise these processes for a new environment that operates on different principles of data storage, computation, and scalability. So no, “lift and shift” won’t cut it. A proper migration is required.
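
To make the difference concrete, here is a minimal, hypothetical sketch of the two mindsets. The file paths, dataset layout and column names are invented for illustration; the point is that a cloud-oriented rebuild pushes filtering and column selection down to partitioned, columnar storage instead of scanning a full extract on every run.

```python
# Illustrative sketch only. Paths, dataset layout and column names are hypothetical.
import pandas as pd
import pyarrow.dataset as ds

# "Lift and shift" habit: pull the entire extract into memory, then filter.
# Fine next to an on-prem database; in the cloud it scans (and bills for)
# far more data than the report actually needs.
df = pd.read_csv("exported_events.csv")            # full scan on every run
daily = df[df["event_date"] == "2024-01-01"]
revenue = daily.groupby("region")["amount"].sum()

# Cloud-oriented rework: store the data as partitioned Parquet and push the
# filter and column selection down to the storage layer, so only the relevant
# partition and columns are ever read.
dataset = ds.dataset("s3://analytics/events/", format="parquet", partitioning="hive")
table = dataset.to_table(
    filter=ds.field("event_date") == "2024-01-01",
    columns=["region", "amount"],
)
revenue_cloud = table.to_pandas().groupby("region")["amount"].sum()
```

The first pattern is exactly what gets dumped into the cloud during a “lift and shift” and then quietly becomes the cost and performance problem described above.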


The reluctance to alter business logic

Then, if a thorough rebuild is required, it means getting deep under the skin of the existing pipelines and systems. However, data engineers dread updating the business logic embedded deep within legacy systems. Their fear is rooted in the risk of disrupting established data processing flows, which could lead to data inaccuracies, reporting errors, or even system failures. The latter is often a sackable offence.

What is very unhelpful is that legacy systems tend to lack clear documentation, especially around the custom modifications. The business users who were involved in the delivery of said systems and the associated analytics suites have long since retired. This makes the task of accurately replicating or updating business logic in a new environment painful, to say the least. It’s very easy to open a can of worms. Hence, the engineers steer away.
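
One common way to take some of the fear out of this (not a prescription, just a pattern that works): run the legacy and the rebuilt pipeline side by side for a while and reconcile their outputs before anyone touches the cutover switch. A minimal sketch, assuming both pipelines can dump comparable extracts to CSV; the file names, the business key and the net_amount column are all hypothetical:

```python
# Illustrative output-parity check between a legacy feed and its replacement.
import pandas as pd

legacy = pd.read_csv("legacy_output.csv")
modern = pd.read_csv("new_pipeline_output.csv")

key = ["customer_id", "report_date"]  # business key shared by both outputs

# 1) Coverage: rows present in one feed but not the other.
merged = legacy.merge(modern, on=key, how="outer",
                      suffixes=("_old", "_new"), indicator=True)
missing = merged[merged["_merge"] != "both"]
print(f"{len(missing)} rows exist in only one of the two outputs")

# 2) Value drift on the rows both feeds contain.
both = merged[merged["_merge"] == "both"]
drift = (both["net_amount_old"] - both["net_amount_new"]).abs()
print(f"{(drift > 0.01).sum()} matching rows differ by more than 0.01 on net_amount")
```

Even a crude check like this turns “we daren’t touch the logic” into a measurable gap that can be worked down line by line.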


Migrating ETL takes forever

If you’ve ever been involved in an ETL migration project, you know it always takes longer than planned. The time required for a complete and fully supported ETL migration depends significantly on the complexity of the existing systems, the volume of data, the tools used for implementation, and the specific requirements of the new architecture. Typically, such migrations can take anywhere from several months to well over a year. And that’s for a modest migration (a single system to the cloud).

One of the projects we witnessed a few years back was an attempted migration from an on-prem system to a modern, cloud-based architecture. But the company could not bring itself to rebuild and decommission the core engine, which had been developed a few decades earlier. They tried to splice the new cloud tech on top of it and exactly replicate the legacy logic in the new system, even when it didn’t make sense to do so. The business just didn’t have the necessary understanding of its own system and feared disruption.

They ran out of money trying to splice together a Frankenstein’s monster, scrapping years (!) of hard work that had gone into it.


Failure rate is high

The cost of migrating a legacy ETL process to a modern architecture can be substantial. It encompasses not only the direct costs of cloud services and tools but also indirect costs such as training, potential downtime, and the resources involved in planning and executing the migration. Such migrations often run into hundreds of thousands or millions of pounds, depending on the scale and complexity of the operation. The dev work alone can cost a small fortune.

The challenges cut equally across all industries. We have seen similar cases in all sorts of organisations: banking, healthcare, manufacturing, aerospace, retail, telecoms, utilities, you name it. Everywhere, organisations encounter the same issues when undertaking digital transformation.

Gartner estimated over ? of all digital transformations fail. The reasons are overrunning budgets and busted timescales. You can see why.

Businesses are thus understandably sceptical when they consider upgrading older systems and processes. Most have been burnt in the past and have the scars to prove it.


Fear and lack of incentives

This is why, despite having a treasure trove of sophisticated tools and capabilities at our fingertips, there's a stubborn hesitance to break free from these antiquated systems. This isn't merely a case of grappling with technical challenges. No, it's more deep-seated than that.

It's a blend of trepidation towards the unknown, apprehension about potential pitfalls, and past failures. Also, surprisingly, there is a lack of clarity on the benefits that modern technology brings to the table. Yes, modern technology makes our business better off. But why is my cost line swelling faster than my revenue after we moved to the cloud?

Then there is little incentive for the devs to untangle the web of business logic, system designs and ancient ETL. They will spend time unpicking the puzzles but won’t get any additional reward for it. And if they accidentally bring down the system in the process, they face losing their jobs. Let someone else do it.


Light at the end of the tunnel?

So the way I see it, the key here is to minimise the risk, improve incentives and empower people inside organisations to drive change. Organisations should not shy away from bringing in external expertise and tooling to make the journey faster and more cost-effective. Do not get hung up on particular technologies and fads. Understand what suits your use case best and stick with that. Do not build for the distant future (it never arrives, btw), over-spec for every conceivable eventuality, or complicate the design to the point where it never works. Focus on delivering value fast today while preserving the flexibility to scale when needed.

I believe that if we can land this message with businesses, the journey to modern data analytics will accelerate.

Visit us at IOblend.com for all your data integration and migration needs. We specialise in de-risking and simplifying digital transformations, helping you successfully navigate your data journeys. Drop us a note and let’s chat.
