New Data Platforms - Real-Time Data, luxury or necessity?

"We've been looking for real-time data for a long time. But now, we can't really do without it!"

Let me try to convince the last holdouts.

Introduction

Lately, I've been struck by two reflections that seem to echo this trend:

Evolving data requirements

Like the unexpected twists and turns of corporate strategy, expectations around real-time Data have reached unprecedented proportions. I sometimes hear "We don't need real-time", often from IT teams. I'm tempted to reply right away: "Maybe today, but tomorrow...?"

Granted, watching a real-time dashboard from morning to night is no panacea. But having to wait until the next morning to learn how the previous day went is no longer really acceptable.

Technological capabilities

With the new capabilities of today's Cloud Data Platforms, and the galloping pace of Cloud innovation driven by ever more demanding customers, real-time has become a key factor whose strategic importance, and above all whose accessibility, can no longer be ignored.

The advent of new data platforms has marked a significant turning point in the way companies manage and interpret their information.

My definition of Real-Time Data

"Real-Time in Data is the art of capturing the moment, a bit like capturing the perfect photo at the right moment."

Unlike the old school, where batch data reigned supreme, mainly due to technological constraints, real-time is the perfect alignment between business need and data availability. It's as if you finally had the super-power to understand, predict and react instantly.

Real-Time Data is the "right" moment when the business needs the data.

In most cases, there's no need to see every new order entering the Information System within a few milliseconds. On the other hand, waiting an hour is hard if you need to streamline your operations. A few minutes will certainly be enough for operational needs.

If we're talking about customer segmentation, a daily or weekly recalculation will probably suffice, as opposed to the monthly or annual one previously used.

In short, Real-Time is when data availability is no longer subject to technological constraints.
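One way to make "the right moment" operational is to write down, per use case, how old the data may be and still be useful. A minimal sketch follows; the use-case names and thresholds are hypothetical, loosely modeled on the examples above (minutes for operations, a day for segmentation).

```python
from datetime import timedelta

# Hypothetical freshness targets per use case (illustrative values only).
FRESHNESS_TARGETS = {
    "order_operations": timedelta(minutes=5),
    "customer_segmentation": timedelta(days=1),
}


def is_fresh_enough(use_case: str, data_age: timedelta) -> bool:
    """Return True if data of age `data_age` meets the target for `use_case`."""
    return data_age <= FRESHNESS_TARGETS[use_case]


if __name__ == "__main__":
    print(is_fresh_enough("order_operations", timedelta(minutes=3)))      # True
    print(is_fresh_enough("order_operations", timedelta(hours=1)))        # False
    print(is_fresh_enough("customer_segmentation", timedelta(hours=12)))  # True
```

Written this way, latency becomes a business parameter rather than a technical constraint: tightening or relaxing a target is a one-line change.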

Real-Time, yes, but how?

"Adopting real-time is a bit like learning a new dance: you start with the basic steps before launching into the complex figures."

Starting with data ingestion is a bit like laying the foundations before building the rest of the edifice.

The first step will probably be micro-files on cloud storage (S3, Blob, GCS...). Today, we know very well how to ingest data as soon as it lands, using mechanisms native to Data platforms (such as Snowflake's Snowpipe or Databricks' Auto Loader).
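Native ingestion services essentially keep a load history so that each new micro-file is loaded exactly once. The sketch below illustrates only that bookkeeping, assuming a local directory stands in for the cloud storage prefix and that file names sort in arrival order; `find_new_files` and `ingest_once` are hypothetical names, not a platform API.

```python
import tempfile
from pathlib import Path


def find_new_files(landing_dir: Path, loaded: set, pattern: str = "*.json") -> list:
    """List micro-files not yet loaded, sorted by name.

    Assumption: a local directory stands in for an S3/Blob/GCS prefix, and
    file names sort in arrival order (e.g. timestamp-prefixed names).
    """
    return [p for p in sorted(landing_dir.glob(pattern)) if p.name not in loaded]


def ingest_once(landing_dir: Path, loaded: set) -> dict:
    """Load each new file once, record it in `loaded`, and return row counts per file."""
    counts = {}
    for path in find_new_files(landing_dir, loaded):
        rows = path.read_text().splitlines()  # placeholder for a real table load
        loaded.add(path.name)                 # load history: never reload this file
        counts[path.name] = len(rows)
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        landing = Path(tmp)
        history = set()
        (landing / "orders_0001.json").write_text('{"id": 1}\n{"id": 2}')
        print(ingest_once(landing, history))  # {'orders_0001.json': 2}
        print(ingest_once(landing, history))  # {}
```

Running such a step on a short schedule, or on storage event notifications, is what makes files show up in the platform shortly after they land.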

We can also synchronize operational databases with the Data Platform using dedicated Data Sync tools (Fivetran, Airbyte, etc.).
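Under the hood, database sync boils down to replaying a stream of change events (inserts, updates, deletes) onto a copy of each source table. The sketch below shows only that principle; the `ChangeEvent` shape is a hypothetical format, not the actual output of Fivetran, Airbyte or any other tool.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    op: str                               # "insert", "update" or "delete" (hypothetical format)
    key: int                              # primary key of the source row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_changes(replica: Dict[int, Dict[str, Any]],
                  events: List[ChangeEvent]) -> Dict[int, Dict[str, Any]]:
    """Replay change events in order onto a replica keyed by primary key."""
    for ev in events:
        if ev.op in ("insert", "update"):
            replica[ev.key] = ev.row   # upsert: the last image of the row wins
        elif ev.op == "delete":
            replica.pop(ev.key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {ev.op}")
    return replica


if __name__ == "__main__":
    events = [
        ChangeEvent("insert", 1, {"status": "new"}),
        ChangeEvent("insert", 2, {"status": "new"}),
        ChangeEvent("update", 1, {"status": "shipped"}),
        ChangeEvent("delete", 2),
    ]
    print(apply_changes({}, events))  # {1: {'status': 'shipped'}}
```

Applying events in source order keeps the replica consistent; how often the sync runs then decides how fresh it is.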

Regular ingestion (every 5 or 10 minutes) will give your business teams the impression that the data is already there.
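A run every 5 or 10 minutes amounts to processing consecutive, non-overlapping time windows, so that each event is picked up by exactly one run. A minimal sketch, assuming events carry a timestamp and each window is half-open `[start, end)`; `microbatch_windows` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def microbatch_windows(start: datetime, end: datetime, every: timedelta) -> list:
    """Split [start, end) into consecutive half-open windows of length `every`.

    The last window is shorter if the range is not a multiple of `every`.
    """
    if every <= timedelta(0):
        raise ValueError("window length must be positive")
    windows = []
    lo = start
    while lo < end:
        hi = min(lo + every, end)
        windows.append((lo, hi))
        lo = hi
    return windows


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    for lo, hi in microbatch_windows(t0, t0 + timedelta(minutes=12), timedelta(minutes=5)):
        print(lo.time(), "->", hi.time())
    # 12:00:00 -> 12:05:00
    # 12:05:00 -> 12:10:00
    # 12:10:00 -> 12:12:00
```

At a 5-minute interval, that is 288 runs a day, each touching only a few minutes of data.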

And, surprise: streaming with Kafka isn't always the answer. Sometimes, simplicity beats complexity.

Data transformation

"Real-time data is a way of accessing your company's body language".

Today's Data Platforms, with advanced features such as Snowflake's Dynamic Tables or Databricks' Delta Live Tables, offer surprisingly simple ways to keep data updated in near real-time, incrementally when the context permits (a single CTAS-style SQL query may suffice).
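The idea behind an incremental refresh is that, for some queries, new results can be derived from the previous results plus the newly arrived rows instead of recomputing everything. The platforms make this decision for you; the Python sketch below only illustrates the principle on a hypothetical daily revenue aggregate, assuming append-only input and an additive aggregate (a sum).

```python
from collections import defaultdict


def full_refresh(orders: list) -> dict:
    """Recompute daily revenue from all orders (what a batch rebuild does)."""
    totals = defaultdict(int)
    for order in orders:
        totals[order["day"]] += order["amount"]
    return dict(totals)


def incremental_refresh(current: dict, new_orders: list) -> dict:
    """Fold only newly appended orders into the existing totals.

    Valid here because orders are append-only and SUM is additive; updates,
    deletes or non-additive aggregates (e.g. medians) would need more work.
    """
    totals = dict(current)
    for order in new_orders:
        totals[order["day"]] = totals.get(order["day"], 0) + order["amount"]
    return totals


if __name__ == "__main__":
    batch1 = [{"day": "2024-01-01", "amount": 10}, {"day": "2024-01-02", "amount": 5}]
    batch2 = [{"day": "2024-01-02", "amount": 7}]
    state = incremental_refresh(full_refresh(batch1), batch2)
    print(state == full_refresh(batch1 + batch2))  # True
```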

This is where we clearly see the limits of ETL and ELT products alike, which often impose an update pattern far removed from the platform's native capabilities.

It's still possible to orchestrate more hand-crafted update pipelines using queries generated by dbt and a few incremental update templates.
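A common dbt incremental pattern selects only the source rows at or beyond the latest timestamp already loaded in the target (using `is_incremental()` and `{{ this }}` in the model's SQL) and merges them on a `unique_key`. The Python below mimics that logic on in-memory rows to show why it works; it is not dbt code, and the `id` / `updated_at` column names are assumptions.

```python
def incremental_merge(target: dict, source_rows: list,
                      key: str = "id", ts: str = "updated_at") -> dict:
    """Upsert source rows at or after the target's high watermark, keyed on `key`.

    Timestamps are ISO-8601 strings of a single format, so they compare correctly as text.
    Using >= (not >) re-reads rows sharing the watermark timestamp; the merge
    on `key` makes that re-read harmless.
    """
    watermark = max((row[ts] for row in target.values()), default=None)
    fresh = [r for r in source_rows if watermark is None or r[ts] >= watermark]
    merged = dict(target)
    for row in sorted(fresh, key=lambda r: r[ts]):  # oldest first, so the latest version wins
        merged[row[key]] = row
    return merged


if __name__ == "__main__":
    target = {1: {"id": 1, "updated_at": "2024-01-01T10:00", "status": "new"}}
    source = [
        {"id": 1, "updated_at": "2024-01-01T10:05", "status": "shipped"},
        {"id": 2, "updated_at": "2024-01-01T10:03", "status": "new"},
    ]
    result = incremental_merge(target, source)
    print(result[1]["status"], sorted(result))  # shipped [1, 2]
```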

Real-Time: The foundation for an Active Datalake

"Gone are the good old days of BI, when the data warehouse was a bit of a dead end for data."

As I often say, BI and analytics will soon represent less than 50% of the uses of a new Data Platform. So it's obvious that architectures and practices are no longer the same.

Datalakes thus become "active": they themselves become data sources for the rest of the company, in particular operational Information Systems, but also marketing, vendor apps, etc.

It goes without saying that the data re-injected must be as up-to-date as possible. Hence the fundamental importance of data now managed in real-time.
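To push data back out frequently without resending everything, an "active" datalake can compare what it last sent to a downstream system with its current state and ship only the difference. A minimal sketch, assuming records are keyed by a business identifier; `diff_for_sync` and the customer-segment example are hypothetical.

```python
def diff_for_sync(last_pushed: dict, current: dict) -> tuple:
    """Return (upserts, deletes) that bring a downstream copy from `last_pushed` to `current`."""
    upserts = {k: v for k, v in current.items() if last_pushed.get(k) != v}
    deletes = sorted(k for k in last_pushed if k not in current)
    return upserts, deletes


if __name__ == "__main__":
    last = {"c1": {"segment": "A"}, "c2": {"segment": "B"}, "c3": {"segment": "A"}}
    now = {"c1": {"segment": "A"}, "c2": {"segment": "C"}, "c4": {"segment": "B"}}
    upserts, deletes = diff_for_sync(last, now)
    print(upserts)  # {'c2': {'segment': 'C'}, 'c4': {'segment': 'B'}}
    print(deletes)  # ['c3']
```

After a successful push, `current` becomes the new `last_pushed`, so each cycle only carries what changed since the previous one.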

Monitoring and Data Quality

"Monitoring data quality in real-time is a bit like being the vigilant guardian of a precious treasure."

The issues of monitoring and data quality persist, however.

Because data is updated continuously, as soon as it is ingested, some data quality issues may more easily slip under the radar.

This is where Data Observability comes in, with its 4 pillars:

  • Metrics
  • Metadata
  • Lineage
  • Logs

Together, they natively address monitoring and support modern data quality practices.
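Metrics are often the first pillar to automate. The sketch below shows two common checks, freshness and volume, assuming the platform exposes a last-load timestamp and per-batch row counts; the maximum lag and the 3-standard-deviation threshold are illustrative assumptions, not a standard.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def freshness_alert(last_loaded_at: datetime, now: datetime, max_lag: timedelta) -> bool:
    """True if the latest load is older than the allowed lag."""
    return now - last_loaded_at > max_lag


def volume_alert(history: list, latest: int, z: float = 3.0) -> bool:
    """True if the latest batch's row count is more than `z` standard deviations from past batches."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is unusual
    return abs(latest - mu) / sigma > z


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    print(freshness_alert(now - timedelta(minutes=20), now, timedelta(minutes=15)))  # True
    print(volume_alert([100, 102, 98, 101, 99], 10))   # True: far below the usual ~100 rows
    print(volume_alert([100, 102, 98, 101, 99], 100))  # False
```

Because such checks can run after every micro-batch, an anomaly can surface within minutes rather than the next morning.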

Conclusion

"Real-time in Data is a bit like a silent, game-changing revolution."

Ultimately, the adoption of real-time is not a technological whim, but an appropriate response to a world where data rules the game in business.

Is it really such a complex investment, one to be put off until tomorrow? The debate is open, and time, as always, will be the best judge.

By Laurent LETOURMY
