5 data & analytics trends to watch for in 2023 and beyond

Data stacks, tools, and capabilities are evolving rapidly in the analytics universe. Business leaders must develop foresight and practical capabilities in the field to maintain a competitive advantage with data.

A lot of articles focus on the latest and coolest buzzwords. However, we are going to look beyond the generic ideas and instead make some practical, specific observations.

Here are the 5 biggest and most impactful trends for 2023 and beyond.

  1. Real-time data and its monitoring are going to be table stakes
  2. Hybrid data stack will win — not a “modern” data stack
  3. Data traceability is going to be key for building customer trust
  4. Speed of data preparation will increase with AI, metadata & automation
  5. Data science & ML may become commoditized with no-code tools

Let’s dig deeper into each of these.

1. Real-time data and its monitoring are going to be table stakes

Until just a few years ago, it was entirely acceptable to have a “daily refresh” of your data in dashboards, email reports, and spreadsheets.

This was justified: the raw data had to go through many sequential steps of data movement, integration, and time-intensive (& compute-intensive) transformation and aggregation before it was usable for the business. The only way to get the data faster was to increase computing power and make the batch processing more frequent (such as every 2 hours).

Nowadays, even a 2-hour delay in accessing your key business metrics can potentially cost you millions of dollars.

Imagine not being able to respond to high traffic on your e-commerce website because you got to know about it 6 hours later.

Imagine not being able to respond to a security attack or a fraud attempt immediately because the data took its time to be ready.

Imagine launching a new feature on your streaming service and learning 12 hours later that people are dropping off because there was a bug in the new release.

As Andrew Grove says:

"Only the Paranoid Survive."

When it comes to data, only business leaders who are paranoid about measuring their business survive.

Real-time data isn’t a nice-to-have anymore. It’s table stakes. And so is the quality assurance of that real-time data.

So, what can your business do about it?

  • Invest in real-time streaming infrastructure: The paradigm shift is from “pulling the data” at a cadence to “pushing the data” whenever a change is detected. Technologies such as Fivetran allow ingesting terabytes of data in various formats, including application data, telemetry data, logs, and videos, in near-real-time. Distributed data streaming systems such as Kafka, AWS Kinesis, and GCP Pub/Sub help data consumers “subscribe to” data streams from disparate source systems. Finally, distributed computing systems such as Apache Storm enable powerful data processing and aggregation with a latency of a few milliseconds. Collectively, these systems provide the reliability, throughput, and simplicity needed to make the dream of real-time data a reality.
  • Treat “data as a product”: Every application owner that produces data from their system should be empowered to publish events for any downstream data consumers. Data producers should be held accountable for understanding the downstream analytics use cases, making their data easy to use & well documented, and providing guarantees on timeliness and data quality. You need a platform for committing to and tracking SLAs.
  • Invest in data observability: You deserve to know that your metrics & dashboards will go stale or break before they actually do. For a successful real-time data strategy and implementation, you need a data observability (proactive monitoring & detection) system that notifies you about the health of your data. DataGalaxy’s data observability does exactly that. A minimal sketch of what such monitoring could look like on a live stream follows this list.
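To make the “push” paradigm and its monitoring concrete, here is a minimal sketch of a consumer that subscribes to a Kafka topic and flags stale events. The topic name, broker address, payload field, freshness threshold, and the kafka-python client are illustrative assumptions rather than a prescribed setup.

```python
# A minimal sketch, not a production pipeline: subscribe to a hypothetical
# "orders" topic and flag events that arrive later than a freshness threshold.
import json
import time

from kafka import KafkaConsumer  # kafka-python client, assumed available

FRESHNESS_THRESHOLD_SECONDS = 300  # alert if events lag real time by > 5 minutes

consumer = KafkaConsumer(
    "orders",                                # hypothetical topic name
    bootstrap_servers="localhost:9092",      # hypothetical broker address
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Assumes each event carries an "event_timestamp" (epoch seconds) set by the producer
    lag_seconds = time.time() - event["event_timestamp"]
    if lag_seconds > FRESHNESS_THRESHOLD_SECONDS:
        # A real observability setup would raise an alert or open an incident here
        print(f"Stale data on 'orders': {lag_seconds:.0f}s behind real time")
```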

2. Hybrid data stack will win — not a “modern” data stack

Yes, the “modern data stack” is all the buzz these days. The idea behind it is great as it is intentionally designed to reduce complexity in running a traditional data platform.

However, we see the “hybrid data stack” as much more prevalent than the modern data stack. And that’s what the data says as well: 69% of businesses are opting for a hybrid stack (source).

There are two main challenges with the modern data stack:

  • Although the popularity of the modern data stack is rising fast, its first version is almost 10 years old, and that is not the reality we live in today. Many technologies in that stack are no longer as modern and are being disrupted by newer technologies, and we expect that to continue in the future.
  • It’s not practical to sustain a “modern data stack.” Over time, your tech stack broadens, becomes more fragmented, and gets riddled with tech debt. This is completely natural, as business needs evolve and change as companies mature. Different use cases require different tools & technologies. As a result, a practical solution for a business is a hybrid data stack that comprises solutions, tech, and data vendors from the past, the present, and the future.

All of this raises the complexity of managing, analyzing, and governing your data to a whole new level. With a hybrid data stack, it becomes super important to achieve full data visibility!

DataGalaxy is connected to your past, present, and future data stack.


It supports all major cloud platforms. We provide a large set of connectors to easily identify and map your physical data, data-processing technologies, and reporting tools.

3. Data traceability is going to be key for building customer trust

We live in times when customer trust in how organizations store, manage, and protect their data is fading.

An erosion of customer trust means an erosion of your business’s top line.

To deliver on customer trust with data, your organization needs an enterprise-grade data traceability solution that allows you to easily follow your data back to its genesis. This involves clear visibility of how data flows, who owns what, who uses what, and the transformations that happen along the way.

In a large company, a single data point could be used by a remote data science team that the data owner has never heard of. Not knowing this could erode customer trust.

Without this visibility, your business simply won’t be compliant with GDPR! If your end customer asks you to delete every single trace of their data from your systems, you need to know where to delete it from.

You need to know:

  • Where the different data sources are, and who owns each of them
  • Where the data has been copied, and who owns each copy
  • Which models and reports use the data downstream, and who uses those

Without this information at your fingertips, your business could be fined.

It is even more complex to orchestrate, trace, and visualize the inputs, outputs, and end-to-end movement of data across disparate systems, from Kafka and OLTP databases to the data warehouse and on to Looker or Google Sheets.

A good data traceability solution enables you to trace your data across the entire data landscape spanning from transactional data stores, messaging systems, and warehousing systems to analytic & reporting tools.

A sophisticated data traceability solution (like the one DataGalaxy provides) does all of the following:

  • Integrates into disparate systems and derives insights from the larger system
  • Tracks down every single data point & understands why & where it’s broken
  • Traces lineage upstream & downstream, owners & users, and all the transformations that happen along the way (a minimal sketch of the traversal idea follows this list)
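To illustrate what tracing lineage downstream can look like in practice, here is a minimal sketch of a breadth-first traversal over a toy lineage graph. The asset names and the hard-coded dictionary are hypothetical; a real traceability solution would derive this graph from connectors, query logs, and metadata rather than maintain it by hand.

```python
# A minimal sketch of downstream lineage traversal over a toy lineage graph.
# The graph and asset names are hypothetical, hard-coded for illustration only.
from collections import deque

# Each key maps a data asset to the assets that consume it directly.
lineage_graph = {
    "postgres.orders": ["kafka.orders_stream"],
    "kafka.orders_stream": ["warehouse.fct_orders"],
    "warehouse.fct_orders": ["looker.revenue_dashboard", "sheets.weekly_report"],
}

def downstream_assets(source: str) -> list[str]:
    """Return every asset that depends, directly or indirectly, on `source`."""
    seen, queue, result = {source}, deque([source]), []
    while queue:
        for consumer in lineage_graph.get(queue.popleft(), []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
                result.append(consumer)
    return result

print(downstream_assets("postgres.orders"))
# ['kafka.orders_stream', 'warehouse.fct_orders', 'looker.revenue_dashboard', 'sheets.weekly_report']
```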

4. Speed of data preparation will increase with AI, metadata & automation

At DataGalaxy, we have seen first-hand how much time and investment businesses need to put into data preparation. According to a TDWI Research Survey, more than one-third of businesses said that their data teams spent ~70% of their time on data preparation!

Not only does this slow down the time-to-insight for business leaders, but it also requires businesses to hire, train and rely on highly specialized data talent.

There can be a better future state, and change is already in progress.

There are three levels of maturity for businesses to speed up data prep:

  • Self-service data prep tools, for business users, not requiring advanced SQL
  • Assisted intelligence using powerful metadata for faster data prep
  • Data prep automation that abstracts away the complex logic

As a first step, using self-service data preparation & integration solutions such as Alteryx empowers not only your highly skilled data engineers & analysts but also your business users, such as power Excel users, to achieve the same complexity of data transformation without months of manual effort or years of technical training.

Second, a powerful metadata solution such as DataGalaxy takes away a whole lot of the guesswork that analysts have to waste time on when preparing data. “Which table/column should I use?”, “Is this data fresh/accurate?”, “How should I join/filter this data?” — the time spent answering all these questions adds up and impacts your data preparation velocity. DataGalaxy connects to your internal systems to centralize the metadata and leverages the historical data usage contained in logs of queries and reports. With the help of our intelligence layer, we uncover relevant & healthy data and its linkages and present them to the business analyst for faster data preparation. With every new analytics log and piece of user documentation, DataGalaxy’s system becomes smarter at ranking and recommendation.
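As a simplified illustration of how usage metadata can speed up data preparation (not a description of DataGalaxy’s actual ranking logic), the sketch below counts how often each table appears in historical query logs and surfaces the most frequently used candidates first. The query strings and table names are made up for the example.

```python
# A simplified, hypothetical illustration of metadata-assisted data prep:
# rank candidate tables by how often they appear in historical query logs.
from collections import Counter
import re

query_log = [
    "SELECT user_id, amount FROM analytics.fct_orders WHERE order_date > '2023-01-01'",
    "SELECT * FROM analytics.fct_orders JOIN analytics.dim_users USING (user_id)",
    "SELECT country, COUNT(*) FROM analytics.dim_users GROUP BY country",
]

def rank_tables(queries: list[str]) -> list[tuple[str, int]]:
    """Count table references after FROM/JOIN keywords and rank by frequency."""
    counts = Counter()
    for query in queries:
        counts.update(re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", query, flags=re.IGNORECASE))
    return counts.most_common()

print(rank_tables(query_log))
# [('analytics.fct_orders', 2), ('analytics.dim_users', 2)]
```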

Once your business has accomplished the first two maturity levels, it’s time to take it to the next level: automating data preparation. Products such as PreQL have taken up this ambitious initiative to put metrics at the fingertips of business users by abstracting away the complexity of data transformation.

5. Data science & ML may become commoditized with no-code tools

It’s clear — the demand for data science is increasing rapidly as the volumes of data produced by businesses are increasing exponentially. However, there are too few data scientists to meet this demand.

According to a LinkedIn Workforce Report in 2018, there were 151,000 unfilled data scientist jobs across the United States, with “acute” shortages in SF, LA, and NYC.

We can’t expect to close this gap by producing an exponentially higher number of PhDs in Mathematics and Data Science. The only way to scale our solutions to the demand is to democratize data science.

Any techno-functional analyst who is an Excel user — such as consultants, business analysts, and product managers — will be empowered to apply data science with powerful no-code tools. These tools will provide prebuilt, out-of-the-box algorithms for the most common use cases. Whether you need a classifier, a predictive system, an optimization tool, or a content generator, you won’t need to build it from scratch. Instead of requiring you to install Python libraries to get started, these tools will provide an intuitive user interface to input data, configure parameters, and run suggested algorithms.
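To make concrete what these tools abstract away, here is a minimal sketch of the kind of prebuilt classification workflow they could run behind a visual interface, using scikit-learn defaults on a bundled example dataset. It illustrates the idea, not the internals of any particular product.

```python
# A minimal sketch of the kind of prebuilt classification workflow that
# no-code tools could wrap behind a visual interface; not any vendor's internals.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# "Input data": a bundled tabular dataset standing in for a business upload
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# "Run suggested algorithm": a prebuilt model with sensible default parameters
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# "Read the results": a single headline metric a business user can act on
print(f"Holdout accuracy: {accuracy_score(y_test, model.predict(X_test)):.2%}")
```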

While algorithms and data science will be commoditized (for good), it doesn’t mean that we won’t need data scientists anymore. In fact, it will empower data scientists to work on more important and harder problems instead of reinventing the wheel every time.

New technologies such as AWS SageMaker, Civis Analytics, Dataiku DSS, DataRobot, Domino Data Lab, and Obviously AI use prebuilt models & algorithms, automation, and visual interfaces to apply state-of-the-art machine learning and data science.

While data science “with just a few clicks” is exciting, there is also a risk that insufficient understanding of algorithms can lead to misuse, overuse, and incorrect analyses.
