2024 Predictions, Data Leader's Guide to GenAI, Data Modeling Rediscovered, and Trends in Data Products
1. Trends for 2024: Our Team Gazes into the Crystal Ball
By Eckerson Group
Summary Predictions
Generative AI has certainly given everyone lots to talk and think about. The implications, both good and bad, are staggering. So, it’s no surprise that half our predictions involve GenAI, in particular, and machine learning and artificial intelligence (ML/AI) in general.
Internally, we have feisty debates about GenAI and that’s reflected in our predictions. Dave Wells thinks it’s been overhyped and expectations will come back to earth in 2024 as organizations become more familiar with the strengths and drawbacks of the technology. Kevin Petrie is cautiously optimistic about GenAI and believes it’s a watershed moment that will deliver another paradigm shift in the way organizations harness data to achieve business objectives.
Of course, there’s more to AI than GenAI. Our team believes that ML/AI will reshape nearly every commercial product, especially software products, such as data & analytics tools. Because of the pervasiveness of ML/AI, including GenAI, our team believes that organizations will need to pay greater attention to the data that feeds ML/AI models, improve AI governance processes, pay more attention to the environmental impacts of ML/AI processing, and heed the growing list of regulations governing the use of ML/AI.
Outside of GenAI, our team contemplates the impact of the rise of mid-market data catalogs, data product platforms, and quantum computing. We also ponder whether ML/AI will usher in an era of data collaboration and interoperability (versus monolithic architectures) and whether data quality will evolve from manual data cleansing to automated remediation.
2. The Data Leader’s Guide to Generative AI, Part I: Models, Applications, and Pipelines
By Kevin Petrie
As with all disruptive technologies, generative AI offers both upsides and downsides to early adopters. In this case, the upsides include rich digital interactions and a healthy productivity boost. The downsides, however, range from confused customers to operational errors and privacy breaches.
Timely, accurate, and trustworthy data makes the difference between these two outcomes. Chief data officers (CDOs) and their teams therefore have a big role to play. To make generative AI (GenAI) initiatives successful, data teams must modernize their environments, extend governance programs, and collaborate tightly with their data science colleagues. This blog starts a three-part series that examines these requirements in the following areas.
Language models
Generative AI (GenAI) refers to a type of neural network that humans train to interpret and create digital content such as text, images, or audio. In 2017, GenAI researchers at Google introduced the idea of a “transformer” that converts sequences of inputs into sequences of outputs. This gave rise to the language model (LM), which is essentially a huge calculator that predicts content, often strings of words, based on what it learned from existing content. The LM uses an “attention network” whose parameters quantify how tokens (e.g., words or punctuation marks) relate to one another. This attention network enables the LM to generate fast, intelligent responses to human prompts.
The LM is a huge calculator that predicts strings of words based on what it learned from other words
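To make the idea of an attention network more concrete, here is a minimal sketch of scaled dot-product attention, the core calculation in a transformer. It is an illustration only: the three tokens and their two-dimensional vectors are made up, and the learned parameter matrices that a real LM applies before this step are omitted.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    all_weights, outputs = [], []
    for q in queries:
        # Score this token against every token, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        all_weights.append(weights)
        outputs.append(mixed)
    return all_weights, outputs

# Hypothetical tokens and made-up 2-d vectors (assumptions for illustration).
tokens = ["rent", "a", "car"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights, outputs = attention(vectors, vectors, vectors)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence. In a production model the vectors have hundreds or thousands of dimensions and the weights come from trained parameters rather than hand-picked numbers.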
OpenAI’s release of ChatGPT, built on its GPT-3.5 model, one year ago triggered today’s arms race among open source communities and vendors, such as Google, Microsoft, Hugging Face, and Anthropic, to build ever-more powerful LMs. The recent chaos with OpenAI’s board and its investor Microsoft illustrates the high stakes, and the risks, as tech gorillas wrestle to capitalize on this technology.
To gain competitive advantage and deliver trustworthy results with GenAI, companies need to feed their LMs domain-specific data rather than high volumes of public Internet content. Such domain-specific LMs have the following three implementation options. (Eckerson Group also calls these domain-specific models “small language models” because they’re customized to process smaller datasets.)
3. A Fresh Look at Data Modeling Part 2: Rediscovering the Lost Art of Data Modeling
By Dave Wells
Data modeling is a core skill of data engineering, but it is missing or inadequate in many data engineering teams. Today’s data engineers focus primarily on data pipelines—the processes to move data from one place to another. That is a natural consequence of recent focus on data lakes and data science. But the result is loss of focus on shaping and structuring data—the stuff of data modeling and database design.
Most modern data engineers are process engineers, not product engineers. But modern data management needs to have both process engineering and product engineering. The need for product engineering is apparent with the rising interest in data products and the need for product thinking. To achieve the right balance of process and product focus, we need to rediscover the lost art of data modeling.
I refer to data modeling as a lost art because it has existed as a data analysis and design discipline for decades. Entity-Relationship Modeling (ERM), introduced by Dr. Peter Chen in the mid-1970s, is the foundation of most modern data modeling techniques. ERM was widely practiced from the 1980s to around 2010 when data lakes became a data management priority. Attention to modeling diminished as data pipelines and schema-on-read became mainstream practices. It is time to rediscover the lost art of data modeling, starting with the foundational concepts of ER Modeling.
In the first article of this series, I stated that today’s data modeling is different from the past. Today’s data modeling extends data modeling practices of the past to work with the variety of data that we work with now. That first article describes several kinds of data structures including relationally structured, dimensionally structured, dynamically structured, and semantically structured. This article describes modeling techniques for relationally structured data, a logical place to begin because it is foundational. Many data professionals who are mid-career or beyond are already familiar with relational modeling techniques, and should not think these skills are outdated. Relational data is still predominant in operational systems, and most BI and analytics projects still rely on relational data. For those who have not yet experienced relational data modeling, especially data engineers, this is an important skill set to acquire.
In future articles I’ll discuss dimensional, dynamic, and semantic modeling techniques. There you will see how these more recent modeling techniques build on a relational data modeling foundation.
What Is Relational Data?
With apologies for stating something that may be obvious, I’ll briefly describe relational data to be sure that we’re all on the same page. Relational data is structured data that represents real-world things and the relationships that exist among them. Relational data structures organize data as rows and columns in tables, with rows representing instances of things and columns representing facts about those things. Throughout this article I use examples of a fictional car rental company to illustrate data and data models. Figure 1 shows examples of relational data for customers and vehicles.
Figure 1. Relational Data Tables
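As a rough sketch of the same idea in code, the example below builds customer and vehicle tables for the fictional car rental company using Python's built-in sqlite3 module. The column names and sample rows are assumptions for illustration and are not taken from Figure 1, and the rental table is a hypothetical addition that shows how a relationship between customers and vehicles can be stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce relationships

# Each table describes one kind of thing; each column holds one fact about it.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT
);
CREATE TABLE vehicle (
    vehicle_id  INTEGER PRIMARY KEY,
    make        TEXT NOT NULL,
    model       TEXT NOT NULL
);
-- Hypothetical table: one row per rental, linking a customer to a vehicle.
CREATE TABLE rental (
    rental_id   INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    vehicle_id  INTEGER NOT NULL REFERENCES vehicle(vehicle_id),
    pickup_date TEXT NOT NULL
);
""")

# Each row is one instance of a thing (made-up sample data).
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Ana Ruiz", "Denver"), (2, "Ben Cho", "Austin")])
conn.executemany("INSERT INTO vehicle VALUES (?, ?, ?)",
                 [(1, "Toyota", "Corolla"), (2, "Ford", "Escape")])
conn.executemany("INSERT INTO rental VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, "2024-01-15"), (2, 2, 1, "2024-01-20")])

# Follow the relationships to answer: who rented which vehicle, and when?
rows = conn.execute("""
    SELECT c.name, v.make, v.model, r.pickup_date
    FROM rental AS r
    JOIN customer AS c ON c.customer_id = r.customer_id
    JOIN vehicle  AS v ON v.vehicle_id  = r.vehicle_id
    ORDER BY r.rental_id
""").fetchall()

for row in rows:
    print(row)
```

The join at the end is where the relational structure pays off: because each rental row points to exactly one customer and one vehicle, the question can be answered without duplicating customer or vehicle facts in the rental table.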
4. Webinar: Five Disruptive Trends that Every Data & AI Leader Should Understand
Speakers: Kevin Petrie & Omid Razavi
2024 is gearing up to be an impactful year for AI and analytics. Join us on January 30, as Kevin Petrie (VP of Research at Eckerson Group) and Omid Razavi (SVP of Customer Success at Alluxio) share key trends that data and AI leaders should know.
This event offers market data and expert insights to help you drive successful business outcomes.
5. Eckerson Group Survey on Data Products
Eckerson Group seeks your help to understand how organizations are defining, adopting, and governing data products today.
If you complete this short 12-question survey and enter your email address at the end, we will send you the report that contains the results.
About Eckerson Group
Eckerson Group is a global research and consulting firm that focuses solely on data analytics. Our experts have substantial experience in data analytics and specialize in data strategy, data architecture, data management, data governance, data science, and data analytics.
Our clients say we are hard-working, insightful, and humble. That reputation stems from our love of data and desire to help organizations optimize their data investments. We see ourselves as a family of continuous learners, interpreting the world of data and analytics for you.
Get more value from your data. Put an expert on your side. Learn what Eckerson Group can do for you!