Hot off the Presses - Data Democratization, Data Products, Semantic Layer, Data Modeling, and Generative AI

Eckerson Group

Get More Value From Your Data

Published: December 8, 2023

1. Data Democratization and the Duties of Data Citizenship

By Jay Piscioneri

For decades we’ve tried to empower enterprise stakeholders with data to spot problems and opportunities and make better decisions about how to respond. Data democratization is the latest buzzword to describe this elusive goal. While there have been advances in data management, governance, and analytics, something keeps getting in the way of achieving data democratization.

The solutions that data industry vendors and analysts often propose are about technical components and approaches, such as data platforms, architectures, formats, programming languages, and now generative AI. This makes sense because we need new technical approaches to address the massive scale and complexity of modern data. But technical solutions, while necessary, are not sufficient.

The Human Factor

One constant barrier to data democratization receives far less attention: the human factor. Thomas Jefferson wrote about the human factor in fostering political democracy:

“Whenever the people are well-informed, they can be trusted with their own government.”

Jefferson said that for democracy to work, the citizenry must have a basic level of education and be well-informed on the issues of the day. The same is true of a data democracy. For data to empower people, people must understand how to use it. They must know the conclusions they can draw from it, the suppositions it does not support, and most importantly, how to care for data to protect its quality and secure it from theft and misuse.

>Continue reading here

2. Data Products Part II: Data Products Require Product Thinking

By Wayne Eckerson

According to a recent LinkedIn poll I conducted, 54% of respondents said their organization has implemented data products. That’s a surprisingly high percentage, and it led me to ask how people define a data product. As I said in my prior blog, a data product is not a data asset with better quality or governance. It’s the output of a data product organization that oversees the complete lifecycle of a data product and manages its availability in an automated way.

The organizations that have implemented bona fide data products say that the biggest challenge is creating a “product mindset”. This is ironic because all commercial organizations have product teams that define and package products and services. But product thinking and experience are almost non-existent among data and analytics professionals. Hence, adopting a data product strategy is an uphill climb for most data teams.

Adopting a data product strategy is an uphill climb for most data teams.

Product Thinking

So, how do you instill a product mindset among data people? The chief data officer can’t just declare that the data team is now a product organization. They can’t simply appoint people to serve as product managers and owners and expect that a product culture will grow in alien soil. Without ample education and coaching, data professionals and the businesspeople they support will simply resort to old ways of doing things. If there is a product team, it will be in name only.

Here are a few tips to nurture a product mindset and develop a bona fide product organization:

Seed product experts on teams. The best way to create a product culture is to hire or assign people who already have a product mindset and know what product teams do and don’t do. Of course, this can be prohibitively expensive if you have more than a handful of domain teams interested in producing data products. To reduce costs, you could steal product managers from elsewhere in your company, but you won’t make many allies this way.
Train and coach. You can also hire a product expert to train and coach newly appointed product managers. The coaching element is critical since new product managers will need guidance long after they’ve completed their training. To maintain a product sensibility, they will need constant reminders and hints to keep from falling back on old habits.
Centralize. Some organizations create a central product team that fields requests from business domains, creates product plans, and oversees the execution of the data products. A centralized approach addresses the resource constraint, but it creates a potential bottleneck that could hamper the development and delivery of data products at a pace that meets organizational needs.

To instill a product mindset, you should seed product experts on teams, train and coach new product managers, and create a central product team.

The good news is that creating a product organization is not rocket science. It doesn’t take an inordinate amount of time to train and coach product managers. However, it’s imperative that the organization properly defines roles and processes to support the management and marketing of data products.

>Continue reading here

3. Why and How to Enable Data Science with an Independent Semantic Layer

By Kevin Petrie

Metrics

The semantic layer presents metrics such as average revenue per sales rep by region, operating costs per factory, or annual growth in unit sales per customer. These metrics describe market trends and business performance as part of BI projects, and also serve as features for ML models as part of data science projects. For example, data scientists might identify and refine features such as annual sales per customer, transaction size vs. average, or historical market prices. The semantic layer then presents these metrics and their values to ML models that segment customers, detect fraudulent transactions, predict market prices, and so on.
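To make the idea concrete, here is a minimal sketch of a metric registry that serves the same definitions to a report and to an ML feature vector. It is an illustration only, not a real semantic-layer product: the `Metric` class, the `SEMANTIC_LAYER` registry, the `feature_vector` helper, and the sample transactions are all hypothetical names invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

Row = dict  # one sales transaction, e.g. {"customer": "acme", "amount": 120.0}


@dataclass(frozen=True)
class Metric:
    """A named, documented metric definition (hypothetical structure)."""
    name: str
    description: str
    compute: Callable[[list], dict]  # rows -> {customer: value}


def annual_sales_per_customer(rows):
    # Sum of transaction amounts for each customer.
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def avg_transaction_size(rows):
    # Mean transaction amount for each customer.
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row["customer"]] += row["amount"]
        counts[row["customer"]] += 1
    return {customer: sums[customer] / counts[customer] for customer in sums}


# One shared registry: BI reports and ML pipelines read the same definitions.
SEMANTIC_LAYER = {
    m.name: m
    for m in [
        Metric("annual_sales_per_customer", "Total sales per customer",
               annual_sales_per_customer),
        Metric("avg_transaction_size", "Mean transaction amount per customer",
               avg_transaction_size),
    ]
}


def feature_vector(customer, rows, metric_names):
    """Look up each metric by name and return its value for one customer."""
    return [SEMANTIC_LAYER[name].compute(rows).get(customer, 0.0)
            for name in metric_names]


transactions = [
    {"customer": "acme", "amount": 120.0},
    {"customer": "acme", "amount": 80.0},
    {"customer": "globex", "amount": 50.0},
]

features = feature_vector(
    "acme", transactions,
    ["annual_sales_per_customer", "avg_transaction_size"],
)
print(features)  # [200.0, 100.0]
```

Because the metric logic lives in one place, a dashboard and a customer-segmentation model that both ask for `avg_transaction_size` get the same number.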

The semantic layer presents metrics that serve as features for machine learning models

Caching

The semantic layer pre-fetches high-priority metrics into cache (i.e., memory) along with their supporting tables, columns, and records. While caching can push up cloud costs, it also reduces latency for real-time ML use cases such as fraud prevention or customer recommendations. Use cases like these might have an ML platform that uses a semantic layer to pre-fetch metrics and calculate feature values in memory based on inputs from a streaming data pipeline. The ML model then uses those feature values to produce its real-time predictions or recommendations. Caching plays a critical role in performance when metrics and features derive from distributed, even far-flung data sources.
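The pre-fetch pattern described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: `MetricCache`, `fetch_avg_transaction`, the time-to-live policy, and the sample events are hypothetical, and a production semantic layer would handle invalidation, memory limits, and concurrency that this sketch ignores.

```python
import time


class MetricCache:
    """In-memory cache of metric values with a simple time-to-live (hypothetical)."""

    def __init__(self, fetch, ttl_seconds=60.0):
        self._fetch = fetch          # function that reads a metric from its source
        self._ttl = ttl_seconds
        self._store = {}             # key -> (value, expiry time)
        self.source_calls = 0        # counts trips to the underlying source

    def _load(self, key):
        self.source_calls += 1
        value = self._fetch(key)
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value

    def prefetch(self, keys):
        # Warm the cache for high-priority keys before requests arrive.
        for key in keys:
            self._load(key)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]          # cache hit: no trip to the source
        return self._load(key)       # miss or expired: reload from the source


# Stand-in for a slow, possibly remote data source (assumed values).
AVG_TRANSACTION = {"acme": 100.0, "globex": 50.0}


def fetch_avg_transaction(customer):
    return AVG_TRANSACTION[customer]


cache = MetricCache(fetch_avg_transaction)
cache.prefetch(["acme", "globex"])


def size_vs_average(event):
    """Feature: incoming transaction size relative to the customer's average."""
    return event["amount"] / cache.get(event["customer"])


# Events as they might arrive from a streaming pipeline.
stream = [
    {"customer": "acme", "amount": 250.0},
    {"customer": "globex", "amount": 25.0},
]
features = [size_vs_average(event) for event in stream]
print(features)            # [2.5, 0.5]
print(cache.source_calls)  # 2 (only the two prefetch loads)
```

The trade-off mentioned in the text shows up directly: the cache holds values in memory, which costs resources, but each streamed event is scored without another round trip to the source.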

Caching speeds the processing of ML features that derive from far-flung data sources

>Continue reading here

4. A Fresh Look at Data Modeling Part 1: The What and Why of Data Modeling

By Dave Wells

Many organizations abandoned the practices of data modeling as they shifted from data management practices of the past to adopt big data, data lake, and NoSQL technologies. Past practices focused on relational data and were typically relegated to logical and physical design to develop new databases. Today’s data modeling has a much larger scope driven by many factors. These include advances in analytics and data science, rapid growth in the volume and variety of data, a shift from primarily working with enterprise-generated data to acquiring lots of external data, semantic disparity of operational data as operational systems become predominantly SaaS applications, and the pursuit of data lakes and NoSQL technologies.

These factors influence data modeling practices in three significant ways: (1) modeling to understand content and structure of existing and acquired data as well as modeling to design new databases, (2) semantic and conceptual modeling as well as logical and physical modeling, (3) modeling for all types of data including key-value, document-oriented, knowledge graphs, property graphs, etc.

With those differences in mind, my goal with this article is to make the case that data modeling is not dead. It is more important than ever before. And it is more interesting than ever before.

The Data Modeling Process

Data modeling is the process of constructing data models. That simple definition expresses the reality, but not the complexities of data modeling. It is important to recognize that a data model is more than a diagram. It is a description of the content and structure of a collection of data. That means a diagram (or set of diagrams) supported with descriptive text and definitions. Furthermore, it is a description of the content and structure of a collection of data from a particular perspective – semantic, business, system, or technical perspective. Those perspectives partially align with the multiple levels of data modeling that have been practiced for decades. (See figure 1.)

Figure 1. Levels of Data Modeling Past and Present

>Continue reading here

5. Generative AI Needs Vigilant Data Cataloging and Governance

By Kevin Petrie

What is GenAI?

GenAI refers to a type of artificial intelligence that generates digital content such as text, images, or audio after being trained on a corpus of existing content. The most broadly applicable form of GenAI centers on a language model (LM), which is a type of neural network whose interconnected nodes collaborate to interpret, summarize, and generate text. OpenAI’s release of ChatGPT, built on its GPT-3.5 model, in November 2022 triggered an arms race among LM innovators. Google released Bard, Microsoft integrated OpenAI code into its products, and GenAI specialists such as Hugging Face and Anthropic gained new prominence with their LMs.

Now things get tricky

Companies are embedding LMs into their applications and workflows to boost productivity and gain competitive advantage. They seek to address use cases such as customer service and document processing based on their own domain-specific data, especially natural-language text. But text files introduce risks related to data quality, fairness, and privacy. They can cause LMs to hallucinate, propagate bias, or expose sensitive information unless they are properly cataloged and governed.

Data teams, more accustomed to database tables, must get a handle on governing all these PDFs, Google Docs, and other text files to ensure GenAI does more good than harm. And the stakes run high: 46% of data practitioners told Eckerson Group in a recent survey that their company does not have sufficient data quality and governance controls to support its AI/ML initiatives.

Data teams need to govern the natural-language text that feeds GenAI initiatives

Enter the data catalog

The data catalog has long assisted governance by enabling data analysts, scientists, engineers, and stewards to evaluate and control datasets in their environment. It centralizes a wide range of metadata—file names, database schemas, category labels, and more—so data teams can vet data inputs for all types of analytics projects.

Modern catalogs go a step further to evaluate risk and control usage of text files for GenAI initiatives. This helps data teams fine-tune and prompt their LMs with inputs that are accurate, explainable, private, IP-friendly, and fair. (See figure 1.)
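As a rough illustration of what such a control might look like, the sketch below filters text files by catalog metadata before they reach an LM. Everything in it is an assumption made for the example: the `CatalogEntry` fields, the `approved_for_genai` rule, the quality threshold, and the file paths are hypothetical and do not describe any particular catalog product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata a catalog might hold for one text file (hypothetical fields)."""
    path: str
    owner: str
    contains_pii: bool     # privacy risk flag
    license_ok: bool       # intellectual property cleared for this use
    quality_score: float   # 0.0 to 1.0, assumed to come from a steward's review


def approved_for_genai(entry, min_quality=0.8):
    """Return (approved, reasons) for using a file to fine-tune or prompt an LM."""
    reasons = []
    if entry.contains_pii:
        reasons.append("contains PII")
    if not entry.license_ok:
        reasons.append("license not cleared")
    if entry.quality_score < min_quality:
        reasons.append("quality below threshold")
    return (not reasons, reasons)


catalog = [
    CatalogEntry("policies/handbook.pdf", "hr", False, True, 0.93),
    CatalogEntry("support/tickets.txt", "support", True, True, 0.88),
    CatalogEntry("vendor/whitepaper.pdf", "marketing", False, False, 0.90),
    CatalogEntry("wiki/old_notes.txt", "it", False, True, 0.40),
]

approved = [e.path for e in catalog if approved_for_genai(e)[0]]
rejected = {e.path: approved_for_genai(e)[1]
            for e in catalog if not approved_for_genai(e)[0]}

print(approved)  # ['policies/handbook.pdf']
print(rejected)
```

Recording the reasons alongside each rejection gives stewards an audit trail, which supports the explainability goal mentioned above.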

Figure 1. Data Catalog Controls for Gen AI

>Continue reading here

About Eckerson Group

Eckerson Group is a global research and consulting firm that focuses solely on data analytics. Our experts have substantial experience in data analytics and specialize in data strategy, data architecture, data management, data governance, data science, and data analytics.

Our clients say we are hard-working, insightful, and humble. It stems from our love of data and desire to help organizations optimize their data investments. We see ourselves as a family of continuous learners, interpreting the world of data and analytics for you.

Get more value from your data. Put an expert on your side. Learn what Eckerson Group can do for you!
