Enterprise Data World 2024 Takeaways: Trending Topics in Data Management. Part 3

I was privileged to deliver a workshop at Enterprise Data World 2024.

Publishing this review is a way to express my gratitude to the fantastic team at DATAVERSITY and Tony Shaw personally for organizing this prestigious live event.

Part 2 of this article discussed the trending topics in data architecture and modeling.

Part 3 will discuss the key trends in applying artificial intelligence to data management practices and developments in other areas of data management, such as data quality, master and metadata management, and data visualization.

Artificial Intelligence in Data Management

Artificial intelligence (AI), generative AI, and machine learning have become buzzwords in the data management community. First, let's look at their definitions. Many approaches exist to define these disciplines and their relationships. In this article, I will use the definitions from reliable sources. However, they may not be the “only correct ones.”

Gartner states, “Artificial intelligence (AI) applies advanced analysis and logic-based techniques, including machine learning, to interpret events, support and automate decisions, and take actions.”

“Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio, and synthetic data,” as TechTarget defines it.

IBM defines machine learning (ML) as “a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to enable AI to imitate the way that humans learn, gradually improving its accuracy.”

These three technologies are interrelated. First, it is essential to realize that GenAI and ML are types of artificial intelligence.

AI includes all forms of computational techniques that mimic human abilities.

Machine learning is a technique through which systems learn and make data-based decisions. ML is based on statistical methods and is used in other AI types, including GenAI.

GenAI is a specialized application of ML in which the algorithms do not just make decisions or classifications but produce new data instances that mimic the characteristics of the training data they have learned from. These models capture the essential patterns of the data and utilize this knowledge to generate synthetic data samples similar to the original dataset.
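
The idea of learning the patterns of a dataset and then producing new, similar samples can be illustrated with a deliberately tiny sketch. It is not how GenAI models work internally: it swaps in a plain normal distribution as a stand-in for a generative model, and the data set `order_amounts` and both function names are hypothetical.

```python
import random
import statistics

def fit_gaussian(values):
    """Learn the two parameters of a normal distribution from the training data."""
    return statistics.mean(values), statistics.stdev(values)

def generate_synthetic(values, n, seed=0):
    """Draw n new samples from the fitted distribution.

    The samples are newly generated values, not copies of the training data.
    """
    mu, sigma = fit_gaussian(values)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical training data: a handful of order amounts.
order_amounts = [102.5, 98.0, 110.2, 95.4, 105.1, 99.9, 101.3]
synthetic = generate_synthetic(order_amounts, n=1000)
```

The synthetic values share the mean and spread of the original data without repeating its records, which is the property the paragraph above describes. Real generative models capture far richer patterns than two summary statistics.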

According to Roochir Purani, GenAI's key use cases and deliverables are “written content augmentation and creation, questions answering and discovering, language tone, summarization, simplification, classification of content for specific use cases, chatbot performance improvement, software coding, and synthetic data.”

Somehow, we have created a fetish for GenAI, hoping it will solve our challenges in managing data properly. Let me demonstrate a simple example: I want to visualize the relationship between AI, GenAI, and ML described above.

I checked various graphical options on Google and then asked ChatGPT to visualize this simple relationship. Figure 1 compares the results of human and GenAI work.

Figure 1: Comparison between human and GenAI visual representation of dependencies between AI, GenAI, and ML.

I am curious: which of the images appeals to you? I think you get my point about fetishizing GenAI capabilities.

So, let us be realistic about the role of AI and its sub-areas in data management. AI is a technology that can be used in different data management IT tools. AI is not a unique feature that can be used independently from other functionalities. AI empowers different data management capabilities and processes.

So, let me share some takeaways:

Data management capabilities can be reinforced by using AI technologies.

Several experts (Stephanie Paradis, Gretchen Burnham, CDMP, Dr. Gurpinder S. Dhillon, Aron Elston, and Yves Meier) shared their expertise in this area. According to them, AI can be used to empower the following DM capabilities:

  • Data governance

AI can automate stewardship activities and assist in content analysis and summarization.

  • Metadata management

AI can assist in documenting and aligning technical and business metadata, identifying dependencies between different metadata objects, and generating metadata.

  • Data modeling and architecture

AI can help create business glossaries, document and link logical and technical data models, build data classifications, and integrate and aggregate data from various sources.

  • Data quality

AI can be used in data profiling, translating data quality requirements written in business language into data quality checks in technical languages, recommending DQ rules based on data source systems scans, and cleansing data.

  • Data lifecycle management

AI and ML can optimize the processing and storage of large data volumes using distributed computing, data compression, and predictive caching techniques. These technologies can also support real-time processing: ingesting, analyzing, and acting on data as it arrives.

  • Other use cases

Roochir Purani predicts the following emerging GenAI use cases: “producing synthetic data helping augment scarce and incomplete data,” “enterprise applications proactively suggesting actions based on historical transaction data,” “encapsulating and modernization of legacy systems code,” “assistance role in managing complex projects,” “improving process workflows,” and “negotiating contracts and optimizing bids.”
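
To make the data quality use case in the list above more concrete, here is a minimal sketch of what a business-language requirement might look like once it has been translated into a technical check. The translation step itself (the part an AI assistant would perform) is not shown. The rule wording, field names, and sample records are all hypothetical.

```python
import re

# Simplified email shape check; an assumption, not a full RFC-compliant validator.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def email_is_valid(record):
    email = record.get("email") or ""
    return EMAIL_PATTERN.fullmatch(email) is not None

def amount_is_non_negative(record):
    amount = record.get("amount")
    return amount is not None and amount >= 0

# Each business-language requirement is paired with its technical check.
RULES = {
    "Customer email must be filled in and well-formed": email_is_valid,
    "Order amount must not be negative": amount_is_non_negative,
}

def run_checks(records, rules=RULES):
    """Return, per rule, the share of records that pass (a simple DQ score)."""
    if not records:
        return {}
    return {
        description: sum(1 for record in records if check(record)) / len(records)
        for description, check in rules.items()
    }

records = [
    {"email": "ann@example.com", "amount": 120.0},
    {"email": "", "amount": 35.5},
    {"email": "bob@example", "amount": -10.0},
    {"email": "eve@example.org", "amount": 0.0},
]
scores = run_checks(records)
```

Keeping the business wording as the key of each rule preserves the link between the requirement and its implementation, which is also a small piece of metadata in its own right.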

AI requires governance.

According to Stephanie Paradis and Gretchen Burnham, CDMP, data governance should focus on defining business cases, curating data sets taken for training, controlling synthetic data sets for production, and controlling model governance.

Douglas R. Briggs has stressed that good governance for AI should “balance support for innovation with risk and impact,” take into account the concerns of a “broad spectrum of interested parties,” “provide clear and effective guidance for practitioners,” integrate with “existing organizational governance,” and “remain flexible and agile to adapt.”

Challenges in leveraging AI models exist.

Roochir Purani has mentioned the following challenges related to AI adoption: “AI projects are not always aligned with business strategy,” “failing to consider what tangible capabilities AI projects need,” “misunderstanding the probabilistic nature of AI,” and “articulating what business value they want AI to create, but not recalibrating organizational behavior in ways that will deliver value with the humans who interact with AI.”

Adopting AI impacts multiple business capabilities/operations.

These include business strategy, the operating model, enterprise architecture, engineering and operations, and change management. AI adoption also impacts people, processes, and technology.

Future of AI in data management

Sonny Rivera believes, “We need to stop micro-optimizing archaic processes and start reimagining them in a GenAI world – across the whole data-to-insight value chain.”

I want to thank the leading experts in this area for their valuable input: Stephanie Paradis, Gretchen Burnham, CDMP, Jenna Alden and Alex Cruikshank from WestMonroe, Douglas R. Briggs, Dr. Gurpinder S. Dhillon, Aron Elston & Yves Meyer, Roochir Purani, Mike King, and Sonny Rivera.

Other Data Management Capabilities

Even when discussing fancy stuff about AI, we must still be down to earth and focus on applying and improving foundational data management (DM) capabilities.

Let me share with you some takeaways from the conference that relate to core DM areas of expertise.

Data quality

  • According to C. Lwanga Yonke, the key challenges to improving data or information quality include inefficient collaboration between data producers and consumers, decentralized data administration, and DQ being considered an IT task. As Teri Hinds, CDMP, stated, another reason can be that DQ means different things to different people.
  • Multiple frameworks and methodologies exist to establish DQ management. It was interesting to see that three presenters described DQ using different concepts (e.g., activities or capabilities) and had pretty different viewpoints on the content of this capability.
  • Establishing a data quality business function is one way to improve data quality. This means investments in people, processes, and technology developments.
  • Establishing data (quality) management is a must for financial institutions due to the need to comply with regulations. Risk management becomes an important component of overall data management, as demonstrated by Gerard K.
  • Managing data quality in cloud environments and in an agile-oriented culture has its own specifics.

Master Data Management

The DAMA Dictionary defines master data as “The data that provides the context for business activity data in the form of common and abstract concepts related to this activity. It includes the details (definitions and identifiers) of internal and external objects involved in business transactions, such as customers, products, employees, vendors, and controlled domains (code values.)”

DAMA-DMBOK2 separates master and reference data. However, in practice, I’ve seen situations where professionals combine these two data types because distinguishing them is challenging.

In my practice, I have also experienced some other challenges. Sometimes, the same data (e.g., a contract) can be identified as master or transactional, depending on an organization’s business model. I also can’t see the difference between master data management and the management of other data. The data management techniques and required capabilities are the same as those applied to any data type. Of course, the outcomes, like data architecture, may differ.

Let’s come back to the key conference takeaways:

  • According to Donna Burbank, a successful MDM initiative requires alignment between data architecture, data governance and stewardship, and business processes. MDM must be considered in the context of a more comprehensive data management strategy. I believe all other data management capabilities, such as data quality, metadata management, and other types of enterprise architecture, also enable MDM.

Metadata management

My general observation is that the topic of metadata was overlooked at the conference. This may happen because people don’t realize the role and importance of metadata. One reason is the complexity of the metadata concept. For example, unlike other data, metadata can be presented by a single element (e.g., a data owner) or a complex construct (e.g., data lineage).

In my workshop, I demonstrated that metadata management has two key goals: enable the data lifecycle and manage the metadata lifecycle. Most data management capabilities, like data governance, enterprise architecture, data quality, etc., produce, exchange, and consume three key types of metadata: business, technical, and operational. Various metadata constructs combine different types of metadata. For example, data lineage combines business and technical metadata, sometimes enriched by operational metadata. The data observability concept combines all three metadata types.

I have included knowledge graphs, discussed in several presentations, as a metadata-related topic.

Gartner states, “Knowledge graphs are machine-readable representations of the physical and digital worlds. They include entities (people, companies, digital assets) and their relationships, which adhere to a graph data model -- a network of nodes (vertices) and links (edges/arcs).”
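
The graph data model in this definition can be sketched in a few lines. The example below is only an illustration: it stores relationships as labeled edges between named nodes and follows one relationship type to trace lineage. The class, the relationship labels (`feeds`, `owned_by`), and the data asset names are hypothetical and do not come from any presentation or product.

```python
from collections import defaultdict

class KnowledgeGraph:
    """A tiny directed graph: nodes are entities, edges are labeled relationships."""

    def __init__(self):
        self._edges = defaultdict(list)  # subject -> [(relation, object), ...]

    def add(self, subject, relation, obj):
        self._edges[subject].append((relation, obj))

    def neighbors(self, subject, relation=None):
        """Objects linked from subject, optionally filtered by relation label."""
        return [o for r, o in self._edges.get(subject, [])
                if relation is None or r == relation]

    def downstream(self, start, relation):
        """All nodes reachable from start by repeatedly following one relation."""
        seen, stack = [], [start]
        while stack:
            node = stack.pop()
            for nxt in self.neighbors(node, relation):
                if nxt not in seen:
                    seen.append(nxt)
                    stack.append(nxt)
        return seen

kg = KnowledgeGraph()
kg.add("crm.customers", "feeds", "dwh.dim_customer")      # technical metadata
kg.add("dwh.dim_customer", "feeds", "report.churn")       # technical metadata
kg.add("dwh.dim_customer", "owned_by", "Sales Data Steward")  # business metadata

lineage = kg.downstream("crm.customers", "feeds")
# lineage == ['dwh.dim_customer', 'report.churn']
```

Because ownership and data flow live in the same structure, a single graph can answer both "where does this data go?" and "who is responsible for it?", which is how lineage ends up combining technical and business metadata.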

Let me share a couple of takeaways on this topic:

  • The EDM Council, represented by Elisa Kendall, promotes adopting data content standards to promote innovation across industries. It does this by developing and standardizing industrial ontologies.

  • Knowledge graphs enable data integration via semantic layers.

  • According to Dan Collier and Jeremy Debattista, implementing knowledge graphs requires a mindset adjustment to embrace data “as a valuable asset, one that could fuel growth and success.” This includes training staff, preparing data architecture, integrating and interlinking data assets, and improving data quality.
  • Knowledge graphs enhance the existing business processes, allow for the representation of diverse data sources, relationships, and metadata, help map models of business domains, create a foundation for data governance, and ensure data processing transparency by documenting data lineage.

Data visualization

Data visualization represents information in graphical format using charts, graphs, maps, and other visual tools.

Let’s discuss a couple of takeaways.

  • Michael Scofield demonstrated different data visualization techniques that people can use.

He stated that graphics have several goals: “to figure out what is going on” and “to explain to decision-makers what they need to know about reality.” Data visualization helps “see things that other people cannot” and provides “unique insights and exclusive understanding of what’s happening.”

Knowing the audience and normalizing data to acknowledge context are the key success factors in expressing information.

According to them, data visualization is a dynamic, human-centered, analytical, and discovery process that requires multiple methods. Three factors ensure good data visualization: data, design, and function.

Advanced data visualization is focused on complex data and includes interactive dashboards, 3D visualization, augmented or virtual reality, etc.

The benefits of advanced data visualization are improved operational efficiency, enhanced risk management, increased collaboration, reduced costs, and improved decision-making.

Data Analytics

According to Amazon, “Data analytics converts raw data into actionable insights. It includes a range of tools, technologies, and processes used to find trends and solve problems using data. Data analytics can shape business processes, improve decision-making, and foster business growth.”

Gartner recognizes four maturity levels of data analytics: descriptive, diagnostic, predictive, and prescriptive. While many companies are still at the first and perhaps second maturity levels, the ultimate goal is to reach the upper levels. However, predictive and prescriptive analytics are linked to the AI capabilities we discussed before.

Prashanth H Southekal, PhD, MBA shared his insights regarding predictive analytics in business. The key takeaways were:

  • There are several insight sources: intuition, science, data, and analytics. Four key components in data analytics are algorithms, data, assumptions, and ethics.
  • Predictive analytics makes predictions about the future with historical data.
  • Five key criteria for selecting data analytics projects are improving business performance, practicality, relevance, the applicability of data analytics concepts, implementation, change management, and quantifiable business impact.
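
The statement that predictive analytics makes predictions about the future from historical data can be shown with the simplest possible model: a straight trend line fitted by ordinary least squares. This is a minimal sketch rather than a method presented at the conference, and the monthly sales figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: sales for months 1 through 6.
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

slope, intercept = fit_line(months, sales)

# Predict the next, not yet observed, month from the fitted trend.
forecast_month_7 = slope * 7 + intercept
```

The model embeds an assumption that the past trend continues, which is exactly why assumptions appear alongside algorithms and data in the list of key components above.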

I want to thank the leading experts in this area for their valuable input: C. Lwanga Yonke, Teri Hinds, CDMP, Gerard K., Missy Clymer & Nathan Schmitt, Christina Smith, Prakash Kewalramani, Donna Burbank, Louis Crook & Mike Overholt (Ashley), Elisa Kendall, Dan Collier, Jeremy Debattista, Michael Scofield, Kulev Rail & Dr. Deepak Singh & Dr. Arunkumar Ranganathan from Infosys, and Prashanth H Southekal, PhD, MBA.

Part 3 is the last part of this article. In conclusion, I strongly advocate for the invaluable experience of engaging in face-to-face interactions with leading data management experts at live conferences. These interactions are not only an excellent opportunity to acquire in-depth knowledge but also to stay abreast of the latest industry trends and innovations. Additionally, they offer a unique platform to gather fresh ideas that can be effectively implemented within your organization, driving growth and fostering innovation.


Kaneshwari Patil

Marketing Operations Associate at Data Dynamics

6 months ago

Your breakdown of AI, GenAI, and ML is incredibly informative. It's refreshing to see a clear explanation of these interconnected technologies and their applications in data management.

C. Lwanga Yonke

Information Quality and Data Governance Consultant, Trainer, Advisor, Coach, Mentor

6 months ago

Valuable summary, Dr. Irina Steenbeek! Also, thank you for the mention.

Jeremy Debattista

Data Governance | Knowledge Graphs | Architect | Engineer

6 months ago

Thank you for the mention and another interesting article
