Enhancing Semantic Clarity in Data Modelling for Data Managers

Executive Summary

This webinar highlights the critical challenges and opportunities in business vocabulary development, emphasising the need for clear business definitions within organisations to enhance Data Management. Howard Diesel addresses the management of business terminology in Data Modelling, the importance of semantic clarity, and distinguishes between frameworks such as DCAM and DMBoK.

Additionally, he underscores the significance of Reference and Master Data, the role of data stewardship, and the interplay between ontology, taxonomies, and semantic clarity in data interpretation and integration. The webinar also points to the necessity of addressing knowledge gaps, implementing normative guidelines, and ensuring Data Quality as essential components for effective knowledge management in the evolving landscape of artificial intelligence, machine learning, and Data Governance.


The Challenges of Business Vocabulary Development

Howard Diesel opens the webinar on the challenges of developing a cohesive Business Glossary from the disparate terminologies used in banking and insurance. He adds that these challenges become particularly acute during the integration process mandated by BCBS 239. Howard then shares his own experiences from multiple projects, highlighting a specific instance at the central bank where they were tasked with defining 900 lines of balance sheet terminology.

Frustration and slow progress in developing the Business Glossary led to a revealing conversation with a colleague from the insurance side. The colleague explained to Howard that the reluctance to agree on definitions stemmed not only from fundamental differences but also from the significant work required to alter existing systems and terminologies across the various insurance companies.

Howard moves on to share insights gained from a course with Mark Atkins and Terry Smith, which focused on writing business definitions and emphasised the importance of aligning data architecture with business architecture. He advocates for the critical role of information objects in developing a business architecture reference model, challenging the complexity and disagreements surrounding definitions such as "student" and "course."

Figure 1 Semantic Clarity & Consistency

Addressing the Issue of Clear Business Definitions in a Company

An attendee shares their experience in a previous role where they encountered a crisis at a company struggling to define "churn." The sales and marketing departments had conflicting definitions, causing confusion. To resolve this, the attendee proposed creating two distinct terms: "sales churn" and "marketing churn." However, the CEO insisted on a single churn metric for the dashboard. After the attendee discussed the issue with the CEO and illustrated the value chain that included the various departments, the CEO acknowledged the need for clarity in each area.

The attendee notes that, after reaching an agreement with the CEO, they faced another technical challenge: there was only one database field for churn. They recommended creating separate fields for sales and marketing churn and ensuring proper integration for accurate reporting. This approach successfully addressed the problem, leading to satisfaction across the departments involved.

Managing Business Terminology in Data Modelling

The attendee then shares their experience of managing inconsistent definitions across various organisations, which brings up the challenges of establishing a standardised naming convention for data terminology, particularly within ISO frameworks.

The complexity of integrating qualifiers—such as marketing and sales—into terms like "churn" while maintaining clarity is emphasised. They acknowledge the difficulties faced during a project that consolidated three affiliate databases, highlighting the ongoing struggle with Data Quality and definition consistency among diverse stakeholders. Howard returns to Mark and Terry's approach of forming communities or tribes to reach consensus on terminology as a successful strategy, reinforcing the importance of engagement and stakeholder management in overcoming these challenges.

Semantic Clarity in Organisations

Howard moves on to focus on the significance of measuring semantic clarity within organisations, highlighting the existing challenges of semantic confusion. He echoes the attendee's earlier point, emphasising the need for a feedback mechanism to validate and score terminology and definitions.

A key challenge posed to the group is the lack of established metrics for assessing semantic clarity, with a call for examples of attempts to quantify this aspect, such as rating it on a scale of 1 to 10. The importance of these measurements is reiterated, particularly in fostering a clearer understanding within the organisation.

Figure 2 Semantic Clarity & Consistency Management
Figure 3 Why Semantic Clarity Matters

Differences between DCAM and DMBoK

There is ongoing confusion regarding the comparison between the DCAM (Data Management Capability Assessment Model) and the DMBoK (Data Management Body of Knowledge) maturity models. While both frameworks are designed to advance Data Governance practices, they differ significantly in their structure and target audiences. The DCAM is more suited for executive-level discussions, as it focuses on high-level metrics and strategic KPIs, making it less practical for implementation and management purposes at deeper operational levels. This distinction complicates direct comparisons, suggesting that they serve different needs in the Data Governance landscape.

Figure 4 Examples of Semantic Confusion

Figure 5 Comparing DCAM and DMBoK

The Importance of Reference and Master Data

Howard shares that he recently completed his DCAM training. He mentions a question from one of his students about the lack of recognition for Reference and Master Data in DCAM compared to DMBoK. Howard notes that the DMBoK does not provide a ranking of processes but includes a chapter on maturity models. In contrast, DCAM offers a structured approach with eight components, 39 capabilities, and 138 sub-capabilities for assessing an organisation's performance. This discrepancy leads to confusion, as some organisations mistakenly shift from DMBoK to DCAM or vice versa without recognising that DCAM leverages Data Governance, Data Quality, and Metadata fundamentals to assess Reference and Master Data capabilities. Howard notes that, ultimately, misunderstandings stemming from visualisations and diagrams contribute to these issues within the industry.

Role and Responsibilities of Data Stewardship

There has been confusion regarding the definitions of key roles in Data Management, particularly the terms "data steward" and "data owner." The DSBoK, developed by E-Learning Curve and CIMP, defines a data steward as a leadership position within a company that holds accountability. However, this definition has sparked debate, with some, including Veronica, suggesting that the term should refer more to subject matter experts or data owners—individuals who are accountable for decisions regarding data. This lack of consensus on terminology has created ongoing confusion within the organisation, highlighting one of the primary goals of DCAM, which was to establish a clear and unified understanding of these roles.

Figure 6 Owner or Steward?
Figure 7 What is a Data Steward?
Figure 8 Data Stewardship

The Differences and Concepts in Data Modelling and Machine Learning

Howard moves on to the distinction between Data Models and machine learning models, highlighting the confusion that can arise from the use of the term "model" in various contexts. A Data Model serves as a blueprint or map to understand data structure, whereas a machine learning model provides a set of instructions or a recipe for making decisions. Additionally, he discusses the concept of normativity and the challenges that arise in communication when vague terms are used, like giving directional instructions without clarity.

To enhance semantic precision, Howard emphasises the need for controlled vocabularies, differentiating between glossaries, thesauri, enterprise Data Models, taxonomies, and ontologies, with taxonomies and ontologies offering stronger semantics than enterprise Data Models. The goal is to improve clarity in definitions and the meaning of words by employing various expression methods.

Figure 9 Why a Data Steward?
Figure 10 "Let's develop a MODEL"
Figure 11 "Explain to a 5-year-old"
Figure 12 Examples of Semantic Confusion

Figure 13 Normativity

Importance of Ontology and Semantic Clarity in Data Interpretation

Integrating ontology enhances inferences and reasoning, offering added semantics and syntax for improved semantic clarity. Howard emphasises the importance of developing a Business Glossary and leveraging machine-readable ontologies to ensure information remains current and easily maintainable. He then highlights the necessity of using simple and precise language, providing contextualisation, and continuously validating definitions. Notably, the framework proposed by Mark Atkins and Terry Smith suggests writing business definitions as single, concise sentences that clarify concepts, such as defining an "employee" as a "person" under a classification rule. This structured approach facilitates clearer discussions and agreements, making it easier to build upon foundational statements rather than getting lost in complex paragraphs.
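The single-sentence definition pattern can be sketched in code. Below is a minimal, illustrative Python sketch; the class name, field names, and rendering rule are assumptions for illustration, not Atkins and Smith's actual notation:

```python
from dataclasses import dataclass

@dataclass
class StructuredDefinition:
    """One business term defined as a single, concise sentence."""
    term: str           # the concept being defined, e.g. "employee"
    parent: str         # the broader classifier it specialises, e.g. "person"
    distinguisher: str  # the clause separating it from other kinds of parent

    def render(self) -> str:
        # "An employee is a person who ..." - one sentence, easy to debate and agree on
        article = "An" if self.term[0].lower() in "aeiou" else "A"
        return f"{article} {self.term} is a {self.parent} {self.distinguisher}."

employee = StructuredDefinition(
    term="employee",
    parent="person",
    distinguisher="who works for the organisation under a contract of employment",
)
print(employee.render())
```

Because each definition is a single foundational statement, stakeholders can agree on the classification ("an employee is a person") before debating the distinguishing clause.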

Figure 14 Semantic Chain - Looking for More Meaning
Figure 15 Semantic Clarity Can Be Achieved By

Importance of Exclusivity in Business Rules

Howard illustrates the challenge of exclusivity in defining objects within a business context with the example of an employee who can simultaneously be a client and be linked as a partner to another client. He emphasises the importance of establishing clear business rules regarding exclusivity to prevent ambiguity, suggesting that specific definitions and conditions should be documented to ensure adherence and avoid any potential loss of crucial information.

Understanding the Six Friends

Confusion can arise around defining the concept of "Premium" within a business context. This is why Howard highlights the importance of defining key terms through a framework called the "Six Friends." This framework analyses six different perspectives of the term, including interpretations such as "paid amount," "charge amount," and "specified amount." The variability in these definitions can lead to confusion and inconsistencies in calculations related to premiums and associated risks. Additionally, he emphasises giving a single, clear reason for each definition to avoid ambiguity. This approach aims to clarify semantic issues in business terminology, helping stakeholders understand and navigate the complexities in their communication.

Figure 16 Term Development
Figure 17 Find your 6 Friends

Building Data Models

The process of conducting a Data Management Maturity Assessment emphasises the importance of understanding key terms and their relationships. Howard highlights the role of a Data Owner as a business steward accountable for a specific domain, detailing their responsibilities, such as ensuring data trustworthiness. He shares insights on building a Business Glossary that connects definitions to a Data Model, utilising graphs to visualise relationships and employing tools like OntoUML for converting UML documents into ontologies. This method facilitates the verification of semantic clarity within Data Models, supported by practical applications in higher education contexts. Additionally, Howard references Steve Hoberman's Data Modelling scorecard to assess dimensions like accuracy and completeness, showcasing a comprehensive approach to Data Management.
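The idea of connecting a Business Glossary to a Data Model via a graph can be made concrete with a simple completeness check: every entity referenced in the model should carry a glossary definition. A minimal Python sketch follows; the glossary entries and triples are invented for illustration, and a real project would work from a modelling-tool export rather than hand-written data:

```python
# Glossary: term -> agreed business definition (illustrative entries)
glossary = {
    "Data Owner": "A business steward accountable for a specific data domain.",
    "Premium": "The amount charged to a policyholder for insurance cover.",
}

# Data Model relationships expressed as simple subject-predicate-object triples
model_triples = [
    ("Data Owner", "is_accountable_for", "Premium"),
    ("Premium", "is_recorded_in", "Policy"),
]

def undefined_terms(glossary, triples):
    """Return model entities that lack a glossary definition (a semantic-clarity gap)."""
    entities = {s for s, _, _ in triples} | {o for _, _, o in triples}
    return sorted(entities - glossary.keys())

print(undefined_terms(glossary, model_triples))  # "Policy" has no definition yet
```

A check like this is easy to automate on every model change, which is one way the glossary-to-model link stays verifiable rather than aspirational.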

Figure 18 Business Term: Premium

Figure 19 Dictionary: Data Management Maturity Assessment
Figure 20 Data Owner Structured Definition
Figure 21 Progressing through the Structured Definition

Knowledge Audit, Normative Guidelines, and Decontamination in Language Processing

To effectively enhance the Data Management processes, it is imperative to conduct a knowledge audit akin to a maturity assessment, establish normative guidelines, and validate the Business Glossary consistently. Creating robust conceptual and logical models will aid in assessing and populating this glossary, with enterprise industry Data Models serving as a valuable starting point. It is crucial to develop taxonomies while also distinguishing fact from hallucination, recognising that both large language models and human interpretations can contribute to such distortions. The strategy must include comprehensive training, measurement mechanisms, and a clear call to action to ensure these objectives are met.

Figure 22 World Relationship (Graph)

Understanding Knowledge Management and Addressing Knowledge Gaps

A knowledge audit serves as a diagnostic framework in knowledge management, categorising organisational knowledge assets into four types: what we know we know (tacit knowledge converted to semantic knowledge); what we don't know we know (tacit knowledge that exists but isn't documented); what we know we don't know (knowledge gaps identified from asset depletion); and what we don't know we don't know (knowledge threats that pose significant risks, often linked to overconfidence, as seen in the Dunning-Kruger effect). Understanding these categories enables organisations to develop a knowledge strategy to bridge gaps and safeguard critical knowledge from being lost, particularly when experts leave the organisation.

Figure 23 Semantic Clarity & Consistency Management
Figure 24 Knowledge Audits
Figure 25 Diagnostic Framework for K-Audit
Figure 26 Knowledge Assets, Acquisition, Seepage, and Threats

Understanding the Importance and Implications of Normative Guidelines

Normative guidelines serve as essential frameworks for appropriate behaviour and communication among adults, much like the basic rules we teach children, such as saying "please" and "thank you." These guidelines help individuals navigate complex social interactions and ensure fairness, consistency, and ethical conduct. It's important to recognise that language evolves over time; words may acquire new meanings or become inappropriate, highlighting the necessity for awareness of their usage. For instance, the introduction of preferred pronouns underscores the potential to offend when misused, while terms like "golliwog" are now recognised as derogatory and unacceptable. Ultimately, normative guidelines encourage responsible language use, fairness, and respectful engagement in society.

Figure 27 Knowledge Strategy with Clarity

Understanding and Defining Entities in Data Models

The importance of aligning logical and conceptual models with intended meanings and relationships is particularly evident in Data Modelling. Howard shares his past experiences in coding and the challenges he faced in accurately defining entities. He expresses appreciation for collaborative efforts with project data and business students, noting the value of focusing on precise definitions. Howard admires the succinct, bullet-point approach taken by colleagues Mark Atkins and Terry Smith, which allows for clarity without unnecessary elaboration. It is stressed that every entity and business attribute within these models should be clearly defined in the context of the business environment to ensure the integrity and accuracy of the Data Model.

Figure 28 Simple Explanation (5 Year Old)
Figure 29 Normative Guidelines
Figure 30 Data Model Guidance

The Benefits of Ontology and Taxonomies in Data Management

Taxonomies and ontologies serve as frameworks for organising information, with taxonomies focusing on grouping items—like sorting toys into categories—while ontologies explore the relationships between these groups. Taxonomies enhance clarity and reduce confusion by creating clear categories, akin to organising shoes efficiently. In contrast, ontologies delve deeper by illustrating how items across groups relate to one another, addressing questions such as the connection between a car and its components.

Unlike static Data Models that struggle with dynamic relationships and temporal constraints, ontologies allow for expressiveness, reasoning, and inference, enabling a more comprehensive understanding of the elements and their interactions. This robustness in defining relationships draws from philosophical roots, providing a structured way to assess knowledge and identify gaps or inaccuracies in data representation.

Figure 31 Taxonomy Explanation

Figure 32 Ontology Explanation
Figure 33 Ontology Vs Data Model

Application of Ontology and Data Modelling in Real-Time Systems

Ontologies can capture temporal elements and state changes, expressing conditions that static Data Models cannot. Tools like RefChecker are mentioned for detecting subtle inaccuracies by comparing outputs against a normative knowledge base, emphasising the importance of keeping knowledge updated. Additionally, OntoUML is introduced as a tool to transform UML models into JSON, which can then be imported into a knowledge graph for automated validation against enterprise normative guidelines. This process allows for the quick identification of semantic inconsistencies. Overall, ontologies offer powerful expressions for a range of concepts, including atomic events, hierarchy, associations, and various forms of commitment and delegation.
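The automated validation step described here can be approximated with a short script. This is a hedged sketch using only Python's standard library; the JSON shape is an assumption for illustration and does not reflect OntoUML's actual export format:

```python
import json

# Assumed JSON export of a UML model: a list of classes with optional definitions
model_json = """
{
  "classes": [
    {"name": "Policy", "definition": "A contract of insurance between insurer and policyholder."},
    {"name": "Premium", "definition": ""},
    {"name": "Claim"}
  ]
}
"""

def classes_violating_guideline(raw):
    """Apply one assumed normative guideline: every class needs a non-empty definition."""
    model = json.loads(raw)
    return [c["name"] for c in model["classes"] if not c.get("definition")]

print(classes_violating_guideline(model_json))  # Premium and Claim lack definitions
```

In practice the JSON would be loaded into a knowledge graph and checked against many such guidelines, but even this single rule illustrates how semantic inconsistencies become machine-detectable once the model is machine-readable.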

Figure 34 Increasing Clarity
Figure 35 Fact Vs Hallucination Detection
Figure 36 Text Hallucination Tools
Figure 37 Model Hallucination Tools
Figure 38 Example of OntoUML

Figure 39 Example Enhanced

The Role of Ontologies and Semantic Clarity in Data Integration

The focus on semantic clarity emphasises the importance of consistently refining terminology, glossaries, and normative guidelines within communities of practice. This ongoing collaboration aims to enhance communication and understanding of meaning, particularly in the context of evolving technologies and ontologies. Recognising the limitations of ontologies is crucial to avoid unrealistic expectations, as they cannot address every aspect of data integration. With the increasing volume and variety of real-time data, manual integration is no longer feasible; thus, leveraging Metadata-driven integration and transformations becomes essential. Utilising a canonical model helps establish standardised vocabulary, facilitating better Data Management and collaboration across diverse data providers.

Figure 40 Call to Action

Semantic Clarity and Business Definition Management

The importance of semantic clarity within a global framework is highlighted by the challenges of achieving consistent terminology across 22 operating companies with varying laws and measurement practices. A notable aspect was the use of a shoe analogy to illustrate taxonomy, which could be further developed for ontology. An attendee describes a project that involved creating a comprehensive glossary of 280 terms, categorised into certified terms—precisely defined and universally applied—and uncertified terms, which had designated data owners accountable for their usage.

A "parking lot" was established to collect all proposed terms and definitions from different areas, addressing discrepancies such as varying definitions of "churn" across departments. This initiative is part of a five-year program aimed at standardising language for improved business management and communication.

Developing precise business definitions and terminology is imperative to mitigate risks and enhance clarity within an organisation. Howard highlights the need for structured approaches in defining terms to avoid contradictions that can arise from unclear language. Engaging business professionals in this process is crucial despite their busy schedules and the associated costs. Priority should be given to terms that present significant business value or risks, as exemplified by a bank that faced a 54 million Rand fine due to confusing terminology. This underscores the necessity of conducting knowledge audits to identify and address the most impactful areas.

Figure 41 Measurement & Feedback

Data Quality in Organisations

Maintaining Data Quality in semantic terms within an organisation is imperative. Key quality dimensions include ensuring that terminology is understandable, unambiguous, and free from potential misinterpretation. The focus should be on conciseness, brevity, precision, and completeness of definitions, aligned with established standards. Organisations can automate the evaluation of term accuracy and relationships by assessing quality at the Data Model or ontology level rather than at a paragraph level. This approach allows for greater alignment, consistency, and clarity, ultimately laying a strong foundation for effective Data Management.
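Some of these quality dimensions lend themselves to simple automated checks. Below is a minimal, illustrative Python sketch; the specific heuristics (circularity, vague wording, length) and thresholds are assumptions chosen for illustration, not an established scoring standard:

```python
VAGUE_WORDS = {"etc", "stuff", "things", "various", "some"}

def definition_issues(term, definition):
    """Flag simple semantic-quality problems in a glossary definition."""
    issues = []
    words = definition.lower().replace(".", "").split()
    if term.lower() in words:
        issues.append("circular: definition reuses the term itself")
    if VAGUE_WORDS & set(words):
        issues.append("ambiguous: contains vague wording")
    if len(words) > 40:
        issues.append("not concise: definition exceeds 40 words")
    return issues

# A deliberately poor definition trips the circularity and ambiguity checks
print(definition_issues("churn", "Churn is the churn rate of various customers."))
```

Checks this crude would only be a first pass; the point is that once definitions live in a glossary rather than scattered paragraphs, quality dimensions like ambiguity and conciseness become measurable.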

The Challenges and Future of Data Management and Ontology?

The historical significance of Metadata and Ontology can be traced back to Aristotle in the 4th century BC, where he referred to them as the first philosophy in his work "Metaphysics." Despite advancements, modern challenges persist in ensuring precise and ethical use of data across diverse teams, as exemplified by a nuclear power project involving individuals from 70 countries and various cultures.

Effective communication and specific terminology are crucial; without them, reliance on AI may increase due to human inaction. Ultimately, improving Data Quality is tied to change management and organisational culture, necessitating a focus on proper data categorisation and the need to enhance understanding of Ontology for better outcomes.

The critical role of organisational culture and effective communication in driving change within companies is particularly evident in Data Governance. Howard highlights the importance of motivating employees by continuously reinforcing the consequences of past mistakes, such as significant fines, to prevent regression. It is then suggested that Data Governance professionals must actively educate and unify the team to foster a shared language and common goals.

An attendee mentions their belief that Data Management will evolve into a concept of "meaning management" as the next frontier, acknowledging the need to adapt to advancements in technology while remaining mindful of past failures. Additionally, they acknowledge the diversity in perspectives on emerging technologies, with encouragement for open dialogue among participants.

Artificial Intelligence, Machine Learning, and Data Governance

The complexities of artificial intelligence (AI) and machine learning emphasise the need for clarity in definitions to avoid misunderstandings. Attendees and Howard acknowledge the current challenges arising from varied interpretations of terms within the industry and the risks of relying solely on AI for comprehension. Howard then highlights the significance of Ontology as a means to bridge understanding by incorporating business context alongside data relationships.

The potential for utilising ontologies to enhance traditional Data Modelling is present in examples such as the use of automated development from ontological frameworks like the Financial Industry Business Ontology.
