Data Governance, BIM & MDM
Hanabal Khaing
Data Governance is the methodical macro and micro management of data storage and flow: between countries and companies based on laws, between people and companies based on policies, and within a company based on business rules. Data Governance Lead Analysts also have the unfortunate job of telling business leaders things that no one likes to hear, but what they report is always the factual truth, backed by evidence and metrics.
The macro level of Data Governance fulfills requirements for law and policy compliance. Companies must comply, at the data level, with laws and regulations such as the Sarbanes–Oxley Act of 2002, which protects investors from fraud by requiring transparency and accountability in the accounting practices of public corporations; Basel I, an accord from global central banks that sets minimum capital requirements for banks; Basel II and Basel III; and CCAR, which together provide guidelines for financial stress testing, liquidity risk, and proactive decision making, and which require Business Information Modeling (BIM) combined with a business glossary. Data Governance also fulfills requirements for data privacy laws such as HIPAA, a US health records privacy law, and GDPR, an EU law for the protection of personal data.
Data Governance also ensures compliance with laws that govern trade and manufacturing, such as the regulations behind IMDS, the International Material Data System, which was originally created for the automotive industry but is now required in other industries as part of government regulation, as well as in manufacturing, supply chain management, and hazardous materials systems. Data Governance also addresses laws such as the Trade Facilitation and Trade Enforcement Act (TFTEA), which prohibits the import of goods produced with enslaved or forced labor. Additionally, good Data Governance will prevent international boycotts of entire companies over social issues such as forced labor by proactively looking for violations of law and policy, both locally and internationally. Many companies are unaware of forced labor and slavery in their supply chain until an international boycott costs them hundreds of millions or even billions of dollars, and the damage to a brand's reputation can drive a company out of business in the long term.
The micro level of Data Governance is the most ignored and underestimated part of Data Governance, despite the fact that there is no way to adequately perform the macro level without first mastering the micro level. Most companies pay millions or even billions of dollars in fines because their data quality is not good enough to comply with laws and court orders, especially since laws are constantly changing and a court can impose a data requirement at any time without notice.
For example, a court can ask a bank for financial records related to a case that involves one of the bank's clients. If the bank cannot produce the records within the time frame set by the court, the bank itself, not the defendant in the case, may have to pay millions of dollars in fines, repeatedly, until it produces the records. Additionally, banking regulation requires banks to monitor suspicious activity, such as deceased customers withdrawing money. Most banks do not even have a customer status dimension in their data model that tells them whether their customers are alive, dead, or incapacitated, nor do they have a big data system that automatically collects unstructured data such as public records, which include death certificates. At most banks, it is possible to use the social security number of a person who has been dead for years to open an account or take over an existing account with a socially engineered identity. At the very least, banks should maintain a global database of the deceased and the criminally convicted to reduce fraud. Recently, Deutsche Bank was fined 150 million dollars over transactions involving the accounts of a deceased customer who was also a sex offender. The best policy is to "be ready for anything!"
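To make the customer-status idea concrete, here is a minimal sketch, with an entirely hypothetical dim_customer_status dimension and fact_transaction table, of how a status dimension fed from public records lets a bank flag activity on the accounts of deceased or incapacitated customers.

```python
# Minimal sketch: flagging activity on accounts whose owners are not alive,
# driven by a hypothetical dim_customer_status dimension fed from public records.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dim_customer_status (
    customer_id   INTEGER PRIMARY KEY,
    status        TEXT NOT NULL CHECK (status IN ('ALIVE', 'DECEASED', 'INCAPACITATED')),
    status_source TEXT NOT NULL                   -- e.g. 'PUBLIC_DEATH_RECORD'
);
CREATE TABLE fact_transaction (
    txn_id      INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES dim_customer_status(customer_id),
    amount      NUMERIC NOT NULL,
    txn_date    TEXT NOT NULL
);
""")
conn.execute("INSERT INTO dim_customer_status VALUES (1, 'DECEASED', 'PUBLIC_DEATH_RECORD')")
conn.execute("INSERT INTO fact_transaction VALUES (100, 1, 2500, '2024-03-01')")

# Any movement of money on a non-living customer's account is suspicious by definition.
suspicious = conn.execute("""
    SELECT t.txn_id, t.customer_id, t.amount, s.status
    FROM fact_transaction t
    JOIN dim_customer_status s USING (customer_id)
    WHERE s.status <> 'ALIVE'
""").fetchall()
print(suspicious)   # [(100, 1, 2500, 'DECEASED')]
```

In a real bank, the status dimension would be refreshed from the big data feed of public records described above rather than from hand-inserted rows.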
The micro level of Data Governance ensures high data quality throughout the life cycle of data. Data Governance uses ACID compliance (Atomicity, Consistency, Isolation, and Durability) to ensure not only high data quality, but also scalability in performance and compatibility with analytics tools.
Atomicity at the macro level guarantees that each transaction is treated as a single unit that either completely succeeds or completely fails, with no partial writing or deletion of data. If the transaction fails, the data within the system is unchanged and unaffected. At the micro level, the data model level, the data that is stored should be atomic, meaning the lowest level of detail. For example, product data should contain a unique product identifier, and no other data should exist in the field that holds that identifier; the common or marketing name of the product should not be in the same field as the unique identifier. Another example is storing product colors in an extended star schema design so that one product can be mapped to multiple colors for marketing and supply chain management purposes. Atomic data also applies to data modeling structure.
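As a concrete illustration, here is a minimal sketch of that extended star schema, with hypothetical table and column names: the identifier, the marketing name, and each color live in their own atomic fields, a bridge table maps one product to many colors, and the surrounding transaction either fully commits or leaves the database untouched.

```python
# Minimal sketch of an extended star schema for product colors (hypothetical names).
# Each field holds exactly one atomic value; a bridge table maps one product to many colors.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,    -- unique identifier only, nothing else
    product_name TEXT NOT NULL           -- marketing name kept in its own field
);
CREATE TABLE dim_color (
    color_id   INTEGER PRIMARY KEY,
    color_name TEXT NOT NULL UNIQUE
);
CREATE TABLE bridge_product_color (      -- one product can map to many colors
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    color_id   INTEGER NOT NULL REFERENCES dim_color(color_id),
    PRIMARY KEY (product_id, color_id)
);
""")

# Atomicity at the transaction level: all three inserts commit together or not at all.
try:
    with conn:                                     # opens a transaction; commits or rolls back
        conn.execute("INSERT INTO dim_product VALUES (1, 'Trail Jacket')")
        conn.execute("INSERT INTO dim_color VALUES (10, 'Red'), (11, 'Black')")
        conn.execute("INSERT INTO bridge_product_color VALUES (1, 10), (1, 11)")
except sqlite3.Error:
    pass                                           # on failure the database is unchanged

print(conn.execute("SELECT COUNT(*) FROM bridge_product_color").fetchone())  # (2,)
```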
Consistency mandates that all data going into or out of a database complies with the rules of the database and the data model. For example, data must comply with primary key constraints, foreign key constraints, triggers, and so on to prevent data corruption. Many companies are unknowingly implementing non-ACID-compliant columnar databases, which are notorious for data corruption because they can bypass the data model design, which is based on business rules, and access data by column. The relational, row-based database was created decades ago to replace columnar databases, tabular databases, and other legacy technology, but the wisdom of the past has been lost at most companies, and many are seeing errors that were resolved in the 1970s come back like a ghost. UNLOAD/COPY failures and data corruption surfacing in stl_load_errors, often tied to MAXFILESIZE settings, are common in columnar databases that bypass unique constraints and primary keys and rely on SELECT DISTINCT queries instead. Data Governance policies should restrict the use of all technology that is not ACID compliant.
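Here is a minimal sketch, again with hypothetical table names, of what consistency enforcement looks like when the rules live in the data model: the declared primary key and foreign key constraints make the database itself reject a duplicate key or an orphaned row instead of leaving the clean-up to SELECT DISTINCT queries downstream.

```python
# Minimal sketch: the database enforces consistency rules declared in the data model.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")           # SQLite needs this pragma for FK checks
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'a@example.com')")

try:
    conn.execute("INSERT INTO customer VALUES (1, 'b@example.com')")   # duplicate primary key
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)        # unique/primary key constraint violation

try:
    conn.execute("INSERT INTO account VALUES (500, 999)")              # orphan: no customer 999
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)        # foreign key constraint violation
```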
Isolation mandates that multiple transactions can be executed against a database without interfering with each other. Since a company can have millions of employees and billions of customers, a data system must be able to serve all of them at the same time. For example, even if two people are reading and writing the same data record at the same time, the database should apply either pessimistic or optimistic locking automatically, at speeds measured in nanoseconds. Pessimistic locking is rarely required, but it prevents reading and writing of a record until the person currently working on it has finished and committed the changes. Optimistic locking allows everyone to read a record in its current state while it is being modified, then applies changes as they are committed, making them visible only after the commit operation is complete. Systems such as Hadoop do not have this capability.
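Below is a minimal sketch of optimistic locking using a hypothetical version column: a writer's update succeeds only if the version it originally read is still current, so a change that was committed concurrently forces the late writer to retry rather than being silently overwritten. SQLite is used purely for illustration; enterprise databases apply this pattern, or their own multiversion concurrency machinery, automatically.

```python
# Minimal sketch of optimistic locking with a row version number (hypothetical schema).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance NUMERIC, version INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 1000, 1)")
conn.commit()

# Two sessions read the same row (same version) before either one writes.
row_a = conn.execute("SELECT balance, version FROM account WHERE account_id = 1").fetchone()
row_b = conn.execute("SELECT balance, version FROM account WHERE account_id = 1").fetchone()

def commit_update(new_balance, read_version):
    """Update succeeds only if the version read earlier is still the current version."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE account_id = 1 AND version = ?",
        (new_balance, read_version))
    conn.commit()
    return cur.rowcount == 1          # 0 rows updated means another commit got there first

print(commit_update(row_a[0] - 100, row_a[1]))   # True: first writer wins, version becomes 2
print(commit_update(row_b[0] - 250, row_b[1]))   # False: stale version, change must be retried
```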
Durability mandates that data committed to the system is saved and continues to exist even if the hardware is powered off. Some systems, such as Oracle, PostgreSQL, and IBM DB2, take durability a step further with transaction logging and backups. If a data system's storage fails, the transaction log can be replayed against an older backup copy of the database to restore the system to its state at the moment of failure, preventing any data loss. Obviously, the DBA and Architect should be funded well enough to supply a separate, redundant storage system for transaction logs rather than putting everything on one system with a single point of failure. (HINT HINT) This should be an internal Data Governance policy.
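The same idea can be shown at toy scale; the sketch below uses SQLite's write-ahead log and online backup API purely as an illustration, standing in for the archived transaction logs and base backups, ideally on separate hardware, that the enterprise systems named above provide.

```python
# Minimal sketch: write-ahead logging plus an online backup to separate storage.
import sqlite3

primary = sqlite3.connect("primary.db")
primary.execute("PRAGMA journal_mode = WAL")       # committed changes pass through a durable log
primary.execute("CREATE TABLE IF NOT EXISTS ledger (entry_id INTEGER PRIMARY KEY, amount NUMERIC)")
primary.execute("INSERT INTO ledger (amount) VALUES (250.00)")
primary.commit()                                   # durable once the commit returns

# Online backup to a second file, standing in for a separate, redundant storage system.
backup_copy = sqlite3.connect("backup_copy.db")
primary.backup(backup_copy)                        # copies the database while it stays available

print(backup_copy.execute("SELECT COUNT(*) FROM ledger").fetchone())   # (1,) on a fresh run
```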
The key to accomplishing real-world micro Data Governance is combining the phases of the known international methodologies (analysis, design, and implementation) with the three types of data models (conceptual, logical, and physical) and the three frameworks of Business Information Modeling (BIM): the theoretical, analytical, and conceptual frameworks, respectively. The analysis is the most important part of any project and requires the most time. If the analysis is skipped or incomplete, the project has a 93 percent chance of failure. Nobody can create anything without a plan, and a plan requires accurate information. Only the most patient resources can complete an analysis, yet patient resources are in short supply.
In the analysis phase, senior business analysts should create conceptual data models from the fields gathered in meetings with business units. If a business analyst does not have conceptual data modeling skills, the project should be halted for training, which should take less than 24 hours. The conceptual data model can be a grouping of the fields needed by the business units. Each field should be recorded with its business unit common name, its industry standard name, how the data is used, whether the field is always required, whether it depends on another field or a business-defined status, and, most importantly, how the data is uniquely identified; a sketch of such a field inventory appears below. Again, this is the most critical point of the project. If it fails, the project fails. The business analyst should also understand the documented theoretical framework.
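Here is a minimal sketch of that field inventory as a simple data structure, with entirely hypothetical entity and field names, just to show the attributes the analyst should capture for every field.

```python
# Minimal sketch of a conceptual-model field inventory (all names are hypothetical).
from dataclasses import dataclass, field

@dataclass
class FieldRecord:
    business_name: str          # name the business unit uses in conversation
    standard_name: str          # industry standard / enterprise glossary name
    usage: str                  # how the business uses the data
    always_required: bool       # must it be present on every record?
    depends_on: list[str] = field(default_factory=list)   # other fields or business statuses
    unique_identifier: str = "" # how a record holding this field is uniquely identified

conceptual_model = {
    "Customer": [
        FieldRecord("Cust No", "customer_id", "joins all customer activity",
                    True, [], "customer_id issued by the master data hub"),
        FieldRecord("Status", "customer_status", "drives alive/deceased/incapacitated checks",
                    True, ["customer_id"], "customer_id plus effective date"),
    ],
}

for entity, fields in conceptual_model.items():
    print(entity, [f.standard_name for f in fields])   # Customer ['customer_id', 'customer_status']
```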
The theoretical framework should be included inside the business requirements document, which is also managed by a business analyst. The theoretical framework states what is possible, and it may include estimates, goals, and risks. This phase should be a series of daily one- or two-hour meetings in which all questions and answers are captured, with daily action items to be resolved in the next meeting. The questions and answers should be grouped by theoretical framework, business unit, and so on. If the business analyst is not trained to create business requirements documents, the project should be halted for more training, which should take less than 8 hours.
Once the business requirements document, analysis, conceptual data model, and theoretical frameworks are complete and documented, the project can move into the design phase. In the design phase, the logical data model is created solely to capture business logic and encode it into the data model; the physical model in the next phase will be derived from it and designed for performance and enforcement of business rules at the data level. The analytical frameworks will explain the logic behind the theoretical frameworks. One must ask the questions, "How do you know that?" and "How did you reach your conclusion?" to extract the logic from the framework. Projects may have multiple analytical frameworks and formulas.
For example, one analytical framework may be a return on investment formula that produces metrics (a sketch appears below). Another framework may show action versus inaction outcomes based on historical data. At this point in the project, absolutely no code should have been written. Most teams make the mistake of writing code before the analysis and design phases are complete; if that happens, the project will have a 93 percent chance of failure. Furthermore, completing the analysis, the design, the logical model, and the analytical frameworks enables a master data logical data model to be created, which leads to Master Data Management (MDM) viability. If the theoretical framework is not viable, that should be discovered in this phase. If there is no data, logic, or analytical framework that supports the theoretical framework, the project should be halted and reevaluated.
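For illustration only, here is a minimal sketch of such an analytical framework, with made-up numbers: a return on investment formula plus an action-versus-inaction comparison that turns the theoretical framework's estimates into metrics.

```python
# Minimal sketch of an analytical framework: ROI and action vs. inaction (illustrative numbers only).

def roi(gain: float, cost: float) -> float:
    """Return on investment as a fraction: (gain - cost) / cost."""
    return (gain - cost) / cost

# Hypothetical inputs that would normally come from historical data captured in the BIM.
project_cost            = 2_000_000      # cost of the data governance project
expected_gain           = 3_500_000      # recovered revenue from better data quality
annual_loss_if_inactive = 1_200_000      # ongoing losses to bad data if nothing is done

print(f"ROI if we act: {roi(expected_gain, project_cost):.0%}")          # 75%
print(f"3-year cost of inaction: {3 * annual_loss_if_inactive:,}")       # 3,600,000
```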
The final implementation phase of the project is not a deployment to production; it is the implementation of what was discovered in the analysis and design phases. The next major step is for the Enterprise Data Modeler to create a physical data model, which can then be deployed to a database. IDE tools can connect to the database and generate well over 90 percent of the code required for applications. A working proof of concept prototype application can easily be created and deployed to development. The conceptual framework should indicate what is expected to be found. The functionality, performance, and viability of the proof of concept system, which includes the application, the data model, and non-production test data, can then be compared to the framework and scored accordingly.
If everything is approved and works as expected, an explicit Business Information Modeling (BIM) system should be maintained to display all of the data captured from the project. IBM Cognos can be connected to a physical BIM data model and used as a Business Information Modeling dashboard, and it can also be used for KPI metrics and other business intelligence functions. If the project was an all-inclusive project covering ERP, supply chain, HR, CRM, etc., almost anything should be possible: legal compliance, rapid application development, performance, scalability, customer experience improvement, and more. The BIM will prevent future leadership from making the same mistakes as previous leadership teams, since wisdom and hard lessons are often forgotten over time.
The BIM records what happened: actual results versus projections for decisions, projects, and return on investment. Since it also records questions and answers along with the aforementioned frameworks, it will show what people were thinking when they made critical decisions. The historical data and predictive analytics of a BIM enable proactive decision making and more effective management. If I had my way, I would halt every project and create a BIM first, to use it as an enhancement to project management and Data Governance. A BIM system can make sure the understanding of data and decisions is globally consistent and globally understood. Companies have a tendency to throw away their Data Governance team after it has completed the first project. The BIM will show the futility of such an action by proving that it is cheaper and safer to keep such a team in place as a cost-cutting measure and a defense against the constant threat of poor data.
Did I mention that the threat of poor data is constant? As long as there are people, there will be some who create bad data like bandits and refuse to follow policies out of pure laziness. The Data Governance team is like the sheriff's department. The BIM helps develop long-term Data Governance policies that are tailored to a specific company. Although most business rules can be encoded directly into the data models, the policies and rules must also be distributed in document format.
Data Modelers, Data Architects, Data Analysts, and Data Scientists must also have absolute job protection. Since global fraud currently exceeds 5 trillion dollars annually, most companies and government organizations will have at least one group of people committing fraud. The last thing such a group wants is people who can provide data governance with real-time analysis and long-term data integrity. Such groups will use any means, including violence, ethnic job discrimination, sex discrimination, age discrimination, and so on, to remove any competent data professional so they can continue to hide and commit fraud. Since there is a 78 percent chance that any Data Architect or other data professional will be female or some other type of minority, and most fraud, 85 percent, is committed by men, most of them white and under 35, job discrimination and internal corporate fraud go hand in hand.
The largest side effect is that data quality suffers, causing a company or government to lose approximately 33 percent of revenue to bad data and inefficiencies, in addition to the cost of the fraud itself. Bad data affects everything from demographics and marketing to supply chain and manufacturing. There are major companies that cannot even track their entire supply chain or their retail partner locations due to bad data. Because most companies do not adequately protect their workforce from groups that commit fraud internally, they risk eventual bankruptcy for the entire company. No company can stay in business while constantly losing over 30 percent of revenue to inefficiencies plus another 5 to 20 percent to fraud. Before payroll is even paid, half of the revenue could be lost to bad-data-related inefficiencies and fraud.
Job protection for data professionals should be part of every data governance policy, and that policy should be enforced directly by the board of directors since, in many cases, the CEO and CFO are behind the majority of the fraud, especially if the CEO is not the original founder of the company. Additionally, honest and competent data professionals are hard to find and always in short supply, with over 800,000 data jobs constantly open in the US alone. Data Governance is no longer optional or a "nice-to-have" IT feature; it is required for survival.
Thank you for reading my article. May your data always have integrity.
Hanabal Thalen Khaing
Data Governance SVP, Enterprise Data Modeler