Data Governance, BIM & MDM
Hanabal Khaing
Data Governance is the methodical macro and micro management of data storage and flow: between countries and companies based on laws, between people and companies based on policies, and within a company based on business rules. Data Governance Lead Analysts also have the unfortunate job of telling business leaders things that no one likes to hear, but what they report is always the factual truth, backed by evidence and metrics.
The macro level of Data Governance fulfills requirements for law and policy compliance. Companies must comply, at the data level, with laws and regulations such as the Sarbanes–Oxley Act of 2002, which protects investors from fraud by requiring transparency and accountability in the accounting practices of public corporations; Basel I, an accord from global central banks that sets minimum capital requirements for banks; Basel II and Basel III; and CCAR, which together provide guidelines for financial stress testing, liquidity risk, and proactive decision making, and which require Business Information Modeling (BIM) combined with a business glossary. Data Governance also fulfills requirements for data privacy laws such as HIPAA, a US health records privacy law, and GDPR, an EU law for the protection of personal data.
Data Governance also ensures compliance with laws that govern trade and manufacturing, such as the regulations behind IMDS, the International Material Data System, which was originally created for the automotive industry but is now required in other industries as part of government regulation, as well as in manufacturing, supply chain management, and hazardous materials systems. Data Governance also addresses laws such as the Trade Facilitation and Trade Enforcement Act (TFTEA), which prohibits the import of goods produced with enslaved or forced labor. Additionally, good Data Governance will prevent international boycotts of entire companies over social issues such as forced labor by proactively looking for violations of law and policy, both locally and internationally. Many companies are unaware of forced labor and slavery in their supply chain until an international boycott costs them hundreds of millions or even billions of dollars, and the damage to a brand's reputation can drive a company out of business in the long term.
The micro level of Data Governance is the most ignored and underestimated part of Data Governance, despite the fact that there is no way to adequately perform the macro level without first mastering the micro level. Most companies pay millions or even billions of dollars in fines because their data quality is not good enough to comply with laws and court orders, especially since laws are constantly changing and a court can impose a data requirement at any time without notice.
For example, a court can ask a bank for financial records related to a case that involves one of the bank's clients. If the bank cannot produce the records within the time frame set by the court, the bank itself, not the defendant in the case, may have to pay millions of dollars in fines, repeatedly, until it produces the records. Additionally, banking regulation requires banks to monitor suspicious activity, such as deceased customers withdrawing money. Most banks do not even have a customer status dimension in their data model that tells them whether their customers are alive, dead, or incapacitated, nor do they have a big data system that automatically collects unstructured data such as public records, which include death certificates. At most banks, it is possible to use the social security number of a person who has been dead for years to open an account or take over an existing account with a socially engineered identity. At the very least, banks should maintain a global database of the deceased and the criminally convicted to reduce fraud. Recently, Deutsche Bank was fined 150 million dollars over transactions involving the accounts of a deceased customer who was also a sex offender. The best policy is to "be ready for anything!"
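To make the customer-status idea concrete, here is a minimal sketch, with an entirely hypothetical dim_customer_status dimension and fact_transaction table, of how a status dimension fed from public records lets a bank flag activity on the accounts of deceased or incapacitated customers.

```python
# Minimal sketch: flagging activity on accounts whose owners are not alive,
# driven by a hypothetical dim_customer_status dimension fed from public records.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dim_customer_status (
    customer_id   INTEGER PRIMARY KEY,
    status        TEXT NOT NULL CHECK (status IN ('ALIVE', 'DECEASED', 'INCAPACITATED')),
    status_source TEXT NOT NULL                   -- e.g. 'PUBLIC_DEATH_RECORD'
);
CREATE TABLE fact_transaction (
    txn_id      INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES dim_customer_status(customer_id),
    amount      NUMERIC NOT NULL,
    txn_date    TEXT NOT NULL
);
""")
conn.execute("INSERT INTO dim_customer_status VALUES (1, 'DECEASED', 'PUBLIC_DEATH_RECORD')")
conn.execute("INSERT INTO fact_transaction VALUES (100, 1, 2500, '2024-03-01')")

# Any movement of money on a non-living customer's account is suspicious by definition.
suspicious = conn.execute("""
    SELECT t.txn_id, t.customer_id, t.amount, s.status
    FROM fact_transaction t
    JOIN dim_customer_status s USING (customer_id)
    WHERE s.status <> 'ALIVE'
""").fetchall()
print(suspicious)   # [(100, 1, 2500, 'DECEASED')]
```

In a real bank, the status dimension would be refreshed from the big data feed of public records described above rather than from hand-inserted rows.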
The micro level of Data Governance ensures high data quality throughout the life cycle of data. Data Governance uses ACID compliance (Atomicity, Consistency, Isolation, and Durability) to ensure not only high data quality, but also scalability in performance and compatibility with analytics tools.
Atomicity at the macro level guarantees that each transaction is treated as a single unit that either completely succeeds or completely fails, with no partial writing or deletion of data. If the transaction fails, the data within the system is unchanged and unaffected. At the micro level, the data model level, the data that is stored should be atomic, meaning the lowest level of detail. For example, product data should contain a unique product identifier, and no other data should exist in the field that holds that identifier; the common or marketing name of the product should not be in the same field as the unique identifier. Another example is storing product colors in an extended star schema design so that one product can be mapped to multiple colors for marketing and supply chain management purposes. Atomic data also applies to data modeling structure.
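As a concrete illustration, here is a minimal sketch of that extended star schema, with hypothetical table and column names: the identifier, the marketing name, and each color live in their own atomic fields, a bridge table maps one product to many colors, and the surrounding transaction either fully commits or leaves the database untouched.

```python
# Minimal sketch of an extended star schema for product colors (hypothetical names).
# Each field holds exactly one atomic value; a bridge table maps one product to many colors.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,    -- unique identifier only, nothing else
    product_name TEXT NOT NULL           -- marketing name kept in its own field
);
CREATE TABLE dim_color (
    color_id   INTEGER PRIMARY KEY,
    color_name TEXT NOT NULL UNIQUE
);
CREATE TABLE bridge_product_color (      -- one product can map to many colors
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    color_id   INTEGER NOT NULL REFERENCES dim_color(color_id),
    PRIMARY KEY (product_id, color_id)
);
""")

# Atomicity at the transaction level: all three inserts commit together or not at all.
try:
    with conn:                                     # opens a transaction; commits or rolls back
        conn.execute("INSERT INTO dim_product VALUES (1, 'Trail Jacket')")
        conn.execute("INSERT INTO dim_color VALUES (10, 'Red'), (11, 'Black')")
        conn.execute("INSERT INTO bridge_product_color VALUES (1, 10), (1, 11)")
except sqlite3.Error:
    pass                                           # on failure the database is unchanged

print(conn.execute("SELECT COUNT(*) FROM bridge_product_color").fetchone())  # (2,)
```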
Consistency mandates that all data going into or out of a database complies with the rules of the database and the data model. For example, data must comply with primary key constraints, foreign key constraints, triggers, and so on to prevent data corruption. Many companies are unknowingly implementing non-ACID-compliant columnar databases, which are notorious for data corruption because they can bypass the data model design, which is based on business rules, and access data by column. The relational, row-based database was created decades ago to replace columnar databases, tabular databases, and other legacy technology, but the wisdom of the past has been lost at most companies, and many are seeing errors that were resolved in the 1970s come back like a ghost. UNLOAD/COPY failures and data corruption surfacing in stl_load_errors, often tied to MAXFILESIZE settings, are common in columnar databases that bypass unique constraints and primary keys and rely on SELECT DISTINCT queries instead. Data Governance policies should restrict the use of all technology that is not ACID compliant.
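Here is a minimal sketch, again with hypothetical table names, of what consistency enforcement looks like when the rules live in the data model: the declared primary key and foreign key constraints make the database itself reject a duplicate key or an orphaned row instead of leaving the clean-up to SELECT DISTINCT queries downstream.

```python
# Minimal sketch: the database enforces consistency rules declared in the data model.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")           # SQLite needs this pragma for FK checks
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'a@example.com')")

try:
    conn.execute("INSERT INTO customer VALUES (1, 'b@example.com')")   # duplicate primary key
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)        # unique/primary key constraint violation

try:
    conn.execute("INSERT INTO account VALUES (500, 999)")              # orphan: no customer 999
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)        # foreign key constraint violation
```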
Isolation mandates that multiple transactions can be executed against a database without interfering with each other. Since a company can have millions of employees and billions of customers, a data system must be able to serve all of them at the same time. For example, even if two people are reading and writing the same data record at the same time, the database should apply either pessimistic or optimistic locking automatically, at speeds measured in nanoseconds. Pessimistic locking is rarely required, but it prevents reading and writing of a record until the person currently working on it has finished and committed the changes. Optimistic locking allows everyone to read a record in its current state while it is being modified, then applies changes as they are committed, making them visible only after the commit operation is complete. Systems such as Hadoop do not have this capability.
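Below is a minimal sketch of optimistic locking using a hypothetical version column: a writer's update succeeds only if the version it originally read is still current, so a change that was committed concurrently forces the late writer to retry rather than being silently overwritten. SQLite is used purely for illustration; enterprise databases apply this pattern, or their own multiversion concurrency machinery, automatically.

```python
# Minimal sketch of optimistic locking with a row version number (hypothetical schema).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance NUMERIC, version INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 1000, 1)")
conn.commit()

# Two sessions read the same row (same version) before either one writes.
row_a = conn.execute("SELECT balance, version FROM account WHERE account_id = 1").fetchone()
row_b = conn.execute("SELECT balance, version FROM account WHERE account_id = 1").fetchone()

def commit_update(new_balance, read_version):
    """Update succeeds only if the version read earlier is still the current version."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE account_id = 1 AND version = ?",
        (new_balance, read_version))
    conn.commit()
    return cur.rowcount == 1          # 0 rows updated means another commit got there first

print(commit_update(row_a[0] - 100, row_a[1]))   # True: first writer wins, version becomes 2
print(commit_update(row_b[0] - 250, row_b[1]))   # False: stale version, change must be retried
```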
Durability mandates that data committed to the system is saved and continues to exist even if the hardware is powered off. Some systems, such as Oracle, PostgreSQL, and IBM DB2, take durability a step further with transaction logging and backups. If a data system's storage fails, the transaction log can be replayed against an older backup copy of the database to restore the system to its state at the moment of failure, preventing any data loss. Obviously, the DBA and Architect should be funded well enough to supply a separate, redundant storage system for transaction logs rather than putting everything on one system with a single point of failure. (HINT HINT) This should be an internal Data Governance policy.
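The same idea can be shown at toy scale; the sketch below uses SQLite's write-ahead log and online backup API purely as an illustration, standing in for the archived transaction logs and base backups, ideally on separate hardware, that the enterprise systems named above provide.

```python
# Minimal sketch: write-ahead logging plus an online backup to separate storage.
import sqlite3

primary = sqlite3.connect("primary.db")
primary.execute("PRAGMA journal_mode = WAL")       # committed changes pass through a durable log
primary.execute("CREATE TABLE IF NOT EXISTS ledger (entry_id INTEGER PRIMARY KEY, amount NUMERIC)")
primary.execute("INSERT INTO ledger (amount) VALUES (250.00)")
primary.commit()                                   # durable once the commit returns

# Online backup to a second file, standing in for a separate, redundant storage system.
backup_copy = sqlite3.connect("backup_copy.db")
primary.backup(backup_copy)                        # copies the database while it stays available

print(backup_copy.execute("SELECT COUNT(*) FROM ledger").fetchone())   # (1,) on a fresh run
```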
The key to accomplishing real-world micro Data Governance is combining the phases of the known international methodologies (analysis, design, and implementation) with the three types of data models (conceptual, logical, and physical) and the three frameworks of Business Information Modeling (BIM): the theoretical, analytical, and conceptual frameworks, respectively. The analysis is the most important part of any project and requires the most time. If the analysis is skipped or incomplete, the project has a 93 percent chance of failure. Nobody can create anything without a plan, and a plan requires accurate information. Only the most patient resources can complete an analysis, yet patient resources are in short supply.
In the analysis phase, senior business analysts should create conceptual data models from the fields gathered in meetings with business units. If a business analyst does not have conceptual data modeling skills, the project should be halted for training, which should take less than 24 hours. The conceptual data model can be a grouping of the fields needed by the business units. Each field should be recorded with its business unit common name, its industry standard name, how the data is used, whether the field is always required, whether it depends on another field or a business-defined status, and, most importantly, how the data is uniquely identified; a sketch of such a field inventory appears below. Again, this is the most critical point of the project. If it fails, the project fails. The business analyst should also understand the documented theoretical framework.
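Here is a minimal sketch of that field inventory as a simple data structure, with entirely hypothetical entity and field names, just to show the attributes the analyst should capture for every field.

```python
# Minimal sketch of a conceptual-model field inventory (all names are hypothetical).
from dataclasses import dataclass, field

@dataclass
class FieldRecord:
    business_name: str          # name the business unit uses in conversation
    standard_name: str          # industry standard / enterprise glossary name
    usage: str                  # how the business uses the data
    always_required: bool       # must it be present on every record?
    depends_on: list[str] = field(default_factory=list)   # other fields or business statuses
    unique_identifier: str = "" # how a record holding this field is uniquely identified

conceptual_model = {
    "Customer": [
        FieldRecord("Cust No", "customer_id", "joins all customer activity",
                    True, [], "customer_id issued by the master data hub"),
        FieldRecord("Status", "customer_status", "drives alive/deceased/incapacitated checks",
                    True, ["customer_id"], "customer_id plus effective date"),
    ],
}

for entity, fields in conceptual_model.items():
    print(entity, [f.standard_name for f in fields])   # Customer ['customer_id', 'customer_status']
```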
The theoretical framework should be included inside the business requirements document, which is also managed by a business analyst. The theoretical framework states what is possible, and it may include estimates, goals, and risks. This phase should be a series of daily one- or two-hour meetings in which all questions and answers are captured, with daily action items to be resolved in the next meeting. The questions and answers should be grouped by theoretical framework, business unit, and so on. If the business analyst is not trained to create business requirements documents, the project should be halted for more training, which should take less than 8 hours.
Once the business requirements document, analysis, conceptual data model, and theoretical frameworks are complete and documented, the project can move into the design phase. In the design phase, the logical data model is created solely to capture business logic and encode it into the data model; the physical model in the next phase will be derived from it and designed for performance and enforcement of business rules at the data level. The analytical frameworks will explain the logic behind the theoretical frameworks. One must ask the questions, "How do you know that?" and "How did you reach your conclusion?" to extract the logic from the framework. Projects may have multiple analytical frameworks and formulas.
For example, one analytical framework may be a return on investment formula that produces metrics (a sketch appears below). Another framework may show action versus inaction outcomes based on historical data. At this point in the project, absolutely no code should have been written. Most teams make the mistake of writing code before the analysis and design phases are complete; if that happens, the project will have a 93 percent chance of failure. Furthermore, completing the analysis, the design, the logical model, and the analytical frameworks enables a master data logical data model to be created, which leads to Master Data Management (MDM) viability. If the theoretical framework is not viable, that should be discovered in this phase. If there is no data, logic, or analytical framework that supports the theoretical framework, the project should be halted and reevaluated.
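For illustration only, here is a minimal sketch of such an analytical framework, with made-up numbers: a return on investment formula plus an action-versus-inaction comparison that turns the theoretical framework's estimates into metrics.

```python
# Minimal sketch of an analytical framework: ROI and action vs. inaction (illustrative numbers only).

def roi(gain: float, cost: float) -> float:
    """Return on investment as a fraction: (gain - cost) / cost."""
    return (gain - cost) / cost

# Hypothetical inputs that would normally come from historical data captured in the BIM.
project_cost            = 2_000_000      # cost of the data governance project
expected_gain           = 3_500_000      # recovered revenue from better data quality
annual_loss_if_inactive = 1_200_000      # ongoing losses to bad data if nothing is done

print(f"ROI if we act: {roi(expected_gain, project_cost):.0%}")          # 75%
print(f"3-year cost of inaction: {3 * annual_loss_if_inactive:,}")       # 3,600,000
```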
The final implementation phase of the project is not a deployment to production; it is the implementation of what was discovered in the analysis and design phases. The next major step is for the Enterprise Data Modeler to create a physical data model, which can then be deployed to a database. IDE tools can connect to the database and generate well over 90 percent of the code required for applications. A working proof of concept prototype application can easily be created and deployed to development. The conceptual framework should indicate what is expected to be found. The functionality, performance, and viability of the proof of concept system, which includes the application, the data model, and non-production test data, can then be compared to the framework and scored accordingly.
If everything is approved and works as expected, an explicit Business Information Modeling (BIM) system should be maintained to display all of the data captured from the project. IBM Cognos can be connected to a physical BIM data model and used as a Business Information Modeling dashboard, and it can also be used for KPI metrics and other business intelligence functions. If the project was an all-inclusive project covering ERP, supply chain, HR, CRM, etc., almost anything should be possible: legal compliance, rapid application development, performance, scalability, customer experience improvement, and more. The BIM will prevent future leadership from making the same mistakes as previous leadership teams, since wisdom and hard lessons are often forgotten over time.
The BIM records what happened: actual results versus projections for decisions, projects, and return on investment. Since it also records questions and answers along with the aforementioned frameworks, it will show what people were thinking when they made critical decisions. The historical data and predictive analytics of a BIM enable proactive decision making and more effective management. If I had my way, I would halt every project and create a BIM first, to use it as an enhancement to project management and Data Governance. A BIM system can make sure the understanding of data and decisions is globally consistent and globally understood. Companies have a tendency to throw away their Data Governance team after it has completed the first project. The BIM will show the futility of such an action by proving that it is cheaper and safer to keep such a team in place as a cost-cutting measure and a defense against the constant threat of poor data.
Did I mention that the threat of poor data is constant? As long as there are people, there will be some who create bad data like bandits and refuse to follow policies out of pure laziness. The Data Governance team is like the sheriff's department. The BIM helps develop long-term Data Governance policies that are tailored to a specific company. Although most business rules can be encoded directly into the data models, the policies and rules must also be distributed in document format.
Data Modelers, Data Architects, Data Analysts, and Data Scientists must also have absolute job protection. Since global fraud currently exceeds 5 trillion dollars annually, most companies and government organizations will have at least one group of people committing fraud. The last thing such a group wants is people who can provide data governance with real-time analysis and long-term data integrity. Such groups will use any means, including violence, ethnic job discrimination, sex discrimination, age discrimination, and so on, to remove any competent data professional so they can continue to hide and commit fraud. Since there is a 78 percent chance that any Data Architect or other data professional will be female or some other type of minority, and most fraud, 85 percent, is committed by men, most of them white and under 35, job discrimination and internal corporate fraud go hand in hand.
The largest side effect is that data quality suffers, causing a company or government to lose approximately 33 percent of revenue to bad data and inefficiencies, in addition to the cost of the fraud itself. Bad data affects everything from demographics and marketing to supply chain and manufacturing. There are major companies that cannot even track their entire supply chain or their retail partner locations due to bad data. Because most companies do not adequately protect their workforce from groups that commit fraud internally, they risk eventual bankruptcy for the entire company. No company can stay in business while constantly losing over 30 percent of revenue to inefficiencies plus another 5 to 20 percent to fraud. Before payroll is even paid, half of the revenue could be lost to bad-data-related inefficiencies and fraud.
Job protection for data professionals should be part of every data governance policy, and that policy should be enforced directly by the board of directors since, in many cases, the CEO and CFO are behind the majority of the fraud, especially if the CEO is not the original founder of the company. Additionally, honest and competent data professionals are hard to find and always in short supply, with over 800,000 data jobs constantly open in the US alone. Data Governance is no longer optional or a "nice-to-have" IT feature; it is required for survival.
Thank you for reading my article. May your data always have integrity.
Hanabal Thalen Khaing
Data Governance SVP, Enterprise Data Modeler