What is Data Governance? And why is it necessary especially now?
Pillars of Sagrada Familia, Barcelona © Mahtab Syed

With the advent of Machine Learning and Artificial Intelligence for Predictions (Business metrics like Inventory, Profitability and Customer Retention) and Generative AI for Generation (human-like responses grounded on internal data), it is key to have good-quality Data available in near real time. Most organisations are struggling with this because Data Governance was never set up well. I have a real example below.


Consider a large bank with 30-40 source systems generating data, 10-20 data warehouses collecting data for analysis, and many non-integrated SaaS databases. In most cases the basic principles of Data Governance are not followed.

In this bank a new Data Analyst starts and is given the task of creating Reports of Taxable obligations for all customers (with complex business logic). She is pointed to a few Data warehouses: the first holds Customer master data, the second holds all Transactions, and the third is where Taxable income is calculated. There are also other sources, listed below:

  • CRM with master data of all the Bank's customers
  • Loan system
  • Credit card system
  • Banking Transaction systems
  • Tax calculation system
  • A few Data warehouses which get feeds from the above at different frequencies


Within a few days of analysis it's apparent that there is no documented source of Master Data, no Data Catalog, no Metadata, no explanation of Data calculations, multiple sources of the same data, multiple columns with similar data, and poor Data Quality. As a result there is no confidence in the Reports. The Data Analyst gets frustrated and quits, and with her goes all the knowledge she gathered.


Data Governance principles and operations can help here. The DAMA DMBOK (Data Management Body of Knowledge, https://www.dama.org/cpages/body-of-knowledge) identifies the following knowledge areas:

Data Management Knowledge Areas

  1. Data Architecture
  2. Data Modelling and Design
  3. Data Storage and Operations
  4. Data Security
  5. Data Integration and Interoperability
  6. Document and Content Management
  7. Reference and Master Data
  8. Data Warehousing and Business Intelligence
  9. Metadata
  10. Data Quality

Apart from these, there are other terms in common use (most of which are arguably covered by the areas above):

  1. Data Catalog
  2. Metadata
  3. Master Data Management
  4. Accountability
  5. Data Lineage
  6. Data Lifecycle Management
  7. Data Ethics and Privacy

For an Enterprise, moving up the Data Maturity Model and following the principles and operations of Data Governance is not easy and can’t be done quickly. There are many reasons for this:

  • Business silos are built around applications and Databases
  • Accrued work that is “owed” to an IT system (Technical debt) keeps growing, and fixing it is never in the Budget
  • Executives are unaware of the Business value of Data
  • Setting up Data Governance is a multi-year program and needs collaboration across many silos
  • A Cultural change is needed across the organisation

Here’s how we can gradually clean up the Data mess described above in an enterprise, while getting ROI on small investments along the way:

  1. Data Catalog – Start with an enterprise-wide Data Catalog. This can be a tool like Microsoft Purview or Databricks Unity Catalog.
  2. Metadata – Collect data about data which explains the Business Logic, the Calculations, and the Data sources used as Master. Specifics can be collated as Business Metadata, Technical Metadata, Operational Metadata and Reference Metadata.
  3. Data Accountability – Define accountability for each Business unit so that it is clear who has the final say on Data sources. Nevertheless, this should be a crowdsourced effort from across the organisation.
  4. Data Lineage – Define how to get an audit trail of the Data’s evolution as it moves through various systems and workflows. Tools which manage this help users understand the origins and dependencies of the data, and where the Data is Transformed and with which Business logic.
  5. Master Data Management – What are the business Entities, and where is the Master record stored? When it is updated, is the change reflected immediately in every system that uses it? Or, worse, is it updated independently in other systems, leaving many variations of the Master record?
  6. Data Quality – Agree with Business and Technology stakeholders on the acceptable deviation in Data Quality and, with the help of Data Accountability, keep it under control. Data Quality has three main characteristics: Accuracy, Completeness and Timeliness.
  7. Data Modelling and Design – Simplify the Data Models used in Databases with Master Data Management in mind.
  8. Data Lifecycle Management – Keep the Data Lifecycle in mind in your Data Governance. It follows the stages Creation -> Storage -> Usage and Enhancement -> Archival -> Destruction. For Regulatory compliance there are rules and timelines for Data storage, Archival and mandatory Deletion (the “right to be forgotten”), with an audit control to prove compliance.
  9. Data Security, Ethics and Privacy – Apply Data Security using the principle of “least privilege for the least amount of time”. Manage Ethics and keep human Bias out of data. Pay special attention to Personally Identifiable Information, which should be stored separately and securely (encrypted in transit and at rest).
  10. Business value of Data – The first step is educating Business Stakeholders on the benefits of Data Maturity, on how to get valuable insights through self-serve Analytics (which is possible when Data Governance is streamlined), and on how it can reduce exposure to Regulatory scrutiny.
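
As a rough illustration of the Data Quality step, the sketch below scores a small batch of warehouse rows on Completeness, Timeliness and Accuracy, where Accuracy is measured against a designated Master record. The field names, the sample records and the 24-hour freshness window are all hypothetical; in practice the thresholds would be agreed with Business and Technology stakeholders.

```python
from datetime import datetime, timedelta

# Hypothetical Master records (e.g. from the CRM), keyed by customer id.
MASTER = {
    "C1": {"name": "Ann Lee", "tax_id": "111"},
    "C2": {"name": "Bob Roy", "tax_id": "222"},
}

def quality_report(rows, master, now, max_age):
    """Return the share of rows that are complete, timely and accurate."""
    required = ("customer_id", "name", "tax_id", "updated_at")
    total = len(rows)
    complete = timely = accurate = 0
    for row in rows:
        # Completeness: every required field is present and non-empty.
        if all(row.get(f) not in (None, "") for f in required):
            complete += 1
        # Timeliness: the row was refreshed within the agreed window.
        updated = row.get("updated_at")
        if updated is not None and now - updated <= max_age:
            timely += 1
        # Accuracy: the row's fields match the Master record.
        ref = master.get(row.get("customer_id"))
        if ref is not None and all(row.get(k) == v for k, v in ref.items()):
            accurate += 1
    if total == 0:
        return {"completeness": 0.0, "timeliness": 0.0, "accuracy": 0.0}
    return {
        "completeness": complete / total,
        "timeliness": timely / total,
        "accuracy": accurate / total,
    }

# Hypothetical sample: one clean row, one stale row with a missing tax id.
NOW = datetime(2024, 6, 25, 12, 0)
WAREHOUSE_ROWS = [
    {"customer_id": "C1", "name": "Ann Lee", "tax_id": "111",
     "updated_at": NOW - timedelta(hours=2)},
    {"customer_id": "C2", "name": "Bob Roy", "tax_id": "",
     "updated_at": NOW - timedelta(days=3)},
]

REPORT = quality_report(WAREHOUSE_ROWS, MASTER, NOW, timedelta(hours=24))
print(REPORT)
```

Scores like these only mean something once a Master source has been agreed (step 5) and someone is accountable for acting on them (step 3).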


A word of caution: align Data Governance with Business initiatives. Don’t propose the value of Data Governance on its own.

Approximately 90% of Data Governance programs struggle because they make a business case for Data Governance on its own, without articulating how the program will support funded Business initiatives sponsored outside the data team. Instead of proposing the value of Data Governance on its own, we need to work backward from Business initiatives.


Reach out if you need help with Data Governance.


Some fundamentals:

  1. Data Engineering Lifecycle and Data Lifecycle are two different things and Data Lifecycle is the superset

Source: Fundamentals of Data Engineering by Joe Reis and Matthew Housley


  2. Data Lifecycle runs from Creation -> Storage -> Usage and Enhancement -> Archival -> Destruction

Source: https://www.dataworks.ie/5-stages-in-the-data-management-lifecycle-process/
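
One way to make these stages concrete is to model them as an ordered set of states and let a data set move only to the next stage. The sketch below is a minimal illustration. The class and function names are hypothetical and do not come from either cited source, and the strictly linear progression is a simplifying assumption (real data may cycle through Usage and Enhancement many times).

```python
from enum import IntEnum

class Stage(IntEnum):
    """Data Lifecycle stages, in the order listed above."""
    CREATION = 1
    STORAGE = 2
    USAGE_AND_ENHANCEMENT = 3
    ARCHIVAL = 4
    DESTRUCTION = 5

def advance(current):
    """Return the stage that follows `current`; destroyed data has no next stage."""
    if current is Stage.DESTRUCTION:
        raise ValueError("data has already been destroyed")
    return Stage(current + 1)

def audit_trail(start=Stage.CREATION):
    """Walk the lifecycle from `start` and record the name of each stage visited."""
    trail = [start]
    while trail[-1] is not Stage.DESTRUCTION:
        trail.append(advance(trail[-1]))
    return [stage.name for stage in trail]

print(audit_trail())
```

Recording each transition in this way is also the kind of evidence an audit control would need to show that Archival and Destruction happened when required.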

Acknowledgements:

  1. The opportunities to learn from large enterprises with poor Data Quality and a spaghetti Data Engineering Lifecycle
  2. Fundamentals of Data Engineering by Joe Reis and Matthew Housley


Mahtab Syed, Melbourne, 26 Mar 2023, updated 25 Jun 2024
