The Data Library

Power comes from the sharing of information. The ability to access, gather, and use intelligence to advance an individual’s goals is critical to the continued evolution of society. Books are one means by which an individual can attain such knowledge. Just as books are a conduit between the reader and the words, the data catalog is a channel between the data owner and the data consumer. Looking at how books are distributed in modern times, we can carry many of those lessons into our data governance approach.

The organized distribution of books has existed since the earliest times. The concept of a library may date back to Babylonia in the first half of the 3rd millennium BC, or to the Han dynasty in 206 BC. In modern times, books have taken many forms, such as e-books, audiobooks and paperbacks. As the types of books evolved, so did the repositories that hold them. You may have a brick-and-mortar library, or a digital library that leverages AI for discovery using natural language. Just as a library is an organized repository where the community accesses information by way of books, a data catalog is a means by which individuals gather knowledge from data to achieve a particular task. As technology evolves and data grows, so do the requirements for managing the knowledge being housed and making it discoverable. The growth of data in various locations and forms requires new ways to consolidate, curate and distribute it to the masses responsibly.

Imagine you are writing a research paper on WWII and you try to search a digital library with no catalog system in place. How could you possibly find the books you need in any reasonable amount of time? You couldn't. Data discovery is no different. Consumers of data need to be able to easily find and access healthy data to complete their tasks. Curating data allows individuals to find the data they need using business concepts. Microsoft Purview enables organizations to seamlessly curate their data for easy discovery and use. The Purview Data Catalog is powered by Business Concepts that help categorize data for easier management and discovery.

Business concepts are the conceptual and logical layers that sit on top of the physical layer. Think of Business Domains as the various genres of books; in our case, 'History'. For data, Business Domains are the first point of federation across the organization, empowering different groups of data experts to own their own destiny. Within the History genre, you might have specific categories of information, such as 'WWII'. This layer of categorization aligns to the business concept of Data Products, which are logical groupings of related physical assets. Organizations can curate Data Products further with other business concepts such as glossary terms, critical data elements (CDEs) or objectives and key results (OKRs).
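To make the layering concrete, here is a minimal sketch of the library analogy in Python. It is only an illustration and not the Purview API: the class names `BusinessDomain` and `DataProduct`, the helper `find_products`, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Logical grouping of related physical assets (the 'WWII' category)."""
    name: str
    assets: list[str] = field(default_factory=list)        # physical layer (hypothetical files)
    glossary_terms: set[str] = field(default_factory=set)  # curated business vocabulary


@dataclass
class BusinessDomain:
    """Top-level grouping owned by one group of experts (the 'History' genre)."""
    name: str
    products: list[DataProduct] = field(default_factory=list)


def find_products(domains, term):
    """Return (domain, product) name pairs whose glossary contains the term."""
    wanted = term.lower()
    return [
        (d.name, p.name)
        for d in domains
        for p in d.products
        if wanted in {t.lower() for t in p.glossary_terms}
    ]


history = BusinessDomain("History", [
    DataProduct("WWII",
                assets=["wwii_battles.csv", "wwii_treaties.parquet"],
                glossary_terms={"World War II", "Allied Powers"}),
    DataProduct("Ancient Rome",
                assets=["roman_emperors.csv"],
                glossary_terms={"Roman Empire"}),
])

print(find_products([history], "world war ii"))  # [('History', 'WWII')]
```

The consumer searches by a business term and never needs to know the file names; the curated glossary is what connects the question to the physical assets.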

AI can be applied on top of this to manage and discover data through natural language. Just as a student may search for 'World War II' in the digital library, a data consumer could search for 'revenue sales in the United States' in the data catalog. When you empower your data stewards to effectively curate your business concepts, you empower all your catalog consumers to quickly find the data they need. Ancient civilizations used their own forms of knowledge classification, and the modern construct of a cataloging system for books applies just as easily to data. As organizations align their Data Governance practice with the business, creating a cataloging system aligned to the business is essential. Applying these Business Concepts to your Data Catalog creates a seamless data management and discovery experience. Microsoft Purview provides a way to empower organizations and accelerate their responsible AI innovation.

Syed Zeeshan A.

Data & AI Expert @ Microsoft | Data Enthusiast | Ex-Deloitte | Ex-Hitachi

8 months ago

Great article, Alex Posar. Loved the analogy of the data catalog as a channel between the data owner and data consumer.

Natasha Scott

Demand Generation Leader | Digital Engagement | Inbound Lead Generation

8 months ago

As someone who always feels at home in a library, I love this analogy - it's beautifully simple! Also taking on board the points about how pre-defined data structures and models are becoming less relevant. This is as true in master data management as it is in data governance. The relational database, with its rows and columns, is no longer representative of the highly interconnected nature of data, which is why we (and analyst firms like Gartner) are predicting a shift away from this traditional approach to graph-based systems, which can handle complex relationships and give a much more holistic view.

Wolfgang Hackenbroch

Driving Data Excellence & Governance

9 months ago

Libraries and their features are a good analogy for Data Governance and its artefacts, like the Data Catalog, Data Ownership, Data Classification etc. Nevertheless, it seems to me that there are also quite a few differences. Libraries have a somewhat "ancient" touch (at least to me, as in the wonderful picture of the post above). Nowadays, pre-defined data structures, classifications, etc. (still applicable in libraries) become less and less relevant for data management, in favor of searchability of the catalog, "self-service" structuring and crawlability. Think of Google & ChatGPT, in comparison to Yahoo back then... With AI around, this becomes even more important: what cannot be crawled (including metadata) will not be discovered, described or used. So, instead of a top-down structured library, we should be striving for a kind of bottom-up, self-organizing data estate, with the authors = data product owners publishing and describing their books = data products, to be found and used by everyone. All of that is supported more and more by AI, for discovery, auto-classification and natural language search. We are on this journey here, leveraging the power of Purview to generate worthwhile use cases for our data community.

Well said, Alex Posar! Organization is key. Imagine a nicely labelled and categorized library on one side (just like the one in your post) and a pile of books thrown together on the other...that's the difference between a well-governed data estate with business concepts curated by business experts (or the librarians in this analogy), and a data swamp where realizing value from data becomes incredibly difficult.

David Lindop

Board Advisor | Non-Executive Director | NED | Fractional Chief Data Officer | AI Leader | Keynote Speaker

9 months ago

Great analogy, Alex Posar. A lot of our historical data governance focus has been on security, privacy and protection. As new tools like #MicrosoftFabric enable us to federate data and make it accessible to a much wider audience, discoverability will become more critical than ever if we are to unlock the business value of A.I.
