Data Nugget February 2024
Data Management Association Norway (DAMA)
Accelerating Data Management in Norway
29 February 2024
We are back with a fresh episode of our data nugget newsletter. So grab a cup of coffee and read more about some interesting thoughts from the world of data management.
First, we have a thought-provoking book review on data governance. The second nugget sheds light on the security of data and its encryption. Third, we have a thorough explanation of the difference between a data warehouse and a data lake. And last but not least, we have the next podcast, focusing on data skills for the future.
Happy reading!
Let's grow Data Nugget together. Forward it to a friend. They can sign up here to get a fresh version of Data Nugget on the last day of every month.
A Wake-up Call for Data Governance
Nugget by Winfried Adalbert Etzel
Traditional approaches to governance have fallen short. While the intention is to ensure control and accuracy, the focus on rigid protocols has hindered the effective utilization of data assets. Traditional control- and compliance-focused data governance forgot about one important thing: the user. There is little to no value in data if it is not used. Laura Madsen’s book “Disrupting Data Governance: A Call to Action” explores the need for a paradigm shift in data governance, emphasizing the importance of balancing control with utilization to maximize the value derived from data.
Who is this book for?
“Radically democratizing access to data means that we have to trust each other. The data professionals have to recognize that the average end-user is just trying to get their job done. And the average end-user has to acknowledge that the data team cannot conceivably address every data or business nuance, especially without the context.”
This book is a call to action, and therefore primarily addresses data governance professionals, urging them to take a step back and reevaluate their approach.
Inherent to this proposed shift in data governance is a focus on end users. Every end user in an organization who has interacted with their data department or data steward will happily read this book as an “I told you so”.
We bring out the best in people (and data) if we enable them to own their decisions. I think this might be one of the main messages in the book.
The ineffectiveness of traditional data governance approaches
“ […] like nature, users will find a way, in this case to the data. This fundamental truth requires leaders to take a fresh look at their data governance models and to be open to new approaches.”
Traditional data governance models have typically prioritized control over utilization. However, this emphasis on control often results in cumbersome processes that hinder innovation and data-driven decision-making. End users who need to work with data will find a way, and data professionals should adjust for this rather than fight it.
Moreover, the lack of clear definitions and objectives in governance efforts leads to disjointed initiatives, where the focus shifts towards correctness rather than understanding the nuances of data. And this is where data governance professionals lack an understanding of end-user needs. In Data Quality in particular, there is a tendency toward perfectionism that values getting it right over getting it used. I agree with Laura that data people should not try to turn the tide, but rather channel it. I think known quality is much more valuable than talking about ‘good’ or ‘bad’ quality: it gives the end user a choice to use data with a certain disclaimer.
Tying governance to tangible business value
A fundamental flaw in many data governance strategies is the failure to align with tangible business outcomes. While everyone agrees on the necessity of governing data, the degree to which it is implemented varies dramatically. To address this, it is essential to tie data governance directly to business value, ensuring that efforts are focused on activities that drive organizational success.
And again a ‘Hurrah!’ For me, there is no reason to navigate data governance if it does not lead to business value. Though it has been traditionally tough to tie these together, looking at data from a business perspective can ensure a much more focused approach.
Recognizing data quality as a symptom
First of all, it is important to recognize that data quality and data governance are an inseparable couple and mutually reliant. The prevalence of “bad data” often comes from broken or misaligned processes rather than inherent flaws in the data itself. Viewing data quality issues as symptoms of underlying problems underscores the importance of allocating resources effectively to address root causes. Trust and democratizing access to data become critical components in this context and require collaboration between data professionals and end-users. In the end, data quality is about transparency.
Balancing promotion and protection
A key aspect of Laura’s disrupting approach to data governance is finding the balance between promoting data utilization and protecting data assets. Rather than solely focusing on control, governance should empower individuals to work with data while ensuring adequate safeguards are in place. So a yes to transparency as a basis for creating trust in data. And maybe this was one of the messages from Laura’s call to action that stuck with me and provided thorough context to predominant discussions on data democratization: If you lock data away, if you are not open about the challenges with your data, your end users will get frustrated, misinformed, and ultimately unable to create value with data.
Embracing change and collaboration
Successful data governance requires a shift towards agility and collaboration. Concepts like DataOps, which emphasize iterative improvements and cross-functional teamwork, offer a modern approach to governance. Prioritizing people and processes over technology is essential to achieving this change effectively. Lastly, effective communication, empathy, and change management skills are also crucial for data governance leaders to navigate the complexities of organizational dynamics.
Towards a culture of data utilization
Ultimately, the goal of data governance should be to foster a culture where data is utilized effectively to drive business outcomes. This requires a shift in mindset, moving away from strict control towards empowering individuals to leverage data for decision-making. By focusing on gradual improvements and continuous learning, organizations can unlock the full potential of their data assets.
My recommendation
Organizations need to rethink data governance and strike a balance between control and utilization. By aligning governance efforts with tangible business value, addressing root causes of data quality issues, and fostering a culture of collaboration and innovation, organizations can unleash the true potential of their data assets. Effective data governance is not about achieving perfection but about continuous improvement and adapting the data landscape to the business problems at hand.
I certainly recommend reading this book by Laura Madsen as a wake-up call for data governance to focus away from chasing that unattainable perfect state towards what matters: giving data to the people to create value.
How to Prepare for Q-day?
Nugget by Isa Oxenaar
What is Q-day?
The security of data relies on encryption. Asymmetric encryption, one of the most widely used forms of encryption, translates data into ciphertext so that only users with the private decryption key can decrypt and use it later. Cyber defense now faces a deadline for a coming threat to encryption: quantum computing. This deadline is referred to as Q-day, the day that current algorithms become vulnerable to quantum computing attacks. The currently used set of protocols, the public-key cryptography protocols, contains algorithms, for example RSA, that have been in use since 1977. A new set of algorithms, called Post-Quantum Cryptography, or PQC, will replace these outdated protocols to withstand possible future attacks.
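The public-key idea behind RSA can be illustrated with a toy, textbook-style example. The tiny primes below are chosen only for readability; real RSA keys are thousands of bits long, and this sketch omits padding and everything else a real implementation needs:

```python
# Textbook RSA with tiny primes -- for illustration only, never for real use.
p, q = 61, 53              # two secret primes
n = p * q                  # public modulus (3233)
phi = (p - 1) * (q - 1)    # Euler's totient (3120)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: modular inverse of e (2753)

message = 65                        # a message encoded as a number < n
ciphertext = pow(message, e, n)     # encrypt with the public key (e, n)
recovered = pow(ciphertext, d, n)   # decrypt with the private key (d, n)

print(ciphertext, recovered)  # -> 2790 65
```

The security rests on the difficulty of factoring n back into p and q; a quantum computer running Shor’s algorithm could do exactly that efficiently, which is why RSA-style schemes are threatened.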
Risk: Harvest now, decrypt later.
A risk posed by quantum computing is called HNDL, or “harvest now, decrypt later”. HNDL means that attackers harvest sensitive data with a long shelf life and save it for a time when it can be decrypted by quantum computers. Harvested data is lost data, meaning that it can no longer be protected from future quantum attacks.
Driven by the recent release of the PQC standard by the National Institute of Standards and Technology (NIST), the recent developments by the European Telecommunications Standards Institute (ETSI), and the recent breakthrough in quantum computing research by a Pentagon-funded team, it is predicted that the deadline for the transition to PQC is closer than the previously estimated year 2035. It is therefore key to start migrating to PQC algorithms as soon as possible.
What is PQC exactly?
PQC is a new set of protocols, or algorithms, that cannot be broken by quantum computing attacks. Instead of being based on a binary system of 0s and 1s, quantum computing uses qubits. Qubits “exploit the ambiguous nature of subatomic particles to embody every possible value between 0 and 1”. The final output will be a 0 or a 1, but before the output is read, the qubit occupies in-between states that are a mix of 0 and 1, for example, 20% towards 0 and 80% towards 1. This creates more room for the large number of variables needed for complex calculations. The qubit can be influenced until the final output is read, and therefore depends on careful isolation. The result is a more sensitive bit that can handle a far larger number of variables, which is useful for breaking an enemy code, a computation that uses a large number of variables.
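The “20% towards 0, 80% towards 1” intuition can be pictured with a minimal classical simulation: a single qubit’s state is two amplitudes, and the squared magnitude of each amplitude gives the probability of measuring 0 or 1 (this is an illustration of the probabilities only, not real quantum hardware):

```python
import math
import random

# A qubit state |psi> = a|0> + b|1>, with a^2 + b^2 = 1.
# Amplitudes chosen so that P(0) = 0.2 and P(1) = 0.8.
a = math.sqrt(0.2)
b = math.sqrt(0.8)

p0 = a ** 2  # probability of measuring 0
p1 = b ** 2  # probability of measuring 1

def measure():
    """Collapse the state: return 0 or 1 according to the probabilities."""
    return 0 if random.random() < p0 else 1

samples = [measure() for _ in range(10_000)]
print(p0, p1)                            # 0.2 0.8 (up to float rounding)
print(samples.count(1) / len(samples))   # roughly 0.8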
The qubit allows quantum computers to complete more complex calculations than classical computers. PQC algorithms are designed to withstand the heavier attacks of quantum computers. NIST recently closed the public comment period for the first three PQC algorithms. Two use lattice-based cryptography, an error-based method, and one uses hash-based cryptography, a hash-function-based method. The algorithms are planned for widespread use within a year.
How to start migrating to PQC?
There are several approaches to transitioning to PQC, three of which are:
Hybrid solutions can be used in conjunction with classical cryptography, the benefits being the possibility to experiment with PQC quickly and a double protective layer for the data, against future quantum as well as current threats.
Organizations with complex infrastructures can undergo a phased transition with interim evaluation periods. This approach can lead to support from departments across the business because it minimizes downtime of affected systems and because lessons can be drawn from the evaluated periods for further implementation.
An immediate migration suits smaller organizations and new initiatives. It also offers immediate protection against HNDL attacks, but difficulties may occur due to inadequate preparation.
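The hybrid approach above can be sketched at the key-derivation level: derive the session key from both a classical shared secret and a post-quantum shared secret, so that an attacker must break both exchanges to recover it. This is a simplified illustration with random placeholder secrets; real deployments use a standardized KEM combiner rather than this exact construction:

```python
import hashlib
import secrets

# Placeholder shared secrets -- in practice these would come from, e.g.,
# a classical ECDH exchange and a post-quantum KEM such as ML-KEM.
classical_secret = secrets.token_bytes(32)
pqc_secret = secrets.token_bytes(32)

def hybrid_key(classical: bytes, pq: bytes, context: bytes = b"hybrid-v1") -> bytes:
    """Bind both secrets into one session key: compromising only one
    of the two exchanges reveals nothing about the result."""
    return hashlib.sha256(context + classical + pq).digest()

session_key = hybrid_key(classical_secret, pqc_secret)
print(len(session_key))  # -> 32
```

Because the combiner hashes both inputs together, the hybrid layer stays at least as strong as the stronger of the two schemes during the transition period.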
2024 is a year in which more organizations will start to move towards PQC readiness. Businesses will most likely need to identify external partnerships to find a suitable approach. Vendors fall into three groups: large cloud service providers, traditional security vendors, and recently established specialized security vendors. There is no one-size-fits-all solution for migrating to PQC; a tailored approach has to be composed by each organization in collaboration with its vendors.
Data Warehouse Vs. Data Lake
Nugget by Gaurav Sood
The data warehouse (DWH) has been the backbone of data-led organizations for decades. The DWH has been architected and re-architected several times in this period, owing to the increase in the quantity and complexity of data in organizations.
Of late, we have started using the Data Lake (DL) as another way of storing and processing data in large data-led organizations. DWH and DL are essentially similar in that both store and process data, each with its specific nuances.
What is a Data Lake?
A data lake is a centralized repository that ingests and stores large volumes of data in its original form. The data can then be processed and used as a basis for a variety of analytic needs. Due to its open, scalable architecture, a data lake can accommodate all types of data from any source, from structured (database tables, Excel sheets) to semi-structured (XML files, webpages) to unstructured (images, audio files, tweets), all without sacrificing fidelity. The data files are typically stored in staged zones—raw, cleansed, and curated—so that different types of users may use the data in its various forms to meet their needs. Data lakes provide core data consistency across a variety of applications, powering big data analytics, machine learning, predictive analytics, and other forms of intelligent action.
What is a Data Warehouse?
A data warehouse is your classic RDBMS kind of system. The structure, or schema, is modelled and predefined by business and product requirements, and the data is curated, conformed, and optimized for SQL query operations. While a data lake holds data of all structure types, including raw and unprocessed data, a data warehouse stores data that has been treated and transformed with a specific purpose in mind, which can then be used to source analytic or operational reporting. This makes data warehouses ideal for producing more standardized forms of BI analysis, or for serving a business use case that has already been defined.
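The contrast can be sketched in a few lines: a lake lands records as-is with no schema imposed, while a warehouse loads data into a predefined, typed schema ready for SQL. This minimal illustration uses the local filesystem as the "lake" and SQLite as the "warehouse"; real platforms differ, but the division of labor is the same:

```python
import json
import sqlite3
import tempfile
from pathlib import Path

event = {"user": "u42", "action": "login", "ts": "2024-02-29T10:00:00Z"}

# Data lake: land the record in a raw zone in its original form.
lake = Path(tempfile.mkdtemp()) / "raw"
lake.mkdir(parents=True)
(lake / "event_0001.json").write_text(json.dumps(event))

# Data warehouse: load into a predefined, typed schema for SQL analysis.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user TEXT, action TEXT, ts TEXT)")
db.execute("INSERT INTO events VALUES (?, ?, ?)",
           (event["user"], event["action"], event["ts"]))

row = db.execute("SELECT user, action FROM events").fetchone()
print(row)  # -> ('u42', 'login')
```

The lake file keeps full fidelity for future, as-yet-unknown uses; the warehouse row answers today's defined business question quickly and consistently.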
Data Lake use cases
Organizations across a range of industries use data lake platforms to optimize their growth. You can read more here.
MetaDAMA 2#16: Data Skills for the Future
Nugget by Winfried Adalbert Etzel
"When technology evolves really fast, also the skills you need to hire for evolve really fast."
Within a rapidly changing environment fueled by technology and great ideas, it can be hard to define a stable career path. So I brought in an expert on developing companies and building legacies. Pedram Birounvand has a background in quantum physics and data engineering, gained experience at Spotify, and moved into private equity six years ago. Now he has started a new chapter in his career as the CEO of a startup working with data monetization.
Here are my key takeaways:
Data skills for the future
Recruitment
You can listen to the podcast here or on any of the common streaming services (Apple Podcasts, Google Podcasts, Spotify, etc.). Note: The podcasts in our monthly newsletters are behind the actual airtime of the MetaDAMA podcast series.
Thank you for reading this edition of Data Nugget. We hope you liked it.
Data Nugget was delivered with a vision, zeal and courage from the editors and the collaborators.
You can visit our website here, or write us at [email protected].
I would love to hear your feedback and ideas.
Data Nugget Head Editor