Data Nugget February 2024

29 February 2024

We are back with a fresh episode of our Data Nugget newsletter. So grab a cup of coffee and read on for some interesting thoughts from the world of data management.

First, we have a thought-provoking book review on data governance. The second nugget sheds light on data security and encryption. Third, we have a thorough explanation of the difference between a data warehouse and a data lake. And last but not least, we have the next podcast, focusing on data skills for the future.

Happy reading!

Let's grow Data Nugget together. Forward it to a friend. They can sign up here to get a fresh version of Data Nugget on the last day of every month.


A Wake-up Call for Data Governance

Nugget by Winfried Adalbert Etzel

Traditional approaches to governance have fallen short. While the intention is to ensure control and accuracy, a focus on rigid protocols has hindered the effective utilization of data assets. Traditional control- and compliance-focused data governance forgot about one important thing: the user. There is little to no value in data if it is not used. Laura Madsen’s book “Disrupting Data Governance: A Call to Action” explores the need for a paradigm shift in data governance, emphasizing the importance of balancing control with utilization to maximize the value derived from data.

Who is this book for?

“Radically democratizing access to data means that we have to trust each other. The data professionals have to recognize that the average end-user is just trying to get their job done. And the average end-user has to acknowledge that the data team cannot conceivably address every data or business nuance, especially without the context.”

This book is a call to action, and therefore, primarily addresses data governance professionals to take a step back and reevaluate their approach.

Inherent to this proposed shift in data governance is a focus on end users. Every end user in an organization that has interacted with their data department or data steward will happily read this book as an “I told you so”.

We bring out the best in people (and data) if we enable them to own their decisions. I think this might be one of the main messages in the book.

The ineffectiveness of traditional data governance approaches

“ […] like nature, users will find a way, in this case to the data. This fundamental truth requires leaders to take a fresh look at their data governance models and to be open to new approaches.”

Traditional data governance models have typically prioritized control over utilization. However, this emphasis on control often results in cumbersome processes that hinder innovation and data-driven decision-making. End users who need to work with data will find a way, and data professionals should adjust for this rather than fight it.

Moreover, the lack of clear definitions and objectives in governance efforts leads to disjointed initiatives, where the focus shifts towards correctness rather than understanding the nuances of data. And this is where data governance professionals lack an understanding of end-user needs. In particular, in Data Quality, there is a tendency towards perfectionism that values getting it right over getting it used. I agree with Laura that data people should not try to turn the tide, but rather channel it. I think known quality is much more valuable than talking about ‘good or bad’ quality: it gives the end user the choice to use data with a certain disclaimer.

Tying governance to tangible business value

A fundamental flaw in many data governance strategies is the failure to align with tangible business outcomes. While everyone agrees on the necessity of governing data, the degree to which it is implemented varies dramatically. To address this, it is essential to tie data governance directly to business value, ensuring that efforts are focused on activities that drive organizational success.

And again a ‘Hurrah!’ For me, there is no reason to pursue data governance if it does not lead to business value. Though it has traditionally been tough to tie the two together, looking at data from a business perspective can ensure a much more focused approach.

Recognizing data quality as a symptom

First of all, it is important to recognize that data quality and data governance are an inseparable couple and mutually reliant. The prevalence of “bad data” often comes from broken or misaligned processes rather than inherent flaws in the data itself. Viewing data quality issues as symptoms of underlying problems underscores the importance of allocating resources effectively to address root causes. Trust and democratizing access to data become critical components in this context and require collaboration between data professionals and end-users. In the end, data quality is about transparency.

Balancing promotion and protection

A key aspect of Laura’s disruptive approach to data governance is finding the balance between promoting data utilization and protecting data assets. Rather than solely focusing on control, governance should empower individuals to work with data while ensuring adequate safeguards are in place. So a yes to transparency as a basis for creating trust in data. And maybe this was one of the messages from Laura’s call to action that stuck with me and provided thorough context to the predominant discussions on data democratization: if you lock data away, if you are not open about the challenges with your data, your end users will get frustrated, misinformed, and ultimately unable to create value with data.

Embracing change and collaboration

Successful data governance requires a shift towards agility and collaboration. Concepts like DataOps, which emphasize iterative improvements and cross-functional teamwork, offer a modern approach to governance. Prioritizing people and processes over technology is essential to achieving this change effectively. Lastly, effective communication, empathy, and change management skills are also crucial for data governance leaders to navigate the complexities of organizational dynamics.

Towards a culture of data utilization

Ultimately, the goal of data governance should be to foster a culture where data is utilized effectively to drive business outcomes. This requires a shift in mindset, moving away from strict control towards empowering individuals to leverage data for decision-making. By focusing on gradual improvements and continuous learning, organizations can unlock the full potential of their data assets.

My recommendation

Organizations need to rethink data governance and strike a balance between control and utilization. By aligning governance efforts with tangible business value, addressing root causes of data quality issues, and fostering a culture of collaboration and innovation, organizations can unleash the true potential of their data assets. Effective data governance is not about achieving perfection but about continuous improvement and adapting the data landscape to the business problems at hand.

I certainly recommend reading this book by Laura Madsen as a wake-up call for data governance to focus away from chasing that unattainable perfect state towards what matters: giving data to the people to create value.


How to Prepare for Q-day?

Nugget by Isa Oxenaar

What is Q-day?

The security of data relies on encryption. Asymmetric encryption, the most commonly used form of encryption, translates data into ciphertext so that only users with a private decryption key can decrypt and use it. Cyber defense is approaching the deadline for a coming threat to encryption: quantum computing. This deadline is referred to as Q-day, the day current algorithms become vulnerable to quantum computing attacks. The currently used set of protocols, the public-key cryptography protocols, contains algorithms, for example RSA, that have been in use since 1977. A new set of algorithms, called Post-Quantum Cryptography, or PQC, will replace the outdated set of protocols to withstand possible future attacks.
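
To make the asymmetric idea concrete, here is a deliberately tiny, textbook-style RSA sketch in Python. The numbers are toy values for illustration only; real RSA uses keys of 2048 bits or more, and production systems should rely on a vetted cryptography library rather than anything hand-rolled:

```python
# Toy RSA sketch -- illustration only, NOT secure at these sizes.
p, q = 61, 53            # two small primes (kept secret)
n = p * q                # public modulus: 3233
phi = (p - 1) * (q - 1)  # Euler's totient: 3120
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent: modular inverse of e mod phi

def encrypt(m: int) -> int:
    """Anyone can encrypt with the public key (n, e)."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Only the holder of the private exponent d can decrypt."""
    return pow(c, d, n)

message = 65
ciphertext = encrypt(message)        # 2790
assert decrypt(ciphertext) == message
```

The point Q-day makes is that a large quantum computer running Shor’s algorithm could recover `d` from the public `(n, e)` efficiently, which is exactly what PQC algorithms are designed to prevent.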

Risk: Harvest now, decrypt later.

A risk posed by quantum computing is called HNDL, or “Harvest now, decrypt later”. HNDL means that attackers harvest sensitive data with a long shelf life and save it for a time when it can be decrypted by quantum computers. Harvested data is lost data, meaning that it can no longer be protected from future quantum attacks.

Driven by the recent release of the PQC standard by the National Institute of Standards and Technology (NIST), recent developments at the European Telecommunications Standards Institute (ETSI), and a recent breakthrough in quantum computing research by a Pentagon-funded team, it is predicted that the deadline for the transition to PQC is closer than the previously estimated year 2035. It is therefore key to start migrating to PQC algorithms as soon as possible.

What is PQC exactly?

PQC is a new set of protocols, or algorithms, that cannot be broken by quantum computing attacks. Instead of being based on a binary system of 0′s and 1′s, quantum computing uses qubits. Qubits “exploit the ambiguous nature of subatomic particles to embody every possible value between 0 and 1”. The final output will be a 0 or a 1, but before the output is read there are in-between states of the qubit that are a percentage of 0 and 1, for example 20% towards 0 and 80% towards 1. This creates more space for the large number of variables needed for complex calculations. The qubit can be influenced until the final output is read, and therefore depends on being well isolated. The result is a more sensitive bit that can handle a larger number of variables, which is useful for breaking an enemy code, a computation that uses a large number of variables.
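
The 20%/80% example can be sketched numerically. A single qubit’s state is a pair of amplitudes (α, β) with |α|² + |β|² = 1, and measuring collapses it to 0 or 1 with those probabilities. A minimal classical simulation in Python (this only mimics the measurement statistics, it is not a quantum computer):

```python
import math
import random

# State |psi> = alpha|0> + beta|1>, with |alpha|^2 + |beta|^2 = 1.
alpha = math.sqrt(0.2)   # 20% chance of measuring 0
beta = math.sqrt(0.8)    # 80% chance of measuring 1
assert abs(alpha**2 + beta**2 - 1.0) < 1e-12

def measure() -> int:
    """Collapse the qubit: 0 with probability |alpha|^2, else 1."""
    return 0 if random.random() < alpha**2 else 1

random.seed(42)                                # fixed seed for reproducibility
samples = [measure() for _ in range(10_000)]
fraction_of_ones = sum(samples) / len(samples) # close to 0.8
```

Until measurement, an algorithm can manipulate the amplitudes themselves, which is where the extra computational space comes from.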

The qubit allows quantum computers to complete more complex calculations than classical computers, and PQC algorithms are designed to withstand these heavier attacks. NIST recently closed the public comment period for the first three PQC algorithms. Two use lattice-based cryptography, an error-based method, and one uses hash-based cryptography. The algorithms are planned for widespread use within a year.

How to start migrating to PQC?

There are several approaches to transitioning to PQC, three of which are:

  • Implementing hybrid post-quantum solutions

Hybrid solutions can be used in conjunction with classical cryptography; the benefits are the ability to experiment with PQC quickly and a double protective layer for the data: against future quantum as well as current threats.

  • Adopting a phased approach

Organizations with complex infrastructures can undergo a phased transition with interim evaluation periods. This approach can win support from departments across the business, because it minimizes downtime of affected systems and because lessons from the evaluated periods can inform further implementation.

  • Complete migration in a single transition

An immediate migration suits smaller organizations and new initiatives. It also offers immediate protection against HNDL attacks, but difficulties may occur due to inadequate preparation.
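
Zooming in on the first approach: a common pattern for hybrid key establishment is to run a classical exchange and a PQC key encapsulation side by side, and feed both shared secrets into a single key-derivation step, so the session key stays safe as long as either scheme holds. A minimal sketch using only Python’s standard library (both secrets here are placeholder bytes; in practice they would come from, e.g., an ECDH exchange and a PQC KEM such as ML-KEM):

```python
import hashlib
import hmac

# Placeholder secrets -- in a real system these come from the two exchanges.
classical_secret = b"\x01" * 32   # e.g. ECDH shared secret (assumed)
pqc_secret = b"\x02" * 32         # e.g. PQC KEM shared secret (assumed)

def hybrid_key(classical: bytes, post_quantum: bytes, info: bytes) -> bytes:
    """Derive one 32-byte session key from both secrets (HKDF-like,
    extract-then-expand with HMAC-SHA256).

    An attacker must break BOTH exchanges to recover the session key.
    """
    prk = hmac.new(b"hybrid-salt", classical + post_quantum,
                   hashlib.sha256).digest()          # extract step
    return hmac.new(prk, info + b"\x01",
                    hashlib.sha256).digest()         # expand step (one block)

session_key = hybrid_key(classical_secret, pqc_secret, b"example-session")
```

Real deployments (for example hybrid TLS key shares) follow the same combine-then-derive idea, just with standardized encodings.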

2024 is a year in which more organizations will start to move towards PQC readiness. Businesses will most likely need to identify external partners to find a suitable approach. Vendors fall into three categories: large cloud service providers, traditional security vendors, and recently established specialized security vendors. There is no one-size-fits-all solution for migrating to PQC; a tailored approach has to be composed by each organization in collaboration with its vendors.

Sources:

https://www.cryptomathic.com/news-events/blog/pqc-and-how-organizations-are-preparing-for-the-quantum-security-era

https://www.aliroquantum.com/blog/what-is-post-quantum-cryptography-pqc

https://www.sdxcentral.com/articles/analysis/deloitte-predicts-2024-will-be-a-breakthrough-year-for-post-quantum-cryptography/2023/12/


Data Warehouse Vs. Data Lake

Nugget by Gaurav Sood

The data warehouse (DWH) has been the backbone of data-led organizations for decades. The DWH has been architected and re-architected several times in this period, owing to the increase in the quantity and complexity of data in organizations.

Of late, we have started using the data lake (DL) as another way of storing and processing data in large data-led organizations. DWH and DL are similar in that they both store and process data, each with its specific nuances.

What is a Data Lake?

A data lake is a centralized repository that ingests and stores large volumes of data in its original form. The data can then be processed and used as a basis for a variety of analytic needs. Due to its open, scalable architecture, a data lake can accommodate all types of data from any source, from structured (database tables, Excel sheets) to semi-structured (XML files, webpages) to unstructured (images, audio files, tweets), all without sacrificing fidelity. The data files are typically stored in staged zones—raw, cleansed, and curated—so that different types of users may use the data in its various forms to meet their needs. Data lakes provide core data consistency across a variety of applications, powering big data analytics, machine learning, predictive analytics, and other forms of intelligent action.

What is a Data Warehouse?

A data warehouse is your classic RDBMS kind of system. The structure or schema is modelled or predefined by business and product requirements that are curated, conformed, and optimized for SQL query operations. While a data lake holds data of all structure types, including raw and unprocessed data, a data warehouse stores data that has been treated and transformed with a specific purpose in mind, which can then be used to source analytic or operational reporting. This makes data warehouses ideal for producing more standardized forms of BI analysis, or for serving a business use case that has already been defined.
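
This contrast is often summarized as schema-on-write (warehouse) versus schema-on-read (lake). A minimal Python sketch of the two styles, using an in-memory SQLite table for the warehouse side and raw JSON records for the lake side (table and field names are invented for illustration):

```python
import json
import sqlite3

# Schema-on-write (warehouse style): the structure is fixed before loading,
# and every row must conform to it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.execute("INSERT INTO sales VALUES (?, ?)", ("EMEA", 1200.0))
warehouse_total = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# Schema-on-read (lake style): raw records are stored exactly as they arrive,
# and structure is imposed only at the moment the data is read.
raw_records = [
    '{"region": "EMEA", "amount": 1200.0}',
    '{"region": "APAC", "amount": 800.0, "channel": "web"}',  # extra field is fine
]
parsed = [json.loads(record) for record in raw_records]
lake_total = sum(record["amount"] for record in parsed)
```

The warehouse rejects anything that does not match its schema up front; the lake happily keeps the extra `channel` field and leaves interpretation to each consumer.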

Data Lake use cases

Here are just a few use cases of how organizations across a range of industries use data lake platforms to optimize their growth:

  • Streaming media. Subscription-based streaming companies collect and process insights on customer behavior, which they may use to improve their recommendation algorithms.
  • Finance. Investment firms use the most up-to-date market data, collected and stored in real time, to efficiently manage portfolio risks.
  • Healthcare. Healthcare organizations rely on big data to improve the quality of care for patients. Hospitals use vast amounts of historical data to streamline patient pathways, resulting in better outcomes and reduced cost of care.
  • Omnichannel retail. Retailers use data lakes to capture and consolidate data coming in from multiple touchpoints, including mobile, social, chat, word-of-mouth, and in-person.
  • IoT. Hardware sensors generate enormous amounts of semi-structured to unstructured data on the surrounding physical world. Data lakes provide a central repository for this information to live in for future analysis.
  • Digital supply chain. Data lakes help manufacturers consolidate disparate warehousing data, including EDI systems, XML, and JSON.
  • Sales. Data scientists and sales engineers often build predictive models to help determine customer behavior and reduce overall churn.

You can read more here.


MetaDAMA 2#16: Data Skills for the Future

Nugget by Winfried Adalbert Etzel

"When technology evolves really fast, also the skills you need to hire for evolve really fast."

Within a rapidly changing environment fueled by technology and great ideas, it can be hard to define a stable career path. So I brought in an expert on developing companies and building legacies. Pedram Birounvand has a background in quantum physics and data engineering, gained experience at Spotify, and moved into private equity 6 years ago. Now Pedram has started a new chapter in his career as the CEO of a startup working with data monetization.

Here are my key takeaways:

Data skills for the future

  • As a leader in the data domain, you need to be a storyteller, telling the story about the necessity of data work like quality or governance.
  • The job titles Data Scientist, Data Engineer, etc. have stayed consistent, but what we expect from someone with that job title has changed greatly over the last years.
  • Hard skills in data are not as important for every company as they once were.
  • Make sure you know what you are optimizing for in your career. Are you optimizing for flexibility? Being a self-employed consultant is best. Are you optimizing for building a legacy? Be an entrepreneur. Are you optimizing for leading people and seeing people grow? Become a line manager.
  • Do not become a manager if your passion is engineering. You will need to optimize your time for coaching people, not for problem-solving as an engineer.
  • The technology for applying AI and ML is becoming simpler and more commoditized.
  • Do not hire Data Scientists to build models that you can buy out of the box.
  • Don’t hire Data Scientists if you need Data Analysts. They work entirely differently, and the work is not comparable.
  • If you hire a Data Scientist before having good Data Engineers, the Data Scientist cannot create any value.
  • To be successful as an engineer, you need a really transformative mindset.
  • You need to enjoy the learning process; if not, focus on something else in the IT domain.
  • Adopt an agile mindset. Agile fundamentals are key to today's work life.
  • You need to be able to incubate things. Build incubator squads as soon as a good idea pops up.

Recruitment

  • In a small company, you need to be much more flexible and broader in the way you tackle problems than in a larger company where you can be more specialized.
  • As a hiring manager, do not lean too much on titles, but make sure you understand what you need in your company. This is key to writing a good ad and attracting the right talent.
  • In a job advertisement, be specific: What does it mean to be a Data Engineer in the context of your business?
  • What is important for you as a company today, based on the coming trends?
  • Building code has become so much simpler. Do you still need developers who know all the details of a certain language?
  • Maybe a person who can be close to the business, and not so deep in programming, can add more value.
  • You have to know what it is you are optimizing for. If you have an extremely complicated technology stack you need deep knowledge; if not, do not hire for it.
  • In a recruitment process, focus on the soft skills of rapid learners who can adjust to new situations and have an interest in understanding your business use cases.
  • My interview secret: share a whiteboard session with me. Try to figure out how self-sufficient a candidate can be in understanding how the business works and where to get relevant data. Test how candidates react in uncomfortable situations with customers who are not always happy about results and solutions. Look for candidates who show resilience in new and uncomfortable situations.
  • Career models should be technology-agnostic.

You can listen to the podcast here or on any of the common streaming services (Apple Podcasts, Google Podcasts, Spotify, etc.). Note: The podcasts in our monthly newsletters are behind the actual airtime of the MetaDAMA podcast series.


Thank you for reading this edition of Data Nugget. We hope you liked it.

Data Nugget was delivered with vision, zeal, and courage by the editors and collaborators.

You can visit our website here, or write us at [email protected].

I would love to hear your feedback and ideas.

Nazia Qureshi

Data Nugget Head Editor

Laura Madsen

Data Elder | Fractional CDO | Writer of Books | Speaker of Hard Truths | Advisor to Data & AI Teams | Alpha Disrupter | Proud GenXer | Drinker of Coffee | INTJ | Juggler of Motherhood, Wifedom, Entrepreneurship

I'm honored to be included. Thank you!

Ray Morris

Chief Data Officer at COUNTRY Financial

Great content! Focusing on utilization and delivering business value is so important in a successful Data Governance program

Bjarte Tolleshaug

Teamlead | Power BI | CDMP | Data Governance Specialist | DAMA Norway |

Great work on the nugget! Thanks Winfried Adalbert Etzel, Isa Oxenaar, Gaurav Sood
