The Data Swamp Monster
Jose Almeida
Freelance Data Consultant/Advisor | Data Strategy | Data Governance | Data Quality | Master Data Management | Remote/Onsite Consulting Services in EMEA
Every forecast agrees that the data lake market will grow at double-digit rates in the years to come, largely driven by a shift towards cloud-based data platforms, by the increasing need to derive in-depth insights from larger volumes of data, and by the need for streamlined access to organizational data held in departmental, mainframe, and legacy silos.
This fast-paced growth suggests that existing data swamps will most likely grow in the same proportion.
An ocean of possibilities
Data Lakes hold an ocean of possibilities for organizations ready to face new business innovations and new forms of competition. Constant advances in digitization, analytics, artificial intelligence, machine learning, the internet of things, and robotics can provide the edge that makes the difference in an increasingly competitive business environment.
Data Lakes offer the capability to introduce cutting-edge data initiatives and to capture vast amounts of data from unrelated sources (social, mobile, cloud applications, or IoT), making it possible to work with “raw” data in its native format, whether structured, semi-structured, or unstructured.
Data Lakes allow organizations to break down data silos, centralizing and consolidating batch and streaming data assets into a complete and authoritative data store, enabling organizations to:
Despite their clear advantages, Data Lakes also bring challenges and risks that an organization needs to tackle to avoid turning them into data swamps.
The Data Swamp
When organizations cannot find, understand, and trust the data they need from their data lakes to create business value and gain a competitive edge, they are facing a Data Swamp.
As data “hoarding” develops, so do the risks of falling into the murky waters of what was initially a Data Lake. A swamp is easily recognized by the lack of metadata, large volumes of irrelevant, incomplete, or inconsistent data, the absence of governance processes, and a nonexistent data quality strategy.
Bottom line: organizations in this situation will not succeed in exploring their data assets effectively to reach their strategic goals and create value.
Data that was initially a business asset becomes a business liability.
Data lake governance
It seems clear that the more data an organization has available, the more easily it will be able to pursue its objectives. It does not always work that way, especially when data is simply moved to the Data Lake without any Data Governance in place, ultimately replicating all the data silos in a single place.
Organizations that aim to maximize the value of data stored in data lakes effectively and efficiently need to implement policy-driven processes that classify and identify what information is in the lake, why it is there, what it means, who owns it, and who uses it, so that high-quality data is available throughout the data's full life cycle.
If an organization wants to use the data stored in the data lake to inform its decisions, the data must be governed.
Understanding which of the organization’s strategic goals underlie the Data Lake is essential. Clear, focused, and business-oriented objectives, controlled through Data Governance processes, are paramount for success during implementation.
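As a rough illustration of the questions above (what is in the lake, why it is there, what it means, who owns it, and who uses it), a catalog record can carry one field per question, and a simple check can flag datasets where any answer is missing. This is only a minimal sketch: the `CatalogEntry` schema, the field names, the `swamp_candidates` helper, and the sample datasets are all hypothetical and not taken from any specific catalog product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CatalogEntry:
    """Governance metadata for one dataset in the lake (hypothetical schema)."""
    path: str                                            # what is in the lake
    purpose: Optional[str] = None                        # why it is there
    description: Optional[str] = None                    # what it means
    owner: Optional[str] = None                          # who owns it
    consumers: List[str] = field(default_factory=list)   # who uses it


# Fields that a governance policy might require (an assumption for this sketch).
REQUIRED_FIELDS = ("purpose", "description", "owner", "consumers")


def missing_metadata(entry: CatalogEntry) -> List[str]:
    """Return the names of required fields that are empty or unset."""
    return [name for name in REQUIRED_FIELDS if not getattr(entry, name)]


def swamp_candidates(entries: List[CatalogEntry]) -> Dict[str, List[str]]:
    """Map each dataset path with metadata gaps to its list of missing fields."""
    report = {}
    for entry in entries:
        gaps = missing_metadata(entry)
        if gaps:
            report[entry.path] = gaps
    return report


# Made-up sample catalog: one governed dataset, one undocumented dump.
catalog = [
    CatalogEntry(
        path="sales/orders",
        purpose="Revenue reporting",
        description="One row per confirmed order",
        owner="sales-ops",
        consumers=["finance"],
    ),
    CatalogEntry(path="iot/raw_dump"),
]

print(swamp_candidates(catalog))
```

Run regularly, a report like this gives data owners a concrete list of datasets to document or retire before they add to the swamp.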
Other aspects, also critical and directly dependent on a robust Data Governance framework, help avoid some of the pitfalls that inevitably lead to a Data Swamp, and some of the negative effects mentioned above:
The ability to successfully implement a Data Governance plan in the Data Lake, and subsequently to create an efficient Data Lake, will bring clear benefits to the organization from both an operational and a management perspective:
Is it too late?
Data Governance is commonly seen as a long, painful, and expensive process, and it can be. For an organization already facing a Data Swamp, it seems an even larger challenge.
It doesn't have to be so.
A business-driven approach, aligned with business objectives, priorities, and needs, and complemented by an agile implementation in which stakeholders can quickly see the benefits of each governance initiative, will allow the organization to start regaining control over the Data Lake.