Unstructured Wonderland: shaping potential with sustainability
"Welcome to Wonderland! Here AI-driven innovation turns unstructured data into pure gold: unlocking groundbreaking services, happy customers and profits that soar to new heights!"
This is more or less the tone of many communication campaigns flooding our feeds daily. While there’s some truth to it, what’s often left unsaid is how to proceed for long-term success, and that’s where the real focus should be.
The promise of unlocking value from unstructured data has never been closer, but let’s be clear: AI isn’t a magic wand. The journey from potential to actual value is filled with challenges.
Ready to explore how deep the rabbit hole really goes?
Structuring the unstructured
In our daily lives, habits and work, the world around us is inherently unstructured. There’s no need to list the countless file formats devised over the years to manage unstructured data - a concept dating back to Hadoop and the dawn of Big Data over a decade ago. It’s enough to observe that our reality is unstructured while we, as humans, strive to impose structure as a way to gain control.
We roll up our sleeves, working to fit everything into Excel spreadsheets or shaping and modeling it into database tables. Yet, the essence of life remains unstructured. Even time itself, a natural, inexhaustible flow tied to human experience, is something we attempt to organize into seconds, days, quarters and fiscal years.
Our relentless pursuit of structure, our effort to refine a reality that inherently resists refinement - preferring instead to flow freely, unbound and unstructured - inevitably leads to compromises and limitations. While this drive for precision and accuracy in what we define and constrain has its merits, it ultimately limits us - typically by constraining the original shape or size of what we capture.
Recent technological innovations can finally unlock the true value of unstructured data, but we must tread carefully. We cannot blindly trust the first convenient solution, assuming a tool designed for a specific use case can be universally applied to all contexts and scenarios. Moreover, we must not neglect to review the solution from a more systemic perspective. Otherwise we risk repeating the mistakes of the past 20-30 years in IT: overemphasizing applications at the expense of data, creating technological debt and developing solutions that are unsustainable in the long term, destined to collapse under their own weight.
To avoid this pitfall, I recommend adopting an approach that treats data as a product - yes, even unstructured data. In this article, we’ll explore the additional capabilities and requirements needed for this approach. But first, let’s take a step back to understand why this shift is necessary in the first place.
Free entry to Wonderland: the trap of PoC enthusiasm
AI has made significant progress in processing unstructured data, bringing us closer to uncovering insights hidden in messy formats. But relying on AI alone? Without a strong foundation in Information Architecture, it's like working with your hands tied, risking more harm than good in the long run.
PoCs often seem like innovation’s golden ticket - offering agility, quick wins, and a shiny demo to impress stakeholders. But too many PoCs are seen as the end goal, not the starting point. The real challenge comes when moving from PoC to a fully scaled, integrated solution. Why? Because agility in prototyping isn’t the same as resilience in deployment. Without a strategic roadmap, PoCs turn into tactical dead ends, leading to fragmented systems that are hard to maintain, evolve or secure.
Want a solution that works at scale? Stop treating the PoC like a trophy and start using it as a blueprint. Unstructured data isn’t just tough to process, it’s a challenge to manage sustainably. The only way to turn it into a strategic asset is by embedding it into a broader ecosystem with proper governance, lifecycle management and integration.
"Wake up, Neo": the reality of scaling solutions
Quoting an iconic scene from The Matrix - yes, the one that came out in 1999, 26 years ago! - it’s time to wake up from the illusion that powerful tools alone can hand you a ready-made solution. Sure, they can deliver results, but seamlessly integrating those results into your ecosystem - your infrastructure, regulations, processes, skills and data - is far from guaranteed. And if your goal is to do this sustainably and with a long-term vision, then you’d better settle in for a serious conversation.
Wake up! This is exactly what organizations need to hear when they think shiny tools, quick demos or even successful experiments are all it takes to unlock value.
Tools can bring you to the door of production, but they won’t walk you through it. Getting through takes a careful dance of planning and alignment. Without it, shadow IT creeps in, bringing uncontrolled tactical solutions, technical debt and a nightmare of isolated systems.
The challenge is clear: Innovate, yes - but do it with an eye on sustainability. Otherwise, you’re just adding complexity instead of value.
So, how deep is this rabbit hole?
If unstructured data is more accessible than ever, why are the challenges still so daunting? Here’s a checklist of what it really takes to implement reliable, scalable and sustainable solutions:
Each of these points is a potential stumbling block. Addressing them demands a strategic approach grounded in maturity and balance.
Great, I see I've captured your attention. Surely, you've encountered or experienced at least one of these challenges, if not fallen victim to them. Now, let's figure out together how to overcome them.
The Data Product revolution
Here’s where the approach that treats data as a product enters the stage.
I won't explain the paradigm itself here; instead, I'll focus on how it can be applied to unstructured data and how it tackles the unique challenges involved. If you want a clearer understanding or wish to explore the topic further, I highly recommend this excellent book: Managing Data as a Product: Design and build data-product-centered socio-technical architectures by Andrea Gioia. It will clarify these aspects and provide many other valuable insights for data management, I guarantee it.
Returning to our main point, this approach, borrowed from structured data management, emphasizes treating data as a first-class citizen. This attention enhances business agility and democratization. Applying it to unstructured data unlocks new possibilities.
At its core, a Data Product for unstructured data includes:
This modular design doesn’t stop at the input and output ports described.
Need to combine structured and unstructured data? Add more ports.
Want to repurpose or share outputs? No problem.
The approach enables the use of multiple heterogeneous input and output ports, offering greater flexibility and versatility for solutions handling unstructured data. This allows for capabilities such as:
The real power lies in the reuse of assets and in collaboration between Data Products: both input and output ports are designed to interact with other Data Products.
The design and functionality of a Data Product remain consistent regardless of the type of data it handles (both unstructured and structured), enabling the convergence of practices and tools for centralized and unified governance of all assets.
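To make the idea of heterogeneous ports more concrete, here is a minimal sketch in Python. All names in it (`Port`, `DataProduct`, `can_consume`, the product and port names, and the use of media types to match ports) are hypothetical and chosen only for illustration; they do not come from any specific platform or standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Port:
    """A named entry or exit point of a Data Product (illustrative model)."""
    name: str
    kind: str        # "structured" or "unstructured"
    media_type: str  # assumption: ports are matched by media type


@dataclass
class DataProduct:
    """Same shape regardless of the kind of data the product handles."""
    name: str
    input_ports: list[Port] = field(default_factory=list)
    output_ports: list[Port] = field(default_factory=list)

    def can_consume(self, upstream: DataProduct) -> list[tuple[str, str]]:
        """Return (upstream output, own input) port pairs with matching media types."""
        return [
            (out.name, inp.name)
            for out in upstream.output_ports
            for inp in self.input_ports
            if out.media_type == inp.media_type
        ]


# A product mixing unstructured and structured inputs.
contracts = DataProduct(
    name="contract-extraction",
    input_ports=[
        Port("scanned-contracts", "unstructured", "application/pdf"),
        Port("customer-master", "structured", "text/csv"),
    ],
    output_ports=[Port("contract-facts", "structured", "application/json")],
)

# A downstream product that reuses the output of the first one.
risk = DataProduct(
    name="risk-scoring",
    input_ports=[Port("facts-in", "structured", "application/json")],
    output_ports=[Port("risk-report", "unstructured", "text/markdown")],
)

links = risk.can_consume(contracts)
print(links)
```

In this sketch the same `DataProduct` type describes both products, and the downstream one discovers which upstream output it can reuse simply by inspecting declared ports.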
However, there are key differences.
This changes the game
Interacting with unstructured data requires specific tools and tailored approaches. The lifecycle of Data Products is governed and standardized through platform services. In addition to the traditional services for managing Data Products that deal with structured data, handling unstructured data calls for specialized services, including:
In both cases, shared platform services simplify the integration of necessary functionalities by abstracting their complexity, ensuring a clear separation from the underlying infrastructure. These services are employed during the Data Product's definition phase using a declarative approach.
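As a rough illustration of what "declarative" can mean here, the sketch below has a Data Product name the platform services it needs, while the platform resolves those names to implementations. The service names (`text-extraction`, `chunking`), the descriptor layout and the `build_pipeline` helper are assumptions made up for this example, not part of any real platform.

```python
# Hypothetical catalog of shared platform services. The Data Product only
# knows the service names, never the underlying implementation.
PLATFORM_SERVICES = {
    "text-extraction": lambda doc: doc.strip(),
    "chunking": lambda doc: [doc[i:i + 20] for i in range(0, len(doc), 20)],
}

# Declarative definition: WHAT the product needs, not HOW it is done.
descriptor = {
    "name": "contract-extraction",
    "pipeline": ["text-extraction", "chunking"],
}


def build_pipeline(descriptor, catalog):
    """Resolve the declared service names against the platform catalog."""
    missing = [s for s in descriptor["pipeline"] if s not in catalog]
    if missing:
        raise ValueError(f"unknown platform services: {missing}")
    steps = [catalog[s] for s in descriptor["pipeline"]]

    def run(payload):
        # Apply each resolved service in the declared order.
        for step in steps:
            payload = step(payload)
        return payload

    return run


run = build_pipeline(descriptor, PLATFORM_SERVICES)
chunks = run("  " + "a" * 45 + "  ")
print([len(c) for c in chunks])
```

Because the descriptor carries only names, the platform team could replace the implementation behind `chunking` without the Data Product definition changing at all.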
At this point you may still be wondering how this differs from a tactical approach. Here is the answer.
In a tactical approach, every application takes care of every aspect, including those that should be centralized, shared and managed by a dedicated platform team. Moreover, it treats data and metadata as byproducts rather than as foundational ingredients that must ship with the asset. There are few or no shared services, and any change to the adopted technology impacts all applications. This doesn't necessarily mean replacing a product entirely: even a version change typically requires redeploying every application involved and, in the worst case, changing their internal code. Take, for example, an application that calls the ChatGPT APIs directly: what if you need to change the endpoint, the model type or the service provider?
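One way to picture the alternative is to put the model call behind a shared interface, so that a provider or model change becomes a configuration change. The sketch below is only an illustration under that assumption: `TextGenerator`, `ProviderA`, `ProviderB`, `REGISTRY` and `get_generator` are hypothetical names, and the providers are in-memory stand-ins rather than real vendor clients.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Interface the application depends on (hypothetical platform service)."""

    def generate(self, prompt: str) -> str: ...


class ProviderA:
    """Stand-in for one vendor; a real client would call a remote API here."""

    def __init__(self, model: str) -> None:
        self.model = model

    def generate(self, prompt: str) -> str:
        return f"[provider-a/{self.model}] {prompt}"


class ProviderB:
    """Stand-in for a second vendor with the same interface."""

    def __init__(self, model: str) -> None:
        self.model = model

    def generate(self, prompt: str) -> str:
        return f"[provider-b/{self.model}] {prompt}"


# Owned by the platform team: maps configuration values to implementations.
REGISTRY = {"provider-a": ProviderA, "provider-b": ProviderB}


def get_generator(config: dict) -> TextGenerator:
    """Build the generator selected by configuration."""
    return REGISTRY[config["provider"]](config["model"])


def summarize(document: str, llm: TextGenerator) -> str:
    """Application code: unaware of which provider or model is in use."""
    return llm.generate(f"Summarize: {document}")


before = summarize("invoice", get_generator({"provider": "provider-a", "model": "small"}))
after = summarize("invoice", get_generator({"provider": "provider-b", "model": "large"}))
print(before)
print(after)
```

Here `summarize` never changes when the provider does; only the configuration passed to `get_generator` differs between the two calls.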
The data contract specifies references to ontology concepts, capturing the semantic elements that define the business context. This serves as the bridge between Data Products and knowledge, enabling autonomous yet centrally regulated access to semantics. This connection enriches unstructured input data with relevant context, essential for producing meaningful outputs.
Through platform services, Data Products interact with specific ontology elements. For example, they can dynamically retrieve the concept of a VAT number to locate its instances in unstructured documents. They can also extract ontology segments to enhance processing context - such as obtaining a document's lexical structure for precise analysis.
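The VAT number example could look roughly like the sketch below. It is a toy under stated assumptions: the in-memory `ONTOLOGY` dictionary, the `get_concept` and `find_instances` helpers, and the idea that a concept carries a regular expression are all hypothetical, and the pattern covers only Italian-style VAT numbers ("IT" followed by 11 digits) without validating check digits.

```python
import re

# Hypothetical ontology fragment served by the platform. Assumption: each
# concept carries a recognition pattern; real ontologies may work differently.
ONTOLOGY = {
    "vat_number": {
        "label": "VAT number",
        # Italian-style format only: country prefix "IT" plus 11 digits.
        "pattern": r"\bIT\d{11}\b",
    }
}


def get_concept(concept_id: str) -> dict:
    """Stand-in for a platform service call that retrieves an ontology concept."""
    return ONTOLOGY[concept_id]


def find_instances(concept_id: str, text: str) -> list[str]:
    """Locate instances of an ontology concept in unstructured text."""
    concept = get_concept(concept_id)
    return re.findall(concept["pattern"], text)


doc = "Supplier ACME S.r.l., VAT IT12345678901, invoices client IT10987654321."
matches = find_instances("vat_number", doc)
print(matches)
```

The Data Product holds only the concept identifier; what a VAT number looks like stays under central control in the ontology.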
By integrating practices for structured and unstructured data, this approach establishes unified governance. Data Products adapt seamlessly, whether processing batch jobs, streaming in real time, or providing APIs for user interaction. The result is a flexible, modular, and scalable framework that maximizes value extraction.
The benefits of treating unstructured data through a Data Product lens should be clearer now:
Conclusions: building the future, not just the now
The promise of unlocking unstructured data’s value is no longer a dream, it’s a tangible opportunity. However, achieving long-term success requires more than quick wins or reliance on shiny tools. It demands a strategic shift in mindset and approach.
Treating unstructured data as a first-class citizen through the Data Product paradigm offers a clear path forward. By unifying practices for structured and unstructured data, fostering collaboration, and enabling modular, scalable solutions, this approach bridges the gap between innovation and sustainability.
But let’s be clear: success doesn’t come from shortcuts. It requires careful planning, robust governance and a commitment to designing systems that stand the test of time. Organizations must move beyond the trap of Proof-of-Concept enthusiasm and focus on solutions that deliver consistent, long-term value without creating unnecessary complexity or technical debt. The choice is simple: embrace a model centered on flexibility, reusability and integration, or risk repeating the mistakes of the past.
In the end, the value of unstructured data lies not just in processing it but in making it a strategic asset. Are you ready to step up and take the leap? Or will you let Wonderland remain just that - a dream, seductively out of reach?
The future of data-driven innovation depends on the steps we take today. Let’s make them count.
1 month ago: Really like this reframing Pietro: “Treat unstructured data as a first-class citizen - equal to its structured sibling”. How do you usually tackle this cultural shift in organisations?
1 month ago: Ah, welcome to the data poor relation of the great big unregarded world of Information Management, with centuries of wisdom.
So true, Pietro La Torre! It's not just about fancy tools; it’s about the mindset and the right approach to make unstructured data truly valuable.