A beautiful mind: turning complexity into value

Pietro La Torre

Data strategy, management and tales

Published: November 5, 2024

What sets a genius apart? It's the ability to create value - mainly new knowledge - from what everyone has access to: raw materials like data, natural phenomena or past theories. Add to that a rare talent for spotting hidden connections and finding patterns in chaos. In short, turning complexity into value and making it simple for others to use.

John Nash, a brilliant but misunderstood genius, dedicated his life to cracking the code of game theory, unfortunately at great personal cost. In today's data world, I think this role belongs to knowledge graphs: their expressiveness and flexibility allow us to shape knowledge in a fast and accessible way for both humans and applications. Thankfully, when it comes to knowledge graphs, we don’t need to sacrifice our sanity or spend decades in isolation to gain incredible insights.

Knowledge graphs are the game-changer in turning unstructured data into valuable assets, empowering AI to unlock actionable insights with precision.

The Rise of Knowledge Graphs

Recently, there's been a surge of interest in knowledge graphs, driven by the need to make sense of complex data. To see their full value, it's important to understand their true purpose, what they actually are and what they bring to the table.

This is especially true in today's landscape, where the boom of AI allows for more agile interactions with unstructured data. The main challenge lies in integrating general-purpose models with the local knowledge within organizations, which often exists only in the minds of their employees. This fragmented knowledge leads to significant limitations in knowledge management, as it is not shared, not accessible for automated processes and typically lacks proper oversight.

However, buzzwords like knowledge graphs, ontologies and all the other fancy stuff are easy to hear and just as easily thrown around without much understanding. It's time to clear things up. Let's get to the point: define the basics, understand who's responsible for what and take a practical look at what's happening so we can make better decisions on where to go next. In this article, we will explore various solutions using a retail example to make things simpler.

Before starting a step-by-step journey to the core of the topic introducing various concepts, here’s a quick insight: what makes a graph a knowledge graph is the application of an organizing principle, also known as semantics, that clarifies its meaning.

A knowledge graph focuses on making data itself smarter instead of embedding intelligence into every consumer application. This approach enhances knowledge reuse and minimizes duplication and inconsistencies. This is yet another example of the "shift-left" trend that's increasingly seen in data management. Consider, for instance, data mesh and data contracts, which aim to hold data producers accountable for its quality and value. In the context of knowledge graphs, this means moving semantics closer to the data itself, so that all its consumers can benefit from it.

There are different ways to organize data in a graph, each suited to specific problems. These methods can be combined to address various needs. We’ll start with the basics, covering the origins and meaning of graphs and then explore how knowledge graphs can tackle more complex issues.

Building blocks

Graph theory dates back to the 18th century and Euler, who used it to settle whether one could walk through a town spread across several islands while crossing each bridge exactly once. His model proved effective by simplifying the problem into nodes (the land masses) and edges (the bridges). Euler's analysis concluded that the journey was impossible: such a walk requires that at most two land masses have an odd number of bridges, whereas in that town every land mass had an odd number.
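Euler's argument comes down to counting how many bridges touch each land mass: a walk that uses every bridge exactly once can exist only if zero or two land masses have an odd count. Here is a minimal Python sketch of that check. The labels A to D are names chosen for this illustration, and the function assumes the graph is connected.

```python
from collections import Counter

# The seven bridges as an undirected multigraph.
# A is the central island; B, C and D are the other land masses.
# These labels are assumptions made for this sketch.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def odd_degree_nodes(edges):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def has_euler_walk(edges):
    """True if a walk using every edge exactly once can exist.

    Assumes the graph is connected; only the degree condition is checked.
    """
    return len(odd_degree_nodes(edges)) in (0, 2)


if __name__ == "__main__":
    print(odd_degree_nodes(BRIDGES))
    print(has_euler_walk(BRIDGES))
```

All four land masses have an odd number of bridges, so the check fails for this layout.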

Since then, the model has evolved in many ways. Some models feature directed relationships, with a clear start and end node, while others use undirected relationships that simply connect nodes. Additionally, models like hypergraphs allow relationships to connect multiple nodes at once.

Some graph models, like the property graph, let both nodes and relationships have properties - keys and values. For instance, node properties might include a product’s name, while relationship properties can record counts. The property graph model is the most popular for modern graphs and knowledge graphs.

"Graphs" usually refers to basic models without explicit organizing principles, where semantics is handled by their consumers.

For example, take a store’s sales data. The corresponding dataset is usually extensive and changes over time, driven by customer purchasing patterns and the product catalog. It can look similar to this one:

This is an example of a graph in its most generic form. Unlike other models of organizing information, like tables, it offers a navigation-based perspective. However, its meaning may seem unclear, as do the potential answers it can provide.

Figure: Consumer trying to figure out the hidden semantics of a primitive graph

A primitive graph has this limitation: the knowledge of how to interpret its data is embedded in the algorithms that process it. In other words, someone has to explain how to read the data because there's no organizing principle to help make sense of it.

For example, if I told you that

nodes P are products

nodes C are customers

the connections represent purchases

you'd feel more comfortable answering questions like:

What products did customer C2 buy?

Which customers bought product P1?

Figure: Consumer approaching a primitive graph after getting context information

These are valuable questions in retail. Similarly, you could calculate a product's popularity by counting its connections or identify the customers making the most purchases.
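To make the point concrete, here is a small Python sketch of those questions against a primitive graph. The edge list is invented for illustration, since the original figure is not reproduced here. Notice that the code itself has to "know" that the first element of each pair is a customer and the second a product: the semantics lives in the consumer, not in the data.

```python
from collections import Counter

# Hypothetical purchase edges (customer, product); not the article's dataset.
EDGES = [
    ("C1", "P1"), ("C1", "P2"),
    ("C2", "P1"), ("C2", "P3"),
    ("C3", "P1"),
]


def products_bought_by(customer, edges):
    """Products connected to the given customer node."""
    return sorted({p for c, p in edges if c == customer})


def customers_who_bought(product, edges):
    """Customers connected to the given product node."""
    return sorted({c for c, p in edges if p == product})


def product_popularity(edges):
    """Number of connections per product node."""
    return Counter(p for _, p in edges)


def purchases_per_customer(edges):
    """Number of connections per customer node."""
    return Counter(c for c, _ in edges)


if __name__ == "__main__":
    print(products_bought_by("C2", EDGES))
    print(customers_who_bought("P1", EDGES))
    print(product_popularity(EDGES).most_common(1))
```

If the convention "first element is the customer" is lost, nothing in the data lets a new consumer recover it.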

The lack of an organizing principle is a problem. If implicit knowledge is lost (it’s undocumented, key people leave or details fade over time), you'd have to reverse-engineer the instance to understand its meaning - a long and painful process with no guarantees. Unfortunately, this is often the case today.

It’s far better to implement a solution where the data in the graph is made smarter by applying an organizing principle. The organizing principle acts as a contract between the provider and the consumers of a knowledge graph and reveals the latent and implicit knowledge, transforming the graph into a knowledge graph.

Knowledge Graphs

Knowledge graphs are a specific type of graph with an emphasis on contextual understanding. They are sets of facts that describe situations and their relationships in a human- and machine-understandable format.

A good knowledge graph unifies and organizes underlying data, enabling reasoning without altering the data itself. Its primary goal is to offer clear guidance on how to interpret the data, regardless of its source or technology.

Knowledge graphs use an organizing principle so that both people and software can reason about their underlying data. The organizing principle is a layer that adds context: it makes the data itself smarter, rather than locking away the tools to understand data inside application code. In turn, this both simplifies systems and encourages broad reuse.

Property Graph

Property graphs utilize a foundational organizing principle, making them more practical and expressive. They support categories of nodes, types and directions of relationships and properties on both nodes and relationships. This structure enables them to be processed by applications and easily understood by individuals.

This organizing principle makes a graph self-describing to a certain extent and is a clear first step toward making data smarter.

This time, it's much clearer what information the model contains and certain types of analysis can be performed independently, even without domain knowledge, by leveraging the organizing principle of the model. For example, it's easy to identify all purchased cameras or the age of customers.
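As a rough illustration, a property graph can be sketched in Python with labeled nodes, typed and directed relationships, and key-value properties on both. The class names, property keys and sample data below are assumptions made for this example, not the model of any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str                      # node category, e.g. "Customer"
    props: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str                      # id of the start node
    type: str                       # relationship type, e.g. "PURCHASED"
    end: str                        # id of the end node
    props: dict = field(default_factory=dict)


# Hypothetical sample data.
NODES = {
    "c1": Node("c1", "Customer", {"name": "Alice", "age": 34}),
    "c2": Node("c2", "Customer", {"name": "Bob", "age": 27}),
    "p1": Node("p1", "Product", {"name": "Sony Alpha A6000", "category": "Camera"}),
    "p2": Node("p2", "Product", {"name": "Tripod", "category": "Accessory"}),
}
RELATIONSHIPS = [
    Relationship("c1", "PURCHASED", "p1", {"quantity": 1}),
    Relationship("c2", "PURCHASED", "p2", {"quantity": 2}),
    Relationship("c2", "PURCHASED", "p1", {"quantity": 1}),
]


def purchased_products(category):
    """Names of purchased products whose 'category' property matches."""
    return sorted({
        NODES[r.end].props["name"]
        for r in RELATIONSHIPS
        if r.type == "PURCHASED"
        and NODES[r.end].props.get("category") == category
    })


def customer_ages():
    """Map each customer's name to their age, using only labels and properties."""
    return {
        n.props["name"]: n.props["age"]
        for n in NODES.values()
        if n.label == "Customer"
    }
```

Both queries rely only on labels, relationship types and property keys, so a consumer needs no side explanation of what each node stands for.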

While common and powerful, the property graph model is a relatively low-level organizing principle. It becomes even more valuable when combined with higher-order organizing principles, such as taxonomies and ontologies.

Taxonomies

Assigning categories is useful, but the property graph lacks depth: it doesn't reveal the relationships between categories or show that some products can be considered equivalent due to their category. These types of relationships are invaluable for businesses to better organize catalogs, process orders, and meet customer needs.

For example, how can I answer high-level questions like:

"What products can I use to take photos from 30 meters away at night?"

Answering this requires smarter data, and this is where we introduce a way to organize categories into hierarchies: the taxonomy.

The hierarchy is built by connecting category nodes with a "subcategory_of" relationship.

Products can then be linked to the appropriate part of the taxonomy for classification. In practice, the categories that were originally key-value pairs in the property graph are now cataloged and organized within the taxonomy and then associated with product nodes. This allows for more advanced analysis, as these semantic aspects are now made explicit.
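A taxonomy of this kind can be sketched in a few lines of Python: one mapping holds the "subcategory_of" relationships and another links products to their category. The category names and the lens product are hypothetical; only the two Sony cameras come from the retail example.

```python
# child category -> parent category ("subcategory_of"); names are illustrative.
SUBCATEGORY_OF = {
    "Mirrorless Camera": "Camera",
    "DSLR Camera": "Camera",
    "Camera": "Electronics",
    "Telephoto Lens": "Lens",
    "Lens": "Electronics",
}

# product -> category it is classified under.
IN_CATEGORY = {
    "Sony Alpha A7": "Mirrorless Camera",
    "Sony Alpha A6000": "Mirrorless Camera",
    "Hypothetical 70-300mm lens": "Telephoto Lens",
}


def ancestors(category):
    """Walk up the hierarchy and return every broader category."""
    chain = []
    while category in SUBCATEGORY_OF:
        category = SUBCATEGORY_OF[category]
        chain.append(category)
    return chain


def products_under(category):
    """Products classified under the category or any of its subcategories."""
    return sorted(
        product
        for product, own in IN_CATEGORY.items()
        if own == category or category in ancestors(own)
    )
```

Asking for everything under "Camera" now returns the mirrorless models as well, without any product having to carry the "Camera" value itself.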

This is just the beginning. The beauty of knowledge graphs lies in the ability to use multiple taxonomies simultaneously and associate them with the same graph to gain more depth. This works because classification in knowledge graphs is dynamic: new categories, their organization and their association with graph nodes are simply additional nodes and relationships.

Using multiple categories makes the data richer and enables more advanced analyses.

Figure: Property graph connected to multiple Taxonomies
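As a sketch of how several taxonomies can be combined, the Python example below classifies each product under both a product-type taxonomy and a use-case taxonomy, then searches for products that satisfy categories from both. All category names and product identifiers are invented for this illustration.

```python
# child -> parent, covering two independent taxonomies (illustrative names).
PARENT = {
    # product-type taxonomy
    "Mirrorless Camera": "Camera",
    "Telephoto Lens": "Lens",
    # use-case taxonomy
    "Night Photography": "Photography",
    "Long-Distance Photography": "Photography",
}

# product -> categories it is linked to, possibly from different taxonomies.
CLASSIFIED_AS = {
    "product-1": {"Mirrorless Camera", "Night Photography"},
    "product-2": {"Telephoto Lens", "Long-Distance Photography", "Night Photography"},
    "product-3": {"Mirrorless Camera"},
}


def expand(category):
    """Return the category together with all of its ancestors."""
    result = {category}
    while category in PARENT:
        category = PARENT[category]
        result.add(category)
    return result


def find_products(required):
    """Products whose categories (or their ancestors) cover every required one."""
    hits = []
    for product, categories in CLASSIFIED_AS.items():
        covered = set()
        for category in categories:
            covered |= expand(category)
        if set(required) <= covered:
            hits.append(product)
    return sorted(hits)
```

Adding a third taxonomy only means adding more entries to the two mappings; the query function stays the same.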

But this is not the end of our options for organizing knowledge; there are still higher-order organizing principles we can use.

Ontologies

Ontologies are also classification schemes that describe the categories in a domain and the relationships between them. But ontologies are not restricted to just hierarchical structures: they allow for the definition of more complex types of relationships between categories, such as part_of, compatible_with or depends_on. They also allow for the definition of hierarchies of relationships and for further characterization of relationships (transitive, symmetric, etc.).

Following the instructions in an ontology, we can explore the categories in a domain not just vertically (hierarchically) but also horizontally, where we can address cross-cutting concerns.

For example, we could reason that a Sony Alpha A7 is a valid search result for a customer looking for a mirrorless camera because it belongs to the mirrorless camera category in the taxonomy. From the semantics of the UPSELL relationship defined in the ontology, we can reason that a Sony Alpha A7 should be recommended to customers who own a Sony Alpha A6000.
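Here is a minimal Python sketch of this kind of reasoning. Relationship characteristics (symmetric, transitive) are declared once and then applied to derive new facts, and the UPSELL relationship drives a recommendation. The facts about lenses, sensors and kits, as well as the customer names, are hypothetical, and the inference loop is a naive illustration rather than how a real reasoner works.

```python
# Characteristics declared per relationship type (illustrative ontology fragment).
CHARACTERISTICS = {
    "compatible_with": {"symmetric"},
    "part_of": {"transitive"},
    "UPSELL": set(),
}

# (subject, relationship, object) facts; all but the UPSELL pair are hypothetical.
FACTS = {
    ("Sony Alpha A6000", "UPSELL", "Sony Alpha A7"),
    ("Lens X", "compatible_with", "Sony Alpha A7"),
    ("Sensor S", "part_of", "Body B"),
    ("Body B", "part_of", "Camera Kit K"),
}

OWNS = {"alice": {"Sony Alpha A6000"}, "bob": {"Sony Alpha A7"}}


def infer(facts):
    """Apply symmetric and transitive rules until no new fact appears."""
    inferred = set(facts)
    while True:
        new = set()
        for s, r, o in inferred:
            traits = CHARACTERISTICS.get(r, set())
            if "symmetric" in traits:
                new.add((o, r, s))
            if "transitive" in traits:
                for s2, r2, o2 in inferred:
                    if r2 == r and s2 == o:
                        new.add((s, r, o2))
        if new <= inferred:
            return inferred
        inferred |= new


def upsell_recommendations(customer, facts):
    """Products reachable via UPSELL from something the customer already owns."""
    owned = OWNS[customer]
    return sorted(
        o for s, r, o in facts
        if r == "UPSELL" and s in owned and o not in owned
    )
```

The application code never mentions a specific product: changing the facts or the declared characteristics changes the answers.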

In large, multi-department systems, ontologies can bridge the gap between different departmental taxonomies, ensuring seamless, semantically-aligned integration. By establishing cross-taxonomy equivalence (linking the same concepts across areas) we enable full traversal of the business domain, turning complexity into value.

Figure: Overview of our retail example from property graph (bottom layer) to ontology (upper layer)

Initially, we could determine how different electronics complement each other using a single ontology. However, with a cross-domain comprehensive ontology we can address broader needs.

Ontologies make knowledge actionable by enabling both humans and software to perform more complex tasks. For instance, in our retail example we could connect the product hierarchy to stock management data to enable recommendation of alternative products when an item is out of stock or suggest higher-margin products.
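The out-of-stock scenario can be sketched as follows in Python, assuming the product hierarchy has already been joined with stock and margin data. The product identifiers, stock levels and margins are invented for this example.

```python
# Hypothetical data joined from the product hierarchy and stock management.
CATEGORY = {
    "cam-a": "Mirrorless Camera",
    "cam-b": "Mirrorless Camera",
    "cam-c": "Mirrorless Camera",
    "lens-a": "Telephoto Lens",
}
STOCK = {"cam-a": 0, "cam-b": 4, "cam-c": 2, "lens-a": 7}
MARGIN = {"cam-a": 0.18, "cam-b": 0.22, "cam-c": 0.30, "lens-a": 0.25}


def alternatives(product):
    """In-stock products from the same category, highest margin first.

    Returns an empty list when the requested product is itself in stock.
    """
    if STOCK.get(product, 0) > 0:
        return []
    candidates = [
        other
        for other, category in CATEGORY.items()
        if category == CATEGORY[product]
        and other != product
        and STOCK.get(other, 0) > 0
    ]
    return sorted(candidates, key=lambda p: MARGIN[p], reverse=True)
```

Here "same category" is the simplest possible notion of equivalence; a richer ontology could supply relationships such as compatible_with to widen or narrow the candidates.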

In summary, physical occurrences of data are increasingly associated with various properties and rules to extract value. The value generated comes from both the multiple facets that can be defined and the ability to evolve these assets over time, using what's needed when it's needed. All these different uses of graphs are not necessarily alternatives but can work together to provide a heterogeneous and enriched view of the data. A graph is just a set of nodes and edges, while a knowledge graph may look the same but is governed by precise rules dictated by the organization's needs.

What’s the best choice?

The choice of organizing principle for a specific case clearly depends on the end goal. Experience and common sense suggest adopting the principle of "just enough semantics". This means avoiding the temptation to create overly complex graphs or to map everything imaginable. Instead, focus on concrete, limited needs to prevent unnecessary complexity. From there, the graph can gradually expand.

Iteratively building knowledge graphs helps avoid the pitfall of chasing ontological perfection, ensuring they remain useful over time and deliver value early.

Where to start?

Many established ontologies are available out there for enhancing interoperability or allowing for the reuse of existing public models. If you operate within a field that has such standards, it’s wise to consider fully adopting that model.

When creating an organizational framework, you have different options. One is using natural language to describe it. It is easy to start but not machine-readable.

A more formal approach is to use languages like RDF Schema, OWL or SKOS. These languages offer varying levels of expressiveness from basic categories to complex constructs involving relationships and classes. Using standard ontology languages offers strong software support. Visual ontology editors can simplify the process, as manual creation can be challenging and error-prone. However, learning these languages requires time and resources, which should be weighed against their benefits.
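To give a feel for what such a formal description looks like, the Python sketch below writes a tiny category hierarchy as SKOS statements in N-Triples syntax, using only string formatting. The `http://example.org/retail/` namespace and the category names are assumptions for this example; in practice a dedicated RDF library or ontology editor would normally handle serialization and escaping.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/retail/"  # hypothetical namespace for this sketch


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang="en"):
    """Language-tagged literal; no escaping is done, so keep labels simple."""
    return f'"{text}"@{lang}'


def concept_triples(local_name, label, broader=None):
    """N-Triples lines declaring a SKOS concept and, optionally, its parent."""
    subject = iri(EX + local_name)
    lines = [
        f"{subject} {iri(RDF_TYPE)} {iri(SKOS + 'Concept')} .",
        f"{subject} {iri(SKOS + 'prefLabel')} {literal(label)} .",
    ]
    if broader:
        lines.append(f"{subject} {iri(SKOS + 'broader')} {iri(EX + broader)} .")
    return lines


if __name__ == "__main__":
    for line in concept_triples("Camera", "Camera"):
        print(line)
    for line in concept_triples("MirrorlessCamera", "Mirrorless camera", broader="Camera"):
        print(line)
```

The `skos:broader` statement plays the same role as the "subcategory_of" relationship in the retail example, but in a vocabulary that standard tools already understand.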

Successful organizations with knowledge graphs typically balance standards with flexibility. This allows them to add context quickly and adapt to changing business needs, reflecting the dynamic nature of modern enterprises.

When choosing an approach, align it with your organization’s needs. Relying too much on standards can limit flexibility, while complete customization may lead to inefficiency. Find your best balance and make sure that your knowledge graph evolves with your business.

Final thoughts

Knowledge graphs stand out as essential navigational tools and as the best candidates to help businesses turn huge amounts of unstructured data into actionable insights.

But we need to ensure that they don’t become a huge mess of semantics. The key lies in purposeful design: focus on the specific needs of your organization, start small and iterate as your needs evolve and you gain more experience.

Moreover, knowledge graphs should neither operate in a vacuum nor replace existing systems; they should work close to them. By embedding them within your organizational workflow, you transform them from abstract concepts into dynamic assets that drive real business value.
