Knowledge Graphs as Fancy Databases

Some friends have challenged me to describe knowledge graphs in even simpler terms than I have before. This time, let me try to pitch it for CEOs who have a little extra time to catch up on tech over the holidays. (Feel free to tell me how well this post worked for you.)

Why are we talking about knowledge graphs? For one thing, most of your big competitors already have knowledge graphs at the heart of their businesses. Internet companies, eScience, and financial services have been working on them for years, sometimes calling them ontologies. Their graphs can be good, bad, flimsy, or robust, but your competitors have them and use them. If you're reading this, then you're probably behind the curve. You need to decide soon whether this tech is hype or ripe.

Your knowledge graph captures your view of your business. Knowledge graphs store not only fine-grained data but broader knowledge as well. They contain a domain model like those used in software architecture or a digital twin like those used to model manufacturing processes or supply chains. They include the things, characteristics, and relations that are most important to your business.

A knowledge graph can be as big or small as you want it to be and still add value. You can build one from your mission statement with a handful of keystone concepts and expand it into an encyclopedia with thousands of concepts – if you need to. The vast majority of companies will never build and never need a knowledge graph that covers everything from newts and nebulae to Nubians and nurses.

How will a knowledge graph add value?

  • Alignment. By aligning across the business on what each keystone concept means and documenting that understanding in a knowledge graph, you improve communication across teams and increase shared understanding of goals.
  • Data integration. A knowledge graph helps you link existing data from different silos to make your existing data more valuable and more useful.
  • Knowledge aggregation. Knowledge graphs accumulate institutional knowledge and expert experience as they grow.
  • AI. Knowledge graphs open the door to more reliable use of machine learning for search and recommendations, and they are essential for accelerating ROI from generative AI.

OK, so what is this graph thingy?

A graph stores and shows the different kinds of relations between different kinds of things. That's it.

In a knowledge graph, the things are concepts of (or what we know about) individuals or categories, and we store (then manipulate!) different kinds of relations between those concepts.

The facts that we have about individuals and categories come in two basic types: relations between things (the core of the graph) and characteristics of things (that we can usually count, measure, or label). Here are some examples:

  • dogs, subCategoryOf, mammals shows a specific relation (subCategoryOf) between two categories of things (dogs, mammals)
  • dogs, haveLegs, 4 shows a specific measurable characteristic of a category
  • Fido, instanceOf, dogs shows a specific relation between an individual and a category
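
For readers who like to see things concretely, here is a minimal Python sketch of the three example facts above. Storing them as plain tuples and declaring up front which predicates count as relations are assumptions made for illustration, not a description of how any particular graph product stores facts.

```python
# Each fact is a (subject, predicate, object) triple, as in the examples above.
facts = [
    ("dogs", "subCategoryOf", "mammals"),  # relation between two categories
    ("dogs", "haveLegs", 4),               # measurable characteristic of a category
    ("Fido", "instanceOf", "dogs"),        # relation between an individual and a category
]

# Assumption for this sketch: we declare explicitly which predicates are
# relations between things; every other predicate is treated as a characteristic.
RELATION_PREDICATES = {"subCategoryOf", "instanceOf"}

relations = [f for f in facts if f[1] in RELATION_PREDICATES]
characteristics = [f for f in facts if f[1] not in RELATION_PREDICATES]

print(relations)
print(characteristics)
```

Nothing here is "understood" by the program: the split works only because the relation predicates were listed explicitly.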

It's important to remember that each of the concepts and relations has to be defined explicitly before the computer can understand and use it.

What makes a graph a graph are the relations.

The same thing can be related in many ways to many other things, forming a network (or graph) of concepts. The characteristics are there to give us more information about each thing on its own. If there were only one kind of relation, we would typically end up with something closer to a simple tree or list than a rich graph.

Databases have lots of facts about characteristics of things (i.e., data) – usually in the same row – but can't easily store more complex relations between things (e.g., between rows) or track how one relation is linked to the next one, as in Fido > dogs > mammals. This linking of facts through relations is one kind of reasoning that we enable with knowledge graphs.
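
As a rough illustration of linking one fact to the next, the sketch below follows the Fido > dogs > mammals chain one hop at a time. The function name `path_upward` is hypothetical, and the sketch assumes that each thing has at most one parent, which real graphs do not guarantee.

```python
facts = [
    ("Fido", "instanceOf", "dogs"),
    ("dogs", "subCategoryOf", "mammals"),
]

def path_upward(start, facts):
    """Follow instanceOf / subCategoryOf links from `start`, one hop at a time."""
    # Assumption: at most one parent per thing, so a dict lookup is enough.
    parent = {s: o for s, p, o in facts if p in ("instanceOf", "subCategoryOf")}
    path = [start]
    while path[-1] in parent:
        next_thing = parent[path[-1]]
        if next_thing in path:  # stop if the facts loop back on themselves
            break
        path.append(next_thing)
    return path

print(" > ".join(path_upward("Fido", facts)))  # Fido > dogs > mammals
```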

There are a few steps we need to take to make it possible for computers to store and manipulate facts by themselves rather than rely on us to do all the reasoning.

  • A simple, predictable format for facts is one key step. We often use triple format as in the examples above – each fact has three parts: subject, predicate, and object (thing, relation, thing). This way the computer doesn't have to wrestle with the (literally!) thousands of ways we can phrase facts in human languages. Wikidata (the knowledge graph behind Wikipedia) has billions of facts in the form of these triples.
  • Making meaning or concepts explicit is another key step. Humans easily understand what dogs are; the facts about how dogs are related to other things like veterinarians and letter carriers and about the characteristics of dogs (breed, coat, coloring, …) are all in our minds. But to a mindless computer (even with some kinds of AI), dogs is just a sequence of letters, a meaningless string like Q144. Simply put, we need to unpack the concepts to teach the computer what we want it to know about things, what things are, what strings mean. The key role of a knowledge graph is to capture all this knowledge in a form that computers can use, not only as a reminder for people.
  • Minimizing synonyms and ambiguity is another key step. Many of the facts we have are available outside the knowledge graph in natural language sentences. But the words we use can have many meanings and there are lots of synonyms for each concept – computers have a hard time figuring out which is which – and people do, too. So in knowledge graphs we store a single label for each concept, making the label unambiguous – and of course, we also accumulate the synonyms to help the system communicate with users. This way we can represent and compress thousands of sentences as one knowledge graph fact.
  • Another key step is to develop programs that can identify and manipulate neighborhoods and paths in this densely interconnected graph or network – structures that we can compare, classify, or combine. These operations are how the computer "reasons" on top of a knowledge graph. For example, from a path like Fido > dogs > mammals, we want the program to conclude or infer that Fido has the characteristics (and many of the relations) of mammals – even though these facts about Fido are not actually stored anywhere. The ability to go beyond just retrieving stored information from memory (or from a database) is a key component of any intelligent system.
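
The last two steps can be sketched together in a few lines of Python. This is only a toy under stated assumptions: the concept IDs (C1, C2, C3), the synonym lists, the `bodyTemperature` predicate, and the helper names `resolve`, `ancestors`, and `inferred_characteristics` are all made up for illustration.

```python
# Hypothetical concept IDs, each with one preferred label plus synonyms.
concepts = {
    "C1": {"label": "dog", "synonyms": {"dogs", "canine", "pooch"}},
    "C2": {"label": "mammal", "synonyms": {"mammals"}},
    "C3": {"label": "Fido", "synonyms": set()},
}

# Facts refer to concept IDs, not to ambiguous words.
facts = [
    ("C3", "instanceOf", "C1"),
    ("C1", "subCategoryOf", "C2"),
    ("C1", "haveLegs", 4),
    ("C2", "bodyTemperature", "warm-blooded"),  # illustrative characteristic
]

HIERARCHY = ("instanceOf", "subCategoryOf")

def resolve(word):
    """Map a label or synonym (case-insensitive) to its concept ID, or None."""
    w = word.lower()
    for cid, c in concepts.items():
        if w == c["label"].lower() or w in {s.lower() for s in c["synonyms"]}:
            return cid
    return None

def ancestors(cid):
    """Collect every category reachable by following hierarchy links upward."""
    found, frontier = [], [cid]
    while frontier:
        current = frontier.pop()
        for s, p, o in facts:
            if s == current and p in HIERARCHY and o not in found:
                found.append(o)
                frontier.append(o)
    return found

def inferred_characteristics(cid):
    """Gather characteristics stored on the concept itself and on its ancestors."""
    result = {}
    for node in [cid] + ancestors(cid):
        for s, p, o in facts:
            if s == node and p not in HIERARCHY:
                result.setdefault(p, o)  # keep the first value found
    return result

print(resolve("pooch"))                           # C1
print(inferred_characteristics(resolve("Fido")))
```

No fact about Fido's legs or body temperature is stored anywhere; both come from walking the path from Fido up to dogs and mammals.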

Confusions galore

Things get very confusing in the many cases where people have and present only strings in triple format (without making their meanings explicit). Without other information to document the meaning of these strings in a computer-accessible form, this is a text graph and is similar to a language model, a model of how some strings are related to other strings, often regardless of their meaning. Although text graphs can be very useful, they're far less useful for algorithms than knowledge graphs.

This confusion comes from the fact that when we provide tools or develop resources for humans, only the strings are really necessary – humans already have meanings for things and relations in their heads. So translation tools, terminology tools, and most taxonomy tools rely on their expert human users for that crucial information – or display human-readable definitions (that algorithms can't process well) as reminders. These tools are made for humans so they're not very helpful for computers. Litmus test: have the graph vendor point to where definitions are stored. If the definitions are sentences for humans, then computers can't easily read them – so they're not knowledge-graph ready.

Getting fancy

Relational databases are good at storing, filtering, and sorting, but not so good at other tasks. They assume that the columns of a database table are independent and unrelated, even when that's not true in real life.

One key motivation for using knowledge graphs is to overcome this limitation and make the relations between columns clear and explicit. In knowledge graphs, database column labels become relations: "first-class citizens" that we can relate, group together, and describe with attributes. The paths between relations/columns and the reliance on machine-accessible definitions enable knowledge graphs to do a wide range of useful things that run-of-the-mill databases cannot.

  • Knowledge graphs explicitly relate specific individuals to categories and categories to other categories, so they enable different kinds (and granularities) of meaning-based data aggregation and data integration across different data sources. With this, you can slice and dice your data in more ways across more silos – without labels getting in the way.
  • Knowledge graphs enable a different kind of query expansion where we can search on the facts that make up a concept, not just on its label or synonyms. This is a key way to overcome the challenges of missing information, as well as jargon, terminology, and translation for language-independent semantic search.
  • Knowledge graphs allow us to follow paths – a chain of reasoning – from one concept to another to understand exactly how they are related.
  • Knowledge graphs allow us to flexibly build a definition of any target concept by following the many relations that link it to other things. They're self-defining. We can even control how granular the concept definition will be: we can gather the concepts that are directly connected to a target (1-hop relations) or take it one or more steps further to find the additional concepts that are connected to those 1-hop relations, etc. – in as much detail as we need.
  • Knowledge graphs also allow us to identify neighborhoods of related concepts, create different types of neighborhoods, and describe degrees of similarity between concepts – so we can do fuzzy matching based on meaning, not just on strings.
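
To make the ideas of hops, neighborhoods, and similarity a bit more tangible, here is a small sketch. The facts and predicate names (`treatedBy`, `chase`) are invented for illustration, and measuring similarity as the overlap between 1-hop neighborhoods (Jaccard overlap) is just one simple choice among many, not a standard that every graph system uses.

```python
# Illustrative facts; the predicates here are made up for this sketch.
facts = [
    ("dogs", "subCategoryOf", "mammals"),
    ("cats", "subCategoryOf", "mammals"),
    ("dogs", "treatedBy", "veterinarians"),
    ("cats", "treatedBy", "veterinarians"),
    ("dogs", "chase", "letter carriers"),
    ("Fido", "instanceOf", "dogs"),
]

def neighbors(concept):
    """Concepts directly linked to `concept`, in either direction."""
    linked = set()
    for s, _, o in facts:
        if s == concept:
            linked.add(o)
        elif o == concept:
            linked.add(s)
    return linked

def neighborhood(concept, hops=1):
    """All concepts within `hops` steps of `concept` (excluding itself)."""
    seen = {concept}
    frontier = {concept}
    for _ in range(hops):
        frontier = {n for c in frontier for n in neighbors(c)} - seen
        seen |= frontier
    return seen - {concept}

def similarity(a, b):
    """Share of 1-hop neighbors the two concepts have in common (0.0 to 1.0)."""
    na, nb = neighborhood(a), neighborhood(b)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

print(sorted(neighborhood("Fido", hops=1)))  # ['dogs']
print(sorted(neighborhood("Fido", hops=2)))
print(similarity("dogs", "cats"))            # 0.5
```

Raising `hops` widens the definition of a concept, and the similarity score lets a program say that dogs and cats are alike without ever comparing the strings "dogs" and "cats".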

These things are all very difficult to do with regular relational databases but are standard functions of graph databases.

Of course there's much more to talk about, but knowledge graphs are the foundation – the first and most important step in automating many kinds of reliable reasoning. Happy holidays!

Dhanush Kumar Selvaraj

Senior Software Engineer

1 year ago

That's an amazing way to explain Knowledge graphs! I've been working with graph databases and RDF

Veronica Olazabal

Chief Impact Officer (Innovation, Integration, Inspiration)

1 year ago

Thanks - I now understand what a Knowledge Graph is so check! But why would a business need one - other than everyone is developing one? What value does it bring (in simple terms). What will a business be able to do with or not do without it?

Jonathan Abramson

Enabling enterprise translation with AI @ Lilt

1 year ago

Mike, I'm not a CEO, but regardless feel like I was the right audience for this post. I had to do some Googling, and GPT'ing before really diving in to orient myself with the subject matter, but once I did the details you provided were: a) fairly simple to absorb/understand, as you intended b) inspiring, in the sense they made me think harder about how I/sales people can or should use knowledge graphs. To clarify on b- does a CRM, in and of itself, constitute a knowledge graph? Thanks for the wisdom :)

Putcha Narasimham

Founder Proprietor at Knowledge Enabler Systems

1 year ago

Mike Dillinger, PhD, I agree. This is a very good and "nearly" complete description of a knowledge graph or network. There is no need to refer to RDBMS and their restrictive relation tables. All natural languages have this simple Subject-Predicate-Object structure as in the W3C specification, Resource Description Framework (RDF). RDF is also restrictive in that the predicates have a single attribute while in reality there are many, many predicates and each predicate has multiple attributes. I said "nearly" because a complete knowledge graph has two parts. The first part is the ontology, a network of "Concepts and Linking-Concepts" in which the axioms and rules of the domain of knowledge are generically specified. The second part is a network of "particulars" which correspond to instances of the "concepts and linking concepts" of the ontology. The first part provides a basis for the meaning and validity of the second part. We humans do not "formally" define and use ontologies but they are implied and essential. There are many simple and convincing examples which are easy to understand. I would be glad to share them via [email protected] Even RDF is restrictive. I have an extension to match NL expressivity. Let's discuss.
