Scoping Knowledge Graphs
Building knowledge graphs is supposedly a huge and terrifying project, like fighting dragons or sending humans to Mars. I hear or see it time and time again:
Knowledge graphs are too difficult, too time-consuming, and too expensive to build.
Think for a second about the people who say this. Mostly software engineers. They're thinking that a knowledge graph has to be enormous – like one of the handful of widely known graphs such as Google's or Wikidata's with billions of triples. And they know that the foundation of a knowledge graph is coherent semantics – scary stuff that they were not trained to do. Of course they'll push back: this kind of project does not set engineers up for success. It's like asking librarians and philosophers to build bulletproof cybersecurity infrastructure.
The vast majority of companies will never build and never need a knowledge graph that covers everything from newts and nebulae to Nubians and nurses.
So billions of triples is just a scary bedtime story that engineers will tell. Implicit message: Rely on Google or someone else to do it for us.
And the vast majority of software engineers will never do the semantic analysis needed to ensure the conceptual integrity and coherence that make knowledge graphs so valuable. So "it's too hard to do" is another scary bedtime story that engineers will tell. Implicit message: Don't ask me to build stuff I don't know how to build.
But tech organizations are overflowing with [add your own adjective here] engineering managers who think (or expect) that engineers can do anything and everything. And there are precious few product managers who are familiar enough with knowledge graphs to frame knowledge graph building in terms that managers and engineers can grok and buy into.
End result: Companies don't implement crucial but unfamiliar tech, like knowledge graphs.
Here's a quick primer so you can get an initial idea of the scope and goals of a knowledge graph project.
Your knowledge graph captures your view of your business, not the universe.
A knowledge graph captures and stores a model of your specific business domain – not the entire universe! It's a domain model like those used in software architecture or more recent "digital twins". It includes the entities, features, and relations that are most important in your domain. You can add or omit them according to the way you see your business or domain – your vision is often part of your special sauce, so it's important to make it crystal clear.
I think of your knowledge graph as a kind of encyclopedia that describes your business in terms that algorithms can actually understand, not just store and print. For example, you might be in HR Tech and summarize your business like this:
We match candidates to jobs using skills so we can optimize hiring.
In this case, you can build your knowledge graph around match, candidate (worker), job, skill, and hiring. This makes sense: you can't make much progress as a business in this field without a clear idea of what a good match is, which skills best describe a good candidate and an open job position, what a skill is, and how all this will help your clients to hire. That's just 5 concepts to focus on for starters. Nailing down and aligning on actionable information about these concepts is more challenging and valuable than you might imagine.
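As a rough illustration of how small such a starter graph can be, here is a minimal sketch that stores the five concepts as plain (subject, predicate, object) triples. The predicate names (has_skill, requires_skill, and so on) are hypothetical choices made for this example, not a recommended schema.

```python
# Hypothetical starter graph for the HR Tech example.
# Each fact is a (subject, predicate, object) triple; all names are illustrative.
triples = [
    ("Candidate", "has_skill", "Skill"),
    ("Job", "requires_skill", "Skill"),
    ("Match", "links_candidate", "Candidate"),
    ("Match", "links_job", "Job"),
    ("Hiring", "results_from", "Match"),
]


def concepts(triples):
    """Return every concept that appears as a subject or an object."""
    found = set()
    for subject, _predicate, obj in triples:
        found.add(subject)
        found.add(obj)
    return found


def facts_about(concept, triples):
    """Return the triples in which the concept appears on either side."""
    return [t for t in triples if concept in (t[0], t[2])]


if __name__ == "__main__":
    print(sorted(concepts(triples)))
    for fact in facts_about("Skill", triples):
        print(fact)
```

Five triples are enough to start a conversation about what each relation really means in your business; a production graph would normally live in a triple store or graph database rather than a Python list.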
Can you leverage open-source knowledge graphs and ontologies? Sure. But remember that your knowledge graph is there to capture how you think about your business. An off-the-shelf resource will only rarely do that, so you have to be extra careful. On the other hand, comparing your domain model with a public knowledge graph will highlight how your thinking is different from the status quo.
A knowledge graph can be as big or small as you want it to be and still add value.
Of course you need to scope and prioritize your project. You want to start with the keystone concepts for your business and build a graph around those. Keystone concepts are the ones that your documents mention most frequently, the ones that have a huge impact on what your tech stack looks like, the ones that guide marketing and sales. They should show up in a one-sentence summary of your business, like the example above. I've written about knowledge graphs and skills as different keystone concepts, and others like Ideal Channel Partner, Next-Million-Customers Profile, Most-Valuable-Customer Profile, Sales this Quarter, etc. are equally important. Every organization has its keystone concepts and needs to align on them across all teams, and with clients as well.
As an example, at one point, manager (and its thousands (!) of variants) was one of the most common job titles on LinkedIn. Just structuring that one concept more systematically had a visible impact on search, analytics, and recommendations across the board.
How much detail do you need for each concept? Just key information, not every conceivable feature. A knowledge graph describes each concept as an array of features and relations or a collection of triples or facts. You need enough explicit features both to make sure that concepts are distinct (they'll often overlap, which is OK, but no dupes allowed) and to include what's most important to know – the features that are most important to your way of thinking or to your business processes. It's straightforward to add more detailed information (more concepts, features, or triples) incrementally as necessary.
Along the way, you'll find that the same term (like manager) can cover two or more concepts (like people manager vs asset manager, which have very different skill sets) and you'll see lots of synonyms (like supervisor or coordinator). These are all important to distinguish: the key idea is to have only one meaning for each concept, along with a list of synonyms.
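The "one meaning per concept, plus a list of synonyms" idea can be sketched as a small registry that also flags terms shared by more than one concept. The Concept class, the definitions, and the assignment of supervisor and coordinator to the people-manager concept are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One meaning, identified by concept_id, plus the labels people use for it."""
    concept_id: str
    definition: str
    labels: set = field(default_factory=set)


# Illustrative entries only; the definitions and synonym lists are assumptions.
concepts = [
    Concept(
        "people_manager",
        "Leads and develops a team of people",
        {"manager", "people manager", "supervisor", "coordinator"},
    ),
    Concept(
        "asset_manager",
        "Manages financial or physical assets",
        {"manager", "asset manager"},
    ),
]


def build_label_index(concepts):
    """Map each lowercased label to the ids of the concepts that use it."""
    index = {}
    for concept in concepts:
        for label in concept.labels:
            index.setdefault(label.lower(), []).append(concept.concept_id)
    return index


def ambiguous_labels(index):
    """Return the labels that point to more than one concept."""
    return {label: ids for label, ids in index.items() if len(ids) > 1}


if __name__ == "__main__":
    print(ambiguous_labels(build_label_index(concepts)))
```

A label that maps to several concept ids, as manager does here, is a cue to disambiguate with context (for example, required skills) rather than a reason to merge the concepts.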
How will a knowledge graph add value?
Who does what?
Don't ask engineers to build a knowledge graph. Instead, get the right people for the job. For the content of the graph (the coherent knowledge part), you need analytic linguists and people with experience building ontologies to create a knowledge graph worth having. The engineers will be busy enough building the infrastructure to store, serve, and leverage the knowledge graph. And you will need guidance, too, because knowledge graphs are still very new to everyone.
Get your data scientists involved early and often. As you develop your keystone concepts, involve your data scientists and analytics people. The best knowledge graphs are the ones that are supported by (and integrate) data, not separate from it.
The triples that describe your most important concepts in your knowledge graph should match or map to the SELECT clauses that the data scientists use to extract and analyze your data. Predicates in knowledge graph triples should map directly to queries or column labels in your databases, even if the wording is different. Mismatches between knowledge graph concepts and database entities should be welcomed as opportunities to align strategy with data. If a feature is important enough to appear in a query, include it in your knowledge graph.
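One way to surface those mismatches is to keep an explicit mapping from graph predicates to database columns and compare it with the columns that analysts actually select. The sketch below assumes hypothetical table and column names and a hand-maintained list of queried columns; in practice that list might be extracted from query logs.

```python
# Hypothetical mapping from graph predicates to "table.column" names.
predicate_to_column = {
    "has_skill": "candidate_skills.skill_id",
    "requires_skill": "job_requirements.skill_id",
    "has_title": "jobs.title",
}

# Columns that analysts' SELECT clauses actually use (illustrative values).
queried_columns = {
    "candidate_skills.skill_id",
    "jobs.title",
    "jobs.seniority_level",
}


def alignment_report(predicate_to_column, queried_columns):
    """Compare mapped columns with queried columns and list the gaps."""
    mapped = set(predicate_to_column.values())
    return {
        # Features analysts care about that the graph does not describe yet.
        "queried_but_not_in_graph": sorted(queried_columns - mapped),
        # Graph features that no query currently uses.
        "in_graph_but_not_queried": sorted(mapped - queried_columns),
    }


if __name__ == "__main__":
    for gap, columns in alignment_report(predicate_to_column, queried_columns).items():
        print(gap, columns)
```

Each entry in the report is a prompt for discussion: either the graph is missing a feature that matters in practice, or it describes something that nobody measures yet.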
Concepts, categories, and queries are basically the same thing: collections of features. The best knowledge graphs have clear, explicit links between features in the graph and the granular data (in different silos) that you already have. This is how to realize the benefits of one of the key superpowers of knowledge graphs: data integration.
Go for it!
Building a knowledge graph is new and unfamiliar, yes. But it doesn't have to be huge or terrifying. With a bit of guidance and the right talent, you can make it happen.
Senior analyst building a second brain
1 yr
Thank you, Mike! Very interesting how it scales to an enterprise level. At the personal level, to my understanding, even perfectly linked pieces of knowledge deliver very little value. I can refer to feedback from users of the Obsidian.md personal knowledge management tool, which I share: the notes graph is the most impressive and, surprisingly, the most useless feature of the tool. For manual research, it makes sense only for graphs that are 1-2 levels deep from a topic. The global graph is just a wonderful picture that looks great but is impossible to explore. While technologies are moving forward, it would be great to finally get enterprise-level analysis tools for personal-level knowledge lakes.
Founder Proprietor at Knowledge Enabler Systems
1 yr
What is the definition of "knowledge" used here, please? How is it distinguished from "intelligence", natural or artificial?
Founder Proprietor at Knowledge Enabler Systems
1 yr
The structure of Node Link Node, corresponding to Subject Predicate Object, each having multiple attributes, is well known and sufficiently expressive. At times the Predicate that links Subject and Object may itself need a link to another Object, as in A killed B "with C (a dagger)". Is there any generic or standard way of linking the Predicate (kill) with its own Object, means, or device C, without confusing it with Object B? I have for a long time tackled it by defining an attribute "using" or "byMeansOf" having a "value = dagger". Strictly, "using" or "byMeansOf" is NOT an attribute or inherent property of the Predicate Kill. It is actually a Predicate of Predicate P connecting with C, just as Kill is a Predicate of A connecting with B. My present practice violates the principle that "an object cannot be an attribute of another object or, here, predicate. It (the new object) should be linked with another Predicate to the first Predicate". I feel that using a second Predicate to link the first Predicate with its own associated Object is structurally meaningful and consistent. Let me know of other means of elegantly modeling this common requirement. Another similar requirement is "P fell from Q on R".
--
1 yr
Thank you, Mike. This is where I struggle with vector databases... they don't seem to bring together the human and the machine element.
--
1 yr
Very succinctly explained - and yes, #Incorvus agrees - that's why we say "No knowledge, no #AI"!