Building a Knowledge Graph for a Software Company
Using TopBraid to build a Knowledge Graph about TopBraid
Knowledge graphs can represent information about any domain and support any process. Many of TopQuadrant’s customers create knowledge graphs describing their data and domain landscape. For our own internal use, we have designed a knowledge graph describing our product, TopBraid EDG. Our primary goal was to use that graph to improve our support and development processes. A secondary goal was to improve our product documentation.
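Among other things, this knowledge graph includes information about the product’s Components and Capabilities, the people who own them or are experts on them, documentation snippets, and links to source code, Jira tickets and Slack discussions.
We have integrated this knowledge graph with various tools that we use every day: our Sphinx-based documentation, GitHub, Jira and Slack.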
Let’s dive into how all this works.
The Knowledge Graph Ontology
We started the project by defining a simple Ontology using SHACL. This Ontology defines the types of assets that we want to represent in our knowledge graph, and what properties those assets can have.
The domain that we wanted to model is a software product. In our model, a product consists of Components (or Features) that implement Capabilities. For example, the component “Graph Explorer Panel” implements the capability “Visualization”. We have introduced a couple of superclasses (see above) to represent shared characteristics of all knowledge concepts/assets. For example, each concept can be linked to instances of Person who are experts on a Component or Capability. As we will show, such information can be used to automatically assign Jira tickets to specific developers or support staff, or to send notifications when a question about a certain topic is asked on Slack.
TopBraid EDG includes a comprehensive Ontology editor that makes defining such classes, shapes and properties easy. The screenshot below shows the class tree on the left and properties for the selected class Component on the right. The properties are arranged in (SHACL) property groups that are later used to lay out the forms to view and edit instances of these classes.
The Concepts in the Knowledge Graph
There are many ways to navigate information in TopBraid EDG. Among other UI panels, TopBraid offers an Assets Hierarchy panel that is ideal for displaying the composition of Components and their sub-components:
The hierarchy is rooted in a Component called TopBraid, which is broken down into Development and Product Components. Under the Product Components it then defines various high-level categories such as Graphs, Storage and User Interface. The UI components roughly mirror the nesting of features in the user interface.
Definitions in the ontology drive a powerful UI for entering information about Components and other assets in the knowledge graph. Here is an example Component, the Graph Explorer Panel, as represented in the knowledge graph:
We are storing various properties for each Component, including links to related components and a dedicated owner.
TopBraid uses SHACL shapes to drive the user interface and to make sure that data conforms to the constraints that we want to enforce.
TopBraid includes comprehensive support for collaborative editing, including the ability to define workflows. This means that anyone can contribute their knowledge in a sandbox or branch, and these changes can get reviewed before being committed to the production copy.
Generating Documentation from the Graph
Obtaining immediate practical benefits is the best motivation for maintaining a knowledge graph. In our case, we wanted to improve our hand-written product documentation, which had become out of date and increasingly hard to maintain. We now store documentation snippets in the knowledge graph for each feature and let a script generate the documentation for each release.
Here is an example documentation text for the Graph Explorer Panel, written in the format that is used by our chosen documentation generation platform, Sphinx.
Note that such documentation snippets can reference other Components from the knowledge graph using the :ref:`...` syntax. Furthermore, the documentation generator can use the explicit links between components and capabilities to suggest relevant background articles or videos at the bottom of each help topic.
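As a rough illustration of this generation step, the following Python sketch renders one help topic as reStructuredText, with a label that other snippets can target via :ref: and a list of related topics at the bottom. The function name, the asset IDs and the sample text are hypothetical and only serve the example; the actual generator script is not shown in this article.

```python
def render_help_topic(asset_id, title, snippet, related_ids):
    """Render one help topic as reStructuredText (illustrative sketch).

    asset_id    -- local name of the asset, reused as the Sphinx label
    title       -- human-readable section title
    snippet     -- documentation text stored with the asset
    related_ids -- labels of related topics to list at the bottom
    """
    lines = [
        f".. _{asset_id}:",   # label that :ref:`asset_id` can point to
        "",
        title,
        "=" * len(title),     # section underline as long as the title
        "",
        snippet.strip(),
        "",
    ]
    if related_ids:
        heading = "See also"
        lines += [heading, "-" * len(heading), ""]
        lines += [f"* :ref:`{ref_id}`" for ref_id in related_ids]
        lines.append("")
    return "\n".join(lines)


# Hypothetical sample data, not taken from the real knowledge graph.
page = render_help_topic(
    "graph_explorer_panel",
    "Graph Explorer Panel",
    "Displays assets as a diagram. See also :ref:`search_panel`.",
    ["visualization"],
)
print(page)
```

Because the label is derived from the asset ID, a cross-reference in one snippet resolves as long as the target asset also has a generated page.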
From version 7.8 onwards, our product includes context-sensitive help, so that users can click on an area of the screen to open the corresponding help page:
This was quite easy to implement using the knowledge graph. All we needed to do was to insert HTML data attributes into the user interface components, with those attributes matching the IDs (local names) of the assets in the knowledge graph. On mouse click, our code can easily find the corresponding help page because the help page generator was relying on these very same identifiers from the knowledge graph:
In general, we found that having unique and stable identifiers is one of the most important benefits of a company-wide knowledge graph. These identifiers really link everything together, ranging from the help system to Jira tickets to the source code and Slack messages. The identifiers can also be used to automatically verify that features mentioned in the help system still exist in the graph and vice versa.
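Such a verification can be as simple as comparing two sets of identifiers. The sketch below is a minimal Python version under the assumption that both sides can be exported as plain lists of IDs; the function name and the sample IDs are made up for illustration.

```python
def check_identifiers(graph_ids, help_ids):
    """Report IDs that appear on only one side (illustrative sketch)."""
    graph_ids, help_ids = set(graph_ids), set(help_ids)
    return {
        # Mentioned in the help system, but no such asset in the graph.
        "missing_in_graph": sorted(help_ids - graph_ids),
        # Asset exists in the graph, but has no help topic.
        "missing_in_help": sorted(graph_ids - help_ids),
    }


# Hypothetical identifiers for illustration only.
graph_ids = {"graph_explorer_panel", "search_panel", "import_wizard"}
help_ids = {"graph_explorer_panel", "search_panel", "old_export_dialog"}

report = check_identifiers(graph_ids, help_ids)
print(report)
```

Running a check like this as part of a release build would flag help topics for features that were removed, as well as features that were never documented.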
Linking Source Code with the Knowledge Graph
Many components from our Knowledge Graph have links to the source code on GitHub. This makes it potentially easy for new developers to learn where and how a feature is implemented:
We are using a dedicated property to store these links, and they are updated periodically by scanning the source code files (mostly Java, JavaScript and Turtle files) for specific patterns. One of these patterns is to look for the help ids that the source code already contains to implement context-sensitive help. As a result, the ability to add links to the source code came basically for free. But developers can also place annotations into their Java files to establish these links.
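A scanner of this kind could look roughly like the Python sketch below. Both patterns are assumptions: the article does not name the HTML attribute or the Java annotation, so `data-help-id` and `@KnowledgeAsset` are placeholders, as are the file paths.

```python
import re

# Placeholder patterns; the real attribute and annotation names are not
# given in the article.
HELP_ID_ATTR = re.compile(r'data-help-id="([A-Za-z0-9_]+)"')
JAVA_ANNOTATION = re.compile(r'@KnowledgeAsset\("([A-Za-z0-9_]+)"\)')


def find_asset_links(files):
    """Map each asset ID to the sorted list of files that mention it.

    `files` maps a file path to the text content of that file.
    """
    links = {}
    for path, text in files.items():
        for pattern in (HELP_ID_ATTR, JAVA_ANNOTATION):
            for asset_id in pattern.findall(text):
                links.setdefault(asset_id, set()).add(path)
    return {asset_id: sorted(paths) for asset_id, paths in links.items()}


# Hypothetical file contents for illustration only.
files = {
    "ui/GraphExplorer.js": '<div data-help-id="graph_explorer_panel"></div>',
    "src/GraphExplorerService.java":
        '@KnowledgeAsset("graph_explorer_panel")\nclass GraphExplorerService {}',
    "ui/Search.js": '<div data-help-id="search_panel"></div>',
}

links = find_asset_links(files)
print(links)
```

The resulting mapping is what would be written back to the graph as source code links for each Component.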
Linking Jira Tickets with the Knowledge Graph
Like many software companies, we love using Jira to track and organize our work, bugs, feature requests and support questions from users. Jira includes a built-in field called components that can be used to tag Jira tickets with arbitrary named values. Jira also provides web services to generate these components. Using the built-in ADS JavaScript features of TopBraid, we have added a little script to TopBraid that generates one Jira component for each Component or Capability from the knowledge graph.
The script also sets the default assignee for Jira tickets, based on the owner triples stored in the knowledge graph or the owner of the super-component when no explicit owner has been set. As a result, when a new Jira ticket is created and a Component is selected, the default assignee of the ticket is automatically set to the Component’s owner.
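The fallback logic can be sketched in a few lines of Python. This is not the ADS script itself; it assumes that the owner and super-component relationships have been loaded into two dictionaries, and it walks up more than one level if needed, which may go further than the real script does. All names are hypothetical.

```python
def effective_owner(component_id, owners, parents):
    """Return the explicit owner of a component, or the owner of the
    nearest super-component that has one, or None (illustrative sketch).

    owners  -- maps component ID to owner, only where one is set explicitly
    parents -- maps component ID to the ID of its super-component
    """
    seen = set()  # guards against accidental cycles in the hierarchy
    current = component_id
    while current is not None and current not in seen:
        seen.add(current)
        if current in owners:
            return owners[current]
        current = parents.get(current)
    return None


# Hypothetical hierarchy and owners for illustration only.
parents = {
    "graph_explorer_panel": "user_interface",
    "user_interface": "product_components",
    "product_components": "topbraid",
}
owners = {"user_interface": "alice"}

print(effective_owner("graph_explorer_panel", owners, parents))
print(effective_owner("topbraid", owners, parents))
```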
Tagging Jira tickets with knowledge graph concepts also means that it becomes easy to find tickets for a given set of product features. The component information displayed in TopBraid EDG includes links to relevant Jira searches e.g., all Jira issues related to this component. These links are inferred in real time for each selected component using SHACL sh:values rules:
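In TopBraid these links are computed by SHACL rules. To show what kind of value such a rule derives, here is the same string construction as a Python sketch: it builds a Jira issue search URL from a JQL query on the component field. The base URL is a placeholder and the function name is made up.

```python
from urllib.parse import quote


def jira_search_url(base_url, component_name):
    """Build a Jira search link for all issues tagged with a component."""
    # Escape backslashes and double quotes so the name is a valid JQL string.
    escaped = component_name.replace("\\", "\\\\").replace('"', '\\"')
    jql = f'component = "{escaped}"'
    return f"{base_url.rstrip('/')}/issues/?jql={quote(jql, safe='')}"


# Placeholder Jira host for illustration only.
url = jira_search_url("https://jira.example.com", "Graph Explorer Panel")
print(url)
```

Since the Jira components are generated from the knowledge graph in the first place, the component name in the query always matches an asset in the graph.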
Overall, this handshake between Jira and the knowledge graph means that we can get better insights into which product features are causing most questions or bugs.
Using the Knowledge Graph and LLMs to Tag Slack Messages
Like many modern software companies, we have replaced most of our email communication with Slack. In addition to being a faster way of communicating, Slack provides extension points that can be used to install plugins. We have added a plugin that can be used to tag selected Slack messages with assets from the knowledge graph.
In the example below, a colleague has asked a question, mentioning terms like “taxonomy” and “Graph Explorer”:
For any such message, anyone in the team can open a dialog to select suitable tags, which then produce bot-generated “Tagged by” messages. I tagged the question above as being about the Graph Explorer Panel, which is a Component in our knowledge graph.
For a selected message, a dialog opens where users can either manually select knowledge graph concepts or accept the suggestions that TopBraid produces:
Slack can interact with a TopBraid server to produce such lists of suggestions. TopBraid takes the text of the Slack message and sends it to an LLM-based index of the concepts in the knowledge graph. Since each of these concepts is linked to documentation texts, the system has a decent understanding of what each Component in the product is doing, and therefore typically produces good guesses about the topics of each Slack message.
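The general shape of this suggestion step can be illustrated without an LLM. The Python sketch below deliberately swaps in a much simpler technique, plain word overlap (Jaccard similarity) between the message and each concept’s documentation text, so it is only a stand-in for the LLM-based index described above. The concept IDs and texts are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest_tags(message, concept_docs, limit=3):
    """Rank concepts by word overlap with the message (toy stand-in
    for an LLM-based index)."""
    message_tokens = tokenize(message)
    scored = []
    for concept_id, doc in concept_docs.items():
        doc_tokens = tokenize(doc)
        union = message_tokens | doc_tokens
        score = len(message_tokens & doc_tokens) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, concept_id))
    # Highest score first; ties are broken alphabetically by concept ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [concept_id for _, concept_id in scored[:limit]]


# Hypothetical documentation texts for illustration only.
concept_docs = {
    "graph_explorer_panel":
        "The Graph Explorer panel visualizes assets and their relationships as a graph",
    "import_wizard":
        "The import wizard loads spreadsheet files into a collection",
}

suggestions = suggest_tags(
    "How do I show a taxonomy in the Graph Explorer?", concept_docs
)
print(suggestions)
```

An embedding-based or LLM-based index would replace the scoring function, while the surrounding flow (message text in, ranked concept IDs out) stays the same.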
Once a message has been tagged, the system also notifies the assigned experts via Slack, for example the owner of a Component in the product. This means that discussions can be routed to the most qualified people in the team, even if the person who asked a question doesn’t know who that might be.
The knowledge graph also gets notified when a Slack message has been tagged, and stores these messages as links. This means that for a given asset in the knowledge graph, users can quickly navigate to the relevant Slack discussions:
As an implementation note, TopBraid’s ADS JavaScript capability was used to implement the Slack integration. ADS scripts are stored with the Ontology and can query the knowledge graph, issue web service calls and also modify the knowledge graph.
Summary and Outlook
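While we have only just begun to use the knowledge graph describing our product, there have already been several benefits: documentation that is generated from the graph for each release, context-sensitive help, Jira tickets that are automatically assigned to a Component’s owner, and Slack questions that are routed to the right experts.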
Like most RDF-based models, the knowledge graph we created is highly extensible and could grow into other areas of the business, including knowledge about customers and competitors.
The flexibility provided by TopBraid and its SHACL-based tooling, together with its scripting and inferencing capabilities, makes building such solutions very easy. Everything that I have described above was developed with TopBraid out of the box. Talk to us at TopQuadrant or send me a message if you want to solve similar use cases.
Software Development Consultant at codecentric AG
1 yr ago: Awesome, thanks for sharing this. I had something like this in mind for several years to organize knowledge in companies, you actually realised it
Co-founder and CTO at AtomGraph
1 yr ago: Holger, have you considered exporting Slack data as RDF in bulk? https://atomgraph.com/products/octopus/
Lead Architect all things Information & Integrations
1 yr ago: Thank you for this post Holger, I will happily be sharing it to share the power and capabilities of knowledge graphs in combination with EDG. I can't think of a better example of 'data-driven development, integration and enablement' with the use of semantic technology. Your vision and accomplishments serve as a blueprint for our roadmap for our client where we use EDG as a major component in a data-fabric like landscape, integrating EDG with MongoDB and legacy on premise applications. Looking forward to the upcoming capabilities and please share more of these excellent posts in the future.
Data Governance | Knowledge Graphs | Architect | Engineer | All opinions are mine and do not reflect any of my employers position, unless stated.
1 yr ago: I'm happy that you shared this Holger Knublauch. This was a cool idea and project to showcase the capabilities.
-
1 yr ago: Very cool stuff! Maybe you'd like to check out a tool we're developing at SDSC to extract Github and Gitlab metadata out of repositories, and convert them to rdf graphs. More data to attach to people and projects! note: very much a work in progress, but we're actively working on it! https://github.com/SDSC-ORD/gimie