Building a Knowledge Graph for a Software Company
TopQuadrant's current Knowledge Graph rendered using TopBraid's Graph Explorer panel

Using TopBraid to build a Knowledge Graph about TopBraid

Knowledge graphs can represent information about any domain and support any process that depends on it. Many of TopQuadrant’s customers create knowledge graphs describing their data and domain landscape. For our own internal use we have designed a knowledge graph describing our product, TopBraid EDG. Our primary goal was to use that graph to improve our support and development processes. A secondary goal was to improve our product documentation.

Among others, this knowledge graph includes information about:

  • What Components (Features) exist in our product
  • What Capabilities these Components implement
  • Who in the company has expertise on these Components and Capabilities

We have integrated this knowledge graph with various tools that we are using every day:

  • Jira: Concepts from the knowledge graph can be linked to Jira issues
  • Slack: Messages can be tagged with knowledge graph concepts
  • GitHub: Components in the knowledge graph link to their source code
  • Sphinx: We generate parts of the product documentation from text stored in the knowledge graph
  • Pendo: We can use Pendo to track how the product features are used

Knowledge Graphs can serve as reference data to integrate various tools and data silos

Let’s dive into how all this works.

The Knowledge Graph Ontology

We started the project by defining a simple Ontology using SHACL. This Ontology defines the types of assets that we want to represent in our knowledge graph, and what properties those assets can have.

The main Classes and Properties from our Knowledge Graph

The domain that we wanted to model is a software product. In our model a product consists of Components (or Features) that implement Capabilities. For example, the component “Graph Explorer Panel” implements the capability “Visualization”. We have introduced a couple of superclasses (see above) to represent shared characteristics of all knowledge concepts/assets. For example, each concept can be linked to instances of Person who are experts on a Component or Capability. As we will show, such information can be used to automatically assign Jira tickets to specific developers or support staff, or to send notifications when a Slack question about a certain topic is asked.
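To make the shape of this model concrete, here is a minimal sketch in plain Python. It is not the SHACL ontology itself, only an illustration of the structure described above; the field names and the people "Alice" and "Bob" are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str


@dataclass
class Concept:
    """Shared superclass: every concept has an ID, a label and experts."""
    id: str
    label: str
    experts: list[Person] = field(default_factory=list)


@dataclass
class Capability(Concept):
    pass


@dataclass
class Component(Concept):
    # Capabilities this component implements, plus nested sub-components.
    implements: list[Capability] = field(default_factory=list)
    sub_components: list["Component"] = field(default_factory=list)


def experts_for(component: Component) -> set[str]:
    """Names of experts on a component or on any capability it implements."""
    names = {person.name for person in component.experts}
    for capability in component.implements:
        names |= {person.name for person in capability.experts}
    return names


# Hypothetical sample data based on the example in the text.
visualization = Capability("visualization", "Visualization", [Person("Alice")])
graph_explorer = Component(
    "graph_explorer_panel",
    "Graph Explorer Panel",
    experts=[Person("Bob")],
    implements=[visualization],
)
print(sorted(experts_for(graph_explorer)))
```

Keeping experts on the shared superclass is what allows the same lookup to work for Components and Capabilities alike.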

TopBraid EDG includes a comprehensive Ontology editor that makes defining such classes, shapes and properties easy. The screenshot below shows the class tree on the left and properties for the selected class Component on the right. The properties are arranged in (SHACL) property groups that are later used to lay out the forms to view and edit instances of these classes.

TopBraid EDG includes comprehensive Ontology editing capabilities, optimized for SHACL

The Concepts in the Knowledge Graph

There are many ways to navigate information in TopBraid EDG. Among other UI panels, TopBraid offers an Assets Hierarchy panel that is ideal for displaying the composition of Components and their sub-components:

An excerpt of the (Software Product) Components represented in the Knowledge Graph

The hierarchy is rooted in a Component called TopBraid, which is broken down into Development and Product Components. Under the Product Components it defines various high-level categories such as Graphs, Storage and User Interface. The UI components roughly mirror the nesting of features in the user interface.

Definitions in the ontology drive a powerful UI for entering information about Components and other assets in the knowledge graph. Here is an example Component, the Graph Explorer Panel, as represented in the knowledge graph:

An example Component from the Knowledge Graph, edited with TopBraid

We are storing various properties for each Component, including links to related components and a dedicated owner.

TopBraid uses SHACL shapes to drive the user interface and to make sure that data conforms to the constraints that we want to enforce.

TopBraid includes comprehensive support for collaborative editing, including the ability to define workflows. This means that anyone can contribute their knowledge in a sandbox or branch, and these changes can get reviewed before being committed to the production copy.

Generating Documentation from the Graph

Obtaining immediate practical benefits is the best motivation for maintaining a knowledge graph. In our case, we wanted to improve our hand-written product documentation, which had become out of date and increasingly hard to maintain. We now store documentation snippets in the knowledge graph for each feature and let a script generate the documentation for each release.

Here is an example documentation text for the Graph Explorer Panel, written in the format that is used by our chosen documentation generation platform, Sphinx.

Components can contain documentation snippets for context-sensitive help

Note that such documentation snippets can reference other Components from the knowledge graph using the :ref:`...` syntax. Furthermore, the documentation generator can use the explicit links between components and capabilities to suggest relevant background articles or videos at the bottom of each help topic.
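As a rough illustration of such a generator, the sketch below turns one asset into a reStructuredText topic. It assumes that the asset ID doubles as the Sphinx label, so that other snippets can point at it with :ref:. The function name, the "See also" layout and the ID asset_hierarchy_panel are hypothetical and not taken from our actual script.

```python
def render_help_topic(asset_id: str, title: str, body: str, related: list[str]) -> str:
    """Render one knowledge graph asset as a reStructuredText help topic."""
    lines = [
        f".. _{asset_id}:",   # explicit label, the target of :ref:`asset_id`
        "",
        title,
        "=" * len(title),     # section underline must match the title length
        "",
        body.strip(),
    ]
    if related:
        lines += ["", "See also:", ""]
        lines += [f"* :ref:`{other_id}`" for other_id in related]
    return "\n".join(lines) + "\n"


topic = render_help_topic(
    "graph_explorer_panel",
    "Graph Explorer Panel",
    "Displays assets and their relationships as a diagram.",
    related=["asset_hierarchy_panel"],  # hypothetical asset ID
)
print(topic)
```

Because the label is derived from the asset ID, renaming the title of a topic does not break any cross-reference.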

From version 7.8 onwards, our product includes context-sensitive help, so that users can click on an area of the screen to open the corresponding help page:

In TopBraid you can click on an area of the screen to jump to context-sensitive help

This was quite easy to implement using the knowledge graph. All we needed to do was to insert HTML data attributes into the user interface components, with those attributes matching the IDs (local names) of the assets in the knowledge graph. On mouse click, our code can easily find the corresponding help page because the help page generator relies on these very same identifiers from the knowledge graph:

An example help page as seen by end users, yet generated from the knowledge graph
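The lookup on mouse click can be sketched as follows. This is a simplified model in Python rather than the browser code that the product uses: the clicked element and its ancestors are represented as a list of attribute dictionaries, and the attribute name data-help-id and the URL layout are assumptions for illustration.

```python
from typing import Optional

HELP_ATTRIBUTE = "data-help-id"              # hypothetical attribute name
HELP_BASE_URL = "https://example.org/help/"  # hypothetical URL layout


def find_help_id(ancestors: list[dict[str, str]]) -> Optional[str]:
    """Return the first help ID found walking from the clicked element outwards."""
    for attributes in ancestors:
        if HELP_ATTRIBUTE in attributes:
            return attributes[HELP_ATTRIBUTE]
    return None


def help_url(ancestors: list[dict[str, str]]) -> Optional[str]:
    """Map the nearest help ID to the page generated for that asset."""
    asset_id = find_help_id(ancestors)
    return f"{HELP_BASE_URL}{asset_id}.html" if asset_id else None


# Clicked element first, then its parents up to the root.
clicked = [
    {"class": "node-label"},
    {"class": "panel", "data-help-id": "graph_explorer_panel"},
    {"class": "workspace", "data-help-id": "topbraid"},
]
print(help_url(clicked))
```

Walking outwards means that the most specific help topic wins, while clicks on unlabeled widgets still fall back to the help page of the enclosing panel.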

In general, we found that having unique and stable identifiers is one of the most important benefits of a company-wide knowledge graph. These identifiers really link everything together, ranging from the help system to Jira tickets to the source code and Slack messages. The identifiers can also be used to automatically verify that features mentioned in the help system still exist in the graph and vice versa.
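Such a verification can be as simple as comparing two sets of identifiers. The sketch below assumes that help pages are keyed by asset ID and that cross-references use the Sphinx :ref: role; the sample IDs are made up.

```python
import re

# Matches :ref:`label` as well as :ref:`Link text <label>`.
REF_PATTERN = re.compile(r":ref:`(?:[^`<]*<)?([^`>]+)>?`")


def check_help_consistency(graph_ids: set[str], help_pages: dict[str, str]) -> dict[str, list[str]]:
    """Compare the IDs used by the help system with the IDs in the graph."""
    referenced = set()
    for text in help_pages.values():
        referenced.update(label.strip() for label in REF_PATTERN.findall(text))
    mentioned = set(help_pages) | referenced
    return {
        # Pages or references that point at assets the graph no longer has.
        "missing_from_graph": sorted(mentioned - graph_ids),
        # Assets in the graph that have no help page yet.
        "missing_from_help": sorted(graph_ids - set(help_pages)),
    }


report = check_help_consistency(
    graph_ids={"graph_explorer_panel", "search_panel"},
    help_pages={
        "graph_explorer_panel": "See :ref:`old_tree_view` and :ref:`Search <search_panel>`.",
    },
)
print(report)
```

Run as part of a release build, a check like this catches dangling references before they reach end users.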

Linking Source Code with the Knowledge Graph

Many components from our Knowledge Graph have links to the source code on GitHub. This makes it potentially easy for new developers to learn where and how a feature is implemented:

There are links from Components in the knowledge graph to the source code on GitHub

We are using a dedicated property to store these links, and they are updated periodically by scanning the source code files (mostly Java, JavaScript and Turtle files) for specific patterns. One of these patterns is to look for the help ids that the source code already contains to implement context-sensitive help. As a result, the ability to add links to the source code came basically for free. But developers can also place annotations into their Java files to establish these links.
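A scanner along these lines could look like the sketch below. The two patterns are assumptions: data-help-id stands in for whatever attribute carries the help IDs, and @KnowledgeGraphAsset is a hypothetical annotation name. To keep the example self-contained, the files are passed in as a dictionary instead of being read from disk.

```python
import re
from collections import defaultdict

# Both patterns are hypothetical stand-ins for the real conventions.
HELP_ID_PATTERN = re.compile(r'data-help-id="([A-Za-z0-9_]+)"')
ANNOTATION_PATTERN = re.compile(r'@KnowledgeGraphAsset\("([A-Za-z0-9_]+)"\)')


def scan_sources(files: dict[str, str], repo_url: str) -> dict[str, list[str]]:
    """Map asset IDs to GitHub-style links (file plus line anchor)."""
    links = defaultdict(list)
    for path, text in sorted(files.items()):
        for number, line in enumerate(text.splitlines(), start=1):
            for pattern in (HELP_ID_PATTERN, ANNOTATION_PATTERN):
                for asset_id in pattern.findall(line):
                    links[asset_id].append(f"{repo_url}/{path}#L{number}")
    return dict(links)


sources = {
    "ui/GraphExplorer.js": 'const panel = (\n  <div data-help-id="graph_explorer_panel">\n);',
    "src/GraphExplorer.java": '@KnowledgeGraphAsset("graph_explorer_panel")\npublic class GraphExplorer {}',
}
found = scan_sources(sources, "https://github.com/example/product/blob/main")
print(found["graph_explorer_panel"])
```

The resulting links would then be written to the dedicated property of each Component, replacing the values from the previous scan.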

Linking Jira Tickets with the Knowledge Graph

Like many software companies, we love using Jira to track and organize our work, bugs, feature requests and support questions from users. Jira includes a built-in field called components that can be used to tag Jira tickets with arbitrary named values. Jira also provides web services to generate these components. Using the built-in ADS JavaScript features of TopBraid, we have added a little script to TopBraid that generates one Jira component for each Component or Capability from the knowledge graph.

We can use the identifiers from the Knowledge Graph to categorize our Jira tickets

The script also sets the default assignee for Jira tickets, based on the owner triples stored in the knowledge graph or the owner of the super-component when no explicit owner has been set. As a result, when a new Jira ticket is created and a Component is selected, the default assignee of the ticket is automatically set to the Component’s owner.
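The owner fallback is easy to get subtly wrong, so here is a minimal sketch of it in Python (our actual script is written in ADS JavaScript and is not shown here). The payload field names follow my understanding of the component resource in Jira Cloud's REST API and should be checked against the Atlassian documentation; the account ID and project key are made up.

```python
from typing import Optional


def effective_owner(asset_id: str, owners: dict[str, str], parents: dict[str, str]) -> Optional[str]:
    """Return the explicit owner, or the owner of the nearest super-component."""
    seen = set()
    current: Optional[str] = asset_id
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        if current in owners:
            return owners[current]
        current = parents.get(current)
    return None


def jira_component_payload(asset_id: str, label: str, project_key: str,
                           owners: dict[str, str], parents: dict[str, str]) -> dict:
    """Build the request body for creating one Jira component (nothing is sent)."""
    payload = {
        "name": label,
        "project": project_key,
        "description": f"Knowledge graph asset: {asset_id}",
    }
    owner = effective_owner(asset_id, owners, parents)
    if owner is not None:
        payload["leadAccountId"] = owner
        payload["assigneeType"] = "COMPONENT_LEAD"  # new tickets default to the lead
    return payload


parents = {"graph_explorer_panel": "user_interface", "user_interface": "topbraid"}
owners = {"user_interface": "account-123"}  # hypothetical Jira account ID
payload = jira_component_payload(
    "graph_explorer_panel", "Graph Explorer Panel", "PROD", owners, parents
)
print(payload)
```

Here the Graph Explorer Panel has no explicit owner, so the component lead is inherited from the User Interface component one level up.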

Tagging Jira tickets with knowledge graph concepts also means that it becomes easy to find tickets for a given set of product features. The component information displayed in TopBraid EDG includes links to relevant Jira searches, for example all Jira issues related to this component. These links are inferred in real time for each selected component using SHACL sh:values rules:

TopBraid makes it easy to find Jira issues for specific assets in the knowledge graph
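In the product these links are computed by sh:values rules. The Python sketch below only illustrates what such a rule has to produce: a Jira search URL whose JQL query filters on the component name. The base URL is a placeholder and the open_only option is an assumption added for the example.

```python
from urllib.parse import quote


def jira_search_url(base_url: str, component_name: str, open_only: bool = False) -> str:
    """Build a link to a Jira issue search for one component."""
    # Escape backslashes and quotes so the name is a valid JQL string literal.
    escaped = component_name.replace("\\", "\\\\").replace('"', '\\"')
    jql = f'component = "{escaped}"'
    if open_only:
        jql += " AND resolution = Unresolved"
    return f"{base_url}/issues/?jql={quote(jql)}"


url = jira_search_url("https://example.atlassian.net", "Graph Explorer Panel")
print(url)
```

Since the link is derived from the component label on demand, nothing has to be stored or kept in sync in the graph.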

Overall, this handshake between Jira and the knowledge graph means that we can get better insights into which product features are causing most questions or bugs.

Using the Knowledge Graph and LLMs to Tag Slack Messages

Like many modern software companies, we have replaced most of our email communication with Slack. In addition to being a faster way of communicating, Slack provides extension points that can be used to install plugins. We have added a plugin that can be used to tag selected Slack messages with assets from the knowledge graph.

In the example below, a colleague has asked a question, mentioning terms like “taxonomy” and “Graph Explorer”:

A Slack discussion thread including an annotation to tag it with a knowledge graph topic

For any such message, anyone in the team can open a dialog to select suitable tags, which then produce bot-generated “Tagged by” messages. I tagged the question above as being about the Graph Explorer Panel, which is a Component in our knowledge graph.

For a selected message, a dialog opens where users can either manually select knowledge graph concepts or accept the suggestions that TopBraid produces:

Our plugin for tagging Slack messages uses an LLM to produce suggestions

Slack can interact with a TopBraid server to produce such lists of suggestions. TopBraid takes the text of the Slack message and sends it to an LLM-based index of the concepts in the knowledge graph. Since each of these concepts is linked to documentation texts, the system has a decent understanding of what each Component in the product is doing, and therefore typically produces good guesses about the topics of each Slack message.
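The ranking step can be illustrated without an LLM. The sketch below swaps the LLM-based index for a plain bag-of-words cosine similarity between the message and each concept's documentation text, which is far cruder but shows the same flow: score every concept, drop those with no overlap, and return the best matches. The concept IDs and documentation texts are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lower-case word counts; a stand-in for a real text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_tags(message: str, concept_docs: dict[str, str], limit: int = 3) -> list[str]:
    """Return up to `limit` concept IDs whose documentation best matches the message."""
    query = tokenize(message)
    scored = [(cosine(query, tokenize(doc)), concept_id) for concept_id, doc in concept_docs.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by ID
    return [concept_id for _, concept_id in scored[:limit]]


docs = {
    "graph_explorer_panel": "The Graph Explorer panel visualizes assets as a diagram.",
    "taxonomy_editor": "Edit the concepts of a taxonomy.",
    "import_spreadsheet": "Import rows from a spreadsheet file.",
}
suggestions = suggest_tags("How can I show my taxonomy in the Graph Explorer?", docs)
print(suggestions)
```

A real embedding model would also match paraphrases that share no words with the documentation, which is where word counting falls short.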

Once a message has been tagged, the system also notifies the assigned experts via Slack, for example the owner of a Component in the product. This means that discussions can be routed to the most qualified people in the team, even if the person who asked a question doesn’t know who that might be.

The knowledge graph also gets notified when a Slack message has been tagged, and stores these messages as links. This means that for a given asset in the knowledge graph, users can quickly navigate to the relevant Slack discussions:

Concepts in the Knowledge Graph can link to relevant Slack messages
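The two effects of tagging a message, storing the link and notifying the owner, can be sketched as below. The class, the notification wording and the Slack URL are hypothetical, and the notifications are returned as strings instead of being posted to Slack.

```python
from dataclasses import dataclass, field


@dataclass
class TagRegistry:
    """Keeps Slack links per asset and works out whom to notify."""
    owners: dict[str, str]
    links: dict[str, list[str]] = field(default_factory=dict)

    def tag(self, asset_id: str, message_url: str) -> list[str]:
        """Record the link once and return the notifications to send."""
        urls = self.links.setdefault(asset_id, [])
        if message_url in urls:
            return []  # already tagged, do not notify again
        urls.append(message_url)
        owner = self.owners.get(asset_id)
        if owner is None:
            return []
        return [f"@{owner}: a message was tagged with {asset_id}: {message_url}"]


registry = TagRegistry(owners={"graph_explorer_panel": "bob"})  # hypothetical owner
message = "https://example.slack.com/archives/C123/p1700000000000100"  # hypothetical link
notifications = registry.tag("graph_explorer_panel", message)
print(notifications)
print(registry.links["graph_explorer_panel"])
```

Making the operation idempotent avoids duplicate links and repeated pings when several people tag the same message.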

As an implementation note, TopBraid’s ADS JavaScript capability was used to implement the Slack integration. ADS scripts are stored with the Ontology and can query the knowledge graph, issue web service calls and also modify the knowledge graph.

TopBraid's ADS Script features make it easy to implement interoperability with tools like Slack

Summary and Outlook

While we have just begun to use the knowledge graph describing our product, there have already been several benefits:

  • Creating the knowledge graph has, for the first time, brought some structure into our shared understanding of the product and its features.
  • We now have stable identifiers for each component or capability of the product.
  • These identifiers can be used as reference data to consistently link various tools together, including Jira, GitHub and Slack.
  • We now have a structured way to generate context-sensitive help for our product, making the help pages much more maintainable.
  • Overall it has become easier to identify expertise within the company.

Like most RDF-based models, the knowledge graph we created is highly extensible and could grow into other areas of the business, including knowledge about customers and competitors.

The flexibility provided by TopBraid and its SHACL-based tooling, together with its scripting and inferencing capabilities, makes building such solutions very easy. Everything that I have described above was developed with TopBraid out-of-the-box. Talk to us at TopQuadrant or send me a message if you want to solve similar use cases.

Angelo Veltens

Software Development Consultant at codecentric AG

1 year ago

Awesome, thanks for sharing this. I had something like this in mind for several years to organize knowledge in companies, you actually realised it

Martynas Jusevičius

Co-founder and CTO at AtomGraph

1 year ago

Holger, have you considered exporting Slack data as RDF in bulk? https://atomgraph.com/products/octopus/

Elvin Dechesne

Lead Architect all things Information & Integrations

1 year ago

Thank you for this post Holger, I will happily be sharing it to share the power and capabilities of knowledge graphs in combination with EDG. I can't think of a better example of 'data-driven development, integration and enablement' with the use of semantic technology. Your vision and accomplishments serve as a blueprint for our roadmap for our client where we use EDG as a major component in a data-fabric like landscape, integrating EDG with MongoDB and legacy on premise applications. Looking forward to the upcoming capabilities and please share more of these excellent posts in the future.

Jeremy Debattista

Data Governance | Knowledge Graphs | Architect | Engineer | All opinions are mine and do not reflect any of my employers position, unless stated.

1 year ago

I'm happy that you shared this Holger Knublauch. This was a cool idea and project to showcase the capabilities.

Very cool stuff! Maybe you'd like to check out a tool we're developing at SDSC to extract Github and Gitlab metadata out of repositories, and convert them to rdf graphs. More data to attach to people and projects! note: very much a work in progress, but we're actively working on it! https://github.com/SDSC-ORD/gimie
