Understanding Semantic and Property Graphs

Executive Summary

Enterprises are increasingly adopting Graph Databases, either to better reflect the nature of their data or as integration platforms. However, there are significant differences between Semantic Graph Databases and Property Graph Databases, which this paper summarises.

In particular, Property Graph Databases are purely engineering tools used by development teams. The semantics of the data are defined by code and are proprietary by nature.

Semantic Graph Databases implement multiple World Wide Web Consortium (W3C) standards, define and reuse Metadata in a standard format (in some cases defined by industry bodies), and offer queryability and discoverability to non-programmer users. Therefore a Semantic Graph Database, while it can also be used behind an application, can be implemented in its own right as a team or enterprise asset.

What is a Graph Model?

A Graph is a data/information model that aims to return information to its natural, connected state. In the real world, everything is connected, and no predefined structure for the information exists. In our thinking, too, we more often investigate information describing related entities than search through lists of similar records for many unrelated entities.

However, the restrictions of paper forms and catalogues, and then of the relational model, which was necessary because of highly limited hardware, made us think in terms of tables and records.

The Graph model takes us back to the world of connected entities, which can be connected in various ways, and have whatever properties they please.

Graph Model as an Enterprise Data Storage

Relational Databases are not going anywhere. They will remain the only solution for working with massive quantities of pre-defined records. However, Graph Databases are increasingly employed for tasks that fall outside that formula. The Graph model is seen as the way to integrate and connect many diverse datasets.

Also, for data that emphasise connectivity and complexity, a Graph database can significantly reduce development time while increasing performance.

Graph Models from an Executive Perspective

There are two different Graph models:

  • Semantic Graph
  • Property Graph

While they have similarities, and some products support both, it is important to understand the difference.

Semantic Graph

The Semantic Graph model is based on the “Semantic Web” ideas of Sir Timothy Berners-Lee, the inventor of the World Wide Web. Its foundation is the idea of world-wide (or enterprise-wide) integration of information. The key features are:

  1. Built-in data and metadata integration and extensibility. As a result, any local, tactical development can be extended or integrated into an enterprise-wide Knowledge Graph.
  2. Strong international standards for the representation of metadata, in the form of OWL Ontologies.
  3. Availability of expertly defined Ontologies covering the Finance Industry (FIBO), Earth Sciences (SWEET), and many other domains.
  4. Strong standards for data representation and query which, together with common Ontologies, enable the involvement of external resources and expertise.
  5. Reasoning and generalisation of Classes and Properties allow a more generic, business-friendly layer to coexist with detailed technical information.
  6. Reasoning, classification and integration naturally provide a platform for a decision-support system, even if one was not planned initially.
  7. Non-technical domain experts can be trained to evaluate Ontologies and write SPARQL queries.
  8. Data in a Semantic Graph is naturally discoverable. For example, after a 4-day training course, the trainees were discovering information in DBpedia without any prior knowledge of its structure.
  9. Another aspect of the Semantic Graph is the set of Linked Data publication standards.

Property Graph

Property Graph databases emerged bottom-up, from projects that emphasised link traversal over data filtering. A Property Graph implementation will likely be an engineering undertaking, with many ad-hoc decisions on structure, properties, metadata, etc. Some features of a Property Graph implementation would be:

  1. There is a standard query language, Gremlin; however, it is an extension of the Groovy programming language, and only engineers are expected to use it.
  2. Non-technical users will access the Graph via applications. Input from stakeholders, technical users, etc. would come through the traditional Business Analysis process.
  3. There are no standards on naming of the vertices (nodes), or representation of metadata.
  4. While introducing new Properties doesn’t require any additional effort, there is no established way of recording the properties and their meaning.
  5. Unlike with Semantic Graphs, integration of two Property Graphs will not happen automatically.
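Item 5 can be illustrated with a minimal Python sketch. It assumes a Semantic Graph is modelled as a plain set of (subject, predicate, object) tuples and a Property Graph as a dict of vertices keyed by store-local integer IDs. The `http://example.org/` URIs, the data and the name-matching rule are all hypothetical.

```python
EX = "http://example.org/"  # hypothetical namespace

# Two Semantic Graphs: sets of (subject, predicate, object) triples.
# Nodes are identified by global URIs, so a plain set union merges them.
hr_graph = {(EX + "alice", EX + "worksFor", EX + "acme")}
crm_graph = {(EX + "acme", EX + "locatedIn", EX + "canberra")}
merged = hr_graph | crm_graph

# Both sources use the same URI for Acme, so Alice is now linked
# (via Acme) to Canberra without any mapping step.
linked = {o2 for (s1, _, o1) in merged for (s2, _, o2) in merged
          if s1 == EX + "alice" and o1 == s2}

# Two Property Graphs: vertices keyed by store-local integer IDs.
pg_a = {1: {"name": "Alice"}, 2: {"name": "Acme"}}
pg_b = {1: {"name": "Acme"}, 2: {"name": "Canberra"}}

# A naive merge silently overwrites vertices that happen to share a local ID.
naive = {**pg_a, **pg_b}

# Integration needs an explicit, project-specific matching rule; here
# (hypothetically) "vertices with an equal 'name' are the same entity".
by_name = {}
for graph in (pg_a, pg_b):
    for props in graph.values():
        by_name.setdefault(props["name"], props)
```

The union needs no mapping because both sources already agree on the URI for Acme. The Property Graph merge only works once someone decides, and ideally documents, what makes two vertices the same.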

Overall, Property Graph storage is not something to be exposed to end users or stakeholders. They will instead evaluate the visual or data management functionality the application is expected to provide.

Unlike a Semantic Graph, a Property Graph implementation is destined to be a proprietary engineering feat. It brings the traditional problems associated with software engineering in a non-software enterprise: continuity, project monitoring, [lack of] documentation, etc.

At the same time, for a team of engineers, a Property Graph platform can have advantages over a Semantic Graph platform. Some of them are outlined below.

Semantic and Property Graphs from an Engineering Perspective

What the Graph consists of

A Semantic Graph consists of Triples in the form <subject> <predicate> <object>. Both the Subject and the Predicate must be URIs, while the Object can be either a URI or a Literal. A Semantic Graph doesn’t explicitly define nodes/vertices. The only way a Node can be known is by being mentioned in a Triple.
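A minimal Python sketch of this structure follows. It assumes URIs and Literals are represented as two hypothetical `str` subclasses, and it leaves out blank nodes for brevity. It also shows that the set of Nodes is not stored anywhere; it is derived from the Triples.

```python
from typing import NamedTuple, Union


class URI(str):
    """Hypothetical marker type for URIs (a plain str subclass)."""


class Literal(str):
    """Hypothetical marker type for literal values such as names or numbers."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


EX = "http://example.org/"  # hypothetical namespace
graph = {
    Triple(URI(EX + "alice"), URI(EX + "knows"), URI(EX + "bob")),
    Triple(URI(EX + "alice"), URI(EX + "name"), Literal("Alice")),
}


def nodes(triples):
    """Collect the Nodes a graph mentions: every subject, plus every object
    that is a URI. Literals are values, not Nodes, so they are skipped."""
    found = set()
    for t in triples:
        found.add(t.subject)
        if isinstance(t.obj, URI):
            found.add(t.obj)
    return found
```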

A Property Graph consists of Vertices (Nodes) and Edges. Each Edge and Vertex has an internal GUID and can have any number of key-value pairs associated with it. Thus a Property Graph can effortlessly associate values with Edges, which a Semantic Graph cannot do directly. While some of the values can be URIs, this is not required.
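For comparison, a Property Graph element can be sketched as a Python dataclass that carries its own internal ID and a free-form dict of key-values. The class and field names are hypothetical and are not taken from any particular product.

```python
import uuid
from dataclasses import dataclass, field


def new_id() -> str:
    # Stand-in for the store's internal GUID.
    return str(uuid.uuid4())


@dataclass
class Vertex:
    label: str
    props: dict = field(default_factory=dict)
    id: str = field(default_factory=new_id)


@dataclass
class Edge:
    label: str
    out_v: str  # id of the source vertex
    in_v: str   # id of the target vertex
    props: dict = field(default_factory=dict)
    id: str = field(default_factory=new_id)


alice = Vertex("person", {"name": "Alice"})
bob = Vertex("person", {"name": "Bob"})
# Values attached directly to the edge, e.g. a start year and a weight.
knows = Edge("knows", alice.id, bob.id, {"since": 2015, "weight": 0.8})
```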

Identity

For a Semantic Graph, the URI is the GUID of a Node or Predicate. Note that the URI of a Predicate defines the type of the Predicate, not the individual Triple (an Edge, in Property Graph terms).

For a Property Graph, each Edge and each Vertex (Node) has an internal GUID. This enables multiple Edges between the same Vertices with the same properties (key-values).
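The consequence for identity can be shown in a few lines of Python. The sketch assumes triples are stored as a set of tuples and Property Graph edges in a dict keyed by an internal ID. A simple counter stands in for the GUID, and all names are hypothetical.

```python
import itertools

EX = "http://example.org/"  # hypothetical namespace

# Semantic Graph: a set of triples. A triple is identified only by its three
# terms, so asserting the same statement twice still leaves one triple.
triples = set()
for _ in range(2):
    triples.add((EX + "alice", EX + "attended", EX + "owlCourse"))

# Property Graph: every edge receives its own internal ID, so two edges with
# identical endpoints, label and key-values remain two distinct edges
# (for example, Alice attending the same course twice).
next_id = itertools.count(1)
edges = {}
for _ in range(2):
    edges[next(next_id)] = {"out": "alice", "in": "owlCourse",
                            "label": "attended", "props": {"year": 2017}}
```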

Metadata

For a Semantic Graph, Metadata is part of the Graph and uses Predicates and Nodes defined by OWL (the Web Ontology Language). Metadata can be retrieved by a user with zero knowledge of the particular Graph implementation. Many Ontologies are widely and freely available, and we usually recommend that our clients either reuse an existing Ontology or develop one for reuse.
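Here is a minimal Python sketch of metadata living in the same graph as the data. Only the RDF, RDFS and OWL namespace URIs are real W3C vocabulary. The `ex:` classes, property and instances are hypothetical. The SPARQL in the comments is roughly the query one would run against a real store.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://example.org/"  # hypothetical domain namespace

graph = {
    # Metadata (ontology) triples...
    (EX + "Person", RDF + "type", OWL + "Class"),
    (EX + "Consultant", RDF + "type", OWL + "Class"),
    (EX + "Consultant", RDFS + "subClassOf", EX + "Person"),
    (EX + "worksFor", RDF + "type", OWL + "ObjectProperty"),
    # ...stored alongside instance data triples.
    (EX + "alice", RDF + "type", EX + "Consultant"),
    (EX + "alice", EX + "worksFor", EX + "acme"),
}


def classes(g):
    # SPARQL equivalent (with rdf:/owl: prefixes declared):
    #   SELECT ?c WHERE { ?c rdf:type owl:Class }
    return {s for (s, p, o) in g if p == RDF + "type" and o == OWL + "Class"}


def object_properties(g):
    # SPARQL equivalent: SELECT ?p WHERE { ?p rdf:type owl:ObjectProperty }
    return {s for (s, p, o) in g
            if p == RDF + "type" and o == OWL + "ObjectProperty"}
```

No project documentation is needed to ask these questions, because the vocabulary used to describe the metadata is itself standard.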

For a Property Graph, there is no universal standard for Metadata. Naturally, any project with a chance of success must have some information about Vertex and Edge types and how they are represented and related. However, that information would be project-specific and recorded in a project-specific way, or not recorded at all.

Query

A Semantic Graph is queried using SPARQL, an SQL-like declarative query language. The author successfully taught SPARQL to non-technical, non-IT enterprise staff.
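The following toy evaluator illustrates what a SPARQL query actually does. It handles a basic graph pattern (a conjunction of triple patterns with `?variables`) in plain Python. This is an illustrative assumption, not how any particular SPARQL engine works, and it ignores FILTER, OPTIONAL and the rest of the language. The data is hypothetical.

```python
EX = "http://example.org/"  # hypothetical namespace

graph = {
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "bob", EX + "worksFor", EX + "acme"),
    (EX + "acme", EX + "locatedIn", EX + "canberra"),
    (EX + "carol", EX + "worksFor", EX + "globex"),
}


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the same value across all patterns.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def select(patterns, g):
    """Join triple patterns one by one, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in g:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# SPARQL equivalent (ex: prefix declared):
#   SELECT ?person ?city WHERE { ?person ex:worksFor ?org .
#                                ?org ex:locatedIn ?city . }
rows = select([("?person", EX + "worksFor", "?org"),
               ("?org", EX + "locatedIn", "?city")], graph)
people = sorted(r["?person"] for r in rows)
```

Carol is not returned because her organisation has no `locatedIn` triple. As in SQL, the query describes *what* to match, not how to traverse.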

A Property Graph is queried with the Gremlin query language, a Java-like language (based on the Groovy programming language) with both declarative and imperative capabilities. We teach Gremlin together with some elements of Groovy, and recommend that participants have at least some programming education or experience.
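For contrast, the Gremlin style of chaining traversal steps can be imitated in a few lines of Python. The `Traversal` class and `V` function below are hypothetical stand-ins written for illustration. They are not the Apache TinkerPop API, although the comment shows the corresponding Gremlin query.

```python
class Traversal:
    """Hypothetical, minimal imitation of Gremlin's chained traversal steps."""

    def __init__(self, graph, current):
        self.graph = graph      # {"vertices": {id: props}, "edges": [...]}
        self.current = current  # vertex ids reached so far

    def has(self, key, value):
        keep = [v for v in self.current
                if self.graph["vertices"][v].get(key) == value]
        return Traversal(self.graph, keep)

    def out(self, label):
        # Follow outgoing edges with the given label.
        nxt = [e["in"] for v in self.current for e in self.graph["edges"]
               if e["out"] == v and e["label"] == label]
        return Traversal(self.graph, nxt)

    def values(self, key):
        return [self.graph["vertices"][v][key] for v in self.current]


def V(graph):
    return Traversal(graph, list(graph["vertices"]))


graph = {
    "vertices": {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carol"}},
    "edges": [{"out": 1, "in": 2, "label": "knows"},
              {"out": 1, "in": 3, "label": "knows"},
              {"out": 2, "in": 3, "label": "knows"}],
}

# Gremlin: g.V().has('name', 'Alice').out('knows').values('name')
friends = V(graph).has("name", "Alice").out("knows").values("name")
# Two hops: friends of Alice's friends.
two_hops = V(graph).has("name", "Alice").out("knows").out("knows").values("name")
```

Each step is an ordinary method call in a host programming language, so traversals mix freely with loops and conditionals. That flexibility is what makes the approach a natural fit for engineers.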

Reasoning

Most Semantic Graph platforms support classification and generalisation Reasoning. A query for a Person would return anyone defined as a Consultant, because the Consultant Class has been defined as a subclass of Person. A query for all Trainers would return anyone offering a Training Course, even if the person was not explicitly defined as a Trainer.
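The two inferences above can be sketched in Python as forward chaining of two RDFS entailment rules, one for subclasses and one for property domains. The sketch assumes the ontology declares `ex:offersCourse` with the domain `ex:Trainer`. The prefixed names are shorthand for full URIs, and all `ex:` terms are hypothetical. Real platforms use far more efficient reasoners.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"

graph = {
    ("ex:Consultant", SUBCLASS, "ex:Person"),
    ("ex:offersCourse", DOMAIN, "ex:Trainer"),  # assumed ontology axiom
    ("ex:alice", RDF_TYPE, "ex:Consultant"),
    ("ex:bob", "ex:offersCourse", "ex:owlCourse"),
}


def infer(triples):
    """Apply two rules repeatedly until nothing new is derived:
    - subclass: (x type C) and (C subClassOf D) => (x type D)
    - domain:   (x P y)    and (P domain C)     => (x type C)
    """
    closure = set(triples)
    while True:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, RDF_TYPE, o2))
                if p2 == DOMAIN and s2 == p:
                    new.add((s, RDF_TYPE, o2))
        if new <= closure:
            return closure
        closure |= new


def instances(triples, cls):
    return {s for (s, p, o) in infer(triples) if p == RDF_TYPE and o == cls}
```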

A Property Graph doesn’t have Reasoning defined at the platform level.

Scalability

Both types of Graph storage can be scaled massively, although the approaches differ. For a massive Property Graph, one has to choose a highly scalable Graph DB implemented on top of HBase or Cassandra. A Semantic Graph is scaled through a so-called Virtual Graph, where the data remain in Relational or NoSQL storage but are made available for SPARQL queries.

That difference is due to the higher complexity of Semantic storage, including reasoning. While there have been several attempts to build Semantic storage on top of Cassandra or HBase, none has succeeded.
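The Virtual Graph idea can be sketched with Python's built-in `sqlite3`. Rows stay in a relational table, and triples are generated only when asked for. The table, URI templates and mapping are hypothetical. They are loosely in the spirit of the W3C R2RML mapping standard but are not an implementation of it.

```python
import sqlite3

EX = "http://example.org/"  # hypothetical namespace

# The data stays in a relational table...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
                 [(1, "Alice", "sales"), (2, "Bob", "it")])


def virtual_triples(connection, predicate=None):
    """...and triples are generated from rows on demand, not copied.
    Hypothetical mapping: row id -> subject URI, each column -> one predicate."""
    for emp_id, name, dept in connection.execute(
            "SELECT id, name, dept FROM employee"):
        subject = f"{EX}employee/{emp_id}"
        for p, o in ((EX + "name", name), (EX + "dept", f"{EX}dept/{dept}")):
            if predicate is None or p == predicate:
                yield (subject, p, o)


names = sorted(o for (_, _, o) in virtual_triples(conn, EX + "name"))
```

A production virtual graph engine would go further and rewrite the SPARQL query into SQL, so that filtering happens inside the database rather than in the generator.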

Analytics

One will find it rather awkward to implement graph algorithms, such as PageRank or Trust Propagation, on a Semantic Graph. Part of the problem is the inability to attach values to triples.

It is possible to implement similar algorithms on a Property Graph.

However, true performance at scale would require exporting the graph into a sparse matrix file (possibly an HDFS file), then using either Apache Spark GraphX or a GPU analytics package. The results must then be populated back into the Graph.
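The export, compute and write-back cycle can be sketched in plain Python. Edges are exported to a coordinate-format (COO) sparse representation, PageRank is computed by power iteration, and the scores are written back as a vertex property. This is a small in-memory stand-in for the Spark GraphX or GPU step. The data is hypothetical, and the parameters are common defaults (damping 0.85, 50 iterations).

```python
def export_coo(vertices, edges):
    """Export the graph as a sparse matrix in coordinate (COO) form,
    mapping vertex ids to dense row/column indices."""
    index = {vid: i for i, vid in enumerate(vertices)}
    rows = [index[src] for src, dst in edges]
    cols = [index[dst] for src, dst in edges]
    return index, rows, cols


def pagerank(n, rows, cols, damping=0.85, iterations=50):
    """Power-iteration PageRank over a COO edge list."""
    out_degree = [0] * n
    for r in rows:
        out_degree[r] += 1
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by dangling vertices (no out-edges) is spread evenly.
        dangling = sum(rank[i] for i in range(n) if out_degree[i] == 0)
        nxt = [(1 - damping) / n + damping * dangling / n] * n
        for r, c in zip(rows, cols):
            nxt[c] += damping * rank[r] / out_degree[r]
        rank = nxt
    return rank


# Hypothetical Property Graph: vertex id -> properties, plus (src, dst) edges.
vertices = {"a": {}, "b": {}, "c": {}}
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]

index, rows, cols = export_coo(vertices, edges)
scores = pagerank(len(index), rows, cols)
# Populate the results back into the graph as a vertex property.
for vid, i in index.items():
    vertices[vid]["pagerank"] = scores[i]
```

Storing the score as a vertex property is natural in a Property Graph. In a Semantic Graph, attaching a computed weight to each triple is the part the text describes as awkward.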

Next Step

Introducing either a Semantic or a Property Graph is a serious paradigm shift for a traditional IT site. The critical step is training your core staff, the people you can trust, and then running a small-scale project that delivers further learning of the technology. Business Abstraction can assist through our training and consulting:

Training for Semantic Graph technologies

Training for Property Graph

Contact us for more information.

More articles by Alex Jouravlev

  • You have your Business Architecture. Do you use it?

    As a Consultant, I often ask prospective clients about Business Architecture. The usual answer is that someone already…

  • The True Face of Level 4 Process Mapping

    We need to have a serious conversation about Process Centricity vs Data Centricity in the face of Digital…

  • Agile, Simplified

    It doesn’t look like there is a good working definition of what constitutes Agile. The Agile Manifesto is supposed to…

  • As-is Modelling, the Sweet Wasteland of Enterprise Architecture

    Enterprise Architecture is under attack. On one side, the Service Design people are “planning and organizing people…

  • Agile Expectations Board

    An Agile Expectations Board seeks to prevent an Agile project from successfully delivering Iterations on the way to…

  • The Cost of the Right to be Different

    It is a high season for IT contracts here in Canberra, so the “Let the Hundred Flowers Bloom” anti-pattern is in full…

  • COTS or CRHMS? Understanding Full Stack of a Core Enterprise Software.

    Commercial Off-the-shelf Software, or COTS, seems as, although expensive, a way to avoid risks and challenges…

  • Some Inconvenient Thoughts about Architecture

    Enterprise Architecture should include Diagrams understood by the highest executive level to be useful. If you don’t…

  • Enterprise is the Data: Are Processes Overrated?

    The first thing I noticed when started to transition some of my clients from UML to OWL Ontology Modelling was that…

  • Expect More Sparse Data

    One of the arguments for NoSQL Databases, along their ability to handle Big Data, is their ability to handle sparse…
