Ontologies in Knowledge Graphs Make Data Smarter

AI is an umbrella term for disciplines such as Machine Learning (ML), NLP, Speech and Vision, Knowledge Representation, Robotics, and Problem Solving. AI is sometimes treated as a synonym for ML, but that is not the case. ML can produce models that give outstanding results on many problems, yet those models often behave like a black box, and this lack of interpretability is one of the biggest problems for ML and Deep Learning (DL). The issue can be addressed to a great extent by another discipline of AI: Knowledge Representation and Reasoning (KR).

The main rationale behind KR is to make data smarter, so that it can be reused rather than replicated everywhere. This reduces the overall application logic by moving it out of the application code. Smarter data is well structured and has well-defined semantics that other applications can use.

An ontology is a model of a particular domain. It has three main characteristics:

  1. Structure, i.e. it is machine readable.
  2. Explicit description of the domain, i.e. an enumeration of the entities that belong to the domain and how they relate to each other, forming a graph-like representation.
  3. Shared (or agreed) vocabulary, i.e. one that can be shared across a domain community.

Typical uses of an ontology include -

1. Inference - inferring new knowledge or facts from existing data fragments.

2. Interoperability - with a shared vocabulary, the exposed data can be used by more people, which in turn attracts more contributions to the application.
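
To make the idea of inference concrete, here is a minimal Python sketch. It assumes a toy class hierarchy and applies one rule repeatedly: if an individual has type C and C is a subclass of D, the individual also has type D. The class names, individuals and the `infer_types` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical toy vocabulary; none of these names come from a real ontology.
SUBCLASS_OF = {            # child class -> parent class
    "Surgeon": "Doctor",
    "Doctor": "Person",
}

FACTS = {                  # explicitly asserted (subject, predicate, object) facts
    ("alice", "type", "Surgeon"),
    ("bob", "type", "Person"),
}


def infer_types(facts, subclass_of):
    """Forward-chain one rule until no new facts appear:
    (x type C) and (C subClassOf D)  =>  (x type D)."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for subject, predicate, obj in list(inferred):
            parent = subclass_of.get(obj)
            if predicate == "type" and parent is not None:
                new_fact = (subject, "type", parent)
                if new_fact not in inferred:
                    inferred.add(new_fact)
                    changed = True
    return inferred


all_facts = infer_types(FACTS, SUBCLASS_OF)
new_facts = all_facts - FACTS   # facts that were never stated explicitly
print(sorted(new_facts))
# [('alice', 'type', 'Doctor'), ('alice', 'type', 'Person')]
```

Nobody asserted that alice is a Doctor or a Person; both facts follow from the hierarchy alone. A real reasoner supports many more rules than this single one.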

When this ontology (our data model) is applied to a set of individual data points, it creates a knowledge graph. In other words:

              Ontology + Data = Knowledge Graph
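
The equation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ontology is a plain dictionary of classes and properties, the `domain` and `range` entries are treated as constraints to check (formal ontology languages typically use them for inference instead), and every name below, including `build_knowledge_graph`, is hypothetical.

```python
# Hypothetical ontology: each property declares the class of its subject
# (domain) and the class of its object (range).
ONTOLOGY = {
    "classes": {"Person", "Company"},
    "properties": {
        "worksFor": {"domain": "Person", "range": "Company"},
    },
}

# Hypothetical instance data: individuals with their class, plus raw links.
INDIVIDUALS = {"alice": "Person", "acme": "Company"}
LINKS = [("alice", "worksFor", "acme")]


def build_knowledge_graph(ontology, individuals, links):
    """Return a set of triples, rejecting data the ontology does not allow."""
    triples = set()
    for name, cls in individuals.items():
        if cls not in ontology["classes"]:
            raise ValueError(f"unknown class: {cls}")
        triples.add((name, "type", cls))
    for subject, prop, obj in links:
        spec = ontology["properties"].get(prop)
        if spec is None:
            raise ValueError(f"unknown property: {prop}")
        if individuals.get(subject) != spec["domain"]:
            raise ValueError(f"{subject} is not a {spec['domain']}")
        if individuals.get(obj) != spec["range"]:
            raise ValueError(f"{obj} is not a {spec['range']}")
        triples.add((subject, prop, obj))
    return triples


knowledge_graph = build_knowledge_graph(ONTOLOGY, INDIVIDUALS, LINKS)
for triple in sorted(knowledge_graph):
    print(triple)
```

The ontology supplies the meaning (what a Person or a Company is and how they may relate), while the data supplies the individuals. The resulting set of triples is the knowledge graph.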

Ontologies improve the overall quality of data. They work like a brain that reasons with concepts and relationships, much as the human brain perceives interlinked relationships. This enables smart reasoning over data, and the ontology structure makes the data easy to navigate.

A graph database uses graph structures, with nodes (vertices), edges (relationships) and properties, to store, represent and query data. It is a type of NoSQL database that addresses some of the limitations of relational databases. The underlying storage mechanism varies from one graph database to another: some use tables, others use a key-value store, and some use a document-oriented database. Different query languages are also available, such as SPARQL (used by Stardog), Cypher (used by Neo4j) and Gremlin.

A graph database follows one of these two models -

  1. Labeled-property graph (LPG) - Mainly used for storing and querying data. Nodes and edges have internal structure.
Vertices
Nodes: ID + set of key-value pairs

Edges
Relationships: ID + Type + set of key-value pairs

  2. Resource Description Framework (RDF) - Used for data exchange. Nodes and edges have no internal structure.

Vertices
Resources: URIs
Attribute Values: Literal Values

Edges
Relationships: URIs
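
The two models can be put side by side in a short Python sketch that records the same facts both ways. It is an illustration only: the `start` and `end` fields on `Relationship`, the `http://example.org/` namespace, and all names and values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


# --- Labeled-property graph: nodes and edges carry key-value pairs ---
@dataclass
class Node:
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    id: int
    type: str
    start: int   # id of the source node (assumed field)
    end: int     # id of the target node (assumed field)
    properties: Dict[str, Any] = field(default_factory=dict)


alice = Node(1, {"name": "Alice"})
acme = Node(2, {"name": "Acme"})
works_for = Relationship(10, "WORKS_FOR", alice.id, acme.id, {"since": 2019})


# --- RDF: vertices are URIs or literals, edges are URIs, nothing is nested ---
@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: Any


EX = "http://example.org/"   # hypothetical namespace
rdf_triples = [
    (URI(EX + "alice"), URI(EX + "name"), Literal("Alice")),
    (URI(EX + "acme"), URI(EX + "name"), Literal("Acme")),
    (URI(EX + "alice"), URI(EX + "worksFor"), URI(EX + "acme")),
]

print(works_for)
print(len(rdf_triples), "triples")
```

In the LPG half, the name lives inside the node, whereas in the RDF half it becomes a separate statement pointing at a literal. The `since` property on the relationship has no counterpart in these three triples, because a plain RDF edge is only a URI and cannot carry attributes of its own.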

In RDF, resources (vertices/nodes) and relationships (edges) are identified by URIs. A URI is a unique identifier, which means that nodes and edges have no internal structure; they are purely unique labels. This is the main difference between RDF and labeled-property graphs, where nodes and edges have an internal structure: a set of key-value pairs that describe them. In the RDF model, a vertex can be one of two things: a resource identified by a URI, or a literal value. RDF is built on the notion of a triple, a statement composed of three elements that represents two vertices connected by an edge. This is called subject-predicate-object (SPO): the subject is a resource, i.e. a node in the graph; the predicate represents an edge, i.e. a relationship; and the object is another node or vertex. RDF data can be serialised in one of several syntaxes, such as Turtle, N-Triples, RDF/XML or JSON-LD, and queried with SPARQL. Similarly, LPG data can be represented using Cypher query language syntax.
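
As an illustration of subject-predicate-object statements and their serialisation, the following Python sketch writes two triples in N-Triples, the simplest line-based RDF syntax. It is a minimal sketch rather than a full serialiser: it handles only URIs and plain string literals, and the `http://example.org/` namespace and helper names are hypothetical.

```python
from typing import NamedTuple, Union


class URI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str


class Triple(NamedTuple):
    subject: URI                   # a resource (node)
    predicate: URI                 # a relationship (edge)
    object: Union[URI, Literal]    # another resource, or a literal value


def term_to_ntriples(term):
    """Render one term: URIs in angle brackets, literals in double quotes."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    escaped = (
        term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise triples as one 'subject predicate object .' line each."""
    return "\n".join(
        " ".join(term_to_ntriples(term) for term in triple) + " ."
        for triple in triples
    )


EX = "http://example.org/"   # hypothetical namespace
triples = [
    Triple(URI(EX + "alice"), URI(EX + "worksFor"), URI(EX + "acme")),
    Triple(URI(EX + "alice"), URI(EX + "name"), Literal("Alice")),
]

serialised = to_ntriples(triples)
print(serialised)
```

The first statement connects two resources, while the second attaches a literal value to a resource, which covers the two kinds of object a triple can have.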

Neo4j and Stardog are natural candidates for comparing the two graph models, LPG and RDF. Neo4j is a native graph database whose primary database model is a graph DBMS, whereas Stardog is an enterprise knowledge graph platform and database, offering high availability, performance and virtualisation, whose model is a graph DBMS or an RDF store. Neo4j is open source (in its Community Edition), whereas Stardog is commercially licensed. Neo4j is queried with Cypher, whereas Stardog is queried with SPARQL.

RDF stores are strongly index-based, while Neo4j is navigational. Neo4j implements index-free adjacency, which means that it stores the connections between connected nodes directly on disk, so a traversal follows stored links instead of consulting an index. Index-based storage is fine for queries that are not very deep, but it makes path analysis very difficult. RDF triple stores are not meant for operational and transactional use cases; they are best suited to mostly additive, typically slow-changing scenarios. Neo4j, on the other hand, is performant in highly dynamic scenarios and transactional cases where data integrity is key.
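
The difference in access pattern can be sketched in Python. This is a conceptual illustration only, not how Neo4j or any RDF store is implemented: the edge list, the `GraphNode` class and the helper functions are hypothetical, and because a Python dictionary lookup is itself very cheap, the sketch shows the shape of each approach rather than its real cost.

```python
from collections import defaultdict

EDGES = [("a", "b"), ("b", "c"), ("c", "d")]   # hypothetical directed edges


# Index-based: one shared index maps each source to its targets,
# and every hop of a traversal goes back to that index.
def build_index(edges):
    index = defaultdict(list)
    for source, target in edges:
        index[source].append(target)
    return index


def reachable_via_index(index, start, hops):
    frontier = {start}
    for _ in range(hops):
        frontier = {t for name in frontier for t in index.get(name, [])}
    return frontier


# Index-free adjacency: each node holds direct references to its neighbours,
# so a hop follows a stored link without consulting any shared index.
class GraphNode:
    def __init__(self, name):
        self.name = name
        self.neighbours = []


def build_nodes(edges):
    nodes = {}
    for source, target in edges:
        for name in (source, target):
            if name not in nodes:
                nodes[name] = GraphNode(name)
        nodes[source].neighbours.append(nodes[target])
    return nodes


def reachable_via_links(start_node, hops):
    frontier = [start_node]
    for _ in range(hops):
        frontier = [n for node in frontier for n in node.neighbours]
    return {node.name for node in frontier}


index = build_index(EDGES)
nodes = build_nodes(EDGES)
print(reachable_via_index(index, "a", 3))    # nodes three hops from "a"
print(reachable_via_links(nodes["a"], 3))    # same answer, found by following links
```

Both functions return the same nodes. The index-based version pays one lookup per node per hop, which is why deep path queries become expensive when each lookup has to go through a large on-disk index.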

Hence, inference-based semantics is data-driven, server-side logic that can typically be implemented as a set of rules. An ontology can drive this added intelligence, making the data smarter and more useful over time as it grows.
