
Ontologies in Knowledge Graphs Make Data Smarter

Karuna Puri

Data Engineer | AWS certified | Blogger

Published: June 3, 2020

AI is an umbrella term for disciplines such as Machine Learning, NLP, Speech and Vision, Knowledge Representation, Robotics, and Problem Solving. AI is sometimes treated as a synonym for ML, but that is not the case. ML can produce models that give outstanding results on many problems, yet those models often behave like a black box, and this lack of interpretability is one of the biggest problems for the ML/DL disciplines. It can be addressed to a great extent by another discipline of AI: Knowledge Representation and Reasoning (KR).

The main rationale behind KR is to make your data smarter, so that it can be reused instead of being replicated everywhere, which reduces the overall application logic by moving it out of the application and closer to the data. Smarter data is data that is well structured and has well-defined semantics that other applications can use.

An ontology is a model of a particular domain. It has three main characteristics:

Structure, i.e. it is machine readable.
Explicit description of the domain, i.e. an enumeration of the entities that belong to the domain and the relationships between them, which together form a graph-like representation.
Shared (or agreed) vocabulary, i.e. one that can be shared by a domain community.

Typical uses of an ontology include:

1. Inference - deriving new knowledge or facts from existing data fragments.

2. Interoperability - with a shared vocabulary, the exposed data can be used by more people, who can in turn contribute more to the application.
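
To make the inference use case concrete, here is a minimal sketch in plain Python. It assumes two RDFS-style rules (subclass transitivity and type inheritance); the vocabulary (`Dog`, `Mammal`, `rex`) and the `infer` helper are made up for illustration and do not come from any particular reasoner.

```python
# Hypothetical vocabulary, chosen only for illustration.
SUBCLASS_OF = "subClassOf"
TYPE = "type"


def infer(triples):
    """Forward-chain two RDFS-style rules until no new facts appear.

    Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type B),       (B subClassOf C) => (x type C)
    """
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p not in (SUBCLASS_OF, TYPE):
                continue
            for s2, p2, o2 in facts:
                # Chain through any subclass statement that starts at `o`.
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, p, o2))
        if new <= facts:  # fixpoint reached, nothing new was derived
            return facts
        facts |= new


ontology = {
    ("Dog", SUBCLASS_OF, "Mammal"),
    ("Mammal", SUBCLASS_OF, "Animal"),
}
data = {("rex", TYPE, "Dog")}

stated = ontology | data
inferred = infer(stated) - stated
for fact in sorted(inferred):
    print(fact)
```

Only one fact about `rex` was stated, yet the rules derive that `rex` is also a `Mammal` and an `Animal`. A production reasoner would support many more rules, but the principle is the same.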

When this ontology (our data model) is applied to a set of individual data points, it creates a knowledge graph. In other words:

              Ontology + Data = Knowledge Graph
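
The equation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any specific product works: the `Ontology` and `KnowledgeGraph` classes, and the `Person`/`Company`/`worksFor` vocabulary, are hypothetical. The sketch also treats each relation's domain and range as constraints to validate against, which is a simplification (RDFS, for instance, uses them for inference instead).

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """The domain model: which classes exist and how they may relate."""
    classes: set
    relations: dict  # relation name -> (domain class, range class)


@dataclass
class KnowledgeGraph:
    """Ontology plus the individual data points that conform to it."""
    ontology: Ontology
    entity_types: dict = field(default_factory=dict)
    facts: set = field(default_factory=set)

    def add_entity(self, name, cls):
        if cls not in self.ontology.classes:
            raise ValueError(f"unknown class: {cls}")
        self.entity_types[name] = cls

    def add_fact(self, subject, relation, obj):
        if relation not in self.ontology.relations:
            raise ValueError(f"unknown relation: {relation}")
        domain, range_ = self.ontology.relations[relation]
        if self.entity_types.get(subject) != domain:
            raise ValueError(f"{subject} is not a {domain}")
        if self.entity_types.get(obj) != range_:
            raise ValueError(f"{obj} is not a {range_}")
        self.facts.add((subject, relation, obj))


ontology = Ontology(
    classes={"Person", "Company"},
    relations={"worksFor": ("Person", "Company")},
)
kg = KnowledgeGraph(ontology)
kg.add_entity("alice", "Person")
kg.add_entity("acme", "Company")
kg.add_fact("alice", "worksFor", "acme")

# A fact that contradicts the ontology is rejected.
try:
    kg.add_fact("acme", "worksFor", "alice")
    rejected = False
except ValueError:
    rejected = True
print(kg.facts, rejected)
```

The point of the design is that the rule "only a person can work for a company" lives in the ontology, not in application code, so every application that loads this graph gets it for free.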

Ontologies raise the overall quality of data. They work like a brain that reasons with concepts and relationships, much as the human brain perceives interlinked relationships. This enables smart reasoning over the data, and the ontology structure makes the data easy to navigate.

A graph database uses graph structures with nodes (vertices), edges (relationships), and properties to represent, store, and query data. It is a type of NoSQL database that addresses some of the limitations of relational databases. The underlying storage mechanism varies from one graph database to another: some use tables, others use a key-value store, and some use a document-oriented database. Different query languages are also available, such as SPARQL (used by Stardog), Cypher (used by Neo4j), and Gremlin.

A graph database uses one of these two models:

1. Labeled property graph (LPG) - Mainly used for storing and querying data. Nodes and edges have internal structure.

Vertices
Nodes: ID + set of key-value pairs

Edges
Relationships: ID + Type + set of key-value pairs

2. Resource Description Framework (RDF) - Used for data exchange. Nodes and edges have no internal structure.

Vertices
Resources: URIs
Attribute Values: Literal Values

Edges
Relationships: URIs
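
The two models listed above can be put side by side in a short Python sketch. The `Node` and `Edge` classes follow the LPG description (an ID plus key-value pairs, and a type for edges); the `source`/`target` fields, the `http://example.org/` namespace, and the `to_triples` helper are assumptions added for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """LPG vertex: an ID plus a set of key-value pairs."""
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """LPG relationship: an ID, a type, and its own key-value pairs."""
    id: int
    type: str
    source: int  # ID of the start node (assumed field)
    target: int  # ID of the end node (assumed field)
    properties: dict = field(default_factory=dict)


BASE = "http://example.org/"  # hypothetical namespace


def to_triples(nodes, edges):
    """Flatten an LPG into RDF-style (subject, predicate, object) tuples.

    Node properties become triples whose object is a literal value.
    Edge properties are dropped: a plain triple has nowhere to put them.
    """
    triples = []
    for n in nodes:
        for key, value in n.properties.items():
            triples.append((f"{BASE}node/{n.id}", f"{BASE}{key}", value))
    for e in edges:
        triples.append(
            (f"{BASE}node/{e.source}", f"{BASE}{e.type}", f"{BASE}node/{e.target}")
        )
    return triples


alice = Node(1, {"name": "Alice"})
acme = Node(2, {"name": "Acme"})
works_for = Edge(10, "WORKS_FOR", source=1, target=2, properties={"since": 2018})

triples = to_triples([alice, acme], [works_for])
for t in triples:
    print(t)
```

Notice that the `since` property on the edge does not survive the conversion. That is the structural difference in practice: an LPG edge can carry its own data, while an RDF edge is only a URI.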

In RDF, resources (vertices/nodes) and relationships (edges) are each identified by a URI. A URI is a unique identifier, which means nodes and edges have no internal structure; they are purely unique labels. This is the main difference between RDF and labeled property graphs: in an LPG, nodes and edges have an internal structure, i.e. a set of key-value pairs that describe them. In the RDF model, a vertex can be one of two things: a resource identified by a URI, or a literal value. RDF is built on the notion of a triple, a statement composed of three elements that represents two vertices connected by an edge. This is called subject-predicate-object (SPO): the subject is a resource (a node in the graph), the predicate represents an edge (a relationship), and the object is another node or vertex. RDF data can be serialised in one of several syntaxes, such as Turtle, N-Triples, or RDF/XML, and queried with SPARQL. Similarly, LPG data can be represented using Cypher query language syntax.
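
The subject-predicate-object idea is easy to try out in plain Python. The sketch below assumes a tiny in-memory triple list; the `Triple`, `Literal`, `match`, and `to_ntriples` names and the `http://example.org/` URIs are hypothetical, and the serialiser covers only plain string literals in the line-based N-Triples syntax (one RDF serialisation among several).

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """An attribute value, as opposed to a resource URI."""
    value: str


class Triple(NamedTuple):
    subject: str               # resource URI
    predicate: str             # relationship URI
    obj: Union[str, Literal]   # another resource URI, or a literal


EX = "http://example.org/"  # hypothetical namespace

graph = [
    Triple(EX + "alice", EX + "worksFor", EX + "acme"),
    Triple(EX + "alice", EX + "name", Literal("Alice")),
    Triple(EX + "bob", EX + "worksFor", EX + "acme"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


def to_ntriples(t):
    """Serialise one triple as an N-Triples line (plain string literals only)."""
    if isinstance(t.obj, Literal):
        escaped = t.obj.value.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Who works for acme?
employees = [t.subject for t in match(graph, p=EX + "worksFor", o=EX + "acme")]
print(employees)
print(to_ntriples(graph[1]))
```

The pattern with wildcards is the same idea a SPARQL query is built on: state the shape of the triples you want and let the store find the bindings.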

Neo4j and Stardog are natural candidates for comparing the two graph models, LPG and RDF. Neo4j is a native, open-source graph database, whereas Stardog is a commercially licensed enterprise knowledge graph platform and database offering high availability, performance, and virtualisation. Neo4j's primary database model is a graph DBMS, whereas Stardog is classified as both a graph DBMS and an RDF store. Neo4j is queried with Cypher, whereas Stardog is queried with SPARQL.

RDF stores are strongly index-based, while Neo4j is navigational: it implements index-free adjacency, which means it stores the connections between connected nodes directly on disk. Index-based storage is fine for queries that are not very deep, but it makes path analysis difficult. RDF triple stores are not meant for operational and transactional use cases; they are better suited to mostly additive, typically slow-changing scenarios. Neo4j, on the other hand, performs well in highly dynamic, transactional scenarios where data integrity is key.
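
The difference between the two access styles can be illustrated with an in-memory analogy in Python. This is only a sketch: the `AdjacencyNode` and `TripleIndex` classes are hypothetical, and real engines work with on-disk records and far more elaborate indexes.

```python
from collections import defaultdict


class AdjacencyNode:
    """Navigational style: each node holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []


def reachable_by_pointer(start, depth):
    """Follow references hop by hop; no lookup structure is consulted."""
    frontier = {start}
    for _ in range(depth):
        frontier = {n for node in frontier for n in node.neighbours}
    return {n.name for n in frontier}


class TripleIndex:
    """Index-based style: every hop is a lookup in a subject index."""

    def __init__(self, triples):
        self.by_subject = defaultdict(list)
        for s, p, o in triples:
            self.by_subject[s].append((p, o))

    def reachable(self, start, depth):
        frontier = {start}
        for _ in range(depth):
            frontier = {
                o for s in frontier for _p, o in self.by_subject.get(s, [])
            }
        return frontier


# The same three-node chain a -> b -> c in both representations.
a, b, c = AdjacencyNode("a"), AdjacencyNode("b"), AdjacencyNode("c")
a.neighbours.append(b)
b.neighbours.append(c)

index = TripleIndex([("a", "knows", "b"), ("b", "knows", "c")])

two_hops_pointer = reachable_by_pointer(a, 2)
two_hops_index = index.reachable("a", 2)
print(two_hops_pointer, two_hops_index)
```

Both approaches return the same answer. What differs is the work per hop: the first follows a stored reference, the second performs an index lookup, and that per-hop cost is what adds up in deep path queries.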

Inference-based semantics is thus data-driven, server-side logic, typically implemented as a set of rules. An ontology can drive this process of adding intelligence to data, which makes the data smarter and more useful over time as it grows.
