My Experiment with Neo4j and the Power of Graph-Based RAG

Semantic search and knowledge graphs are the two predominant paradigms for working with knowledge sources.

Semantic search, as we know, leverages embeddings and semantic similarity to retrieve relevant results, while knowledge graphs structure data into nodes and edges to represent explicit relationships.

Semantic search is easier to implement but is limited when it comes to reasoning over the data. Knowledge graphs are way ahead here but are traditionally hard to build.

In this post, I’ll cover the challenges in building KGs and how tools like the Neo4j LLM Knowledge Graph Builder help reduce these pains.


This morning I ran an experiment comparing semantic-search-based RAG and KG-based RAG using the Apple Inc. 10-K report. (Yes, this is one of my favourite documents to use in experiments since I know it well.)

The goal was to analyse how both approaches perform in extracting insights from this multi-section document.

This document contains structured and unstructured data: supply chain details, financial metrics, risks, and corporate governance information.


Where the Knowledge Graph Took the Lead!

Semantic search leverages embeddings (OpenAI embeddings in my case) to retrieve relevant chunks of a doc based on cosine similarity with the user query.
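For reference, here’s a minimal sketch of that retrieval step, assuming the 10-K has already been split into text chunks. The model name, placeholder chunks, and helper functions are illustrative, not the exact code from my experiment:

```python
# Minimal embedding-based retrieval sketch (illustrative, not the exact
# experiment code). Assumes OPENAI_API_KEY is set in the environment.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Placeholder chunks; in practice these come from splitting the 10-K.
chunks = [
    "Apple depends on component suppliers concentrated in a few locations.",
    "Net sales by reportable segment: Americas, Europe, Greater China...",
]
chunk_vecs = embed(chunks)

def top_k(query: str, k: int = 2) -> list[str]:
    q = embed([query])[0]
    # Cosine similarity: normalised dot product between query and each chunk.
    sims = (chunk_vecs @ q) / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q)
    )
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

print(top_k("What supply chain risks does Apple face?"))
```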

A knowledge graph, on the other hand, structures the document into nodes (entities, e.g. Apple, suppliers, risks) and edges (relationships like "depends_on," "faces_risk_from"). This graph is then queried to provide precise, relationship-driven insights.
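For contrast, here’s a hedged sketch of how such nodes and edges could be written to Neo4j via the Python driver. The labels, relationship types, connection details, and the placeholder supplier are my assumptions for illustration, not the exact schema from the experiment:

```python
# Illustrative graph-construction sketch; labels, relationship types, and
# credentials are placeholder assumptions, not the experiment's schema.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # MERGE creates each node/edge only if it doesn't already exist.
    session.run(
        """
        MERGE (c:Company {name: $company})
        MERGE (s:Supplier {name: $supplier})
        MERGE (r:Risk {name: $risk})
        MERGE (c)-[:DEPENDS_ON]->(s)
        MERGE (c)-[:FACES_RISK_FROM]->(r)
        """,
        company="Apple",
        supplier="Example Supplier Co.",  # placeholder entity
        risk="Supply chain disruption",
    )
driver.close()
```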


Advantages of Knowledge Graphs in the Experiment:

  1. Multi-Hop Reasoning: The KG enabled tracing relationships across multiple nodes and connections. For example, a query about the supply chain risks Apple faces highlighted how dependencies on specific regions, such as manufacturing facilities in China, could impact production during regional COVID lockdowns. This layered reasoning surfaced insights that are difficult to uncover with semantic search.
  2. Explicit Relationships: Semantic search retrieved text snippets based on similarity, but the KG explicitly encoded relationships such as "currency fluctuations impact revenue streams." These explicit relationships made insights more explainable.
  3. Cross-Section Integration: The 10-K spreads related information across multiple sections, and KGs excel at integrating these data points into a unified structure. For example, financial risks mentioned in "Risk Factors" were directly linked to regional revenue breakdowns in the "Consolidated Financial Statements" section, ensuring no critical detail was overlooked and providing a holistic view of the document.
  4. Context Preservation: Semantic search struggled to maintain the context of a query, whereas the KG preserved the context of entities and their relationships, ensuring higher accuracy. For example, a query about Apple’s capital expenditures linked investment figures from the financial sections to the corresponding strategic initiatives.
  5. Scalability for Complex Queries: Complex queries, e.g. "Which regions contribute most significantly to Apple’s revenue, and how are they impacted by geopolitical risks?", are resolved efficiently with KGs. The graph’s structured nature enabled multi-hop traversal while retaining query efficiency (see the query sketch after this list).
  6. Data Normalization: The KG normalized and integrated data from structured tables, unstructured text, and hierarchical sections into a single coherent representation. For example, supplier information from the "Management’s Discussion and Analysis" section was aligned with corresponding risks and financial metrics.
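To make point 5 concrete, here’s what such a multi-hop query could look like through the Python driver. The labels and relationship types (Region, GENERATES_REVENUE_IN, EXPOSED_TO) are assumptions for illustration rather than the exact schema my graph ended up with:

```python
# Sketch of a multi-hop traversal: Company -> Region -> Risk.
# Schema names below are assumed for illustration.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (c:Company {name: 'Apple'})-[:GENERATES_REVENUE_IN]->(reg:Region)
MATCH (reg)-[:EXPOSED_TO]->(r:Risk)
RETURN reg.name AS region, collect(r.name) AS geopolitical_risks
ORDER BY region
"""

with driver.session() as session:
    for record in session.run(query):
        print(record["region"], "->", record["geopolitical_risks"])
driver.close()
```

A single Cypher pattern expresses both hops at once, where semantic search would have to stitch the answer together from separately retrieved chunks.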


The Challenge!

Creating a knowledge graph from unstructured documents is far more challenging than creating a vector database from embeddings.

  1. Entity and Relationship Extraction: Extracting entities like suppliers and risks would require fine-tuned Named Entity Recognition (NER) models and manual validation. For example, the term "supplier" appeared generically in multiple sections, requiring additional context to map it to specific entities (a minimal NER sketch follows this list).
  2. Ontology Design: Defining a schema to represent entities (e.g. Company/Risk/Region) and relationships (e.g. depends_on/operates_in) would involve iterative design. Balancing granularity (e.g. should "supply chain risk" be split into subtypes?) against usability could be very time-consuming.
  3. Data Integration Across Sections: The document contains diverse data formats: tabular (financials), textual (risk descriptions), and hierarchical (organizational structures). Aligning financial metrics (e.g. revenue by region) with textual mentions of risks would require custom preprocessing pipelines.
  4. Scalability: The initial graph would contain thousands of nodes and edges. Querying, visualizing, and validating subgraphs efficiently is tough.
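To illustrate point 1, here’s a minimal NER sketch using an off-the-shelf spaCy model; the sample sentence is paraphrased rather than quoted from the 10-K. A generic model tags organizations and places, but it can’t decide on its own which concrete entity a bare "supplier" mention refers to, which is exactly where fine-tuning and manual validation come in:

```python
# Minimal NER sketch with a generic spaCy model (illustrative only).
# Install the model first: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

text = (
    "The Company depends on component suppliers concentrated in a few "
    "locations, and manufacturing facilities in China were affected by "
    "regional COVID lockdowns."
)

# Print each detected entity with its type, e.g. China -> GPE (geopolitical
# entity). Note that "suppliers" is not resolved to any specific company.
for ent in nlp(text).ents:
    print(ent.text, ent.label_)
```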


How the Neo4j LLM Knowledge Graph Builder Comes to the Rescue!


To reduce the manual workload described above, I explored Neo4j’s LLM Knowledge Graph Builder. And it was just amazing:

  1. Entity Extraction: Automatically identified entities like companies, risks, and regions from the document.
  2. Ontology Suggestions: Provided a starting schema, reducing the need for manual ontology design.
  3. Direct Ingestion: Populated the graph in Neo4j, enabling immediate querying using Cypher.

While some manual adjustments were still necessary (e.g. fine-tuning entity disambiguation), the tool significantly accelerated the process.
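As a quick sanity check after ingestion, a sketch like the following can list whatever node labels and relationship types the builder actually produced; the connection details are placeholders:

```python
# Inspect the freshly populated graph; credentials are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Built-in procedures that enumerate the graph's labels and edge types.
    labels = session.run("CALL db.labels() YIELD label RETURN label").value()
    rel_types = session.run(
        "CALL db.relationshipTypes() YIELD relationshipType "
        "RETURN relationshipType"
    ).value()

print("Node labels:", labels)
print("Relationship types:", rel_types)
driver.close()
```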

Within minutes I could explore my document’s entities and relationships:


The level of relationships in a document is insane, and a bit "scary," when you explore it.

Conclusion: The Future of Graph-Based RAG

My experiment highlighted the clear advantages of graph-based RAG over semantic search in handling complex, interconnected data like the Apple 10-K. While the upfront effort of building a knowledge graph is higher, the value it delivers in terms of reasoning, explainability, and cross-domain integration is unparalleled.

As tools like Neo4j’s LLM Knowledge Graph Builder continue to mature, the barrier to entry for graph-based RAG will lower, making it an essential strategy for enterprises handling rich, relational data.


#GenerativeAI #KnowledgeGraphs #RAG #Neo4j #AI #GraphDatabases


Dilum Bandara

Principal Research Scientist

1 month

I agree the Neo4j KG Builder is a good starting point on the topic. The only caveat for a beginner is that it focuses heavily on parallelisation, which makes the code long and hard to follow. Anyway, my experience trying to extend it to a custom application has been painful, with its `main` branch not being stable, its use of many deprecated LangChain libraries, and the need to write your own code to automate entity disambiguation.

Reply
Siddhant Agarwal

DevRel Guy | Graph Enthusiast | Google Developer Expert AI/ML| Ex-Google, IBM

1 month

Looks exciting. Would you be interested in talking about this at our next meetup in Delhi? https://www.meetup.com/graph-database-delhi-ncr/
