Hybrid Graphs for Table-and-Text Based Question Answering Using LLMs
Florent LIU
Data Architect, Full Stack Data Engineer (Big Data), and Full Stack AI Developer.
Background and Motivation
In today’s data-rich environment, information is often scattered across structured (e.g., tables, databases) and unstructured (e.g., raw text) sources.
Answering questions that require reasoning across both types of data—referred to as Table-Text Question Answering (QA)—poses significant challenges.
Current methods often rely on fine-tuning models with high-quality, human-curated data, which is expensive and time-consuming to obtain.
Recent advancements in Large Language Models (LLMs) have shown promise in zero-shot QA tasks, but their application to multi-source Table-Text QA remains underexplored.
Objective
To introduce ODYSSEY, a novel Hybrid Graph-based approach for Table-Text QA that leverages LLMs without fine-tuning.
The goal is to efficiently answer hybrid questions by constructing a unified Hybrid Graph from both tabular and textual data, pruning irrelevant information, and providing the LLM with concise, relevant context.
Key Contributions
1. Hybrid Graph Construction:
- A unified graph is built by integrating structured (table) and unstructured (text) data. The graph captures relationships between entities in the table and linked documents.
- The graph is pruned based on the input question to filter out noise and retain only relevant information.
2. Zero-Shot QA Framework:
- The system operates in a zero-shot setting, meaning it does not require fine-tuning or labeled training data.
- It uses LLMs (GPT-3.5, GPT-4, LLaMA-3) to answer questions by leveraging the pruned Hybrid Graph.
3. Efficiency Improvements:
- The approach reduces token usage by up to 53% compared to providing the full context (table and text) to the LLM.
- It achieves state-of-the-art (SoTA) performance on the Hybrid-QA and OTT-QA datasets, improving Exact Match (EM) scores by 10% on Hybrid-QA and 5.4% on OTT-QA.
Methodology
The ODYSSEY framework consists of four main steps:
1. Question Analysis:
- The input question is analyzed to extract key entities and map them to relevant table headers. This step identifies the necessary information to answer the question.
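The header-mapping part of this step can be sketched with simple fuzzy string matching; the paper's actual semantic module delegates entity extraction and header mapping to an LLM, so the function and threshold below are illustrative only:

```python
from difflib import SequenceMatcher

def map_question_to_headers(question, headers, threshold=0.6):
    """Map question tokens to the most similar table headers.

    A minimal keyword-similarity sketch; the real system uses an LLM
    for entity extraction and header mapping. `threshold` is an
    illustrative cut-off, not a value from the paper.
    """
    tokens = [t.strip("?.,").lower() for t in question.split()]
    relevant = []
    for header in headers:
        h = header.lower()
        # keep a header if any question token is sufficiently similar to it
        if any(SequenceMatcher(None, t, h).ratio() >= threshold for t in tokens):
            relevant.append(header)
    return relevant
```

The retained headers then define which columns of the table are carried forward into the sub-table of the next step.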
2. Hybrid Graph Construction:
- A sub-table is retrieved based on the relevant headers, and an Entity-Document Graph is constructed by linking entities from the text to the table cells.
- The two components (sub-table and Entity-Document Graph) are integrated into a single Hybrid Graph.
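The integration in this step can be illustrated with a small adjacency-list sketch. The input shapes (`sub_table` as a list of row dictionaries, `entity_docs` as a title-to-text mapping) are assumptions for the example, not the paper's API:

```python
def build_hybrid_graph(sub_table, entity_docs):
    """Build an adjacency-list Hybrid Graph linking table cells to documents.

    Simplified sketch: nodes are cell values and document titles; an edge
    joins two cells in the same row (table structure) or a cell to a
    document whose text mentions it (entity linking).
    """
    graph = {}

    def add_edge(u, v):
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)

    for row in sub_table:
        cells = list(row.values())
        # connect cells within the same row (table structure)
        for i in range(len(cells)):
            for j in range(i + 1, len(cells)):
                add_edge(cells[i], cells[j])
        # connect cells to documents that mention them (entity linking)
        for cell in cells:
            for title, text in entity_docs.items():
                if cell.lower() in text.lower():
                    add_edge(cell, title)
    return graph
```

Real entity linking is fuzzier than the substring check used here, but the resulting structure is the same: a single graph over both the sub-table and the linked documents.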
3. Hybrid Graph Traversal:
- The graph is pruned using Breadth-First Search (BFS) to retain only the most relevant paths for answering the question.
- The pruned graph is stored in a hop-wise dictionary, where each hop represents a level of traversal (1-hop, 2-hop, 3-hop).
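A minimal sketch of the hop-wise BFS pruning, assuming the graph is an adjacency dictionary and `seeds` are the nodes matched from the question (both names are illustrative):

```python
from collections import deque

def hopwise_prune(graph, seeds, max_hops=3):
    """BFS from question-linked seed nodes, grouping reachable nodes by hop.

    Returns a hop-wise dictionary {1: [...], 2: [...], 3: [...]} so the
    reader LLM can be fed context one traversal level at a time.
    """
    hops = {h: [] for h in range(1, max_hops + 1)}
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # stop expanding beyond the deepest hop
        for neigh in graph.get(node, ()):
            if neigh not in visited:
                visited.add(neigh)
                hops[depth + 1].append(neigh)
                frontier.append((neigh, depth + 1))
    return hops
```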
4. Reader LLM:
- The pruned graph is passed to the LLM in a hop-wise manner. If the LLM cannot answer the question with the initial hop, additional hops are provided until the answer is found or the full context is used.
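The incremental reading strategy might look like the following sketch. `ask_llm` is a caller-supplied stand-in for the actual GPT/LLaMA call, and the "unknown" sentinel is an assumption for the example, not the paper's exact protocol:

```python
def hopwise_read(question, hops, ask_llm):
    """Query the reader LLM hop by hop, expanding context only on failure.

    `ask_llm(question, context) -> str` is a caller-supplied callable
    (e.g., wrapping a chat-completion API) that returns the model's
    answer, or the sentinel "unknown" when the context is insufficient.
    """
    context = []
    for hop in sorted(hops):
        context.extend(hops[hop])  # accumulate one more hop of context
        answer = ask_llm(question, context)
        if answer != "unknown":
            return answer, hop  # answered using `hop` levels of traversal
    return "unknown", max(hops) if hops else 0
```

Because most questions resolve within one or two hops (see the results below), this loop usually terminates early, which is where the token savings come from.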

Evaluation and Results
The system was evaluated on two challenging datasets:
1. Hybrid-QA: A multi-hop Table-Text QA dataset based on Wikipedia.
2. OTT-QA: An open-domain QA dataset requiring retrieval of both tables and text.
Key Findings:
- Performance: ODYSSEY outperformed all baselines, achieving 58.4% EM on Hybrid-QA and 62.02% EM on OTT-QA using GPT-4. It also showed strong performance with smaller models like LLaMA-3-8B.
- Token Efficiency: The approach reduced input token size by 45.5% on Hybrid-QA and 53% on OTT-QA, significantly lowering computational costs.
- Hop-wise Analysis: Nearly 90% of questions were answered using 1-hop or 2-hop connections, demonstrating the effectiveness of the Hybrid Graph in filtering irrelevant information.
Comparison with Baselines:
- ODYSSEY consistently outperformed baselines like Base w/ Table & Text and Base w/ Summarized Text across all metrics (EM, F1, Precision, Recall).
- It also outperformed fine-tuned models on OTT-QA and achieved comparable results on Hybrid-QA, despite operating in a zero-shot setting.
Ablation Studies
1. Hop-wise Retrieval: Passing all pruned information at once (instead of hop-wise) resulted in a slight performance drop and increased token usage.
2. Pruned Graph: Using the entire Hybrid Graph (without pruning) led to lower accuracy and higher token costs, highlighting the importance of pruning for efficiency.
Error Analysis
The system’s errors were categorized into:
1. Formatting Errors: Differences in answer formatting (e.g., "Regis Philbin" vs. "Regis Philbin,") affected EM scores.
2. Semantic Module Errors: Issues in entity matching, extraction, and header mapping.
3. LLM Errors: The LLM occasionally failed to provide correct answers despite having the necessary context.
4. Dataset Issues: Ambiguous questions or anomalies in the dataset.
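Formatting errors of the first kind are commonly reduced with SQuAD-style answer normalization before computing Exact Match; a sketch of such a normalizer (this is standard evaluation practice, not necessarily the paper's own scoring code):

```python
import re
import string

def normalize_answer(s):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred, gold):
    # "Regis Philbin," and "Regis Philbin" normalize to the same string
    return normalize_answer(pred) == normalize_answer(gold)
```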
Conclusion
ODYSSEY introduces a zero-shot, fine-tuning-free approach for Table-Text QA, leveraging a Hybrid Graph to efficiently navigate multi-hop reasoning across structured and unstructured data. The system achieves state-of-the-art performance while significantly reducing token usage, making it a scalable solution for real-world applications. Future work could explore extending the approach to multi-modal datasets (e.g., images, videos) and improving entity-matching capabilities.
Limitations
1. Processing Time: The system incurs slightly more processing time than zero-shot baselines due to additional LLM calls and graph traversal.
2. Dependence on LLM Capabilities: Performance is tied to the capabilities of the underlying LLM, which may evolve over time.
3. Scope: The current implementation is limited to Table-Text data, but the approach could be extended to other multi-modal datasets.
Key Takeaways
- Efficiency: ODYSSEY reduces token usage by up to 53%, making it cost-effective for large-scale QA tasks.
- Performance: It achieves SoTA results on Hybrid-QA and OTT-QA, outperforming fine-tuned models in some cases.
- Scalability: The zero-shot, fine-tuning-free approach makes it adaptable to various domains without the need for expensive labeled data.