Should we understand generative AI as something outside of the concept of data space?

The latest frameworks and architectures for harnessing the power of LLMs and generative AI, such as RAG and GRAG, could point the way toward solving the interoperability challenges of nascent data spaces.

Large Language Models (LLMs) and Generative AI are reshaping the business sector, potentially surpassing the pivotal transformations brought about by the mobile and internet revolutions. Are we certain that we cannot apply these concepts to resolve the interoperability issues within data spaces?

LLMs and Generative AI are technologies that can generate natural language responses based on a given input, such as a question, a prompt, or a context. They use deep learning models, such as transformers, that are trained on large amounts of text data. These models can learn the patterns and structures of natural language and can produce coherent and fluent texts that answer queries, generate summaries, write essays, create stories, and more. Because most knowledge tends to remain fairly consistent, such models are called foundation models: they represent a solid snapshot of knowledge at a given time that can be used for inference.

To address the current limitations of LLMs beyond fine-tuning, a fascinating approach within this domain is the powerful synergy achieved by combining LLMs with ontologies. This partnership empowers organizations to harness the capabilities of LLMs while operating within a controlled and structured environment. The collaboration establishes a feedback loop for continuous improvement, with ontologies providing context and validation for LLM responses.

This integration of LLMs and ontologies holds substantial promise for enhancing the efficiency and effectiveness of data spaces, potentially offering innovative solutions to the prevailing challenges in the field of semantic data interoperability.

These techniques are known as RAG (Retrieval-Augmented Generation) and GRAG (Graph Retrieval-Augmented Generation). GRAG is an extension of RAG, a pattern that combines pretrained LLMs with your own data to generate responses. RAG uses only text-based retrieval methods to augment the LLM with relevant documents from a large-scale text corpus. GRAG improves upon RAG by incorporating graph-based retrieval methods that can capture more structured and diverse knowledge from heterogeneous sources.

GRAG is a novel framework for generating natural language responses that are relevant and informative with respect to a given input. GRAG leverages both graph-based and text-based retrieval methods to augment a large language model (LLM) with external knowledge from heterogeneous sources. GRAG consists of three main components: a graph retriever, a text retriever, and a generator. The graph retriever queries a knowledge graph using the input and returns a set of relevant entities and relations. The text retriever then uses these entities and relations as queries to retrieve related documents from a text corpus. The generator (LLM) combines the input, the graph retrieval results, and the text retrieval results to produce a natural language response.
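The three GRAG components can be sketched with toy in-memory data. This is a minimal illustration, not a real GRAG library: `tiny_kg`, `corpus`, and all function names are hypothetical stand-ins, and `generate` merely assembles the prompt that a real system would send to an LLM.

```python
# Toy knowledge graph: (subject, relation, object) triples.
tiny_kg = [
    ("RAG", "augments", "LLM"),
    ("GRAG", "extends", "RAG"),
    ("GRAG", "uses", "knowledge graph"),
]

# Toy text corpus keyed by entity name.
corpus = {
    "RAG": "RAG retrieves relevant documents to ground LLM answers.",
    "GRAG": "GRAG adds graph-based retrieval on top of RAG.",
    "knowledge graph": "A knowledge graph stores entities and relations.",
}

def graph_retrieve(query):
    """Graph retriever: return triples whose subject or object appears in the query."""
    return [t for t in tiny_kg if t[0] in query or t[2] in query]

def text_retrieve(triples):
    """Text retriever: fetch corpus documents for every entity in the triples."""
    entities = {e for (s, _, o) in triples for e in (s, o)}
    return [corpus[e] for e in sorted(entities) if e in corpus]

def generate(query, triples, docs):
    """Generator stand-in: a real system would call an LLM with this combined prompt."""
    facts = "; ".join(f"{s} {r} {o}" for s, r, o in triples)
    return f"Q: {query}\nFacts: {facts}\nContext: {' '.join(docs)}"

triples = graph_retrieve("What is GRAG?")
prompt = generate("What is GRAG?", triples, text_retrieve(triples))
```

Even at this toy scale, the structure is visible: the graph supplies precise facts, the corpus supplies prose context, and the generator fuses both into one grounded prompt.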

With these techniques you can constrain natural language processing to your enterprise content, sourced from graphs and from vectorized documents, images, audio, and video. The graph is an instantiation of the ontology of your data. The vector index stores numerical representations of concepts (data) converted into number sequences, which enable LLMs to capture the relationships between those concepts.
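A vector index reduces to this idea: concepts near each other in meaning get embeddings near each other in space, and retrieval is a nearest-neighbour search. The sketch below uses hand-made three-dimensional vectors purely for illustration; real embeddings come from an embedding model and have hundreds of dimensions.

```python
import math

# Hand-made "embeddings" for illustration only; a real vector index
# would hold model-produced, high-dimensional vectors.
embeddings = {
    "invoice": [0.9, 0.1, 0.0],
    "receipt": [0.8, 0.2, 0.1],
    "giraffe": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def nearest(query_vec, index):
    """Return the concept whose embedding is most similar to the query embedding."""
    return max(index, key=lambda k: cosine(query_vec, index[k]))

# A query embedding close to "invoice" retrieves the related concept,
# not the semantically distant "giraffe".
best = nearest([0.85, 0.15, 0.05], embeddings)
```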

To apply RAG/GRAG, we need to keep in mind the following steps:

  • Data Vectorization: This process converts the data sources into numerical representations that can be stored in a vector database. This allows for fast and accurate retrieval of relevant data for specific queries.
  • Knowledge Graph Construction: This step creates a knowledge graph that represents and organizes the knowledge.
  • Knowledge Graph Querying: This stage uses the prompt to query the knowledge graph and return a set of relevant entities and relations that can provide context and information for the generation process.
  • Knowledge Graph Augmentation: This phase uses the entities and relations as queries to retrieve related documents from a text corpus using the vector embeddings, and then aligns them with the graph retrieval results. This can help to augment the input with additional knowledge from various sources.
  • Response Generation: This task uses an LLM to generate a natural language response based on the input, the knowledge graph retrieval results, and the text retrieval results.
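The five steps above can be wired together as a pipeline. This is a hedged end-to-end sketch: every function is a stub standing in for a real component (an embedding model, a graph database, an LLM), and all names are illustrative rather than a real library API.

```python
def vectorize(documents):
    """Step 1, data vectorization (stub: map each doc to a fake one-number embedding)."""
    return {doc: [float(len(doc))] for doc in documents}

def build_graph(facts):
    """Step 2, knowledge graph construction from (subject, relation, object) triples."""
    return list(facts)

def query_graph(graph, prompt):
    """Step 3, knowledge graph querying: return triples whose subject appears in the prompt."""
    return [t for t in graph if t[0] in prompt]

def augment(triples, index):
    """Step 4, augmentation: retrieve indexed documents mentioning the retrieved entities."""
    entities = {s for (s, _, _) in triples}
    return [doc for doc in index if any(e in doc for e in entities)]

def respond(prompt, triples, docs):
    """Step 5, response generation: a real system would send this context to an LLM."""
    return {"prompt": prompt, "facts": triples, "context": docs}

docs = ["data space connectors share data", "ontology defines shared meaning"]
graph = build_graph([("ontology", "enables", "interoperability")])
index = vectorize(docs)
triples = query_graph(graph, "Why use an ontology?")
answer = respond("Why use an ontology?", triples, augment(triples, index))
```

Swapping any stub for a production component (a vector database for `vectorize`, a graph store for `build_graph`, an LLM call in `respond`) preserves the same five-stage shape.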

Perhaps we could employ this vision to address the interoperability challenges we encounter in data spaces, especially those associated with semantics.

  • Is it possible for an organization to provide an entry point, perhaps located in the data space connector, where one can query not only its data but also its knowledge?
  • Can we envision a network of smart agents that automatically collect and consolidate the knowledge that someone needs to answer a business question?
  • Is it feasible to encourage the development of data intermediaries that can store and merge multiple vector stores?

I believe it's an opportunity to consider the concepts of LLMs, ontologies, RAG, and GRAG not merely as foreign constructs, but as practical tools that can empower data spaces. These technologies have the potential to address real-world challenges and pave the way for more efficient and effective data interoperability. They help enhance data quality, extend data space capabilities, and enable data spaces to respond more effectively and intelligently to user queries. By leveraging these technologies, data spaces can navigate the complex landscape of data and emerge as valuable assets in our data-driven world.
