LangChain4j LLM framework with Oracle Database 23ai Vector Embedding Store - Fruit Search Java App


In part 8 of the Oracle Database 23ai series, we will see how to use the LangChain4j LLM framework with the Oracle Database 23ai vector embedding store to create a simple fruit search Java application. Fruit details are stored in an Oracle Database vector embedding store, users' search queries are looked up in that store, and the matching results (fruit details) are displayed based on natural language processing. To keep it simple, a single Java file is used, and the entire application is built from a basic Hello World Maven project. We will also check and monitor the Docker container logs related to this application.


Table of Contents

  1. About LangChain4j
  2. Oracle Embedding Store implementation into the LangChain4j open-source framework.
  3. Install Maven, Docker and check Docker Information
  4. Create a Hello World Application
  5. About Question & Answer app & Natural Language Processing
  6. Building a Question and Answer app to Query Oracle Database 23ai Vector Embedding Store
  7. Source code - asset links
  8. About Oracle Database 23ai
  9. Conclusion
  10. Other Articles in Oracle Database 23ai series


01. About LangChain4j

The goal of LangChain4j is to simplify integrating LLMs into Java applications.

Here's how:

Unified APIs: LLM providers (like OpenAI or Google Vertex AI) and embedding (vector) stores (such as Pinecone or Milvus) use proprietary APIs. LangChain4j offers a unified API, so you don't need to learn and implement a specific API for each of them. To experiment with different LLMs or embedding stores, you can switch between them without rewriting your code. LangChain4j currently supports 15+ popular LLM providers and 20+ embedding stores.
Comprehensive Toolbox: Since early 2023, the community has been building numerous LLM-powered applications, identifying common abstractions, patterns, and techniques. LangChain4j has refined these into a ready-to-use package. Its toolbox includes tools ranging from low-level prompt templating, chat memory management, and function calling to high-level patterns like AI Services and RAG. For each abstraction, LangChain4j provides an interface along with multiple ready-to-use implementations based on common techniques. Whether you're building a chatbot or developing a RAG application with a complete pipeline from data ingestion to retrieval, LangChain4j offers a wide variety of options.
Numerous Examples: These examples showcase how to begin creating various LLM-powered applications, providing inspiration and enabling you to start building quickly.
Comparison table of supported embedding stores (only part of the table is shown here)

Read more on the LangChain4j official website.
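The unified-API idea is essentially programming against an interface. LangChain4j's own abstraction for embedding providers is its EmbeddingModel interface; the minimal sketch below deliberately uses a hypothetical TextEmbedder interface with two toy implementations (not real embedding models) so that it runs with only the JDK. It shows that the calling code stays the same when the provider is swapped.

```java
import java.util.List;

// Hypothetical interface for illustration only; it is NOT LangChain4j's EmbeddingModel API.
interface TextEmbedder {
    float[] embed(String text);
}

// Toy provider #1: a single feature, the text length.
class LengthEmbedder implements TextEmbedder {
    @Override
    public float[] embed(String text) {
        return new float[] {text.length()};
    }
}

// Toy provider #2: two features, the number of letters and the number of spaces.
class LetterCountEmbedder implements TextEmbedder {
    @Override
    public float[] embed(String text) {
        float letters = 0;
        float spaces = 0;
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                letters++;
            } else if (c == ' ') {
                spaces++;
            }
        }
        return new float[] {letters, spaces};
    }
}

class UnifiedApiDemo {
    // Application code depends only on the interface, so swapping the
    // provider does not require touching this method.
    static int dimensions(TextEmbedder embedder, String text) {
        return embedder.embed(text).length;
    }

    public static void main(String[] args) {
        List<TextEmbedder> providers = List.of(new LengthEmbedder(), new LetterCountEmbedder());
        for (TextEmbedder provider : providers) {
            System.out.println(provider.getClass().getSimpleName()
                    + " -> " + dimensions(provider, "Tell me about Guava") + " dimension(s)");
        }
    }
}
```

In a real LangChain4j application the same role is played by swapping one EmbeddingModel implementation for another.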


02. Oracle Embedding Store implementation in the LangChain4j open-source framework

This module implements EmbeddingStore using Oracle Database.

Requirements


What is a POM?

A Project Object Model or POM is the fundamental unit of work in Maven. It is an XML file that contains information about the project and configuration details used by Maven to build the project. It contains default values for most projects. Examples of these are the build directory, which is target; the source directory, which is src/main/java; the test source directory, which is src/test/java; and so on. When executing a task or goal, Maven looks for the POM in the current directory. It reads the POM, gets the needed configuration information, then executes the goal.

Some of the configuration that can be specified in the POM are the project dependencies, the plugins or goals that can be executed, the build profiles, and so on. Other information such as the project version, description, developers, mailing lists and such can also be specified.


Installation: in the Maven pom.xml file, add the following dependency.

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-oracle</artifactId>
    <version>0.1.0</version>
</dependency>

The complete pom.xml file is available on GitHub:

GitHub Link


03. Download, Install Maven & Docker

Install Maven

Install Maven on your laptop or desktop or use a cloud computing instance such as OCI Cloud computing.

Download Link to Maven

On a Mac, you can install Maven using Homebrew, as shown below:

brew install maven        

Check Maven version

mvn --version

-- shows the following result
Apache Maven 3.9.9        

Install Docker

Install Docker for your operating system; installing Docker Desktop is optional.

On Mac

brew install --cask docker

Check Docker version

madhusudhanrao@MadhuMac ~ % docker --version
Docker version 27.2.0, build 3ab4256        

Check Docker Info

madhusudhanrao@MadhuMac ~ % docker info
Client:
 Version:    27.2.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.2-desktop.1
...
Server:
ERROR: Cannot connect to the Docker daemon at unix:///Users/madhusudhanrao/.docker/run/docker.sock. Is the docker daemon running?
errors pretty printing info        

If you are using Docker Desktop, simply start it; otherwise, start the Docker daemon from the command line.

Check for Docker information again

madhusudhanrao@MadhuMac temp % docker info         
Client:
 Version:    27.2.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.2-desktop.1
    Path:     /Users/madhusudhanrao/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.2-desktop.2
    Path:     /Users/madhusudhanrao/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.34
      ....
com.docker.desktop.address=unix:///Users/madhusudhanrao/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile        

List containers

madhusudhanrao@MadhuMac temp % docker container ls -a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES        

04. Create a folder and a basic Maven project

mvn archetype:generate -DgroupId=dev.langchain4j -DartifactId=QandA -DarchetypeArtifactId=maven-archetype-quickstart -DarchetypeVersion=1.5 -DinteractiveMode=false        

Open the newly created folder in VSCode or any IDE of your choice.

Click the Run button on the Java file in the src folder.

This just prints Hello World in the terminal. What's important is that Apache Maven has created the complete Java project structure and makes it easy to run the project, including all its dependencies.
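If you want to confirm from code that the archetype produced the expected structure, the minimal sketch below checks for the files the quickstart archetype normally generates for groupId dev.langchain4j and artifactId QandA. The file list is an assumption about the archetype's usual output, and ProjectLayoutCheck is a hypothetical helper, not part of the generated project.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ProjectLayoutCheck {
    // Files the quickstart archetype is assumed to generate for
    // -DgroupId=dev.langchain4j -DartifactId=QandA.
    static final List<String> EXPECTED = List.of(
            "pom.xml",
            "src/main/java/dev/langchain4j/App.java",
            "src/test/java/dev/langchain4j/AppTest.java");

    // Returns the expected files that do not exist under projectDir.
    static List<String> missingFiles(Path projectDir) {
        List<String> missing = new ArrayList<>();
        for (String relative : EXPECTED) {
            if (!Files.isRegularFile(projectDir.resolve(relative))) {
                missing.add(relative);
            }
        }
        return missing;
    }

    // Usage: java ProjectLayoutCheck [path-to-QandA]
    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : "QandA");
        List<String> missing = missingFiles(dir);
        System.out.println(missing.isEmpty()
                ? "Project layout looks complete"
                : "Missing: " + missing);
    }
}
```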


05. About Fruit Search Java app (or Q & A app) & Natural Language Processing

Use case / thought process: I would like to create a basic application that stores the details of each fruit as a vector embedding, one database table record per fruit. When a question about fruit is asked, we run a query against the vector embeddings, using natural language processing, to retrieve the most relevant records. Our input data looks like this:

  1. Guava is a common tropical fruit cultivated in many tropical and subtropical regions. The common guava Psidium guajava is a small tree in the Myrtle family.
  2. Some of the best Apple are from Kinnaur and Shimla in India and Aomori in Japan. Apple comes in various colours like red, green and yellow.
  3. Some of the best mangoes are from Sindhudurg, Raigad and Ratnagiri.
  4. A banana is an elongated, edible fruit – botanically a berry produced by several kinds of large herbaceous flowering plants in the genus Musa.

When the user asks questions like "Where can I find the best Apples?" or "Tell me about Guava", the app searches the stored vector embeddings for the best match.
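To make "best match" concrete, here is a minimal, self-contained sketch. It is a stand-in under stated assumptions, not the article's implementation: instead of a neural embedding model and Oracle's vector search, it uses a toy bag-of-words vector (word counts with a tiny stop-word list and crude plural folding) compared by cosine similarity. The names ToyFruitSearch, embed and bestMatch are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class ToyFruitSearch {
    // Tiny illustrative stop-word list: common words that should not drive the match.
    static final Set<String> STOP_WORDS = Set.of(
            "a", "about", "an", "and", "are", "by", "can", "from", "i", "in",
            "is", "me", "of", "some", "tell", "the", "what", "where");

    static final List<String> FRUITS = List.of(
            "Guava is a common tropical fruit cultivated in many tropical and subtropical regions. "
                    + "The common guava Psidium guajava is a small tree in the Myrtle family.",
            "Some of the best Apple are from Kinnaur and Shimla in India and Aomori in Japan. "
                    + "Apple comes in various colours like red, green and yellow.",
            "Some of the best mangoes are from Sindhudurg, Raigad and Ratnagiri.",
            "A banana is an elongated, edible fruit, botanically a berry produced by several kinds "
                    + "of large herbaceous flowering plants in the genus Musa.");

    // Toy "embedding": a sparse bag-of-words vector of term counts (NOT a neural embedding).
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            // Crude plural folding so that "apples" and "apple" map to the same term.
            if (token.length() > 3 && token.endsWith("s")) {
                token = token.substring(0, token.length() - 1);
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); 0 when either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    // Returns the stored text whose vector is most similar to the question's vector.
    static String bestMatch(List<String> documents, String question) {
        Map<String, Integer> q = embed(question);
        String best = null;
        double bestScore = -1;
        for (String doc : documents) {
            double score = cosine(embed(doc), q);
            if (score > bestScore) {
                bestScore = score;
                best = doc;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (String question : List.of("What is large herbaceous flowering plants",
                "Where can I find the best Apples?", "Tell me about Guava")) {
            System.out.println("Question: " + question);
            System.out.println("Answer:   " + bestMatch(FRUITS, question));
        }
    }
}
```

A real embedding model captures meaning rather than shared words, so it can also match paraphrases with no vocabulary in common; this toy version works here only because each question reuses words from its target record.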

Technically speaking, we need a database to store these vector records, so we will use the Oracle Database 23ai Docker image to insert the vector records and run queries against them. Our code should be able to fetch the Docker image, start a container, set up the database, and run the queries above, all from a single Java file that leverages LangChain4j with the Oracle Embedding Store.



06. Building a Question and Answer app to Query Oracle Database 23ai Vector Embedding Store

Since we have already created our basic Maven app in the previous step, let's update its pom.xml as shown.

(Please see the source code section below to download all the code in this article.)

Add the JDBC version to the properties.

VSCode will immediately prompt you to update/synchronize the Java classpath; you can choose Always or Yes.

Under dependencies, add the following.

Let's add a new Java class, OracleEmbeddingStoreExample, as shown below. This file should be in the src/main/java/dev/langchain4j directory, the same directory as the App.java file previously generated for the Hello World application.

Add Imports

Download and run the Oracle Database 23ai Docker image/container, or connect to an existing Oracle Database 23ai container.

Alternatively, you can provide a JDBC connection string if the Docker container has already been downloaded and is running.
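For an already running container, the connection string is an ordinary Oracle thin-driver URL. The minimal sketch below assumes the usual defaults of the gvenzl/oracle-free image (listener on port 1521, pluggable database FREEPDB1); OracleJdbcUrl, thinUrl and the ORACLE_USER/ORACLE_PASSWORD environment variables are hypothetical names, and the connection attempt needs the Oracle JDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class OracleJdbcUrl {
    // Oracle thin-driver EZConnect form: jdbc:oracle:thin:@host:port/serviceName
    static String thinUrl(String host, int port, String serviceName) {
        return "jdbc:oracle:thin:@" + host + ":" + port + "/" + serviceName;
    }

    public static void main(String[] args) throws SQLException {
        // Assumed defaults for a local gvenzl/oracle-free container.
        String url = thinUrl("localhost", 1521, "FREEPDB1");
        // Hypothetical environment variable names; use the credentials your container was started with.
        String user = System.getenv("ORACLE_USER");
        String password = System.getenv("ORACLE_PASSWORD");
        System.out.println("JDBC URL: " + url);
        if (user == null || password == null) {
            System.out.println("Set ORACLE_USER and ORACLE_PASSWORD to try a connection.");
            return;
        }
        // Requires the Oracle JDBC driver (ojdbc) on the classpath.
        try (Connection connection = DriverManager.getConnection(url, user, password)) {
            System.out.println("Connected to " + connection.getMetaData().getDatabaseProductVersion());
        }
    }
}
```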

Create a vector embedding store and a retriever, and name the table test_content_retriever.

Insert data into the database table and store it as vectors.

Query Data
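Conceptually, the retriever ranks the stored segments by similarity to the question and returns the best ones. The minimal sketch below shows only that selection step, assuming the similarity scores have already been computed (the database does that part in the real application). The class and the scores are hypothetical, although LangChain4j's content retriever exposes comparable maxResults and minScore settings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ToyRetriever {
    // A stored text segment together with its similarity score for the current question.
    static final class Match {
        final String text;
        final double score;

        Match(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    // Returns at most maxResults texts whose score is at least minScore, best first.
    static List<String> retrieve(List<Match> scored, int maxResults, double minScore) {
        List<Match> sorted = new ArrayList<>(scored);
        sorted.sort(Comparator.comparingDouble((Match m) -> m.score).reversed());
        List<String> results = new ArrayList<>();
        for (Match m : sorted) {
            // The list is sorted descending, so the first score below the threshold ends the scan.
            if (results.size() >= maxResults || m.score < minScore) {
                break;
            }
            results.add(m.text);
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical scores for the question "Where can I find the best Apples?"
        List<Match> scored = List.of(
                new Match("Apple record", 0.82),
                new Match("Mango record", 0.55),
                new Match("Guava record", 0.12));
        System.out.println(retrieve(scored, 1, 0.5)); // prints [Apple record]
    }
}
```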

Run the code by clicking the highlighted Run button.

Important: Ensure that Docker is installed and the Docker engine is running. Optionally, you can use Docker Desktop to check the status; please check Docker Desktop's terms and conditions and licensing requirements.

Alternatively, you can monitor the Docker run progress using the Docker command-line utilities. Running the same class from the command line looks like this:

# /usr/bin/env /Users/madhusudhanrao/.vscode/extensions/redhat.java-1.34.0-darwin-x64/jre/17.0.12-macosx-x86_64/bin/java @/var/folders/2h/23sml1d931q28lmpclvhwt7c0000gn/T/cp_2l27jzjbj6npykl5y4qypd2e4.argfile dev.langchain4j.OracleEmbeddingStoreExample

Since I have Docker Desktop installed, I can monitor the progress of the images being pulled and the containers being created and run, as shown below.

View the image pulled under the Images tab
View the Docker containers created and running

This creates two containers.

Running containers

Click the gvenzl/oracle-free:23.5-full container to view its complete logs.

Docker container log
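The container log is also the quickest way to tell when the database is usable. The sketch below assumes that the gvenzl/oracle-free image prints the line DATABASE IS READY TO USE! once startup completes, as the gvenzl images do; ContainerLogCheck is a hypothetical helper that runs docker logs through ProcessBuilder and looks for that marker.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.stream.Collectors;

class ContainerLogCheck {
    // Readiness line assumed to be printed by gvenzl/oracle-free images.
    static final String READY_MARKER = "DATABASE IS READY TO USE!";

    static boolean isReady(List<String> logLines) {
        return logLines.stream().anyMatch(line -> line.contains(READY_MARKER));
    }

    // Usage: java ContainerLogCheck <container-id-or-name>
    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.out.println("Usage: java ContainerLogCheck <container-id-or-name>");
            return;
        }
        // Equivalent to running "docker logs <container>" and reading its combined output.
        Process process = new ProcessBuilder("docker", "logs", args[0])
                .redirectErrorStream(true)
                .start();
        List<String> lines;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            lines = reader.lines().collect(Collectors.toList());
        }
        process.waitFor();
        System.out.println(isReady(lines) ? "Database is ready" : "Database is not ready yet");
    }
}
```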

Verify the results in the VSCode terminal, where we can see the vector embeddings of a given string.

View Vector Embeddings
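An embedding is simply a fixed-length array of floating-point numbers. As a minimal sketch, the hypothetical helper below prints an embedding's dimension count and formats it as the bracketed text form (for example [1.0,2.5]) that Oracle Database 23ai's TO_VECTOR function accepts. The sample values are made up; a real embedding model produces hundreds of dimensions.

```java
import java.util.StringJoiner;

class VectorLiteral {
    // Formats an embedding as "[v1,v2,...]", the text form accepted by TO_VECTOR in Oracle Database 23ai.
    static String toVectorLiteral(float[] embedding) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float value : embedding) {
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Made-up 4-dimensional values for illustration only.
        float[] embedding = {0.25f, -0.5f, 0.125f, 1.0f};
        System.out.println("dimensions = " + embedding.length);
        System.out.println(toVectorLiteral(embedding)); // prints [0.25,-0.5,0.125,1.0]
    }
}
```

Such a literal could, for example, be bound as the parameter in a query like SELECT ... ORDER BY VECTOR_DISTANCE(embedding, TO_VECTOR(?), COSINE) FETCH FIRST 1 ROWS ONLY; the embedding column name there is an assumption, not necessarily the schema LangChain4j creates.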

Verify the data insert log.

Data Insert Log
================================================
Inserting Record no. 1  TextSegment { text = "Guava is a common tropical fruit cultivated in many tropical and subtropical regions. The common guava Psidium guajava is a small tree in the myrtle family, native to Mexico, Central America, the Caribbean and northern South America" metadata = {} }
================================================
Inserting Record no. 2 TextSegment { text = "Some of the best Apple are from Kinnaur and Shimla in India and Aomori in Japan, Apple comes in various colors like red, green and yellow" metadata = {} }
================================================
Inserting Record no. 3  TextSegment { text = "Some of the best mangoes are from Sindhudurg, Raigad and Ratnagiri" metadata = {} }
================================================
Inserting Record no. 4  TextSegment { text = "A banana is an elongated, edible fruit botanically a berry produced by several kinds of large herbaceous flowering plants in the genus Musa. In some countries, cooking bananas are called plantains," metadata = {} }
=================================================        

Verify the output for each natural language question used to query the vector embedding store.

Query log (console output)
=== Querying Oracle Database 23ai Embeddings ===
Question: What is large herbaceous flowering plants
Answer: TextSegment { text = "A banana is an elongated, edible fruit botanically a berry produced by several kinds of large herbaceous flowering plants in the genus Musa. In some countries, cooking bananas are called plantains," metadata = {} }
==================================================
Question: Where can i find best Apples
Answer:  TextSegment { text = "Some of the best Apple are from Kinnaur and Shimla in India and Aomori in Japan, Apple comes in various colors like red, green and yellow" metadata = {} }
=================================================
Question: Tell me about Guava
Answer:    TextSegment { text = "Guava is a common tropical fruit cultivated in many tropical and subtropical regions. The common guava Psidium guajava is a small tree in the myrtle family, native to Mexico, Central America, the Caribbean and northern South America" metadata = {} }
==================================================        
From the output above, we can see that the question "Where can i find best Apples" is answered with the stored text "Some of the best Apple are from Kinnaur and Shimla in India and Aomori in Japan, Apple comes in various colors like red, green and yellow", which was retrieved from the Oracle Database 23ai vector store.

07. Source code - asset links

Download my version of pom.xml

Oracle AI Vector Search Integration with LangChain

Download Hello World Maven Java file

Download my version of Oracle Embedding Store Example Java file

Download Oracle version

LangChain4j Oracle Code Examples

Instructions on How to setup Apache Maven

Oracle Database Free Modules


08. About Oracle Database 23ai

Oracle Database 23ai, the latest release of Oracle’s converged database, is now generally available as a broad range of cloud services. This long-term support release includes Oracle AI Vector Search and more than 300 additional major features focused on simplifying the use of AI with data, accelerating app development, and running mission-critical workloads. The new AI Vector Search capabilities enable customers to securely combine search for documents, images, and other unstructured data with search on private business data, without moving or duplicating it. Oracle Database 23ai brings AI algorithms to where the data lives, instead of having to move the data to where the AI algorithm lives. This allows AI to run in real time in Oracle databases, and greatly improves the effectiveness, efficiency, and security of AI.

Read more.


09. Conclusion

In this article, we saw how to use LangChain4j along with the Oracle Database 23ai Docker image to create a simple fruit search application. Fruit details are stored in an Oracle Database vector embedding store, users' queries are looked up in the vector store, and the corresponding results, based on natural language processing, are displayed.


Thanks for reading, liking and sharing the article

Regards, Madhusudhan Rao


10. Other Articles in Oracle Database 23ai series


This is an interesting application of LLMs and Oracle Database 23ai. 23ai just came out and it's refreshing to see it used with Java.

Pablo Silberkasten

Senior Manager, Software Engineering at Oracle


Excellent article! Thanks! From your 'examples' link, here is the one to the Oracle DB https://github.com/langchain4j/langchain4j-examples/blob/main/oracle-example/src/main/java/OracleEmbeddingStoreExample.java
