ChatGPT with your own data
Have you ever considered building a chatbot like ChatGPT, but on your own data?
Let's use FlowiseAI to build a chatbot LLM (large language model) app.
FlowiseAI is an open-source platform, powered by the open-source framework LangChain, for developing LLM applications.
For this exercise, I have chosen MicroStrategy's (my favorite company) annual report for 2022 in PDF format.
To start, you will need two API keys: one for ChatGPT (OpenAI) and one for a vector database, for example Pinecone, although any other vector database will do (see the short description of vector databases and vector search below).
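FlowiseAI asks for both keys in its credential dialogs, but it helps to verify them up front. Below is a minimal sketch of such a check, assuming the official `openai` and `pinecone` Python client libraries and that the keys are stored in the OPENAI_API_KEY and PINECONE_API_KEY environment variables.

```python
import os

from openai import OpenAI        # pip install openai
from pinecone import Pinecone    # pip install pinecone

# FlowiseAI collects these keys in its credential dialogs; here they are read
# from environment variables purely to confirm that both keys are valid.
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
pinecone_client = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Quick sanity checks: list a few available models and the existing Pinecone indexes.
print([m.id for m in openai_client.models.list().data][:3])
print(pinecone_client.list_indexes().names())
```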
To install FlowiseAI, refer to its documentation pages.
After a successful installation of FlowiseAI, open http://localhost:3000 in your browser and you are ready to go.
To jumpstart your first LLM app, use one of the chatflow templates provided in the Marketplace (left-side menu).
To build the chatbot, I suggest using the “Conversational Retrieval QA Chain” template.
I replaced the text-file document loader with the PDF document loader component, because when I queried the chatbot built with the text document loader it could not return structured data, e.g., a table. A rough sketch of what this chatflow does under the hood follows below.
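Conceptually, the chatflow maps onto a LangChain pipeline: load the PDF, split it into chunks, embed the chunks into Pinecone, then wire a retriever and a chat model into a conversational retrieval chain. The sketch below is illustrative only; it assumes the Python LangChain packages (langchain, langchain-openai, langchain-community, langchain-pinecone, pypdf), the file name, index name, and question are placeholders, and exact import paths vary between LangChain versions.

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore
from langchain.chains import ConversationalRetrievalChain

# Expects OPENAI_API_KEY and PINECONE_API_KEY in the environment,
# plus an existing Pinecone index (hypothetically named "annual-report").

# 1. Load the annual report with a PDF document loader (not a plain-text loader).
pages = PyPDFLoader("microstrategy-annual-report-2022.pdf").load()

# 2. Split the pages into overlapping chunks so each embedding stays focused.
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(pages)

# 3. Embed the chunks and store them in the Pinecone index.
vectorstore = PineconeVectorStore.from_documents(chunks, OpenAIEmbeddings(), index_name="annual-report")

# 4. Combine the retriever and a chat model into a conversational retrieval QA chain.
chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    retriever=vectorstore.as_retriever(),
)

# 5. Ask a question; chat_history carries previous turns for follow-up questions.
result = chain.invoke({"question": "How many employees did the company have in 2022?", "chat_history": []})
print(result["answer"])
```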
When I used the PDF file, I could prompt the chatbot with the following and get the result in table format:
“Please summarize the employee headcount for each year, grouped by department.”
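Once the chatflow is saved, Flowise also exposes it over a REST prediction endpoint, so the same prompt can be sent programmatically. Here is a minimal sketch, assuming Flowise is running locally on port 3000 and using the Python `requests` library; the chatflow ID is a placeholder you copy from the Flowise UI.

```python
import requests

# The chatflow ID below is a placeholder; copy the real one from the Flowise UI.
API_URL = "http://localhost:3000/api/v1/prediction/<your-chatflow-id>"

def ask(question: str) -> dict:
    """Send one question to the Flowise chatflow and return its JSON response."""
    response = requests.post(API_URL, json={"question": question})
    response.raise_for_status()
    return response.json()

print(ask("Please summarize the employee headcount for each year, grouped by department."))
```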
Here are some prompts used for querying the chatbot:
********************************************************
For further reading and comprehensive information on LangChain, a framework for building applications with large language models (LLMs), you can explore the following resources:
1. LangChain Official Website: The main site for LangChain, offering a complete overview of the framework, including its capabilities and use-cases. It provides links to documentation, blogs, and various LangChain products like LangSmith, Retrieval, and Agents.
2. LangChain Blog: This blog offers insights and updates on LangChain, discussing various aspects of its implementation and use in different projects.
3. GitHub - LangChain: For those interested in the technical and developmental side of LangChain, the GitHub repository is an invaluable resource. It houses the framework's codebase, issue tracking, and collaborative tools for developers.
4. AWS - What is LangChain?: Amazon Web Services provides an explanation of LangChain, highlighting its purpose, components, and applications in the context of LLMs.
5. Nanonets - LangChain Guide & Tutorial: Nanonets offers a comprehensive guide and tutorial on LangChain, which can be particularly helpful for those looking to understand how to effectively use this framework for developing intelligent applications.
These resources collectively offer a broad and detailed view of LangChain, from its basic principles to more complex applications and community contributions. Whether you're a developer looking to implement LangChain in your projects, or simply interested in learning more about this framework, these sites provide valuable information and insights.
Description of vector databases and vector search:
A vector database is a type of database designed specifically to handle the vector embeddings typically used in machine learning and similar applications. These databases are optimized for storing and rapidly retrieving vectors, which represent complex data points like text, images, or sounds in a format that machines can understand. Vector databases facilitate efficient similarity searches, allowing quick and accurate retrieval of items based on their content rather than just metadata or keyword matches. They are crucial in powering applications that require fast and semantically accurate search capabilities, like recommendation systems or semantic text search.
Vector search refers to the method of searching through a database or collection of data by converting the query and the documents into vectors in a high-dimensional space. It calculates the similarity between the query vector and the document vectors, often using measures like cosine similarity. This approach is particularly effective for tasks like semantic search, where the goal is to find the most relevant items based on the meaning of the query, rather than exact keyword matches. It's widely used in information retrieval, natural language processing, and similar applications.
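To make the cosine-similarity idea concrete, here is a tiny, self-contained illustration in Python with NumPy. The three-dimensional "embeddings" are invented for the example; a real system would get vectors with hundreds or thousands of dimensions from an embedding model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, ~0.0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy document embeddings (invented for illustration only).
documents = {
    "employee headcount by department": np.array([0.9, 0.1, 0.3]),
    "quarterly revenue figures":        np.array([0.2, 0.8, 0.5]),
    "office locations worldwide":       np.array([0.1, 0.3, 0.9]),
}

# Toy embedding of the query "how many employees work in each department?"
query = np.array([0.85, 0.15, 0.35])

# Rank documents by similarity to the query, most similar first.
for name, vec in sorted(documents.items(), key=lambda kv: cosine_similarity(query, kv[1]), reverse=True):
    print(f"{name}: {cosine_similarity(query, vec):.3f}")
```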
WARNING:
It is not safe to post sensitive or confidential content on ChatGPT or any other public conversational AI. OpenAI, the organization behind ChatGPT, advises users against sharing any sensitive, confidential, or personally identifiable information while interacting with the model. The data shared can potentially be used for model training purposes and, while there are safeguards in place, absolute privacy and confidentiality cannot be guaranteed. Always refrain from sharing anything that you wouldn't want to be public or that could lead to harm if disclosed. If you have sensitive content to work with, consider using localized or private instances of AI models, and always consult with the respective privacy policies and terms of service.