Building a Q&A on custom docs using LangChain
Manisha Arora
Data Science, Google Ads | Data Science Coach | Helping Data Scientists Level Up in their Careers
Building Custom Q&A Model with LangChain
Welcome to this edition of our newsletter, where we'll explore the world of large language models (LLMs) and how they can be used to read custom documents.
About me:
Manisha Arora is a Data Scientist at Google with 10 years' experience in driving business impact through data-driven decision making. She is currently leading ad measurement and experimentation for Ads across Search, YouTube, Shopping, and Display. She works with top Google advertisers to support their marketing and business objectives through data insights, machine learning, and experimentation solutions.
Overview of Language Models:
Language models have revolutionized the field of natural language processing, enabling machines to understand and generate human-like language. Most modern language models are built on the transformer architecture and trained on massive amounts of text data.
If you are new to language models, check out this newsletter, which covers the evolution of Large Language Models and gives an overview of how they work.
In practice, large language models can be fine-tuned on specific tasks by adding layers on top of the pre-trained model and training on a smaller set of task-specific data. This allows the model to adapt to the nuances of the task and perform well on a wide range of natural language processing tasks. We can also have these models read custom text and use it to generate responses.
LangChain is a framework for building applications on top of large language models. It can be used to build custom Q&A systems that answer questions posed in natural language. In this article, we will explore how to build a custom Q&A model using LangChain.
Code Deep-dive:
This is a first attempt at building custom code with LangChain and OpenAI. The code is divided into two parts:
Part 1 - Getting comfortable with sequential prompts using LangChain
Part 2 - Building out a custom Q/A model using a file stored on personal drive
So let's get started...
Part 1 - Getting comfortable with Sequential Prompts through LangChain
Now that we have loaded the dependencies, let's get comfortable with some sequential prompts.
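To make the idea of a prompt chain concrete, here is a minimal plain-Python sketch. It does not use LangChain's API: `fake_llm` and `PromptChain` are hypothetical names invented for illustration, and `fake_llm` simply echoes the prompt instead of calling OpenAI. LangChain's own abstractions (for example `PromptTemplate`) play a similar role but are richer.

```python
from string import Template


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; it only echoes the prompt."""
    return f"[model reply to: {prompt}]"


class PromptChain:
    """Illustrative pairing of a prompt template with a model callable."""

    def __init__(self, template: str, llm=fake_llm):
        self.template = Template(template)
        self.llm = llm

    def run(self, **variables) -> str:
        # Fill the template placeholders, then send the finished prompt to the model.
        prompt = self.template.substitute(**variables)
        return self.llm(prompt)


name_chain = PromptChain("Suggest a name for a company that makes $product.")
print(name_chain.run(product="reusable water bottles"))
```

Swapping `fake_llm` for a function that calls a real model would leave the rest of the sketch unchanged, which is the main appeal of separating the template from the model.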
Great! It's working. Let's provide some user inputs.
Now let's take some user inputs and combine the chains.
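Combining chains means feeding the output of one prompt into the next. The sketch below shows that flow in plain Python under the same assumptions as before: `fake_llm`, `make_step`, and `run_sequential` are hypothetical helpers, not LangChain functions. LangChain releases from around 2023 provided a `SimpleSequentialChain` class for this purpose.

```python
from typing import Callable, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; wraps the prompt in angle brackets."""
    return f"<{prompt}>"


def make_step(template: str, llm: Callable[[str], str] = fake_llm) -> Callable[[str], str]:
    """Return a step that fills `{input}` in the template and calls the model."""
    def step(text: str) -> str:
        return llm(template.format(input=text))
    return step


def run_sequential(steps: List[Callable[[str], str]], user_input: str) -> str:
    """Give the user's input to the first step, then pipe each output into the next."""
    result = user_input
    for step in steps:
        result = step(result)
    return result


steps = [
    make_step("Name a company that makes {input}."),
    make_step("Write a slogan for {input}."),
]
# In a notebook, the string below could come from input() instead.
print(run_sequential(steps, "hiking boots"))
```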
Awesome, we have this working on user inputs. As a next step, we will use LangChain to retrieve answers to user questions from a custom document.
For this example, I have used a PDF copy of 'The Mom Test', which is a great book if you want to get your business idea validated. The book is stored on my personal drive, so I will mount the drive into Colab and then build an embeddings store from the PDF for the model to draw on.
Once the PDF is read into Colab, let's extract its text and use a text splitter to break it into chunks. I will generate embeddings for these chunks, keep them in a store, and save that store as a pickle file to be used for custom Q&A.
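The split, embed, and save steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code from the notebook: `split_text`, `toy_embed`, `build_store`, `save_store`, and `load_store` are hypothetical helpers, the bag-of-words "embedding" is a toy substitute for a real embedding model, and the sample text is a placeholder rather than content from the PDF.

```python
import os
import pickle
import tempfile
from collections import Counter
from typing import Dict, List


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def toy_embed(chunk: str) -> Dict[str, int]:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return dict(Counter(chunk.lower().split()))


def build_store(text: str, chunk_size: int = 200, overlap: int = 50) -> List[dict]:
    """Pair every chunk with its vector so it can be searched later."""
    return [
        {"text": chunk, "vector": toy_embed(chunk)}
        for chunk in split_text(text, chunk_size, overlap)
    ]


def save_store(store: List[dict], path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(store, f)


def load_store(path: str) -> List[dict]:
    with open(path, "rb") as f:
        return pickle.load(f)


# Placeholder text standing in for the text extracted from the PDF.
sample_text = "placeholder sentence number one. " * 20
store = build_store(sample_text, chunk_size=100, overlap=20)
store_path = os.path.join(tempfile.mkdtemp(), "store.pkl")
save_store(store, store_path)
print(len(store), "chunks saved;", len(load_store(store_path)), "chunks reloaded")
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks. Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.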
Amazing! Our embeddings store is now ready. Let's use this to test out if we are able to extract the right information from the pdf.
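Answering a question from the store comes down to two steps: find the chunks most similar to the question, then hand them to the model as context. Below is a minimal sketch of that lookup, assuming a toy bag-of-words vector and cosine similarity in place of real embeddings. All function names and the sample chunks are made up for illustration and are not taken from the book or from LangChain.

```python
import math
from collections import Counter
from typing import Dict, List


def toy_embed(text: str) -> Dict[str, int]:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(token, 0) for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks whose vectors are closest to the question's vector."""
    q_vec = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble the prompt that would be sent to the language model."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up chunks used purely to exercise the lookup.
chunks = [
    "cats sleep most of the day",
    "dogs enjoy long walks outside",
    "python is a programming language",
]
question = "which animals enjoy walks"
print(build_prompt(question, top_k(question, chunks, k=1)))
```

A real embedding model captures meaning rather than exact word overlap, so it can match a question to a passage even when they share no words. The ranking and prompt-assembly steps stay the same.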
Good job! This is getting exciting. Let's test out with some more questions.
Yay! We were able to generate Q&A using custom documents :)
As a next step, this can be integrated into your application as a chatbot. Hope you had fun with it!
Check out the full code here.
If you are looking to hone your skills in Data Science, PrepVector offers a comprehensive course led by experienced professionals. You will gain skills in Product Sense, AB Testing, Machine Learning and more through a series of live coaching sessions, industry mentors, and personalized career coaching sessions. In addition, you will also compound your skills by learning with like-minded professionals and sharing your learnings with the larger community along the way.
The next cohort will kick off May 29, 2023. Book a free consultation to know more!
Check out our previous newsletters: