Building a Q&A on custom docs using LangChain


Welcome to this edition of our newsletter, where we'll explore the world of large language models (LLMs) and how they can be used to read custom documents.


About me:

Manisha Arora is a Data Scientist at Google with 10 years' experience in driving business impact through data-driven decision making. She currently leads ad measurement and experimentation for Ads across Search, YouTube, Shopping, and Display, and works with top Google advertisers to support their marketing and business objectives through data insights, machine learning, and experimentation solutions.


Overview of Language Models:

Language models have revolutionized the field of natural language processing, enabling machines to understand and generate human-like text. Most modern large language models are built on the transformer architecture and trained on massive amounts of text data.

If you are new to language models, check out this newsletter, which covers the evolution of large language models and provides an overview of how they work.

In practice, large language models can be fine-tuned for specific tasks by adding additional layers on top of the pre-trained model and training on a smaller set of task-specific data. This allows the model to adapt to the nuances of the task and achieve state-of-the-art performance across a wide range of natural language processing tasks. We can also have these models read custom text and use it as context when generating responses.

LangChain is a framework for building applications powered by language models: it provides building blocks for chaining prompts, connecting models to external data sources, and retrieving relevant context, which makes it well suited to custom Q&A models that answer questions posed in natural language. In this article, we will explore how to build a custom Q&A model using LangChain.


Code Deep-dive:

This code is a first attempt at building a custom Q&A pipeline using LangChain and OpenAI. The code is divided into two parts:

Part 1 - Getting comfortable with sequential prompts using LangChain

Part 2 - Building out a custom Q&A model using a file stored on a personal drive

So let's get started...


Part 1 - Getting comfortable with Sequential Prompts through LangChain

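The notebook screenshots do not survive in text form, so here is a minimal sketch of the dependency setup. The package install line, model choice, and API-key handling are my assumptions, following the 2023-era LangChain API rather than the exact original code:

```python
# Assumed install step: pip install langchain openai
import os

from langchain.llms import OpenAI

# An OpenAI API key is required; replace the placeholder with your own
# (assumption: the original notebook set it via an environment variable).
os.environ["OPENAI_API_KEY"] = "sk-..."

# A completion-style LLM with some creativity allowed
llm = OpenAI(temperature=0.7)
```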

Now that we have loaded the dependencies, let's get comfortable with some sequential prompts.

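A sequential prompt could be sketched as below with the 2023-era LangChain API; the two example prompts (company name, then slogan) are illustrative assumptions, not the original notebook's prompts:

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = OpenAI(temperature=0.7)

# Chain 1: suggest a company name for a given product
name_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["product"],
        template="Suggest one name for a company that makes {product}.",
    ),
)

# Chain 2: write a slogan for the name produced by chain 1
slogan_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["company_name"],
        template="Write a short slogan for a company called {company_name}.",
    ),
)

# SimpleSequentialChain pipes each chain's single output into the next
overall = SimpleSequentialChain(chains=[name_chain, slogan_chain], verbose=True)
print(overall.run("eco-friendly water bottles"))
```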

Great! It's working. Let's provide some user inputs.

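Wiring a user-supplied value into a prompt template might look like this sketch (the `topic` prompt is my example, not the original):

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0.7)
chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["topic"],
        template="Explain {topic} in two sentences.",
    ),
)

# Take the value from the user instead of hard-coding it
topic = input("Enter a topic: ")
print(chain.run(topic))
```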

Now let's take some user inputs and combine the chains.

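Combining user inputs with multiple chains can be sketched with a `SequentialChain`, which passes named variables between chains; the blog-title/outline prompts and variable names below are assumptions:

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SequentialChain

llm = OpenAI(temperature=0.7)

# First chain: turn the user inputs into a title
title_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["topic", "audience"],
        template="Write a blog-post title about {topic} for {audience}.",
    ),
    output_key="title",
)

# Second chain: expand the generated title into an outline
outline_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["title"],
        template="Write a three-point outline for a post titled: {title}",
    ),
    output_key="outline",
)

# SequentialChain routes the named inputs and outputs between chains
overall = SequentialChain(
    chains=[title_chain, outline_chain],
    input_variables=["topic", "audience"],
    output_variables=["title", "outline"],
)

result = overall({"topic": input("Topic? "), "audience": input("Audience? ")})
print(result["outline"])
```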

Awesome, we have this working on user inputs. As a next step, we will use LangChain on a custom document to retrieve answers to user questions.

For this example, I have used a PDF copy of 'The Mom Test', which is a great book if you want to get your business idea validated. The book is stored on my personal drive, so I will mount the drive in Colab and then build an embeddings store from the PDF for the model to query.

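Mounting the drive and extracting the PDF text could look like the following (Colab-only; the file path is a hypothetical placeholder, and I am assuming `PyPDF2` for extraction):

```python
# Colab-specific: mount Google Drive so the PDF is accessible
from google.colab import drive
drive.mount("/content/drive")

from PyPDF2 import PdfReader

# Hypothetical path -- point this at your own copy of the PDF
reader = PdfReader("/content/drive/MyDrive/the-mom-test.pdf")

# Concatenate the text of every page
raw_text = ""
for page in reader.pages:
    page_text = page.extract_text()
    if page_text:
        raw_text += page_text
```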

Once we have the PDF read into Colab, let's use a text splitter to break its text into chunks. I will turn these chunks into embeddings and save the store as a pickle file to be used for custom Q&A.

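One way to sketch this step with the 2023-era LangChain API, assuming FAISS as the vector store and `raw_text` holding the extracted PDF text; the chunk sizes are illustrative, not the original values:

```python
import pickle

from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

raw_text = "..."  # the text extracted from the PDF in the previous step

# Split the text into overlapping chunks so each fits an embedding call
splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
texts = splitter.split_text(raw_text)

# Embed each chunk (OpenAI API call) and index the vectors in FAISS
embeddings = OpenAIEmbeddings()
docsearch = FAISS.from_texts(texts, embeddings)

# Persist the store for reuse in the Q&A step
with open("docsearch.pkl", "wb") as f:
    pickle.dump(docsearch, f)
```

The overlap between chunks helps avoid cutting an answer in half at a chunk boundary.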

Amazing! Our embeddings store is now ready. Let's use it to test whether we can extract the right information from the PDF.

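Querying the store might be sketched as below: retrieve the chunks most similar to the question, then let a "stuff" QA chain answer from them (the sample question is mine):

```python
import pickle

from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

# Reload the embeddings store saved earlier
with open("docsearch.pkl", "rb") as f:
    docsearch = pickle.load(f)

# "stuff" simply stuffs all retrieved chunks into a single prompt
chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")

query = "What is the core advice of The Mom Test?"  # sample question
docs = docsearch.similarity_search(query)
print(chain.run(input_documents=docs, question=query))
```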

Good job! This is getting exciting. Let's test with some more questions.


Yay! We were able to generate Q&A using custom documents :)

As a next step, this can be integrated into your application as a chatbot. Hope you had fun with it!

Check out the full code here.


If you are looking to hone your skills in Data Science, PrepVector offers a comprehensive course led by experienced professionals. You will gain skills in Product Sense, A/B Testing, Machine Learning, and more through a series of live coaching sessions, industry mentors, and personalized career coaching. You will also compound your skills by learning alongside like-minded professionals and sharing your learnings with the larger community along the way.

The next cohort will kick off May 29, 2023. Book a free consultation to know more!

Check out our previous newsletters:

  1. Evolution of LLMs and their Impact on Search
  2. Causal Inference Fundamentals
  3. Trends and Career Paths in Data Science
  4. Search Rankings and Recommendations
  5. Skills and Growth as a Product Data Scientist




James Audry Spencer

CxO at 199a Consulting


Nice one, thank you. I'm getting crazy about LangChain and its capabilities!

Abhishek Ratna

Head of AI Product Marketing | x-Google, Meta, Microsoft | Advisor


Great read! If you wanted to store the embeddings on the cloud instead of your personal device, how would you change that?
