Building a Q&A on custom docs using LangChain

Manisha Arora

Data Science, Google Ads | Data Science Coach | Helping Data Scientists Level Up in their Careers

Published: May 9, 2023

Building Custom Q&A Model with LangChain

Welcome to this edition of our newsletter, where we'll explore the world of large language models (LLMs) and how they can be used to read custom documents.

About me:

Manisha Arora is a Data Scientist at Google with 10 years' experience in driving business impact through data-driven decision making. She is currently leading ad measurement and experimentation for Ads across Search, YouTube, Shopping, and Display. She works with top Google advertisers to support their marketing and business objectives through data insights, machine learning, and experimentation solutions.

Overview of Language Models:

Language models have revolutionized the field of natural language processing, enabling machines to understand and generate human-like language. Most modern language models are built on the transformer architecture and trained on massive amounts of text data.

If you are new to language models, check out this newsletter which talks about the evolution of Large Language Models and provides an overview about how they work.

In practice, large language models can be fine-tuned on specific tasks by adding additional layers on top of the pre-trained model and training on a smaller set of task-specific data. This allows the model to adapt to the specific nuances of the task and achieve state-of-the-art performance on a wide range of natural language processing tasks. We can also use these models to read custom text and use that for generating responses.

LangChain is a framework for building applications on top of large language models. It can be used to build a custom Q&A system that answers natural-language questions about your own documents. In this article, we will explore how to build one using LangChain.

Code Deep-dive:

This code is a first attempt at building a custom application using LangChain and OpenAI. The code is divided into two parts:

Part 1 - Getting comfortable with sequential prompts using LangChain

Part 2 - Building out a custom Q/A model using a file stored on a personal drive

So let's get started...

Part 1 - Getting comfortable with sequential prompts through LangChain

Now that we have loaded the dependencies, let's get comfortable with some sequential prompts.
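To make the idea of sequential prompts concrete, here is a minimal sketch in plain Python. It is not the notebook's code and it does not call LangChain or OpenAI: `fake_llm`, `make_chain`, and `run_sequential` are hypothetical names I am using for illustration, and `fake_llm` simply echoes the prompt so you can see how the output of one step becomes the input of the next. In LangChain releases from around the time of writing, the same pattern maps roughly onto `PromptTemplate`, `LLMChain`, and `SimpleSequentialChain`.

```python
from typing import Callable, List


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): echoes the prompt it received."""
    return f"<answer to: {prompt}>"


def make_chain(template: str, llm: Callable[[str], str]) -> Callable[[str], str]:
    """Build a one-step chain: fill the template with one input, then call the LLM."""
    def run(value: str) -> str:
        return llm(template.format(input=value))
    return run


def run_sequential(chains: List[Callable[[str], str]], first_input: str) -> str:
    """Pass the first input to chain 1, then feed each output into the next chain."""
    result = first_input
    for chain in chains:
        result = chain(result)
    return result


# Two illustrative prompts: the company name from step 1 feeds the slogan in step 2.
name_chain = make_chain("Suggest a name for a company that makes {input}.", fake_llm)
slogan_chain = make_chain("Write a slogan for the company named {input}.", fake_llm)

final = run_sequential([name_chain, slogan_chain], "reusable water bottles")
print(final)
```

Swapping `fake_llm` for a function that calls a real model is the only change needed to try this against an actual API.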

Great! It's working. Let's provide some user inputs.

Now let's take some user inputs and combine the chains.
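One way to picture combining chains over several user inputs is sketched below, again in plain Python with a stand-in LLM rather than the real LangChain API. The names `run_steps`, `steps`, and `user_inputs` are hypothetical, and the prompts are made up for illustration. Each step writes its result under an output key, so later prompts can use both the original user inputs and earlier outputs.

```python
from typing import Callable, Dict, List, Tuple


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): echoes the prompt it received."""
    return f"[LLM reply to: {prompt}]"


# Each step is a (prompt template, output key) pair.
Step = Tuple[str, str]


def run_steps(
    steps: List[Step],
    user_inputs: Dict[str, str],
    llm: Callable[[str], str],
) -> Dict[str, str]:
    """Run the steps in order, storing each LLM reply under its output key."""
    values = dict(user_inputs)  # copy so the caller's dict is not modified
    for template, output_key in steps:
        values[output_key] = llm(template.format(**values))
    return values


steps = [
    ("Suggest a name for a {kind} company that sells {product}.", "company_name"),
    ("Write a tagline for {company_name}, aimed at {audience}.", "tagline"),
]

# In a notebook these values could be collected with input(); they are fixed here.
user_inputs = {"kind": "sustainable", "product": "water bottles", "audience": "hikers"}

result = run_steps(steps, user_inputs, fake_llm)
print(result["tagline"])
```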

Awesome, we have this working on user inputs. As a next step, we will use LangChain to retrieve answers to user questions from a custom document.

For this example, I have used a PDF copy of 'The Mom Test', which is a great book if you want to get your business idea validated. The book is stored on my personal drive, so I will mount my drive in Colab and then build an embeddings store from the PDF for the model to query.

Once we have the PDF read into Colab, let's extract its text and use a text splitter to break it into chunks. I will create embeddings for these chunks and save the resulting store as a pickle file to be used for custom Q&A.
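The split, embed, and pickle steps can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: `split_text` is a simple fixed-size splitter with overlap, `toy_embed` is a word-count stand-in for a real embedding model, `document_text` is placeholder text rather than content from the book, and the file name `embeddings_store.pkl` is made up.

```python
import os
import pickle
import re
import tempfile
from collections import Counter
from typing import Dict, List, Tuple


def split_text(text: str, chunk_size: int = 80, chunk_overlap: int = 20) -> List[str]:
    """Cut text into fixed-size character chunks that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the chunk just added already reaches the end of the text
        start += step
    return chunks


def toy_embed(text: str) -> Dict[str, int]:
    """Toy 'embedding' (assumption): lowercase word counts, not a learned vector."""
    return dict(Counter(re.findall(r"[a-z0-9']+", text.lower())))


# Placeholder text standing in for the text extracted from the PDF.
document_text = (
    "A text splitter breaks a long document into smaller chunks. "
    "Each chunk is turned into a vector called an embedding. "
    "The chunks and vectors are saved so that questions can be matched to them later."
)

chunks = split_text(document_text)
store: List[Tuple[str, Dict[str, int]]] = [(chunk, toy_embed(chunk)) for chunk in chunks]

# Save the store as a pickle file, then load it back to confirm the round trip.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "embeddings_store.pkl")
    with open(path, "wb") as f:
        pickle.dump(store, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(len(chunks), restored == store)
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, as long as it is shorter than the overlap.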

Amazing! Our embeddings store is now ready. Let's use it to test whether we can extract the right information from the PDF.
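Answering a question from the store usually comes down to two steps: find the chunks most similar to the question, then hand them to the LLM as context. The sketch below shows that flow with the same kind of toy word-count embedding, so it runs without an API key. The chunks are made-up sentences, not text from the book, and `toy_embed`, `cosine`, `top_chunks`, and `build_prompt` are hypothetical helper names. A real setup would use a proper embedding model and send the final prompt to an LLM.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def toy_embed(text: str) -> Dict[str, int]:
    """Toy 'embedding' (assumption): lowercase word counts, not a learned vector."""
    return dict(Counter(re.findall(r"[a-z0-9']+", text.lower())))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(question: str, chunks: List[str], k: int = 1) -> List[str]:
    """Return the k chunks most similar to the question."""
    q_vec = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble the prompt that would be sent to the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up chunks standing in for the contents of the embeddings store.
chunks = [
    "The support team answers customer emails within two business days.",
    "Invoices are sent on the first Monday of each month.",
    "The office is closed on public holidays.",
]

question = "When are invoices sent?"
best = top_chunks(question, chunks, k=1)
prompt = build_prompt(question, best)
print(prompt)
```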

Good job! This is getting exciting. Let's test out with some more questions.

Yay! We were able to generate Q&A using custom documents :)

As a next step, this can be integrated in your application as a chatbot. Hope you had fun with it!

Check out the full code here.

If you are looking to hone your skills in Data Science, PrepVector offers a comprehensive course led by experienced professionals. You will gain skills in Product Sense, AB Testing, Machine Learning and more through a series of live coaching sessions, industry mentors, and personalized career coaching sessions. In addition, you will also compound your skills by learning with like-minded professionals and sharing your learnings with the larger community along the way.

The next cohort will kick off May 29, 2023. Book a free consultation to learn more!

Check out our previous newsletters:

Data Science Growth Series

James Audry Spencer

CxO at 199a Consulting

1 year ago

Nice one, thank you. I'm getting crazy about LangChain and its capabilities!

Abhishek Ratna

Head of AI Product Marketing | x-Google, Meta, Microsoft | Advisor

1 year ago

Great read! If you wanted to store the embeddings on the cloud instead of your personal device, how would you change that?
