Dense Passage Retrieval for Open-Domain Question Answering
Harshit Goyal
Sr. BDM & Cloud Consultant @ E2E Networks - NVIDIA Partners in India | IaaS | Cloud Strategy
What is Dense Passage Retrieval?
Dense Passage Retrieval (DPR) is a technique for open-domain question answering (ODQA) that retrieves relevant passages from a large corpus of unstructured text. Traditional information retrieval (IR) techniques rely on sparse, term-based representations such as TF-IDF or BM25. DPR instead encodes both questions and text passages as dense vectors learned by deep neural networks.
The basic idea of DPR is to precompute dense vector representations of the passages and store them in a search index. Given a user's query, DPR encodes the query and retrieves the passages whose representations are most similar to the query representation. Once the relevant passages are retrieved, a downstream model can extract the answer to the question.
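To make the precompute-then-search idea concrete, here is a minimal sketch in Python with NumPy. It assumes a hypothetical `toy_encode` function standing in for DPR's trained neural encoders. That function only hashes words into a vector, so it captures lexical overlap, not meaning. The `DenseIndex` class is likewise illustrative. It shows the mechanics DPR relies on: every passage is encoded once offline, and a query is then scored against all of them with an inner product, keeping the top k.

```python
import re
import zlib

import numpy as np

DIM = 256  # hypothetical size; DPR's BERT-base encoders produce 768-d vectors


def toy_encode(text: str, dim: int = DIM) -> np.ndarray:
    """Hypothetical stand-in for a trained encoder: hashes each word into a
    bucket and L2-normalizes. It does NOT capture semantics like a real DPR encoder."""
    vec = np.zeros(dim, dtype=np.float32)
    for token in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


class DenseIndex:
    """Illustrative index: passage vectors are computed once, queries at search time."""

    def __init__(self, passages):
        self.passages = list(passages)
        # Offline step: encode every passage into a (num_passages, dim) matrix.
        self.matrix = np.stack([toy_encode(p) for p in self.passages])

    def search(self, question: str, k: int = 2):
        q = toy_encode(question)          # only the query is encoded online
        scores = self.matrix @ q          # inner-product similarity for all passages
        top = np.argsort(-scores)[:k]     # indices of the k highest scores
        return [(self.passages[i], float(scores[i])) for i in top]


passages = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Mount Everest is the highest mountain on Earth.",
]
index = DenseIndex(passages)
for text, score in index.search("What is the capital of France?", k=2):
    print(f"{score:.3f}  {text}")
```

In a real system the toy encoder would be replaced by trained question and passage encoders. For large corpora, the brute-force matrix product would typically give way to an approximate nearest-neighbor library such as FAISS.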
Why DPR?
DPR has several advantages over traditional IR techniques for ODQA. First, dense vectors capture refined, contextualized semantic information, so they can match a question to a relevant passage even when the two use different words. This leads to more precise retrieval. Second, passage representations are precomputed and stored in an index, so only the question has to be encoded at query time, and retrieval reduces to a fast nearest-neighbor (maximum inner product) search over the index. DPR can be used to improve a variety of text-based applications.
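The vocabulary-mismatch problem that dense representations address can be seen with a deliberately simplified sparse scorer. The `term_overlap_score` function below is a hypothetical illustration, not BM25: it just counts shared words, with no term weighting. A relevant paraphrase that shares no words with the query gets a score of zero, while an unrelated passage that repeats the query words scores well.

```python
import re
from collections import Counter


def tokens(text: str) -> list:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def term_overlap_score(query: str, passage: str) -> int:
    """Hypothetical, simplified sparse score: number of shared terms
    (no IDF or length normalization, unlike BM25)."""
    q = Counter(tokens(query))
    p = Counter(tokens(passage))
    return sum(min(q[t], p[t]) for t in q)


query = "automobile repair"
print(term_overlap_score(query, "How to fix a car"))                          # → 0
print(term_overlap_score(query, "Automobile repair shops near the station"))  # → 2
```

A dense encoder trained on question-passage pairs can learn that "automobile repair" and "fix a car" are related. Exact term matching cannot.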
What is Open Domain Question Answering?
Open-Domain Question Answering is a language task that asks a model to produce answers to factoid questions in natural language. The correct answer to a factoid question is objective, so model performance is comparatively simple to evaluate. The "open-domain" part refers to the lack of relevant context for any arbitrarily asked factual question.
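Because factoid answers are short and objective, ODQA systems are commonly scored with exact match (EM) after light normalization. The sketch below follows the normalization popularized by the SQuAD evaluation script: lowercasing, stripping punctuation and the articles a/an/the, and collapsing whitespace. Treat it as an approximation of that script rather than a copy of it.

```python
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold_answers: list) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(g) for g in gold_answers)


print(exact_match("The Eiffel Tower!", ["Eiffel Tower"]))  # → True
print(exact_match("Paris, France", ["Paris"]))             # → False
```

Dataset-level EM is then the fraction of questions for which `exact_match` returns True.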
What is an ODQA model?
An ODQA model may work with or without access to an external source of knowledge such as Wikipedia. These two settings are referred to as open-book and closed-book question answering, respectively. In the Open-Domain Question Answering (ODQA) task, questions could be about nearly anything relying on world knowledge. The challenge is that the context containing relevant information about the question is not provided. This is in contrast to the standard reading comprehension task, in which a passage containing the answer span is provided with the question.
How do you build retrievers for question-answering?
Given a factoid question, a language model is unlikely to guess the correct answer if it has no context and is not large enough to have memorized the relevant facts from its training data. In an open-book exam, students are allowed to refer to external resources like notes and books while answering test questions. Similarly, an ODQA system can be paired with a rich knowledge base to identify relevant documents as evidence for the answers. We can decompose the process of finding answers to given questions into two stages: (1) find the related context in an external repository of knowledge (the retriever), and (2) process the retrieved context to extract an answer (the reader).
Fig. 2. The retriever-reader QA framework combines information retrieval with machine reading comprehension.
Such a retriever + reader framework was first proposed in DrQA ("Document retriever Question-Answering"). The retriever and the reader components can be set up and trained independently or trained together end to end.
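The two-stage retriever-reader flow can be sketched as follows. Both components are hypothetical stand-ins chosen so the example runs without a model. `make_retriever` ranks passages by shared words, where a DPR system would use dense inner-product search instead. `toy_reader` is a crude heuristic that returns the longest run of passage words not found in the question, whereas a real reader is a trained reading-comprehension model that predicts answer spans.

```python
import re
from typing import Callable, List, Optional, Tuple


def tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def make_retriever(corpus: List[str]) -> Callable[[str, int], List[str]]:
    """Stage 1 stand-in: rank passages by number of shared terms with the question."""
    def retrieve(question: str, k: int) -> List[str]:
        q = set(tokens(question))
        ranked = sorted(corpus, key=lambda p: len(q & set(tokens(p))), reverse=True)
        return ranked[:k]
    return retrieve


def toy_reader(question: str, passage: str) -> Tuple[str, float]:
    """Stage 2 stand-in (hypothetical heuristic): longest run of passage words that
    are absent from the question, scored by the share of question terms covered."""
    q_terms = set(tokens(question))
    best: List[str] = []
    run: List[str] = []
    for tok in re.findall(r"\w+", passage):
        if tok.lower() in q_terms:
            run = []               # a question word ends the current candidate span
        else:
            run.append(tok)
            if len(run) > len(best):
                best = list(run)
    coverage = len(q_terms & set(tokens(passage))) / max(len(q_terms), 1)
    return " ".join(best), coverage


def answer_question(question: str,
                    retrieve: Callable[[str, int], List[str]],
                    read: Callable[[str, str], Tuple[str, float]],
                    k: int = 3) -> Optional[Tuple[str, float]]:
    passages = retrieve(question, k)                    # stage 1: retriever
    candidates = [read(question, p) for p in passages]  # stage 2: reader
    return max(candidates, key=lambda c: c[1]) if candidates else None


corpus = [
    "The capital of France is Paris.",
    "Berlin is the capital of Germany.",
    "Mount Everest is the highest mountain on Earth.",
]
retrieve = make_retriever(corpus)
print(answer_question("What is the capital of France?", retrieve, toy_reader, k=2))
# → ('Paris', 0.8333333333333334)
```

Keeping the retriever and reader as separate callables mirrors the point above: either stage can be swapped or trained independently.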
Datasets to train your DPR models:
MS MARCO datasets: https://microsoft.github.io/msmarco/Datasets.html
TREC Deep Learning 2021 (GitHub): https://github.com/microsoft/msmarco/blob/master/TREC-Deep-Learning-2021.md
OpenWebText (Papers with Code): https://paperswithcode.com/dataset/openwebtext
Papers with Code: https://paperswithcode.com/dataset/aristo-v4
Launch an A100 80GB Cloud GPU on E2E Cloud to train your DPR model for open-domain question answering:
After launching the A100 80GB Cloud GPU from the Myaccount portal, you can deploy any DPR model for open-domain question answering.
E2E Networks is the leading accelerated Cloud Computing player which provides the latest Cloud GPUs at a great value. Connect with us at [email protected]
Request a free trial here: https://bit.ly/3tbr7gn