ChatQA - NVIDIA's GPT-4-Level Conversational QA Models & Meta AI's Self-Rewarding Language Models -
Aditi Khare
AWS & AI Research [LLMs & Vision]-Principal Machine Learning Scientist & AI Architect | IIM-A | Author | Inference Optimization | Hyperspectral Imaging | Open-Source Dev | Build Production-Grade AI Products from Scratch
This paper presents a family of ChatQA models ranging in size from 7B to 70B parameters. Comprehensive evaluations on 10 conversational QA datasets show that the best model, ChatQA-70B, remarkably outperforms GPT-3.5-turbo and performs on par with GPT-4, without using any synthetic data from ChatGPT.
In addition, fine-tuning a single-turn query retriever on curated conversational QA data performs comparably to the state-of-the-art LLM-based query rewriting model, without the extra computation time and potential API cost of rewriting. The paper also shows that incorporating a small amount of "unanswerable" samples significantly enhances the model's ability to handle scenarios where answers are unavailable. The unanswerable-case evaluation highlights that the best model, ChatQA-70B, has only a slight gap compared to GPT-4.
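A minimal sketch of the retrieval contrast described above, assuming a generic dense retriever (the `sentence-transformers` model name, toy documents, and rewritten query are illustrative, not from the paper): the fine-tuned retriever takes the concatenated dialogue history plus the latest question as a single query, whereas the rewriting baseline needs a separate LLM call to turn the last turn into a standalone question.

```python
# Hedged sketch: contrasts the two ways of obtaining a retrieval query.
from sentence_transformers import SentenceTransformer, util

retriever = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for a fine-tuned retriever

documents = [
    "NVIDIA reported data-center revenue growth in Q3.",
    "The ChatQA models range from 7B to 70B parameters.",
]
doc_emb = retriever.encode(documents, convert_to_tensor=True)

dialogue = [
    ("user", "What sizes do the ChatQA models come in?"),
    ("assistant", "They range from 7B to 70B parameters."),
    ("user", "Which one performs on par with GPT-4?"),
]

# Option A (the paper's fine-tuned retriever): feed the whole conversation as the query.
conversational_query = " ".join(f"{role}: {text}" for role, text in dialogue)

# Option B (the rewriting baseline): an LLM rewrites the last turn into a standalone
# question, which costs an extra API call and extra latency.
rewritten_query = "Which ChatQA model performs on par with GPT-4?"

for query in (conversational_query, rewritten_query):
    scores = util.cos_sim(retriever.encode(query, convert_to_tensor=True), doc_emb)
    print("top document:", documents[int(scores.argmax())])
```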
Paper Key Highlights -
1. NVIDIA's ChatQA models for conversational QA.
2. A two-stage instruction-tuning approach boosts zero-shot conversational QA accuracy (see the sketch after this list).
3. The fine-tuned retriever matches SOTA query rewriting at lower cost.
4. ChatQA-70B matches GPT-4-level accuracy without any synthetic data from ChatGPT.
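Highlight 2 refers to the two-stage recipe: stage 1 is supervised instruction tuning, and stage 2 is context-enhanced instruction tuning on conversational QA data with the relevant context prepended. Below is a minimal sketch of how a stage-2 training example might be assembled; the system message and separators are assumptions for illustration, not the paper's verbatim template.

```python
# Hedged sketch of a stage-2 ("context-enhanced") training example.
def build_stage2_example(context: str, turns: list[tuple[str, str]], answer: str) -> str:
    # Assumed system message; the paper's exact wording may differ.
    system = ("System: This is a chat between a user and an AI assistant. "
              "The assistant answers questions based on the given context.")
    history = "\n".join(f"{role.capitalize()}: {text}" for role, text in turns)
    return f"{system}\n\n{context}\n\n{history}\nAssistant: {answer}"

example = build_stage2_example(
    context="ChatQA models range from 7B to 70B parameters; ChatQA-70B is the largest.",
    turns=[("user", "How many model sizes are there?"),
           ("assistant", "They range from 7B to 70B."),
           ("user", "Which one matches GPT-4?")],
    answer="ChatQA-70B performs on par with GPT-4.",
)
print(example)
```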
Evaluation Metrics -
F1 score is the most commonly used automatic metric for assessing QA models, and it is used for all datasets except ConvFinQA. ConvFinQA involves extracting numbers from documents and performing arithmetic calculations, so an answer only makes sense when it exactly matches the gold answer. When a model generates an arithmetic formula, the final result is computed with a calculator and compared against the gold answer.
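A minimal sketch of these two metrics, assuming simple whitespace tokenization for F1 and restricting ConvFinQA formulas to basic arithmetic; the helper names and tolerance are illustrative, not from the paper.

```python
# Hedged sketch of the evaluation described above.
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def convfinqa_match(predicted_formula: str, gold_answer: float, tol: float = 1e-4) -> bool:
    """Exact match for ConvFinQA: evaluate the generated arithmetic formula
    with a 'calculator' and compare the result against the gold answer."""
    allowed = set("0123456789.+-*/() ")
    if not set(predicted_formula) <= allowed:
        return False
    try:
        result = eval(predicted_formula)  # acts as the calculator
    except (SyntaxError, ZeroDivisionError):
        return False
    return abs(result - gold_answer) < tol

print(token_f1("the answer is 42", "42"))        # partial token overlap -> 0.4
print(convfinqa_match("(100 - 80) / 80", 0.25))  # True
```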
Reference Paper Link - https://arxiv.org/pdf/2401.10225v1.pdf
2. Meta AI's Self-Rewarding Language Models -
Meta's "Self-Rewarding Language Models" are designed to improve themselves and complement or, in the future, completely replace human-dependent feedback methods.
This paper presents an approach that assumes access to a base pretrained language model and a small amount of human-annotated seed data, and then develops a model that possesses two skills simultaneously: instruction following (generating high-quality, helpful responses to user requests) and self-instruction creation (generating and evaluating new instruction-following examples to add to its own training set).
These skills are used so that the model can perform self-alignment, i.e., they are the components used to iteratively train itself using AI Feedback (AIF).
The result is Self-Rewarding Language Models, where the language model itself is used via LLM-as-a-Judge prompting to provide its own rewards during training. The paper suggests that during iterative DPO training, not only does instruction-following ability improve, but so does the model's ability to provide high-quality rewards to itself.
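A compact sketch of one self-rewarding iteration as described above, with generation and the DPO update stubbed out; the judge-prompt wording, candidate count, and helper names are assumptions for illustration. In the paper, the same model both generates the candidates and scores them, and the resulting preference pairs feed the next DPO round.

```python
# Hedged sketch of one self-rewarding iteration.
import random

JUDGE_TEMPLATE = (
    "Review the response to the instruction below and rate it from 0 to 5, "
    "awarding points for relevance, coverage, helpfulness, clarity, and expertise.\n"
    "Instruction: {instruction}\nResponse: {response}\nScore:"
)

def generate(model, prompt: str, n: int = 4) -> list[str]:
    """Stub: sample n candidate responses from the model."""
    return [f"candidate answer {i} to: {prompt}" for i in range(n)]

def judge_score(model, instruction: str, response: str) -> float:
    """Stub: the model scores its own response via the LLM-as-a-Judge prompt."""
    _prompt = JUDGE_TEMPLATE.format(instruction=instruction, response=response)
    return random.uniform(0, 5)  # placeholder for parsing the model's numeric score

def self_rewarding_iteration(model, instructions: list[str]) -> list[dict]:
    """Build DPO preference pairs: best-scored response = chosen, worst = rejected."""
    pairs = []
    for instruction in instructions:
        candidates = generate(model, instruction)
        scored = sorted(candidates, key=lambda r: judge_score(model, instruction, r))
        pairs.append({"prompt": instruction, "chosen": scored[-1], "rejected": scored[0]})
    return pairs  # run a DPO update on these pairs, then repeat with the updated model

print(self_rewarding_iteration(model=None, instructions=["Summarize the ChatQA paper."]))
```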
Fine-tuning Llama 2 70B with three iterations of this approach yields a model that outperforms many existing systems on the AlpacaEval 2.0 leaderboard, including Claude 2, Gemini Pro, and GPT-4 0613.
Reference Paper Link - https://arxiv.org/abs/2401.10020
For more information on AI research papers, you can visit my GitHub profile -
For receiving the latest updates on advancements in AI research (Gen-AI, Quantum AI & Computer Vision), you can subscribe to my AI Research Papers Summaries newsletter using the link below -
Thank you & Happy Reading !