AI News Bytes: Tired of trying to get RL to work with Human Feedback? Try this new method, SLiC; LLMs Outperform Reinforcement Learning: Meet SPRING

Technology Innovation Institute Open-Sourced Falcon LLMs: A New AI Model That Uses Only 75 Percent of GPT-3’s Training Compute, 40 Percent of Chinchilla’s, and 80 Percent of PaLM-62B’s. Falcon-40B is a causal decoder-only model developed by TII (Technology Innovation Institute) and trained on 1,000B tokens from RefinedWeb, enhanced with curated corpora. Falcon-7B is a smaller causal decoder-only model from the same team, with 7B parameters, trained on 1,500B tokens from RefinedWeb, also enhanced with curated corpora. Both models are available under the TII Falcon LLM License.


Tired of trying to get RL to work with Human Feedback? Try this new method, SLiC-HF: Sequence Likelihood Calibration with Human Feedback. The research paper shows how the recently introduced Sequence Likelihood Calibration (SLiC) can also be used to learn effectively from human preferences (SLiC-HF). Furthermore, the team demonstrates that this can be done with human feedback data collected for a different model, similar to off-policy, offline RL data. Automatic and human evaluation experiments on the TL;DR summarization task show that SLiC-HF significantly improves on supervised fine-tuning (SFT) baselines. SLiC-HF also presents a competitive alternative to the PPO RLHF implementation used in past work while being much simpler to implement, easier to tune, and more computationally efficient in practice.
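To give a feel for why this is simpler than PPO, here is a minimal sketch of a per-example calibration objective of the kind SLiC-HF uses: a margin (hinge) loss that pushes the likelihood of the preferred summary above the dispreferred one, plus a likelihood regularizer toward an SFT target. The blurb above does not spell out the loss, so treat the exact form as our reading of the paper; the function name `slic_hf_loss` and the default values of `delta` and `lam` are illustrative assumptions, not the authors' settings.

```python
def slic_hf_loss(logp_pos, logp_neg, logp_ref, delta=1.0, lam=0.5):
    """Sketch of a rank-calibration loss for one training example.

    logp_pos: sequence log-likelihood of the human-preferred output.
    logp_neg: sequence log-likelihood of the dispreferred output.
    logp_ref: log-likelihood of an SFT reference target (regularizer).
    delta, lam: margin and regularization weight (assumed values).
    """
    # Hinge term: zero once the preferred output wins by at least `delta`.
    rank_term = max(0.0, delta - logp_pos + logp_neg)
    # Regularizer: keep assigning high likelihood to the SFT target.
    reg_term = -logp_ref
    return rank_term + lam * reg_term


# Preferred output already ahead by more than the margin: only the regularizer remains.
already_calibrated = slic_hf_loss(logp_pos=-2.0, logp_neg=-5.0, logp_ref=-1.0)
# Preferred output is behind: the hinge term is active.
needs_calibration = slic_hf_loss(logp_pos=-5.0, logp_neg=-2.0, logp_ref=-1.0)
```

In a real setup the three log-likelihoods would come from the model being fine-tuned and the loss would be averaged over a batch; no reward model rollout or value function is needed at training time, which is where the simplicity comes from.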


LLMs Outperform Reinforcement Learning: Meet SPRING, An Innovative Prompting Framework for LLMs Designed to Enable In-Context Chain-of-Thought Planning and Reasoning. SPRING is an LLM-based policy that outperforms Reinforcement Learning algorithms in an interactive environment requiring multi-task planning and reasoning. A group of researchers from Carnegie Mellon University, NVIDIA, Ariel University, and Microsoft have investigated the use of Large Language Models (LLMs) for understanding and reasoning with human knowledge in the context of games. They propose a two-stage approach called SPRING, which involves studying an academic paper and then using a Question-Answer (QA) framework to justify the knowledge obtained.


Can language models help us search better? Researchers from MIT and Meta AI propose EAR, a query Expansion And Reranking approach for improving passage retrieval, with an application to open-domain question answering. EAR first applies a query expansion model to generate a diverse set of queries, and then uses a query reranker to select the ones that could lead to better retrieval results. Motivated by the observation that the best query expansion is often not the one picked by greedy decoding, EAR trains its reranker to predict the rank orders of the gold passages when issuing the expanded queries to a given retriever. By better connecting the query expansion model and the retriever, EAR significantly enhances a traditional sparse retrieval method, BM25.
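The expand-then-rerank flow described above can be sketched in a few lines. This is a toy illustration under loud assumptions, not the EAR implementation: `overlap_retrieve` is a shared-token counter standing in for BM25, and the `expand` and `rerank_score` callables are hypothetical placeholders for the trained expansion model and reranker.

```python
import re


def tokenize(text):
    """Lowercase and split into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_retrieve(query, corpus, top_k=1):
    """Toy sparse retriever: rank passages by shared-token count (stand-in for BM25)."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:top_k]


def expand_and_rerank(question, expand, rerank_score, corpus, top_k=1):
    """Generate several expanded queries, keep the one the reranker likes best, then retrieve."""
    candidates = [f"{question} {extra}" for extra in expand(question)]
    best_query = max(candidates, key=lambda q: rerank_score(question, q))
    return best_query, overlap_retrieve(best_query, corpus, top_k)


corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Berlin is the capital of Germany.",
]

# Hypothetical expansion model: returns a diverse set of extra terms.
expand = lambda q: ["Germany", "Eiffel Tower 1889", "capital"]
# Hypothetical reranker: here it simply prefers expansions that add more new terms.
rerank_score = lambda q, cand: len(tokenize(cand) - tokenize(q))

best, passages = expand_and_rerank(
    "When was the tower in Paris finished?", expand, rerank_score, corpus
)
```

The design point is the middle step: rather than trusting the first (greedy) expansion, several candidates are scored before any retrieval happens, and only the selected query is sent to the retriever.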


Meta, CMU, USC, and Tel Aviv University Propose LIMA, a new 65B-parameter LLaMA model fine-tuned on 1,000 carefully curated prompts and responses. It doesn't use RLHF, yet it generalizes well to unseen tasks not in the training data. LIMA responses are equivalent or preferred to GPT-4 in 43% of cases, and the share is even higher compared to Bard and DaVinci003. It's remarkable that you can get high-quality outputs with such a simple approach and limited instruction tuning. While LIMA shows strong performance across a wide range of prompts and tasks, scaling up such curated examples is challenging.


What if LLM Hallucinations Were A Feature And Not A Bug? Meet dreamGPT: An Open-Source GPT-Based Solution That Uses Hallucinations From Large Language Models (LLMs) As A Feature. This approach helps generate unique and creative ideas. Hallucinations typically carry a negative connotation and are mostly regarded as a drawback of LLMs; dreamGPT instead turns them into something valuable for generating innovative solutions.


Do Video-Language Models really understand actions? If not, how can we fix it? In this work, a research group first proposes ActionBench to diagnose the action knowledge of video-language models (VidLMs). Surprisingly, SOTA VidLMs still struggle to distinguish “falling” from “rising”, or an original video from its reversed version. To resolve the issue, the research group proposes a novel framework, PAXION, along with the Discriminative Video Dynamics Modeling (DVDM) objective. Together, they were able to patch the missing action knowledge (~50% -> 80%) into frozen VidLMs without compromising their general VL capabilities.


Can pretrained language models (LMs) go beyond learning from labels and scalar rewards? LeTI is a new LM fine-tuning paradigm that explores LMs' potential to learn from textual interactions and feedback, allowing LMs to understand not just whether they were wrong but why. LeTI focuses on code generation tasks, where models produce code from natural language instructions. This makes it possible to acquire automatic textual feedback in a natural and scalable way: error messages and stack traces from a Python interpreter.
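To make the "textual feedback from a Python interpreter" idea concrete, here is a minimal sketch of how such feedback could be collected: run a model-generated solution together with a test assertion and capture the resulting traceback as text. The helper name `run_with_feedback` and the success message are our own assumptions, and this only covers gathering the feedback, not how LeTI uses it during fine-tuning.

```python
import traceback


def run_with_feedback(code: str) -> str:
    """Execute candidate code and return interpreter feedback as plain text.

    NOTE: exec() on untrusted model output is unsafe; a real pipeline
    would run this inside a sandbox with time and memory limits.
    """
    namespace = {}
    try:
        exec(code, namespace)
    except Exception:
        # The full stack trace is the "why it was wrong" signal.
        return traceback.format_exc()
    return "OK: all checks passed"


# A hypothetical model-generated solution with a bug, followed by a test case.
candidate = (
    "def add(a, b):\n"
    "    return a - b\n"
    "\n"
    "assert add(2, 3) == 5, 'add(2, 3) should be 5'\n"
)

feedback = run_with_feedback(candidate)
print(feedback)
```

The returned string names the failing line and the assertion message, which is far more informative than a single pass/fail bit or scalar reward.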


Featured AI Tools For This Newsletter Issue:

Bright Data

DoNotPay

AdCreative.ai

BuzzSumo

tinyEinstein

Find 100s of cool artificial intelligence (AI) tools. Our expert team reviews and provides insights into some of the most cutting-edge AI tools available. Check out AI Tools Club


