Top LLM Papers of the Week (September Week 2, 2024)

[1] OpenAI’s o1 Technical Report

Recently, OpenAI introduced o1, a new series of LLMs. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. o1 thinks before it answers: it can produce a long chain of thought before responding to the user. OpenAI o1-preview is an early version of this model, and OpenAI o1-mini is a faster version that is particularly effective at coding. [Tweet] and [Paper]


[2] MemoRAG

Conventional RAG systems struggle with ambiguous queries. MemoRAG overcomes this limitation by using a dual-LLM architecture: a lightweight but long-context LLM generates a draft answer, and a more expensive LLM generates the final answer. MemoRAG achieves superior performance compared to conventional RAG across a variety of evaluation tasks. [Tweet] and [Paper]
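The dual-LLM flow can be sketched as follows. This is a minimal illustration of the idea, not the paper's actual implementation: the `call_llm` helper, the model names, and the keyword-overlap "retriever" are all hypothetical placeholders.

```python
def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a stub string here."""
    return f"[{model} output for: {prompt[:40]}...]"

def memorag_answer(query: str, corpus: str) -> str:
    # Step 1: a lightweight, long-context model reads the whole corpus
    # and drafts clue answers that disambiguate the query.
    draft = call_llm(
        "light-long-context-model",
        f"Context:\n{corpus}\n\nQuery: {query}\nDraft clue answers:",
    )
    # Step 2: the draft clues guide retrieval of relevant passages
    # (naive word overlap stands in for a real retriever).
    passages = [p for p in corpus.split("\n") if any(w in p for w in draft.split())]
    # Step 3: a stronger, more expensive model produces the final answer
    # from the retrieved evidence.
    return call_llm(
        "expensive-model",
        f"Evidence:\n{chr(10).join(passages)}\n\nQuery: {query}\nFinal answer:",
    )
```

The point of the split is cost: the cheap model touches the full long context once, while the expensive model only sees the short evidence it needs.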


[3] Can LLMs Generate Research Ideas?

This paper investigates whether LLMs can generate novel scientific research ideas. An evaluation of four LLMs in five domains (Chemistry, Computer Science, Economics, Medicine, and Physics) showed that (i) future research ideas generated by Claude-2 and GPT-4 are more aligned with the authors' perspective than those of GPT-3.5 and Gemini, and (ii) Claude-2 generates more diverse future research ideas than GPT-4, GPT-3.5, and Gemini. [Tweet] and [Paper]


[4] Chain-of-Translation Prompting (CoTR)

This paper introduces Chain-of-Translation Prompting (CoTR), a novel prompting technique that modifies the input prompt to include a translation of the non-English input into English, then performs the target task on the translated text. This technique significantly improves multilingual LLM performance in low-resource languages, with the largest gains on complex tasks such as hate speech detection and sentiment analysis. It also has the potential to enhance the quality of synthetic data generation for underrepresented languages using LLMs. [Tweet] and [Paper]
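A CoTR-style prompt can be built as below. This is a hedged sketch of the general idea (translate first, then perform the task on the translation); the exact prompt wording is illustrative and not taken from the paper.

```python
def cotr_prompt(text: str, source_language: str, task_instruction: str) -> str:
    """Build a Chain-of-Translation prompt: the LLM is asked to first
    translate the input to English, then perform the target task on
    the English translation."""
    return (
        f"Given the following {source_language} text:\n{text}\n\n"
        "Step 1: Translate the text to English.\n"
        f"Step 2: On the English translation, {task_instruction}\n"
        "Answer with the translation followed by the task result."
    )

# Example (hypothetical task instruction):
prompt = cotr_prompt(
    "ही चित्रपट खूप छान आहे.",
    "Marathi",
    "classify the sentiment as positive, negative, or neutral.",
)
```

The single combined prompt is what distinguishes CoTR from a two-call pipeline: the translation and the task are chained within one model invocation.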


[5] Scalable Multilingual Retrieval Model

This paper introduces NLLB-E5, a scalable multilingual retrieval model that leverages the built-in multilingual capabilities of the NLLB encoder. NLLB-E5 addresses the urgent need for a scalable, language-agnostic text retrieval model. [Tweet] and [Paper]


[6] SciAgents

This paper introduces SciAgents, which automates scientific discovery using multi-agent systems. SciAgents leverages (1) large-scale ontological knowledge graphs to organize and interconnect diverse scientific concepts, (2) a suite of large language models (LLMs) and data retrieval tools, and (3) multi-agent systems. When applied to biologically inspired materials, SciAgents reveals hidden interdisciplinary relationships that were previously considered unrelated. [Tweet] and [Paper]


[7] TravelAgent

This paper introduces TravelAgent, a travel planning system powered by large language models (LLMs) designed to provide reasonable, comprehensive, and personalized travel itineraries grounded in dynamic scenarios. TravelAgent comprises four modules: Tool-usage, Recommendation, Planning, and Memory Module. [Tweet] and [Paper]


[8] Role of Small Models in the LLM Era (Survey)

Scaling up model sizes results in exponentially higher computational costs and energy consumption, making these models impractical for academic researchers and businesses with limited resources. This paper presents a comprehensive survey of the role of small models in the LLM era from two perspectives: collaboration and competition. The survey provides valuable insights for LLM practitioners. [Tweet] and [Paper]


[9] Can LLMs Generate Novel Research Ideas?

This paper examines whether LLMs can generate novel research ideas through a human study with 100+ NLP researchers. The study shows that LLM-generated ideas are judged as more novel than human expert ideas, while being judged slightly weaker on feasibility. LLMs have the potential to accelerate scientific discovery. [Tweet] and [Paper]


[10] Evaluating LLMs in Clinical Applications

This paper introduces MEDIC, a framework for assessing LLMs across five critical dimensions of clinical competence: medical reasoning, ethics and bias, data and language understanding, in-context learning, and clinical safety. Experimental results show performance disparities across model sizes and between baseline and medically fine-tuned models. [Tweet] and [Paper]


If you like this, subscribe to the newsletter so that you won't miss any interesting LLM- and RAG-related papers.
