Top LLM Papers of the Week (First Week of March 2024)
For video tutorials on top LLM papers, check out my YouTube channel.
[1] Not all Layers of LLMs are Necessary during Inference
LLM inference is computationally expensive, which makes real-time applications difficult. Statistical analysis shows that not every layer of an LLM is needed for every input. AdaInfer is a new algorithm that decides when to stop inference based on the difficulty of the input. It does not change the LLM's parameters and works across multiple tasks.
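To make the early-stopping idea concrete, here is a minimal sketch in Python. It is not AdaInfer's actual stopping rule: the confidence threshold, the toy layers, and the names `early_exit_forward` and `to_logits` are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_forward(
    hidden: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    to_logits: Callable[[Vector], Vector],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Apply layers one by one and stop once the top-1 probability
    reaches `threshold` (an assumed, illustrative stopping rule).

    Returns (predicted_index, number_of_layers_actually_run).
    """
    probs = softmax(to_logits(hidden))
    depth = 0
    for layer in layers:
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining layers
        hidden = layer(hidden)
        depth += 1
        probs = softmax(to_logits(hidden))
    return probs.index(max(probs)), depth


# Toy model: six identical "layers" that sharpen the logits by doubling them.
toy_layers = [lambda h: [2.0 * x for x in h]] * 6
identity_head = lambda h: list(h)

easy_input = [2.0, 0.0]  # already fairly separable
hard_input = [0.2, 0.0]  # needs more layers to become confident

easy_result = early_exit_forward(easy_input, toy_layers, identity_head)
hard_result = early_exit_forward(hard_input, toy_layers, identity_head)
```

In this toy setup the easy input exits after one layer and the hard input after four, and no weights are modified along the way.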
[2] SaulLM-7B: A pioneering Large Language Model for Law
SaulLM-7B is a large language model (LLM) specifically designed to understand and generate legal text. It is based on the Mistral 7B LLM. SaulLM-7B was trained on a massive dataset of English legal documents (over 30 billion tokens). SaulLM-7B exhibits state-of-the-art proficiency in understanding and processing legal documents.
[3] ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Existing pruning methods either require extra information (such as gradients) or are complex. The authors propose a simpler approach: removing the less important layers, ranked by a new metric called Block Influence (BI).
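The summary above does not spell out how BI is computed. As I understand the paper, BI scores a layer by how much it changes the hidden states: one minus the average cosine similarity between the layer's input and output. The sketch below follows that reading; the helper names and the toy hidden states are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def block_influence(inputs: Sequence[Vector], outputs: Sequence[Vector]) -> float:
    """Assumed BI definition: 1 - mean cosine similarity between the hidden
    state entering a layer and the hidden state leaving it, over all tokens."""
    sims = [cosine(x, y) for x, y in zip(inputs, outputs)]
    return 1.0 - sum(sims) / len(sims)


def layers_to_prune(bi_scores: Sequence[float], n_remove: int) -> List[int]:
    """Indices of the `n_remove` layers with the lowest BI score."""
    ranked = sorted(range(len(bi_scores)), key=lambda i: bi_scores[i])
    return sorted(ranked[:n_remove])


# Toy hidden states for two tokens: states[i] is the input to layer i.
states = [
    [[1.0, 0.0], [0.0, 1.0]],  # input to layer 0
    [[1.0, 0.0], [0.0, 1.0]],  # layer 0 changed nothing
    [[0.0, 1.0], [1.0, 0.0]],  # layer 1 rotated every vector by 90 degrees
    [[0.0, 2.0], [2.0, 0.0]],  # layer 2 only rescaled the vectors
]
bi = [block_influence(states[i], states[i + 1]) for i in range(len(states) - 1)]
prune = layers_to_prune(bi, n_remove=2)
```

Here layers 0 and 2 leave the direction of the hidden states untouched, so they get the lowest scores and are the first candidates for removal.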
[4] GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Training Large Language Models (LLMs) presents significant memory challenges because of their large sizes. This paper introduces GaLore, a new memory-efficient LLM training method.
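A rough back-of-the-envelope calculation shows where the savings can come from. This assumes, as I understand the method, that the Adam moments are kept for a rank-r projection of the gradient instead of the full m x n gradient, plus one m x r projection matrix. The function names and the example sizes are illustrative only.

```python
def adam_state_elems(m: int, n: int) -> int:
    """Standard Adam keeps two moment tensors the size of the m x n weight."""
    return 2 * m * n


def low_rank_state_elems(m: int, n: int, r: int) -> int:
    """Assumed low-rank setup: one m x r projector plus two Adam moments
    stored for the projected r x n gradient."""
    return m * r + 2 * r * n


# Hypothetical layer size and rank, chosen only for the arithmetic.
m, n, r = 4096, 4096, 128
full = adam_state_elems(m, n)
low_rank = low_rank_state_elems(m, n, r)
ratio = full / low_rank
```

For this example the optimizer state shrinks from about 33.6 million to about 1.6 million elements, roughly a 21x reduction for that one weight matrix.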
[5] Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Chatbot Arena is a new open platform for evaluating LLMs based on human preferences. It uses a pairwise comparison method, gathering human preferences through crowdsourcing. The platform has already gathered over 240K votes.
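To see how pairwise votes can turn into a ranking, here is a small sketch using the classic Elo update. The summary above does not say which rating method the platform uses, so Elo is a stand-in here, and the model names, K-factor, and starting rating are assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple

# (model_a, model_b, winner); winner is None for a tie.
Vote = Tuple[str, str, Optional[str]]


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(
    votes: Iterable[Vote],
    k: float = 32.0,
    initial: float = 1000.0,
) -> Dict[str, float]:
    """Process votes in order and return the final rating of each model."""
    ratings: Dict[str, float] = {}
    for model_a, model_b, winner in votes:
        ra = ratings.setdefault(model_a, initial)
        rb = ratings.setdefault(model_b, initial)
        ea = expected_score(ra, rb)
        # Actual score for A: 1 for a win, 0 for a loss, 0.5 for a tie.
        sa = 0.5 if winner is None else (1.0 if winner == model_a else 0.0)
        ratings[model_a] = ra + k * (sa - ea)
        ratings[model_b] = rb + k * ((1.0 - sa) - (1.0 - ea))
    return ratings


# Hypothetical crowdsourced votes between two anonymous models.
one_vote = update_ratings([("model_a", "model_b", "model_a")])
tie_vote = update_ratings([("model_a", "model_b", None)])
```

Each vote moves rating points from the loser to the winner, so the total across both models stays the same.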
[6] Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People
While much medical knowledge exists in English, delivering effective healthcare often requires local languages, especially in regions with fewer medical resources. Multilingual medical LLMs (Apollo) are being developed to improve healthcare access for non-English speakers and in regions with limited resources.
[7] Birbal: An efficient 7B instruct-model fine-tuned with curated datasets
Birbal is based on the Mistral-7B architecture and was fine-tuned in 16 hours on a single RTX 4090 GPU. Birbal outperformed the Qwen-14B model by a significant 35%. Its success can be attributed to focused, high-quality instructions covering a wide range of tasks.
[8] A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization
Along with remarkable text generation capabilities, LLMs pose serious risks like facilitating the spread of propaganda, misinformation, and disinformation at an alarming scale. In response to these dangers, a new field is rapidly developing called “AI-generated text forensics”. This area includes tools and techniques to fight the potential misuse of LLMs.
[9] LLMGuard: Guarding against Unsafe LLM Behavior
Sometimes, LLMs can generate inappropriate, biased, or factually incorrect responses. This might violate regulations and lead to legal issues. LLMGuard is a tool that has the potential to address these LLM risks. LLMGuard can monitor user interactions with an LLM application and flag content against specific behaviours or conversation topics.
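The monitor-and-flag pattern can be sketched in a few lines of Python. This is not LLMGuard's API: the detector names, the email regex, and the keyword list are all hypothetical, and a real system would likely use trained classifiers rather than simple patterns.

```python
import re
from typing import Callable, Dict, Iterable, List

Detector = Callable[[str], bool]

# Deliberately simple pattern for illustration; it will miss many real cases.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def contains_email(text: str) -> bool:
    """Flag text that appears to contain an email address."""
    return bool(EMAIL_RE.search(text))


def make_topic_detector(keywords: Iterable[str]) -> Detector:
    """Build a detector that fires on any whole-word keyword match."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return lambda text: bool(pattern.search(text))


def flag(text: str, detectors: Dict[str, Detector]) -> List[str]:
    """Return the names of all detectors triggered by `text`."""
    return [name for name, detect in detectors.items() if detect(text)]


# Hypothetical detector set for an application that blocks political topics.
detectors: Dict[str, Detector] = {
    "pii_email": contains_email,
    "blocked_topic": make_topic_detector(["election", "politics"]),
}

flags_pii = flag("Contact me at jane@example.com", detectors)
flags_topic = flag("Who should I vote for in the election?", detectors)
flags_clean = flag("What is the capital of France?", detectors)
```

The same `flag` call can be applied to both user prompts and model responses, and the application decides what to do when the returned list is not empty.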
[10] Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges
Data augmentation (DA) involves generating more labelled data to train deep learning models. Large language models can generate large amounts of realistic text data. This survey paper discusses the positive impact of LLMs on DA, including various strategies for using LLMs to generate new training data.
If you like this, do subscribe to the newsletter so that you won't miss reading interesting LLM papers.
If you are interested in learning LLM prompt engineering, here is an excellent book (free and available to read online).
Enjoy learning and using LLMs. See you next week with another set of interesting LLM papers.
Let me know in the comments which paper you find most interesting out of these ten papers and why.