The Top ML Papers of the Week (Jan 29 - Feb 4)
1). OLMo - introduces Open Language Model (OLMo), a 7B parameter model released with open training code, open data, full model weights, evaluation code, and fine-tuning code; it shows strong performance on many generative tasks; a smaller version, OLMo 1B, is also available. (paper | tweet)
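Since OLMo ships open weights, trying it locally takes only a few lines. A minimal sketch, assuming the checkpoint is published on the Hugging Face Hub as allenai/OLMo-7B and that your transformers version can load the OLMo architecture (hence the trust_remote_code flag):

```python
# Minimal sketch: loading OLMo via Hugging Face transformers.
# Assumes the checkpoint name "allenai/OLMo-7B" and that the installed
# transformers version can (remote-)load the OLMo architecture.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Language modeling is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```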
2). Advances in Multimodal LLMs - a comprehensive survey outlining design formulations for the model architectures and training pipelines of multimodal large language models. (paper | tweet)
3). Corrective RAG - proposes Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation in a RAG system; the core idea is to add a self-correcting component to the retriever and improve how retrieved documents are used to augment generation; a retrieval evaluator assesses the overall quality of the documents retrieved for a query, while web search and optimized knowledge-utilization operations enable automatic self-correction and efficient use of retrieved documents. (paper | tweet)
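To make the control flow concrete, here is one plausible reading of the corrective loop as a sketch; retrieve, grade, web_search, refine, and generate are hypothetical stand-ins for the paper's components, and the confidence thresholds are illustrative:

```python
# Hedged sketch of a CRAG-style loop; the helper callables and thresholds
# are hypothetical, not the paper's actual API.

def corrective_rag(query, retrieve, grade, web_search, refine, generate,
                   upper=0.7, lower=0.3):
    docs = retrieve(query)
    scores = [grade(query, d) for d in docs]  # retrieval evaluator
    best = max(scores) if scores else 0.0

    if best >= upper:       # confident: keep and distill the retrieved docs
        evidence = [refine(query, d) for d, s in zip(docs, scores) if s >= lower]
    elif best <= lower:     # low quality: discard and fall back to web search
        evidence = web_search(query)
    else:                   # ambiguous: combine both knowledge sources
        evidence = [refine(query, d) for d in docs] + web_search(query)

    return generate(query, evidence)
```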
4). LLMs for Mathematical Reasoning - presents an overview of research developments in LLMs for mathematical reasoning; discusses advancements, capabilities, limitations, and applications, aiming to inspire ongoing research on LLMs for mathematics. (paper | tweet)
5). Compression Algorithms for LLMs - covers compression algorithms like pruning, quantization, knowledge distillation, low-rank approximation, parameter sharing, and efficient architecture design. (paper | tweet)
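As a toy illustration of one technique from the survey, below is symmetric per-tensor int8 quantization of a weight matrix in NumPy; this is a generic sketch, not any specific paper's method:

```python
# Symmetric per-tensor int8 quantization: map the largest weight magnitude
# to 127 and round everything else onto the integer grid.
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```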
6). MoE-LLaVA - applies mixture-of-experts (MoE) tuning to large vision-language models, constructing a sparse model that activates only a fraction of its parameters per token while keeping computational cost constant; this approach also helps address the performance degradation typically associated with multi-modal learning and model sparsity. (paper | tweet)
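The sparse-routing idea can be illustrated with a minimal top-k mixture-of-experts layer in PyTorch; this is a generic sketch of MoE routing, not MoE-LLaVA's actual implementation:

```python
# Minimal top-k MoE layer: a router picks k experts per token, so compute
# stays roughly constant even as total parameters grow with expert count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim, num_experts=4, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                           nn.Linear(4 * dim, dim)) for _ in range(num_experts)])

    def forward(self, x):                        # x: (tokens, dim)
        gates = F.softmax(self.router(x), dim=-1)
        topv, topi = gates.topk(self.k, dim=-1)  # keep only k experts per token
        topv = topv / topv.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e        # tokens routed to expert e
                if mask.any():
                    out[mask] += topv[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE(dim=64)
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```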
7). Rephrasing the Web - prompts an off-the-shelf instruction-tuned model to paraphrase web documents in specific styles and formats, such as “like Wikipedia” or “question-answer format”, and jointly pre-trains LLMs on the real and synthetic rephrases; this speeds up pre-training by ~3x, improves perplexity, and boosts zero-shot question-answering accuracy on many tasks. (paper | tweet)
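A rough sketch of the data pipeline, where generate is a hypothetical call into any instruction-tuned LLM and the prompt wording is illustrative:

```python
# Sketch of the rephrasing idea: rewrite raw web text in a target style,
# then mix real and synthetic text into one pre-training corpus.
import random

STYLES = {
    "wikipedia": "Rewrite the following text in a high-quality, Wikipedia-like style:",
    "qa": "Convert the following text into question-answer format:",
}

def rephrase(document: str, style: str, generate) -> str:
    return generate(f"{STYLES[style]}\n\n{document}")

def build_training_mix(web_docs, generate, synthetic_ratio=0.5):
    corpus = []
    for doc in web_docs:
        corpus.append(doc)                       # keep the real document
        if random.random() < synthetic_ratio:    # add a synthetic rephrase
            corpus.append(rephrase(doc, random.choice(list(STYLES)), generate))
    return corpus
```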
8). Redefining Retrieval in RAG - a study of the components needed to improve the retrieval side of a RAG system; confirms that relevant information should be placed near the query, since the model struggles to attend to it otherwise; surprisingly, it finds that related documents don't necessarily improve the RAG system's performance; even more unexpectedly, irrelevant and noisy documents can drive accuracy up if placed correctly. (paper | tweet)
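The positioning finding suggests a simple prompt-assembly heuristic; in this sketch, score is a hypothetical relevance function (e.g., a reranker or embedding similarity):

```python
# Sketch of the ordering heuristic: put the most relevant document last,
# immediately before the query, where the model attends to it best.

def build_prompt(query: str, docs: list[str], score) -> str:
    # Sort ascending so the highest-scoring document sits closest to the query.
    ordered = sorted(docs, key=lambda d: score(query, d))
    context = "\n\n".join(ordered)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```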
9). Hallucination in LVLMs - discusses hallucination issues in large vision-language models (LVLMs) and techniques to mitigate them; introduces LVLM hallucination evaluation methods and benchmarks, along with a useful analysis of the causes of these hallucinations and potential ways to address them. (paper | tweet)
10). SliceGPT - a new post-training sparsification scheme for LLM compression that replaces each weight matrix with a smaller dense matrix; it reduces the embedding dimension of the network and can remove up to 25% of model parameters for Llama2-70B and Phi-2 models while retaining most of the zero-shot performance of the dense models. (paper | tweet)
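A toy sketch of the slicing idea on a single linear layer: project calibration activations into a PCA basis, keep only the top directions, and shrink the weight matrix into a smaller dense one. This simplification omits the per-layer orthogonal rotations SliceGPT applies across an entire transformer:

```python
# Single-layer slicing sketch: since X @ W ≈ (X @ Q_k) @ (Q_k.T @ W) when
# Q_k spans the dominant principal directions of X, we can replace W with
# a smaller dense matrix Q_k.T @ W. Not SliceGPT itself.
import numpy as np

def slice_layer(W, X, keep_ratio=0.8):
    # W: (d_in, d_out) weights; X: (n, d_in) calibration activations feeding W.
    cov = X.T @ X / len(X)
    _, Q = np.linalg.eigh(cov)           # eigh returns ascending eigenvalues
    Q = Q[:, ::-1]                       # principal directions first
    k = int(W.shape[0] * keep_ratio)     # number of dimensions to keep
    W_small = Q[:, :k].T @ W             # smaller dense weight, (k, d_out)
    X_small = X @ Q[:, :k]               # activations in the sliced basis
    return W_small, X_small

X = np.random.randn(1000, 64)
W = np.random.randn(64, 64)
W_s, X_s = slice_layer(W, X)
print(np.abs(X @ W - X_s @ W_s).mean())  # reconstruction error from slicing
```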