LLMs Opening Their Inner Eyes

In this issue:

  1. LLaMA-2 performance at 0.001x the price
  2. Trying to unify LLM evaluation
  3. How the “Mind’s Eye” might help LLMs to “think” better



1. JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars

Watching: JetMoE (report/code)

What problem does it solve? Training Large Language Models (LLMs) has been notoriously expensive, with some models like GPT-3 costing over $10 million to train. This has led to a concentration of LLM development in a few well-resourced labs, limiting the democratization and diversity of these powerful AI tools. JetMoE-8B demonstrates that high-performing LLMs can be trained at a fraction of the cost, potentially opening up LLM research and application to a much wider range of institutions and developers.

How does it solve the problem? JetMoE-8B leverages a sparsely activated architecture inspired by ModuleFormer. While the model has 8 billion parameters in total, only 2.2 billion are active during inference. This is achieved through Mixture of Experts (MoE) layers, specifically Mixture of Attention heads (MoA) and Mixture of MLP Experts. Each of these layers contains 8 experts, of which only 2 are activated for each input token. This sparse activation drastically reduces computational cost during inference while still allowing the model to learn from a large parameter space during training.
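The routing idea behind this sparsity can be sketched in a few lines. The following is a minimal NumPy illustration of top-2 expert routing, not JetMoE's actual implementation: the function name, shapes, and the plain linear "experts" are illustrative stand-ins for the real MoA/MLP expert layers.

```python
import numpy as np

def top2_moe_layer(x, expert_weights, gate_weights):
    """Sparse MoE forward pass: each token uses only its top-2 of N experts.

    x:              (tokens, d_model) input activations
    expert_weights: (num_experts, d_model, d_model), one linear map per expert
    gate_weights:   (d_model, num_experts) router projection
    """
    logits = x @ gate_weights                    # (tokens, num_experts)
    top2 = np.argsort(logits, axis=-1)[:, -2:]   # indices of the 2 best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # Softmax over only the two selected experts' logits.
        sel = logits[t, top2[t]]
        probs = np.exp(sel - sel.max())
        probs /= probs.sum()
        # Weighted sum of the two chosen experts' outputs; the other
        # experts are never evaluated for this token.
        for p, e in zip(probs, top2[t]):
            out[t] += p * (x[t] @ expert_weights[e])
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))                 # 4 tokens, d_model = 16
experts = rng.normal(size=(8, 16, 16)) * 0.1  # 8 experts
gate = rng.normal(size=(16, 8))
y = top2_moe_layer(x, experts, gate)
```

The key property is that per-token compute scales with the 2 activated experts, not with all 8, which is how the 8B-parameter model runs at roughly 2.2B-parameter inference cost.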

What's next? The development of JetMoE-8B could mark a significant shift in the accessibility of LLM technology. By demonstrating that high-performing models can be trained at a relatively low cost using only publicly available resources, this work may inspire more labs to research model pre-training.


2. Evalverse: Unified and Accessible Library for Large Language Model Evaluation

Watching: Evalverse (paper/code)

What problem does it solve? Evaluating Large Language Models (LLMs) can be a challenging task, especially for individuals without extensive AI expertise. The process often involves using multiple disparate tools, which can be time-consuming and complex. This fragmented approach to LLM evaluation makes it difficult for researchers and practitioners to comprehensively assess the performance of these models, hindering progress in the field.

How does it solve the problem? Evalverse addresses this issue by providing a unified, user-friendly framework that integrates various evaluation tools into a single library. By centralizing the evaluation process, Evalverse simplifies the task of assessing LLMs, making it accessible to a wider audience. The library's integration with communication platforms like Slack further enhances its usability, allowing users to request evaluations and receive detailed reports with ease.
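The "single entry point over many benchmark harnesses" pattern can be sketched as follows. This is a generic illustration of the idea, not Evalverse's actual API; all names and the toy runners are hypothetical.

```python
def make_eval_hub(runners):
    """Return one evaluate() entry point that dispatches to registered
    benchmark runner functions, so callers never touch the tools directly."""
    def evaluate(model, benchmarks):
        missing = [b for b in benchmarks if b not in runners]
        if missing:
            raise KeyError(f"no runner registered for: {missing}")
        return {name: runners[name](model) for name in benchmarks}
    return evaluate

# Toy runners standing in for real benchmark harnesses.
runners = {
    "arc":       lambda model: {"accuracy": 0.51},
    "hellaswag": lambda model: {"accuracy": 0.73},
}
evaluate = make_eval_hub(runners)
report = evaluate("my-model", ["arc", "hellaswag"])
```

A report like this, keyed by benchmark name, is also a natural payload to post into a Slack channel, which is the kind of workflow the library's Slack integration enables.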

What's next? The introduction of Evalverse opens up new possibilities for the widespread adoption of LLM evaluation. As more researchers and practitioners begin to utilize this centralized framework, we can expect to see a proliferation of insights into the performance and capabilities of LLMs. This, in turn, may drive further advancements in the field, as the increased accessibility of evaluation tools enables a broader range of individuals to contribute to the development and refinement of these powerful models.


3. Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models

Watching: VoT (paper)

What problem does it solve? Spatial reasoning, the ability to understand and manipulate spatial relationships between objects, is a fundamental aspect of human cognition. While Large Language Models (LLMs) have shown remarkable performance in various language comprehension and reasoning tasks, their capabilities in spatial reasoning have not been extensively explored. The Mind's Eye, a cognitive process that allows humans to create mental images of unseen objects and actions, is a key component of spatial reasoning. Developing methods to enhance spatial reasoning abilities in LLMs could lead to more human-like reasoning and problem-solving capabilities.

How does it solve the problem? Visualization-of-Thought (VoT) prompting is a novel approach that aims to improve the spatial reasoning abilities of LLMs by visualizing their reasoning traces and using these visualizations to guide subsequent reasoning steps. VoT prompting draws inspiration from the Mind's Eye process, enabling LLMs to generate mental images that facilitate spatial reasoning. The researchers applied VoT prompting to multi-hop spatial reasoning tasks, such as natural language navigation, visual navigation, and visual tiling in 2D grid worlds. By visualizing the reasoning traces of LLMs, VoT prompting provides a means to elicit and enhance spatial reasoning capabilities.
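For the grid-world tasks, the core trick is asking the model to draw the state after every reasoning step. A rough sketch of such a prompt builder follows; the function names and the ASCII grid encoding are illustrative choices of mine, not the paper's exact format.

```python
def render_grid(width, height, agent, obstacles=frozenset()):
    """Draw a 2D grid as ASCII: A = agent, # = obstacle, . = empty cell."""
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            if (x, y) == agent:
                row.append("A")
            elif (x, y) in obstacles:
                row.append("#")
            else:
                row.append(".")
        rows.append("".join(row))
    return "\n".join(rows)

def vot_prompt(task, width, height, agent, obstacles=frozenset()):
    """Build a VoT-style prompt: the model must redraw the grid state
    after each step, interleaving visualization with reasoning."""
    return (
        f"{task}\n"
        "After every move, draw the current grid state before deciding "
        "the next move.\n"
        "Initial state:\n"
        f"{render_grid(width, height, agent, obstacles)}"
    )

prompt = vot_prompt("Navigate from A to the bottom-right corner.",
                    width=3, height=2, agent=(1, 0))
```

Interleaving these drawn states with the chain of thought is what lets the model "check" its spatial state instead of tracking positions purely in text.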

What's next? The experimental results demonstrate that VoT prompting significantly improves the spatial reasoning abilities of LLMs, even outperforming existing multimodal large language models (MLLMs) in the studied tasks. The success of VoT prompting in LLMs suggests its potential viability in MLLMs as well. Future research could focus on extending VoT prompting to more complex spatial reasoning tasks, exploring its applicability to other domains, and investigating the integration of VoT prompting with MLLMs to potentially get the best of both worlds.

