Open-Source and Science in the Era of Foundation Models

Welcome to the summary of the tenth lecture of the LLM Agents course conducted by the University of California, Berkeley. Refer to this link for summaries of the previous lectures.

While the capabilities of LLMs have grown rapidly over the years, model accessibility has declined just as sharply, leading to a reduced understanding of how these models work. Model accessibility falls into three levels: 1) API access (e.g., GPT-4): a black-box view of the model through prompts and responses; 2) open-weight (e.g., Llama): the model weights are widely available, making it possible to study the model's internal mechanisms; and 3) open-source (e.g., StarCoder): users can use, study, modify, and share the model without permission. Chaining such API calls together yields agents that augment the capabilities of LLMs through reasoning and action (a minimal sketch follows below). There are two types of agents: problem-solving and simulation.
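To make the API-only setting concrete, here is a minimal, hypothetical sketch of an agent loop built purely on prompt-in/text-out access. The `call_llm` stub, the tool name, and the action format are illustrative assumptions, not any specific vendor API or agent framework.

```python
# Minimal sketch of an API-based agent loop (ReAct-style): the LLM is a
# black box reached only through prompt -> response, and the agent layer
# adds reasoning and action around it.

def call_llm(prompt: str) -> str:
    # Placeholder for a black-box API call (prompt in, text out).
    # Canned behavior so the sketch runs end to end: act once, then answer.
    if "Observation:" in prompt:
        return "Final answer: 9.8 is larger than 9.11."
    return "Action: calculator[9.8 - 9.11]"

TOOLS = {
    # Restricted eval used only as a toy arithmetic tool for this sketch.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_agent(task: str, max_steps: int = 3) -> str:
    history = f"Task: {task}"
    for _ in range(max_steps):
        reply = call_llm(history)
        if reply.startswith("Action:"):
            # Parse "Action: tool[argument]" and execute the named tool.
            name, arg = reply[len("Action: "):].rstrip("]").split("[", 1)
            observation = TOOLS[name](arg)
            history += f"\n{reply}\nObservation: {observation}"
        else:
            return reply  # the model produced a final answer
    return history

if __name__ == "__main__":
    print(run_agent("Which is larger, 9.8 or 9.11?"))
```

Everything the agent "knows" about the model passes through that single text interface, which is exactly the black-box constraint the rest of the lecture contrasts with open-weight and open-source access.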

Given an ML problem, the objective of a problem-solving agent is to build the best possible model by writing code, editing model parameters, executing code, and analyzing the results. MLAgentBench is a suite of 13 tasks, ranging from improving model performance on CIFAR-10 to recent research problems such as BabyLM. It evaluates how well agents perform ML experimentation, i.e., iteratively adjusting a model and its training setup to improve performance (a toy sketch of this loop follows below). The results show that the success rate of the highly capable Claude v3 Opus model varies from 100% on older, well-studied datasets to 0% on datasets created after the model's training cutoff date. This approach points toward self-improvement, where agents learn to solve tasks better over time by improving the underlying model.
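As an illustration of the iterative experimentation loop described above, the following toy sketch keeps a training configuration, proposes an edit, re-runs the experiment, and retains the change only if the measured score improves. The `propose_edit` and `train_and_evaluate` helpers are hypothetical stand-ins for the agent's code-editing and code-execution actions; this is not the MLAgentBench implementation.

```python
import random

def propose_edit(config: dict) -> dict:
    # Stand-in for the agent asking an LLM to suggest a modification.
    new = dict(config)
    new["lr"] = round(config["lr"] * random.choice([0.5, 2.0]), 6)
    return new

def train_and_evaluate(config: dict) -> float:
    # Stand-in for executing the training script and reading the metric.
    # Here: a toy score that peaks near lr = 0.01.
    return 1.0 - abs(config["lr"] - 0.01) * 10

def experiment_loop(steps: int = 5) -> tuple[dict, float]:
    best_config = {"lr": 0.1}
    best_score = train_and_evaluate(best_config)
    for _ in range(steps):
        candidate = propose_edit(best_config)
        score = train_and_evaluate(candidate)
        if score > best_score:          # analyze results, keep improvements
            best_config, best_score = candidate, score
    return best_config, best_score

if __name__ == "__main__":
    print(experiment_loop())
```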

Generative agents are computational software agents that simulate believable human behavior. In this work, the agent architecture extends an LLM to store a complete record of the agent's experiences in natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically based on recency, importance, and relevance in order to plan behavior. The architecture's components of observation, planning, and reflection enable generative agents to produce believable individual and emergent social behaviors (a sketch of the retrieval scoring appears below).
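A sketch of the memory-retrieval idea behind this architecture: each stored memory is scored by a combination of recency, importance, and relevance to the current situation, and the top-scoring memories are surfaced for planning. The decay rate, equal weighting, and toy embeddings below are illustrative assumptions, not the exact values from the generative-agents paper.

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    created_at: float    # timestep when the observation was stored
    importance: float    # e.g. an LLM-assigned poignancy score in [0, 1]
    embedding: list[float]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def retrieve(memories: list[Memory], query_emb: list[float],
             now: float, k: int = 3) -> list[Memory]:
    def score(m: Memory) -> float:
        recency = 0.99 ** (now - m.created_at)       # exponential decay
        relevance = cosine(m.embedding, query_emb)   # similarity to query
        return recency + m.importance + relevance    # equal weights here
    return sorted(memories, key=score, reverse=True)[:k]

if __name__ == "__main__":
    memories = [
        Memory("had coffee with Maria", 1.0, 0.3, [1.0, 0.0]),
        Memory("planning a birthday party", 5.0, 0.8, [0.0, 1.0]),
    ]
    print([m.text for m in retrieve(memories, query_emb=[0.0, 1.0], now=6.0, k=1)])
```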

The key challenges with API-based LLM agents are reproducibility, interpretability, and limited long-term planning capabilities. Addressing them requires deeper access to the model. When LLMs behave unexpectedly (e.g., many LLMs incorrectly stating that 9.8 < 9.11), it is important to inspect the internal representations behind that behavior. To this end, Transluce developed Monitor, an observability interface designed to help humans observe, understand, and steer the internal computations of LLMs. In a separate line of work, significant compute savings and strong MMLU scores were demonstrated by pruning the Nemotron-4 15B model into Minitron (8B) through distillation-based retraining with less than 3% of the original training data (a sketch of such a distillation step appears below). Both of these findings are possible only because the model weights are widely available.

With open-source models, we additionally get full access to the training data and to the source code used for data processing, model training, and validation. This makes it possible to optimize the data mixture during training, allowing a model to reach baseline accuracy with 2.6x fewer training steps. Current research efforts span building small-scale models with the potential to scale well and harnessing distributed, heterogeneous, low-bandwidth compute environments for training models, to address the compute problem.
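As a rough illustration of distillation-based retraining, the sketch below trains a smaller (pruned) student to match the output distribution of a larger teacher while still fitting the ground-truth labels. The temperature, loss weighting, and toy tensor shapes are assumptions made for illustration; this is not the actual Minitron recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled teacher and
    # student distributions (scaled back by temperature**2).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy against the data labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy shapes: a batch of 4 positions over a 10-token vocabulary.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```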
