Outperforming LLMs with Less Data and Smaller Model Sizes; Toward Federated GPT; You Can Learn and Get Work Done at the Same Time; and More.
Danny Butvinik
Chief Data Scientist | 100K+ Followers | FinCrime | Writer | Author of AI Vanguard Newsletter
Papers of the Week
Artificial Neuropsychology: Are Large Language Models Developing Executive Functions? This article explores the question of whether large language models (LLMs), specifically those of the GPT family, are developing executive functions similar to those of humans as part of their learning. Executive functions rely on the correct functioning of neural networks in the frontal lobes. The article evaluates the planning function and working memory of GPT using the Towers of Hanoi method, including a new variant to avoid data leakage. Preliminary results show that GPT can generate near-optimal solutions to Towers of Hanoi-related tasks, adhere to task constraints, and exhibit rapid planning capabilities and efficient working memory usage, indicating a potential development of executive functions. However, GPT's abilities are limited and inferior to well-trained humans regarding unknown tasks outside the training data. The article contributes to understanding the potential development of executive functions in LLMs and their limitations.
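For readers unfamiliar with the benchmark, the classic recursive solution to the Towers of Hanoi puzzle is sketched below; the optimal solution for n disks takes 2**n - 1 moves. This is a generic illustration of the task itself, not the evaluation protocol or the leakage-avoiding variant used in the paper.

```python
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the optimal move list for moving n disks from source to target."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, target))
    else:
        # Move n-1 disks out of the way, move the largest, then restack.
        hanoi(n - 1, source, spare, target, moves)
        moves.append((source, target))
        hanoi(n - 1, spare, target, source, moves)
    return moves

print(len(hanoi(3)))  # 7 moves, i.e., 2**3 - 1
```

Planning benchmarks like this are attractive for probing LLMs because optimality is easy to verify: any valid solution shorter than 2**n - 1 moves is impossible, and any longer one is suboptimal.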
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance: This article discusses the cost of querying large language models (LLMs) through popular APIs such as GPT-4, ChatGPT, and J1-Jumbo. The authors find that these models have heterogeneous pricing structures, with fees that can differ significantly. The authors outline three strategies for reducing inference cost: prompt adaptation, LLM approximation, and LLM cascade, to address the high cost of using LLMs on large collections of queries and text. They propose FrugalGPT, an instantiation of the LLM cascade that learns which combinations of LLMs to use for different queries to reduce cost and improve accuracy. Experiments show that FrugalGPT can match the performance of the best individual LLM with up to 98% cost reduction or improve accuracy over GPT-4 by 4% with the same cost. The article contributes to understanding LLM sustainability and efficiency, providing practical strategies for reducing inference costs.
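The LLM cascade idea behind FrugalGPT can be sketched in a few lines: route each query to models in order of increasing cost and stop as soon as a quality scorer accepts the answer. The models, prices, and scorer below are hypothetical stand-ins for illustration, not the paper's learned components or real API calls.

```python
def cascade(query, models, scorer, threshold=0.8):
    """Try models from cheapest to most expensive; return the first
    answer whose quality score clears the threshold, plus total cost."""
    total_cost = 0.0
    answer = None
    for name, model, cost in models:
        answer = model(query)
        total_cost += cost
        if scorer(query, answer) >= threshold:
            break
    return answer, total_cost

# Hypothetical stand-ins for real LLM API calls and a learned scorer.
cheap = lambda q: "short answer"
strong = lambda q: "detailed answer"
score = lambda q, a: 0.9 if "detailed" in a else 0.5

models = [("cheap-llm", cheap, 0.001), ("strong-llm", strong, 0.03)]
answer, cost = cascade("What is an LLM cascade?", models, score)
print(answer, cost)  # falls through to the strong model here
```

The savings come from the queries that never reach the expensive model; in the paper, the scorer and the model ordering are learned rather than hand-written as above.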
Comparing Foundation Models using Data Kernels: This article discusses recent advances in self-supervised learning and neural network scaling that have enabled the creation of large foundation models that can be easily adapted to various downstream tasks. The current method for comparing foundation models involves benchmarking them with aggregate metrics on curated datasets, which is heavily dependent on the choice of metric. This article proposes a metric-free methodology for comparing foundation models using their embedding space geometry, based on random graph theory, which facilitates both pointwise and multi-model comparisons. The framework can also induce a manifold of models with a distance function that strongly correlates with several downstream metrics. The article contributes to model comparison and evaluation, offering a new approach independent of specific metrics that can be applied to various downstream tasks.
MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts: This paper proposes a framework, MoT, for improving Large Language Models (LLMs) through Memory of Thoughts, without relying on annotated datasets or parameter updates. The framework involves two stages: pre-thinking on unlabeled data to save high-confidence thoughts as external memory and recalling relevant memory during inference to help the LLM reason and answer questions. The authors demonstrate that MoT can significantly improve the abilities of ChatGPT in math reasoning, commonsense reasoning, factual reasoning, and natural language inference, and that each component of the framework contributes critically to the improvements. The framework offers a promising approach to self-improvement for LLMs, reducing the reliance on high-quality datasets and computationally expensive fine-tuning.
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes: This paper proposes a new “Distilling step-by-step” mechanism for training smaller task-specific models that outperform large language models (LLMs) while using less training data than traditional methods such as finetuning or distillation. The mechanism leverages LLM rationales as additional supervision for small models within a multi-task training framework. The authors demonstrate on four NLP benchmarks that their approach outperforms finetuning and distillation with fewer labeled/unlabeled training examples, while using substantially smaller models than LLMs. Additionally, they show that model size and data requirements can be reduced at the same time: their 770M T5 model outperforms a 540B PaLM model on a benchmark task using only 80% of the available data.
Towards Building the Federated GPT: Federated Instruction Tuning: This article addresses the challenges of acquiring high-quality instruction data for training "instruction-tuned" generative LLMs. It introduces Federated Instruction Tuning (FedIT), a novel approach that leverages federated learning (FL) to overcome the limitations of centralized training. By utilizing users' diverse instructions stored on local devices while prioritizing privacy and data security, FedIT improves the generality and effectiveness of LLMs. The study first explores FL-based instruction tuning for LLMs and demonstrates its efficacy through GPT-4 auto-evaluation. Additionally, it introduces Shepherd, a GitHub repository serving as a foundational framework for federated fine-tuning of LLMs using heterogeneous instructions across diverse categories.
Industry Insights
--
Are you looking to advertise a product, job opening, or event to an audience of over 30,000 AI researchers and engineers? Get in touch with us at [email protected] to explore your options.
Enjoy the newsletter? Help us make it bigger and better by sharing it with colleagues and friends.
--
Weekly Concept Breakdown
Principal Component Analysis (PCA) vs. Linear Discriminant Analysis (LDA)
In data science and machine learning, PCA is an unsupervised dimensionality reduction technique that ignores class labels and focuses on capturing the directions of maximum variance in the data set.
LDA is a supervised dimensionality reduction method that focuses on finding a feature subspace that maximizes the separability between the groups.
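The contrast is easy to see in code. Here is a minimal side-by-side sketch on the Iris dataset, assuming scikit-learn is available:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA never sees y: it projects onto the directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA uses y: it projects onto directions that maximize class separability.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```

Note one practical difference: LDA can produce at most (number of classes - 1) components, while PCA is limited only by the number of features.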
Growth Zone
Motivational Spark
In our journey toward success, we often encounter numerous challenges and obstacles that test our resolve, determination, and character. During these moments of adversity, we have the opportunity to rise above our limitations and discover the true depths of our potential. Albert Schweitzer once said, "Success is not the key to happiness. Happiness is the key to success. If you love what you are doing, you will be successful." This sentiment holds a profound truth – that our ability to embrace challenges is integral to our growth and ultimately leads us to greatness.
Every challenge we face is an invitation for growth. It is a chance to push beyond our comfort zones and explore uncharted territories. Often, we find ourselves hesitating, fearing the unknown or the possibility of failure. However, in those moments of hesitation, we need to remind ourselves of our inherent capabilities and inner strength. We must embrace challenges with a mindset of resilience and unwavering determination.
We can expand our knowledge, skills, and capabilities when we encounter challenges. Each hurdle we overcome teaches us valuable lessons, shaping us into more capable individuals. We develop our problem-solving skills, creativity, and adaptability through challenges. As Winston Churchill wisely stated, "Success is not final, failure is not fatal: it is the courage to continue that counts." Failure may be a part of the journey, but it is through persistence and resilience that we learn, improve, and ultimately succeed.
Challenges allow us to discover our true passions and purpose. They push us to explore our limits and step outside our comfort zones. In the face of adversity, we often uncover hidden talents and strengths we never knew we possessed. Challenges force us to reevaluate our goals, beliefs, and values, helping us align our lives with what truly matters to us. It is important to remember that challenges are not meant to break us; they are designed to build us. They are the stepping stones that lead us to growth, transformation, and, ultimately, our desired success. As we face challenges head-on, we discover resilience, courage, and determination within ourselves that we may not have known existed. We develop a mindset that views challenges not as obstacles but as opportunities for growth and self-improvement.
So, let us embrace challenges with open arms, for they hold the key to unlocking our true potential. Let us remember the words of Theodore Roosevelt, who said, "Believe you can, and you're halfway there." By believing in ourselves and our abilities, we can conquer any challenge that comes our way. Let us view challenges not as setbacks but as stepping stones toward success. Let us welcome them as catalysts for personal and professional growth. In embracing challenges, we cultivate the strength, resilience, and determination necessary to reach the highest peak of achievement.
Expert Advice
In data science and machine learning, it is crucial to think beyond accuracy as the sole evaluation metric when working on a problem. While accuracy is certainly important, it is not always the most appropriate or meaningful measure of success, as it may not fully capture the nuances and complexities of a given problem.
When considering a project's broader context and goals, it becomes evident that different metrics may carry more weight and significance depending on the problem. For example, in a binary classification problem where the classes are imbalanced, accuracy alone may not accurately reflect model performance. Metrics such as precision, recall, or F1 score can offer a more comprehensive understanding of how well the model performs for each class, considering factors such as false positives and false negatives.
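A hand-rolled toy example (the data is made up for illustration) shows how a model can score high accuracy on an imbalanced set while missing half of the minority class:

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# 80% negatives; the model predicts the positive class only once.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
print(acc, prec, rec, f1)  # 0.9 accuracy, but recall is only 0.5
```

Here 90% accuracy hides the fact that half the positive cases were missed, which is exactly what recall exposes.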
Depending on the problem domain, additional considerations may go beyond traditional metrics. For instance, in a medical diagnosis system, the impact of false negatives (misclassifying a patient as healthy when they have a condition) could be far more severe than that of false positives. In this case, the sensitivity (recall) metric becomes crucial, as it measures the ability of the model to identify positive cases correctly.
Similarly, in recommendation systems, the relevance of personalized recommendations to individual users may be more important than overall accuracy. Metrics such as precision at K, which measures the percentage of relevant items among the top K recommendations, can provide valuable insights into the system’s effectiveness in meeting user preferences and needs.
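Precision at K is straightforward to compute; the item lists below are made up for illustration:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-K recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(item in relevant for item in top_k)
    return hits / k

recommended = ["a", "b", "c", "d", "e"]  # ranked model output
relevant = {"a", "c", "f"}               # items the user actually liked
print(precision_at_k(recommended, relevant, 3))  # 2 of top 3 are relevant
```

Because only the top of the ranking matters to a user scanning a feed, precision at K rewards putting relevant items first, which overall accuracy does not.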
Data scientists can select or develop evaluation metrics that align with the specific problem, domain, and user requirements by thinking beyond accuracy and considering a project's broader context and goals. This holistic approach ensures the evaluation is comprehensive and meaningful, enabling better decision-making and delivering greater value to the end-users.
It is crucial to emphasize that the problem at hand and the particular objectives and constraints of the project should drive the choice of metrics. Accuracy should never be pursued blindly without considering the unique characteristics and priorities of the problem. By expanding our perspective and considering a range of metrics, we can gain deeper insights into the performance and impact of our models, ultimately leading to more informed decisions and better outcomes.