Outperforming LLMs with Less Data and Smaller Model Sizes; Toward Federated GPT; You Can Learn and Get Work Done at the Same Time; and More.
Danny Butvinik
Chief Data Scientist | 100K+ Followers | FinCrime | Writer | Author of AI Vanguard Newsletter
Papers of the Week
Artificial Neuropsychology: Are Large Language Models Developing Executive Functions? This article explores the question of whether large language models (LLMs), specifically those of the GPT family, are developing executive functions similar to those of humans as part of their learning. Executive functions rely on the correct functioning of neural networks in the frontal lobes. The article evaluates the planning function and working memory of GPT using the Towers of Hanoi method, including a new variant to avoid data leakage. Preliminary results show that GPT can generate near-optimal solutions to Towers of Hanoi-related tasks, adhere to task constraints, and exhibit rapid planning capabilities and efficient working memory usage, indicating a potential development of executive functions. However, GPT's abilities are limited and inferior to well-trained humans regarding unknown tasks outside the training data. The article contributes to understanding the potential development of executive functions in LLMs and their limitations.
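For readers unfamiliar with the benchmark, the classic recursive solution to the Towers of Hanoi puzzle is sketched below; the optimal solution for n disks takes 2**n - 1 moves. This is a generic illustration of the task itself, not the evaluation protocol or the leakage-avoiding variant used in the paper.

```python
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the optimal move list for moving n disks from source to target."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, target))
    else:
        # Move n-1 disks out of the way, move the largest, then restack.
        hanoi(n - 1, source, spare, target, moves)
        moves.append((source, target))
        hanoi(n - 1, spare, target, source, moves)
    return moves

print(len(hanoi(3)))  # 7 moves, i.e., 2**3 - 1
```

Planning benchmarks like this are attractive for probing LLMs because optimality is easy to verify: any valid solution shorter than 2**n - 1 moves is impossible, and any longer one is suboptimal.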
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance: This article discusses the cost of querying large language models (LLMs) through popular APIs such as GPT-4, ChatGPT, and J1-Jumbo. The authors find that these models have heterogeneous pricing structures, with fees that can differ significantly. The authors outline three strategies for reducing inference cost: prompt adaptation, LLM approximation, and LLM cascade, to address the high cost of using LLMs on large collections of queries and text. They propose FrugalGPT, an instantiation of the LLM cascade that learns which combinations of LLMs to use for different queries to reduce cost and improve accuracy. Experiments show that FrugalGPT can match the performance of the best individual LLM with up to 98% cost reduction or improve accuracy over GPT-4 by 4% with the same cost. The article contributes to understanding LLM sustainability and efficiency, providing practical strategies for reducing inference costs.
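The LLM cascade idea behind FrugalGPT can be sketched in a few lines: route each query to models in order of increasing cost and stop as soon as a quality scorer accepts the answer. The models, prices, and scorer below are hypothetical stand-ins for illustration, not the paper's learned components or real API calls.

```python
def cascade(query, models, scorer, threshold=0.8):
    """Try models from cheapest to most expensive; return the first
    answer whose quality score clears the threshold, plus total cost."""
    total_cost = 0.0
    answer = None
    for name, model, cost in models:
        answer = model(query)
        total_cost += cost
        if scorer(query, answer) >= threshold:
            break
    return answer, total_cost

# Hypothetical stand-ins for real LLM API calls and a learned scorer.
cheap = lambda q: "short answer"
strong = lambda q: "detailed answer"
score = lambda q, a: 0.9 if "detailed" in a else 0.5

models = [("cheap-llm", cheap, 0.001), ("strong-llm", strong, 0.03)]
answer, cost = cascade("What is an LLM cascade?", models, score)
print(answer, cost)  # falls through to the strong model here
```

The savings come from the queries that never reach the expensive model; in the paper, the scorer and the model ordering are learned rather than hand-written as above.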
Comparing Foundation Models using Data Kernels: This article discusses recent advances in self-supervised learning and neural network scaling that have enabled the creation of large foundation models that can be easily adapted to various downstream tasks. The current method for comparing foundation models involves benchmarking them with aggregate metrics on curated datasets, which is heavily dependent on the choice of metric. This article proposes a metric-free methodology for comparing foundation models using their embedding space geometry, based on random graph theory, which facilitates both pointwise and multi-model comparisons. The framework can also induce a manifold of models with a distance function that strongly correlates with several downstream metrics. The article contributes to model comparison and evaluation, offering a new approach independent of specific metrics that can be applied to various downstream tasks.
MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts: This paper proposes a framework, MoT, for improving Large Language Models (LLMs) through Memory of Thoughts, without relying on annotated datasets or parameter updates. The framework involves two stages: pre-thinking on unlabeled data to save high-confidence thoughts as external memory and recalling relevant memory during inference to help the LLM reason and answer questions. The authors demonstrate that MoT can significantly improve the abilities of ChatGPT in math reasoning, commonsense reasoning, factual reasoning, and natural language inference, and that each component of the framework contributes critically to the improvements. The framework offers a promising approach to self-improvement for LLMs, reducing the reliance on high-quality datasets and computationally expensive fine-tuning.
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes: This paper proposes a new “Distilling step-by-step” mechanism for training smaller task-specific models that outperform large language models (LLMs) while using less training data than traditional methods such as finetuning or distillation. The mechanism leverages LLM rationales as additional supervision for small models within a multi-task training framework. The authors demonstrate on four NLP benchmarks that their approach outperforms finetuning and distillation with fewer labeled/unlabeled training examples, while using substantially smaller models than LLMs. Additionally, they show that model size and data requirements can be reduced at the same time: their 770M T5 model outperforms a 540B PaLM model on a benchmark task using only 80% of the available data.
Towards Building the Federated GPT: Federated Instruction Tuning: This article addresses the challenges of acquiring high-quality instruction data for training "instruction-tuned" generative LLMs. It introduces Federated Instruction Tuning (FedIT), a novel approach that leverages federated learning (FL) to overcome the limitations of centralized training. By utilizing users' diverse instructions stored on local devices while prioritizing privacy and data security, FedIT improves the generality and effectiveness of LLMs. The study first explores FL-based instruction tuning for LLMs and demonstrates its efficacy through GPT-4 auto-evaluation. Additionally, it introduces Shepherd, a GitHub repository serving as a foundational framework for federated fine-tuning of LLMs using heterogeneous instructions across diverse categories.
Industry Insights
--
Are you looking to advertise a product, job opening, or event to an audience of over 30,000 AI researchers and engineers? Get in touch with us at [email protected] to explore your options.
Enjoy the newsletter? Help us make it bigger and better by sharing it with colleagues and friends.
--
Weekly Concept Breakdown
Principal Component Analysis (PCA) vs. Linear Discriminant Analysis (LDA)
In data science and machine learning, PCA is an unsupervised dimensionality reduction technique that ignores class labels and focuses on capturing the directions of maximum variance in the data set.
LDA is a supervised dimensionality reduction method that focuses on finding a feature subspace that maximizes the separability between the groups.
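The contrast is easy to see in code. Here is a minimal side-by-side sketch on the Iris dataset, assuming scikit-learn is available:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA never sees y: it projects onto the directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA uses y: it projects onto directions that maximize class separability.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```

Note one practical difference: LDA can produce at most (number of classes - 1) components, while PCA is limited only by the number of features.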
Growth Zone
Motivational Spark
In our journey toward success, we often encounter numerous challenges and obstacles that test our resolve, determination, and character. During these moments of adversity, we have the opportunity to rise above our limitations and discover the true depths of our potential. Albert Schweitzer once said, "Success is not the key to happiness. Happiness is the key to success. If you love what you are doing, you will be successful." This sentiment holds a profound truth – that our ability to embrace challenges is integral to our growth and ultimately leads us to greatness.
Every challenge we face is an invitation for growth. It is a chance to push beyond our comfort zones and explore uncharted territories. Often, we find ourselves hesitating, fearing the unknown or the possibility of failure. However, in those moments of hesitation, we need to remind ourselves of our inherent capabilities and inner strength. We must embrace challenges with a mindset of resilience and unwavering determination.
We can expand our knowledge, skills, and capabilities when we encounter challenges. Each hurdle we overcome teaches us valuable lessons, shaping us into more capable individuals. We develop our problem-solving skills, creativity, and adaptability through challenges. As Winston Churchill wisely stated, "Success is not final, failure is not fatal: it is the courage to continue that counts." Failure may be a part of the journey, but it is through persistence and resilience that we learn, improve, and ultimately succeed.
Challenges allow us to discover our true passions and purpose. They push us to explore our limits and step outside our comfort zones. In the face of adversity, we often uncover hidden talents and strengths we never knew we possessed. Challenges force us to reevaluate our goals, beliefs, and values, helping us align our lives with what truly matters to us. It is important to remember that challenges are not meant to break us; they are designed to build us. They are the stepping stones that lead us to growth, transformation, and, ultimately, our desired success. As we face challenges head-on, we discover resilience, courage, and determination within ourselves that we may not have known existed. We develop a mindset that views challenges not as obstacles but as opportunities for growth and self-improvement.
So, let us embrace challenges with open arms, for they hold the key to unlocking our true potential. Let us remember the words of Theodore Roosevelt, who said, "Believe you can, and you're halfway there." By believing in ourselves and our abilities, we can conquer any challenge that comes our way. Let us view challenges not as setbacks but as stepping stones toward success. Let us welcome them as catalysts for personal and professional growth. In embracing challenges, we cultivate the strength, resilience, and determination necessary to reach the highest peak of achievement.
Expert Advice
In data science and machine learning, it is crucial to think beyond accuracy as the sole evaluation metric when working on a problem. While accuracy is certainly important, it is not always the most appropriate or meaningful measure of success, as it may not fully capture the nuances and complexities of a given problem.
When considering a project's broader context and goals, it becomes evident that different metrics may carry more weight and significance depending on the problem. For example, in a binary classification problem where the classes are imbalanced, accuracy alone may not accurately reflect model performance. Metrics such as precision, recall, or F1 score can offer a more comprehensive understanding of how well the model performs for each class, considering factors such as false positives and false negatives.
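A hand-rolled toy example (the data is made up for illustration) shows how a model can score high accuracy on an imbalanced set while missing half of the minority class:

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# 80% negatives; the model predicts the positive class only once.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
print(acc, prec, rec, f1)  # 0.9 accuracy, but recall is only 0.5
```

Here 90% accuracy hides the fact that half the positive cases were missed, which is exactly what recall exposes.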
Depending on the problem domain, additional considerations may go beyond traditional metrics. For instance, in a medical diagnosis system, the impact of false negatives (misclassifying a patient as healthy when they have a condition) could be far more severe than that of false positives. In this case, the sensitivity (recall) metric becomes crucial, as it measures the ability of the model to identify positive cases correctly.
Similarly, in recommendation systems, the relevance of personalized recommendations to individual users may be more important than overall accuracy. Metrics such as precision at K, which measures the percentage of relevant items among the top K recommendations, can provide valuable insights into the system’s effectiveness in meeting user preferences and needs.
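Precision at K is straightforward to compute; the item lists below are made up for illustration:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-K recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(item in relevant for item in top_k)
    return hits / k

recommended = ["a", "b", "c", "d", "e"]  # ranked model output
relevant = {"a", "c", "f"}               # items the user actually liked
print(precision_at_k(recommended, relevant, 3))  # 2 of top 3 are relevant
```

Because only the top of the ranking matters to a user scanning a feed, precision at K rewards putting relevant items first, which overall accuracy does not.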
Data scientists can select or develop evaluation metrics that align with the specific problem, domain, and user requirements by thinking beyond accuracy and considering a project's broader context and goals. This holistic approach ensures the evaluation is comprehensive and meaningful, enabling better decision-making and delivering greater value to the end-users.
It is crucial to emphasize that the problem at hand and the particular objectives and constraints of the project should drive the choice of metrics. Accuracy should never be pursued blindly without considering the unique characteristics and priorities of the problem. By expanding our perspective and considering a range of metrics, we can gain deeper insights into the performance and impact of our models, ultimately leading to more informed decisions and better outcomes.