“The False Promise of Imitating Proprietary LLMs” — A Provectus Perspective
The AI/ML market is incredibly dynamic, making it challenging to keep up. At Provectus, we strive to stay abreast of the latest news and updates, delve into recent papers, and share the essence of our findings with the community.
In this installment of the "Provectus AI Review" series, we share our perspective on a paper titled The False Promise of Imitating Proprietary LLMs, authored by Arnav Gudibande et al. at UC Berkeley.
Abstract
To start, let's examine the abstract of the paper.
In the paper, the authors evaluate the approach of finetuning weaker language models on outputs from stronger proprietary models like ChatGPT, to imitate their capabilities in a cost-efficient manner. They conducted an experiment where they finetuned Language Models (LMs) to imitate ChatGPT using varying base model sizes, data sources, and imitation data amounts. These models were evaluated using crowd raters and canonical NLP benchmarks.
The initial results were promising, with the imitation models appearing to follow instructions well and their outputs rated as competitive with ChatGPT by crowd workers. However, more targeted automatic evaluations revealed that the imitation models did not significantly close the gap between the base LM and ChatGPT on tasks that were not heavily supported in the imitation data. The authors found that the imitation models were good at mimicking ChatGPT's style, but not its factuality.
The authors concluded that model imitation is a false promise. There is a substantial capability gap between open and closed LMs that cannot be bridged using current methods without an unwieldy amount of imitation data or more capable base LMs. Therefore, they argue that the most effective way to improve open-source models is to focus on developing better base LMs, rather than attempting to imitate proprietary systems.
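Before moving to the takeaways, here is a minimal sketch, under our own assumptions, of what the imitation setup described above looks like in practice: a small open base model is fine-tuned on instruction/response pairs previously sampled from a stronger proprietary model. The checkpoint name, data file, and hyperparameters are illustrative, not the authors' exact configuration.

```python
# Minimal imitation fine-tuning sketch (illustrative, not the paper's exact setup).
# Assumes imitation_data.jsonl with {"instruction": ..., "response": ...} records
# that were previously sampled from a stronger proprietary model.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "openlm-research/open_llama_7b"  # any open decoder-only base LM
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # allow padding
model = AutoModelForCausalLM.from_pretrained(base_model)

raw = load_dataset("json", data_files="imitation_data.jsonl", split="train")

def tokenize(example):
    # One training sequence = instruction followed by the imitation response.
    text = f"### Instruction:\n{example['instruction']}\n### Response:\n{example['response']}"
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = raw.map(tokenize, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="imitation-model", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```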
Major Takeaways and Provectus Perspective
In this section, we will delve into some of the points raised in the paper and share our thoughts on potential solutions.
#1
We consider decoder-only models ranging in size from 1.5B to 13B parameters: GPT-2 1.5B (Radford et al., 2019), LLaMA 7B (Touvron et al., 2023), and LLaMA 13B.
In practical terms, this suggests that alternative architectures, such as encoder-decoder models, should also be explored and evaluated. For instance, we could consider models like Google's UL2 or T5. It would also be beneficial to provide some budget estimates for these potential explorations.
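As an illustration, plugging an encoder-decoder model into the same experiment mostly changes how the training pairs are fed to the model: the instruction goes to the encoder and the imitation response becomes the decoder target. The sketch below uses a public T5 checkpoint; the model name and example strings are our assumptions.

```python
# Sketch of using an encoder-decoder model (T5-style) instead of a decoder-only LM.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"  # "google/ul2" is the larger encoder-decoder option
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The instruction is the encoder input; the imitation response is the decoder target.
inputs = tokenizer("Explain what model imitation is.", return_tensors="pt")
labels = tokenizer(
    "Model imitation means fine-tuning a weaker model on outputs of a stronger one.",
    return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss  # loss for a single training example
print(float(loss))
```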
#2
For human evaluation, we conduct blind pairwise output comparisons using Mechanical Turk. In our UI, we present each rater with a task instruction and the output of two unknown models, one of which is ChatGPT and the other is one of our imitation models.
Every significant project requires a tool for human assessment that allows for a seamless switch between a private workforce and a public crowd. Many open-source solutions that do exactly this already exist. For example, the recently released Ray Aviary could prove to be a useful tool in this context.
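To make the evaluation setup concrete, here is a small sketch of how blind pairwise tasks like the ones described in the paper could be prepared before being sent to raters, whatever the underlying platform. The field names and data layout are our own assumptions.

```python
# Prepare blind pairwise comparison tasks: two anonymized outputs in random order,
# with the model identities kept server-side for later aggregation.
import json
import random

def build_pairwise_tasks(examples):
    """examples: list of dicts with 'instruction', 'chatgpt', and 'imitation' keys."""
    tasks = []
    for i, ex in enumerate(examples):
        outputs = [("chatgpt", ex["chatgpt"]), ("imitation", ex["imitation"])]
        random.shuffle(outputs)  # the rater must not know which model produced which output
        tasks.append({
            "task_id": i,
            "instruction": ex["instruction"],
            "output_a": outputs[0][1],
            "output_b": outputs[1][1],
            "key": {"a": outputs[0][0], "b": outputs[1][0]},  # never shown to the rater
        })
    return tasks

if __name__ == "__main__":
    demo = [{"instruction": "Summarize the paper in one sentence.",
             "chatgpt": "Imitation models copy style, not factuality.",
             "imitation": "The paper says imitation works great."}]
    print(json.dumps(build_pairwise_tasks(demo), indent=2))
```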
#3
Training local imitation models is far more successful… This demonstrates that it is far more feasible to distill a specific behavior from ChatGPT as opposed to broadly matching its capabilities.
In practical terms, this implies that the approach can be utilized for a broad spectrum of specific tasks that extend beyond mere replication of ChatGPT.
For example, fine-tuning OpenAI models for English-to-Cypher translation can be quite costly. A well-tuned open-source LLM, trained with the imitation approach, could be a more feasible option for continuously improving on this translation task without incurring those fine-tuning costs; a sketch of the idea follows below.
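Here is a minimal sketch of that idea, assuming the OpenAI Python client (v1+) and an invented movie-graph schema: the stronger model is queried once to produce English-to-Cypher pairs, which then become task-specific imitation data for an open model.

```python
# Collect task-specific imitation data (English-to-Cypher) from a stronger model.
# The schema, prompt, and question list are illustrative assumptions.
from openai import OpenAI  # assumes the openai>=1.0 client

client = OpenAI()
questions = ["Which actors appeared in more than five movies?"]

pairs = []
for q in questions:
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Translate the question into a Cypher query for a movie graph "
                        "with (:Person)-[:ACTED_IN]->(:Movie) nodes and relationships."},
            {"role": "user", "content": q},
        ],
    )
    pairs.append({"instruction": q, "response": completion.choices[0].message.content})

# These pairs can then feed the same fine-tuning loop shown earlier,
# scoped to one narrow behavior instead of ChatGPT's full breadth.
```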
#4
An open problem is whether these performance regressions can be mitigated using regularization or by mixing in pre-training data during fine-tuning.
Here is an interesting observation: the authors highlight an issue that arises when a model is fine-tuned on conversational-style data but then evaluated on benchmarks of a different nature, where its performance can regress.
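One way to experiment with the mitigation the authors leave open is to mix a slice of generic pre-training text back into the imitation fine-tuning set. The sketch below uses Hugging Face datasets; the corpora and the 90/10 ratio are our assumptions, not values from the paper.

```python
# Mix imitation data with generic pre-training text to reduce benchmark regressions.
from datasets import interleave_datasets, load_dataset

imitation = load_dataset("json", data_files="imitation_data.jsonl", split="train")
pretrain = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

# Normalize both datasets to a single "text" column so they can be interleaved.
imitation = imitation.map(
    lambda ex: {"text": ex["instruction"] + "\n" + ex["response"]},
    remove_columns=imitation.column_names)
pretrain = pretrain.remove_columns([c for c in pretrain.column_names if c != "text"])

# Roughly 90% imitation examples, 10% pre-training text (ratio is an assumption).
mixed = interleave_datasets([imitation, pretrain], probabilities=[0.9, 0.1], seed=42)
```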
#5
However, it also may be possible to use the target model to perform RLHF or constitutional AI… to further improve results. Lastly, we only considered relatively simple methods for collecting imitation data, however, there may be more advanced methods (e.g., active learning) that may improve the effectiveness or efficiency of model imitation.
Here is another noteworthy point: the authors highlight that despite certain constraints, there are still lots of methods to explore.
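As one concrete example of what a more advanced collection method could look like, the sketch below ranks candidate prompts by the student model's own uncertainty (approximated by its token-level loss) and sends only the most surprising ones to the expensive teacher. This is our illustration, not a method from the paper.

```python
# Active-learning-style selection: query the teacher only on prompts the student
# finds most surprising (highest average negative log-likelihood).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
student = AutoModelForCausalLM.from_pretrained("gpt2")

def student_uncertainty(prompt: str) -> float:
    """Average per-token loss of the prompt under the student model."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        return student(ids, labels=ids).loss.item()

candidates = [
    "Prove that the square root of 2 is irrational.",
    "Write a haiku about autumn.",
    "Explain CRISPR gene editing to a 10-year-old.",
]

# Send only the top-k most uncertain prompts to the proprietary teacher model.
top_k = sorted(candidates, key=student_uncertainty, reverse=True)[:2]
print(top_k)
```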
Note: OpenAI's terms of use prohibit using their models' outputs to develop competing models, which technically renders an LLM fine-tuned on such a sampled dataset unusable for commercial purposes.
Conclusion
In The False Promise of Imitating Proprietary LLMs, the authors evaluate the effectiveness of model imitation as a means of enhancing open-source Language Models (LMs). The findings suggest that businesses can gain a competitive edge by pre-training powerful base models, but also that one group can mimic another's model if their base LMs are equally competent. The study raises questions about the future of human evaluation, about how to improve open-source LMs, and about the ethical and legal issues surrounding the use of proprietary models by the open-source community.
The Provectus perspective on the paper is clear: it is definitely worth a read. Publications like this are always extremely valuable, as they go through a thorough evaluation process. However, there are still many opportunities and gaps that can and should be investigated in and around this topic.
Authors:
Rinat Gareev — Senior Solutions Architect || Provectus
Marlon Cajamarca Vega — ML Engineer & AI Educator || Provectus
Moving Forward — Learn more about Provectus AI expertise