As I have dug deeper into Generative-AI strategy and implementation over the last few months, I have come to a few conclusions:
- Most, if not all, enterprise Gen-AI work is critically dependent on OpenAI in one form or another: ChatGPT, document embeddings, or summarization of RAG snippets.
- The prognosis of the famous (or infamous) leaked Google memo, “We have no moat…”, is nowhere near being fulfilled. With so many very smart minds working in this field, we may be on the cusp of a breakthrough, but we are simply not there yet.
- I would love to have some real competition for OpenAI, simply to save costs for my company and to gain real flexibility in domain adaptation of LLMs. For example, there is no transparency or flexibility in fine-tuning the largest LLM available on OpenAI, namely Davinci. I do not even know whether it internally uses LoRA or QLoRA or one of the older fine-tuning techniques, nor can I change the technique. (A sketch of the kind of control I mean follows this list.)
- Thus, I am even more troubled when I review the so-called open-source competition and have to call them ‘Free, Downloadable, Binary LLMs’.
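To make the fine-tuning point concrete, here is a minimal sketch using the Hugging Face transformers and peft libraries; the checkpoint name is a hypothetical placeholder, not any particular model. The point is that every adaptation choice is explicit and swappable, which is exactly what a hosted API hides.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hypothetical checkpoint -- stands in for any downloadable base LLM.
base = AutoModelForCausalLM.from_pretrained("my-org/base-llm")

# With an open pipeline, the fine-tuning technique and all its knobs are
# visible and changeable: swap LoRA for QLoRA or full fine-tuning at will.
lora = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # which projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights
```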
Let us parse each descriptive word:
- Free. Yes, these LLMs are free, at least until someone demands payment for using them, perhaps backed by a legal threat that you are infringing someone’s copyright or IP.
- Downloadable. This seems to be the biggest thing these LLMs have going for them. Indeed, one can download the weights file for these LLMs and is free to do with those weights as one wishes.
- Binary. I use ‘binary’ as a metaphor for a piece of code that a developer cannot change or truly understand. I know this is a bit of a stretch: of course a weights file can be used to reconstruct the model architecture, and we can inspect the activation functions at each node (see the sketch after this list). All said and done, there is very limited capability to modify these weights once the number of model parameters reaches even a few billion. Most attempts at truly retraining an LLM of this scale on an enterprise corpus, which is orders of magnitude smaller than the initial web corpus, result in insignificant changes or catastrophic forgetting. In fact, this is the reason for the advent of the entire field of Parameter-Efficient Fine-Tuning.
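As an aside on the ‘binary’ metaphor, the sketch below (assuming a hypothetical locally downloaded checkpoint) shows the extent of the visibility one actually gets: you can enumerate every layer and weight shape, but hand-modifying billions of parameters in any meaningful way is not a realistic option.

```python
from transformers import AutoModelForCausalLM

# Hypothetical path to a downloaded weights directory.
model = AutoModelForCausalLM.from_pretrained("./downloaded-llm")

# Full visibility into the architecture ...
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

# ... but no practical way to hand-edit this many weights.
total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e9:.2f}B parameters")
```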
Even if the model weights are under an Apache or MIT license, I believe there is a fundamental expectation of truly open-source software: a developer who is not part of the original development team should be able to fix a bug or enhance the software independently of that team. Towards this goal, I enunciate four requirements for a model to be called a true open-source LLM:
- Disclosure of and public access to the training corpus. A simple statement that the model was trained on ‘public web documents’ is not sufficient for other groups to use or validate the corpus for its copyright properties. Furthermore, public web content is highly dynamic and is likely to have been modified or deleted by the time other groups want to experiment with it. The training document corpus must be collected and made available in a shared location.
- Document-processing pipeline code and tokenized output. This is one of the most important yet least appreciated steps in the whole process. There should be no secret sauce or confidentiality in the tokenization process (see the first sketch after this list).
- Model architecture and training code. This is likely to be the least controversial of my proposals, as the model architecture is already derivable from the weights file and many teams have already published some code on GitHub.
- Training process. This includes the hyperparameters, evaluation metrics, batching techniques, random seeds, and so on. The goal of idempotent, fully reproducible training may be too far in the future, but all of these and more are needed to reproduce the training output with reasonable statistical confidence (see the second sketch below).
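For the second requirement, this is roughly what releasing the document-processing pipeline would look like: a minimal sketch with the Hugging Face tokenizers library, where the corpus shard path and vocabulary size are illustrative assumptions.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a BPE tokenizer from the published corpus; nothing here is secret.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(vocab_size=32000, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(files=["corpus/shard-000.txt"], trainer=trainer)  # hypothetical shard

# The saved tokenizer (and the tokenized corpus it produces) should ship
# alongside the weights.
tokenizer.save("tokenizer.json")
```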
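And for the fourth requirement, a minimal sketch of what disclosing the training process means in practice: pin every source of randomness and publish the hyperparameters alongside the run. The names and values here are illustrative assumptions, not any particular model’s recipe.

```python
import json
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    # Pin all common sources of randomness for reproducibility.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# Illustrative hyperparameters -- the point is that they are published.
hparams = {"seed": 42, "lr": 3e-4, "batch_size": 1024, "warmup_steps": 2000}
set_seed(hparams["seed"])

# Shipping this file with the weights lets others rerun the training and
# compare outputs with reasonable statistical confidence.
with open("run_config.json", "w") as f:
    json.dump(hparams, f, indent=2)
```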
I have a feeling that I have only scratched the surface in defining what a truly open-source LLM is. Until there is progress on these factors, the so-called open-source LLMs are best described as free, downloadable binaries of an LLM.