Switching LLMs - The Invisibles
Opinions are personal, do not reflect my employer.
Switching from a proprietary to an open-source LLM is more than just a replacement of models. Here are a few factors to consider in making "informed choices" amid changing and growing complexity (not an exhaustive list):
Do we understand the LLM patterns and problems?
What are some of the most open challenges in LLMs today?
Comparison of Self-hosted LLMs vs OpenAI:
New Foundation Models: https://magazine.sebastianraschka.com/p/ahead-of-ai-11-new-foundation-models
In the LLM context: is "fine-tuning" for form, not facts?
Fine Tuning Guide:
Guide: What does it take to run a Llama model on a GPU machine?
Which model options should a customer be exploring?
Does the organization have the budget, time, skills, and resources to create a fine-tuning dataset, and the engineering to optimize "serving" of the model?
Serving is a discipline by itself: https://betterprogramming.pub/frameworks-for-serving-llms-60b7f7b23407
How to build a dataset for finetuning? https://platypus-llm.github.io/ & https://huggingface.co/datasets?sort=trending&search=llama2
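Before pointing a trainer at raw data, the pairs usually need to be reshaped into an instruction format. A minimal sketch, assuming Alpaca-style `instruction`/`input`/`output` records written as JSONL (the field names and sample pairs here are illustrative, not from the linked guides):

```python
import json

# Hypothetical raw Q/A pairs; field names are illustrative only.
raw_pairs = [
    {"question": "What is LoRA?", "answer": "A parameter-efficient fine-tuning method."},
    {"question": "What is quantization?", "answer": "Reducing numeric precision of weights."},
]

def to_instruction_record(pair):
    """Convert one Q/A pair into an Alpaca-style instruction record."""
    return {
        "instruction": pair["question"],
        "input": "",
        "output": pair["answer"],
    }

# One JSON object per line (JSONL) is the shape most finetuning
# toolchains accept for instruction datasets.
jsonl = "\n".join(json.dumps(to_instruction_record(p)) for p in raw_pairs)
print(jsonl)
```

The real work is in curation and deduplication, not the reshaping; this only shows the target shape.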
What datasets are out there? A 3-trillion-token open corpus: https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64
Cost of training data exceeds cost of compute: https://www.dhirubhai.net/feed/update/urn:li:activity:7087771674959310849/
LLM inference: why it matters - continuous vs. static batching
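The difference is easy to see in a toy simulation: with static batching the whole batch waits for its longest request, while continuous batching refills a freed slot from the queue immediately. The request lengths and batch size below are made-up numbers, and "steps" stand in for decode iterations:

```python
def static_batching_steps(lengths, batch_size):
    """Each batch runs until its LONGEST request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    """A finished request's slot is immediately refilled from the queue."""
    queue = list(lengths)
    active = []
    steps = 0
    while queue or active:
        # Refill free slots before the next decode step.
        while queue and len(active) < batch_size:
            active.append(queue.pop(0))
        steps += 1  # one decode iteration: every active request emits a token
        active = [t - 1 for t in active if t > 1]
    return steps

requests = [3, 9, 4, 8]  # output lengths (tokens) of four requests
print(static_batching_steps(requests, 2), continuous_batching_steps(requests, 2))
```

The gap widens as output lengths get more skewed, which is why serving stacks built around continuous batching get better GPU utilization.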
How to apply LoRA, QLoRA, PEFT, and other refinements to an open-source LLM?
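The arithmetic behind LoRA is small enough to sketch without a framework: the frozen weight W is left untouched, and a low-rank update scaled by alpha/r is added on top. A minimal pure-Python sketch with made-up toy matrices:

```python
# LoRA math sketch: W_eff = W + (alpha / r) * B @ A
# Shapes: W is (d_out, d_in); A is (r, d_in); B is (d_out, r).
# Only A and B are trained; W stays frozen.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    scale = alpha / r
    delta = matmul(B, A)  # low-rank update, rank r
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2x2)
A = [[0.5, 0.5]]              # r=1 down-projection
B = [[2.0], [0.0]]            # up-projection
W_eff = lora_effective_weight(W, A, B, alpha=1.0, r=1)
```

QLoRA follows the same idea but keeps W in a quantized format; in practice one would reach for a library such as Hugging Face PEFT rather than hand-rolling this.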
Example: Platypus, No. 1 on the leaderboard
Llama inference primer, and the case for why GPT-3.5 is cheaper?
You probably don't need to fine-tune an LLM
Which cloud GPU to use for LLMs?
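A back-of-the-envelope VRAM estimate is often enough to shortlist GPUs: parameters times bytes per parameter, plus headroom for activations and KV cache. A rough sketch; the 1.2x overhead factor is a guess to tune per workload, not a measured constant:

```python
def vram_gb(n_params_billions, bits_per_param, overhead=1.2):
    """Rough VRAM (GB) to serve a model.

    n_params_billions: parameter count in billions (e.g. 7 for a 7B model)
    bits_per_param:    16 for fp16/bf16, 8 or 4 when quantized
    overhead:          assumed fudge factor for activations / KV cache
    """
    return n_params_billions * (bits_per_param / 8) * overhead

# e.g. a 7B model in fp16 vs 4-bit quantized
print(vram_gb(7, 16), vram_gb(7, 4))
```

By this estimate a 7B model in fp16 overflows a 16 GB card but fits comfortably at 4-bit, which is usually the first fork in the GPU-selection decision.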
Optimize for latency?
LLMOps - road ahead - data drift, model drift, everything drifts
Llama 2 with n-bit quantization on a CPU with huge RAM? For when quality and latency are not prime concerns.
A great explanation of how quantization is done, from deeplearning.ai:
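The core of the simplest scheme, symmetric "absmax" int8 quantization, fits in a few lines: scale values so the largest magnitude maps to 127, round to integers, and divide by the scale to recover approximate floats. A minimal sketch with made-up weights:

```python
def quantize_int8(xs):
    """Absmax (symmetric) int8 quantization: scale by 127 / max|x|, then round."""
    scale = 127.0 / max(abs(x) for x in xs)
    return [round(x * scale) for x in xs], scale

def dequantize(qs, scale):
    """Recover approximate floats; the error is the quantization noise."""
    return [q / scale for q in qs]

weights = [0.1, -0.5, 0.25, 1.0]   # toy weight values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Real schemes (e.g. GPTQ, 4-bit NF4) quantize per-block and handle outliers, but the scale-round-rescale idea is the same.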
Understanding the size tradeoffs of LLMs & a decision tree
Offline batch inference over 200 TB
How do I evaluate?
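Evaluation can start far simpler than a benchmark harness: for tasks with a single right answer, normalized exact match against references is a reasonable first metric. A minimal sketch (the normalization rule, lowercase plus whitespace collapsing, is an assumption to adjust per task):

```python
def exact_match(predictions, references):
    """Fraction of predictions equal to their reference after light normalization."""
    def norm(s):
        return " ".join(s.lower().split())  # lowercase, collapse whitespace
    return sum(norm(p) == norm(r) for p, r in zip(predictions, references)) / len(references)

print(exact_match(["Paris", "berlin "], ["paris", "Munich"]))
```

Open-ended generation needs different tooling (human review, model-graded rubrics), but a cheap deterministic metric like this is worth wiring up first as a regression check.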
Case Study: Tailoring models to unique applications?
The dilemma: Generalization, Evaluation and Deployment
The new business of AI: how is it different from traditional software? (2020)
Back to prompt tuning: what else is there?