There is no such thing as a Trained LLM
What I mean is that traditional LLMs are trained on tasks irrelevant to what they will do for the user. It is like training a plane to operate efficiently on the runway, but not to fly. In short, it is almost impossible to train an LLM properly, and evaluating one is just as challenging. What's more, training is not even necessary. In this article, I dive into all these topics.
Training LLMs for the wrong tasks
Since the early days with BERT, training an LLM has typically consisted of predicting the next token in a sentence, or masking some tokens and having the algorithm fill in the blanks. You optimize the underlying deep neural networks to perform these supervised learning tasks as well as possible. Typically, this involves growing the training set to billions or trillions of tokens, increasing the cost and time to train. Recently, however, there has been a tendency to work with smaller datasets, by distilling the input sources and token lists. After all, out of one trillion tokens, 99% are noise and do not contribute to improving results for the end user; they may even contribute to hallucinations. Keep in mind that human beings have a vocabulary of about 30,000 keywords, and that the number of potential standardized prompts on a specialized corpus (and thus the number of potential answers) is less than a million.
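To make these two training objectives concrete, below is a minimal PyTorch sketch of both losses: next-token prediction and masked-token "fill in the blanks". The vocabulary size, model dimensions, mask token id, and the random token sequence are all hypothetical stand-ins; production LLMs optimize the same cross-entropy objectives, just over a vastly larger corpus and a much deeper network.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: a tiny vocabulary and a random token sequence stand
# in for a real tokenized corpus of billions or trillions of tokens.
vocab_size, d_model, seq_len = 1000, 64, 32
tokens = torch.randint(0, vocab_size, (1, seq_len))   # (batch=1, seq_len)

# Minimal stand-in for the underlying deep neural network:
# embedding + one Transformer encoder layer + output head.
embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
head = nn.Linear(d_model, vocab_size)

# Objective 1: next-token prediction (GPT-style).
# Input is the sequence minus its last token; the target is the sequence
# shifted left by one. A causal mask keeps each position from seeing the future.
inp = tokens[:, :-1]
causal_mask = nn.Transformer.generate_square_subsequent_mask(inp.size(1))
logits = head(encoder(embed(inp), src_mask=causal_mask))
next_token_loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
)

# Objective 2: masked-token prediction (BERT-style "fill in the blanks").
MASK_ID = 0                                           # hypothetical mask token id
mask = torch.rand(tokens.shape) < 0.15                # mask roughly 15% of positions
mask[0, 0] = True                                     # guarantee at least one masked position
masked = tokens.clone()
masked[mask] = MASK_ID
logits = head(encoder(embed(masked)))
mlm_loss = nn.functional.cross_entropy(logits[mask], tokens[mask])

print(f"next-token loss: {next_token_loss.item():.3f}, masked loss: {mlm_loss.item():.3f}")
```

Both losses reward the model for reproducing the statistics of the training corpus, not for answering a user's question well, which is the gap this article is about.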
Read the full article here; it also covers issues with evaluation metrics and the benefits of untrained LLMs.
To learn more about LLM 2.0, its radically different approach, and its next-gen features, see my AI research papers and books here. There you can sign up for my free newsletter and discover the newest LLM advances. For instance, I am currently working on a public Nvidia corpus consisting of financial reports (PDFs), with new technology that produces multimodal agentic contextual chunks, retrieves information that standard Python libraries cannot detect, and introduces the concept of a multi-index. I will write about it next week; the code and data are already on GitHub and fully tested.
Last but not least, see my large GitHub repository here with corporate LLM use cases, as well as an LLM applied to the entire Wolfram corpus, delivering better answers (at least for professional users) much faster and with no training.