AI on a laptop
1. I am fascinated by the topic of running generative AI models on PCs and smartphones.
2. OpenAI's business model is to provide the ChatGPT and GPT-3 APIs for a small fee. The model lives in the cloud; developers can easily connect to it and query it through the API, which lets them embed GPT into their own software. You pay for every thousand tokens processed.
3. OpenAI recently opened up the ChatGPT API at a cost of $0.002 per thousand tokens. That is very cheap, roughly 10x cheaper than the previous GPT-3 access price.
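Roughly, the integration looks like the sketch below. It is a minimal illustration using the openai Python package (the pre-1.0 interface); the API key and the prompt are placeholders, and the cost line is just back-of-the-envelope arithmetic at the $0.002 per 1,000 tokens price.

```python
# Minimal sketch: embedding ChatGPT via the cloud API (openai < 1.0 interface).
# The model runs on OpenAI's servers; you pay per thousand tokens processed.
import openai

openai.api_key = "sk-..."  # placeholder API key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize why local LLMs are interesting."}],
)

print(response["choices"][0]["message"]["content"])

# Back-of-the-envelope cost at $0.002 per 1,000 tokens.
tokens_used = response["usage"]["total_tokens"]
print(f"~${tokens_used / 1000 * 0.002:.5f} for {tokens_used} tokens")
```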
4. The decreasing price makes it easier to experiment and test ideas, which will result in more applications in specific products.
5. However, there is a way to make using a large language model even cheaper: run it locally on your own computer. Then 1,000 tokens cost $0.
6. Until now, this was impossible, because large language models need a lot of RAM and many GPUs. For example, the open source LLM BLOOM needs 352 GB of RAM and 8 GPUs (around PLN 22k each). This is not the specification of a typical laptop.
7. However, engineers at Meta managed to create a model that supposedly has GPT-3-level capabilities and can be run on a MacBook Pro with an M2 chip and 64 GB of RAM.
8. The LLaMA model is open source, unfortunately under a license that excludes commercial use. But you can look at it and play with it.
9. Programmer Georgi Gerganov looked at it, played with it, and wrote a C++ port of LLaMA that runs on a laptop: llama.cpp. It runs locally, without the internet and without other vendors' APIs. LLaMA runs on the user's computer. For free, without limits.
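For a feel of what local inference looks like, here is a minimal sketch using the llama-cpp-python bindings built on top of Gerganov's llama.cpp. The model path and quantized file name are illustrative assumptions; you have to supply your own converted LLaMA weights.

```python
# Minimal sketch: local inference with llama.cpp via the llama-cpp-python bindings.
from llama_cpp import Llama

# Hypothetical path to locally converted and quantized LLaMA weights.
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")

output = llm(
    "Building a website can be done in 10 simple steps:",
    max_tokens=128,
)

# Everything above ran on the local machine: no internet, no API fee, $0 per 1,000 tokens.
print(output["choices"][0]["text"])
```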
10. A significant drawback of LLaMA is that it doesn't follow instructions. Instruction-following is a major OpenAI innovation; thanks to it, the language model's responses are more intuitive and useful for the user.
11. As the example above shows, following instructions is important for getting the expected results.
12. LLaMA doesn't follow instructions.
13. Very quickly, however, came the Stanford Alpaca project, created by a group of Stanford researchers who took LLaMA and fine-tuned it to follow instructions.
14. Interestingly, Stanford Alpaca was tuned with data generated through the OpenAI API. The researchers paid for API access, fed it 175 seed tasks/prompts to generate a much larger set of instruction-following examples, and fine-tuned LLaMA on the results. It works. Beautiful!
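Below is a simplified sketch of that data-generation idea: seed tasks go into the OpenAI API, and the completions become instruction-following training examples. This is an illustration of the concept, not the actual Stanford Alpaca pipeline; the prompt template, file name, and seed tasks shown are assumptions.

```python
# Simplified sketch of Alpaca-style data generation: collect (instruction, output)
# pairs from the OpenAI API, then use them to fine-tune LLaMA.
import json
import openai

openai.api_key = "sk-..."  # placeholder API key

seed_tasks = [
    "Give three tips for staying healthy.",
    "Explain what a large language model is to a 10-year-old.",
]  # Alpaca started from 175 human-written seed tasks

examples = []
for task in seed_tasks:
    completion = openai.Completion.create(
        model="text-davinci-003",  # the OpenAI model the Alpaca team reportedly used
        prompt=f"Instruction: {task}\nResponse:",
        max_tokens=256,
    )
    examples.append({
        "instruction": task,
        "output": completion["choices"][0]["text"].strip(),
    })

# The collected pairs become the fine-tuning dataset for LLaMA.
with open("alpaca_style_data.json", "w") as f:
    json.dump(examples, f, indent=2)
```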
15. Why is it fascinating? It turns out that you can have a large language model that is not resource intensive, runs locally, and has a similar UX to an industry leader, for free.
16. This means that it will probably be possible to embed such a model at the level of the computer's operating system, or maybe soon the smartphone’s operating system.
17. Imagine an alternate reality where whenever you take a photo with your phone, it has to be uploaded to the internet for some digital service to enhance brightness, contrast and other computational photography tricks that allow you to take good photos on your phone. Then every photo capture would be slow and costly for the owner of the operating system.
18. A cruder example: if the calculator on your phone had to ask a central calculation server for the result every time you entered 2+2, using the calculator would be less convenient, slower and more expensive.
19. This is how LLMs work in the model proposed by OpenAI. We have to pay for every request.
20. Alpaca and LLaMA are a promise that in the future these operations will be able to happen locally, at the level of the operating system.
21. Who makes money from photo editing on the phone? Who makes money from calculations on a calculator? Only the manufacturer of the operating system, or of the phone/computer that wraps that operating system.
22. I think this is the future of this technology. A large language model will be part of the operating system of our phones and personal computers.
Did you enjoy this edition of my newsletter? Feel free to share it with your connections!