"Generative AI Everywhere" require local, smaller, specialised models running on-device!
Magnus Revang
Chief Product Officer at Openstream.ai - creating the next generation of AI products | ex-Gartner Analyst | ex-Award Winning UX Leader | Experienced Keynote Speaker | Design Thinker
The vision of "Generative AI Everywhere" is here. We see the new features in Photoshop, and the capabilities of Copilot and Bing inside the Edge browser. Yet... it seems the potential is so much greater. The challenge of integrating "Generative AI Everywhere" is one of latency. A large, cloud-hosted model can take up to 10 seconds to respond - not exactly usable for powering autocorrect functionality.
Apple showed off some local models running in iOS during WWDC, with the suggested words on the keyboard being one use case. I was disappointed. I wanted them to go all out on local models built into the very core of the OS.
There is no limit to what you could do with a powerful base model fine-tuned on telemetry data, combined with parameter-efficient fine-tuning (PEFT) adapters for different use cases that act like glasses placed in front of the base model. Prompt engineering, output guidance and model orchestration running locally could give millisecond response times... even with multi-billion parameter models.
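To make the "glasses" idea concrete, here is a minimal sketch of how lightweight adapters could be swapped on top of one shared base model using the Hugging Face peft library. The model path, adapter paths and adapter names are placeholders for illustration, not a description of any shipping product.

```python
# Sketch: one local base model, several parameter-efficient adapters ("glasses")
# swapped per use case. Paths and names below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "path/to/local-base-model"          # hypothetical on-device base model
tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE)

# Attach a first LoRA adapter, then register more; each is only a few MB on disk.
model = PeftModel.from_pretrained(base, "adapters/shorten", adapter_name="shorten")
model.load_adapter("adapters/brand_voice", adapter_name="brand_voice")

def rewrite(text: str, use_case: str) -> str:
    """Route a request through the adapter for the chosen use case."""
    model.set_adapter(use_case)            # switch "glasses" without reloading the base
    inputs = tok(text, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=120)
    return tok.decode(out[0], skip_special_tokens=True)

print(rewrite("Rewrite this paragraph in our brand voice: ...", "brand_voice"))
```

The point of the pattern is that the base weights stay resident in memory while tiny adapters change behaviour per feature - which is part of what keeps response times low.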
Imagine being able to make the text just the right length - the same way you would scale an image - all while preserving meaning. Or have the engine peer-review your text... and propose changes so all you have to do is accept or decline them. What about rewriting? Imagine being able to rewrite the text in different voices, including the brand voice decided by the marketing department in your company.
You could go even further. Imagine being able to paste the meaning of the text on the clipboard into a section of text. The result would be the selected text, rewritten to also convey the content you are "pasting".
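As a rough illustration of how such a "semantic paste" could be wired up against a local model, here is a hedged sketch using the llama-cpp-python bindings. The model file and the prompt wording are assumptions for illustration, not an actual implementation of this feature.

```python
# Sketch: "paste the meaning" - merge clipboard content into a selected passage
# using a locally running model via llama-cpp-python. The GGUF path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="models/small-instruct.Q4_K_M.gguf",  # hypothetical quantised model
            n_ctx=2048, verbose=False)

def semantic_paste(selection: str, clipboard: str) -> str:
    """Rewrite the selected text so it also conveys what is on the clipboard."""
    result = llm.create_chat_completion(
        messages=[
            {"role": "system",
             "content": "Rewrite the user's selected text so that it also conveys "
                        "the meaning of the clipboard text. Preserve style and length."},
            {"role": "user",
             "content": f"Selected text:\n{selection}\n\nClipboard text:\n{clipboard}"},
        ],
        max_tokens=256, temperature=0.2)
    return result["choices"][0]["message"]["content"]
```

The same pattern - a fixed system prompt plus the selection as context - would also cover length scaling and "delete but keep the meaning".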
Imagine being able to delete a section of text, but at the same time ensure that the rest of the article you are writing still conveys what the deleted section said! I'm sure there are many, many other almost magical UI enhancements possible with LLMs. And if you start looking at tabular data, documents, images, charts, videos... the sky is the limit. And so far I've only mentioned things you could put in a context menu.
Doing so, however, requires millisecond response times from the models. It requires multi-modality input and output. It requires local models that can run their inference in constrained compute environments. Open-source is where those models - and the techniques to make them work - are found. But it's not enough to just take a model and deploy it. Creating a use case with a local open-source model, fine-tuned and adapted to deliver ChatGPT-like performance on-device - that's really, really hard!
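For a sense of what "constrained compute" looks like in practice, here is a small sketch that loads a low-bit quantised model with a capped context window and thread count and measures time to first token - the latency that decides whether a UI feature feels instant. The file name and settings are placeholders; real numbers depend entirely on the device and the model.

```python
# Sketch: running a quantised model under tight resource limits and timing
# the first token. The GGUF file and the settings are placeholders.
import time
from llama_cpp import Llama

llm = Llama(model_path="models/small-instruct.Q4_K_M.gguf",
            n_ctx=1024,        # small context window to cap memory use
            n_threads=4,       # limited CPU threads, as on a phone or thin laptop
            n_gpu_layers=0,    # CPU-only worst case
            verbose=False)

start = time.perf_counter()
first_token_at = None
for chunk in llm("Shorten this sentence: The meeting has been rescheduled to Friday.",
                 max_tokens=32, stream=True):
    if first_token_at is None:
        first_token_at = time.perf_counter() - start
print(f"time to first token: {first_token_at:.3f}s")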
Oh... by the way... that's what we do. Give me a call.