AI & Operating Systems
Many people first learned about AI when ChatGPT gained popularity in the news about a year ago. This wasn't surprising to me. For most, the value of technology is measured by its impact on daily life. ChatGPT, in this respect, was a pioneering AI product. Although AI has enhanced numerous software products, like face detection in Google Photos or Netflix recommendations, it was typically a feature rather than the product itself. OpenAI’s development of ChatGPT, showcasing the capabilities of large language models (LLMs), marked a shift in this trend.
Now, onto my viewpoints. Let me start with a hot take: I don't find ChatGPT particularly impressive. It excels at text manipulation and generation, but as a digital assistant with vast internet knowledge it has limitations. It is prone to inaccuracies, costly to operate, and slow when used as a search engine. For many queries, a traditional search on StackOverflow or Google is more efficient.
The future, however, lies in transforming LLMs into operating systems rather than mere applications, a vision shared by OpenAI researcher Andrej Karpathy (YouTube video). ChatGPT is evolving into an orchestrator, using tools like Bing Search and DALL-E to direct user requests to the most suitable tool or agent. These agents might be specialized AI models or non-AI applications.
I have encountered a similar concept in the past with smart assistants. I happened to work a bit with Amazon Alexa, a competitor to Siri, Cortana, Google Assistant, and others. Beyond the functionality Alexa provided out of the box, it offered the ability to create “skills” for it: there is an interface through which custom applications can be connected to the assistant. For instance, if I write an application that translates text from English to Ukrainian, I can make that application “an Alexa skill”. Using a special interface, I “teach” Alexa that this tool is at its disposal and configure “the intent”. If Alexa recognises that the user is asking for a translation, it takes the user's input, directs it to my application, gets the output, and channels it back to the user through Alexa's interface, which is voice. At one point that seemed to me like the future way of interacting with software: a voice UI. I even gave a talk at an AI meetup titled “How to make your product an interlocutor”. In this model, Alexa did not have to be a do-it-all application; it only had to be smart enough to detect the intent and connect it to the respective “skill”. These “skills” do not have to be AI agents; they can be “regular” software applications, developed not only by the Alexa team but by anybody. Alexa even had a marketplace where developers could publish their “skills” and users could get (or buy) them to add new capabilities to the assistant. The way Alexa matched an intent to a skill was not perfect, however. It would often miss the intent unless the request was phrased very explicitly. One way of mitigating that was to provide multiple examples and configure several different ways of invoking my “skill” in the Alexa interface, as in the sketch below.
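For illustration, here is a rough Python sketch of that setup. The intent name, slot, and sample utterances are hypothetical; a real skill's interaction model is JSON configured in the Alexa developer console, and the matching is done by Alexa's NLU rather than the toy string check below. The sketch only shows the flow: recognise the intent, hand the input to the skill, return its output.

```python
# Illustrative Alexa-style skill configuration and routing (hypothetical names).
translate_intent = {
    "name": "TranslateToUkrainianIntent",            # hypothetical intent name
    "slots": [{"name": "phrase", "type": "AMAZON.SearchQuery"}],
    # Registering many sample utterances is how you mitigate missed intents:
    # the more phrasings you provide, the more likely the assistant routes
    # the request to your skill.
    "samples": [
        "translate {phrase} to Ukrainian",
        "how do I say {phrase} in Ukrainian",
        "what is {phrase} in Ukrainian",
        "say {phrase} in Ukrainian",
    ],
}

def my_translation_app(text: str) -> str:
    # Placeholder for the custom application registered as the skill's backend.
    return f"(Ukrainian translation of '{text}')"

def handle_utterance(utterance: str) -> str:
    # Toy intent matcher: if the request looks like a translation ask,
    # hand the input off to the skill and return its output via "voice".
    lowered = utterance.lower()
    if "ukrainian" in lowered:
        phrase = lowered.replace("translate", "").replace("to ukrainian", "").strip()
        return my_translation_app(phrase)
    return "Sorry, I don't know how to help with that."

print(handle_utterance("Translate good morning to Ukrainian"))
```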
However, with LLMs, that ask is becoming far more feasible. LLMs in general, and ChatGPT in particular, are quite good at understanding natural language. Now, what happens when Siri is able to understand not a small set of instructions, but a wide range of intents? It already does quite well interpreting queries like “Hey Siri, set an alarm for 10 AM” and “Wake me up in 2 hours”. But with this ecosystem of understanding, tools, and agents, it could take the whole assistant market to an entirely new level. ChatGPT (or Siri, or Google Assistant) can be not just an application that answers questions, but an operating system: a way to interact with all applications, including the one that answers questions. One that is able to, for example, “find a romcom that I was watching last year with that blonde actress and play it on the living room TV”. What is even cooler is that, just as we have multiple operating systems today, competition in the AI OS space seems to be accelerating. OpenAI might be building its own operating system (or even, according to rumours, seeking to get into hardware), while Meta, with its LLaMA models, could try to follow the Linux path and create an open-source OS.
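To make the orchestration idea more concrete, here is a minimal sketch of how such a router could be built today with a tool-calling LLM. It uses OpenAI's Python SDK, but the two “applications” (search_media_library and play_on_device) are hypothetical names invented for this example, stand-ins for whatever real tools, skills, or agents an AI OS would have registered.

```python
# Minimal sketch of an "AI OS as orchestrator": the LLM only decides which
# application to invoke and with what arguments; the applications themselves
# (hypothetical names below) are ordinary software, much like Alexa skills.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_media_library",       # hypothetical application
            "description": "Find a movie or show the user has watched before.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Free-text description of the title."},
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "play_on_device",             # hypothetical application
            "description": "Play a media item on a named device.",
            "parameters": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "device": {"type": "string", "description": "e.g. 'living room TV'"},
                },
                "required": ["title", "device"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any tool-calling-capable model
    messages=[{
        "role": "user",
        "content": "Find that romcom I watched last year with the blonde actress "
                   "and play it on the living room TV",
    }],
    tools=tools,
)

# The model replies with which "application" to call and with what arguments;
# the OS-like layer around it would then actually invoke that application.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

The LLM's job here is only to understand the request and choose the application and its arguments; the actual search and playback are done by regular software, exactly like Alexa's skills.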
Existing OS maintainers could try to augment, or even transform, their systems into AI-based systems. Microsoft has exclusive access to OpenAI's models, so it can integrate them into Windows. And Apple, although it currently does not offer any LLM-first products, is definitely doing a lot of work in this area; recently it introduced a framework for running machine-learning models on Apple Silicon. Meta has introduced an AI OS embedded in glasses: a model that “lives” on the wearable device and can be invoked conveniently via voice (imagine saying “Meta, check out what I am looking at, and describe it to me in a rap song”). Having an AI OS as a router or orchestrator for various tools, skills, agents, or simply applications could be quite the shift in the way we look at, interact with, and value computing devices.
The emergence of AI as an operating system, orchestrating a range of tools and applications, promises to revolutionize our interaction with technology, embedding it more seamlessly into our lives. This future, rich in interaction diversity, is truly exciting.