Avatars, Agents, and Companions

Fifteen years ago, I was approached by a former colleague to write a book about how I expected the world to look in 2025. Sadly, the book itself has long since disappeared into the depths of the Internet, but I remember being very much focused on three related concepts that I figured would all be "coming due" about now: Avatars, Agents, and Companions.

Now, these are all familiar concepts, and occasionally, someone comes up with exciting implementations. At the same time, however, none of these are implemented consistently, and history is littered with many failed attempts to make these viable (does anyone remember "Clippy"?).

The Evolution of Companions

I'd argue, however, that we're closing in on real, viable examples of each that may have legs (or wheels or tails) this time around, and it has to do with both the rise of generative AI and a shift in thinking away from precise commands toward more flexible, intelligent ones.

While these terms are similar, they represent three fairly distinct kinds of entities.

Avatars. The term avatar comes from Sanskrit (avatāra), meaning the earthly manifestation of a deity. An avatar is any two- or three-dimensional (or audio) representation of how a person wants to present themselves. It can also represent how a given AI is displayed in a virtual world. Avatars can be something as seemingly simple as an image or a voice, but most people think of avatars as being animated and able to express some form of emotion. The text output of ChatGPT is a form of avatar as well. The avatar is the outward-facing view of an entity to the external world.

Agents. An agent is a similar concept but can best be considered an application that acts on your behalf and informs you about information that comes from these efforts. You can think of an agent as analogous to a writer's or an actor's agent who negotiates with others on your behalf or who does specific actions based on your commands.

Agents do things. Sometimes they are dispatched by a person. Other times they are invoked by an event (such as a timer or a condition such as reaching a certain balance in your bank account). Most importantly, agents themselves are generally invisible - they are autonomous or semi-autonomous bundles of code. Agents also maintain some degree of state, implying they have at least a nominal degree of persistence.

In some respects, agents are distributed, decentralized, persistent analogs to objects or daemons. Most programmers learn about object-oriented design when dealing with closed environments where such objects can be resident in memory. Still, once you start moving towards a services-oriented model, OOP tends to fall by the wayside in favor of other design patterns. As applications become more distributed, agents will likely become untethered, moving from server to server as necessary to accommodate their requirements. Webhooks are one design pattern already in use that hints at this autonomous behavior.

Companions. A companion combines an agent with one or more avatars. What differentiates companions from other computational bots is that they typically retain a potentially large array of metadata about the person (or bot) they represent.

Companions will become the next central stage of evolution for artificial intelligence; while we are getting closer, however, we are not quite there yet. Again, companions have been a staple of science fiction for as long as science fiction has existed. They vary from automated butlers to spirit animals to potential love interests. Still, the idea in almost all cases is that they have an incredibly close relationship with the person in question, knowing their likes and dislikes, having access to their resources, and often acting to defend that person from external attacks.

Companions, like avatars and agents, have been implemented many times over the decades. Typically, they have failed due to the immaturity of one or another part of the AI stack. Microsoft, for instance, released an Agent Toolkit in 1999, making the back end of Clippy available as an API. (I may have spent too much time playing with it).

There were several fundamental problems with the toolkit. Even on a reasonably fast computer of the time, it was very slow and, worse, dragged other programs down. This was primarily due to the verbal interface, which provided a rudimentary speech-to-text layer for controlling the agent. It could interface with programs, open windows, and run executable functions, but it required a serious understanding of Windows programming to do any but the simplest of operations. It was also sprite-based, even if it presented itself as being 3D. Not surprisingly, it never made it out of beta.

It would take another decade before Apple debuted Siri. Again, the assumption was that companions needed the ability to work speech-to-text, and Siri all but dispensed with visual avatars or even chatbots. Siri and similar companions, including Amazon's Alexa, had only a moderate level of success, primarily on mobile devices. While it more cleanly defined the domain of what a companion should be, it was also one of a series of innovations that received a great deal of hype but then proceeded to fade away during the 2010s.

Towards AI-Based Companions

ChatGPT is a necessary precursor to companions. It can hold a conversation at a surprisingly deep level of understanding. As importantly, it can maintain session state for a certain period, including some historical state information about the speaker.

However, it will likely take the rise of micro-LLMs (call them small language models, or SLMs) that have enough of a foundation to "talk" meaningfully but that are also capable of retaining state about the user without that information bleeding too far outside the constraints of the model. One critical role that companions play is that of data guardians, protecting their owner's data (and, for that matter, access) from hostile or questionable use.
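The data-guardian role can be illustrated with a toy filter, assuming the companion sits at the boundary between the user's private state and the outside world. The field names here are invented for illustration, not any real privacy schema:

```python
# Fields the companion treats as private and never lets past its boundary.
PRIVATE_FIELDS = {"ssn", "account_number", "address"}

def outbound(message: dict) -> dict:
    """Redact private fields before a request leaves the companion."""
    return {k: v for k, v in message.items() if k not in PRIVATE_FIELDS}

query = {
    "topic": "mortgage rates",          # safe to share with an external service
    "address": "123 Example St",        # private; stays inside the companion
    "ssn": "000-00-0000",               # private; stays inside the companion
}
safe = outbound(query)   # only the non-private field survives
```

The point of the sketch is the direction of trust: the SLM can use the private fields internally, but anything crossing the model's boundary passes through the guardian first.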

This is the Jeeves model, emulating the concept of the 19th-century butler, who had an incredible degree of control over who could see the mansion's family and what information was released to outsiders. The butler also arranged for food serving, coordinated with the cooks and cleaning staff, and acted as the point of contact with the outside world.

The 21st-century Jeeves would act as an agent, a filter, and an advisor. The agent is the most achievable part of the companion, though it generally requires the ability to translate natural language into some form of service API. This conversion layer is still largely speculative, especially given the reliability requirements that any integration layer must meet. It can be accomplished now for simple actions (turn the lights down, Jeeves). Other actions (negotiate the business contract, Jeeves) are considerably more complex and require both a consistent mechanism for smart contracts and the ability to optimize such contracts to provide the most significant benefit to the user. Smart contracts deserve far more discussion than I can give them here.
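For the simple end of that spectrum, the conversion layer can be sketched as intent matching over a service API. This is a deliberately naive sketch — a real system would use an LLM or an intent classifier, and the `dim_lights` device call is hypothetical:

```python
import re

def dim_lights(level: int) -> str:
    """Stand-in for a real home-automation service call."""
    return f"lights set to {level}%"

# Each intent: a pattern, a handler, and arguments keyed by the matched word.
INTENTS = [
    (re.compile(r"turn the lights (down|up)"), dim_lights, {"down": 20, "up": 100}),
]

def dispatch(command: str) -> str:
    """Translate a natural-language command into a service-API call."""
    for pattern, handler, args in INTENTS:
        match = pattern.search(command.lower())
        if match:
            return handler(args[match.group(1)])
    return "Sorry, I don't understand that yet."

result = dispatch("Turn the lights down, Jeeves")   # → "lights set to 20%"
```

The gap between this and "negotiate the business contract, Jeeves" is precisely the reliability problem mentioned above: pattern tables fail loudly, while generative layers fail in ways that are much harder to bound.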

Virtual Companionship

While the metaverse went through its (short) day in the sun a few years back, there were many attempts to define precisely what the metaverse was, from Neal Stephenson-like interactive worlds to the libertarian ideal of completely opaque monetary transactions. Yet for all that, I very seldom heard what I believe is the essence of the metaverse: the idea of the companion.

So what do companions have to do with the metaverse? The connection should be fairly obvious: companions are intelligences that act as agents, avatars, and advisors. They occupy an incredibly privileged position in a given person's life: they have deep knowledge about that person (essentially their full transactional stack), have a certain amount of legal immunity (think of a locked diary), and can act as a proxy for that person in specific instances. They are personally aware. Companions may be the first actual instances of artificial general intelligence.

A person may have multiple "companions," though each can be considered another avatar of a generalized companion entity. Companions could be "imaginary friends", teachers, assistants, "older siblings," butlers, secretaries, game opponents, squires, fashion advisors, financial advisors, mentors, and yes, friends of a romantic nature.

From an abstraction standpoint, companions are interfaces that connect a broad global context (such as financial advice) with a specialized local context that identifies the information representing the individual in question. There is also the possibility that the global context may be supplemented with live actors, such as a human financial advisor or doctor who "puppets" the companion to supply the information that requires human perception or judgment. This is analogous to a human expert being called in when the options offered by an LLM are exhausted or unacceptable.
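That global/local layering can be sketched as follows. This is an illustrative abstraction only — the advice table stands in for a shared model (or a puppeting human expert), and the profile fields are invented:

```python
# Global context: generic, shared knowledge, not tied to any individual.
GLOBAL_ADVICE = {
    "low_savings": "Consider building an emergency fund first.",
    "ok": "You could explore index funds.",
}

class Companion:
    """Interface joining the global context to one person's local context."""

    def __init__(self, profile: dict):
        self.profile = profile   # local, private context about the person

    def advise(self) -> str:
        # The companion selects from the global context using local knowledge,
        # without exposing the underlying personal data to the caller.
        key = "low_savings" if self.profile["savings"] < 1000 else "ok"
        return GLOBAL_ADVICE[key]

jeeves = Companion({"savings": 500})
advice = jeeves.advise()   # a personalized pick from the global context
```

A human "puppet" would slot in exactly where `GLOBAL_ADVICE` is consulted, overriding the canned answer when judgment is required.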

Note that such companions are likely to be as controversial as generative AI has been. Identity management and authentication become critical, especially as most of the cues that would otherwise help identify a given companion as legitimate are potentially absent. Spoofing becomes an ever-present danger, as does determining liability when complex data streams are feeding the machine-learning portion of these components. Finally, creating Turing-proof companions carries key ethical and psychological implications, affecting everything from child development to the spread of disinformation.

Summary

Distributed intelligence is likely the next stage in the evolution of AI. This includes complex systems that represent multiple types of intelligence acting in concert, actors that effectively act as proxies for other intelligences, and dedicated contexts that provide learning-enabled companions for individuals. I figure they will build inexorably from where we are today, and that it will be this arena of cognitive development, not DeFi or "metaverses," that represents the next step toward artificial general intelligence.


Kurt Cagle is the editor of The Cagle Report. He lives in Bellevue, Washington, with his wife, kids, and weird furry wingless sociopathic dragons (meow).

Malome Tebatso Khomo

Everywhere, knowingly with the bG-Hum; Crusties!

1y

Always a refreshing read. More so this time, as it touches upon the AI craze that's in vogue without falling into the usual fanaticism. But it also instructs on the fate of anthropomorphic constructs, which are exhibited here as generally short-lived. Software beings do not stay long in the forefront of our social consciousness unless they adorn a narrative in a plot that travels with us in real life. I remember when object-oriented mix-ins came up at Symbolics and LISP; it seemed the world would change forever. But it has not, even though they're everywhere (like default methods in Java interfaces, for example). I would be interested in a functionalist software typology instead. As a precursor to the mixin, for example, I would nominate the humble visual icon with callbacks. For the Agent, I would suggest implementations of MVC patterns; and for desktop Companions, any reactive platform, be it JS or even XForms. But without plot or narrative à la Toy Story, they too will soon all be forgotten. PS: Just a day ago, I had an incidental exchange with an outfit that's sorely in need of a Companion. Very small team, hugely successful following. Total dependence on proto-companion bots that are doing more harm than good ...

Tripp Josserand Austin

Project Controls P6 Scheduler

1y

I want to design and build a companion.

Debbie Reynolds

The Data Diva | Data Privacy & Emerging Technologies Advisor | Technologist | Keynote Speaker | Helping Companies Make Data Privacy and Business Advantage | Advisor | Futurist | #1 Data Privacy Podcast Host | Polymath

1y

Kurt Cagle this is brilliant thought leadership and futurist thinking. You were right then and now.
