Building the first AI-native interface

For decades, our digital interfaces have remained fundamentally unchanged. We've moved from interactions rooted in physical print to mobile touchscreens, but the underlying paradigm persists - windows, folders, files, and pages, concepts that date back literally thousands of years. As the capabilities of AI become increasingly expansive, we're still trying to shoehorn it into interfaces derived from newspapers and books.

So what if we started from scratch?

The limitations of traditional interfaces

In exploring and building AI projects, I've kept running into an issue I've named "the Mercator problem." Just as the Mercator projection distorts our spherical Earth to fit a flat map, traditional interfaces force our naturally non-linear thoughts into rigid, flat structures. Our minds don't organise information in folders and files; we think in connections, associations, and relationships.

This mismatch becomes particularly acute when working with AI. Today's AI interfaces are predominantly text-based chat windows that flatten its potential into linear conversations. They ignore the natural way humans process information at multiple levels simultaneously.

The multimodal bridge

In the near to mid-term, I think AI interfaces will inevitably become multimodal, but in a far more elegant form than what we have now. Voice, text, and touch will blend together as complementary interaction channels rather than separate modes. We're already seeing this evolution with voice assistants that display visual responses, and chat interfaces that accept spoken input. But true multimodality goes beyond simply offering multiple input methods; it requires designing interfaces in which each modality serves the interaction style and context where it excels.

Voice provides freedom and spontaneity, and is massively popular with mobile natives, but lacks precision. Text offers exactness, granularity and permanence, but demands more attention. Touch and gesture create intuitive spatial relationships, but require physical engagement. An AI-first interface wouldn't just offer all these options; it would seamlessly transition between them based on the user's needs, environment, and the nature of the task.
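
To make that "seamless transition" a little more concrete, here's a minimal sketch of how an interface might pick a sensible default modality from context. Everything in it - the `InteractionContext` shape, the noise threshold, the function name - is a hypothetical illustration of the idea, not a description of any existing product or API.

```typescript
// Hypothetical sketch only: choosing a sensible default modality from context.
type Modality = "voice" | "text" | "touch";

interface InteractionContext {
  handsFree: boolean;          // e.g. walking, driving, cooking
  ambientNoiseDb: number;      // rough measure of how loud the environment is
  taskNeedsPrecision: boolean; // editing exact values, careful wording, etc.
}

function suggestModality(ctx: InteractionContext): Modality {
  // Precision work favours text: exactness, granularity, permanence.
  if (ctx.taskNeedsPrecision) return "text";
  // Hands-free and reasonably quiet: voice offers freedom and spontaneity.
  if (ctx.handsFree && ctx.ambientNoiseDb < 60) return "voice";
  // Otherwise fall back to direct manipulation.
  return "touch";
}

// Example: a noisy commute where the user just wants to browse.
suggestModality({ handsFree: true, ambientNoiseDb: 75, taskNeedsPrecision: false }); // "touch"
```

The point isn't the specific rules, which would be far richer in practice, but that the choice of modality becomes something the interface reasons about rather than something the user configures.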

This multimodal approach serves as a bridge between today's interfaces and truly AI-native designs. It breaks us free from the keyboard and screen while laying groundwork for more revolutionary approaches.

The "Z interface"

I'm in no way qualified to do this, but because I keep bumping into the constraints of the chat interface, I've been exploring several approaches to stepping out of it.

The concept I keep playing with involves adding literal depth to our interactions with AI. This interface would allow users to move fluidly between three layers:

  1. Voice (surface) - the simplest, most natural interaction mode. Pure conversation with the AI through speech, requiring minimal cognitive load and technical understanding.
  2. Content (middle) - where information materialises into readable, editable content. This layer includes transcriptions, artefacts, and source materials, with tools for typing, editing, and organising information.
  3. Structure (deep) - the knowledge graph beneath it all. Users can see and navigate the AI-structured network of information that underpins their work. Users of Roam would recognise this graph of nodes and associations.

Movement between these layers would happen through intuitive zoom gestures: pinching out to go deeper, pinching in to zoom back out to simplicity.
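
As a rough sketch of how those layers and transitions might be modelled - all the names here, like `ZLayer` and `nextLayer`, are invented for illustration rather than taken from any real system:

```typescript
// Hypothetical sketch only: the three z-layers and pinch-driven movement between them.
const LAYERS = ["voice", "content", "structure"] as const;
type ZLayer = (typeof LAYERS)[number];

type PinchDirection = "out" | "in"; // out = spread fingers = go deeper; in = back towards simplicity

function nextLayer(current: ZLayer, pinch: PinchDirection): ZLayer {
  const i = LAYERS.indexOf(current);
  // Clamp so the user can't zoom past the voice surface or below the knowledge graph.
  const j = pinch === "out" ? Math.min(i + 1, LAYERS.length - 1) : Math.max(i - 1, 0);
  return LAYERS[j];
}

// Example: pinching out twice from a conversation lands on the knowledge graph.
let layer: ZLayer = "voice";
layer = nextLayer(layer, "out"); // "content"
layer = nextLayer(layer, "out"); // "structure"
```

Clamping rather than wrapping keeps the gesture predictable: you can never "fall off" either end of the stack.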

This z-axis approach might help to address fundamental challenges in information representation. Unlike traditional flat interfaces, this model provides natural ways to organise and access information at different levels of complexity: casual conversation at the voice layer, structured content in the middle, and strong, weak and abstract links between nodes of knowledge at the structural level.
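
For the structural level specifically, a knowledge graph with typed links might look something like the sketch below. The node shape and the strong/weak/abstract labels are my own assumptions for illustration, not a description of how Roam or any other tool actually stores its graph.

```typescript
// Hypothetical sketch only: nodes and typed links in the structure layer.
type LinkStrength = "strong" | "weak" | "abstract";

interface KnowledgeNode {
  id: string;
  label: string;       // e.g. "ocean acidification"
  sourceIds: string[];  // content-layer artefacts this node was drawn from
}

interface KnowledgeLink {
  from: string;            // node id
  to: string;              // node id
  strength: LinkStrength;  // strong correlation, weak association, or abstract analogy
}

interface KnowledgeGraph {
  nodes: Map<string, KnowledgeNode>;
  links: KnowledgeLink[];
}

// Relationship-based navigation: follow a node's outgoing links instead of opening folders.
function neighbours(graph: KnowledgeGraph, nodeId: string, strength?: LinkStrength): KnowledgeNode[] {
  return graph.links
    .filter(l => l.from === nodeId && (strength === undefined || l.strength === strength))
    .map(l => graph.nodes.get(l.to))
    .filter((n): n is KnowledgeNode => n !== undefined);
}
```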

This interface naturally accommodates different input modalities (voice, text, touch) by assigning them to appropriate layers while maintaining seamless transitions between them, an "invisible handover."

Building how we think

This approach is inspired by how our minds actually work. The human brain processes information at multiple levels simultaneously:

  • We engage in natural conversation with minimal conscious effort
  • We create and consume content through focused attention
  • We build mental models that connect ideas in complex and abstract networks

The z-axis interface mirrors this cognitive architecture and would allow us to interact with AI in a way that feels natural and intuitive, rather than forcing our thoughts into rigid structures designed for physical paper or basic computing.

Imagine working with an AI interface where:

  • You begin with a simple voice question about climate science
  • As the conversation develops, you zoom into the content layer to see sources, edit text, and take notes
  • You pinch deeper into the structure layer to see how global temperature correlates with ocean acidification and carbon emissions, and how all of this relates to your previous notes and past conversations
  • You can then zoom back out to voice to continue the conversation with this enriched understanding
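
Reusing the hypothetical names from the earlier sketches, that walkthrough reduces to a handful of layer transitions:

```typescript
// Hypothetical trace of the climate-science walkthrough, reusing the sketch types above.
let current: ZLayer = "voice";          // ask the opening question out loud
current = nextLayer(current, "out");    // "content": sources, transcript, editable notes
current = nextLayer(current, "out");    // "structure": the knowledge graph
// e.g. neighbours(graph, "ocean-acidification") might surface "carbon-emissions" and
// "global-temperature", plus links back to earlier notes and past conversations.
current = nextLayer(current, "in");
current = nextLayer(current, "in");     // back at voice, carrying the enriched context
```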

First principles

As we move toward truly AI-native interfaces, we might consider some new design principles:

  1. Design for thought, not things - create interfaces that map to cognitive processes rather than physical objects
  2. Embrace multimodality - allow seamless transitions between voice, text, touch, and gesture
  3. Respect context - adapt the interface based on the user's current needs and situation
  4. Relationship-based navigation - let users move through information based on common nodes and connections
  5. Balance simplicity and depth - provide both casual and deep interaction modes for different situations

I'm not an engineer and I'm barely a designer, but it's obvious the future of human-AI interaction won't just be a better chat window.

If we're brave, it'll be something entirely new, designed from first principles for a world where computation is abundant, contextual, and conversational.


Dan Leitão

Chief Product Officer at SkootEco & Sticky Ventures

6 days ago

We were working on a project related to this, and I never realized how much I appreciate a dropdown. When you take it away and ask me to write a sentence explaining what two clicks could accomplish, it leads to some "fun" debates. Raffaele Fulgente and Michael McClure will remember this.


INTERESTING stuff, Paul. Like the Z-layer pen pic. We've been looking a lot at what we can realistically expect gen AI to do with precision now, and I really like some of Benedict Evans' writing on the subject. Your Z-layer supports his notion that AI currently works well with things that don't have a wrong answer: your top layer; presentational, subjective, about knowledge exchange. A lot of the stuff we're dealing with is organising the middle layer; informative, objective, about knowledge mapping. AI is useful for parts of this, but needs to be held together by things with more substance (and carefully checked). Here's a great summary of why that is: https://www.ben-evans.com/benedictevans/2025/1/the-problem-with-better-models. I'm less sure of - and therefore very curious about - that bottom layer. What it is, how it is stored, queried, consumed. Really interesting and thoughtful article though. Thank you. Steve Erdal Hugh Volpe Steph Clish

Romona Harron-Harding

Founder | Director - Vrai Limited: The Legal ESG Critical Friend | Practising Solicitor | Qualified Barrister (unregistered) | Developing Software | N. Irish

1 month ago

A thought-provoking read Paul Smith OBE. Thanks for sharing.

Calum Cameron

Digitalising industry for sustainability, innovation and resilience.

1 month ago

There's a manifesto in here somewhere! Thanks for sharing Paul. We are going down the same line of using AI to remove the need for interfaces and imposed organisational models that have deterred "real economy" SMEs from digitalising intensively to date. Finding the language to explain it to others (and ourselves!) is invaluable.

Nick McBlain

Co-founder @ Lumion // FL c.2

1 month ago

J.A.R.V.I.S., you there?

