Computer, What Is the Right AI Interface?

Mark Gillespie

Published: October 27, 2024

“Computer?” says Montgomery Scott, the famous starship engineer, to an OG Macintosh Plus in Star Trek IV: The Voyage Home. Scotty and the rest of his crew are trapped in the 20th century, trying to make sense of its “quaint” technology and customs. After fumbling for a minute, Dr. “Bones” McCoy helpfully hands him the mouse, which Scotty mistakes for a microphone. Scotty is so confused because he is more familiar with the AI interface he uses aboard the U.S.S. Enterprise—his own voice.

When crewmembers address the ship’s computer, they say a “Hey, Siri”-like prompt and then ask a question. Our collective cultural view of the future helped prepare us for Siri, Alexa, Google Assistant, and other personal assistants.

I’m not alone in thinking so. “Science fiction does something better than predict the future: It influences it,” says sci-fi writer Cory Doctorow in a 2017 article in Slate. Doctorow is among many futurists who claim science fiction visions from decades ago have influenced today’s engineers and programmers.

I wonder if Scotty would have an easier time whipping up a formula for transparent aluminum if he found himself in the early-ish 21st century with our touch interfaces, keypads, and embryonic chat-based interfaces.

My Beef With GUI Platforms

Okay, let’s get this out of the way. I’m kind of old.

In 1990, just a few years after Star Trek IV came out, I began my first full-time job. I was a combination newspaper reporter and darkroom technician. One of my tools was for “dodging,” a round piece of cardboard attached to a Popsicle stick, which I frantically waved over part of a photo to keep it lighter while the rest of the picture developed. There was another technique called “burning,” where I made an OK sign with my hand and waved it over a part of a photo I wanted to be darker than the rest.

Today, my copy of Adobe Photoshop 2025 still has dodge and burn tools. Guess what they look like?

Adobe Photoshop 2025 dodge and burn tools.

That’s right! The Popsicle stick thingie and my fingers.

At my newspaper, we would make Xerox copies of our laser-printed stories, cut them into column-sized strips, and use heated wax to paste them onto large paper broadsheet templates so a technician could create photo-sensitive plates to attach to the printing press. In other words, my job was to make a copy of my story and then cut and paste it into the newspaper.

Here’s my beef with graphical user interfaces: They don’t always exist to help us learn the application more efficiently. Many apps have so many buttons and menus exposed that we have to take classes to discover them all. GUIs limit our choices and keep us from experiencing the full power of our computers and networks until a user interface engineer somewhere decides to include a button for a new feature.

In the 1980s, Apple and Microsoft gave us files and folders, a familiar metaphor of the desktop to help comfort office workers. Word processing programs like Microsoft Word added fonts, menus, and little buttons that made our words bold, italicized, or underlined. Then came spreadsheets for calculations and PowerPoint for office presentations. Adobe’s whole raft of design, illustration, and video tools became the closely guarded province of specialists who export finished files but don’t want end users monkeying with the source files.

Format and capability became commoditized, the apps themselves creating silos that influenced the staffing structure of entire companies. Google Workspace, Canva, and other browser-based tools helped simplify and democratize some of these tools, but you still see the same controls based on analog tools, like echoes from the past.

What is it with our obsession with slideshow presentations? After all, a “slide” is a little 35-millimeter piece of film mounted on cardboard. We’d shine a light through it to display our still images on a white screen. Kodak sold the first carousel-style slide projector in 1965. How has our corporate presentation style changed substantially since then?

Heck, I’m typing this on a QWERTY keyboard, with letters arranged that way because 19th-century typewriters would get jammed if the keys were arranged ABCDEF.

An 1878 Remington typewriter would be familiar to anyone tapping away on a smartphone today. WIKIMEDIA COMMONS

Touch-activated app design gives us a smoother experience, adding scrolling, swiping, pinching, and tapping, but it’s still buttons, menus, and keyboards on a smaller screen. Digital natives who have never set foot in a darkroom or typed a letter on a Smith Corona are still shackled to the same silos and pre-chosen functions.

It reminds me of early elevator design. You know, the ones with the guy in the uniform who would ask which floor you want?

Automatic elevator buttons—a user interface—had been around since 1892, but elevator passengers didn’t feel comfortable riding in them without an expert “driver.” Elevator designers had to develop a whole comforting sensory grammar of an elevator to lure passengers into automatic elevator cars. Buttons became rounded and inviting. The lighting was muted, and soothing music piped in. Nothing threatening here. Don’t think about the 40-story drop beneath your feet.

The Chatbot Reboot

Along come consumer-facing LLMs. ChatGPT, Claude, Gemini, and the rest could have chosen almost any other interface, but they picked the chat interface.

For one, it’s familiar to everyone alive who has ever used the internet. Programmers recognize it as a command line. Early GenX adopters like myself remember text-based chatrooms and instant messaging. Social media users see feeds and DMs.

As interface designer Amelia Wattenberger beautifully explains, chat is not the best AI interface these companies could have chosen, but it’s arguably the most recognizable.

New LLM users start with a conversation, and the chatbot responds in a surprisingly human way. We might decide to push some boundaries, and it keeps up. We quickly discover the inaccuracies and hallucinations plaguing AI output and learn its limitations. Advanced models, such as GPT-4o in ChatGPT, can morph into other software. By asking for an image, you activate DALL-E, which will create an image for you and offer rudimentary spot editing. In ChatGPT Canvas, your document opens in its own frame with a stripped-down word-processing toolbar and a chat-based sidebar.

The downside of chat-based interfaces is that they create work in one long chain that must be transferred to other apps. When I use ChatGPT for short-form content, I have to save the text fragments to a separate Google Doc to assemble longer content. If I need more words than a single response can hold, I have to patch the pieces together elsewhere.

I would love for ChatGPT to display a web form for structured data entry or a single live document that can add to itself and be saved in a predictable place. However, we’re still bound to the long, linear chat, with errors and inconsistencies piling up the longer we iterate.

The Ideal AI Interface of the Future

In my daily work, I have an AI window open in one browser and regularly use about 20 other apps, including a calendar, email client, Slack threads, PDF readers, task management system, Workspace apps, video calls, and a second GUI-based AI platform. I encounter hundreds of different interface schemes on random websites that I visit for research and utility.

Wouldn’t it be great if I could access anything I wanted from a familiar interface?

Let’s say I wanted a live table embedded in a word processing document that I will present—as an automatically designed slideshow—over a video call. I should be able to put that together using my prompt engineering skills. My calendar listings should auto-populate with transcripts of the calls I took. If I want to know the name of a project manager on a particular client, I should be able to ask for it and see the answer without clicking through several screens in a complex project management system.

If I want to change every heading in a lengthy document to Heading 2, Comic Sans, I should have a prompt window to ask for this. A Google search bar or AI research tool should be accessible from whatever workspace I am in, and those workspaces should be able to morph from one kind of app to another.

If I’m working on a report, I shouldn’t have to tab between several kinds of apps. The report itself should be able to multiply itself, like a stem cell, into several different versions, sendable from the get-go to review teams, social media feeds, email recipients, or traditional publishing platforms.

Choose Your Interface Carefully

As AI companies roll out new products, whether a commercial platform with limited functions or a freewheeling, embryonic LLM service, they should pay attention to the visual touchpoints of their user interface.

Look for the equivalents of the “dodge” and “burn” tools, those relics of a time when creative people toiled in dimly lit rooms with masking tape and smelly chemicals. If you see lots of buttons and menus, reconsider your choices. Be willing to endure a slight learning curve to adapt your expectations to the best possible tools.

Let’s normalize toolbars that understand the context of what you’re trying to accomplish or apps that can seamlessly adapt themselves to your task. There’s no reason why an AI-supported document can’t also become a spreadsheet, a website, or a slideshow. We should favor systems that break the familiar paradigms of documents, slides, files, folders, and desktops.

Imagine a device the size of a smartphone that contains your entire digital life and can connect to any display, including smart glasses or contact lenses. Anything we need to do professionally would be called up by the sound of our voice. If we want a traditional QWERTY keyboard, one will appear. A full studio mixing board or artists’ paintbrushes could be summoned. There may be ways of interacting with an extensive, distributed network that we haven’t yet imagined. The user could even customize interfaces on the spot for one particular task.

When my grandkids sit at an antique “desktop” computer in a museum somewhere, I hope they’ll be just as confused by the mouse and keyboard as our favorite starship engineer.

This article is 100% human written on a keyboard designed to produce one letter at a time. I used Grammarly Pro to correct my mistakes and suggest better ways of writing things.
