AI Unveiled: What is (and isn't) AI?
Welcome to "AI Unveiled," where we take a closer look at artificial intelligence and show you what's really going on behind the scenes. This time around, Virbe's CEO Krzysztof (Chris) Wrobel will pull back the curtain on what AI can truly do – and what it can't.
In recent years, AI tools such as ChatGPT and DALL-E have captured our imagination with their ability to engage in human-like conversation, generate text on any topic, and create images from more or less vague descriptions.
This has led to a common belief: if an AI can understand and respond to our requests so well, surely it can do anything we ask of it. But the reality of AI capabilities is far more nuanced.
Artificial Intelligence Lingo
First things first, what do we mean when we say "AI" nowadays? AI as a scientific field has been around for many years, encompassing a broad range of more or less advanced technologies and approaches aimed at creating intelligent machines. Key among these are machine learning, which enables algorithms to learn from data and make predictions without explicit programming; deep learning, which uses neural networks to analyze and learn from vast amounts of data to achieve sophisticated tasks; and NLP (Natural Language Processing), which allows computers to understand, interpret, and manipulate human language. There are numerous applications of "AI" systems based on these (and more) technologies in today's world, such as recommendation systems in online stores or streaming services, smartphone camera filters, and photo enhancements. Even the spam filters in your email are powered by "AI," which has been evolving since the '90s!
However, when you hear people talk about "AI" today, they usually mean the kind that they interact with online – assistant-like systems (e.g., ChatGPT, Claude, Perplexity) that can understand and generate human-like text and speech, answer questions from documents and the internet, analyze and create images, and even perform tasks for you on your computer, for example, in your email inbox or calendar.
The Myth: One AI to Rule Them All
There's a common belief that when you talk to an AI chatbot, you're interacting with a single, super-smart AI that knows and understands everything. The chat flows smoothly, and the AI seems able to handle just about anything you throw at it. Because it communicates in natural language, we start to associate its capabilities with human intelligence. If something can talk sensibly, explain its reasoning, and create as a human would, we tend to assume it must be similar to us. But is this really the case?
The Reality: A Symphony of Specialized Systems
When you interact with an AI chatbot that seems to effortlessly switch between complex tasks while communicating in the most human way possible, it's easy to expect it to perceive the world as we do: to reason logically, to know facts and truths, and, in short, to be a coherent, intelligent unit. In reality, what you're experiencing is a carefully orchestrated integration of multiple agents using specialized services.
These services usually fall into the following categories within the current AI ecosystem:
Large Language Models (e.g., GPT-3, Llama)
Image Generation Models (e.g., DALL-E, Stable Diffusion, Midjourney)
Speech Recognition and Synthesis (e.g., Azure Cognitive Services, Whisper, ElevenLabs, Google Speech)
Computer Vision
When you chat with an advanced AI assistant like ChatGPT, a carefully constructed pipeline works behind the scenes, orchestrating a variety of tools, APIs, and services – including specialized AI models and other agents. The LLM part usually takes care of transforming your input into computer-understandable "intents" and delivering a response. If an intent requires a more complex task, the pipeline or agent triggers a different part of the system (e.g., an image generation service), and that service's output is returned to you. The tricky part is the illusion that the "talking" part can "see" the generated image just as you can. In reality, it only relays messages between you and the other system so that the other system can tweak the image if needed.
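To make the pipeline idea more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how ChatGPT or any real assistant is implemented: the names `detect_intent`, `image_service`, `text_service`, and `handle` are hypothetical, and simple keyword matching stands in for the LLM so the example runs offline.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    name: str     # which service should handle the request
    payload: str  # the text handed to that service


def detect_intent(user_message: str) -> Intent:
    """Stand-in for the LLM step that turns free text into an intent.

    Assumption: a real system would call a language model here.
    Keyword matching is only a placeholder for illustration.
    """
    if user_message.lower().startswith(("draw ", "paint ")):
        # Drop the leading verb and pass the rest on as the image prompt.
        return Intent("generate_image", user_message.split(" ", 1)[1])
    return Intent("chat", user_message)


def image_service(prompt: str) -> str:
    """Hypothetical image model: returns a reference, not the pixels."""
    return f"image-ref:{prompt}"


def text_service(prompt: str) -> str:
    """Hypothetical text-only reply path."""
    return f"text-reply:{prompt}"


# The orchestrator only knows which service handles which intent.
SERVICES = {"generate_image": image_service, "chat": text_service}


def handle(user_message: str) -> dict:
    """Route one user message through the pipeline."""
    intent = detect_intent(user_message)
    result = SERVICES[intent.name](intent.payload)
    return {"intent": intent.name, "result": result}


if __name__ == "__main__":
    print(handle("Draw a red bicycle"))
    print(handle("What is NLP?"))
```

Notice that the "talking" part in this sketch never looks at an image. It only passes a text prompt to the image service and hands back whatever reference that service returns.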
The Future: Large Multimodal Models
More and more AI tools (e.g., GPT-4o, Claude 3.5 Sonnet) combine the capabilities of different systems (modalities), letting users rely on fewer tools to achieve similar results or to simplify their workflows. However, many of them still rely on connections between separate AI services and are usually limited to simple text and image inputs and outputs. The natural progression is towards much more potent, truly multimodal models (e.g., capable of real-time video generation), but we shouldn't expect this transition very soon. We have grown used to the rapid, somewhat revolutionary growth of LLMs over the past few years, but now, in 2024, progress seems to have slowed significantly, as the computational and algorithmic complexity of each next leap in the technology grows exponentially.
Why It's Not As Simple As It Looks
While the setup of current AI tools is impressive, it's fundamentally different from how human intelligence works:
Why This Matters
Understanding these realities helps us appreciate the true capabilities of AI and use it more effectively. AI is a powerful tool, but it's not magic – it's a collection of specialized technologies working together under human direction. Knowing this is crucial for several reasons:
Looking Ahead
The future of AI is exciting, with researchers working towards truly integrated systems that can seamlessly combine different types of intelligence. But for now, it's important to appreciate these tools for what they are – clever combinations of specialized technologies, not all-knowing digital entities.
Stay tuned for more episodes of "AI Unveiled," where we'll uncover LLM-based techniques, such as RAG and Agents, explaining what's needed to create ChatGPT-like apps and incorporate them into your business.
Follow Virbe and visit our website www.virbe.ai.