AI Unveiled, Episode 1: What is (and isn't) AI?


Welcome to "AI Unveiled," where we take a closer look at artificial intelligence and show you what's really going on behind the scenes. This time around, Virbe's CEO Krzysztof (Chris) Wrobel will peel back the curtain on what AI can truly do – and what it can't.


In recent years, AI tools like ChatGPT and DALL-E have captured our imagination with their ability to engage in human-like conversation, generate text on any topic, and create images from more or less vague descriptions.

This has led to a common belief: if an AI can understand and respond to our requests so well, surely it can do anything we ask of it. But the reality of AI capabilities is far more nuanced.

Artificial Intelligence Lingo

First things first: what do we mean when we say "AI" nowadays? AI as a scientific field has been around for decades, encompassing a broad range of technologies and approaches, some more advanced than others, aimed at creating intelligent machines. Key among these are machine learning, which enables algorithms to learn from data and make predictions without being explicitly programmed for each case; deep learning, which uses multi-layered neural networks to learn from vast amounts of data and handle sophisticated tasks; and NLP (Natural Language Processing), which allows computers to understand, interpret, and manipulate human language. There are numerous applications of "AI" systems based on those (and more) technologies in today's world, such as recommendation systems in online stores or streaming services, smartphone camera filters, and photo enhancements. Even the spam filters in your email are powered by "AI" that has been evolving since the '90s!
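
To make "learning from data without explicit programming" concrete, here is a minimal sketch of a spam filter built on word counts (a naive Bayes classifier). The training messages, labels, and function names are assumptions made up for illustration; real filters are trained on far more data and use many more signals.

```python
import math
from collections import Counter

# Tiny, made-up training set (an assumption for illustration only).
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
    ("project agenda and notes", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-score."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, label_counts = train(TRAINING)
print(classify("free prize now", word_counts, label_counts))
print(classify("monday project meeting", word_counts, label_counts))
```

Nobody wrote a rule saying that "free" signals spam; the classifier picks that up from the counts in the examples it was given, which is the sense in which the behavior is learned rather than programmed.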

AI encompasses a vast range of technologies, like machine learning, deep learning, or neural networks.

However, when you hear people talk about "AI" today, they usually mean the kind that they interact with online – assistant-like systems (e.g., ChatGPT, Claude, Perplexity) that can understand and generate human-like text and speech, answer questions from documents and the internet, analyze and create images, and even perform tasks for you on your computer, for example, in your email inbox or calendar.

The Myth: One AI to Rule Them All

There's a common belief that when you talk to an AI chatbot, you're interacting with a single, super-smart AI that knows and understands everything. The chat flows smoothly, and the AI seems able to handle just about anything you throw at it. Because it communicates in natural language, we start to associate its capabilities with human intelligence. If something can talk sensibly, explain its reasoning, and create as a human would, we tend to assume it must be similar to us. But is this really the case?

The Reality: A Symphony of Specialized Systems

When you interact with an AI chatbot that seems to effortlessly switch between complex tasks while communicating in the most human way possible, it's easy to expect it to perceive the world as we do: to be capable of logical reasoning, to know facts and truths; in short, to be a coherent, intelligent unit. In reality, what you're experiencing is a carefully orchestrated integration of multiple agents using specialized services.


Each AI system excels at specific tasks—like generating text or recognizing objects.

These services usually fall into the following categories within the current AI ecosystem:

Large Language Models (e.g., GPT-3, Llama)

  • Capabilities: Text transformation and generation, language understanding
  • Limitations: Lack of true comprehension, potential for misinformation, reliance on complex pipelines and specialized components working together

Image Generation Models (e.g., DALL-E, Stable Diffusion, Midjourney)

  • Capabilities: Creating images from text descriptions
  • Limitations: Lack of true understanding of the physical world, potential copyright issues

Speech Recognition and Synthesis (e.g., Azure Cognitive Services, Whisper, ElevenLabs, Google Speech)

  • Capabilities: Converting speech to text and vice versa
  • Limitations: Accuracy issues with accents, background noise

Computer Vision

  • Capabilities: Image analysis, object recognition
  • Limitations: Lack of true understanding of the scene, vulnerable to subtle image alterations

LLMs are just one type of specialized AI system.

Currently, when you're chatting with an advanced AI assistant like ChatGPT, what sits behind the scenes is a cleverly constructed pipeline orchestrating a variety of tools, APIs, and services – including specialized AI models and other agents. It's usually the LLM part that takes care of transforming your inputs into computer-understandable "intents" and delivering a response. If the intent requires a more complex task, the pipeline or agent triggers a different part of the system (e.g., an image generation service) and returns its result to you. What may be tricky in this scenario is the illusion that the "talking" part can "see" the generated image just as you can. In reality, it only relays messages between you and the other system so that the latter can tweak the image if needed.
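
The routing idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not how ChatGPT or any specific product is implemented: `detect_intent`, `image_service`, `chat_service`, and `handle` are hypothetical names, and the keyword check merely stands in for the LLM step that maps free text to an intent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Reply:
    text: str
    # Opaque reference to a generated file; the chat layer never reads pixels.
    attachment: Optional[str] = None


def detect_intent(user_message: str) -> str:
    """Stand-in for the LLM step: map free text to an intent label."""
    lowered = user_message.lower()
    if "draw" in lowered or "image" in lowered:
        return "generate_image"
    return "chat"


def image_service(prompt: str) -> Reply:
    """Stand-in for a separate image generation service."""
    image_ref = f"generated-image-{len(prompt)}.png"
    return Reply(text="Here is your image.", attachment=image_ref)


def chat_service(prompt: str) -> Reply:
    """Stand-in for plain text generation."""
    return Reply(text=f"(text reply to: {prompt})")


# The pipeline: one table that maps each intent to the service handling it.
SERVICES: Dict[str, Callable[[str], Reply]] = {
    "generate_image": image_service,
    "chat": chat_service,
}


def handle(user_message: str) -> Reply:
    """Detect the intent, call the matching service, relay its result."""
    intent = detect_intent(user_message)
    return SERVICES[intent](user_message)


print(handle("Please draw a cat in a hat"))
print(handle("What is machine learning?"))
```

Note that `handle` only passes along a file reference from the image service. Nothing in the conversational path ever inspects the picture itself, which is the gap between "talking about" an image and "seeing" it.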

ChatGPT is a clever implementation of an LLM and a variety of other specialized AI services.


The Future: Large Multimodal Models

More and more AI tools (e.g., GPT-4o, Claude 3.5 Sonnet) combine the capabilities of different systems (modalities), allowing users to rely on fewer tools to achieve similar results or to simplify workflows. However, many of them are still based on connections between separate AI services and are usually limited to simple text and image inputs and outputs. The natural progression will be towards much more potent, truly multimodal models (e.g., capable of real-time video generation), but we shouldn't expect this transition very soon. We got used to the rapid and somewhat revolutionary growth of LLMs in the past few years, but now, in 2024, progress seems to have slowed significantly, as the computational and algorithmic complexity of each next leap in technology grows exponentially.

Large Multimodal Models

  • Capabilities: handling different inputs and outputs, like text, image, audio
  • Limitations: computational and algorithmic challenges, expensive to scale and/or fine-tune

More and more AI tools combine the capabilities of different modalities, not only text.


Why It's Not As Simple As It Looks

While the setup of current AI tools is impressive, it's fundamentally different from how human intelligence works:

  1. Lack of True Understanding: These systems don't "understand" in the way we do. They excel at pattern recognition and statistical correlations but lack deep, contextual comprehension.
  2. Specialized vs. General Intelligence: Each AI component is highly specialized. They're great at specific tasks but lack the general intelligence and adaptability of the human mind.
  3. Data Dependence: AI relies heavily on patterns in training data. Unlike humans, it can't make intuitive leaps or come up with truly novel ideas.
  4. Limited Integration: While these systems work together efficiently, they don't truly "communicate" or share understanding like different parts of the human brain do.
  5. Ethical and Bias Issues: AI can perpetuate or amplify biases present in training data, and privacy concerns arise with integrated AI systems.
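
A deliberately tiny example can show what "patterns in training data" means in practice. The sketch below is a bigram model: it only records which word most often follows another in a made-up corpus. It is a drastic simplification and not how modern LLMs are built; the corpus and function names are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Made-up training text (an assumption for illustration only).
CORPUS = "the cat sat on the mat . the cat sat on the sofa ."


def build_bigrams(text):
    """Record, for every word, how often each other word follows it."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def continue_text(start, follows, length=3):
    """Extend `start` by repeatedly picking the most frequent next word."""
    words = [start]
    for _ in range(length):
        options = follows.get(words[-1])
        if not options:
            break  # Nothing in the training data follows this word.
        words.append(options.most_common(1)[0][0])
    return " ".join(words)


follows = build_bigrams(CORPUS)
print(continue_text("the", follows))  # the cat sat on
print(continue_text("dog", follows))  # dog
```

The first call produces a fluent-looking phrase purely from co-occurrence counts, while the second stalls immediately because "dog" never appeared in the training text. Fluency here comes from statistics, not from any grasp of cats or sofas.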

AI's ability to mimic human language creates an illusion of comprehension.


Why This Matters

Understanding these realities helps us appreciate the true capabilities of AI and use it more effectively. AI is a powerful tool, but it's not magic – it's a collection of specialized technologies working together under human direction. Knowing this is crucial for several reasons:

  • Setting realistic expectations: knowing the limitations of AI helps us use these tools and create solutions more effectively and avoid disappointment.
  • Ethical considerations: each specialized AI comes with its own set of biases and limitations, which can compound when combined.
  • Reasonable discourse online: getting our terminology and understanding of AI systems straight will make a huge difference in how we communicate ideas and understand tools or products.
  • Future development: as we work towards more advanced AI systems, understanding the current state helps us appreciate the challenges ahead and the potential breakthroughs.
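
The point about limitations compounding can be illustrated with simple arithmetic. The numbers below are hypothetical, not measurements of any real service, and the calculation assumes that errors in each stage are independent, which real pipelines rarely guarantee.

```python
import math

# Hypothetical per-stage success rates, chosen only for illustration.
STAGE_ACCURACY = {
    "speech_to_text": 0.95,
    "language_model": 0.90,
    "text_to_speech": 0.98,
}


def end_to_end_accuracy(stages):
    """Multiply per-stage rates, assuming stage errors are independent."""
    return math.prod(stages.values())


overall = end_to_end_accuracy(STAGE_ACCURACY)
print(f"{overall:.4f}")  # 0.8379
```

Under these assumptions, three components that each look reliable on their own deliver a correct end-to-end result only about 84% of the time, which is lower than the weakest single stage.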

Knowing AI's limitations prevents disappointment and helps you use these tools more effectively and realistically.


Looking Ahead

The future of AI is exciting, with researchers working towards truly integrated systems that can seamlessly combine different types of intelligence. But for now, it's important to appreciate these tools for what they are – clever combinations of specialized technologies, not all-knowing digital entities.

Stay tuned for more episodes of "AI Unveiled," where we'll uncover LLM-based techniques, such as RAG and Agents, explaining what's needed to create ChatGPT-like apps and incorporate them into your business.

Follow Virbe and visit our website www.virbe.ai.


Krzysztof (Chris) Wrobel

Founder & CEO at Virbe - adding a human touch to the online and in-person customer experiences with AI-powered Virtual Beings

7 months ago

The first episode is up - many more to follow!

Olga Jakubowska

Head of Product & Conversational AI @Virbe

7 months ago

Love this series!
