Unveiling the Layers of Artificial Intelligence: From Search Engines to Actionable Agents

Srinivas Rao Marri

Enterprise Architect | Cloud Security Strategist | Data Science Innovator | AI/ML & Gen AI Leader | Transforming Technology with Secure & Intelligent Solutions

Published: October 17, 2024

In recent years, artificial intelligence (AI) has grown from a specialized research field to a transformative technology shaping industries worldwide. From self-driving cars and virtual assistants to predictive healthcare and content generation, AI’s impact is undeniable. However, at its core, AI builds upon foundational principles that stretch across multiple disciplines, from computer science to advanced neural networks. Understanding these layers, from machine learning to multi-model AI agents, is essential for grasping the full potential of this technology and its future trajectory.

The diagram below traces the evolutionary path from search engines (like google.com) to answer engines (like ChatGPT) and ultimately to action engines (like Apple Siri, Google Assistant, and Tesla Autopilot), highlighting how data is transformed and processed along the way by indexes, transformers, and agents to support increasingly sophisticated tasks, from data retrieval to action execution.

The Layers of Intelligence

This article will take you through a structured exploration of the key components that make AI what it is today. We’ll begin with the broadest field, computer science, and journey through essential layers such as machine learning, deep learning, generative AI, and large language models. Along the way, we’ll highlight the significance of foundation models, which form the backbone of many modern AI systems, and discuss multi-model AI agents that combine multiple technologies for intelligent decision-making. By the end, you’ll have a clearer understanding of how these components interact to create the AI-driven world we live in today and the one we’re rapidly moving towards.

Computer Science (CS): This is the broadest field that encompasses all aspects of computing, including AI, algorithms, hardware, software, and more. AI is a subset of CS.

Artificial Intelligence (AI): AI is a subset of computer science that focuses on the development of intelligent systems capable of performing tasks that typically require human intelligence, such as perception, reasoning, learning, problem solving, and decision making. AI includes a variety of subfields, including machine learning, natural language processing, and robotics.

Machine Learning (ML): ML is a subset of AI that involves the development of algorithms and statistical models that enable machines to learn. These methods use data to improve their performance on tasks through experience, without being explicitly programmed for each task.

Deep Learning: Deep learning is a subset of machine learning that focuses on neural networks. It borrows the concept of neurons and synapses from the way the human brain works.

With deep learning methods, systems can analyze many images and videos in seconds.

Generative AI: Generative AI is a subset of deep learning focused on generating new content (such as text, images, or audio) based on patterns and structures learned from training data.

Foundation Models:

A model is a mathematical representation or system designed to understand patterns within data and generate new content based on that understanding. Generative AI is powered by models that are pretrained on internet-scale data; these models are called foundation models.

Language Models (LM)

A language model is a type of AI model designed to understand, predict, or generate human language. It is trained on large amounts of text data to learn how to model the probability distribution of words, sentences, or longer pieces of text. The model can predict the next word or sequence of words based on the given input. Language models are used for various natural language processing (NLP) tasks such as text generation, machine translation, sentiment analysis, and more.
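To make the idea of "modeling the probability distribution of words" concrete, here is a minimal sketch of a bigram language model in Python. It is far simpler than the neural models discussed in this article, and the three-sentence corpus is made up for illustration, but it shows the same core step: estimate how likely each word is to follow another, then predict the most likely next word.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count how often each word follows each other word."""
    follow_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts


def next_word_distribution(model, word):
    """Return P(next word | word) as a dict of probabilities."""
    counts = model.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


# Tiny illustrative corpus (made up for this example).
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

model = train_bigram_model(corpus)
distribution = next_word_distribution(model, "the")
most_likely = max(distribution, key=distribution.get)
print(most_likely, round(distribution[most_likely], 3))
```

Large language models replace the count table with a neural network and condition on much longer contexts, but the output is still a probability distribution over the next token.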

In Natural Language Processing (NLP), parts of speech (POS) play a crucial role in language models by providing insights into the grammatical structure of sentences.

Sentiment analysis and English parts of speech (POS) are closely related because understanding the grammatical roles of words helps sentiment analysis models interpret the meaning and sentiment of text more accurately. Sentiment analysis is the task of determining whether a piece of text expresses a positive, negative, or neutral sentiment. Parts of speech tagging helps sentiment analysis by identifying key components of the text that carry emotional weight, such as adjectives, verbs, and adverbs.
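As a rough illustration of how part-of-speech tags can inform sentiment analysis, the sketch below scores a sentence using a tiny sentiment lexicon and weights each word by its tag. The lexicon, the tag weights, and the hand-tagged sentences are all assumptions made for this example; a real system would use a trained POS tagger and a learned sentiment model.

```python
# Hypothetical sentiment lexicon: word -> polarity score.
LEXICON = {"great": 1.0, "love": 1.0, "terrible": -1.0, "hate": -1.0, "slowly": -0.5}

# Assumed weights: adjectives, verbs, and adverbs carry sentiment; other tags do not.
POS_WEIGHTS = {"ADJ": 1.0, "VERB": 0.8, "ADV": 0.5}


def sentiment_score(tagged_tokens):
    """Sum lexicon polarity, weighted by each word's part of speech."""
    score = 0.0
    for word, tag in tagged_tokens:
        score += LEXICON.get(word.lower(), 0.0) * POS_WEIGHTS.get(tag, 0.0)
    return score


def classify(tagged_tokens):
    """Map the weighted score to positive, negative, or neutral."""
    score = sentiment_score(tagged_tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Sentences are tagged by hand here; a POS tagger would normally do this.
review = [("I", "PRON"), ("love", "VERB"), ("this", "DET"), ("great", "ADJ"), ("phone", "NOUN")]
complaint = [("The", "DET"), ("app", "NOUN"), ("loads", "VERB"), ("slowly", "ADV")]

print(classify(review), classify(complaint))
```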

Language Parts of Speech for Sentiment Analysis

Agents (with Multi-Models): The terms agent and multi-model agent refer to systems or entities that interact with their environment to achieve specific goals.

An AI agent is an autonomous system or entity capable of perceiving its environment, processing information, and taking actions to achieve a certain goal or fulfill a task. AI agents can operate in various domains, such as robotics, virtual assistants, or game environments, and they follow the perceive-think-act loop.

A multi-model agent (or multimodal agent) refers to an AI agent that can process and integrate multiple types of input data (modalities) and utilize various models (e.g., a combination of language, vision, speech, and more) to achieve more complex, adaptive behavior.

Understanding the Machine Learning Workflow: Data, Algorithms, and Inference

Machine learning (ML) is a process that involves several stages, each of which plays a critical role in the development of effective models that can perform predictive tasks. The following breakdown offers a clear picture of the key components in a typical machine learning pipeline, from the data used for training to the deployment of models for inference, as illustrated in the diagram below.

1. Training Data

At the foundation of any machine learning system is the training data. Data can be broadly categorized into two types:

Labeled Data: Data in which each input comes with a corresponding output label. It is often structured, for example tabular data (organized in rows and columns) and time-series data (data points collected over time).
Unlabeled Data: Data that lacks predefined labels, making it more complex to analyze. It often includes unstructured data types, such as text or image data, which are not inherently organized in a structured format.

The choice between labeled and unlabeled data influences the type of algorithm used and how the system learns patterns.

2. ML Algorithm

Machine learning algorithms are at the core of the system, taking in the training data to generate models that can predict outcomes. There are three primary types of learning that dictate how these algorithms operate:

Supervised Learning: This approach is used when labeled data is available. The algorithm learns from input-output pairs to make future predictions (e.g., classification or regression tasks).

Unsupervised Learning: When only input data is available (unlabeled), the algorithm identifies hidden patterns or structures in the data. Clustering and association tasks are common here.

Reinforcement Learning: This is a trial-and-error learning process where the model learns to make a sequence of decisions by interacting with an environment and receiving feedback (rewards or penalties).

Note: The algorithms listed above are not a complete list; only a few popular ones are shown.
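The difference between supervised and unsupervised learning can be sketched in a few lines of Python. In this illustrative example (the data points and labels are made up), a nearest-neighbor classifier learns from labeled input-output pairs, while a two-cluster k-means routine finds groups in unlabeled values. Reinforcement learning is left out because it needs an environment to interact with.

```python
def nearest_neighbor_predict(labeled_points, x):
    """Supervised: return the label of the closest labeled training point."""
    _, label = min(labeled_points, key=lambda pair: abs(pair[0] - x))
    return label


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two clusters (k-means, k=2)."""
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Supervised: inputs come with labels.
labeled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]
prediction = nearest_neighbor_predict(labeled, 7.2)

# Unsupervised: only inputs are available; structure is discovered.
unlabeled = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, clusters = two_means(unlabeled)

print(prediction, centers)
```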

3. ML Model

Once trained, the ML algorithm produces an ML model, which is a system that can take new data and make predictions or decisions. The quality of the model depends heavily on the type of algorithm used and the quality of the training data.

Inference

Once the model is built, it can be deployed to perform inference, which refers to the process of applying the model to new, unseen data to generate predictions or insights.

Inference can be done in two main ways:

Batch Inferencing: This involves processing large volumes of data at once, often in scenarios where real-time responses are not required (e.g., financial forecasting or customer segmentation).
Real-Time Inferencing: In contrast, this type of inferencing involves generating predictions on-the-fly, often in milliseconds, for applications where immediate responses are critical (e.g., fraud detection, chatbots, or autonomous vehicles).
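The two inference styles can be contrasted with a short Python sketch. The `fraud_score` function below is a hand-set rule standing in for a trained model, and all amounts are invented for illustration; the point is only the calling pattern: one pass over many records versus one low-latency call per request.

```python
import time


def fraud_score(amount):
    """Stand-in for a trained model: a hand-set rule, not a learned one."""
    return min(1.0, amount / 10_000)


def predict_batch(amounts):
    """Batch inference: score a whole collection in one pass."""
    return [fraud_score(a) for a in amounts]


def predict_realtime(amount):
    """Real-time inference: score one request and report its latency."""
    start = time.perf_counter()
    score = fraud_score(amount)
    latency_ms = (time.perf_counter() - start) * 1000
    return score, latency_ms


# Batch: for example, a nightly job over accumulated transactions.
nightly_scores = predict_batch([120.0, 5_000.0, 25_000.0])

# Real time: a single transaction scored as it arrives.
score, latency_ms = predict_realtime(7_500.0)

print(nightly_scores, score)
```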

Deep Learning

Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers to learn from data. It’s inspired by the structure and function of the human brain, where information is processed through interconnected layers of neurons.

The diagram below shows different types of deep learning model architectures.

Convolutional Neural Networks (CNNs): Mainly used for image and video processing, object detection, and image segmentation. Examples: object detection, facial recognition, medical image analysis.
Recurrent Neural Networks (RNNs): Designed to process sequential data, such as text and time series. Examples: natural language processing (NLP) tasks such as machine translation, sentiment analysis, and text generation.
Long Short-Term Memory (LSTM) Networks: A type of RNN that can effectively handle long-term dependencies in data.
Generative Adversarial Networks (GANs): Used to generate new data, as shown in the examples below.

Autoencoders are neural networks that learn to compress and decompress data.

Autoencoders are primarily used for tasks such as:

Dimensionality reduction: Reducing the number of features in a dataset while preserving essential information.
Feature learning: Learning latent representations that can capture meaningful patterns in the data.
Anomaly detection: Identifying unusual or abnormal data points.
Image denoising: Removing noise from images.
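The compress-then-decompress idea behind the uses listed above can be sketched without any training. In the Python example below, the encoder and decoder weights are set by hand (an assumption for illustration; a real autoencoder learns them from data) so that 2-D points lying near the line y = x are compressed to one number and reconstructed well. Points far from that pattern reconstruct poorly, which is the basis of anomaly detection.

```python
import math

# Hand-set weights (an assumption for illustration; a real autoencoder learns these).
DIRECTION = (1 / math.sqrt(2), 1 / math.sqrt(2))


def encode(point):
    """Compress a 2-D point to a single latent number (dimensionality reduction)."""
    return point[0] * DIRECTION[0] + point[1] * DIRECTION[1]


def decode(latent):
    """Expand the latent number back into a 2-D point."""
    return (latent * DIRECTION[0], latent * DIRECTION[1])


def reconstruction_error(point):
    """Distance between a point and its reconstruction."""
    rx, ry = decode(encode(point))
    return math.hypot(point[0] - rx, point[1] - ry)


def is_anomaly(point, threshold=0.5):
    """Flag points that the autoencoder cannot reconstruct well."""
    return reconstruction_error(point) > threshold


normal_point = (2.0, 2.1)   # close to the pattern the weights capture
odd_point = (2.0, -1.0)     # far from that pattern

print(is_anomaly(normal_point), is_anomaly(odd_point))
```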

Generative AI

Generative AI is a deep learning technology and a subset of artificial intelligence, capable of generating new content such as text, music, movies, and code in many programming languages, based on patterns learned from existing data. Generative AI is powered by models that are pretrained on internet-scale data; these models are called foundation models. Below are the different types of generative AI models available today.

Generative Adversarial Networks (GANs)

GANs consist of two neural networks, a generator and a discriminator. The generator creates new data (e.g., images), while the discriminator tries to distinguish between real and generated data. These two networks train together in a competitive setting, where the generator improves its ability to create realistic outputs, and the discriminator improves its ability to detect fakes. Typical applications include image generation (e.g., face synthesis), video generation, and style transfer.

Style transfer is a technique in computer vision that involves modifying the appearance of an image (or video) by applying the artistic style of one image to the content of another. In simple terms, it allows you to take the structure and layout of one image (the content image) and blend it with the visual style, textures, or colors of another image (the style image).

Variational Autoencoders (VAEs)

VAEs are a type of generative model that learns to encode input data into a latent space and then generate new data by sampling from this space. They add a probabilistic component to traditional autoencoders, which allows them to generate new data points that are similar to the original data. Typical applications include image generation, data compression, and anomaly detection.

Transformers

Transformers are a deep learning architecture used mainly in language models for NLP. Generative language models built on them are trained to predict the next word or token in a sequence, allowing them to generate coherent and contextually relevant text. Applications of transformers include text generation, machine translation, summarization, and code generation. Popular examples of transformers are:

GPTs (Generative Pre-trained Transformers): First introduced by OpenAI.
BERT (Bidirectional Encoder Representations from Transformers): A machine learning (ML) model for natural language processing (NLP) developed by Google.
BART (Bidirectional and Autoregressive Transformers): A model that combines bidirectional and autoregressive properties, like a mix of BERT and GPT.
Multimodal transformers: Models that can handle multiple types of input data, such as text and images.
Vision transformers (ViT): A model that uses the transformer architecture for image classification.
Switch Transformer: A model that uses a mixture-of-experts (MoE) architecture to improve performance in language processing.

Recommended Reading: Attention Is All You Need. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
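The central operation of the paper above, scaled dot-product attention, can be sketched in plain Python. The example below computes softmax(QK^T / sqrt(d_k))V for small hand-picked vectors; the query, key, and value numbers are arbitrary illustrations, and real transformers add learned projections, multiple heads, and many stacked layers on top of this step.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[5.0, 0.0]]  # points in the same direction as the first key

result = attention(queries, keys, values)
print(result)
```

Because the query aligns with the first key, most of the attention weight goes to the first value vector, so the output lands close to it.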

Foundation Models (FMs) vs Large Language Models (LLMs) vs Small Language Models (SLMs)

Foundation Models provide general-purpose, multi-modal capabilities and can be fine-tuned for a wide variety of tasks across different domains. Large Language Models (LLMs) specialize in advanced language-based tasks and excel at generating high-quality text for complex NLP tasks, though they require significant resources. Small Language Models (SLMs) are lightweight and optimized for specific tasks with minimal resource requirements, making them ideal for real-time or low-cost deployments. The table below compares more of their features.

Feature Difference between FMs, LLMs, SLMs

Models Output Optimization Techniques

The optimization techniques are versatile and can be applied to Foundation Models, Large Language Models, and Small Language Models, though the scale and complexity of the models dictate the specific goals and priorities of each technique. FMs focus on handling a broad range of tasks across different data types, requiring techniques that manage both scale and multi-modality. LLMs focus primarily on language tasks and benefit from text-centric optimization like RAG, prompt engineering, and knowledge distillation. SLMs aim for efficiency and are often optimized for low-resource environments through quantization, pruning, and distillation while still benefiting from techniques like fine-tuning and transfer learning. This flexibility in applying optimization techniques allows for broader use across different types of models, ensuring efficient deployment and improved output quality in both large-scale and small-scale AI systems.

We will discuss here two popular techniques, RAG and Prompt Engineering.

RAG (Retrieval Augmented Generation)

RAG is an architecture that combines the benefits of retrieval-based methods and generative models. It enables models to generate more accurate, contextually relevant, and factually grounded outputs by retrieving external knowledge. By incorporating real-time retrieval into the generation process, RAG addresses key limitations of generative models, such as hallucination and outdated knowledge, making it highly suitable for knowledge-rich tasks like question-answering, content generation, and customer support. Below is a general RAG workflow.

RAG is a technique in AI where a large language model accesses new or recent data outside its training set to provide better answers and improved results. Below are the RAG concepts to be aware of.

Vector database: A search engine or database that stores vectorized documents, enabling more accurate information retrieval for AI models. Example: Qdrant, a vector database that can also run in memory, enabling efficient text retrieval and embedding storage.
Embeddings: Representations of text data as vectors in a high-dimensional space, allowing similarity comparisons between different pieces of text.
Sentence transformers: A tool to encode sentences into numerical representations (embeddings) that can be compared using cosine similarity or other distance metrics.
Cosine distance: A measure of dissimilarity between two non-zero vectors in a multi-dimensional space, computed as one minus their cosine similarity, and often used in text analysis and information retrieval.
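The concepts above fit together as shown in this minimal Python sketch of the retrieval half of RAG. To stay self-contained it uses word counts as a toy embedding and a plain list as the "vector database"; both are stand-ins, and a real system would use a sentence-embedding model and a vector store such as Qdrant. The documents and the prompt wording are made up for illustration, and the final call to a language model is left out.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A stand-in for a real sentence-embedding model."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity of two count vectors (cosine distance is 1 minus this)."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    question_vector = embed(question)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(question_vector, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents):
    """Augment the question with retrieved context before generation."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


documents = [
    "the refund policy allows returns within 30 days",
    "our office is open monday to friday",
    "shipping takes five business days",
]
question = "what is the refund policy"

top_document = retrieve(question, documents)[0]
prompt = build_prompt(question, documents)
print(prompt)
```

The resulting prompt would then be sent to the language model, which grounds its answer in the retrieved text rather than relying only on what it saw during training.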

Prompt Engineering

Prompt Engineering is the process of carefully designing and structuring the input (known as a prompt) given to a language model or generative AI system to achieve desired outputs. In the context of models like GPT-4o, LLaMA 3.2, or other large language models (LLMs), the way you phrase a prompt can dramatically influence the model’s response. This method is particularly useful because it doesn’t require modifying the model itself, but instead focuses on optimizing how input is provided to guide the model’s behavior.

In-Context Learning (ICL)

Prompt engineering can be enhanced by using In-Context Learning (ICL) as a technique to craft effective prompts. ICL is a way to teach models new tasks without retraining, by providing input-output examples within the prompt itself. Prompt engineering is the broader strategy of designing prompts effectively, and ICL is a specific approach within that framework that uses examples in the prompt to guide the model’s behavior.

Learning vs Inference

Learning is about training the model to recognize patterns in data: the model is optimized through repeated exposure to training data, and its internal parameters (e.g., weights) are adjusted to reduce error. Inference, by contrast, is about applying the already trained model to new, unseen data to make predictions or decisions without altering those parameters. Because In-Context Learning (ICL) does not update the model’s parameters, it is fundamentally an inference-time phenomenon, not a learning process in the traditional sense.
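The distinction can be seen in a one-parameter model. In the Python sketch below (toy data following y = 3x, with a learning rate and epoch count chosen only for illustration), `train` changes the weight through gradient descent, while `predict` only reads it.

```python
def predict(weight, x):
    """Inference: apply the current weight; nothing is modified."""
    return weight * x


def train(examples, weight=0.0, learning_rate=0.01, epochs=200):
    """Learning: repeatedly adjust the weight to reduce squared error."""
    for _ in range(epochs):
        for x, target in examples:
            error = predict(weight, x) - target
            # Gradient of error**2 with respect to the weight is 2 * error * x.
            weight -= learning_rate * 2 * error * x
    return weight


# Toy training data that follows y = 3x.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

learned_weight = train(examples)          # parameters are updated here
prediction = predict(learned_weight, 4.0) # parameters are only used here

print(round(learned_weight, 4), round(prediction, 4))
```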

In-Context Learning (ICL) inference can be done in the following ways.

Zero-Shot Inference: The model is given no prior examples in the prompt and is expected to perform the task solely based on the instruction provided.
One-Shot Inference: The model is provided with a single example to infer the task, and it must generalize from this one example to generate a new output.
Few-Shot Inference: The model is given a few examples (input-output pairs) to infer the task and then generalizes to make predictions on new inputs.
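The three variants differ only in how many worked examples the prompt contains. The Python sketch below builds all three from the same helper; the instruction text, the example reviews, and the `Text:` / `Sentiment:` layout are illustrative choices, not a required format, and the prompts would still need to be sent to a language model.

```python
def build_icl_prompt(instruction, examples, new_input):
    """Assemble an instruction, zero or more worked examples, and the new input."""
    parts = [instruction]
    for text, label in examples:
        parts.append(f"Text: {text}\nSentiment: {label}")
    # The final block is left open for the model to complete.
    parts.append(f"Text: {new_input}\nSentiment:")
    return "\n\n".join(parts)


INSTRUCTION = "Classify the text as positive or negative."
EXAMPLES = [
    ("I loved the movie.", "positive"),
    ("The food was cold.", "negative"),
    ("Great service and friendly staff.", "positive"),
]
NEW_INPUT = "The battery dies too fast."

zero_shot = build_icl_prompt(INSTRUCTION, [], NEW_INPUT)
one_shot = build_icl_prompt(INSTRUCTION, EXAMPLES[:1], NEW_INPUT)
few_shot = build_icl_prompt(INSTRUCTION, EXAMPLES, NEW_INPUT)

print(few_shot)
```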

The diagram below illustrates this with example classification and sentiment tasks.

Prompt Engineering In-Context Learning (ICL)

Engineering a Generative AI Application

Engineering a generative AI application involves the following steps.

Identify the purpose of the project and set specific objectives for the AI model.
Choose between fine-tuning an existing model or training a new one based on the use case.
Tailor the model to your specific use case through prompt engineering, fine-tuning, RAG and evaluation with human feedback.
Deploy the model, optimize for performance, and integrate it into applications that leverage the generative AI system.

Engineering a machine learning workflow is different; it goes through the following stages.

Data engineering: Collect and preprocess raw data from various sources.
Model engineering: Train and test machine learning models using the prepared data.
Software engineering: Integrate the trained models into larger software systems.
Operations and governance: Ensure the models are operationally sound, scalable, and compliant with policies.

These engineering phases are derived from the diagram below, which is defined in the OWASP AI Security and Privacy Guide: https://owasp.org/www-project-ai-security-and-privacy-guide/. Thanks to the OWASP team for bringing these phases together as AI Engineering.

AI Agents & Agentic Workflows

In AI, an agent is an autonomous entity (such as a system or program) that perceives its environment, makes decisions, and takes actions to achieve goals. These agents can range from simple, rule-based systems to advanced AI models leveraging machine learning, natural language processing (NLP), and deep learning techniques. The diagram below shows different types of AI agents.

The key components of an AI agent are:

Perception: Agents can perceive their environment by gathering information from sensors, data inputs, or external sources.
Reasoning/Planning: Agents process the information they gather, formulating decisions or plans based on pre-set goals and learned patterns.
Action: Once decisions are made, the agent executes actions within the environment to achieve its goals.
Learning: Some agents can learn from their past experiences and improve their decision-making over time. Reinforcement learning is a common technique for agents to learn through trial and error.
Autonomy: Agents operate autonomously without continuous human intervention, though they may collaborate with humans or other agents when necessary.
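The perception, reasoning, and action components listed above can be sketched as a loop. The Python example below uses a simple rule-based thermostat agent; the class name, the one-degree tolerance, and the room model are all assumptions chosen for illustration, and the learning component is deliberately left out to keep the loop visible.

```python
class ThermostatAgent:
    """A simple rule-based agent that keeps a room near a target temperature."""

    def __init__(self, target):
        self.target = target

    def perceive(self, environment):
        """Perception: read the current temperature from the environment."""
        return environment["temperature"]

    def decide(self, temperature):
        """Reasoning: choose an action based on the goal and the reading."""
        if temperature < self.target - 1:
            return "heat"
        if temperature > self.target + 1:
            return "cool"
        return "idle"

    def act(self, action, environment):
        """Action: change the environment according to the chosen action."""
        change = {"heat": 1.0, "cool": -1.0, "idle": 0.0}[action]
        environment["temperature"] += change


def run(agent, environment, steps):
    """Autonomy: repeat perceive, decide, act without human input."""
    history = []
    for _ in range(steps):
        reading = agent.perceive(environment)
        action = agent.decide(reading)
        agent.act(action, environment)
        history.append(action)
    return history


room = {"temperature": 17.0}
history = run(ThermostatAgent(target=21.0), room, steps=6)
print(history, room["temperature"])
```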

Multi-model AI Agents

Multi-Model AI Agents are AI systems that integrate multiple models, typically specialized in different tasks or domains, to operate together within a single framework or environment. These agents use different AI models to perform various functions, making decisions and taking actions based on inputs from multiple models. The key idea is that no single model is sufficient to handle all aspects of a complex task, so multiple models are used in collaboration to enhance the agent’s overall performance. Multi-model AI agents are particularly useful in scenarios where a range of capabilities such as natural language processing (NLP), computer vision, speech recognition, reasoning, and planning is required.
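One way to picture this collaboration is an agent that routes each input to the model specialized for it. In the Python sketch below, the three "models" are trivial placeholder functions (hypothetical stand-ins for real language, vision, and speech models), and the routing-by-modality design is just one possible arrangement.

```python
def language_model(text):
    """Placeholder for a real language model."""
    return f"summary of: {text}"


def vision_model(image_bytes):
    """Placeholder for a real vision model."""
    return f"image with {len(image_bytes)} bytes"


def speech_model(audio_samples):
    """Placeholder for a real speech recognition model."""
    return f"transcript of {len(audio_samples)} samples"


class MultiModelAgent:
    """Routes each input to the model that handles its modality."""

    def __init__(self):
        self.models = {
            "text": language_model,
            "image": vision_model,
            "audio": speech_model,
        }

    def handle(self, inputs):
        """Run the matching model for every modality and collect the results."""
        observations = {}
        for modality, payload in inputs.items():
            if modality not in self.models:
                raise ValueError(f"no model registered for modality: {modality}")
            observations[modality] = self.models[modality](payload)
        return observations


agent = MultiModelAgent()
observations = agent.handle({"text": "hello", "image": b"\x00\x01"})
print(observations)
```

A fuller agent would then reason over the combined observations to decide on an action, rather than simply returning them.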

Agentic Workflows in AI

Agentic workflows describe task-oriented processes where AI agents perform and manage a sequence of actions autonomously to achieve an overarching goal. These workflows often involve multiple steps or tasks, where the agent determines what action to take at each stage based on the context or situation. Agentic workflows emphasize the self-directed nature of AI agents, meaning agents decide, plan, and act through a series of tasks, adjusting their behavior based on outcomes and environmental changes. These workflows are key in fields like robotics, autonomous systems, intelligent virtual assistants, and multi-agent systems.

Agentic workflows are used in several scenarios; a few of them are listed below.

Autonomous vehicles, such as self-driving cars.
RPA (Robotic Process Automation).
Virtual personal assistants (Apple Siri, Amazon Alexa), which follow agentic workflows to interpret user commands (perception), plan how to execute those commands (reasoning), and provide responses or perform actions (action).

AI agents and agentic workflows are central to autonomous systems that perceive, reason, and act in their environment. From simple automation to complex, goal-driven decision-making, agents enable systems to operate independently and handle dynamic tasks across various domains. Agentic workflows bring efficiency, adaptability, and scalability to AI-driven applications, enabling them to handle diverse tasks ranging from customer service to autonomous driving.

Conclusion

Artificial intelligence (AI) has rapidly transformed from a niche academic pursuit into a central force reshaping industries worldwide. By understanding the layered structure of AI, from foundational computer science principles to complex multi-model agents, one can appreciate the technological advancements that underpin modern AI systems. These layers include machine learning, deep learning, generative AI, and large language models, each playing a critical role in expanding the capabilities of AI systems.

The transition from search engines to answer engines, and eventually to action engines like Siri and Tesla’s autopilot, demonstrates AI’s evolution in processing data and making autonomous decisions. Furthermore, agents and multi-model AI agents offer a framework for solving complex, multi-modal problems by combining different models and technologies. As we continue to advance, the integration of generative models, foundation models, and agentic workflows will be pivotal in creating more intelligent, adaptable systems. These systems can operate autonomously, making critical decisions, interacting with humans, and improving their performance over time. In essence, understanding these AI layers and workflows provides a comprehensive view of how AI is shaping the world today and how it will continue to do so in the future, with endless possibilities across sectors such as healthcare, autonomous vehicles, and intelligent virtual assistants.
