How to use AI - Part 3: Infrastructure, Operations, Security, and Agents
In the previous parts (part one and part two) of this blog series, we explored the challenges facing DevOps today, how AI can address them, and how to build powerful AI applications using frameworks like LangChain. Now, in this final part, we'll look at infrastructure options for hosting AI applications, optimizing their performance, enabling guardrails for secure interaction, and using AI agents to automate complex workflows.
Hosting Options for AI Applications
When deploying AI applications, one of the first decisions is choosing the hosting option for our language model. There are two main approaches: cloud-based models and self-hosted models.
Cloud-Based Models
Cloud-based models are provided by companies such as OpenAI and Anthropic, as well as cloud platforms like Azure AI. These services are popular for their ease of use and scalability.
Advantages:
Considerations:
Self-Hosted Models
Self-hosting involves running the model on our own infrastructure using tools like Ollama, Llama.cpp, or LM Studio.
Advantages:
Challenges:
Decision Factors:
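To make the self-hosted route concrete, here is a minimal sketch that queries a locally running Ollama instance over its default HTTP API. It assumes Ollama is installed, the llama2 model has been pulled, and the server is listening on its default port:

import requests

# Query a locally running Ollama instance (default port 11434).
# Assumes `ollama pull llama2` has already been run.
def ask_local_model(prompt: str, model: str = "llama2") -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]

print(ask_local_model("Explain blue-green deployments in one sentence."))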
Optimizing Performance: Inference Speed
Inference speed refers to how quickly a model processes and generates responses. Several factors influence this:
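Whichever factors dominate in a given setup, inference speed is easy to measure directly. As a rough sketch, the following times a single completion and derives an approximate tokens-per-second figure from the usage statistics the API reports (shown here with the OpenAI Python client; other providers expose similar data):

import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Time a single completion and estimate generation speed.
start = time.perf_counter()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize what a Kubernetes operator does."}],
)
elapsed = time.perf_counter() - start

tokens = response.usage.completion_tokens
print(f"Latency: {elapsed:.2f}s, ~{tokens / elapsed:.1f} tokens/second generated")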
Managing Multiple Models: Language Model Proxying
If the application requires different models for various tasks, language model proxying can help. This technique intelligently routes requests to specific models based on predefined factors or tasks.
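At its core, this is just a mapping from task type to model, and a plain routing table is often enough to start with. The task categories and model choices below are purely illustrative:

# Illustrative routing table: task type -> model name.
# The categories and choices are examples, not recommendations.
MODEL_ROUTES = {
    "code_review": "gpt-4o",               # higher quality, higher cost
    "log_summarization": "gpt-3.5-turbo",  # cheap and fast
    "local_experiments": "ollama/llama2",  # self-hosted via Ollama
}

def select_model(task: str) -> str:
    """Pick a model for the given task, falling back to a cheap default."""
    return MODEL_ROUTES.get(task, "gpt-3.5-turbo")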
LiteLLM: Simplifying Multi-Model Management
LiteLLM is an open-source framework designed to simplify working with multiple language models. It provides a standardized API for calling over 100 different LLMs from providers such as OpenAI, Anthropic, Google Gemini, and Hugging Face.
Benefits:
Example: Using LiteLLM for Multi-Model Applications
Here’s how we can use LiteLLM to manage multiple LLMs in a single application:
import streamlit as st
from litellm import completion

st.title("Multi-Model Chat")

# LiteLLM completion function to get the model response
def get_model_response(model_name: str, prompt: str) -> str:
    response = completion(model=model_name, messages=[{"role": "user", "content": prompt}])
    return response.choices[0].message.content

# Streamlit UI selection for the model
model_option = st.selectbox("Choose a language model:", ("gpt-3.5-turbo", "ollama/llama2", "gpt-4o"))

# Chat history session state
if 'chat_history' not in st.session_state:
    st.session_state['chat_history'] = []

user_input = st.text_input("You:")

# Send user input and get the model response
if st.button("Send") and user_input:
    st.session_state['chat_history'].append({"role": "user", "content": user_input})
    with st.spinner("Thinking..."):
        response = get_model_response(model_name=model_option, prompt=user_input)
    st.session_state['chat_history'].append({"role": "model", "content": response})

# Display the conversation, newest messages first
for message in reversed(st.session_state['chat_history']):
    st.write(f"{message['role'].capitalize()}: {message['content']}")
In this example, we create a simple chat application that allows users to choose from different language models. The get_model_response function sends the user input to the selected model and returns the response. The chat_history session state retains the conversation history, and the Streamlit interface displays the chat messages in an interactive web UI.
Monitoring LLM Applications
Once our AI application is running, monitoring its performance is critical to ensure reliability and efficiency.
Key Metrics to Monitor
Performance Metrics:
User Engagement Metrics:
Observability:
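Whatever the exact metric set, the raw numbers have to come from somewhere, and the simplest starting point is to instrument the calls themselves. A minimal sketch that logs latency and token usage per request, assuming an OpenAI-style response object with a usage field (which LiteLLM normalizes to), could look like this:

import logging
import time
from litellm import completion

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm-metrics")

def monitored_completion(model: str, prompt: str) -> str:
    """Wrap an LLM call and log basic performance metrics."""
    start = time.perf_counter()
    response = completion(model=model, messages=[{"role": "user", "content": prompt}])
    latency = time.perf_counter() - start

    usage = response.usage  # OpenAI-style usage statistics
    logger.info(
        "model=%s latency=%.2fs prompt_tokens=%s completion_tokens=%s",
        model, latency, usage.prompt_tokens, usage.completion_tokens,
    )
    return response.choices[0].message.content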
Tools for Monitoring
Cost Considerations for LLM Applications
Running LLM applications can be expensive, so understanding and managing costs is essential.
Key Cost Factors
Cost per Token/Character:
Volume Discounts:
Additional Features:
Usage Limits/Quotas:
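To make token-based pricing tangible, here is a small sketch that estimates the cost of a single request from its token counts. The per-1K-token prices are placeholders for illustration, not current list prices:

# Illustrative per-1K-token prices (placeholders, not current list prices).
PRICES_PER_1K = {
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
    "gpt-4o": {"input": 0.005, "output": 0.015},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    price = PRICES_PER_1K[model]
    return (prompt_tokens / 1000) * price["input"] + (completion_tokens / 1000) * price["output"]

print(f"~${estimate_cost('gpt-4o', 1200, 300):.4f} for one request")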
Best Practices
Security Challenges for Publicly Accessible LLM Applications
Public-facing LLM applications come with unique security challenges.
Potential Threats
Mitigation Strategies
Input Validation and Sanitization:
Generation Guardrails:
Output Validation:
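As a small, concrete illustration of the first mitigation, input validation and sanitization: a lightweight pre-filter can reject overly long or obviously suspicious prompts before they ever reach the model. The patterns below are illustrative and no substitute for a dedicated guardrail framework:

import re

# Illustrative deny-list of prompt-injection phrases; real deployments
# should combine this with proper guardrails and output validation.
SUSPICIOUS_PATTERNS = [
    r"ignore .{0,30}instructions",
    r"reveal .{0,30}system prompt",
    r"disregard .{0,30}(rules|guidelines)",
]

def is_suspicious(user_input: str) -> bool:
    text = user_input.lower()
    return any(re.search(pattern, text) for pattern in SUSPICIOUS_PATTERNS)

def safe_prompt(user_input: str, max_length: int = 2000) -> str:
    """Basic sanitization: enforce a length limit and a simple injection deny-list."""
    if len(user_input) > max_length:
        raise ValueError("Input too long")
    if is_suspicious(user_input):
        raise ValueError("Potential prompt injection detected")
    return user_input.strip()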
Agents: Automating Complex Workflows
AI agents go beyond generating text — they perform tasks autonomously, enabling complex, multi-step workflows. Let’s explore their key characteristics and how to build them.
Key Characteristics of AI Agents
Key Components of an AI Agent
An AI agent consists of several components that work together to perform tasks effectively:
Tools
Memory
Planning
Execution and Feedback
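To give at least one of these components a concrete shape, here is a minimal sketch of a tool definition using LangChain's tool decorator. The restart_service function is a hypothetical stub for illustration, not a real operation:

from langchain.tools import tool

@tool
def restart_service(service_name: str) -> str:
    """Restart a service on the target host (hypothetical stub)."""
    # In a real agent this would call out to the actual infrastructure,
    # e.g. via an API or configuration management tooling.
    return f"Service '{service_name}' was restarted."

The docstring matters here: it is what the language model sees when deciding whether and how to use the tool.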
LangGraph: A Visual Tool for Agent Workflows
LangGraph is another component of the LangChain ecosystem. It lets us design and manage agent workflows as graphs of nodes and edges, which can also be rendered visually, and it simplifies connecting different components and orchestrating complex tasks.
Capabilities of LangGraph:
Example: Research and Summarization Agent
This agent performs web research and summarizes the results.
from langchain.chat_models import ChatOpenAI
from langchain.tools import DuckDuckGoSearchRun
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, List
import operator

# Initialize models and tools
research_model = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)
summary_model = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
search_tool = DuckDuckGoSearchRun()

# Define the graph state structure
class ResearchState(TypedDict):
    query: str
    research_results: Annotated[List[str], operator.add]
    summary: str

graph = StateGraph(ResearchState)

# Define agent functions
def research_agent(state):
    query = state["query"]
    search_result = search_tool.run(query)
    return {"research_results": [search_result]}

def summarization_agent(state):
    research_results = state["research_results"]
    summary_prompt = f"Summarize the following research results:\n{research_results}"
    summary = summary_model.predict(summary_prompt)
    return {"summary": summary}

# Build the graph
graph.add_node("research", research_agent)
graph.add_node("summarize", summarization_agent)
graph.set_entry_point("research")
graph.add_edge("research", "summarize")
graph.add_edge("summarize", END)

# Compile and run
app = graph.compile()

def run_research(query):
    result = app.invoke({"query": query, "research_results": [], "summary": ""})
    return result["summary"]

# Example usage
research_topic = "Latest advancements in AI"
summary = run_research(research_topic)
print(f"Research Summary on '{research_topic}':\n{summary}")
In this example, we define a research agent that performs a web search on a given topic and summarizes the results. The agent uses LangGraph to create a state machine that guides the workflow through research and summarization steps. The run_research function initiates the agent with a research query and returns the final summary.
Visualized, this simple agent workflow is a straight line through the graph: research → summarize → END.
Conclusion
In this final part of our blog series, we explored the critical aspects of deploying and managing AI applications for DevOps engineers. From hosting options and performance optimization to monitoring, cost management, and security, we covered the essential considerations for building reliable and efficient AI-powered systems. Additionally, we talked about AI agents, showcasing their ability to automate complex workflows and adapt to dynamic environments.
Key Takeaways from the Blog Series
This blog series explored how AI is transforming DevOps, from addressing challenges to building and deploying advanced AI applications. Let's quickly recap the key takeaways:
Part 1: The Building Blocks of DevOps AI
Part 2: Building AI Applications with LangChain
Part 3: Infrastructure, Operations, Security, and Agents
Thank you for joining us on this journey! Stay curious and feel free to reach out if you have any questions or need further guidance.
Again, if you're hungry for more details, make sure to check out the video recordings of our latest AI Basics for DevOps Engineers workshop on YouTube: