Introducing Agentic DevOps

A fully autonomous, AI-powered DevOps platform for managing cloud infrastructure across multiple providers, with AWS and GitHub integration, built on OpenAI's Agents SDK.

Agentic DevOps represents the next step in infrastructure management: a fully autonomous system that doesn't just assist with DevOps tasks but independently plans, executes, and optimizes your entire infrastructure lifecycle.

Built on the foundation of OpenAI's Agents SDK, this platform goes beyond traditional automation by incorporating true AI-driven decision-making capabilities.

Try it here: agentic-devops.fly.dev

GitHub Repo: https://github.com/agenticsorg/devops

Support Agentics Foundation: https://agentics.org/memberships

The system can autonomously:

  • Provision and configure infrastructure based on high-level requirements
  • Monitor and detect anomalies across your environment
  • Self-heal infrastructure issues without human intervention
  • Optimize resource allocation and costs continuously
  • Deploy applications with intelligent rollout strategies
  • Manage complex multi-environment deployments
  • Learn from past operations to improve future performance

Agentic DevOps serves as an intelligent co-pilot for your infrastructure, or even as a fully autonomous operator: it understands complex requirements, executes precise commands, adapts to changing conditions, and provides valuable insights across your entire DevOps workflow. Whether you're managing AWS resources, working with GitHub repositories, or orchestrating complex deployments, it provides a unified, intelligent interface that simplifies these tasks while maintaining security and best practices.

Overview

Agentic DevOps is designed to transform cloud infrastructure management through autonomous operation and intelligent decision-making. It provides a consistent interface for working with various cloud providers and services while adding a layer of AI-driven automation that can operate independently when needed.

Key benefits include:

  • Autonomous Operation: Deploy infrastructure and applications with minimal human oversight
  • Self-Healing Systems: Automatically detect and remediate issues before they impact users
  • Continuous Optimization: Intelligently adjust resources based on actual usage patterns
  • Reduced Complexity: Manage multiple cloud services through a single, intelligent interface
  • Increased Efficiency: Eliminate repetitive manual tasks through autonomous execution
  • Enhanced Security: Built-in security guardrails with proactive vulnerability detection
  • Natural Language Control: Interact with your infrastructure using plain English
  • Extensibility: Easily add support for new services and providers
  • Comprehensive Documentation: Detailed guides and examples for all features

Features & Core Capabilities

Autonomous Infrastructure Management: AI-driven management of cloud resources

  • Self-provisioning infrastructure based on application requirements
  • Automatic scaling based on real-time demand
  • Intelligent resource optimization for cost efficiency
  • Anomaly detection and autonomous remediation

AI-Powered Assistance: Leverage OpenAI's capabilities

  • Natural language infrastructure commands
  • Automated troubleshooting and diagnostics
  • Intelligent resource optimization recommendations
  • Security posture analysis
  • Cost optimization suggestions

Multi-Cloud Support: Consistent interface across providers

  • AWS (primary support)
  • Azure (planned)
  • Google Cloud (planned)
  • DigitalOcean (planned)

Security and Compliance:

  • Secure credential management with keyring integration
  • Least privilege access patterns
  • Compliance checking for industry standards
  • Security best practice enforcement
  • Audit logging and reporting

Observability and Monitoring:

  • Resource health monitoring
  • Performance metrics collection
  • Cost tracking and optimization
  • Anomaly detection
  • Custom alerting rules

Deployment Automation:

  • CI/CD pipeline integration
  • Blue/green deployment strategies
  • Canary releases
  • Rollback capabilities
  • Deployment verification
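
As a sketch of how this surfaces in practice, a canary-style rollout can be requested in plain English through the agent interface. The example below is a minimal sketch that assumes the Agent, Runner, deploy_to_ec2, and DevOpsContext APIs introduced in the OpenAI Agents Integration section further down; how faithfully the requested strategy is applied depends on the deployment tooling behind the tool.

from agents import Agent, Runner
from devops_agent.agents.tools import deploy_to_ec2
from devops_agent.core.context import DevOpsContext

# Context and agent setup mirror the examples later in this article.
context = DevOpsContext(
    user_id="user123",
    aws_region="us-west-2",
    github_org="your-organization"
)

deployment_agent = Agent(
    name="Deployment Agent",
    instructions="You are a deployment specialist. Prefer gradual rollouts and verify health before finishing.",
    tools=[deploy_to_ec2],
    model="gpt-4o-mini"
)

# The rollout strategy is expressed in plain English; the agent decides how to apply it.
result = Runner.run_sync(
    deployment_agent,
    "Deploy the main branch of your-org/your-repo to the web-server instances as a canary: "
    "update one instance first, verify it is healthy, then continue or roll back.",
    context=context
)
print(result.final_output)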

Disaster Recovery:

  • Automated backup management
  • Cross-region replication
  • Recovery time objective (RTO) optimization
  • Disaster recovery testing
  • Failover automation


Installation

# Clone the repository
git clone https://github.com/agenticsorg/devops.git
cd devops

# Install dependencies
pip install -r requirements.txt

# Configure credentials
cp env.example .env
# Edit .env with your AWS, GitHub, and OpenAI credentials
        

Configuration

The DevOps Agent supports multiple configuration methods:

  1. Environment Variables: Set credentials and configuration in your environment
  2. Configuration File: Use YAML or JSON configuration files
  3. Credential Store: Securely store credentials in your system's keyring
  4. AWS Profiles: Leverage existing AWS CLI profiles

Example configuration file (config.yaml):

aws:
  region: us-west-2
  profile: devops-agent
  default_vpc: vpc-1234567890abcdef0
  
github:
  organization: your-organization
  default_branch: main
  
openai:
  model: gpt-4o
  temperature: 0.2
  
logging:
  level: INFO
  file: devops-agent.log
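
To sanity-check the file before the agent uses it, you can load it manually. This is a minimal sketch using PyYAML; it assumes the agent reads the file through its own configuration loader, so the snippet is only for verification.

import yaml  # pip install pyyaml

# Load config.yaml and fail fast if a required section is missing.
with open("config.yaml") as f:
    config = yaml.safe_load(f)

for section in ("aws", "github", "openai", "logging"):
    if section not in config:
        raise KeyError(f"Missing '{section}' section in config.yaml")

print(f"Using AWS region {config['aws']['region']} with profile {config['aws'].get('profile', 'default')}")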
        

Usage

Python API

from devops_agent.aws.ec2 import EC2Service
from devops_agent.aws.s3 import S3Service
from devops_agent.github import GitHubService
from devops_agent.core.context import DevOpsContext

# Initialize context
context = DevOpsContext(
    user_id="user123",
    aws_region="us-west-2",
    github_org="your-organization"
)

# Initialize services
ec2 = EC2Service(context=context)
s3 = S3Service(context=context)
github = GitHubService(context=context)

# List EC2 instances
instances = ec2.list_instances(filters=[{"Name": "instance-state-name", "Values": ["running"]}])
print(f"Found {len(instances)} running EC2 instances")

# Create S3 bucket with encryption
bucket = s3.create_bucket(
    name="my-secure-bucket",
    region="us-west-2",
    encryption={"algorithm": "AES256"},
    versioning=True
)

# Deploy from GitHub to EC2
ec2.deploy_from_github(
    instance_id="i-1234567890abcdef0",
    repository="your-org/your-repo",
    branch="main",
    deploy_path="/var/www/html",
    setup_script="scripts/setup.sh",
    environment_variables={"ENV": "production"}
)
        

CLI Usage

The DevOps Agent provides a powerful command-line interface with rich output formatting:

# List EC2 instances with filtering and formatting
devops ec2 list-instances --state running --region us-west-2 --output table

# Create an EC2 instance with detailed configuration
devops ec2 create-instance \
  --name "web-server" \
  --type t3.medium \
  --ami-id ami-0c55b159cbfafe1f0 \
  --subnet-id subnet-1234567890abcdef0 \
  --security-group-ids sg-1234567890abcdef0 \
  --key-name my-key \
  --user-data-file startup-script.sh \
  --tags "Environment=Production,Project=Website" \
  --wait

# Get GitHub repository details with specific information
devops github get-repo your-org/your-repo --output json

# Create a GitHub issue with labels and assignees
devops github create-issue \
  --repo your-org/your-repo \
  --title "Update dependencies" \
  --body "We need to update all dependencies to the latest versions." \
  --labels "maintenance,dependencies" \
  --assignees "username1,username2"

# Deploy from GitHub to EC2 with advanced options
devops deploy github-to-ec2 \
  --repo your-org/your-repo \
  --instance-id i-1234567890abcdef0 \
  --branch develop \
  --path /var/www/html \
  --setup-script scripts/setup.sh \
  --env-file .env.production \
  --post-deploy-hook scripts/notify.sh
        

OpenAI Agents Integration

The DevOps Agent leverages OpenAI's Agents SDK to provide powerful AI-driven infrastructure management capabilities. This integration enables natural language interactions with your cloud resources, intelligent automation, and context-aware assistance.

Key Benefits of OpenAI Agents Integration

  • Natural Language Infrastructure Control: Manage your infrastructure using plain English commands
  • Context-Aware Operations: The agent maintains context across interactions for more coherent workflows
  • Intelligent Automation: Automate complex tasks with AI-driven decision making
  • Adaptive Learning: Improve over time based on your specific infrastructure patterns
  • Multi-Step Reasoning: Break down complex operations into logical steps
  • Guardrails and Safety: Built-in safeguards to prevent destructive operations

Agent Architecture

The DevOps Agent uses a modular architecture with specialized agents for different domains:

  1. EC2 Agent: Specializes in EC2 instance management
  2. S3 Agent: Focuses on S3 bucket operations
  3. GitHub Agent: Handles GitHub repository management
  4. Deployment Agent: Orchestrates deployment workflows
  5. Orchestrator Agent: Coordinates between specialized agents

Each agent is equipped with domain-specific tools and knowledge, allowing for deep expertise in their respective areas while maintaining a unified interface for the user.

Basic Usage Example

from agents import Agent, Runner
from devops_agent.agents.tools import (
    list_ec2_instances,
    start_ec2_instances,
    stop_ec2_instances,
    create_ec2_instance
)
from devops_agent.core.context import DevOpsContext

# Create a context with user information
context = DevOpsContext(
    user_id="user123",
    aws_region="us-west-2",
    github_org="your-organization"
)

# Create an EC2-focused agent
ec2_agent = Agent(
    name="EC2 Assistant",
    instructions="""
    You are an EC2 management assistant that helps users manage their AWS EC2 instances.
    You can list, start, stop, and create EC2 instances based on user requests.
    Always confirm important actions before executing them and provide clear explanations.
    """,
    tools=[
        list_ec2_instances,
        start_ec2_instances,
        stop_ec2_instances,
        create_ec2_instance
    ],
    model="gpt-4o"
)

# Run the agent with a user query
result = Runner.run_sync(
    ec2_agent,
    "I need to launch 3 t2.micro instances for a web application in us-west-2. They should have the tag 'Project=WebApp'.",
    context=context
)

print(result.final_output)
        

Advanced Agent Orchestration

For more complex workflows, you can use agent orchestration to coordinate between specialized agents:

from agents import Agent, Runner, Handoff
from devops_agent.agents.tools import (
    # EC2 tools
    list_ec2_instances,
    start_ec2_instances,
    stop_ec2_instances,
    create_ec2_instance,
    # S3 tools
    list_s3_buckets,
    create_s3_bucket,
    # GitHub tools
    get_github_repository,
    list_github_issues,
    create_github_issue,
    # Deployment tools
    deploy_to_ec2
)

# Create specialized agents
ec2_agent = Agent(
    name="EC2 Agent",
    instructions="You are an EC2 management specialist...",
    tools=[list_ec2_instances, start_ec2_instances, stop_ec2_instances, create_ec2_instance],
    model="gpt-4o-mini"
)

s3_agent = Agent(
    name="S3 Agent",
    instructions="You are an S3 management specialist...",
    tools=[list_s3_buckets, create_s3_bucket],
    model="gpt-4o-mini"
)

github_agent = Agent(
    name="GitHub Agent",
    instructions="You are a GitHub management specialist...",
    tools=[get_github_repository, list_github_issues, create_github_issue],
    model="gpt-4o-mini"
)

deployment_agent = Agent(
    name="Deployment Agent",
    instructions="You are a deployment specialist...",
    tools=[deploy_to_ec2],
    model="gpt-4o-mini"
)

# Create an orchestrator agent that can delegate to specialized agents
orchestrator = Agent(
    name="DevOps Orchestrator",
    instructions="""
    You are a DevOps orchestrator that helps users manage their cloud infrastructure and code repositories.
    You can delegate tasks to specialized agents for EC2, S3, GitHub, and deployments.
    Determine which specialized agent is best suited for each user request and hand off accordingly.
    """,
    handoffs=[
        Handoff(agent=ec2_agent, description="Handles EC2 instance management tasks"),
        Handoff(agent=s3_agent, description="Handles S3 bucket operations"),
        Handoff(agent=github_agent, description="Handles GitHub repository management"),
        Handoff(agent=deployment_agent, description="Handles deployment workflows")
    ],
    model="gpt-4o-mini"
)

# Run the orchestrator with a complex query
result = Runner.run_sync(
    orchestrator,
    """
    I need to set up a new web application deployment:
    1. Create 2 t2.micro EC2 instances with the tag 'Project=WebApp'
    2. Create an S3 bucket for static assets with versioning enabled
    3. Clone our 'company/webapp' GitHub repository to the EC2 instances
    4. Create a GitHub issue to track this deployment
    """,
    context=context
)

print(result.final_output)
        

Asynchronous Agent Execution

For high-performance applications, you can use asynchronous execution:

import asyncio
from agents import Runner

async def run_agent_async():
    result = await Runner.run(
        ec2_agent,
        "List all my EC2 instances in us-west-2 and show their status",
        context=context
    )
    return result.final_output

# Run the agent asynchronously
response = asyncio.run(run_agent_async())
print(response)
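
Because Runner.run is awaitable, independent queries can also be dispatched concurrently. The sketch below uses asyncio.gather and assumes the ec2_agent, s3_agent, and context objects from the earlier examples; OpenAI API rate limits still apply.

import asyncio
from agents import Runner

async def run_parallel_queries():
    # Start two independent, read-only queries against different agents concurrently.
    ec2_task = Runner.run(ec2_agent, "List all running EC2 instances in us-west-2", context=context)
    s3_task = Runner.run(s3_agent, "List all S3 buckets and flag any without versioning", context=context)
    ec2_result, s3_result = await asyncio.gather(ec2_task, s3_task)
    return ec2_result.final_output, s3_result.final_output

ec2_summary, s3_summary = asyncio.run(run_parallel_queries())
print(ec2_summary)
print(s3_summary)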
        

Security Guardrails

The DevOps Agent includes built-in security guardrails to prevent destructive operations:

from devops_agent.core.guardrails import (
    security_guardrail,
    sensitive_info_guardrail
)

# Apply security guardrail to check for potentially harmful operations
@security_guardrail
def perform_operation(operation_details):
    # Implementation
    pass

# Apply sensitive information guardrail to prevent leaking credentials
@sensitive_info_guardrail
def generate_response(user_query, system_data):
    # Implementation
    pass
        

Tracing and Debugging

For debugging and monitoring agent behavior, you can use the tracing functionality:

from agents.tracing import set_tracing_enabled, get_trace

# Enable tracing
set_tracing_enabled(True)

# Run the agent
result = Runner.run_sync(ec2_agent, "List my EC2 instances", context=context)

# Get the trace for analysis
trace = get_trace()
print(f"Agent took {len(trace.steps)} steps to complete the task")
for step in trace.steps:
    print(f"Step: {step.type}, Duration: {step.duration}ms")
        

Advanced Configuration

Credential Management

The DevOps Agent provides multiple secure options for credential management:

  1. Environment Variables: Traditional approach using environment variables
  2. AWS Profiles: Leverage AWS CLI profiles for credential management
  3. Keyring Integration: Store credentials securely in your system's keyring
  4. IAM Roles: Use IAM roles for EC2 instances or Lambda functions
  5. Secrets Manager: Retrieve credentials from AWS Secrets Manager or similar services

Example keyring setup:

from devops_agent.core.credentials import CredentialManager

# Store credentials securely
cred_manager = CredentialManager()
cred_manager.store_aws_credentials(
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    region="us-west-2",
    profile_name="production"
)

cred_manager.store_github_credentials(
    token="YOUR_GITHUB_TOKEN",
    username="your-username"
)

# Retrieve credentials securely
aws_creds = cred_manager.get_aws_credentials(profile_name="production")
github_creds = cred_manager.get_github_credentials()
        

Error Handling and Logging

The DevOps Agent provides comprehensive error handling with actionable suggestions:

from devops_agent.core.logging import setup_logging
from devops_agent.aws.base import AWSServiceError, ResourceNotFoundError

# Setup logging
logger = setup_logging(level="INFO", log_file="devops-agent.log")

try:
    # Attempt to perform an operation
    ec2.start_instance(instance_id="i-nonexistentid")
except ResourceNotFoundError as e:
    # Handle specific error with context
    logger.error(f"Could not find instance: {e}")
    logger.info(f"Suggestion: {e.suggestion}")
    # Take remedial action
except AWSServiceError as e:
    # Handle general AWS errors
    logger.error(f"AWS operation failed: {e}")
    logger.info(f"Suggestion: {e.suggestion}")
        

Extensibility

The DevOps Agent is designed to be easily extended with new services and providers:

  1. Service Modules: Add new AWS services by creating new service modules
  2. Cloud Providers: Implement new cloud providers by following the provider interface
  3. Custom Tools: Create custom tools for specific workflows
  4. Plugins: Develop plugins to extend functionality

Example of creating a custom service:

from devops_agent.aws.base import AWSBaseService

class CustomService(AWSBaseService):
    """Custom service implementation."""
    
    SERVICE_NAME = "custom-service"
    
    def __init__(self, credentials=None, region=None):
        super().__init__(credentials, region)
        # Initialize service-specific resources
        
    def custom_operation(self, param1, param2):
        """Implement custom operation."""
        try:
            # Implement operation logic
            result = self._client.some_operation(
                Param1=param1,
                Param2=param2
            )
            return self._format_response(result)
        except Exception as e:
            # Handle and transform errors
            self.handle_error(e, "custom_operation")
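
Calling the custom service then looks like any other service call. A short sketch, assuming credentials are resolved from the environment when none are passed in:

# Instantiate the custom service and invoke the operation defined above.
service = CustomService(region="us-west-2")
response = service.custom_operation(param1="value1", param2="value2")
print(response)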
        

Creating Custom Agent Tools

You can extend the agent's capabilities by creating custom tools:

from agents import function_tool
from pydantic import BaseModel, Field
from devops_agent.core.context import DevOpsContext, RunContextWrapper

# Define the input schema for your tool
class CustomOperationInput(BaseModel):
    resource_id: str = Field(..., description="The ID of the resource to operate on")
    operation_type: str = Field(..., description="The type of operation to perform")
    parameters: dict = Field(default={}, description="Additional parameters for the operation")

# Create a function tool
@function_tool()
async def custom_operation(
    wrapper: RunContextWrapper[DevOpsContext],
    input_data: CustomOperationInput
) -> dict:
    """
    Perform a custom operation on a specified resource.
    
    Args:
        resource_id: The ID of the resource to operate on
        operation_type: The type of operation to perform (e.g., "analyze", "optimize", "backup")
        parameters: Additional parameters specific to the operation type
        
    Returns:
        A dictionary containing the operation results
    """
    # Access the context
    context = wrapper.context
    
    # Implement your custom logic
    result = {
        "resource_id": input_data.resource_id,
        "operation_type": input_data.operation_type,
        "status": "completed",
        "details": {
            "timestamp": "2023-01-01T00:00:00Z",
            "user": context.user_id,
            "region": context.aws_region,
            "parameters": input_data.parameters
        }
    }
    
    return result
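
Once defined, the custom tool is attached to an agent the same way as the built-in tools. A short sketch reusing the Agent and Runner pattern from the earlier examples (the agent name, instructions, and prompt are illustrative, and context is the DevOpsContext created earlier):

from agents import Agent, Runner

# Register the custom tool on an agent and exercise it with a natural-language request.
custom_agent = Agent(
    name="Custom Operations Agent",
    instructions="You help users run analyze, optimize, and backup operations on their resources.",
    tools=[custom_operation],
    model="gpt-4o-mini"
)

result = Runner.run_sync(
    custom_agent,
    "Run an analyze operation on resource i-1234567890abcdef0",
    context=context
)
print(result.final_output)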
        

Testing

The DevOps Agent includes comprehensive testing capabilities:

# Run all tests
python run_all_tests.py

# Run specific test categories
python -m pytest tests/aws/
python -m pytest tests/github/
python -m pytest tests/test_cli.py

# Run tests with specific markers
python -m pytest -m "aws"
python -m pytest -m "integration"
python -m pytest -m "unit"
        


Comments

Pradeep Sanyal

AI Strategy to Implementation | AI & Data Leader | Experienced CIO & CTO | Building Innovative Enterprise AI solutions | Responsible AI | Top LinkedIn AI voice

32 minutes ago

Kudos for pushing the boundaries with Agentic DevOps. Building an AI-native system that manages the entire cloud lifecycle is no small feat. The concept of intent-driven provisioning and self-healing infrastructure is a strong step toward autonomous operations. That said, the real challenge will be in its decision-making. Infrastructure management often requires nuanced judgment - balancing performance, cost, security, and compliance. While AI agents can optimize for metrics and respond to anomalies, how well can they assess trade-offs in ambiguous scenarios? For example, deciding whether to delay a critical deployment to avoid potential downtime, or prioritizing security over speed when vulnerabilities are detected. Human engineers bring experience, intuition, and context that algorithms may struggle to replicate. Without robust guardrails and escalation mechanisms, autonomous decisions could introduce unintended risks. It'll be interesting to see how Agentic DevOps manages these complexities and whether it truly complements human oversight or falls short in critical moments. Would love to hear from anyone who's tested it - how well does it handle those high-stakes decisions in real-world scenarios?

Adrian Hornsby

I help software organizations improve resilience and achieve operational excellence | Former Principal Engineer at AWS

3 hours ago

I wrote about an AI meta-operator a month ago, but it happened faster than I expected. Congrats Reuven Cohen! I am curious to hear what you think about nondeterministic behavior, hidden complexity, etc. Great stuff! https://medium.com/@adhorn/when-ai-makes-the-call-b10b094e1b8f

Thomas Smith

*OPS / CISSP

7 hours ago

AWESOME!


Hi Reuven, how does Agentic DevOps compare to products like Humanitec, which aim to standardise & automate Internal Developer Platforms (IDPs)?


Awesome product. Does it also work with GCP?
