Introducing Auto-Browser: An Agentic Web Browser and Automation Tool

Auto-Browser is an AI-powered web automation tool that makes complex web interactions simple through natural language commands. It combines the power of LLMs with browser automation to enable sophisticated multi-step workflows and data extraction.

The web, as we know it, has two distinct dimensions: the machine-centric side, where APIs, data, and automation thrive, and the human-centric side, designed for people to interact with visually through logins, forms, buttons, and more. While automation has made strides on the machine-centric side, the human-facing web has remained largely inaccessible to agents, until now.

Auto-Browser bridges this gap by simplifying the complex world of human web interaction. Traditional browser automation tools are powerful but tailored for programmers, requiring a steep learning curve and detailed scripting. Auto-Browser eliminates these barriers, allowing anyone to describe tasks in plain English. Need to log in to a website, extract data, input information elsewhere, and generate a report? Auto-Browser makes multi-step workflows effortless.

For instance, you could ask it to log in to Workday, fill out your timesheet, add project details, and submit, all in a single command. It dynamically handles navigation, session management, and templating, auto-generating the underlying code while you focus on what you need done. Open-sourced and accessible via a CLI, Auto-Browser offers a simple installation process and is usable immediately. For advanced users, it’s a versatile terminal companion; for everyone else, a web-based UI is on the horizon. Give it a spin; the repository link is below.

Let me know what you think!

Features

  • Natural Language Control: Describe what you want to do in plain English
  • Smart Element Detection: Automatically finds the right elements to interact with
  • Structured Data Extraction: Extracts data in clean, organized formats
  • Interactive Mode: Supports form filling, clicking, and complex interactions
  • Report Generation: Creates well-formatted markdown reports
  • Template System: Save and reuse site-specific configurations
  • Easy to Use: Simple CLI interface with verbose output option

Introduction to Multi-Step Browser Automation

Auto-Browser revolutionizes web automation by allowing you to describe complex workflows in plain English. Instead of writing detailed scripts or learning complex APIs, you can simply describe what you want to accomplish:

# Multi-step workflow example
auto-browser easy --interactive "https://workday.com" "Login with username $USER_EMAIL, go to time sheet, enter 8 hours for today under project 'Development', add comment 'Sprint tasks', and submit for approval"

Key Concepts

Natural Language Control

  • Describe actions in plain English
  • AI understands context and intent
  • Handles complex multi-step flows

Smart Navigation

  • Automatic element detection
  • Context-aware interactions
  • Dynamic content handling

State Management

  • Maintains session context
  • Handles authentication flows
  • Manages multi-page interactions

Template System

  • Reusable site configurations
  • Custom selectors and actions
  • Workflow templates

Installation

Docker Installation (Recommended)

Using Docker Compose (Easiest)

# Clone repository
git clone https://github.com/ruvnet/auto-browser.git
cd auto-browser

# Set up environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

# Run with default example
docker-compose up

# Run custom command
docker-compose run --rm auto-browser \
  auto-browser easy "https://example.com" "Extract data"

# Run interactive mode
docker-compose run --rm auto-browser \
  auto-browser easy --interactive "https://example.com" "Fill out form"

# Run with custom model
LLM_MODEL=gpt-4 docker-compose up

# Run specific demo
docker-compose run --rm auto-browser \
  ./demos/07_timesheet_automation.sh
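Every Compose command above depends on OPENAI_API_KEY being set in .env, so a quick preflight check can catch a missing key before the container starts. This is a sketch, not part of the repository: check_env_key is a hypothetical helper, and it assumes .env holds plain KEY=value lines without an export prefix.

```shell
#!/usr/bin/env bash
# check_env_key FILE KEY
# Hypothetical helper (not shipped with auto-browser): succeeds only if FILE
# contains a KEY=value line with a non-empty value. Assumes plain KEY=value
# lines; if the key appears more than once, the last line is used.
check_env_key() {
  local file="$1" key="$2" line value
  if [[ ! -f "$file" ]]; then
    echo "missing $file (copy .env.example first)" >&2
    return 1
  fi
  line=$(grep -E "^${key}=" "$file" | tail -n 1)
  value="${line#*=}"
  # Strip one pair of surrounding double quotes, e.g. KEY="abc"
  value="${value%\"}"
  value="${value#\"}"
  if [[ -z "$value" ]]; then
    echo "$key is not set in $file" >&2
    return 1
  fi
}

# Example: only start the stack when the key is present.
# check_env_key .env OPENAI_API_KEY && docker-compose up
```

If .env.example ships a placeholder value, that placeholder would still pass this check; it only guards against an empty or missing key.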

Using Docker Directly

# Build Docker image
docker build -t auto-browser .

# Run basic example
docker run -e OPENAI_API_KEY=your_key auto-browser \
  auto-browser easy "https://www.google.com/finance" "Get AAPL stock price"

# Run with output volume
docker run -v "$(pwd)/output:/app/output" -e OPENAI_API_KEY=your_key auto-browser \
  auto-browser easy -v "https://www.google.com/finance" "Get AAPL stock price"

# Run interactive mode
docker run -e OPENAI_API_KEY=your_key auto-browser \
  auto-browser easy --interactive "https://example.com" "Fill out contact form"

Quick Install (Linux/macOS)

# Download and run install script
curl -sSL https://raw.githubusercontent.com/ruvnet/auto-browser/main/install.sh | bash

Manual Installation

  1. System Requirements

# Install Node.js on Debian/Ubuntu (if not present)
curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo -E bash -
sudo apt-get install -y nodejs

# Install Playwright system dependencies
npx playwright install-deps

  2. Clone and Setup

# Clone repository
git clone https://github.com/ruvnet/auto-browser.git
cd auto-browser

# Install Python package
pip install -e .

# Install Playwright browsers
playwright install

# Set up environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

Docker Usage Examples

Basic Operations

# Run with specific URL
docker-compose run --rm auto-browser \
  auto-browser easy "https://example.com" "Extract main content"

# Run with verbose output
docker-compose run --rm auto-browser \
  auto-browser easy -v "https://example.com" "Extract data"

# Run with report generation
docker-compose run --rm auto-browser \
  auto-browser easy -v -r "https://example.com" "Generate report"

Advanced Workflows

# Run timesheet automation
docker-compose run --rm auto-browser \
  auto-browser easy --interactive "https://workday.com" \
  "Fill timesheet for this week"

# Run social media campaign
docker-compose run --rm auto-browser \
  auto-browser easy --interactive "https://buffer.com" \
  "Create and schedule posts"

# Run research workflow
docker-compose run --rm auto-browser \
  auto-browser easy -v -r "https://scholar.google.com" \
  "Research LLM papers"

Template Management

# Create template
docker-compose run --rm auto-browser \
  auto-browser create-template "https://example.com" \
  --name example --description "Example template"

# List templates
docker-compose run --rm auto-browser \
  auto-browser list-sites

# Use template
docker-compose run --rm auto-browser \
  auto-browser easy --site example "https://example.com" \
  "Extract data"

Installation Notes

Docker Installation (Recommended)

  • Easiest setup with all dependencies included
  • Docker Compose provides simple management
  • Environment variables handled automatically
  • Output directory mounted automatically
  • Supports all features and demos
  • Cross-platform compatibility

Manual Installation

  • Requires Python 3.8 or higher
  • Node.js LTS version recommended
  • System dependencies handled by install script
  • Playwright browsers installed automatically
  • Package manager locks handled gracefully

Advanced Workflow Examples

1. Time Management

# Complete timesheet workflow
auto-browser easy --interactive "https://workday.com" "Fill out timesheet for the week:
- Monday: 8h Development
- Tuesday: 6h Development, 2h Meetings
- Wednesday: 7h Development, 1h Documentation
Then submit for approval"

2. Social Media Management

# Cross-platform posting
auto-browser easy --interactive "https://buffer.com" "Create posts about auto-browser:
1. Twitter: Announce new release
2. LinkedIn: Technical deep-dive
3. Schedule both for optimal times"

3. Research Automation

# Academic research workflow
auto-browser easy -v -r "https://scholar.google.com" "Find papers about LLM automation:
1. Get top 10 most cited
2. Extract methodologies
3. Download PDFs
4. Create bibliography"

4. Project Setup

# Complete project initialization
auto-browser easy --interactive "https://github.com" "Create new project:
1. Initialize repository
2. Set up CI/CD
3. Configure team access
4. Create documentation"
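Long multi-step instructions like these are easier to edit when they live outside the command itself. One plain-Bash approach (the variable name timesheet_task is illustrative) is to keep the prompt in a quoted heredoc, then pass it as a single quoted argument, just as the inline examples do.

```shell
#!/usr/bin/env bash
# Store the multi-line prompt in a variable. The quoted 'EOF' delimiter stops
# the shell from expanding $variables or backticks inside the text.
timesheet_task=$(cat <<'EOF'
Fill out timesheet for the week:
- Monday: 8h Development
- Tuesday: 6h Development, 2h Meetings
- Wednesday: 7h Development, 1h Documentation
Then submit for approval
EOF
)

# Quote the variable so the whole prompt reaches auto-browser as one argument:
# auto-browser easy --interactive "https://workday.com" "$timesheet_task"
```

Use an unquoted EOF delimiter instead if you do want the shell to substitute values such as $USER_EMAIL into the prompt.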

Demo Workflows

Auto-Browser includes comprehensive demos showcasing various automation capabilities:

Basic Demos

  1. Basic Setup: Simple data extraction and templates
  2. Simple Search: Search functionality and data parsing
  3. Multi-Tab: Working with multiple pages
  4. Form Interaction: Form filling and validation
  5. Parallel Tasks: Complex data extraction
  6. Clinical Trials: Specialized data extraction

Advanced Workflows

  7. Timesheet Automation: Complete timesheet management
  8. Social Media Campaign: Multi-platform content management
  9. Research Workflow: Academic research automation
  10. Project Management: Project setup and coordination

Try the demos:

# Make demos executable
chmod +x demos/*.sh

# Run specific demo
./demos/07_timesheet_automation.sh

Configuration

Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key (required)
  • LLM_MODEL: Model to use (default: gpt-4o-mini)
  • BROWSER_HEADLESS: Run browser in headless mode (default: true)
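Since only the API key is required, a wrapper script can apply the documented defaults and fail fast when the key is missing. This sketch mirrors the defaults listed above; it is not auto-browser's own configuration loader, and load_settings is a hypothetical helper.

```shell
#!/usr/bin/env bash
# load_settings: hypothetical wrapper logic that mirrors the documented
# defaults. It does not read auto-browser's source; it only applies the
# values listed above before you launch the CLI.
load_settings() {
  if [[ -z "${OPENAI_API_KEY:-}" ]]; then
    echo "OPENAI_API_KEY is required" >&2
    return 1
  fi
  # Apply defaults only when the variable is unset or empty.
  export LLM_MODEL="${LLM_MODEL:-gpt-4o-mini}"
  export BROWSER_HEADLESS="${BROWSER_HEADLESS:-true}"
}

# Example:
# load_settings && auto-browser easy "https://example.com" "Extract data"
```

Any value you set yourself, such as LLM_MODEL=gpt-4, is left untouched; the defaults fill in only what is missing.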

Template Configuration

Templates are stored in YAML format:

sites:
  finance:
    name: finance
    description: Extract stock information
    url_pattern: https://www.google.com/finance
    selectors:
      stock_price:
        css: .YMlKec.fxKbKc
        description: Current stock price

Output Files

Results are saved with unique filenames including:

  • Domain (e.g., google_com)
  • Path (e.g., finance)
  • Timestamp (YYYYMMDD_HHMMSS)
  • .md extension

Example: google_com_finance_20240120_123456.md
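The naming scheme can be reproduced to predict where a report will land. This is a hedged sketch: auto-browser's exact sanitization rules are not documented here, so dropping a leading www. (to match the google_com example), ignoring ports, query strings, and fragments, and replacing other non-alphanumeric path characters with _ are all assumptions; output_filename is a hypothetical helper.

```shell
#!/usr/bin/env bash
# output_filename URL [TIMESTAMP]
# Hypothetical helper that mimics the documented naming scheme:
#   <domain>_<path>_<YYYYMMDD_HHMMSS>.md
# Assumptions (not confirmed by the project): a leading "www." and any port
# are dropped, the query string and fragment are ignored, and every
# non-alphanumeric character in the path becomes "_".
output_filename() {
  local url="$1"
  local ts="${2:-$(date +%Y%m%d_%H%M%S)}"
  local rest="${url#*://}"   # drop the scheme
  local host="${rest%%/*}"   # everything before the first slash
  local path=""
  if [[ "$rest" == */* ]]; then
    path="${rest#*/}"        # everything after the first slash
  fi
  host="${host%%[?#]*}"      # cut a query string / fragment with no path
  host="${host#www.}"
  host="${host%%:*}"         # drop a port number
  path="${path%%[?#]*}"      # cut query string / fragment
  path="${path%/}"           # ignore a trailing slash
  local name="${host//./_}"
  if [[ -n "$path" ]]; then
    name+="_${path//[^A-Za-z0-9]/_}"
  fi
  printf '%s_%s.md\n' "$name" "$ts"
}

# Example: output_filename "https://www.google.com/finance" 20240120_123456
# prints google_com_finance_20240120_123456.md
```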

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Created by rUv (cause he could)

Repository: https://github.com/ruvnet/auto-browser
