Introducing Auto-Browser: An Agentic Web Browser and Automation Tool

Cohen Reuven

Inventor of "IaaS", angel investor, growth hacker, mentor

Published: January 20, 2025

Auto-Browser is an AI-powered web automation tool that makes complex web interactions simple through natural language commands. It combines the power of LLMs with browser automation to enable sophisticated multi-step workflows and data extraction.

The web, as we know it, has two distinct dimensions: the machine-centric side, where APIs, data, and automation thrive, and the human-centric side, designed for people to interact visually through logins, forms, buttons, and more. While automation has made strides on the machine-centric side, the human-facing web has remained largely inaccessible to agents until now.

Auto-Browser bridges this gap by simplifying the complex world of human web interaction. Traditional browser automation tools are powerful but tailored for programmers, requiring a steep learning curve and detailed scripting. Auto-Browser eliminates these barriers, allowing anyone to describe tasks in plain English. Need to log in to a website, extract data, input information elsewhere, and generate a report? Auto-Browser makes multi-step workflows effortless.

For instance, you could ask it to log into Workday, fill out your timesheet, add project details, and submit—all in a single command. It dynamically handles navigation, session management, and templating, auto-generating the underlying code while you focus on what you need done. Open-sourced and accessible via CLI, Auto-Browser offers a simple installation process and immediate usability. For advanced users, it’s a versatile terminal companion; for everyone else, a web-based UI is on the horizon. Give it a spin—links below.

Let me know what you think!

Features

Natural Language Control: Describe what you want to do in plain English
Smart Element Detection: Automatically finds the right elements to interact with
Structured Data Extraction: Extracts data in clean, organized formats
Interactive Mode: Supports form filling, clicking, and complex interactions
Report Generation: Creates well-formatted markdown reports
Template System: Save and reuse site-specific configurations
Easy to Use: Simple CLI interface with verbose output option

Introduction to Multi-Step Browser Automation

Auto-Browser revolutionizes web automation by allowing you to describe complex workflows in plain English. Instead of writing detailed scripts or learning complex APIs, you can simply describe what you want to accomplish:

# Multi-step workflow example
auto-browser easy --interactive "https://workday.com" "Login with username $USER_EMAIL, go to time sheet, enter 8 hours for today under project 'Development', add comment 'Sprint tasks', and submit for approval"
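
Because the task text is in double quotes, the shell substitutes $USER_EMAIL before Auto-Browser ever sees it. If the variable is unset, the prompt goes out with a blank username. A minimal sketch of a guard follows; `timesheet_task` is a hypothetical helper name, not part of Auto-Browser.

```shell
# Hypothetical helper: build the task text, aborting if USER_EMAIL is
# unset or empty so the prompt is never sent with a blank username.
timesheet_task() {
  : "${USER_EMAIL:?export USER_EMAIL before running this workflow}"
  printf '%s' "Login with username ${USER_EMAIL}, go to time sheet, enter 8 hours for today under project 'Development', add comment 'Sprint tasks', and submit for approval"
}
```

It would be used as: auto-browser easy --interactive "https://workday.com" "$(timesheet_task)"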

Key Concepts

Natural Language Control

Describe actions in plain English
AI understands context and intent
Handles complex multi-step flows

Smart Navigation

Automatic element detection
Context-aware interactions
Dynamic content handling

State Management

Maintains session context
Handles authentication flows
Manages multi-page interactions

Template System

Reusable site configurations
Custom selectors and actions
Workflow templates

Installation

Docker Installation (Recommended)

Using Docker Compose (Easiest)

# Clone repository
git clone https://github.com/ruvnet/auto-browser.git
cd auto-browser

# Set up environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

# Run with default example
docker-compose up

# Run custom command
docker-compose run --rm auto-browser \
  auto-browser easy "https://example.com" "Extract data"

# Run interactive mode
docker-compose run --rm auto-browser \
  auto-browser easy --interactive "https://example.com" "Fill out form"

# Run with custom model
LLM_MODEL=gpt-4 docker-compose up

# Run specific demo
docker-compose run --rm auto-browser \
  ./demos/07_timesheet_automation.sh

Using Docker Directly

# Build Docker image
docker build -t auto-browser .

# Run basic example
docker run -e OPENAI_API_KEY=your_key auto-browser \
  auto-browser easy "https://www.google.com/finance" "Get AAPL stock price"

# Run with output volume
docker run -v "$(pwd)/output:/app/output" -e OPENAI_API_KEY=your_key auto-browser \
  auto-browser easy -v "https://www.google.com/finance" "Get AAPL stock price"

# Run interactive mode
docker run -e OPENAI_API_KEY=your_key auto-browser \
  auto-browser easy --interactive "https://example.com" "Fill out contact form"

Quick Install (Linux/macOS)

# Download and run install script
curl -sSL https://raw.githubusercontent.com/ruvnet/auto-browser/main/install.sh | bash

Manual Installation

System Requirements

# Install Node.js on Debian/Ubuntu (if not present)
curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo -E bash -
sudo apt-get install -y nodejs

# Install Playwright system dependencies
npx playwright install-deps

Clone and Setup

# Clone repository
git clone https://github.com/ruvnet/auto-browser.git
cd auto-browser

# Install Python package
pip install -e .

# Install Playwright browsers
playwright install

# Set up environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

Docker Usage Examples

Basic Operations

# Run with specific URL
docker-compose run --rm auto-browser \
  auto-browser easy "https://example.com" "Extract main content"

# Run with verbose output
docker-compose run --rm auto-browser \
  auto-browser easy -v "https://example.com" "Extract data"

# Run with report generation
docker-compose run --rm auto-browser \
  auto-browser easy -v -r "https://example.com" "Generate report"
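
The same flags can be applied to several sites in one go. The sketch below only wraps the `docker-compose run` invocation shown above. `ab_run` and `run_batch` are hypothetical helper names, and it assumes the CLI exits with a non-zero status on failure, which the article does not confirm.

```shell
# Thin wrapper around the Docker Compose invocation used in this section.
ab_run() {
  docker-compose run --rm auto-browser auto-browser "$@"
}

# Run one task against each URL in turn with verbose output (-v) and
# report generation (-r). Succeeds only if every URL succeeded.
run_batch() {
  local task="$1" failed=0 url
  shift
  for url in "$@"; do
    if ! ab_run easy -v -r "$url" "$task"; then
      echo "failed: $url" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Example call: run_batch "Extract main content" "https://example.com" "https://example.org"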

Advanced Workflows

# Run timesheet automation
docker-compose run --rm auto-browser \
  auto-browser easy --interactive "https://workday.com" \
  "Fill timesheet for this week"

# Run social media campaign
docker-compose run --rm auto-browser \
  auto-browser easy --interactive "https://buffer.com" \
  "Create and schedule posts"

# Run research workflow
docker-compose run --rm auto-browser \
  auto-browser easy -v -r "https://scholar.google.com" \
  "Research LLM papers"

Template Management

# Create template
docker-compose run --rm auto-browser \
  auto-browser create-template "https://example.com" \
  --name example --description "Example template"

# List templates
docker-compose run --rm auto-browser \
  auto-browser list-sites

# Use template
docker-compose run --rm auto-browser \
  auto-browser easy --site example "https://example.com" \
  "Extract data"
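
The create and use steps above can be chained. This is a rough sketch using only the subcommands and flags shown in this section. `ab_run` and `run_with_template` are hypothetical names. The article does not say whether `create-template` overwrites an existing template or prompts for input, so treat the first step as untested.

```shell
# Thin wrapper around the Docker Compose invocation used in this section.
ab_run() {
  docker-compose run --rm auto-browser auto-browser "$@"
}

# Create a template for a site, then run a task with that template.
# Stops before the second step if template creation fails.
run_with_template() {
  local name="$1" url="$2" task="$3"
  ab_run create-template "$url" --name "$name" --description "Template for $url" || return 1
  ab_run easy --site "$name" "$url" "$task"
}
```

Example call: run_with_template example "https://example.com" "Extract data"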

Installation Notes

Docker Installation (Recommended)

Easiest setup with all dependencies included
Docker Compose provides simple management
Environment variables handled automatically
Output directory mounted automatically
Supports all features and demos
Cross-platform compatibility

Manual Installation

Requires Python 3.8 or higher
Node.js LTS version recommended
System dependencies handled by install script
Playwright browsers installed automatically
Package manager locks handled gracefully
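
Before a manual install, the Python and Node.js requirements listed above can be checked by hand. The following is a small sketch, not something shipped with Auto-Browser. The function names are hypothetical, and it assumes the interpreter is available as `python3`.

```shell
# Succeed when a "major.minor[.patch]" version string is at least 3.8.
python_version_ok() {
  local major minor
  major=${1%%.*}
  minor=${1#*.}
  minor=${minor%%.*}
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]; }
}

# Fail if Python is missing or too old; only warn about Node.js,
# since the notes above list it as recommended rather than required.
check_prerequisites() {
  local version
  version=$(python3 -c 'import platform; print(platform.python_version())') || return 1
  python_version_ok "$version" || { echo "Python 3.8+ required, found $version" >&2; return 1; }
  command -v node >/dev/null 2>&1 || echo "Node.js not found; the LTS version is recommended" >&2
}
```

Run check_prerequisites from the repository root before pip install -e .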

Advanced Workflow Examples

1. Time Management

# Complete timesheet workflow
auto-browser easy --interactive "https://workday.com" "Fill out timesheet for the week:
- Monday: 8h Development
- Tuesday: 6h Development, 2h Meetings
- Wednesday: 7h Development, 1h Documentation
Then submit for approval"

2. Social Media Management

# Cross-platform posting
auto-browser easy --interactive "https://buffer.com" "Create posts about auto-browser:
1. Twitter: Announce new release
2. LinkedIn: Technical deep-dive
3. Schedule both for optimal times"

3. Research Automation

# Academic research workflow
auto-browser easy -v -r "https://scholar.google.com" "Find papers about LLM automation:
1. Get top 10 most cited
2. Extract methodologies
3. Download PDFs
4. Create bibliography"

4. Project Setup

# Complete project initialization
auto-browser easy --interactive "https://github.com" "Create new project:
1. Initialize repository
2. Set up CI/CD
3. Configure team access
4. Create documentation"

Demo Workflows

Auto-Browser includes comprehensive demos showcasing various automation capabilities:

Basic Demos

Basic Setup: Simple data extraction and templates
Simple Search: Search functionality and data parsing
Multi-Tab: Working with multiple pages
Form Interaction: Form filling and validation
Parallel Tasks: Complex data extraction
Clinical Trials: Specialized data extraction

Advanced Workflows

Timesheet Automation: Complete timesheet management
Social Media Campaign: Multi-platform content management
Research Workflow: Academic research automation
Project Management: Project setup and coordination

Try the demos:

# Make demos executable
chmod +x demos/*.sh

# Run specific demo
./demos/07_timesheet_automation.sh
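
To run every demo in order instead of one at a time, a loop along these lines may help. `run_demos` is a hypothetical helper. It assumes the scripts sort correctly by their numeric filename prefix and are meant to be run from the repository root. Each demo presumably makes real LLM calls, so expect API usage.

```shell
# Run each demo script in filename order, stopping at the first failure.
run_demos() {
  local dir="${1:-demos}" script
  for script in "$dir"/*.sh; do
    [ -f "$script" ] || continue   # skip the literal pattern if nothing matches
    echo "==> $script"
    bash "$script" || { echo "stopped: $script failed" >&2; return 1; }
  done
}
```

Example call: run_demos demos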

Configuration

Environment Variables

OPENAI_API_KEY: Your OpenAI API key (required)
LLM_MODEL: Model to use (default: gpt-4o-mini)
BROWSER_HEADLESS: Run browser in headless mode (default: true)
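
A wrapper script can make these settings explicit and fail early when the key is missing. The following is a minimal sketch. `load_auto_browser_env` is a hypothetical name, and the defaults simply repeat the ones listed above instead of reading them from the tool.

```shell
# Apply the documented defaults, then fail early if the required key is missing.
load_auto_browser_env() {
  : "${LLM_MODEL:=gpt-4o-mini}"
  : "${BROWSER_HEADLESS:=true}"
  export LLM_MODEL BROWSER_HEADLESS
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; add it to .env or export it" >&2
    return 1
  fi
}
```

Values already present in the environment are left untouched, so LLM_MODEL=gpt-4 still takes precedence over the default.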

Template Configuration

Templates are stored in YAML format:

sites:
  finance:
    name: finance
    description: Extract stock information
    url_pattern: https://www.google.com/finance
    selectors:
      stock_price:
        css: .YMlKec.fxKbKc
        description: Current stock price

Output Files

Results are saved with unique filenames including:

Domain (e.g., google_com)
Path (e.g., finance)
Timestamp (YYYYMMDD_HHMMSS)
.md extension

Example: google_com_finance_20240120_123456.md
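
To predict or match these filenames in a script, the pattern can be rebuilt roughly as follows. This is a sketch inferred from the example above, not the tool's own code. `output_filename` is a hypothetical helper. Dropping a leading "www." and joining multiple path segments with underscores are assumptions.

```shell
# Rebuild the documented pattern: domain_path_timestamp.md
# Assumptions: a leading "www." is dropped; dots and slashes become underscores.
output_filename() {
  local url="$1" rest host path name
  local stamp="${2:-$(date +%Y%m%d_%H%M%S)}"
  rest=${url#*://}        # strip the scheme
  rest=${rest%%\?*}       # strip any query string
  rest=${rest%%\#*}       # strip any fragment
  host=${rest%%/*}
  case "$rest" in
    */*) path=${rest#*/} ;;
    *)   path="" ;;
  esac
  host=${host#www.}
  path=${path%/}
  name=$(printf '%s' "$host" | tr '.' '_')
  if [ -n "$path" ]; then
    name="${name}_$(printf '%s' "$path" | tr '/' '_')"
  fi
  printf '%s_%s.md\n' "$name" "$stamp"
}
```

Example call: output_filename "https://www.google.com/finance" 20240120_123456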

Contributing

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Created by rUv (cause he could)

Repository: https://github.com/ruvnet/auto-browser
