AI Agents for Market Research: How I Used Them to Curate a Directory using PydanticAI
Duncan Trevithick
Marketing at leading AI training data provider clickworker / LXT
I wanted to create a directory of free tools that not only serves as a resource for SEO geeks but also as inspiration for busy business owners aiming to drive organic traffic to their sites.
Building it with AI agents also taught me a lot about agent frameworks along the way, which I'll walk through below.
Why I Made This Directory
I noticed that many developers and businesses were creating free tools, not just to offer value but also as a strategy to attract organic traffic. As you can see in the chart below, these tools provide a good ROI, but are expensive to build.
Part of this cost is hiring someone who knows what to build, and the other part is hiring someone who knows how to build it.
I wanted to cover that first part: a directory of proven free-tool ideas that business owners can draw inspiration from.
The Challenge
The challenge was twofold: discovering candidate tools at scale, and then verifying that each one was genuinely free and usable - far too much work to do by hand.
My Solution
My solution was to use AI agents to automate the whole process.
System Architecture
There are many AI agent frameworks to choose from - at the time of writing, these include LangGraph, CrewAI, and AutoGen.
Under the hood, most agent frameworks rely heavily on Pydantic - essentially a way to make sure the right kind of data is going in and out of each agent and tool. Pydantic also powers the massively popular FastAPI framework, and the Pydantic team recently released their own agent framework, PydanticAI.
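To make that concrete, here is a minimal sketch of Pydantic's core idea - data crossing a boundary is parsed into a typed model, and malformed data fails loudly (the model and its fields here are illustrative, not from my project):

from pydantic import BaseModel, ValidationError

class ToolRecord(BaseModel):
    # Illustrative model - not the actual project schema
    name: str
    url: str
    is_free: bool

try:
    # Well-formed data becomes a typed object with guaranteed fields
    record = ToolRecord(name="Keyword Checker", url="https://example.com", is_free=True)
except ValidationError as err:
    # Missing or wrongly-typed fields raise a structured error instead of
    # letting bad data flow silently between agents and tools
    print(err)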
I decided to go with PydanticAI because, in my experience building with LLMs so far, it's typically the structure of inputs and outputs that causes headaches - exactly the problem PydanticAI focuses on solving. I think they've already done a great job at it (including logging, built-in retries, etc.).
Plus, I tried LangGraph / LangChain and CrewAI, and both felt overcomplicated and poorly documented.
Core Components
1. Agent Framework: Built using PydanticAI for structured data handling and reliable input/output validation
2. Database Integration: Async-compatible Django ORM operations for tool and category management (see the sketch after this list)
3. Caching Layer: Request caching using SQLite backend to prevent redundant API calls, especially during testing
4. Search Integration: Dual search capability using Bing and Serper APIs
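On point 2, Django has offered async ORM methods (aget, acreate, aget_or_create) since version 4.1, which lets agent code await database writes directly. A minimal sketch, assuming a hypothetical Tool model - the actual schema isn't shown in this post:

from django.db import models

class Tool(models.Model):
    # Hypothetical model; the real schema isn't shown here
    name = models.CharField(max_length=200)
    url = models.URLField(unique=True)

async def save_tool(name: str, url: str) -> Tool:
    # aget_or_create is the async counterpart of get_or_create (Django 4.1+)
    tool, _created = await Tool.objects.aget_or_create(name=name, url=url)
    return tool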
Key Agents
The system implements four specialized AI agents:
from pydantic_ai import Agent

suggest_agent = Agent(
    llm_model,
    result_type=ToolSuggestion,
    system_prompt="""...""",
)

search_agent = Agent(
    llm_model,
    system_prompt="""...""",
)

validation_agent = Agent(
    llm_model,
    result_type=WebsiteValidation,
    system_prompt="""...""",
)

categorization_agent = Agent(
    llm_model,
    result_type=CategorySuggestion,
    system_prompt="""...""",
)
Each agent has a specific role:
- Suggestion Agent: Generates new tool ideas to research
- Search Agent: Finds and analyzes tool information
- Validation Agent: Ensures tools meet inclusion criteria
- Categorization Agent: Assigns tools to appropriate categories
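Each result_type is a Pydantic model that defines the agent's output contract. WebsiteValidation appears in full later in this post; for ToolSuggestion and CategorySuggestion, a simplified sketch looks like this (the exact fields are illustrative):

from pydantic import BaseModel

class ToolSuggestion(BaseModel):
    # Illustrative fields - the full model isn't shown in this post
    name: str
    reason: str

class CategorySuggestion(BaseModel):
    # Illustrative fields - the full model isn't shown in this post
    category: str
    is_new_category: bool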
Implementation Details
1. Tool Discovery Process
The discovery pipeline begins with the suggestion agent:
async def suggest_tools_to_research(max_attempts: int = 3) -> ToolSuggestion:
    # max_attempts default is illustrative
    existing_data = await get_existing_data()
    for attempt in range(max_attempts):
        result = await suggest_agent.run(
            "Suggest a specific free tool to research. "
            f"Don't suggest any of these existing tools: {existing_data['tools']}"
        )
        if await validate_suggestion(result.data, existing_data):
            return result.data
    raise RuntimeError("No valid tool suggestion after max_attempts tries")
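The validate_suggestion helper isn't shown above; a simplified version might just do a cheap duplicate check before spending any search API calls (this sketch is illustrative):

async def validate_suggestion(suggestion: ToolSuggestion, existing_data: dict) -> bool:
    # Illustrative duplicate check: reject suggestions whose normalized
    # name already exists in the directory
    existing_names = {name.lower().strip() for name in existing_data["tools"]}
    return suggestion.name.lower().strip() not in existing_names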
2. Validation System
The validation process analyzes each candidate's webpage content:
async def validate_webpage(ctx: RunContext, url: str) -> tuple[WebsiteValidation, Optional[str]]:
    content, og_image = await fetch_webpage(url)
    validation_result = await validation_agent.run(
        f"Analyze this webpage content and determine if it meets our criteria for a free web tool:\n{content}"
    )
    return validation_result.data, og_image
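fetch_webpage does the actual download. A simplified async version using httpx might look like this (illustrative - the real implementation may differ):

import httpx
from typing import Optional
from bs4 import BeautifulSoup

async def fetch_webpage(url: str) -> tuple[str, Optional[str]]:
    # Fetch the page, capture the og:image URL for thumbnails, then strip
    # non-content tags and return plain text
    async with httpx.AsyncClient(timeout=15, follow_redirects=True) as client:
        response = await client.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    og_tag = soup.find("meta", property="og:image")
    og_image = og_tag.get("content") if og_tag else None
    for element in soup(["script", "style", "head"]):
        element.decompose()
    return soup.get_text(separator=" ", strip=True), og_image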
3. Categorization Logic
The categorization agent is given the existing category list, so it reuses categories instead of inventing near-duplicates:
async def categorize_tool(tool_name: str, description: str, features: dict | None = None):
    existing_categories = await get_categories()
    # Build the prompt context from the tool's details
    tool_info = f"Name: {tool_name}\nDescription: {description}\nFeatures: {features or {}}"
    result = await categorization_agent.run(
        f"Suggest a category for this tool. Existing categories: {existing_categories}\n\n{tool_info}"
    )
    return result.data
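Putting the agents together, the overall loop looks roughly like this. It's a sketch of the orchestration reusing the snippets above, with error handling omitted; in the real system validate_webpage would typically be registered as a PydanticAI tool and receive a proper RunContext, so passing None here is just a shortcut:

async def research_one_tool():
    # 1. Ask the suggestion agent for a new, non-duplicate tool idea
    suggestion = await suggest_tools_to_research()
    # 2. The search agent locates the tool's site (simplified here)
    search_result = await search_agent.run(
        f"Find the official URL for this free tool: {suggestion.name}"
    )
    # 3. Validate the page against the inclusion criteria
    validation, _og_image = await validate_webpage(ctx=None, url=search_result.data)
    if not validation.is_valid:
        return None
    # 4. Assign a category, preferring existing ones
    return await categorize_tool(suggestion.name, validation.reason)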
Technical Challenges & Solutions
1. Content Processing
- Challenge: Handling large webpage content
- Solution: Implemented content truncation and cleaning:
soup = BeautifulSoup(response.text, 'html.parser')
for element in soup(['script', 'style', 'head']):
    element.decompose()
# Truncate the cleaned text so prompts stay inside the context window (limit is illustrative)
content = soup.get_text(separator=' ', strip=True)[:10000]
2. Managing Costs During Testing
- Challenge: API costs for Serper / Bing
- Solution: Implemented a caching system:
import requests_cache

requests_cache.install_cache(
    str(cache_dir / "search_cache"),
    backend='sqlite',
    expire_after=3600,
)
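Once installed, requests_cache patches the requests library globally, so repeat Serper or Bing calls made through requests are served from SQLite. You can confirm a hit via the from_cache attribute that requests-cache adds to responses (the URL below is a placeholder):

import requests

response = requests.get("https://example.com/search?q=free+seo+tools")
# False on the first call, True for identical calls within the hour
print(response.from_cache)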
3. Data Validation
- Challenge: Ensuring consistent data structure
- Solution: Used Pydantic models for validation:
from pydantic import BaseModel

class WebsiteValidation(BaseModel):
    is_valid: bool
    reason: str
    requires_login: bool
    requires_download: bool
    has_free_tier: bool
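Because the validation agent's result_type is this model, downstream code can branch on typed booleans instead of parsing free-form LLM text. For example, an inclusion gate (illustrative) reduces to:

def meets_inclusion_criteria(v: WebsiteValidation) -> bool:
    # A tool qualifies if it's valid, genuinely free, and usable without
    # signing up or installing anything
    return v.is_valid and v.has_free_tier and not (v.requires_login or v.requires_download)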
Results and Impact
My directory project yielded a researched, validated, and categorized directory with minimal manual effort, plus a pipeline I can reuse elsewhere.
What Else Can This Be Applied To?
The automated research and validation system I built can be adapted for various other use cases. Just a few ideas:
- Product Research & Monitoring
- Content Aggregation
- Real Estate Research
- Creating / Curating Industry-Specific Directories
- Compliance Monitoring
Enhancing AI Research Agents with clickworker
The power of AI research lies in thoughtfully combining machine efficiency with human insight. Here's how clickworker can transform your AI research agents into more capable systems:
Training & Validation
Human validators serve as expert reviewers, helping AI systems evolve through specialized training datasets and comprehensive performance metrics. Edge case validation ensures your systems can handle complex, real-world scenarios with confidence.
Content Enhancement
Human reviewers act as sophisticated reality checks, bringing contextual awareness and cross-referential knowledge that machines often miss. They verify findings, add crucial context, and ensure factual accuracy while maintaining natural language flow.
Quality Control & Standardization
Expert human review targets low-confidence decisions and systematic errors, while establishing clear quality benchmarks. This creates a robust feedback loop where data formatting, completeness, and accuracy are continuously refined.
Implementation Strategy
Success requires thoughtful orchestration of resources. Balance automated and human tasks based on complexity and impact, while maintaining clear intervention triggers. Create scalable workflows with consensus mechanisms for complex decisions, and establish detailed guidelines that capture expert knowledge.
Cost Optimization: Focus human validation on high-impact decisions while automating routine tasks. This ensures maximum value from your human intelligence investment.
This hybrid approach creates research systems that are greater than the sum of their parts, combining machine scalability with irreplaceable human insight.