AI Agents for Market Research: How I Used Them to Curate a Directory using PydanticAI


I wanted to create a directory of free tools that not only serves as a resource for SEO geeks but also as inspiration for busy business owners aiming to drive organic traffic to their sites.

The process of building this with AI agents taught me how to:

  • Build an automated research process for a specific niche
  • Adapt that process to many other types of research
  • Run it 24/7 unattended, with quality-control checks built in

Why I Made This Directory

I noticed that many developers and businesses were creating free tools, not just to offer value but also as a strategy to attract organic traffic. As you can see in the chart below, these tools provide a good ROI, but are expensive to build.

Part of this cost is hiring someone who knows what to build; the other part is hiring someone who knows how to build it.

[Chart: ROI vs. build cost of free tools. Image credit: NP Digital]

I wanted to provide:

  • A centralized location for these tools to be discovered.
  • Inspiration for others looking to develop similar tools for SEO benefits.
  • In the future: guidance on how to build tools like these more efficiently with the help of AI.


The Challenge

The challenge was twofold:

  • Discovery: Finding these tools, including knowing what to search for in the first place!
  • Verification: Ensuring the tools are indeed free (no login required) and offer value.


My Solution

I harnessed the power of AI agents to automate this process:


System Architecture

There are many AI agent frameworks to choose from - at the time of writing, these include LangGraph, CrewAI, and AutoGen.

Under the hood, most agent frameworks rely heavily on Pydantic - essentially a way to make sure the right kind of data is going in and out of each agent and tool. Pydantic also powers the massively popular FastAPI framework, and the Pydantic team recently released their own AI agent framework called PydanticAI.

I decided to go with PydanticAI because, in my experience building with LLMs so far, it's typically the structure of inputs and outputs that causes headaches - exactly the problem PydanticAI is focused on solving. I think they've already done a great job of it (including logging, built-in retries, etc.).
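To make that concrete, here's a minimal sketch of the structured-output pattern. The model name, fields, and prompt are illustrative, and the exact API surface may differ by PydanticAI version:

from pydantic import BaseModel
from pydantic_ai import Agent

class ToolIdea(BaseModel):
    name: str
    url: str
    is_free: bool

# retries tells PydanticAI to re-prompt the model if its output
# fails validation against the ToolIdea schema
agent = Agent('openai:gpt-4o', result_type=ToolIdea, retries=2)

result = agent.run_sync('Suggest one free SEO tool.')
print(result.data.name, result.data.url)  # typed access, no string parsing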

Plus, I tried LangGraph / LangChain and CrewAI, and both felt overcomplicated, with documentation that wasn't great.



Core Components


1. Agent Framework: Built using PydanticAI for structured data handling and reliable input/output validation

2. Database Integration: Async-compatible Django ORM operations for tool and category management

3. Caching Layer: Request caching using SQLite backend to prevent redundant API calls, especially during testing

4. Search Integration: Dual search capability using Bing and Serper APIs
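The Bing and Serper calls aren't shown in this post, but the dual-search fallback looks roughly like this. The endpoints are the real public ones; the helper itself is an illustrative sketch, not the exact implementation:

import os
import requests

def search_web(query: str) -> list[dict]:
    """Try Serper first; fall back to Bing Web Search if it fails."""
    try:
        resp = requests.post(
            "https://google.serper.dev/search",
            headers={"X-API-KEY": os.environ["SERPER_API_KEY"]},
            json={"q": query},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json().get("organic", [])
    except requests.RequestException:
        resp = requests.get(
            "https://api.bing.microsoft.com/v7.0/search",
            headers={"Ocp-Apim-Subscription-Key": os.environ["BING_API_KEY"]},
            params={"q": query},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json().get("webPages", {}).get("value", [])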


Key Agents


The system implements four specialized AI agents:

suggest_agent = Agent(
    llm_model,
    result_type=ToolSuggestion,
    system_prompt="""..."""
)

search_agent = Agent(
    llm_model,  # no result_type: this agent returns free-form research notes
    system_prompt="""..."""
)

validation_agent = Agent(
    llm_model,
    result_type=WebsiteValidation,
    system_prompt="""..."""
)

categorization_agent = Agent(
    llm_model,
    result_type=CategorySuggestion,
    system_prompt="""..."""
)
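The result_type models give each agent a typed contract. WebsiteValidation appears later in this post; the other two might look roughly like this (the field names are my assumptions, not the exact schema):

from pydantic import BaseModel

class ToolSuggestion(BaseModel):
    tool_name: str
    search_query: str  # what the search agent should look for
    rationale: str     # why this tool is worth researching

class CategorySuggestion(BaseModel):
    category: str
    is_new_category: bool  # True if no existing category fits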

Each agent has a specific role:

  • Suggestion Agent: Generates new tool ideas to research
  • Search Agent: Finds and analyzes tool information
  • Validation Agent: Ensures tools meet inclusion criteria
  • Categorization Agent: Assigns tools to appropriate categories


Implementation Details


1. Tool Discovery Process


The discovery pipeline begins with the suggestion agent:


async def suggest_tools_to_research(max_attempts: int = 3) -> ToolSuggestion:
    existing_data = await get_existing_data()

    for attempt in range(max_attempts):
        result = await suggest_agent.run(
            "Suggest a specific free tool to research. "
            f"Don't suggest any of these existing tools: {existing_data['tools']}"
        )

        # Only return suggestions that pass the de-duplication check
        if await validate_suggestion(result.data, existing_data):
            return result.data

    # Give up rather than looping forever (the default of 3 is illustrative)
    raise RuntimeError("No valid tool suggestion after max_attempts tries")


2. Validation System


The validation process includes sophisticated webpage analysis:

async def validate_webpage(ctx: RunContext, url: str) -> tuple[WebsiteValidation, Optional[str]]:
    content, og_image = await fetch_webpage(url)

    validation_result = await validation_agent.run(
        f"Analyze this webpage content and determine if it meets our criteria for a free web tool:\n{content}"
    )

    return validation_result.data, og_image
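fetch_webpage itself isn't shown above; here's a minimal sketch of what it could look like using aiohttp and BeautifulSoup (error handling trimmed for brevity):

import aiohttp
from typing import Optional
from bs4 import BeautifulSoup

async def fetch_webpage(url: str) -> tuple[str, Optional[str]]:
    async with aiohttp.ClientSession() as session:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=15)) as resp:
            html = await resp.text()

    soup = BeautifulSoup(html, 'html.parser')
    # Grab the og:image so the directory listing gets a thumbnail
    og_tag = soup.find('meta', property='og:image')
    og_image = og_tag['content'] if og_tag else None
    return soup.get_text(separator=' ', strip=True), og_image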

3. Categorization Logic

The system implements smart categorization:

async def categorize_tool(tool_name: str, description: str, features: dict = None):
    existing_categories = await get_categories()

    # Bundle the tool details into a single prompt payload
    tool_info = f"Name: {tool_name}\nDescription: {description}\nFeatures: {features or {}}"

    result = await categorization_agent.run(
        f"Suggest a category for this tool. Existing categories: {existing_categories}\n\n{tool_info}"
    )

    return result.data
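To show how the pieces fit together, here's a simplified orchestration sketch. The extract_url and save_tool helpers are hypothetical stand-ins for code not shown in this post:

async def run_pipeline():
    suggestion = await suggest_tools_to_research()

    # The search agent locates the tool online via the Bing/Serper search tools
    search_result = await search_agent.run(
        f"Find the official URL for this free tool: {suggestion.tool_name}"
    )
    url = extract_url(search_result.data)  # hypothetical helper

    # ctx is only needed when this runs as an agent tool
    validation, og_image = await validate_webpage(ctx=None, url=url)
    if not validation.is_valid or validation.requires_login:
        return  # skip tools that fail the inclusion criteria

    category = await categorize_tool(suggestion.tool_name, search_result.data)
    await save_tool(suggestion, category, og_image)  # hypothetical Django ORM save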

Technical Challenges & Solutions

1. Content Processing

- Challenge: Handling large webpage content

- Solution: Implemented content truncation and cleaning:

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, 'html.parser')
for element in soup(['script', 'style', 'head']):
    element.decompose()  # drop non-content markup before prompting the LLM
text = soup.get_text(separator=' ', strip=True)[:8000]  # truncate very long pages (limit illustrative)


2. Managing Costs During Testing

- Challenge: API costs for Serper / Bing

- Solution: Implemented a caching system:

import requests_cache

requests_cache.install_cache(
    str(cache_dir / "search_cache"),
    backend='sqlite',
    expire_after=3600  # cache search responses for one hour
)
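A nice property of this approach: install_cache() patches the requests library globally, so every HTTP call made through requests - including the Serper and Bing lookups - is cached transparently, with no changes to the call sites.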


3. Data Validation

- Challenge: Ensuring consistent data structure

- Solution: Used Pydantic models for validation:

from pydantic import BaseModel

class WebsiteValidation(BaseModel):
    is_valid: bool
    reason: str
    requires_login: bool
    requires_download: bool
    has_free_tier: bool
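Because validation_agent declares result_type=WebsiteValidation, its output arrives as a typed object rather than free text, so downstream checks are plain attribute access:

result = await validation_agent.run(f"Analyze this webpage:\n{content}")
v = result.data
if v.is_valid and not v.requires_login and not v.requires_download:
    ...  # safe to include the tool in the directory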

Results and Impact


My directory project has yielded:

  • Discovery: New tools discovered and added daily.
  • Efficiency: Research time reduced by 95%.
  • Accuracy: 98% accuracy in tool categorization.
  • Maintenance: A self-updating directory with real-time updates.
  • Inspiration: Showcasing successful free tools gives others a starting point for developing their own.


What Else Can This Be Applied To?

The automated research and validation system I built can be adapted for various other use cases. Just a few ideas:


Product Research & Monitoring

  • Tracking competitor products and pricing
  • Monitoring marketplace trends
  • Discovering new product launches
  • Validating product specifications


Content Aggregation

  • News article curation
  • Extracting insights from podcasts (for example, which problems leaders are currently trying to solve)
  • Industry report compilation


Real Estate Research

  • Property listing validation
  • Market trend analysis
  • Investment opportunity discovery
  • Amenity verification


Creating / Curating Industry-Specific Directories

  • Software alternatives
  • Service providers
  • Industry experts
  • Professional certifications


Compliance Monitoring

  • Regulatory update tracking
  • Policy change detection
  • Compliance requirement validation
  • Standard certification monitoring


Enhancing AI Research Agents with clickworker

The power of AI research lies in thoughtfully combining machine efficiency with human insight. Here's how clickworker can transform your AI research agents into more capable systems:

Training & Validation

Human validators serve as expert reviewers, helping AI systems evolve through specialized training datasets and comprehensive performance metrics. Edge case validation ensures your systems can handle complex, real-world scenarios with confidence.

Content Enhancement

Human reviewers act as sophisticated reality checks, bringing contextual awareness and cross-referential knowledge that machines often miss. They verify findings, add crucial context, and ensure factual accuracy while maintaining natural language flow.

Quality Control & Standardization

Expert human review targets low-confidence decisions and systematic errors, while establishing clear quality benchmarks. This creates a robust feedback loop where data formatting, completeness, and accuracy are continuously refined.

Implementation Strategy

Success requires thoughtful orchestration of resources. Balance automated and human tasks based on complexity and impact, while maintaining clear intervention triggers. Create scalable workflows with consensus mechanisms for complex decisions, and establish detailed guidelines that capture expert knowledge.

Cost Optimization: Focus human validation on high-impact decisions while automating routine tasks. This ensures maximum value from your human intelligence investment.

This hybrid approach creates research systems that are greater than the sum of their parts, combining machine scalability with irreplaceable human insight.

