AI Agents for Market Research: How I Used Them to Curate a Directory using PydanticAI
Duncan Trevithick
Marketing at leading AI training data provider clickworker / LXT
I wanted to create a directory of free tools that not only serves as a resource for SEO geeks but also as inspiration for busy business owners aiming to drive organic traffic to their sites.
Building it with AI agents also taught me a lot about agent frameworks along the way, which I'll walk through below.
Why I Made This Directory
I noticed that many developers and businesses were creating free tools, not just to offer value but also as a strategy to attract organic traffic. As you can see in the chart below, these tools provide a good ROI, but are expensive to build.
Part of this cost is hiring someone who knows what to build, and the other part is hiring someone who knows how to build it.
I wanted to cover that first part: a directory of proven free-tool ideas that business owners can draw inspiration from.
The Challenge
The challenge was twofold: discovering candidate tools at scale, and then verifying that each one was genuinely free and usable - far too much work to do by hand.
My Solution
My solution was to use AI agents to automate the whole process.
System Architecture
There are many AI agent frameworks to choose from - at the time of writing, these include LangGraph, CrewAI, and AutoGen.
Under the hood, most agent frameworks rely heavily on Pydantic - essentially a way to make sure the right kind of data is going in and out of each agent and tool. Pydantic also powers the massively popular FastAPI framework, and the Pydantic team recently released their own agent framework, PydanticAI.
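To make that concrete, here is a minimal sketch of Pydantic's core idea - data crossing a boundary is parsed into a typed model, and malformed data fails loudly (the model and its fields here are illustrative, not from my project):

from pydantic import BaseModel, ValidationError

class ToolRecord(BaseModel):
    # Illustrative model - not the actual project schema
    name: str
    url: str
    is_free: bool

try:
    # Well-formed data becomes a typed object with guaranteed fields
    record = ToolRecord(name="Keyword Checker", url="https://example.com", is_free=True)
except ValidationError as err:
    # Missing or wrongly-typed fields raise a structured error instead of
    # letting bad data flow silently between agents and tools
    print(err)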
I decided to go with PydanticAI because, in my experience building with LLMs so far, it's typically the structure of inputs and outputs that causes headaches - exactly the problem PydanticAI focuses on solving. I think they've already done a great job at it (including logging, built-in retries, etc.).
Plus, I tried LangGraph / LangChain and CrewAI, and both felt overcomplicated and poorly documented.
Core Components
1. Agent Framework: Built using PydanticAI for structured data handling and reliable input/output validation
2. Database Integration: Async-compatible Django ORM operations for tool and category management (see the sketch after this list)
3. Caching Layer: Request caching using SQLite backend to prevent redundant API calls, especially during testing
4. Search Integration: Dual search capability using Bing and Serper APIs
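On point 2, Django has offered async ORM methods (aget, acreate, aget_or_create) since version 4.1, which lets agent code await database writes directly. A minimal sketch, assuming a hypothetical Tool model - the actual schema isn't shown in this post:

from django.db import models

class Tool(models.Model):
    # Hypothetical model; the real schema isn't shown here
    name = models.CharField(max_length=200)
    url = models.URLField(unique=True)

async def save_tool(name: str, url: str) -> Tool:
    # aget_or_create is the async counterpart of get_or_create (Django 4.1+)
    tool, _created = await Tool.objects.aget_or_create(name=name, url=url)
    return tool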
Key Agents
The system implements four specialized AI agents:
from pydantic_ai import Agent

suggest_agent = Agent(
    llm_model,
    result_type=ToolSuggestion,
    system_prompt="""...""",
)

search_agent = Agent(
    llm_model,
    system_prompt="""...""",
)

validation_agent = Agent(
    llm_model,
    result_type=WebsiteValidation,
    system_prompt="""...""",
)

categorization_agent = Agent(
    llm_model,
    result_type=CategorySuggestion,
    system_prompt="""...""",
)
Each agent has a specific role:
- Suggestion Agent: Generates new tool ideas to research
- Search Agent: Finds and analyzes tool information
- Validation Agent: Ensures tools meet inclusion criteria
- Categorization Agent: Assigns tools to appropriate categories
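Each result_type is a Pydantic model that defines the agent's output contract. WebsiteValidation appears in full later in this post; for ToolSuggestion and CategorySuggestion, a simplified sketch looks like this (the exact fields are illustrative):

from pydantic import BaseModel

class ToolSuggestion(BaseModel):
    # Illustrative fields - the full model isn't shown in this post
    name: str
    reason: str

class CategorySuggestion(BaseModel):
    # Illustrative fields - the full model isn't shown in this post
    category: str
    is_new_category: bool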
Implementation Details
1. Tool Discovery Process
The discovery pipeline begins with the suggestion agent:
async def suggest_tools_to_research(max_attempts: int = 3) -> ToolSuggestion:
    # max_attempts default is illustrative
    existing_data = await get_existing_data()
    for attempt in range(max_attempts):
        result = await suggest_agent.run(
            "Suggest a specific free tool to research. "
            f"Don't suggest any of these existing tools: {existing_data['tools']}"
        )
        if await validate_suggestion(result.data, existing_data):
            return result.data
    raise RuntimeError("No valid tool suggestion after max_attempts tries")
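The validate_suggestion helper isn't shown above; a simplified version might just do a cheap duplicate check before spending any search API calls (this sketch is illustrative):

async def validate_suggestion(suggestion: ToolSuggestion, existing_data: dict) -> bool:
    # Illustrative duplicate check: reject suggestions whose normalized
    # name already exists in the directory
    existing_names = {name.lower().strip() for name in existing_data["tools"]}
    return suggestion.name.lower().strip() not in existing_names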
2. Validation System
The validation process analyzes each candidate's webpage content:
async def validate_webpage(ctx: RunContext, url: str) -> tuple[WebsiteValidation, Optional[str]]:
    content, og_image = await fetch_webpage(url)
    validation_result = await validation_agent.run(
        f"Analyze this webpage content and determine if it meets our criteria for a free web tool:\n{content}"
    )
    return validation_result.data, og_image
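fetch_webpage does the actual download. A simplified async version using httpx might look like this (illustrative - the real implementation may differ):

import httpx
from typing import Optional
from bs4 import BeautifulSoup

async def fetch_webpage(url: str) -> tuple[str, Optional[str]]:
    # Fetch the page, capture the og:image URL for thumbnails, then strip
    # non-content tags and return plain text
    async with httpx.AsyncClient(timeout=15, follow_redirects=True) as client:
        response = await client.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    og_tag = soup.find("meta", property="og:image")
    og_image = og_tag.get("content") if og_tag else None
    for element in soup(["script", "style", "head"]):
        element.decompose()
    return soup.get_text(separator=" ", strip=True), og_image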
3. Categorization Logic
The categorization agent is given the existing category list, so it reuses categories instead of inventing near-duplicates:
async def categorize_tool(tool_name: str, description: str, features: dict | None = None):
    existing_categories = await get_categories()
    # Build the prompt context from the tool's details
    tool_info = f"Name: {tool_name}\nDescription: {description}\nFeatures: {features or {}}"
    result = await categorization_agent.run(
        f"Suggest a category for this tool. Existing categories: {existing_categories}\n\n{tool_info}"
    )
    return result.data
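Putting the agents together, the overall loop looks roughly like this. It's a sketch of the orchestration reusing the snippets above, with error handling omitted; in the real system validate_webpage would typically be registered as a PydanticAI tool and receive a proper RunContext, so passing None here is just a shortcut:

async def research_one_tool():
    # 1. Ask the suggestion agent for a new, non-duplicate tool idea
    suggestion = await suggest_tools_to_research()
    # 2. The search agent locates the tool's site (simplified here)
    search_result = await search_agent.run(
        f"Find the official URL for this free tool: {suggestion.name}"
    )
    # 3. Validate the page against the inclusion criteria
    validation, _og_image = await validate_webpage(ctx=None, url=search_result.data)
    if not validation.is_valid:
        return None
    # 4. Assign a category, preferring existing ones
    return await categorize_tool(suggestion.name, validation.reason)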
Technical Challenges & Solutions
1. Content Processing
- Challenge: Handling large webpage content
- Solution: Implemented content truncation and cleaning:
soup = BeautifulSoup(response.text, 'html.parser')
for element in soup(['script', 'style', 'head']):
    element.decompose()
# Truncate the cleaned text so prompts stay inside the context window (limit is illustrative)
content = soup.get_text(separator=' ', strip=True)[:10000]
2. Managing Costs During Testing
- Challenge: API costs for Serper / Bing
- Solution: Implemented a caching system:
import requests_cache

requests_cache.install_cache(
    str(cache_dir / "search_cache"),
    backend='sqlite',
    expire_after=3600,
)
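Once installed, requests_cache patches the requests library globally, so repeat Serper or Bing calls made through requests are served from SQLite. You can confirm a hit via the from_cache attribute that requests-cache adds to responses (the URL below is a placeholder):

import requests

response = requests.get("https://example.com/search?q=free+seo+tools")
# False on the first call, True for identical calls within the hour
print(response.from_cache)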
3. Data Validation
- Challenge: Ensuring consistent data structure
- Solution: Used Pydantic models for validation:
from pydantic import BaseModel

class WebsiteValidation(BaseModel):
    is_valid: bool
    reason: str
    requires_login: bool
    requires_download: bool
    has_free_tier: bool
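Because the validation agent's result_type is this model, downstream code can branch on typed booleans instead of parsing free-form LLM text. For example, an inclusion gate (illustrative) reduces to:

def meets_inclusion_criteria(v: WebsiteValidation) -> bool:
    # A tool qualifies if it's valid, genuinely free, and usable without
    # signing up or installing anything
    return v.is_valid and v.has_free_tier and not (v.requires_login or v.requires_download)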
Results and Impact
My directory project yielded a researched, validated, and categorized directory with minimal manual effort, plus a pipeline I can reuse elsewhere.
What Else Can This Be Applied To?
The automated research and validation system I built can be adapted for various other use cases. Just a few ideas:
- Product Research & Monitoring
- Content Aggregation
- Real Estate Research
- Creating / Curating Industry-Specific Directories
- Compliance Monitoring
Enhancing AI Research Agents with clickworker
The power of AI research lies in thoughtfully combining machine efficiency with human insight. Here's how clickworker can transform your AI research agents into more capable systems:
Training & Validation
Human validators serve as expert reviewers, helping AI systems evolve through specialized training datasets and comprehensive performance metrics. Edge case validation ensures your systems can handle complex, real-world scenarios with confidence.
Content Enhancement
Human reviewers act as sophisticated reality checks, bringing contextual awareness and cross-referential knowledge that machines often miss. They verify findings, add crucial context, and ensure factual accuracy while maintaining natural language flow.
Quality Control & Standardization
Expert human review targets low-confidence decisions and systematic errors, while establishing clear quality benchmarks. This creates a robust feedback loop where data formatting, completeness, and accuracy are continuously refined.
Implementation Strategy
Success requires thoughtful orchestration of resources. Balance automated and human tasks based on complexity and impact, while maintaining clear intervention triggers. Create scalable workflows with consensus mechanisms for complex decisions, and establish detailed guidelines that capture expert knowledge.
Cost Optimization: Focus human validation on high-impact decisions while automating routine tasks. This ensures maximum value from your human intelligence investment.
This hybrid approach creates research systems that are greater than the sum of their parts, combining machine scalability with irreplaceable human insight.