Intent Extraction Service Using DistilBERT
1. Overview
This document describes the design and implementation of an intent extraction service for a conversational AI application.
The goal is to reliably classify user input into one of three intent categories (consultation & Q&A, ideation & brainstorming, and planning & scheduling) using a fine-tuned DistilBERT-based model.
The service will be deployed as a REST microservice within a Kubernetes environment for scalability and high availability.
2. Requirements
Functional Requirements
Non-Functional Requirements
3. System Architecture
High-Level Architecture
Data Flow Diagram
User Query -> API Gateway -> [Preprocessing] -> [DistilBERT-based Classifier] -> [Postprocessing] -> API Gateway -> User Response
4. Model Selection and Architecture
Model Choice: DistilBERT Model
Why BERT?
BERT vs. DistilBERT:
Fine-Tuning Strategy:
Model Architecture Details:
5. Dataset and Data Preparation
Dataset Collection:
Historical user queries and manually annotated samples covering the three intent categories: Consultation & Q&A, Ideation & Brainstorming, and Planning & Scheduling.
Data Labelling:
Each query is tagged with exactly one intent, stored as a numeric label (0, 1, or 2), so that the classes stay clearly distinct.
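As a minimal sketch of how the numeric labels could be kept consistent between annotation and serving, the snippet below reuses the id-to-intent mapping that the inference service in Section 7 defines. The names ID_TO_INTENT, INTENT_TO_ID, and encode_label are illustrative assumptions, not part of the original design.

```python
# Numeric ids follow the mapping used by the inference service in Section 7.
ID_TO_INTENT = {
    0: "Consultation & Q&A",
    1: "Ideation & Brainstorming",
    2: "Planning & Scheduling",
}

# Reverse lookup, derived from the mapping above so the two cannot drift apart.
INTENT_TO_ID = {name: idx for idx, name in ID_TO_INTENT.items()}


def encode_label(intent_name: str) -> int:
    """Return the numeric label for an intent name (hypothetical helper)."""
    try:
        return INTENT_TO_ID[intent_name]
    except KeyError:
        # Fail loudly so a mistyped annotation never becomes a silent wrong label.
        raise ValueError(f"Unknown intent: {intent_name!r}") from None
```

Deriving the reverse mapping from a single dictionary avoids maintaining two tables by hand.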
Data Augmentation:
Optionally, generate paraphrases and variations of existing queries to enhance model robustness and improve generalization.
Data Format:
Store data in CSV with at least two columns: the query text and its numeric intent label.
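A minimal sketch of reading such a file with the standard library follows. The column names "text" and "label" are assumptions made for illustration, since the actual column names are not given here, and load_intent_csv is a hypothetical helper.

```python
import csv
import io
from typing import List, Tuple


def load_intent_csv(handle, text_col: str = "text",
                    label_col: str = "label") -> Tuple[List[str], List[int]]:
    """Read queries and numeric labels from an open CSV file or file-like object."""
    reader = csv.DictReader(handle)
    texts: List[str] = []
    labels: List[int] = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        text = (row.get(text_col) or "").strip()
        if not text:
            continue  # skip rows with a blank query
        try:
            label = int(row[label_col])
        except (KeyError, TypeError, ValueError):
            raise ValueError(f"Row {row_number}: missing or non-numeric label") from None
        texts.append(text)
        labels.append(label)
    return texts, labels


# Usage with an in-memory sample; a real run would pass open("data.csv", newline="").
sample = io.StringIO(
    'text,label\n'
    '"Plan a 2-day trip to Kyoto",2\n'
    '"What is DistilBERT?",0\n'
)
texts, labels = load_intent_csv(sample)
```

The two returned lists can be passed directly to a dataset class that expects parallel lists of texts and labels.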
Dataset Splits:
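The split ratios are not stated in this section. As a minimal sketch, assuming a hypothetical 80/10/10 train/validation/test split, the helper below splits sample indices per class so that each subset keeps roughly the same intent distribution. The function name stratified_split, the ratios, and the seed are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int],
                     ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
                     seed: int = 42) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, val, test) index lists with similar class proportions."""
    # Group sample indices by their class label.
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * ratios[0])
        n_val = int(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Usage: 10 samples per intent class.
example_labels = [0] * 10 + [1] * 10 + [2] * 10
train_idx, val_idx, test_idx = stratified_split(example_labels)
```

Splitting per class matters most when one intent is much rarer than the others, because a purely random split could leave it almost absent from the validation or test set.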
Preprocessing Steps:
Training Process:
PyTorch Dataset Class Example
This code creates a dataset class that prepares text data for fine-tuning a DistilBERT model. It tokenizes text inputs, pads or truncates them to a fixed length, and converts both the input data and labels into PyTorch tensors ready for model training.
from transformers import DistilBertTokenizer
from torch.utils.data import Dataset
import torch


class IntentDataset(Dataset):
    """Pairs tokenized queries with their numeric intent labels."""

    def __init__(self, texts, labels, tokenizer: DistilBertTokenizer, max_length: int = 128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Tokenize one query, padding or truncating to exactly max_length tokens
        encoding = self.tokenizer(
            self.texts[idx],
            add_special_tokens=True,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )
        # The tokenizer returns shape [1, max_length]; drop the leading batch dimension
        return {
            'input_ids': encoding['input_ids'].squeeze(0),  # Shape: [max_length]
            'attention_mask': encoding['attention_mask'].squeeze(0),
            'label': torch.tensor(self.labels[idx], dtype=torch.long)
        }
6. Model Training and Fine-Tuning
Model Architecture
Training Setup
Training Script Example
import torch
from torch.optim import AdamW  # the AdamW class in transformers is deprecated; use PyTorch's
from torch.utils.data import DataLoader
from transformers import DistilBertForSequenceClassification, DistilBertTokenizer

# IntentDataset is the dataset class defined in Section 5

# Initialize tokenizer and model
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=3)

# Assume train_texts and train_labels are loaded from CSV
train_dataset = IntentDataset(train_texts, train_labels, tokenizer)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Set up optimizer
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
num_epochs = 3
for epoch in range(num_epochs):
    epoch_loss = 0.0
    for batch in train_loader:
        optimizer.zero_grad()
        # Passing labels makes the model compute the cross-entropy loss itself
        outputs = model(
            input_ids=batch['input_ids'],
            attention_mask=batch['attention_mask'],
            labels=batch['label']
        )
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    avg_loss = epoch_loss / len(train_loader)
    print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {avg_loss:.4f}")

# Save the trained weights (state dict only)
torch.save(model.state_dict(), "intent_extraction_distilbert.pt")
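The training loop above reports only the training loss. As a minimal sketch of how predictions on a held-out split might be scored, the helpers below compute accuracy and macro-averaged F1 in plain Python. The function names and the example labels are illustrative assumptions, not results from this project.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int = 3) -> float:
    """Unweighted mean of per-class F1, so rare intents count as much as common ones."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes an F1 of 0.0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


# Made-up labels purely to exercise the functions.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Macro-F1 is a reasonable companion to accuracy when the three intents are not equally frequent in the data.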
7. Inference Service with FastAPI
REST API Design
Request (POST /api/v1/intent):
{
  "query": "Plan a 2-day trip to Kyoto"
}
Response:
{
  "intent": "Planning & Scheduling",
  "confidence": 0.92
}
FastAPI Implementation Example
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

app = FastAPI()

# Load tokenizer and fine-tuned DistilBERT model on startup.
# The training script saves only a state dict, so rebuild the 3-class
# architecture from the base checkpoint and then load the fine-tuned weights.
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=3)
model.load_state_dict(torch.load("intent_extraction_distilbert.pt", map_location=torch.device('cpu')))
model.eval()  # Set model to evaluation mode

# Mapping from numerical labels to intents
intent_labels = {
    0: "Consultation & Q&A",
    1: "Ideation & Brainstorming",
    2: "Planning & Scheduling"
}


class QueryRequest(BaseModel):
    query: str


# Declared with plain def so FastAPI runs the blocking model call in its
# thread pool instead of stalling the event loop.
@app.post("/api/v1/intent")
def extract_intent(request: QueryRequest):
    try:
        # Preprocess the query
        inputs = tokenizer(
            request.query,
            add_special_tokens=True,
            max_length=128,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )
        # Inference
        with torch.no_grad():
            outputs = model(**inputs)
        logits = outputs.logits
        # Convert logits to probabilities and pick the most likely class
        probabilities = torch.softmax(logits, dim=1)
        confidence, pred_class = torch.max(probabilities, dim=1)
        intent = intent_labels[pred_class.item()]
        return {"intent": intent, "confidence": confidence.item()}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
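The postprocessing step in the endpoint above (softmax over the logits, then taking the highest-probability class) can be illustrated without PyTorch. The sketch below is a plain-Python equivalent intended only to make the arithmetic explicit. The helper names and the example logits are assumptions, not output from the real model.

```python
import math
from typing import Dict, List, Sequence, Union

INTENT_LABELS = {
    0: "Consultation & Q&A",
    1: "Ideation & Brainstorming",
    2: "Planning & Scheduling",
}


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits: Sequence[float]) -> Dict[str, Union[str, float]]:
    """Map one query's logits to the response body used by the API."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return {"intent": INTENT_LABELS[pred], "confidence": probs[pred]}


# Made-up logits for a single query; class 2 has the largest score.
result = postprocess([0.2, -1.0, 3.1])
```

The confidence is simply the softmax probability of the winning class, so it reflects how far ahead that class is rather than a calibrated likelihood of being correct.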
8. Containerization and AWS Deployment
Dockerfile Example
FROM python:3.9-slim
# Set working directory
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy source code
COPY . .
# Expose port for FastAPI
EXPOSE 8000
# Start the FastAPI service using Uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
requirements.txt Example
fastapi
uvicorn
torch
transformers
pydantic
AWS Cloud Deployment
Container Registry and Orchestration
Deployment Steps
1. Build and tag the Docker image:
docker build -t intent-extraction-service .
docker tag intent-extraction-service:latest <aws_account_id>.dkr.ecr.<region>.amazonaws.com/intent-extraction-service:latest
2. Push to ECR:
docker push <aws_account_id>.dkr.ecr.<region>.amazonaws.com/intent-extraction-service:latest
3. Deploy on AWS:
Example Kubernetes Deployment Manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: intent-extraction-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: intent-extraction
  template:
    metadata:
      labels:
        app: intent-extraction
    spec:
      containers:
        - name: intent-extraction-container
          image: <aws_account_id>.dkr.ecr.<region>.amazonaws.com/intent-extraction-service:latest
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "500m"
              memory: "1Gi"
CI/CD Pipeline
Logging, Monitoring, and Security
Logging and Monitoring
Security Considerations
9. Future Enhancements
10. Conclusion
This design outlines a robust, scalable, and cost-efficient intent extraction service for a conversational AI feature. By using a fine-tuned DistilBERT-based model, the system classifies user queries across the consultation, ideation, and planning use cases. Serving the model through a REST API within a Kubernetes environment allows the service to scale horizontally and stay responsive as demand grows.