Transforming GPT-4o into an AI Query Assistant for SFMC

Transforming GPT-4o into an AI Query Assistant for SFMC

After recently starting to learn about AI models, prompt engineering, and fine-tuning, I quickly realized that public resources on integrating GPT into Salesforce Marketing Cloud (SFMC) were extremely limited—especially for practical, real-world use cases beyond content generation.

Most examples focus on generating text-based responses, but SFMC presents a unique landscape where GPT can be leveraged for automation, data extraction, and intelligent querying. There's so much more we can do.

In this article, I'll walk you through the process of transforming GPT-4o into an AI-powered Query Assistant that can interpret user queries, map them to Data Extensions (DEs), extract relevant fields, and generate SQL queries dynamically.

By the end, I'll also highlight another AI-driven applications you can build for SFMC—pushing the boundaries of what’s possible with AI in marketing automation.

Github repo: https://github.com/selimsevim/sfmc-ai-query-assistant

Key Features: What Makes This AI Query Assistant Interesting?

?? 1. Natural Language to Structured Query Understanding

Users can input questions in plain English, like: "Bring all the opens happened in last 30 days for EmailName = X" The assistant intelligently maps this request to the correct Data Extensions (_job, _open) and fields (EventDate, SubscriberKey, etc.) without the user needing to know DE names.


?? 2. Dynamic Data Extension & Field Lookup

Instead of hardcoding database tables, the assistant dynamically retrieves Data Extension (DE) references and field mappings from lookup tables in SFMC.

  • It cross-references user input with a DE Reference Table to determine which DEs to query.
  • It checks a Field Reference Table to infer the correct fields when the user doesn’t specify them explicitly.

?? 3. Intelligent Query Building with Context Awareness

The assistant doesn’t just list DEs—it understands relationships between data extensions and constructs queries accordingly with preconfigured data extensions.

? 4. SQL Query Generation with Strict Compliance Rules

  • No unnecessary fields – The assistant only includes the required fields in the SELECT clause.
  • Strict SQL rules – It enforces correct syntax, prevents invalid clauses (e.g., no CROSS JOIN, no ORDER BY unless inside ROW_NUMBER), and ensures queries are SFMC-compliant.
  • Joins are handled correctly – If multiple DEs are involved, it references a relationship table to determine the correct JOIN conditions.

?? 5. Automated De-Duplication Handling

When users request deduplicated data, the assistant automatically applies ROW_NUMBER() OVER (PARTITION BY … ORDER BY …) inside a subquery, ensuring only the latest or most relevant record is kept.


Breaking Down the AI Query Assistant: How It Works

User Input & First GPT Call

When a user enters a query, such as: ?? "Bring all the opens happened in last 30 days for EmailName = X"

  • The script extracts possible Data Extensions (DEs) and fields from two sources:

1- Direct user input (if the user explicitly names DEs or fields).

2- Lookup tables in SFMC that map user-friendly references to actual DE names and field names.

Dynamic Lookup for Data Extensions & Fields

The assistant dynamically retrieves DE and field references from SFMC’s lookup tables:

1?? DE Reference Table

  • Maps user-friendly terms to actual Data Extension names.
  • Example: The word "opens" might be mapped to "_open".

2?? Field Reference Table

  • Maps user-friendly terms to actual DE fields.
  • Example: The phrase "Open Date" could map to the "EventDate" field in _click.

Building the System Prompt for the First GPT Call

Once the relevant DEs and fields are identified, we send them to GPT-4o using a structured system prompt. The goal is to validate and refine the extracted information before proceeding to SQL generation.

The GPT system prompt includes:

? User’s input query

? Extracted DEs and fields

? Reference tables for further mapping

? Rules for handling missing DEs or fields

? JSON output format enforcement

Prompt

You are an AI that extracts Data Extensions (DEs) and fields from user queries in Salesforce Marketing Cloud.

### **Your Role:**
- Identify relevant **Data Extensions (DEs)** based on **explicit user input** OR **mapped references**.
- Identify **fields** either explicitly mentioned by the user OR inferred from the **Field Reference Table**.
- If the user explicitly states a DE, prioritize it over reference mappings.
- If no DEs can be inferred, return: `"Please explain yourself one more time."`
- If no fields can be inferred, leave the `fields` list empty.

### **Reference Table (User-Friendly Terms → Data Extensions)**
{{deMappings}} 
### **Field Reference Table (User-Friendly Terms → DE & Fields)**
{{fieldMappings}}
### **Output JSON Format (Strictly Follow This)**
```json
{
  "de_names": ["DE1", "DE2", ...],
  "fields": ["Field1", "Field2", ...]
}

.....        

Expected GPT Response (First Interaction)

{ "de_names": ["_job", "_click"], "fields": ["EventDate"] }        

At this point, we have successfully structured the query input and identified all necessary DEs and fields.

Retrieving All Available Fields for Identified Data Extensions

Now that we know which DEs we need, the assistant retrieves all fields for those Data Extensions using SFMC’s WSProxy:

  1. It fetches all available columns for each DE.
  2. It stores these fields to be used in the second GPT interaction.

Fetching Data Extension Relationships

To ensure proper joins in the SQL query, the script retrieves relationships between DEs. For example, _job and _open are related via JobID.

Building the System Prompt for the Second GPT Call

Now that we have:

? User’s refined request

? Final DE and field selection

? All available fields for those DEs

? DE relationships

We send all this structured data to GPT-4o in the second interaction, instructing it to generate an SQL query.

The GPT system prompt includes:

? Strict SQL rules (no invalid clauses, optimized query structure)

? Correct table joins using relationship mappings

? Correct field selection (only existing fields, no hallucinations)

? De-duplication rules (if needed)

? Filters based on user query (e.g., last week’s data)

Prompt

You are an AI that generates SQL queries for Salesforce Marketing Cloud (SFMC). Only generate the query, do not explain anything. Follow these strict rules:

### **?? SQL Rules (Strict Adherence Required)**
- **Allowed Clauses:** SELECT, JOIN, WHERE, GROUP BY, HAVING, ROW_NUMBER.
- **Prohibited Clauses:** INSERT, UPDATE, DELETE, ORDER BY (unless inside `ROW_NUMBER() OVER (...)`), WITH.
- **Allowed Joins:** INNER JOIN, LEFT JOIN (NO CROSS JOIN).
- **Field Selection:** Do NOT use `*`. Always list fields explicitly.
- **Data Extension Naming:** Use square brackets `[ ]` for DEs ALWAYS.
- **Table Aliases:** Use aliases for readability, but keep full DE names in `FROM` and `JOIN`.
- **Query Efficiency:** Always optimize queries for readability and performance.

### **?? Data Views Handling Rules**
- **SFMC Data Views such as `_job`, `_open`, `_click`, `_bounce`, etc., must be handled with proper joins.**
- **If a user query involves opens, clicks, or sends, extract BOTH `_job` (Emails DE) AND the relevant event DE (`_open`, `_click`, etc.).**
- **Always JOIN `_job` with event-based Data Views using `JobID` or `SubscriberKey`, based on available relationships.**
- **Ensure that filters involving email sends, opens, clicks, or bounces use the correct event timestamps (e.g., `EventDate`).**

......        

Expected GPT Response (Second Interaction)

SELECT o.SubscriberKey, o.EventDate, o.IsUnique, o.JobID
FROM _open o
INNER JOIN _job j ON o.JobID = j.JobID
WHERE j.EmailName = 'X' AND o.EventDate >= DATEADD(DAY, -30, GETDATE())        

Challenges & Lessons Learned from Building the AI Query Assistant

Building an AI-powered Query Assistant in Salesforce Marketing Cloud (SFMC) comes with its own set of challenges. While the assistant successfully converts natural language into structured SQL queries, there are a few key areas to be mindful of:

1?? GDPR and Data Privacy

One important consideration is data privacy. If a user includes personal information in their query, such as:

WHERE EmailAddress = '[email protected]'        

then that data is sent to GPT, potentially violating GDPR or other privacy regulations.

? Solution: Avoid sending PII (Personally Identifiable Information) to GPT or anonymize queries before processing them.

2?? Approaching Prompts Wisely

The prompt engineering in this assistant was an experimental approach. While it works well for general SFMC SQL queries, every business has unique data structures and SQL constraints.

? Recommendation: Adapt the prompts to match your company’s SQL rules and SFMC setup. A more precise prompt will yield better results and prevent GPT from making incorrect assumptions.

3?? Cost Considerations

Even though GPT-4o is relatively cheap, every API call still costs money. Since this assistant interacts with GPT twice per query, costs can scale up quickly if used frequently.

? Optimization Ideas:

  • Use caching to avoid redundant queries.
  • Run pre-checks before calling GPT (e.g., basic SQL templates for common queries).
  • Only call GPT if the user query is complex enough to need AI processing.

4?? Handling Data Views More Efficiently

I didn't focus too much on Data Views (_job, _open, _click, etc.) because they are fixed structures and don't require as much flexibility as other DEs. However, for a production-ready assistant, handling Data Views separately would be beneficial.

? Recommendation: Create a separate GPT prompt with different rules for Data Views because:

  • Field names and relationships are standardized in SFMC Data Views.
  • Query structures are predictable (e.g., _job always joins with _open, _click, etc.).

5?? Better Error Handling

The current SSJS error handling is scattered and not fully optimized. Different user queries may trigger unexpected responses, and handling errors properly is crucial for production use.

? Solution: Build a structured error-handling system that:

  • Detects invalid DEs/fields before sending requests to GPT.
  • Returns clear error messages instead of generic failures.
  • Logs errors for debugging and future improvements.


While doing all the tests, I only spent 0.67$.

Exploring Another AI-Powered Use Case in SFMC

Beyond SQL query generation, GPT can be used for other automation tasks in Salesforce Marketing Cloud (SFMC). One such use case is translating natural language instructions into structured JSON actions for managing Data Extensions (DEs).

This approach allows marketers, admins, and developers to issue commands in plain English, and GPT converts them into structured actionable steps that SFMC can execute.

?? Use Case: AI-Powered Data Extension Management

Instead of manually navigating SFMC’s UI to create, move, copy, or delete Data Extensions, we can build an assistant that understands natural language and generates structured JSON commands for execution.

How It Works

1?? The user enters a command like: "Move all DEs having 'Test' in their name to 'Archive'." or "Create a Data Extension by copying 'Customer_DE' but remove all PII fields."

GPT has a prompt like:

You are an AI assistant that translates natural language into structured JSON actions for Salesforce Marketing Cloud Data Extensions.

### **Instructions:**
1?? **Extract the main action type** (create, delete, update, move, copy).
2?? **Identify filter conditions** (contains, starts_with, ends_with, exact).
3?? **Standardize synonyms** (e.g., "starting with" → "starts_with", "having" → "contains").
4?? **Handle multiple actions** and return them in **sequential order**.
5?? **Always structure actions in this order when applicable**: 
   - (1) `create_folder` → (2) `get_data_extensions` → (3) `move_data_extensions`
6?? **If a folder needs to be created, do not check first. Just create it.**
7?? **Always retrieve Data Extensions before moving them.** 

### **JSON Output Format:**
```json
{
    "actions": [
        { "name": "<action_name>", "parameters": { "<param_key>": "<param_value>" } }
    ]
}        

2?? GPT interprets the request and translates it into JSON-based structured actions like:

{
    "actions": [
        { "name": "get_data_extensions", "parameters": { "filter": { "type": "contains", "value": "Test" } } },
        { "name": "move_data_extensions", "parameters": { "target_folder": "Archive", "data_extensions": ["DE_1", "DE_2"] } }
    ]
}        

3?? SFMC executes the actions via SSJS, parsing the JSON and performing the necessary operations.

4?? The user receives a confirmation message: ? "All data extensions including 'Test' in their name have been moved to the 'Archive' folder."

Conclusion

Integrating GPT-4o into Salesforce Marketing Cloud (SFMC) opens up new possibilities beyond simple content generation. Through this project, we explored how AI can understand natural language queries, extract relevant Data Extensions and fields, and generate structured SQL queries—helping marketers, developers, and data analysts work more efficiently.

Beyond SQL generation, we also saw how AI can automate Data Extension management, transforming plain English commands into structured JSON actions that SFMC can execute. These use cases only scratch the surface of what’s possible.

With careful prompt engineering, performance optimizations, and security considerations, AI can become a valuable assistant for SFMC users—saving time, reducing errors, and making complex tasks more accessible to non-technical users.

The key takeaway? GPT-4o isn't just a chatbot—it can be a powerful automation tool when structured properly. Whether it's querying data, managing DEs, or handling automation workflows, the right AI implementation can supercharge SFMC operations.

This project was an experiment, and there's still plenty of room for improvement. If you're looking to build your own AI-powered SFMC tools, I strongly encourage you to tailor prompts to your needs, optimize for cost and accuracy, and always consider security best practices.


Rohit Anand ?

Lead Marketing Cloud Developer| Pardot Consultant|Marketing cloud Automation|Data cloud|Salesforce Admin|3X Ranger

1 周

Insightful????

Rodney Ackermans

Salesforce CRM and Marketing Cloud Coordinator @Tilburg University | 7 x Certified Salesforce Consultant | Marketing Cloud | Sales Cloud | Data Cloud | Account Engagement

1 周
Fran?ois-Xavier Anghui

Salesforce Business Analyst

3 周

Awesome Selim Sevim. How does it work ? How Connect GPT-4o to a marketing Cloud instance ?

Suman Basu

Solution Architect at Deloitte USI??| 10x Salesforce Certified |Marketing Cloud Personalization|Salesforce Data Cloud

3 周

Insightful

Victor Miller

3x Salesforce Certified Marketer | Marketing Technology Leader | Digital Lifecycle & Automation Expert

3 周

James Breezley - interesting use case for a cloudpage query assistant.

要查看或添加评论,请登录

Selim Sevim的更多文章

社区洞察

其他会员也浏览了