Using Large Language Models for SFDC Account Enrichment
Image by Stable Diffusion


SFDC Account Data and the Quality Problem

Data is king for an effective go-to-market strategy. However, managing vast amounts of data can be overwhelming as your data set continues to grow year over year. My team works primarily with Salesforce Account data, and we are tasked with making that data as actionable and useful as possible. We all recognize that poor data quality can cause a range of issues, such as lost revenue, reduced productivity, and decreased customer satisfaction. These data quality issues can be costly, affecting both the top and bottom lines of any organization. According to a 2022 study by Gartner, bad data costs organizations an average of $12.9M a year.

When it comes to account data in Salesforce, I think about it in two ways. The first is firmographic data, which includes "hard" data about a company such as its address, employee count, annual revenue, industry, and parent/child relationships. This data is necessary to carve territories for sales reps and establish a segmentation approach. Leveraging third-party data providers is essential to manage any account database at scale and drive the proper classification and segmentation of accounts. As a sales operations professional, trusting the data is important, but it's also important to acknowledge that the data may be imperfect. The challenge lies in determining which data is correct and which is wrong.

A typical approach is to make the best decision with the available data and then crowdsource feedback from sales reps, with proof supporting their position that the data is incorrect. Another common approach is to bring in low-cost resources and train them to manually scrub a subset of the account base, making the data cleaner for the most critical target accounts. Both approaches are time-consuming and limited in how much data they can validate and update. In addition, if they are not run continuously as well-managed processes, the data will simply degrade again.

The second type of data is the additional attributes about an account that help drive effective campaigns and targeted sales plays, which I think of as the "soft" account data. These needs are often more nuanced, where the goal is to further segment accounts around a "propensity to buy" based on a specific business model or other market drivers. These "soft" attributes are typically not cut and dried, and different people in the organization need different attributes to suit their requirements. I have always viewed requests for this type of data as a bit fuzzy, since the value is largely tied to the specific stakeholder making the request. As a result, it is often difficult to clearly define the requirements and find the appropriate place to source the data.

Case Study: Using Large Language Models to Classify Companies

Recently, working with our business partners, we had a request come up to augment our account base with some "soft" data so we could take a more targeted approach in marketing. The scenario involved identifying each company's business model as B2B, B2C, or, in the case of companies like Google and Apple, both. This information would let us market specific products to one audience or the other more effectively.

Naturally, the first step was to assess the data sources we already had to determine whether any information in our existing dataset could provide insight. Since we didn't have anything in-house, the next logical step was to explore available data sources for purchase and assess whether the data quality would be adequate for our needs. Data can be expensive, and when requirements are nuanced, its value decreases because in some cases we are attempting to fit a square peg into a round hole.

To evaluate several data providers that could support our use cases, the business team compiled a list of a few hundred companies across the B2B, B2C, and B2B & B2C use cases and classified them by reviewing their websites to form a "human" baseline. We then sent the company list to the vendors for enrichment so we could analyze the result set against our baseline. It is important to note that this exercise was somewhat subjective, since the definition of B2B and B2C can be nuanced. As a result, I wouldn't expect any approach to provide a 100% match; we were looking for the approach that best met our definitions and requirements. As is typical, the prices we received varied greatly, and at the end of the day, the incremental cost to obtain this data was not insignificant. Additionally, the approach we considered was a "one-time" enrichment, which was suitable for our existing account base but did not address new prospect accounts entering our database, which are arguably the most promising accounts to market to.

During the evaluation process, I experienced an epiphany. Like the rest of the tech industry, I have been amazed by the introduction of and rapid evolution within the AI space. Large Language Models are simply magical. As with everyone else assessing and absorbing this new technology, I have been exploring its potential to enhance my writing, summarize key points, and solve problems or answer specific questions.

While thinking through this problem, I wondered whether we could solve it by leveraging Large Language Models (LLMs). I proceeded to test several different LLMs to see whether, through some simple prompts, we could get data sufficient to satisfy our need. After some initial testing, the results were promising, so I moved forward with a proof of concept to test operationalizing an enrichment framework built on LLMs.

To develop this framework, I recruited a talented developer to help brainstorm and build out the vision. Our requirements were straightforward, and we wanted to test a few different things:

  1. Leverage LLMs to classify companies as B2B, B2C, or Both based on system and user prompts, passing only the URL of the company's website. Could we get results comparable to or better than those from our third-party data providers? (A minimal sketch of this kind of call follows this list.)
  2. Build a near-real-time enrichment engine that analyzes new accounts on a schedule and updates a field on the SFDC account record with the new attribute in a test environment.
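
To make the first goal concrete, here is a minimal sketch of the kind of classification call we experimented with. It assumes an OpenAI-style chat API; the model name, the prompt wording, and the classify_company helper are illustrative placeholders, not our production prompt.

```python
# Minimal sketch: classify a company as B2B, B2C, or Both from its
# website URL. Assumes the OpenAI Python client; the model name and
# prompt wording are placeholders, not the exact prompts from our POC.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You classify companies by business model. "
    "Answer with exactly one of: B2B, B2C, Both."
)

def classify_company(url: str) -> str:
    """Ask the LLM to classify the company behind a website URL."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model would do
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Classify the company at {url}."},
        ],
        temperature=0,  # keep classification output as stable as possible
    )
    return response.choices[0].message.content.strip()

print(classify_company("https://www.apple.com"))  # expected: Both
```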

After three days, we were able to accomplish both goals. After making a half dozen adjustments to the prompt, we were able to do much better than one of the data providers and came reasonably close to another. Even without further optimizing the prompt, the results we are seeing are acceptable to the business, and we would realize roughly a 90% cost savings.
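
For the second goal, the write-back to Salesforce is conceptually simple. The sketch below shows one way to approach it with the simple_salesforce library; the custom field Business_Model__c, the SOQL filter, and the credentials are hypothetical placeholders for our actual schema and setup.

```python
# Sketch of the scheduled enrichment job: find accounts that have no
# classification yet and write the LLM's answer to a custom field.
# Uses simple_salesforce; Business_Model__c and the SOQL filter are
# hypothetical placeholders, and classify_company is the helper from
# the earlier sketch.
from simple_salesforce import Salesforce

sf = Salesforce(
    username="user@example.com",  # placeholder credentials
    password="password",
    security_token="token",
)

def enrich_new_accounts() -> None:
    """Classify and update accounts that are missing the attribute."""
    rows = sf.query(
        "SELECT Id, Website FROM Account "
        "WHERE Business_Model__c = null AND Website != null LIMIT 200"
    )["records"]
    for row in rows:
        label = classify_company(row["Website"])
        sf.Account.update(row["Id"], {"Business_Model__c": label})

enrich_new_accounts()  # run on a schedule, e.g., an hourly cron job
```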

Testing Prompts at Scale

For this specific proof-of-concept (POC) use case, we used the data the business team captured while sampling approximately 300 companies. These companies were manually classified as B2B, B2C, or both, and this data served as the crucial baseline against which we compared the subsequent LLM results.

To test prompts efficiently at scale, we developed an Enrichment Engine that lets us evaluate how prompt changes affect the LLMs' output. The engine simplifies rapidly testing multiple prompts and is designed to be agnostic to the specific LLM being used. With this architecture in place, we can test any number of prompts against any data set, and against various LLMs, as we continue our testing and evaluations.

The user selects the desired LLM (we are currently testing multiple models) and uploads text files containing multiple prompts, along with the corresponding data set to apply to each prompt. The Enrichment Engine executes the selected prompt and writes the outputs to CSV files, a structured format that makes the output data easy to analyze and use.
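
At its core, that workflow is just a nested loop over prompts and companies, with the results written out to CSV. Here is a simplified sketch; the file layout, the "url" column name, and the run_llm callable are assumptions for illustration, not the engine's actual internals.

```python
# Simplified sketch of the Enrichment Engine's core loop: run every
# uploaded prompt against every company in the data set and write one
# results CSV per prompt. Column names and file layout are assumed.
import csv
from pathlib import Path
from typing import Callable

def run_prompts(
    prompt_files: list[Path],
    companies_csv: Path,
    run_llm: Callable[[str, str], str],  # (prompt, company_url) -> label
    out_dir: Path,
) -> None:
    with open(companies_csv, newline="") as f:
        companies = list(csv.DictReader(f))  # expects a "url" column

    for prompt_file in prompt_files:
        prompt = prompt_file.read_text()
        out_path = out_dir / f"{prompt_file.stem}_results.csv"
        with open(out_path, "w", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=["url", "label"])
            writer.writeheader()
            for company in companies:
                writer.writerow(
                    {"url": company["url"],
                     "label": run_llm(prompt, company["url"])}
                )
```

Passing run_llm in as a callable is what keeps the engine agnostic to the specific LLM: swapping models means swapping a single function.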

By comparing the Enrichment Engine's results to the classifications humans made during the sampling phase, we can evaluate the accuracy of the engine's outputs against the established baseline and make whatever adjustments are needed to reach the desired level of alignment.
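
The comparison itself is a straightforward agreement calculation against the human labels. A minimal sketch, assuming both files share "url" and "label" columns (the column and file names are illustrative):

```python
# Sketch of the baseline comparison: how often does an LLM results
# file agree with the human-classified sample? Column and file names
# are assumptions for illustration.
import csv

def agreement_rate(baseline_csv: str, results_csv: str) -> float:
    def load(path: str) -> dict[str, str]:
        with open(path, newline="") as f:
            return {r["url"]: r["label"].strip().lower()
                    for r in csv.DictReader(f)}

    baseline, results = load(baseline_csv), load(results_csv)
    shared = baseline.keys() & results.keys()  # companies in both files
    matches = sum(baseline[u] == results[u] for u in shared)
    return matches / len(shared) if shared else 0.0

print(f"{agreement_rate('human_baseline.csv', 'prompt_v3_results.csv'):.1%}")
```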

The Enrichment Engine is designed to be adaptable and has the potential to expand and accommodate additional formats in the future. This flexibility ensures compatibility with diverse data sources, addressing the specific requirements of users.

Leveraging Large Language Models for Enrichment and to Improve Data Quality

This single use case from our POC has opened up a flood of ideas around potentially leveraging LLMs for additional account enrichment as well as data validation to improve quality.

As an example, we're considering using this approach to validate the child and parent relationships for some of our key accounts. As mentioned earlier, the traditional method involves hiring and training low-cost resources to manually work through this process, which can be complex and error-prone. While LLMs may not guarantee perfect results for this use case, they could still help assess a large amount of data quickly and speed the overall process along. This is just one of many potential use cases where this approach could assess data quality at scale. A big open question in the corporate world around LLMs is security. We are still evaluating how to safely and securely leverage LLMs in these business contexts, but the possibilities are intriguing, and we will continue to explore them. Although we know the data will never be perfect, I'm hopeful that LLMs and these new processes can make a significant difference in improving quality while also lowering costs.


Darren Ernest

GTM Strategy | Product & Performance Marketing | Leveraging Data Analytics & AI to Drive Growth, Efficiency & Innovation | ex-Salesforce, ex-Ogilvy, ex-Publicis

11 months

Hi Glenn Vander Laan, I've been experimenting with this recently. Wondering, a year later, where you've landed with this, and whether you would be willing to connect briefly to discuss. I've been experimenting with lots of "soft" data, and I've found ChatGPT, for example, to be incredibly effective, though you really have to eyeball the results because there are glaring errors from time to time. The speed and cost tradeoff is worth it when you can get 90%+ accuracy. In fact, I've added a confidence rating to my prompt so I can more easily pinpoint the answers I need to manually verify, in addition to eyeballing the whole data set. Another technique is to have it provide a rationale or explanation for its answer, which helps with the eyeballing since I am not personally familiar with all the accounts.

Kevin Laughlin, PMP

Strategy & Business Operations | Enabling Sales Systems with GenAI

1 year

As someone who has purchased datasets and manually enriched data with a low cost team, I can appreciate testing the effectiveness of using LLMs to enrich the "hard" data to segment accounts more accurately. I can see the Enrichment Engine also finding patterns in the "soft" data that help personalize messaging to new accounts and improve the Propensity model with explainable business behaviors. Glenn Vander Laan, have you looked at training on a large sample of conversational data with won/loss outcomes to find the best customer journey experience for a given buyer persona?

Glenn Vander Laan

Senior Director, Business Process Systems Automation at Klaviyo

1 year

That's right, Richard Coffman. In order to accurately assess the best approach, you need to evaluate which combination of LLM and prompt drives the best result.

Richard Coffman

Director of Enterprise Sales @ Pro-Vigil | AI-driven Remote Crime Prevention

1 year

Good stuff Glenn Vander Laan. Dumb question, but comparing the system-generated results with the sampling data will need to be done on each use case/LLM pair in order to confirm the LLM that works best for that use case, correct?

Jackie Corcoran

Client Success Manager | Passionate about Customer Experience, Business Outcomes & Facilitating Meaningful Connections

1 year

So interesting Glenn Vander Laan! Thanks for sharing. You've got my wheels turning.
