When Generative AI Falls Short: Understanding the Reality Beyond the Hype

Is your compliance strategy outdated?

Imagine you're in a vast library with towering shelves that stretch endlessly, each stacked with countless documents. This library grows by the hour, with new rules and guidelines appearing on the shelves as if by magic. Now it's your job to find one specific rule in this ever-expanding maze, one that could be crucial to your organization's survival and success. Traditionally, this Herculean task falls to experts: seasoned legal and domain professionals who go through each document with a fine-tooth comb to capture every pertinent detail.

Researchers from the TUM School of Computation, Information, and Technology in Germany and the School of Information Technology and Electrical Engineering in Australia conducted a comparative study of methods for identifying which regulatory requirements are relevant to a given business process. The study explored four methods:

  1. Expert analysis
  2. State-of-the-art natural language processing (NLP)
  3. Generative AI
  4. Crowdsourcing

This blog examines the strengths and limitations of these approaches, with a particular focus on the role of generative AI.

It is written for AI enthusiasts, automation experts, compliance officers, legal experts, and organizational leaders who want to know where generative AI excels in compliance work and where it falls short.

Drawing on the study's real-world case studies and empirical results, the goal is a balanced view of the practical boundaries and potential of generative AI in regulatory compliance, so that professionals can make informed decisions about integrating AI tools into their compliance frameworks.

The study applied expert analysis, NLP, GPT-4, and crowdsourcing to the same real-world processes:

  • Expert analysis: Experts meticulously analyzed regulatory documents and business processes to determine relevance. The method is transparent and reproducible but labor-intensive.
  • NLP: Embedding-based models ranked regulatory texts by their relevance to the process, using two primary retrieval methods.
  • Generative AI: GPT-4 was prompted zero-shot, that is, without task-specific examples, to judge the relevance of regulatory texts to business processes and sub-processes.
  • Crowdsourcing: Untrained crowd workers assessed the relevance of regulatory requirements in a two-phase task.
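The ranking idea behind the NLP approach can be sketched in a few lines. This is a minimal sketch, not the study's pipeline. To stay self-contained, it uses simple bag-of-words count vectors in place of a learned embedding model, and the process step, regulation IDs, and texts are hypothetical. Each regulatory text is turned into a vector and scored by cosine similarity against the process description. The texts are then sorted so the most similar come first.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_regulations(process_description: str,
                     regulations: dict[str, str]) -> list[tuple[str, float]]:
    """Rank regulatory texts by similarity to a process description, most similar first."""
    query = bag_of_words(process_description)
    scores = [(reg_id, cosine(query, bag_of_words(text)))
              for reg_id, text in regulations.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical process step and regulatory snippets, for illustration only.
process = "Assess the travel insurance claim and notify the customer of the claim decision"
regulations = {
    "REG-A": "An insurer must notify the customer of a claim decision in writing",
    "REG-B": "Banks must verify customer identity before opening an account",
    "REG-C": "Claims handling staff must be trained annually",
}
for reg_id, score in rank_regulations(process, regulations):
    print(f"{reg_id}: {score:.2f}")
```

REG-C scores zero even though it mentions claims. This is because bag-of-words vectors treat "claims" and "claim" as unrelated words. Learned embeddings are meant to capture that kind of similarity, which is one reason embedding models are used instead of raw word counts.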

These methods, as you'll see, have practical implications for your compliance strategies.

Use Cases: Two case studies provided real-world scenarios for testing these methods:

  • Travel Insurance Claims Process: An Australian insurance company must comply with multiple local regulatory documents when handling travel insurance claims.

[Figure: Travel Insurance Claim Process]

  • Know Your Customer (KYC) Process: A customer onboarding workflow, adapted from SAP Signavio, that must comply with an international banking guideline.

[Figure: Know Your Customer Process]

In the study, each method was evaluated on its ability to consistently identify all necessary compliance requirements, so that nothing crucial is overlooked. The emphasis was on achieving a high success rate in finding these requirements.
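The criterion of missing nothing corresponds to the standard retrieval metric called recall. Flagging documents that aren't relevant, the false positives discussed later, lowers a related metric called precision. Below is a minimal sketch of both, assuming the expert judgments serve as the ground truth. The requirement IDs are hypothetical and are not the study's data.

```python
def recall(flagged: set[str], relevant: set[str]) -> float:
    """Share of truly relevant requirements that the method flagged."""
    # By convention here, an empty ground truth means nothing could be missed.
    return len(flagged & relevant) / len(relevant) if relevant else 1.0


def precision(flagged: set[str], relevant: set[str]) -> float:
    """Share of flagged requirements that are truly relevant."""
    # By convention here, flagging nothing produces no false positives.
    return len(flagged & relevant) / len(flagged) if flagged else 1.0


# Hypothetical expert ground truth and one method's output.
expert_relevant = {"R1", "R2", "R3", "R4", "R5"}
method_flagged = {"R1", "R2", "R3", "R4", "R9", "R12"}

print(f"recall    = {recall(method_flagged, expert_relevant):.2f}")
print(f"precision = {precision(method_flagged, expert_relevant):.2f}")
```

In this example the method reaches a recall of 0.80 because it missed R5. Its precision is about 0.67 because two of its six flags are false positives. A first-pass scan can tolerate lower precision as long as recall is high, since experts can remove the false positives afterwards.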

  • Human experts set the benchmark for the comparison, catching every necessary requirement.
  • The NLP tools for reading and ranking regulatory documents varied in their initial success. Some configurations identified between 23% and 43% of the requirements on their first pass, and others between 29% and 41%. These tools proved effective for an initial scan to narrow down potentially relevant documents.
  • GPT-4 performed excellently at the overall process level. It identified every required document in the travel insurance scenario and nearly all of them in the KYC (banking customer onboarding) scenario. However, its effectiveness decreased for more detailed sub-processes and tasks.
  • Crowdsourcing, in which many individuals reviewed the documents, also performed well initially. It correctly identified 80% of the requirements for travel insurance and 60% for KYC onboarding. However, this method required careful quality control to produce consistent results.
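A common form of quality control for crowdsourcing is to collect several judgments per item and aggregate them. The sketch below uses plain majority voting with an agreement threshold and escalates uncertain items to an expert. It is a minimal sketch under stated assumptions: the 0.7 threshold, the five-worker setup, the function name, and the labels are all hypothetical, and it is not the study's actual quality-control procedure.

```python
from collections import Counter


def aggregate_votes(votes: dict[str, list[bool]],
                    min_agreement: float = 0.7) -> tuple[dict[str, bool], list[str]]:
    """Majority-vote each requirement; send low-agreement items to expert review.

    votes maps a requirement ID to the relevance labels given by individual
    crowd workers (True = relevant). Returns (decided, needs_expert).
    """
    decided: dict[str, bool] = {}
    needs_expert: list[str] = []
    for req_id, labels in votes.items():
        if not labels:
            needs_expert.append(req_id)  # no votes at all: nothing to aggregate
            continue
        label, count = Counter(labels).most_common(1)[0]
        agreement = count / len(labels)
        if agreement >= min_agreement:
            decided[req_id] = label
        else:
            needs_expert.append(req_id)
    return decided, needs_expert


# Hypothetical labels from five crowd workers per requirement.
votes = {
    "R1": [True, True, True, True, False],      # 80% agree: relevant
    "R2": [False, False, False, False, False],  # 100% agree: not relevant
    "R3": [True, False, True, False, True],     # 60% agree: too uncertain
}
decided, needs_expert = aggregate_votes(votes)
print(decided)
print(needs_expert)
```

With these labels, the crowd settles R1 and R2, while R3, at 60% agreement, goes to an expert. Raising `min_agreement` sends more items to experts in exchange for more reliable crowd decisions. Keeping it above 0.5 also ensures that ties are always escalated.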


The study highlighted the value of combining methods:

  • NLP combined with expert analysis suits high-impact, frequently changing processes with a high regulatory burden.
  • GPT-4 combined with expert analysis is best for rapidly pre-selecting relevant texts, with expert review removing false positives.
  • Crowdsourcing combined with expert analysis can refine poorly defined processes and improve process documentation.

It also found that well-documented processes improve relevance identification, that generative AI such as GPT-4 needs human oversight to ensure compliance, and that customizing GPT-4 for specific business contexts can improve its relevance judgments.
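The GPT-4-plus-expert combination can be sketched as a two-stage filter. This is a minimal sketch under stated assumptions. The zero-shot prompt wording, the function names, and the `ask_model` callable are all hypypothetical. In practice, `ask_model` would wrap a call to GPT-4 or a similar model. Here, a keyword stub stands in for the model so the example runs offline. The sketch does not reproduce the prompts used in the study.

```python
from typing import Callable

# Hypothetical zero-shot prompt: only the task and the inputs, no worked examples.
PROMPT_TEMPLATE = (
    "You are a compliance analyst.\n"
    "Business process step: {step}\n"
    "Regulatory text: {regulation}\n"
    "Is this regulatory text relevant to the process step? Answer YES or NO."
)


def build_prompt(step: str, regulation: str) -> str:
    return PROMPT_TEMPLATE.format(step=step, regulation=regulation)


def preselect(step: str, regulations: dict[str, str],
              ask_model: Callable[[str], str]) -> list[str]:
    """Stage 1: keep every text the model labels relevant."""
    return [reg_id for reg_id, text in regulations.items()
            if ask_model(build_prompt(step, text)).strip().upper().startswith("YES")]


def expert_review(candidates: list[str],
                  expert_confirms: Callable[[str], bool]) -> list[str]:
    """Stage 2: an expert removes false positives from the shortlist."""
    return [reg_id for reg_id in candidates if expert_confirms(reg_id)]


# Stand-ins for illustration only: a keyword stub in place of a real model call,
# and a fixed set in place of an expert's judgment.
def fake_model(prompt: str) -> str:
    regulation_line = prompt.splitlines()[2]  # the "Regulatory text: ..." line
    return "YES" if "claim" in regulation_line.lower() else "NO"


expert_approved = {"REG-A"}
regulations = {
    "REG-A": "An insurer must notify the customer of a claim decision in writing",
    "REG-B": "Banks must verify customer identity before opening an account",
    "REG-C": "Claims handling staff must be trained annually",
}
step = "Notify the customer of the claim decision"
shortlist = preselect(step, regulations, fake_model)
final = expert_review(shortlist, lambda reg_id: reg_id in expert_approved)
print(shortlist, final)
```

In this example, the keyword stub flags REG-A and REG-C. The expert review then removes REG-C as a false positive, leaving REG-A. This design keeps the model in a pre-selection role: it narrows the shortlist quickly, while the final relevance decision stays with a human.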

Despite GPT-4's impressive ability to identify relevant documents, the study advises against using it as the sole tool for identifying regulatory requirements. Here are several reasons:

  1. Need for Checks: While GPT-4 can quickly pinpoint documents that seem relevant, it sometimes gets it wrong by flagging documents that aren't actually pertinent (known as 'false positives'). This means that without careful oversight by experts, these errors could lead to unnecessary work and potentially incorrect compliance decisions.
  2. Understanding the How and Why: GPT-4 operates like a 'black box', meaning it offers answers without showing how it arrived at them. For critical compliance tasks, companies need to see the workings behind decisions to ensure they're solid and can be defended if questioned.
  3. Detail Matters: GPT-4 is good at identifying the right documents on a broad level, but it struggles with the finer details within sub-processes and specific tasks. Accurate identification at every level is crucial to fully comply with regulations.
  4. Consistency is Key: The effectiveness of GPT-4 can vary widely depending on the task. For example, it performed well in identifying requirements for a travel insurance process but was less effective with banking customer onboarding (KYC). This inconsistency highlights the need for tailored setups and expert validation.
  5. Custom Fit Needed: To increase accuracy, GPT-4 needs to be specially adjusted or 'tuned' to the specific needs of a business, which can be a complex and detailed task. Even with customization, the nuanced understanding of human experts is irreplaceable.
  6. Expertise is Essential: GPT-4 may miss the subtle legal and specific industry details that are crucial for compliance. Human experts bring necessary knowledge and skills in interpreting complex regulations, which AI is currently unable to match.
  7. Keeping Up-to-Date: Regulations frequently change, requiring up-to-date knowledge. While human experts can quickly adjust to these changes, GPT-4 depends on its training data and might need regular updates to stay current.

Conclusion

While GPT-4 demonstrates remarkable potential in identifying regulatory requirements relevant to business processes, it shouldn't be relied upon alone. Expert analysis, including your own expertise, remains vital to ensure compliance. Combining generative AI, embedding-based ranking, or crowdsourcing with human expertise provides a balanced approach to navigating the complex regulatory landscape. Despite its limitations, generative AI represents a significant step toward reducing compliance burdens and improving the identification of requirements relevant to each process.


Essentially, generative AI isn't a fix-all solution for every problem companies face today. It should be used thoughtfully, weighing its strengths against what human experts bring to the table and the specific needs of the organization. GPT-4 and similar models can make processes more efficient and improve how companies handle compliance. However, the key to success is integrating these tools into the procedures and strategies companies already have in place. Businesses should use AI to enhance the skills of their human teams, not replace them.

Source Research Paper: Sai, C., Sadiq, S., Han, L., Demartini, G., & Rinderle-Ma, S. (2024). Identification of Regulatory Requirements Relevant to Business Processes: A Comparative Study on Generative AI, Embedding-based Ranking, Crowd and Expert-driven Methods. Retrieved from arXiv:2401.02986
