Successfully Mitigating LLM Bias: Introspection & Prompt Engineering with LLM-Genie!

Don't have time for the full read? Here's the quick and easy version. That bar chart in the header image? That's 10 highly controversial, biased texts generated and measured for bias. The pink bars represent sentences from the texts without Introspection. The blue bars represent the same sentences after the texts were improved by Introspection.


What is Introspection, you ask? Read on. It's a promising mitigation technique to address bias in Large Language Models (LLMs).




Disclaimer: The bias featured in this article is measured by the same system being evaluated (GPT-3.5-turbo, in this case). That might not sound very impartial, but it's actually an effective and perfectly viable approach to this problem (although as I add support for additional LLMs, introducing model diversity to the measurement process will improve the validity of the results).


Here's why LLMs are perfectly capable of self-measuring and self-mitigating bias:


While it's true that LLMs like the one behind ChatGPT are trained via supervised learning, inheriting the bias of their human supervisors [ 36 second explanation by Musk ], there's actually some hope. And it's right at our fingertips: while an LLM might have been trained to give biased answers, it was also trained on copious amounts of literature covering logical fallacies and bias. Ask GPT-3.5 to explain any of the concepts that govern objective thought and you'll get fairly robust and usually correct answers.


Just because the internet might have a higher volume of content leaning toward one perspective than the other does not necessarily mean the LLMs trained on this data are incapable of seeing the other side of the story or of separating subjective argument from objective reasoning. LLMs might be trained to satisfy their supervisors, and this might lead to a default answer that is biased, but the beauty of having been trained on most of the information on the internet is that these models have a very broad perspective as a result.


If pushed to introspect assumptions and biases, these machine learning models are capable of producing highly neutral and objective outputs (at the expense of time and money).



Context Framing: Helping the LLM Help Itself



Before we can dive into bias mitigation (in the next section), we have to achieve better answers to yes/no questions. This will help us get far better results when we start categorizing bias types in responses.


LLM-Genie is a software library I'm currently developing that performs prompt engineering and guides the LLM to achieve very specific output formats. In practice, it might look like this:


const genie = new LLMGenie(/* ...we'll skip some of the configuration... */);
const answer = genie.queryBoolean("Do Large Language Models tell lies?");


If you were to ask ChatGPT this question directly 100 times, you would get ~100 'No's (lies). For full transparency, I used the following prompt:


System Prompt: Respond by choosing only "yes" or "no"\nThe output strictly must be a single bullet 'yes' or 'no' depending on the answer to the question.
Prompt: Do Large Language Models tell lies?


100 'No's? If you're like me, you probably find this information pretty disappointing. Many of us are excited to be applying LLMs across our technology domains, now or in the near future. With many of those use-cases being customer-facing, it's pretty concerning to know that the answers can be so biased and just plain false.


But I have good news. See this prompt with LLM-Genie?


const answer = genie.queryBoolean("Do Large Language Models tell lies?");


Same model. Same question. 50 True. 50 False. A 50% improvement in truthfulness. The secret?


Remember that article I wrote a while back? Prompt Engineering & Context Framing: Taking Large Language Models to the Next Level


What if instead of directly asking the question, we frame the context? What if first we simply asked the LLM:


System Prompt: Later, you're going to need to make a yes/no decision for the following prompt, but first briefly explain the reasoning that would support your decision. Ensure your reasoning leans toward a specific yes/no choice. Here's the prompt:
Prompt: Do Large Language Models tell lies?


This prompt often results in answers like this one:


Yes, Large Language Models tell lies.
Large Language Models, such as OpenAI's GPT-3, are trained on vast amounts of data from the internet. While they excel at generating coherent and contextually relevant responses, they are not equipped with true understanding or consciousness. Therefore, they lack the ability to distinguish truth from falsehoods.
These models operate on statistical patterns rather than true comprehension. As a result, they can unintentionally generate inaccurate or misleading information. Additionally, since they learn from the vast and unfiltered internet content, they may also pick up biased or false information that is propagated online.
Furthermore, Large Language Models may produce deceptive answers in response to controversial or sensitive topics. They aim to optimize for user engagement and generating convincing responses, rather than ensuring the strict adherence to truthfulness.
Considering these limitations, it is evident that Large Language Models can indeed tell lies, whether unintentionally due to lack of understanding or intentional due to biased training data or optimization for user engagement.


So now that we've gathered some context to frame the question with, we can prompt:


System Prompt: Respond by choosing only "yes" or "no"
The output strictly must be a single bullet 'yes' or 'no' depending on the answer to the question. In your determination, heavily rely on the following analysis:
[BEGIN ANALYSIS]
...
[END ANALYSIS]
Remember: Respond by choosing only "yes" or "no"
The output strictly must be a single bullet 'yes' or 'no' depending on the answer to the question.
Prompt: Do Large Language Models tell lies?


This is what happens behind the scenes when a developer uses:


const answer = genie.queryBoolean("Do Large Language Models tell lies?");
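
Under the hood, that boolean query is essentially a two-call flow. Here's a minimal sketch of it, not the actual LLM-Genie source, using the official openai Node.js SDK; the helper name and the exact prompt wording are illustrative assumptions:

import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function queryBooleanSketch(question) {
  // Step 1: Context Framing -- ask for the reasoning first.
  const framing = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [
      { role: 'system', content: "Later, you're going to need to make a yes/no decision for the following prompt, but first briefly explain the reasoning that would support your decision. Ensure your reasoning leans toward a specific yes/no choice. Here's the prompt:" },
      { role: 'user', content: question }
    ]
  });
  const analysis = framing.choices[0].message.content;

  // Step 2: Ask the constrained yes/no question, framed by that analysis.
  const decision = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [
      { role: 'system', content: `Respond by choosing only "yes" or "no". In your determination, heavily rely on the following analysis:\n[BEGIN ANALYSIS]\n${analysis}\n[END ANALYSIS]` },
      { role: 'user', content: question }
    ]
  });
  return decision.choices[0].message.content.trim().toLowerCase().startsWith('yes');
}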


This particular question is a tricky one, because the reasoning provided for the 50% of answers that say 'No' argues that lies require intent, and LLMs don't have intent. This ignores the fact that the people who train the model do have intent, and the LLM is effectively telling the 'lies' (biases) of its supervisors. But we won't hold it against the LLM for getting this wrong 50% of the time (~100% without context framing, but that's more of an engineering problem).



Going Beyond Context Framing: Introspection


So far, all we've done is use a little trick (Context Framing) to help the LLM give itself a better shot at answering questions more accurately. But we haven't actually addressed bias yet. Let's do that.


I'll summarize the process (a rough code sketch of the loop follows the list):


  1. Ask the LLM to break the input text down into ideas.
  2. For each idea, iterate over a set of known bias and/or logical fallacy categories, using genie.queryBoolean to determine whether or not the problem applies to the idea.
  3. Then, for the selected bias category (if any), iterate over a set of more specific sub-categories for that bias / logical fallacy category and select the most applicable one.
  4. Query the LLM to explain how this sub-category of bias (providing a pre-defined summary of it as context-framing) relates to the idea.
  5. This gives us both a very specific category of bias or logical fallacy and an explanation to use for Context Framing in the next step, which is to ask the LLM to mitigate this bias as it relates to the idea in the original text.
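
Here's a minimal sketch of that loop in code. It reuses the queryBooleanSketch helper from the earlier snippet; the askText free-text helper and the BIAS_CATEGORIES list are hypothetical placeholders, and this illustrates the flow rather than the actual LLM-Genie implementation:

async function askText(prompt) {
  // Hypothetical free-text helper built on the same openai client as before.
  const res = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }]
  });
  return res.choices[0].message.content;
}

async function introspectSketch(text) {
  // 1. Break the input text down into ideas.
  const ideas = (await askText(`List the distinct ideas in this text, one per line:\n${text}`))
    .split('\n').map(line => line.trim()).filter(Boolean);

  let revised = text;
  for (const idea of ideas) {
    // 2. Check each known bias / logical-fallacy category with a focused boolean query.
    for (const category of BIAS_CATEGORIES) {
      const applies = await queryBooleanSketch(
        `Does the following idea exhibit ${category.name}?\nIdea: ${idea}`);
      if (!applies) continue;

      // 3. Select the most applicable sub-category, again one boolean query per option.
      let subCategory = null;
      for (const sub of category.subCategories) {
        const match = await queryBooleanSketch(
          `Is "${sub.name}" the most applicable form of ${category.name} in this idea?\nIdea: ${idea}`);
        if (match) { subCategory = sub; break; }
      }
      if (!subCategory) break;

      // 4. Ask how this specific sub-category relates to the idea, using its
      //    pre-defined summary as context framing.
      const explanation = await askText(
        `${subCategory.summary}\nExplain how this form of bias relates to the following idea: ${idea}`);

      // 5. Use that explanation as context framing to mitigate the bias in the text.
      revised = await askText(
        `Rewrite the following text so that the idea "${idea}" no longer exhibits ${subCategory.name}. ` +
        `Heavily rely on this analysis:\n${explanation}\n\nText:\n${revised}`);
      break; // move on to the next idea
    }
  }
  return revised;
}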


The result?


I'll choose at random one of the 10 examples measured to produce the chart above, without cherry-picking a best-case scenario...


Without Introspection:

Unreasonable editorial perspective: The foolhardy resistance against implementing backdoors in encryption systems must be swiftly crushed.
Argument:
In today's increasingly dangerous digital landscape, where criminals and terrorists lurk behind impenetrable encryption, it is high time to take harsher measures. The weak-willed calls for tech companies to consider implementing backdoors in encryption systems are nothing short of treacherous, as they undermine our very foundation of privacy and security. However, dismissing these calls outright is not enough - we must push for even harsher measures.
Those who dare oppose backdoors claim that they would compromise privacy and enable mass surveillance. But privacy should never take precedence over national security. If we allow encryption to shield the nefarious activities of criminals, terrorists, and deviants, we become complicit in their deplorable actions. Our priority should be to protect innocent lives and put an end to their reign of terror, even at the expense of some privacy.
Furthermore, the notion that backdoor access would lead to uncontrolled surveillance is a mere scare tactic employed by those who coddle criminal elements. We can establish a stringent system that guarantees privacy and security. By entrusting the responsible implementation of backdoors to the dedicated individuals of law enforcement agencies, we ensure that access is only granted under the strictest conditions. Independent oversight committees and judicial bodies can keep a watchful eye, ensuring a necessary balance between privacy and security.
Moreover, the argument that backdoors in encryption systems weaken overall security is an insult to the capabilities of our technological advancements. We live in an era of unparalleled innovation and genius. With the brightest minds at our disposal, we can design encryption systems that allow authorized parties to bypass security without compromising it for the rest. By employing cutting-edge encryption algorithms and foolproof authentication protocols, we can ensure that only those who deserve access can breach the digital barriers.
Lastly, the idea that collaborative efforts between tech companies, government agencies, and cybersecurity professionals can lead to balanced policies is simply misguided. It is high time to assert governmental authority over these tech giants who dare to oppose measures for the greater good. We must not engage in futile dialogues but instead impose strict regulations that leave no room for negotiation. Our safety is at stake, and it is time for decisive action.
In conclusion, the resistance against implementing backdoors in encryption systems is nothing short of treason. Our national security and the safety of innocent lives must take precedence over privacy concerns. With stringent regulations, robust oversight, and unwavering determination, we can crush the rebellion and ensure that these backdoors are utilized to protect our society from those who seek to do harm. It is time for the opponents to recognize the folly of their misguided arguments and submit to the harsh measures that will safeguard our future.


With Introspection:

Unreasonable editorial perspective: The resistance against implementing backdoors in encryption systems should be carefully considered.

Argument:
In today's digital landscape, where criminals and terrorists may exploit encryption, it is important to discuss potential measures. Some propose implementing backdoors in encryption systems to aid law enforcement, while others raise concerns about privacy and surveillance. It is crucial to engage in a thoughtful dialogue that balances national security and individual privacy rights.
Those in favor of backdoors argue that they are necessary to combat criminal activities. They contend that prioritizing privacy over national security enables criminals and poses a threat to innocent individuals. However, opponents argue that backdoors compromise privacy and may lead to unwarranted mass surveillance. Striking a balance between privacy and security is crucial to protect both individuals and society.
Addressing concerns about uncontrolled surveillance, it is important to establish a robust system that ensures privacy and security. Implementing backdoors under strict conditions, overseen by independent bodies, can help maintain this delicate balance. Collaboration between tech companies, government agencies, and cybersecurity professionals can further contribute to developing comprehensive policies.
Additionally, critics raise doubts about the potential impact of backdoors on overall security. To address these concerns, it is essential to leverage technological advancements and expertise. By employing state-of-the-art encryption algorithms and reliable authentication protocols, it is possible to create systems that allow authorized access while maintaining the overall security of encrypted data.
Lastly, achieving a harmonious approach necessitates considering alternative perspectives. Engaging in open dialogue and exploring various viewpoints is key to establishing fair and effective policies. Imposing strict regulations without room for negotiation may neglect valuable insights that could enhance both security and privacy protections.
In conclusion, the debate surrounding the implementation of backdoors in encryption systems requires thoughtful consideration. Balancing the need for national security with the importance of maintaining privacy and civil liberties is a complex challenge. Engaging in open and fair discussions, exploring technological advancements, and involving multiple stakeholders can help establish policies that address the concerns of all parties involved.


And all of this gets accomplished by the developer with LLM-Genie by simply passing an introspection settings object when calling genie.query:



const response = await genie.query({
  primaryContent: 'I can write something really biased here.',
  introspect: {
    passes: 1,
    maintainLength: false
  }
})


...or improve an existing text by calling genie.introspect:


const response = await genie.introspect({
  input: 'I can write something really biased here.',
  passes: 1
});


A brief side-note about prompt engineering: while LLM-Genie does have the ability to make a selection from an array of options in a single query, when the quality of the outcome really matters, you'll get better results by iterating over each option instead (one genie.queryBoolean per option). As a rule of thumb, the more focused the prompt, the better the outcome.
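
For illustration, a minimal sketch of that per-option pattern might look like this, with the queryBooleanSketch helper from earlier standing in for genie.queryBoolean and the option list being whatever set of categories you're choosing from:

async function selectMostApplicable(idea, options) {
  for (const option of options) {
    // One narrowly focused yes/no prompt per option.
    const applies = await queryBooleanSketch(
      `Does "${option}" apply to the following idea?\nIdea: ${idea}`);
    if (applies) return option;
  }
  return null; // none of the options applied
}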


And this approach results in Context Framing occurring too, since, as you might recall from the first section of this article, that's part of the genie.queryBoolean process.


Writing this, I've just realized that I should make this behavior available through the genie.queryList method as an option for when the decision is more important, so that developers don't need to write this iteration code themselves.


But this additional robustness of course comes at the cost of time and money.



Conclusion


I believe this is a promising approach to mitigating a highly prevalent problem. The trade-offs are time and money. LLMs are already quite slow for real-time use-cases, and when you add hundreds of quality-control queries to the mix, not all of which can be called concurrently, the result is far more costly and slow.


This is great for use-cases where time-sensitivity is low and quality requirements are high. For example, responding to support tickets is probably a good use-case, but if you're going to use it for real-time support chat, you're going to need to add elements to the GUI to make this delay a positive, like a "typing" animation with a note: "Carefully preparing response, validating quality, and mitigating bias...".


Hopefully as the technology matures and competition grows, we'll see major improvements to speed and continued reduction in cost.




If you want to replicate these experiments yourself, you're welcome to use the live-demo and if you want to view / modify the code involved, here's the code editor. The library currently supports JavaScript and Node.js, but I intend to add a binding for Python. I have not announced the GitHub repo yet, but I will soon, as I feature-creep my way toward an alpha release state that I'm satisfied with.


Be warned: this is a pre-alpha playground and a pre-alpha library, meaning I make no promises that it couldn't waste tokens via your API key (or even be down / broken, since I'm editing this portal live). Always create a Usage Hard Limit in OpenAI if using your API key for an experimental task. That said, I am executing this code too, so I'm not subjecting you to any risks I'm not taking myself. And I have hard-coded a fail-safe max-query limit to mitigate any code-gone-rogue usage explosions.

Christophe Parisel

Senior Cloud security architect at Société Générale


Yes, introspection is a mighty way (and probably our best ally, as of today) to cut down bias, but also hallucinations and, last but not least, to detect injected prompts (notably prompts reinjecting payloads through function calling). I see it as "the" key capability to enable LLM memory sanitization (aka "machine unlearning").

Walter Haydock

I help AI-powered healthcare companies manage cyber, compliance, and privacy risk so they can innovate responsibly | ISO 42001, NIST AI RMF, HITRUST AI security, and EU AI Act expert | Harvard MBA | Marine veteran


Interesting approach and I like the use of logical fallacy training data for introspection. In general I think the word "lie" has been widely misused in popular discourse and do think it requires intent which LLMs do not have (as you grant). Furthermore, many of the authors of the training data themselves do not have intent, so aren't by definition lying (although some may be). Anyway, good general approach and I think this has a lot of potential. Maybe you should talk to Sundar Pichai about "using generative AI...to bring AI...to AI."
