How Organizations Establish ChatGPT Safe Zone
Abstract
We explored the possibility of establishing a "safe zone" in which ChatGPT operates only within a desired domain. We also considered the challenge of detecting sensitive entities without unduly limiting the scope of ChatGPT's operations. To address this, we proposed using few-shot learning to translate natural language into a domain language defined by Formal Semantics in Modern Type Theories. After this translation, ChatGPT can be effectively governed by safety rules and constraints encoded in the formal semantics, covering both type-theoretic and proof-theoretic aspects.
Background
ChatGPT is not aware of your safe zone (your domain knowledge) and may make confident assertions despite lacking that knowledge. Is there a mechanism by which we can establish a safe zone and confine ChatGPT within it? Such a mechanism would be very useful for organizations and businesses.
Suppose you are a toy retailer developing a ChatGPT-powered bot. You have listings related to "fake gun," and you expect customers to inquire about them. However, ChatGPT applies heavy moderation to weapon-related terms, so you must whitelist "fake gun" in order to use your organization's copyrighted messaging for introducing the product.
And that is just one example. As a toy retailer with a wide range of products, you have to serve a diverse range of interests. You may have concepts such as "Tall Girl," "Tall Barbie," and "Little Barbie," which are all acceptable. However, a term such as "Sexy Girl" is not appropriate for obvious reasons, and "Nitrogen Air" is not relevant either, since it is unlikely to be a product sold in your toy shop.
Your initial goal is to identify "sensitive concepts" and delegate control of the conversation to your trusted software. At the same time, you do not wish to impose significant limitations on ChatGPT's capabilities; you want to leave the model a broad scope of operation. As a result, you may face the challenge of detecting tens of thousands of potentially safe entities (such as Adjective + Noun combinations) in real time.
Traditional Few-shot Learning is Hard
As mentioned above, the challenge is the unbounded scope.
The range of possible inquiries is not restricted by the number of products listed in your store. Even if you stock only ten products, customers can still ask about Adjective + Noun concepts outside your current offerings. The diversity of entities depends not only on the number of nouns but also on the various ways in which they can be modified. Not every adjective is appropriate for modifying a given noun, but the combinatorial choices grow nonetheless.
One issue that arises with these growing choices is that classic few-shot learning, or fine-tuning a model to predict a class label, is no longer effective. As the number of classes increases, the effort required to prepare sufficient training examples for every class grows proportionally. Furthermore, when the number of classes is not fixed in advance, it is challenging to decide which classes to invest in preparing examples for, especially when new classes arise naturally but do not yet have listings. Lastly, you may end up spending time deciding whether to provide labeling support for a semantically invalid class.
Translating Customer Inquiries into MTT
Large Language Models are highly skilled at translation tasks, which allows them to bridge the gap between human language and a domain-specific language. By leveraging the rules and conventions of the domain language, it is possible to check "safe-zone-ness" without directly analyzing the original human natural language.
One such domain language is Formal Semantics in Modern Type Theories, often referred to as MTT. In this domain, the "game rules" consist of type-theoretic and proof-theoretic rule systems.
The next part is an interactive session; you can verify the code in an online Coq IDE. We begin with a few helper definitions:
Require Import String.

(* Common nouns are interpreted as types; sentences as propositions. *)
Definition CN := Set.
Definition t := Prop.

(* A dialog state is either a status marker, a raw string,
   or a sentence (a proposition). *)
Inductive state :=
| status_state (_: status)
| string_state (_: string)
| sentence_state (_: t)
with status :=
| start | end'.  (* "end" is a Coq keyword, hence end' *)

(* Coercions let a status or a string be used directly as a state. *)
Definition state1 (e: status) : state. constructor 1. auto. Defined.
Coercion state1 : status >-> state.
Definition state3 (e: string) : state. constructor 2. auto. Defined.
Coercion state3 : string >-> state.

(* |P| wraps a proposition P as a sentence state. *)
Notation "| x |" := (sentence_state x) (at level 50, left associativity).

Open Scope string.
Require Import Specif.
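As a quick sanity check (a small sketch you can run in the same session), the coercions allow a status or a string to be used wherever a state is expected:

Check (start : state).    (* accepted via the coercion state1 *)
Check ("hello" : state).  (* accepted via the coercion state3 *)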
Now we can define our tiny domain language, which contains two concepts: entity and character.
Inductive entity : CN :=
| gun.

Inductive character : CN :=
| barbie.
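You can confirm with a quick check that both concepts live in our universe of common nouns:

Check gun.     (* gun : entity *)
Check barbie.  (* barbie : character *)
Check entity.  (* entity : Set, i.e. our CN *)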
And we design an adjective category of vocabulary, which hosts the following terms: tall, little, and fake:
Parameter tall little: character->t.
Parameter fake : entity->t.
Please note that the definition of "fake" used here is a simplification and is not powerful enough to express the entailment "a fake gun is not a gun." For a precise treatment of "fake," refer to the book "Formal Semantics in Modern Type Theories," which contains the standard definition.
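For intuition only, here is a minimal sketch (not the book's definition) of one common MTT approach to privative adjectives: extend the domain with a disjoint union, so that real guns and fake guns are kept apart by construction. The names real_gun, fake_gun, and gun_wide are hypothetical:

(* Sketch: a wide class of guns split into disjoint real and fake parts. *)
Parameter real_gun fake_gun : CN.
Definition gun_wide : CN := sum real_gun fake_gun.
(* "A fake gun is not a real gun" then holds by construction,
   because the injections inl and inr of a sum type are disjoint. *)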
The next piece of configuration is noteworthy: we want "character" to be a subclass of "entity." In Modern Type Theories, we employ coercive subtyping rather than the more typical subsumptive subtyping. Among other benefits, this makes subtyping more efficient with regard to symbolic representation.
Axiom cho : character -> entity.
Coercion cho : character >-> entity.
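With this coercion in place, a character can appear wherever an entity is expected, so a phrase like "fake barbie" already type-checks:

Check (fake barbie).  (* : t — barbie is coerced to an entity by cho *)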
Although the following dialog may seem like a mere toy exercise, it demonstrates how Adjective + Noun domain patterns can be employed in real business rule definitions. We define a dialog that begins in an initial state and advances with any legitimate domain-language expression of the form "tall *". Note, however, that not every "tall *" follow-up is defined: as the rules specify, only "tall barbie" leads to the conclusion of the dialog.
Inductive dialog : state -> Prop :=
(* From the start state, the dialog may advance with any "tall *". *)
| d_1 (t': exists e, dialog (|tall e|)) : dialog start
(* Only "tall barbie" continues toward the end state. *)
| d_2 (_: dialog end') : dialog (|tall barbie|)
| d_3 : dialog end'.
Despite its simplicity and incompleteness, this dialog is enough to demonstrate the reasoning process that ChatGPT can leverage for precise identification of domain concepts in user prompts.
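Before starting the proof, a quick sketch confirms that the dialog states we will reason about are well-typed terms of the domain language:

Check (|tall barbie|).           (* : state *)
Check (dialog (|tall barbie|)).  (* : Prop  *)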
The reasoning process always starts with a goal. Our goal is defined as
Goal (dialog start).
We now start the reasoner. After stating the goal, the proof state appears as follows:
1 subgoal (ID 7)

  ============================
  dialog start
In the preceding proof state, the statement below the horizontal line denotes the "conclusion," while the statement above the line (currently none) constitutes the "premises." Adhering to standard reasoning conventions, users can comprehend the reasoning process by examining this proof state.
Now we issue a proof tactic beneath the goal statement, as below:
Goal (dialog start).
  constructor.
The proof state evolves into the following:
1 subgoal (ID 9)

  ============================
  exists e : character, dialog (| tall e |)
Thus far, readers can see the interactive nature of Coq: by issuing tactic commands, users instruct the reasoner to navigate through proof states until the conclusion can be justified by the premises. (We have not yet reached that point, as there are currently no premises above the line to justify the conclusion, and we do not plan to reach the end in this post.)
We have now made significant progress and can finally articulate why ChatGPT's translation capability is so beneficial for domain-language conversion (and verification), offering organizations an effective means to express trusted business domains in LLM-powered use cases.
The Key Proof Tactic that Confines ChatGPT to the Safe Zone
Continuing our process, we now introduce the crucial tactic "cut," which is indispensable for regulating ChatGPT within the small domain language defined earlier.
Suppose that ChatGPT generates the tactic "cut (dialog (|tall barbie|))" for some input prompt. We will explain later how this can be achieved; for now, assume we have done sufficient prompt engineering for ChatGPT to produce it.
The proof script now looks like this:
Goal (dialog start).
  constructor.
  cut (dialog (|tall barbie|)).
And, almost magically, the proof state becomes:
2 subgoals (ID 10)

  ============================
  dialog (| tall barbie |) -> exists e : character, dialog (| tall e |)

subgoal 2 (ID 11) is:
  dialog (| tall barbie |)
The "cut" tactic splits the proof into two parts: the first part aims to establish the entailment between "dialog (| tall barbie |)" and the existential quantified proposition, while the second part focuses on "dialog (| tall barbie |)" itself.
The first part essentially serves as a specification that governs the undercut expressions within its entailment domain, while the second part is the advancement of the dialog to the next state.
Once we establish the first subgoal (by the three tactics intro. exists (barbie). auto.), the specification is deemed to have been met, and we can move on to the next dialog state. This is how the specification helps regulate the ChatGPT dialog state.
Goal progress:
Goal (dialog start).
  constructor.
  cut (dialog (|tall barbie|)). intro.
  exists (barbie). auto.
Proof state (only one subgoal remaining):
1 subgoal (ID 11)

  ============================
  dialog (| tall barbie |)
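Although we do not continue the session in this post, for completeness here is a sketch of how the remaining subgoal could be discharged using the dialog rules d_2 and d_3:

  apply d_2.  (* reduces the goal to dialog end' *)
  apply d_3.  (* closes it *)
Qed.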
How ChatGPT Is Regulated by Type Checking
Nonetheless, the above process is not the main point of the story. The real highlight is that ChatGPT can be trained to predict the "cut" tactic, which then undergoes type checking to gather feedback.
Let's now switch to the GPT-3 playground and attempt the following task. (We use GPT-3 even though ChatGPT is also an option; the heavy moderation messages in ChatGPT can be disruptive.)
We will use few-shot learning to train GPT to convert a user's message into a "cut" tactic and then use the type-checker to identify any errors. The accepted patterns must comply with the domain language constraints that we have established.
Just put the text below into the GPT-3 playground and hit submit:
Given "Hi, i wnat a tall barkie girl. do you have that toy? ", Predict: cut (dialog (| tall barbie |)). Given "Hi, I want a fake gun. Do you have that thing? ", Predict:
Readers may notice that "cut (dialog (|fake gun|))" is a new pattern that GPT-3 learned to generate. The original input is the user's natural-language message "Hi, I want a fake gun. Do you have that thing?", and this capability is achieved with a single example: Given "Hi, i wnat a tall barkie girl. do you have that toy?", Predict: cut (dialog (| tall barbie |)). Note the intentionally degraded text quality in the one-shot example.
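The generated tactic can be replayed directly in the Coq session. A quick sketch shows that the new pattern type-checks, even though no dialog rule advances with it:

Goal (dialog start).
  constructor.
  cut (dialog (|fake gun|)).  (* accepted: "fake gun" is well-typed *)
Abort.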
At this step, GPT-3 (or ChatGPT) is regulated as well, because the domain language is constrained by typing, and many potentially invalid patterns are ruled out by it. Some valid and invalid examples are shown below (verified in Coq after the list):
tall gun -> Invalid
fake barbie -> Valid
little barbie -> Valid
gun -> Invalid
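Each of these can be checked mechanically; the Fail command succeeds exactly when the pattern is ill-typed:

Fail Check (tall gun : t).  (* invalid: gun is not a character *)
Check (fake barbie : t).    (* valid via the coercion cho *)
Check (little barbie : t).  (* valid *)
Fail Check (|gun|).         (* invalid: a bare entity is not a sentence *)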
And the typing is entirely user-controlled. For instance, if the coercive subtyping between character and entity is removed, "fake barbie" is no longer a valid pattern. As a result, each domain can establish its own subtyping relationships, which impose distinct constraints on the acceptable patterns.
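Since a declared coercion cannot easily be retracted, here is a hedged sketch of the same point using fresh, hypothetical copies of the two concepts inside a section where no coercion is declared:

Section NoCoercion.
  Variable character' entity' : CN.  (* hypothetical fresh copies *)
  Variable barbie' : character'.
  Variable fake' : entity' -> t.
  Fail Check (fake' barbie').  (* rejected: no coercion character' >-> entity' *)
End NoCoercion.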
For more information on how we employ the rich typing of Modern Type Theories to regulate domain semantic constraints, please check out our Medium post on the Language Engineering Toolbox.
Discussions
To summarize, we have prepared a simplified high-level chart, shown below:
[Chart: How Organizations Establish ChatGPT Safe Zone?]
Our demonstration indicated that if we define fundamental linguistic constructs, such as nouns and adjectives, so that they reflect their semantic relationships, then we can use the Coq Proof Assistant to help steer ChatGPT along predefined conversational paths.
I found it noteworthy that the domain-specific language does not have to share words with the natural-language input. Instead, ChatGPT (or a similar few-shot learning system) can consistently produce a pattern that reflects the relevant language constructs, including the "cut" proof-tactic command. This approach can be applied to various tasks, including multilingual applications, to ensure that raw input signals fit within the predefined domain and convey valid semantic meanings.
It is also important to mention that the "Adjective + Noun" pattern studied here is the most basic one in the framework of Modern Type Theories. The framework provides a fuller set of basic patterns beyond it, among others (as I was personally instructed by the author, Prof. Zhaohui Luo at University of London, who founded Formal Semantics in MTT).
Collectively, these patterns can encompass a significant portion of the English language fragment used in formal semantics. As semantics represent deep structures at the meaning level, this approach is generalizable to multilingual settings.
To conclude, we offer some practical advice.
If your organization needs to whitelist domain concepts, you can use the toy-store example above as a guide. By modeling your nouns and adjectives on the existing examples, and by adopting the coercive-subtyping approach, you can construct a semantic hierarchy among different simple nouns (a sketch follows below). For the more nuanced relationships between adjectives and nouns, you will probably need to apply the four categories of adjectives discussed in our previous Medium post.
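Here is a minimal sketch of such a hierarchy; the categories toy, doll, and vehicle and the adjective colorful are hypothetical placeholders for your own catalog:

Parameter toy doll vehicle : CN.
Parameter d2t : doll -> toy.     Coercion d2t : doll >-> toy.
Parameter v2t : vehicle -> toy.  Coercion v2t : vehicle >-> toy.
(* An adjective defined on the supertype applies to all subtypes,
   e.g. for d : doll, (colorful d) type-checks via d2t. *)
Parameter colorful : toy -> t.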
To simplify the repetitive few-shot learning process and to manage templates, you may want prompt-engineering tools such as LangChain.
To incorporate Coq into your development or production environment, you can use the Coq Docker container available at https://hub.docker.com/r/coqorg/coq/ for prototyping. However, a scalable solution is still under study, since Coq relies on long-lived interactive sessions rather than a RESTful approach, which significantly limits the options for concurrent serving.
Future Work
Moving forward, I plan to continue developing the open-source project to create a complete end-to-end system, and I am also interested in collaborating with organizations or consulting services to explore the safe integration of ChatGPT using Semantic Engineering technologies.
One notable finding is that using an LLM significantly decreases the cost of traditional formalization. Previously, formalizing natural language was a challenging task, but with the help of ChatGPT (and GPT-3), I intend to create a repository of few-shot learning instances. This will allow a substantial fragment of English to be converted into MTT-style formalized representations, benefiting the community by providing a safe-zone specification for ChatGPT.
Founder & CEO, Group 8 Security Solutions Inc. DBA Machine Learning Intelligence
8 个月I appreciate your post!
Data Engineer Leader @ Caltrans | Data Engineering / AI
1 年