Incremental Formalization Strategy for ChatGPT Governance
"A power shovel and wood" by sonica@2006 is licensed under CC BY 2.0.

Incremental Formalization Strategy for ChatGPT Governance

Abstract

This post introduces a method for gradually integrating new rules into an established rule system that governs ChatGPT, using a proof assistant. The approach leverages Modern Type Theories (MTT), whose Formal Semantics can represent intricate semantic nuances and support practical reasoning. The practical implication is that the knowledge base used to govern ChatGPT can be expanded and maintained incrementally.

Background

ChatGPT may make assertions without domain knowledge, but it is possible to establish a safe zone to confine it. For example, a toy retailer could whitelist certain terms related to their products to ensure ChatGPT stays on-topic. However, identifying and delegating control over sensitive concepts in real time can be challenging due to the vast scope of potential inquiries. Traditional few-shot learning or fine-tuning is not effective when the number of classes or entities is unbounded. It can be difficult to determine which classes to invest in preparing examples for, especially when new classes may arise naturally.

A more detailed discussion can be found in our previous post.

In our prior article, we presented a strategy to confine ChatGPT without explicitly declaring topics. This involves ChatGPT converting conversations into domain language and validating topics using domain rules. This creates a new brokerage tier where ChatGPT and rule-based systems collaborate to restore our administrative control. With this tier, we can manage ChatGPT effectively and delegate tasks to downstream services while controlling what topics ChatGPT can safely operate through rule-based systems.

Current Issue

In practice, the question is how to create rule-based systems with sufficient expressive power to represent diverse inputs. To accomplish this, we need a practical method for converting a corpus of relevant domain information into rules.

Converting a corpus of a domain area into rules with sufficient expressive power to represent diverse inputs was a challenging task in the pre-LLM era. However, ChatGPT makes this task straightforward. By using a specific prompt, ChatGPT can effectively retrieve the main topic of any sentence. It is worth noting that the examples used can be easily replaced with different texts, and the results are still promising.

Prompt:

Summarize the following sentence into one phrase, containing only two words, one is an adjective and the other a noun ["Selected Medium Posts about ChatGPT"]

Answer:

Curated Article

The bottleneck lies on the rule-based system's side: ChatGPT can only prepare the data up to the point specified by the prompt. The rule-based system must be gradually enabled to internalize this input and construct rules in a progressive and cohesive manner.

As an example, let's say we start with the previous example and formalize the words "curated" and "article" into a dictionary. Then a new example may come in as "published article," which partially overlaps with our existing dictionary entry "article." The rule-based system must be able to differentiate between what is already in the system and what is new. Additionally, it must ensure that any new entries do not cause inconsistencies with existing entries in the dictionary.
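To make the scenario concrete, here is a minimal sketch of our own (not the post's actual dictionary; the names are illustrative only) of what the overlap looks like once formalized: the noun "article" already has an entry, so "published article" requires only one genuinely new entry, and that entry must coexist with the existing "curated":

Parameter article   : Set.              (* existing noun entry *)
Parameter curated   : article -> Prop.  (* existing modifier entry *)
Parameter published : article -> Prop.  (* candidate new entry, reusing "article" *)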

Achieving this requires the rule-based system to possess significant reasoning capabilities, as distinguishing between what is already present and what is new involves a process of generating hypotheses and searching for evidence.

Our Solution

Our formalization framework, Modern Type Theories (MTT), utilizes Formal Semantics to achieve powerful expressive capabilities that can represent complex semantic nuances and facilitate practical reasoning. MTT's ability to express different types of adjectives and their constraints, such as intersective, subsective, and privative adjectives, makes it a superior choice compared to other systems, such as dependency syntax, which can only capture structural characteristics. For instance, while a simple "Adjective + Noun" pattern might not convey much information in other systems, MTT Semantics can capture the deep semantic nuances of such constructions.
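As a minimal sketch of our own (the names Animal, Dog, black, and skillful are hypothetical and are not part of the rule sets used later; we write Set directly, matching the post's definition CN := Set), the distinction can be encoded in Coq roughly as follows. An intersective adjective is a plain predicate on the broader type and transfers to subtypes through a coercion; a subsective adjective is relativized to the noun it modifies and cannot be detached from it; privative adjectives such as "fake" require a further, more elaborate treatment that we do not show here.

Parameter Animal : Set.
Parameter Dog : Set.
Axiom dog_animal : Dog -> Animal.
Coercion dog_animal : Dog >-> Animal.

(* Intersective: "black dog" entails both "black" and "dog". *)
Parameter black : Animal -> Prop.

(* Subsective: "skillful A" only makes sense relative to A;
   "skillful surgeon" does not entail being skillful in general. *)
Parameter skillful : forall A : Set, A -> Prop.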

To achieve incremental integration of inputs, we leverage the Coq Proof Assistant, on which MTT is built. This enables us to demonstrate scalable formalizations by incrementally incorporating knowledge captured from raw inputs.

Now we gather the code used in our previous post here. You can go to the Coq Online IDE to verify the code below. The preceding post illustrated how the Proof Assistant can use input checking to accept or reject concepts such as "tall barbie," "fake gun," and "tall gun," based on the pre-defined MTT domain logic about domain concepts. We will now demonstrate how to incorporate new vocabulary into the MTT domain rule sets.

Require Import String.
Definition CN:=Set.
Definition t:= Prop.

Inductive state :=
| status_state(_:status) | string_state(_:string) | sentence_state(_: t)
with status :=
  start | end'.

Definition state1(e: status): state. constructor 1. auto. Defined. Coercion state1: status>->state.
Definition state3(e: string): state. constructor 2. auto. Defined. Coercion state3: string>->state.
Notation "| x |" := (sentence_state x) (at level 50, left associativity).

Open Scope string.
Require Import Specif.

Inductive entity: CN:=
| gun
.

Inductive character: CN:=
| barbie
.

Parameter tall: character->t.
Parameter fake : entity->t.
Parameter little: entity->t.

Axiom cho: character->entity. Coercion cho: character>->entity.

Inductive dialog: state->Prop:=
| d_1(t': exists e, dialog (|tall e|)): dialog start
| d_2(_: dialog end'): dialog (|tall barbie|)
| d_3: dialog end'
.        

At a high level, our solution works as follows:

  • ChatGPT will extract the main topic from the raw text in the "Adjective+Noun" format.
  • Using the proof assistant, we will conduct an interactive proof search to check if the existing rule sets can represent that phrase.
  • If it fails, we will investigate the reason and determine whether the new phrase should be accommodated or not.
  • If it is deemed acceptable, we will guide the user in creating entries and uncovering underlying semantic relationships using interactive proving techniques.
  • Once the new rules are added to the system, it will be able to accept the new phrase.
  • This process can be repeated on a corpus to gather a significant amount of domain-specific language rules.

Here is a diagram of the process (from a slightly different angle):

[Diagram: the high-level workflow described above]

Demonstration through Example

First, select a sentence from the target corpus and use the following prompt template to have ChatGPT generate a modified noun as a potential formalization candidate.

Prompt:

Summarize the following sentence into one phrase, containing only two words, one is an adjective and the other a noun "[their effort on creating the project is so little]". Using the word little

Answer:

Little effort.

To ensure reproducibility of the demo, the word "little" was explicitly specified; in practice the output may vary. Prompt engineering to constrain the variability of the output is out of scope for this post.

Our candidate formalization is therefore "little effort." Recall that in our previous post, checking a formalized language segment is done with the proof tactic called "cut." In the current example, we will try that first:

Goal True.
   cut (little effort).        

We get the following error message from the proof console:

Prover said:

Error: The reference effort was not found in the current environment.

Next, we inquire about the parent concept of "effort." We can obtain the answer from ChatGPT or consult the hierarchy of the current system; here we pick "activity." Let's introduce "cut (activity)" in front of "cut (little effort)":

Goal True.
   cut (activity). (** Stuck here *)
   cut (little effort).        

The prover then gets stuck on "cut (activity)," indicating that "activity" is not defined in the system.

Continuing in the same way we handled "effort," we can consult ChatGPT or the existing system hierarchy about the parent concept of "activity." We want to emphasize that the concept hierarchy is domain-specific, and users can determine the ontology relationships that fit their needs. Here we will assume that "entity" is the pre-defined parent concept of "activity." With that set forth, we insert "cut (entity)" as shown below:

Goal True.
   cut (entity). (** Cursor stops here *)
   cut (activity).
   cut (little effort).

At the cursor location, the proof context looks as follows:

2 subgoals (ID 8)

  ============================
  entity -> True

subgoal 2 (ID 9) is:
 entity

The current proof state shows that "cut (entity)" was successful, implying the existence of pre-defined "entity" structures in the rule system.

Using the "auto" tactic, we can eliminate the first goal and focus on the second goal, resulting in the updated proof progress and state shown below:

Goal True.
   cut (entity). auto.
   cut (activity). (** Stuck here *)
   cut (little effort).
        

Proof state:


1 subgoal (ID 9)

  ============================
  entity

Upon closer examination of the proof state, we observe that the proof process is currently halted at the "cut (activity)" stage, while the intended conclusion is an "entity". To advance our reasoning, it becomes necessary to establish a connection between these two concepts. Specifically, we require the introduction of a subtyping relationship between the "activity" and "entity" categories, allowing us to bridge the gap and continue the proof process.

To avoid becoming overly detailed and straying from the main idea, we will present the sequence of resolution tactics directly without elaborating on how and why they are chosen at each step. To make the whole process transparent, we have an animation that step-by-step demonstrates how we establish a connection between "activity" and "entity" to bring us one step closer to explaining the target phrase "(little effort)".

In order to introduce the concept of "effort", we first had to create the Common Noun (CN) category of "activity" as it was previously unknown to the system. Once the parent concept was established, "effort" could be categorized as belonging to "activity" using the judgement statement "effort: activity". An essential step was establishing the relationship between "activity" and "entity" to create an entailment, enabling the system to understand that adjectives related to entities could be applied to "activity".

[Animation: step-by-step proof connecting "activity" and "entity"]

A snapshot of the final step of the above animation is shown below:

Goal True.
  cut (entity).
  cut CN. intro activity.
  cut (activity->entity). intro subtyping_activity_entity.
  cut (activity). intro effort.

  (** Stuck here *)
  cut (little effort).
        

And the corresponding proof state is:

5 subgoals (ID 24)

  activity : CN
  subtyping_activity_entity : activity -> entity
  effort : activity
  ============================
  entity -> True

subgoal 2 (ID 23) is:
 activity
subgoal 3 (ID 20) is:
 activity -> entity
subgoal 4 (ID 17) is:
 CN
subgoal 5 (ID 15) is:
 entity

When examining the first subgoal, we observe three judgement statements on the premises side, located above the reasoning line. Given that all the necessary semantic premises have been defined, are we now in a position to parse the phrase "little effort"? No: attempting "cut (little effort)" still raises errors in the Coq proof assistant, because "little" expects an "entity" while "effort" is an "activity," and the subtyping hypothesis in the local context is not registered as a coercion. It is not yet possible to express "little effort" at all.

Instead, we must express "little effort" in a propositional form. The phrase "little effort" is not inherently valid and is only conditionally valid based on specific premises. It may not hold true in different circumstances, which makes it a proposition rather than a statement of systemic truth.

We now introduce a new type that allows us to explain the underlying reasoning chain for a given language phrase. This is a necessary evil: we bring in some deep technical structure in Coq to facilitate the functionality. Fair warning: the following code is technically heavy and light on detailed explanation.

(* CC a b: evidence that adjective a and noun b can be validly combined.
   The witness f maps the noun's type C into the adjective's domain A, and the
   trivial equation forces the application "a (f b)" to typecheck. *)
Inductive CC (A:Type)(B:Type)(C:Type) (a:A->B)(b:C):=
| why_a_b_is_valid_(_: { f: (C->A) & (a (f b) = a (f b)) })
.

Arguments CC {_} {_} {_} _ _.

It's important to highlight the motivation behind engaging in these technical processes. Our goal is to represent "little effort" as "CC little effort" and then allow CC to determine how the two parameters, "little" and "effort", can potentially be combined as a valid pattern, given certain premises.
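As a small sanity check of our own (assuming the earlier definitions of "tall," "barbie," and "CC" are already loaded), a pair that is directly well-typed can be explained by CC immediately, with the identity function serving as the bridging evidence:

Goal CC tall barbie.
  constructor.
  exists (fun c : character => c).  (* the bridging function f *)
  reflexivity.
Qed.

For "little effort," by contrast, the bridging function has to go through the activity-to-entity subtyping, which is exactly what the proof below constructs.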

Upon adding the CC definition before beginning the proving process and trying again, we observe progress in the proof. For a more user-friendly navigation experience, you may choose to copy the entire code to https://coq.vercel.app/scratchpad.html and step through it on your own.

[Animation: stepping through the proof of "CC little effort"]

Again, we provide a snapshot of the final step:

Inductive CC (A:Type)(B:Type)(C:Type) (a:A->B)(b:C):=
| why_a_b_is_valid_(_: { f: (C->A) & (a (f b) = a (f b)) }).
Arguments CC {_} {_} {_} _ _.

Goal True.
  cut (entity).
  cut CN. intro activity.
  cut (activity->entity). intro subtyping_activity_entity.
  cut (activity). intro effort.

  cut (CC little effort). Focus 2.
  constructor. econstructor 1. pose (little (subtyping_activity_entity effort)).
  pose (t0=t0). reflexivity. intros. apply I. (** Pause here *)

And the corresponding proof state:

4 focused subgoals
(shelved: 1) (ID 23)

  activity : CN
  subtyping_activity_entity : activity -> entity
  ============================
  activity

subgoal 2 (ID 20) is:
 activity -> entity
subgoal 3 (ID 17) is:
 CN
subgoal 4 (ID 15) is:
 entity

Having successfully resolved our goal of "CC little effort" (which has subsequently been cleared from the proof state), we now know that our proof system may potentially accept "little effort" as valid language. However, we cannot do so without fulfilling all of our remaining obligations.

Move the cursor to just before the Goal statement. Now try:

Check little effort.

As expected, the statement above is not allowed to succeed. We need to satisfy all of our obligations (elaborated below) for the statement to be accepted by the proof system as valid language. Our four remaining obligations are hinted at by the four subgoals, and we will follow these hints to add them to the system.

Let's start with subgoal 4: we are asked to provide an example of an "entity" that can be found within the system. Fortunately, we have one available ("gun"), so we can check this requirement off our list.

"Subgoal 3" is requesting a new Common Noun (CN) type of concept that we had previously introduced in order to explain "CC little effort". We remember that we had added "activity" as a premise, so now we need to formally include it as a system constant, as shown below:

Parameter activity : CN.        

"Subgoal 2" is reminding us we need to implement the subtype relationship "activity -> entity" as a system constant. We are doing it now.

Axiom activity_entity: activity -> entity. 
Coercion activity_entity: activity >-> entity.        

"Subgoal 1" is reminding us that we need to add the judgement of something being an "activity" into the system constant. We had previously introduced "effort" in the premises of solving "CC little effort", and now we need to incorporate it into the system constant as well.

Parameter effort: activity.

After doing all of the above, try the statement below again:

Check little effort.

Now that we have satisfied all of our obligations, the system successfully parses "little effort" as valid language. We can then use it to define dialogs or to check the semantic topics of ChatGPT prompts, using the techniques exemplified in our proof process, as sketched below.
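For instance, here is a minimal sketch of our own (the name dialog2 is hypothetical; it assumes all of the code above, including the new constants for "activity" and "effort," is loaded) that mirrors the earlier "tall barbie" dialog using the newly accepted phrase:

Inductive dialog2: state->Prop:=
| d2_1(_: dialog2 (|little effort|)): dialog2 start
| d2_2(_: dialog2 end'): dialog2 (|little effort|)
| d2_3: dialog2 end'.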

Summary

The main goal of this post is to present a demonstration of how to formalize the semantic topic of an English sentence that occurs naturally. The focus is specifically on the gradual and systematic integration of new language structures into the already existing formalism structures.

The comprehensive end-to-end demonstration serves as compelling evidence that the integration process can take place in a way that is subject to human auditing. Moreover, the process relies heavily on the Proof Assistant's tactic system, which holds significant potential for automation. Tactic automation is not a particularly challenging task and is widely practiced in the formal proof community, but there is still room to expand and refine it in future work; a small illustrative sketch follows.
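As a trivial illustration of our own of what such automation could look like (the name is hypothetical, and this is far from a complete solution), the simple closing steps used throughout this post can already be bundled into one reusable tactic:

(* Succeeds only if one of the basic closing steps solves the current goal. *)
Ltac close_simple_obligation :=
  solve [ assumption | reflexivity | exact I | auto ].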
