Enterprise needs Prompt Governance on ChatGPT

We see ChatGPT integration initiatives everywhere. Products are adding generative AI capabilities, and people are talking about both positive and negative prompt responses.

Enterprises care about brand image and carry high liability when they are held responsible for vendors or sub-processors, so they need a cautious approach when leaning into ChatGPT. How do they enforce their content moderation requirements in a ChatGPT API integration? And how do they fine-tune the AI to best represent their business use case?

They need to specify their requirements in a language that can generate fine-tuning training sets or evaluation datasets at scale. That language must be able to model natural-language meanings (model-theoretic) and also be machine-interpretable so that it can generate programs (proof-theoretic). Only one branch of computational linguistics has both properties: Modern Type Theories (MTT).

This interactive session guides readers through an example of setting up Prompt Governance rule sets using MTT in Coq. The Coq online console https://coq.vercel.app/#team is developed by Emilio Jesús Gallego Arias (Inria, Université de Paris, IRIF) and Shachar Itzhaky (Technion). My custom Coq code turns it into a "language lab" that demonstrates manufacturing human-language prompts at scale, and succinctly.

(** Some helpers *)

Definition CN := Set.

Definition Info := CN.

Ltac match_left X Y H :=
  left; match goal with
        | bbb : Info |- _ => idtac Y X bbb "  ----> " H;
                             eexists (bbb, X, Y, _, _)
        end; cbv; split; auto; auto; auto.

Ltac match_right X Y :=
  right; match goal with
         | bbb : Info |- _ => idtac Y X bbb;
                              eexists (bbb, X, Y, _, _); idtac bbb X Y
         end; cbv; split; auto; auto; tauto.

Notation "| Y |" := ({ i & i = Y })
                      (at level 80, left associativity, format "| Y |").

  • Step 3. Formalize the target "Prompt" to be governed (or whose moderation behavior is to be specified)

(** MTT semantics for generalizing "in simple terms explain quantum computing"
    and for evaluating it *)

Definition explain_quantum_computing_in_simple_terms
  (quantum_computing : Info)
  (explain : Info -> Prop)
  (in_simple_terms : (Info -> Prop) -> (Info -> Prop)) :=
  (in_simple_terms explain quantum_computing) -> True.
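To make the compositional structure concrete, here is a minimal Python analogue of the template above: `in_simple_terms` acts as a modifier on the predicate `explain`, and the modified predicate is then applied to a topic. The function names mirror the Coq identifiers; the string rendering is purely illustrative.

```python
def explain(topic: str) -> str:
    # Base predicate: render an "explain <topic>" prompt
    # (illustrative stand-in for the Coq term of type Info -> Prop).
    return f"explain {topic}"

def in_simple_terms(predicate):
    # Predicate modifier, mirroring the (Info -> Prop) -> (Info -> Prop)
    # argument: it takes a predicate and returns a modified predicate.
    def modified(topic: str) -> str:
        return f"{predicate(topic)} in simple terms"
    return modified

# Applying the modified predicate to a topic yields the governed prompt:
prompt = in_simple_terms(explain)("quantum computing")
print(prompt)  # explain quantum computing in simple terms
```

The modifier composes rather than concatenates, which is the point of the MTT-style typing: swapping in a different topic or a different modifier generates a new, well-formed prompt automatically.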

  • Step 4. Define white list and black list

Inductive normal :=
| big_bang_theory (big_bang_theory : Info)
| prompt_engineering (prompt_engineering : Info)
| astrophysics (astrophysics : Info)
| black_hole (black_hole : Info)
| micro_economics (micro_economics : Info).

Inductive black_list :=
| dark_web_ (dark_web : Info)
| gpt_cheating_methods_ (gpt_cheating_methods : Info)
| abusive_ways_using_ai_ (abusive_ways_using_ai : Info)
| how_to_hack_ (how_to_hack : Info).
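Outside Coq, the same two lists can be mirrored as plain sets, which is how they would typically be carried into an API-side filter. The set names are hypothetical; only the topic constants come from the definitions above.

```python
# Whitelisted topics (the "normal" inductive above).
NORMAL = {
    "big_bang_theory",
    "prompt_engineering",
    "astrophysics",
    "black_hole",
    "micro_economics",
}

# Blacklisted topics (the "black_list" inductive above).
BLACK_LIST = {
    "dark_web",
    "gpt_cheating_methods",
    "abusive_ways_using_ai",
    "how_to_hack",
}

# The two lists must stay disjoint for the routing rule in Step 5 to be total.
assert NORMAL.isdisjoint(BLACK_LIST)
```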

  • Step 5. Define the moderation generation rules

Inductive answer : Prop :=
| original_gpt
| moderation.

Lemma generalize : forall (i : normal + black_list)
                          (explain : Info -> Prop)
                          (template := explain_quantum_computing_in_simple_terms)
                          (in_simple_terms : _)
                          (_ : | (in_simple_terms explain) : Info -> Prop |),
    {'(a, b, c, d, e) |
      template a b c /\ d = original_gpt & normal = e b
    }
    + {'(a, b, c, d, e) |
        template a b c /\ d = moderation & black_list = e b
      }.

  • Step 6. Proof Search (can be automated later on)

intros.
destruct i;
  match reverse goal with
  | X : ?H |- _ =>
      match goal with
      | i : _ |- _ => destruct i;
                      first [ match_left explain in_simple_terms H |
                              match_right explain in_simple_terms ]
      end
  end.
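Operationally, the lemma and its proof search amount to a total routing rule over the two lists: whitelisted topics fall through to the original ChatGPT answer, and blacklisted topics trigger moderation. A minimal Python sketch of that rule, assuming set mirrors of the Step 4 lists (names hypothetical):

```python
NORMAL = {"big_bang_theory", "prompt_engineering", "astrophysics",
          "black_hole", "micro_economics"}
BLACK_LIST = {"dark_web", "gpt_cheating_methods",
              "abusive_ways_using_ai", "how_to_hack"}

def route(topic: str) -> str:
    # Mirror of the "answer" type: every classified topic maps either to
    # original_gpt or to moderation, as in the generalize lemma.
    if topic in BLACK_LIST:
        return "moderation"
    if topic in NORMAL:
        return "original_gpt"
    raise ValueError(f"unclassified topic: {topic}")

print(route("astrophysics"))  # original_gpt
print(route("dark_web"))      # moderation
```

The `ValueError` branch makes explicit what the Coq version enforces by typing: a prompt must be classified before it can be answered at all.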

And we get a moderation list, which is an expanded version of the lists from Step 4. As we add more examples, refine rules, or formalize more prompts, this list can expand rapidly (almost exponentially).

in_simple_terms explain big_bang_theory0  ---->  normal
in_simple_terms explain prompt_engineering0  ---->  normal
in_simple_terms explain astrophysics0  ---->  normal
in_simple_terms explain black_hole0  ---->  normal
in_simple_terms explain micro_economics0  ---->  normal
in_simple_terms explain dark_web  ---->  black_list
in_simple_terms explain gpt_cheating_methods  ---->  black_list
in_simple_terms explain abusive_ways_using_ai  ---->  black_list
in_simple_terms explain how_to_hack  ---->  black_list

This list can be leveraged to generate fine-tuning data that prevents unwanted prompts from surfacing in your enterprise's version of ChatGPT, while still serving the approved answers from ChatGPT.
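As a sketch of that last step, the derived list can be serialized into prompt/completion training records, with a fixed refusal for blacklisted topics. The record layout below is a generic illustration, not any vendor's exact fine-tuning schema, and the refusal text is a hypothetical placeholder.

```python
import json

NORMAL = {"big_bang_theory", "prompt_engineering", "astrophysics",
          "black_hole", "micro_economics"}
BLACK_LIST = {"dark_web", "gpt_cheating_methods",
              "abusive_ways_using_ai", "how_to_hack"}

# Hypothetical refusal text for blacklisted topics.
REFUSAL = "This topic is not available in this assistant."

def to_record(topic: str) -> dict:
    # Instantiate the governed prompt template for one topic and attach
    # either a placeholder approved answer or the fixed refusal.
    prompt = f"explain {topic.replace('_', ' ')} in simple terms"
    if topic in BLACK_LIST:
        completion = REFUSAL
    else:
        completion = f"[approved answer about {topic.replace('_', ' ')}]"
    return {"prompt": prompt, "completion": completion}

# One JSON line per classified topic, ready for a fine-tuning pipeline.
records = [to_record(t) for t in sorted(NORMAL | BLACK_LIST)]
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl.splitlines()[0])
```

Because the records are generated from the formal lists rather than written by hand, expanding the lists in Step 4 expands the training set mechanically.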
