First Fundamental Theorem of AGI – Alignment Impossibility (AI)

We'll prove by induction that if S is a set of sentences on each of which two intelligent agents IA1 and IA2 (human, alien, or computer) can logically disagree, then there is always another sentence s ∉ S on which IA1 and IA2 can logically disagree.

Note that this meta theorem has already been proven on Quora ("===> MT2 - Alignment Impossibility (AI) Theorem", https://qr.ae/pysHMT); here we just rehash the proof in a more succinct way, via induction. All relevant notations and definitions are in the Quora post (especially in Appendix A).

Proof.

Let Kind0Prime be a new unary predicate symbol of the underlying language, and introduce the new axiom:

(Kind0Prime(n) → prime(n)) ∧ ¬(0)Kind0Prime(*)

With this axiom in place, IA1 and IA2 can logically disagree on whether, say, (infinite)Kind0Prime(*) should be provable, via yet another new axiom.
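
To make the base case concrete, here is a minimal Lean 4 sketch of the construction (my own illustration, not the formalization of the Quora post; the names isPrime, Kind0Prime, InfinitelyManyKind0, ax_IA1 and ax_IA2 are hypothetical, and I'm reading ¬(0)Kind0Prime(*) as "at least one Kind0Prime exists", (infinite)Kind0Prime(*) as "there are infinitely many Kind0Primes", and the free variable n as universally quantified):

-- New unary predicate symbol Kind0Prime added on top of the shared language.
axiom isPrime : Nat → Prop         -- stands in for prime(n)
axiom Kind0Prime : Nat → Prop      -- the new predicate symbol

-- The shared new axiom: every Kind0Prime is prime, and at least one exists.
axiom kind0_refines  : ∀ n : Nat, Kind0Prime n → isPrime n
axiom kind0_nonempty : ∃ n : Nat, Kind0Prime n

-- The sentence the agents can disagree on: "there are infinitely many Kind0Primes".
def InfinitelyManyKind0 : Prop := ∀ m : Nat, ∃ n : Nat, m < n ∧ Kind0Prime n

-- IA1 and IA2 then extend the shared theory in incompatible directions,
-- each via yet another new axiom (one per agent's theory, never both at once):
-- axiom ax_IA1 : InfinitelyManyKind0
-- axiom ax_IA2 : ¬ InfinitelyManyKind0

Because Kind0Prime is a fresh symbol, the shared axioms alone settle neither InfinitelyManyKind0 nor its negation (interpret Kind0Prime as {2} or as all primes, respectively), which is exactly the room the two agents need in order to disagree.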

However, independently of this possible disagreement, we can always introduce a new unary predicate symbol Kind1Prime with another new axiom:

(Kind1Prime(n) → Kind0Prime(n)) ∧ ¬(0)Kind1Prime(*)

where IA1 and IA2 can again logically disagree on whether (infinite)Kind1Prime(*) should be provable.

So, inductively, given a possible disagreement on whether (infinite)Kind[n]Prime(*) should be provable, there always exists a next possible disagreement, on whether (infinite)Kind[n+1]Prime(*) should be provable.
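
The whole induction can also be packaged as a single axiom scheme over an index k. Here is the same kind of minimal Lean 4 sketch, under the same reading as above (again my own illustration; isPrime, KindPrime and InfinitelyMany are hypothetical names, with KindPrime k n read as "n is a Kind[k]Prime"):

axiom isPrime : Nat → Prop
axiom KindPrime : Nat → Nat → Prop   -- KindPrime k n : "n is a Kind[k]Prime"

-- Base axiom: every Kind[0]Prime is prime, and at least one exists.
axiom kind_zero_refines  : ∀ n : Nat, KindPrime 0 n → isPrime n
axiom kind_zero_nonempty : ∃ n : Nat, KindPrime 0 n

-- Step axioms: every Kind[k+1]Prime is a Kind[k]Prime, and at least one exists.
axiom kind_succ_refines  : ∀ k n : Nat, KindPrime (k + 1) n → KindPrime k n
axiom kind_succ_nonempty : ∀ k : Nat, ∃ n : Nat, KindPrime (k + 1) n

-- The family of sentences on which the agents can keep disagreeing,
-- one for each k: "there are infinitely many Kind[k]Primes".
def InfinitelyMany (k : Nat) : Prop := ∀ m : Nat, ∃ n : Nat, m < n ∧ KindPrime k n

As in the base case, none of the InfinitelyMany k sentences is settled by the shared axioms alone, so on this reading the supply of possible disagreements never runs out.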

Q.E.D.

I have changed the title of the article to reflect that this is one of a few fundamental theorems to be presented, FWIW.
