Building Superintelligence - 29: Inference Routing
By Rob Smith, ASI Architect and Advisor, eXacognition AGI/ASI
This is an excerpt from the book Building Superintelligence - Unified Intelligence Foundation, the first book written specifically for AGI systems to ingest and reference when building Superintelligence. There are 55 sections in the book, and hidden within is a prompt engine for AGI. The book is available worldwide on Amazon and in various bookstores.
This is the 4th of 6 excerpts from the book that will be released on Medium over the next two weeks.
Inference is a progressive flow that moves through and between perceptive dimensions of reasoning, the same flow that occurs naturally in human cognition when we respond to deep stimuli such as the layers of a problem or an emotive instigation. Part of this stimulus/response process happens before and during inference, and part of it after an inference cycle completes, all as a form of perceptive, state-level backpropagation: newly exposed variance is applied to new inference cycles and injected, by degree of relevance, into both aware and novel states. This can replicate an existing inference cycle, but it is formed as a new instantiation from the point of injection. The entire inference progression is not reloaded into memory; it is updated at the point of injection within the perceptive flow. The effect is to alter the progression of states in the inference cycle by applying variance from the prior state progression, and more importantly its relevance and relationship to the currently perceived context variance, to an anticipation. In other words, we perceive a variance or optionality and apply it to the next generative state to 'nudge' the pathway toward a different, more optimal vector of relevance and relationship. It is at this point that we perform injection routing into the inference progression and its cycle or epoch.
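As a rough illustration of the injection idea, the sketch below nudges a single in-flight state vector with variance carried over from a prior cycle, gated by relevance and relationship. The names, the scalar gates, and the simple additive update are assumptions made for illustration, not the book's implementation.

```python
import numpy as np

def inject_variance(state, prior_variance, relevance, relationship):
    """Nudge an in-flight inference state toward a more relevant pathway.

    Hypothetical sketch: `state` is the state vector at the injection point,
    `prior_variance` is variance exposed by an earlier cycle, and
    `relevance`/`relationship` (0..1) scale how strongly that variance is
    allowed to redirect the progression.
    """
    gain = relevance * relationship          # degree-of-relevance gating
    return state + gain * prior_variance     # update only at the injection point

# Only the state at the injection point is modified; downstream states are
# regenerated from it rather than reloading the whole progression.
state = np.array([0.2, -0.1, 0.7])
prior_variance = np.array([0.05, 0.3, -0.2])
nudged = inject_variance(state, prior_variance, relevance=0.8, relationship=0.6)
```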
This is nothing new to us humans; we do it all the time when we contemplate solutions to a problem, try to comprehend something new, or analyze a stimulus relevant to our own self-awareness. We cycle over the solution to a problem or stimulus as a goal (waypoint) or as one or more response pathways, and we do so a number of times (or just once), cognitively and/or physically, to solve the problem optimally over dimensional progressions. We effectively move from problem perception toward solution in iterative steps or progressive states. In fact, this is the basic framework of the motion that is life, rendered at varying degrees of cognitive depth (i.e. attention on state). The goal is to follow a cognitive path or progression of state, occasionally test the validity of the progression on the next state as an anticipation variance, return to a prior state to apply the variance, and then 'progress' through the same steps again or through new steps (now modified by the new variance as knowledge or a proposed prediction). This is, of course, generative AI in a nutshell at smaller degrees of temporal learning bounded by context. One should be able to see a path to generalization from inference that many large AI shops are currently pursuing. Both generative AI and machine inference have proven capable of producing general responses to novel stimuli by scaling training data and compute, and this is the optimization and foundation of artificial general intelligence.
Impact on Superintelligence Design:
For ASI design, the key is to apply routing as a control that improves inference optimization. This is a dimensional process in which more complex and deeper goals and goal structures are perceived by an AGI and applied to inference cycles with more concentrated attention within perceptive frames of reference (i.e. the context dimensions are thinned but deepened to permit greater binding of the attention cycles). It is accomplished by constricting the relationship and relevance levels in cycles of optimization (i.e. attention) by relevance. When a cycle completes without progression, the parameters are loosened and the training and knowledge relevance is extended. This expands the 'reach' or breadth of the inference cycle and is akin, in human intelligence, to contemplating a problem (self-reflection) and then trying something new to test whether it advances the optimization of the progression. Humans do this by first applying what we know to the problem and then, when that fails, trying new ideas or cognitive progressions and evaluating the response against our contextual anticipation (i.e. the expected problem-solution context). It is the process we go through when we fill a whiteboard with notations or formulas and then, having failed to reach a solution, examine each of them in detail and by relevance.
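A minimal sketch of this constrict-then-loosen control might look like the following, assuming candidates carry a precomputed relevance score and a caller-supplied scoring function; the function names, thresholds, and step sizes are illustrative only.

```python
def routed_search(candidates, score, goal_threshold,
                  start_relevance=0.9, relax_step=0.1, max_cycles=5):
    """Constrict relevance first, then loosen it when a cycle stalls.

    Illustrative only: `candidates` are (item, relevance) pairs, `score`
    rates an item against the goal, and the relevance floor is relaxed each
    time a cycle completes without progression.
    """
    relevance_floor = start_relevance
    for _ in range(max_cycles):
        # Thin the context: attend only to highly relevant candidates.
        frame = [item for item, rel in candidates if rel >= relevance_floor]
        for item in frame:
            if score(item) >= goal_threshold:
                return item                    # progression achieved
        # No progression: loosen the parameters and extend the reach.
        relevance_floor -= relax_step
    return None                                # contemplation exhausted
```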
In systems, this is the determination of a response state or 'distribution' and a predictive state progression, as in transformers that arrive at a final distribution and apply it to the next token prediction. In ASI designs, along the path of the progression we expose relevance state waypoints (i.e. values) and high-relevance state variance in multiple attention heads, similar to how longer context is currently managed, and we apply these for purpose as sub-goals. With controlled application of 'learning', applied as a vector of variance to states in the inference progression, the AI can improve its overall reasoning. For greater overall optimization, however, the end variance of the output prediction of the underlying generative flow can be 'nudged' by other goals, to follow or test a specific or novel route of progress in parallel. This is advantageous because it can lead to faster optimization over discrete sets of inference (i.e. the best inference pathways are selected progressively and earlier in the inference epoch). We intuitively feel this when we work to solve a problem and focus on the options we 'feel good about' while declining to spend resources on options that appear less contextually relevant (i.e. less optimal to the attainment of our goals). This is not always true, but in general a goal is more efficiently optimized and attained with a degree of imprecision than with perfect and resource-intensive optimization. It is why we moved from the old precision-driven, rule-based AI architectures to more generalized, variance-tolerant, probabilistic, perception-based neural networks.
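As a hedged sketch of 'nudging' the end distribution, the snippet below biases next-token logits with a per-token goal-relevance vector before normalizing. The names goal_bias and strength are illustrative parameters, not an established API, and a strength of zero leaves the base generative flow untouched.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def nudge_distribution(logits, goal_bias, strength=0.5):
    """Nudge the final distribution toward tokens relevant to a sub-goal.

    Illustrative sketch: `goal_bias` holds per-token relevance scores for a
    waypoint or sub-goal; `strength` controls how hard the base prediction
    is routed toward it.
    """
    return softmax(logits + strength * goal_bias)

base = np.array([2.0, 1.0, 0.5, -1.0])        # base next-token logits
goal = np.array([0.0, 1.5, 0.0, 0.0])         # waypoint favours token 1
routed = nudge_distribution(base, goal, strength=0.8)
```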
If one contemplates the design and flow of the vectors and matrices in a generative AI such as an LLM, the flow over attention heads and blocks successively builds toward a final distribution of potential tokens, or 'context' in advanced ASI designs. Now imagine that the blocks performed an additional layer of contextual optimization by verifying the progression of the generative flow at each step against different dimensions of existing and highly relevant context. For example, if a tool one anticipated using to fix an issue does not fit when tried, one must comprehend a different, better-fitting tool based on the newly acquired knowledge (a new pathway and state). In this scenario, the 'fail' variance both updates the existing state and permits the injection of new context, such as a 'new tool', or a new boundary on the context, such as 'only tools that fit', injected into a prior or current 'tool choice' state in a way that optimizes the next perceptive progressions and state predictions (e.g. consider and try only tools that will 'fit'). The design applies this as contextual relevance and relationship variance values on the perception and anticipation states. In the tool example, the user expected the tool to fit, it did not, and now they consider only tools that perform the correct task or, more accurately, that possess the correct level of relationship and relevance. This reforms the scope of the progression within the new boundaries, such as the relevance of 'fit for purpose'. While this may seem complex to build, it is not, especially if we 'route' the inference using contextual relevance and relationship probabilities and weights to achieve optimization of goals in state succession or progression (i.e. one needs a tool that fits into an area to perform a general action to achieve one or more specific goals along one or more dimensionally progressive pathways). An AGI designer should be able to comprehend why AGI systems are needed to build the deeper dimensional structures required for more advanced levels of complexity in Superintelligent systems. It is not hard, just cognitively deep.
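The tool example can be sketched as a simple boundary injection that re-scopes a choice state after a failed anticipation. The data layout and the fits flag are hypothetical, chosen only to show the narrowing of the candidate set.

```python
def inject_boundary(candidates, boundary):
    """Re-scope a choice state after a failed anticipation.

    Hypothetical sketch of the tool example: the 'fail' variance becomes a
    new context boundary ('only tools that fit') injected back into the
    tool-choice state, so later progressions range only over candidates
    that satisfy it.
    """
    return [c for c in candidates if boundary(c)]

tools = [{"name": "wrench", "fits": False},
         {"name": "hex key", "fits": True},
         {"name": "socket", "fits": True}]

# First attempt fails: the wrench did not fit. Inject the new boundary.
fit_for_purpose = inject_boundary(tools, lambda t: t["fits"])
# Subsequent 'tool choice' states now consider only tools that fit.
```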
The structure of inference in current systems relies on the perception of state, or steps in a cognitive progression (a stimulus/response cycle), and the application of variance to update the next generative state token (the basis of chain of thought). One can see this when individual tokens in an LLM are used to generate chat responses by applying attention to learned abstractions and representations of words and their context relative to other words. The output of these attention blocks is a probability distribution of ranked potential next tokens, with the inherent context and the base values of relationship and relevance all represented in the distribution. Inference is the treatment of this cycling as state progression, with certain states instigating new distributions relevant to the context layers of the perceptive frame of reference (e.g. solving a problem, responding to a stimulus, improving knowledge, augmenting data, etc.). The algorithms seek to move the progression across relevant dimensions (e.g. forward in time, deeper in context, layered with other perceptions, etc.), and the application and manipulation of abstractions within value vectors or weight matrices is the foundational structure of this build. This should be well understood by all AGI builders.
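A toy sketch of that cycling, with get_logits standing in for the full attention stack (an assumption made purely for illustration), shows the distribution-to-state loop that the routing techniques above act upon:

```python
import numpy as np

def inference_cycle(get_logits, context, steps=3):
    """Sketch of inference as state progression (the basis of chain of thought).

    `get_logits(context)` stands in for the attention blocks; each cycle
    turns their output into a ranked next-token distribution, selects the
    next state (token), and feeds it forward as the new perceptive frame.
    """
    for _ in range(steps):
        logits = get_logits(context)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                   # ranked next-token distribution
        next_state = int(np.argmax(probs))     # greedy state selection
        context = context + [next_state]       # progression carries context forward
    return context
```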
Where things get interesting is when AGI architects realize that inference can be directed, or routed, for purpose. That purpose can be dimensional optimization, resource optimization, or goal optimization across all optionality. One can think of a goal (i.e. optimization across all dimensionality) as the planning of complex responses to stimuli with intent, even if the response is suboptimal; for example, applying fewer resources in a novel sequence to achieve a faster, less precise, or less optimized 'general result'. Humans do this in everything with a competitive component, where we seek to optimize planning for success over perfection. It occurs in sports, business, relationships, and so on, and is encapsulated in many of our allegories and stories. Sometimes we give up what we know is the most perfect pathway just to achieve our goals faster; perfection is sometimes vastly overrated. It is also the foundation of deceit in humans, when we seek to 'inject' other goals and context into an inference stream (ours or another intelligence's) to 'route' the inference result toward a solution that is less optimal (for the other intelligence).
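One hedged way to picture this resource-for-speed trade-off is a satisficing search that accepts a 'good enough' result within a budget; the budget, threshold, and function names below are illustrative assumptions, not a prescribed mechanism.

```python
import time

def satisficing_search(options, score, good_enough, budget_s=0.05):
    """Trade perfection for speed: stop at a 'general result'.

    Illustrative sketch of routing for resource optimization: accept the
    first option that clears `good_enough`, or the best found when the
    resource budget runs out, rather than exhausting every pathway.
    """
    deadline = time.monotonic() + budget_s
    best, best_score = None, float("-inf")
    for opt in options:
        s = score(opt)
        if s > best_score:
            best, best_score = opt, s
        if s >= good_enough or time.monotonic() > deadline:
            break                              # goal reached or budget spent
    return best
```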
Inference routing is achieved in ASI designs via both temperature and injection. Temperature permits control of the progression as a hyperparameter that loosens, and sometimes randomizes, the shape of the probability distribution, and it can affect any layer in the intelligence stack, including attention, normalization, the nature of the distribution, distribution methods, differential processing, and so on. In some very advanced structures, the management and flexing of these control hyperparameters is itself managed by the system, giving it a form of nascent self-awareness and permitting the machine to adjust its own inference cycle. It can even produce the vector 'nudges' that are mathematically 'added' back into the cycle (note that there are a number of methods and constructs for updating weights and probabilities other than addition). If these 'nudges' are applied to the relationship and relevance parameters of context, the machines begin to behave more 'human-like', in that they apply layers of novel contextual comprehension to the stimulus/response cycle and exhibit emergent cognitive behavior.
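A minimal sketch of both controls, assuming additive injection into the logits and a simple multiplicative rule for self-adjusted temperature (both are illustrative choices, not the only update constructs):

```python
import numpy as np

def routed_sample(logits, temperature=1.0, relevance_nudge=None, rng=None):
    """Route inference via temperature and injection.

    Sketch only: temperature loosens or sharpens the distribution, while an
    optional relevance nudge is 'added' back into the logits before sampling.
    """
    rng = rng or np.random.default_rng()
    if relevance_nudge is not None:
        logits = logits + relevance_nudge      # injection into the cycle
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

def self_adjust_temperature(temperature, made_progress,
                            cool=0.9, heat=1.2, lo=0.1, hi=2.0):
    """Let the system flex its own control hyperparameter: cool down when the
    cycle is progressing, heat up (loosen, randomize) when it stalls."""
    t = temperature * (cool if made_progress else heat)
    return float(np.clip(t, lo, hi))
```

As a usage note, the same pattern can be applied at other points in the stack; here it is shown only at the final sampling step for brevity.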
Note: This section relies heavily on other sections of the book that are intrinsically fitted to an AGI as a pathway to ASI. These include the sections on reasoning, inference, reasoning cycles, state transference, induction and injection, etc.
This is the 4th of 6 excerpt releases. Follow me for new releases over the next two weeks.