What does a Generative AI Chatbot have in common with Gameplaying AI?
This delightful article by Nabeel Qureshi on puzzles sent me down the proverbial rabbit hole.
There is a direct correlation between how skilled you are as a chess player, and how much time you spend falsifying your ideas
The authors find that grandmasters spend longer falsifying their idea for a move than they do coming up with the move in the first place, whereas amateur players tend to identify a solution and then play it shortly after without trying their hardest to falsify it first. (Often amateurs find reasons for playing the move -- 'hope chess'.)
1/ There could be a lot of wrong, speculative, even misunderstood and incomplete ideas in this post. They're the result of production deployment experience with a new kind of system in the field.
2/ Think of this post as an experiment in Cunningham's Law, which holds that the fastest way to get the right answer is not to ask a question but to post the wrong answer.
TL;DR
Gameplaying AI agents and generative conversational agents both break intractably complex problems into Search, Retrieval, Re-Ranking and Distillation stages. A Texas Hold'em poker-playing AI agent has a lot in common with your garden-variety customer chatbot.
Every landmark demonstration of advancing intelligence outside of scaling-law gains (model size or depth, data size and quality) or a novel neural-network architecture (Transformers, CNNs, GANs, etc.) has come from applying the pattern of Search, Retrieval, Re-Ranking and Synthesis at scale: from Checkers and Pong to Deep Blue, AlphaGo, Libratus, Cicero, AlphaStar for StarCraft II, and AlphaZero.
The way to read this graph: complex problems are either space constrained or time constrained.
Finding whether there is any route from City-A to City-B through a series of intermediate points is a Non-Deterministic Log-Space (NL) problem, whereas finding the shortest route between the two cities, if a path exists, is a Polynomial-Time (P) problem.
LLM Aphorisms
For context, humans take longer to think when presented with a far more complex problem. But an aphorism about language models is that they take exactly the same amount of time to answer two questions at opposite ends of the difficulty spectrum, assuming both answers contain the same number of words:
What is the capital of India?
How many minutes did the world champion of Go take to respond to a move that shocked experts?
But what if we had an opportunity to let the model have the internal monologue that we humans have, taking the time and using the memory or space it needs?
Tree-of-Thought prompting gives a model the opportunity to expand its inference-time compute, running various combinations of trial and error and picking the best outcome.
Sample Tree of Thought Prompt
Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking, then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is as follows
What really happens here is illustrated in this prompt engineering guide.
The pattern observed here is as follows: generate trees of coherent language sequences, search across them for the best possible outcome, and present that information. A minimal sketch:
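Here is one way the pattern could look in code. This is a minimal sketch, not any lab's actual implementation: `propose` and `score` are hypothetical stand-ins for model calls you would wire to a real provider, and the beam width and depth are made up.

```python
# Minimal Tree-of-Thought-style search sketch. `propose` and `score` are
# hypothetical stand-ins for model calls; wire them to a real provider.
def propose(question: str, chain: str, n: int = 3) -> list[str]:
    """Hypothetical LLM call returning n candidate next reasoning steps."""
    raise NotImplementedError

def score(question: str, chain: str) -> float:
    """Hypothetical LLM self-evaluation: how promising is this chain?"""
    raise NotImplementedError

def tree_of_thought(question: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [""]  # partial reasoning chains, like the "experts"
    for _ in range(depth):
        candidates = [
            chain + "\n" + step
            for chain in frontier
            for step in propose(question, chain)
        ]
        # Keep only the most promising branches: "wrong experts leave".
        frontier = sorted(candidates, key=lambda c: score(question, c),
                          reverse=True)[:beam]
    return frontier[0]
```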
But what if the problem requires more space or time? We could use retrieval-augmented generation as a tactic to bring in an external knowledge source: a document database, an API, a file, a knowledge graph, a vector database, a web browser, a search engine and so on.
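A minimal retrieval-augmented sketch of that idea, assuming hypothetical `embed()` and `llm()` calls and a plain in-memory list where a real system would use a vector database or search engine:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError  # hypothetical: returns a dense vector

def llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical model call

def build_index(docs: list[str]) -> list[tuple[str, np.ndarray]]:
    # Precompute one embedding per knowledge-source chunk.
    return [(d, embed(d)) for d in docs]

def retrieve(index, query: str, k: int = 3) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: float(np.dot(q, pair[1])),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]

def answer(index, query: str) -> str:
    context = "\n\n".join(retrieve(index, query))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```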
But to build a fully functional conversational system, several other components need to work: the ability to split questions into eigen questions, the ability to loop through them sequentially and in parallel, traditional conversational flow paths that bound the problem size, and a classifier that acts as both a router and a language firewall. Here is a sketch of the decomposition step:
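A sketch only, under the assumption of a hypothetical `llm()` call; the splitting prompt and the parallel fan-out are illustrative choices, not a prescribed design.

```python
from concurrent.futures import ThreadPoolExecutor

def llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical model call

def decompose_and_answer(query: str) -> str:
    # Split a compound query into independent "eigen questions".
    subquestions = llm(
        f"Split into independent sub-questions, one per line:\n{query}"
    ).splitlines()
    # Answer the sub-questions in parallel rather than sequentially.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda q: llm(f"Answer concisely: {q}"),
                                subquestions))
    # Fold the partial answers back into a single reply.
    return llm("Write one reply synthesising:\n" + "\n".join(answers))
```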
3/ Before I get on with those ideas, here's a question that has nagged me for years.
Why do AI research labs care so much about developing gameplaying AI?
4/ John McCarthy, the scientist who coined the term “artificial intelligence” in the 1950s, once referred to chess as the “Drosophila of AI,” a reference to the breakthrough genetic research on fruit flies in the early 20th century.
5/ Why did IBM spend almost a million dollars (in 1990s money) on compute for Deep Blue to beat the grand champion? History-making aside, it drove industrial-scale refinements of alpha-beta pruning, the classic extension to the minimax search algorithm.
6/ Trivia: minimax derives its name from the evaluation function used to estimate outcomes in chess. If you're playing white, the KPI for winning is to maximize the number of white pieces and minimize the number of black pieces, hence minimax.
7/ These algorithms also find applications in optimally scheduling university timetables based on constraints.
8/ Games are tests of intelligence; they're test beds for examining intelligence. The pattern is recurrent: techniques learnt in gameplaying AI get applied elsewhere. The harder the game, the more sophisticated the technique, and the better the consequential breakthrough in a completely unobvious, unrelated area. That's the same reason I believe proximal policy optimisation, as leveraged by OpenAI, may have applications in self-driving cars.
9/ Traditional methods like minimax and its variant alpha-beta pruning long ruled the AI gaming world. These algorithms evaluate all possible moves to a certain depth and choose the most promising one. Here is the textbook shape of it:
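A textbook sketch of minimax with alpha-beta pruning. The toy "game" is just a nested list whose leaves are evaluation scores; a real engine would supply board states, move generation and an evaluation function instead.

```python
# Minimax with alpha-beta pruning over a toy game tree: inner lists are
# positions, leaves are evaluation scores (e.g. material balance).
def alphabeta(state, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(state, list):          # leaf: return its evaluation
        return state
    if maximizing:
        value = float("-inf")
        for child in state:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                # opponent never allows this line,
                break                        # so prune the remaining siblings
        return value
    value = float("inf")
    for child in state:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

print(alphabeta([[3, 5], [6, [9, 1]], [1, 2]]))  # -> 6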
10/ However, this exhaustive search becomes computationally infeasible for games with vast possibility spaces like Go. To see why, take tic-tac-toe: a simple game with 9 squares organized as a 3x3 grid and 3 possible values per square (X, O or empty), giving 3^9 = 19,683 possible combinations.
11/ However, several of these are illegal positions, because of the turn-by-turn nature of the game; that brings the total down from 19,683 to 5,478 reachable positions. And even among those there are duplicates: rotate the 3x3 grid or mirror it and you get an equivalent position. Accounting for these symmetries, the number of essentially different game states is 765.
12/ And the game tree is also quite simple. There are 9 ways a game can start, 8 replies to move 1, 7 replies to move 2 and so on, for at most 362,880 move sequences; stop each line as soon as someone wins and you get 255,168 possible games. Apply the same rotations and reflections and you end up with 26,830 unique games.
13/ It's easy to store this information: 9 positions (think of an array or list in Python) at 2 bits per position (X, O or empty) is 18 bits per board, and 26,830 games x 18 bits is roughly 58 KB. Brute-forcing is trivial here: a Python program, or even a hobbyist 8-bit microcontroller program with zero strategy, can examine 58 KB worth of state and force at least a draw. The enumeration is short enough to verify directly:
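As a sanity check on these figures, here is a self-contained brute-force enumeration; it counts every complete game (no symmetry reduction) and prints 255,168.

```python
# Brute-force enumeration of every complete tic-tac-toe game.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def count_games(board=None, player="X"):
    board = board or [None] * 9
    if winner(board) or all(board):         # game over: a win or a draw
        return 1
    total = 0
    for i in range(9):
        if board[i] is None:
            board[i] = player               # try the move...
            total += count_games(board, "O" if player == "X" else "X")
            board[i] = None                 # ...then undo it
    return total

print(count_games())  # 255168
```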
In a way, narrowing the problem space and the search using the current state is almost like a SQL filter on an indexed column. This is why an LLM router that classifies a query and routes it into a specialised branch also reduces the chance of adversarial prompting causing unintended behaviour. A sketch of such a router:
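A hedged sketch of the router-as-language-firewall idea; `llm()` is a hypothetical model call and the branch names and prompts are invented for illustration.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical model call

# Each branch gets a narrow system prompt; nothing outside its scope leaks in.
BRANCHES = {
    "billing":   "Answer billing questions from the invoice context only.",
    "technical": "Troubleshoot connectivity issues step by step.",
    "other":     "Politely decline anything outside billing or tech support.",
}

def route(query: str) -> str:
    label = llm(f"Classify into one of {list(BRANCHES)} with one word:\n"
                f"{query}").strip().lower()
    branch = BRANCHES.get(label, BRANCHES["other"])  # unknown -> safe default
    return llm(f"{branch}\n\nUser: {query}")
```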
14/ But let's think about chess. The game is played on a board of 64 squares, unlike tic-tac-toe's 9, and tic-tac-toe has a maximum game length of 9 moves: at the 9th move the board is full. In chess this isn't the case. To understand and formalise this fascinating space, check out this Wikipedia article.
15/ Monte Carlo Tree Search, instead of examining all possibilities, runs randomised simulations and selects the best outcome based on the observations it makes. It's a fascinating leap from brute force to statistical estimation, a shift that might never have happened without the labyrinthine world of Go. MCTS has since sparked interest in many fields with large search spaces, including the breakthrough AlphaFold publication in Nature. Here is the four-step loop in miniature:
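A compact UCT-flavoured sketch of the selection, expansion, simulation and backpropagation loop. The toy single-player "game" (walk up or down for a few steps, payoff is the final value) is made up purely so the loop runs end to end; a Go engine would plug in real states and moves.

```python
import math, random

def legal_moves(state):
    value, steps_left = state
    return [] if steps_left == 0 else [+1, -1]

def play(state, move):
    value, steps_left = state
    return (value + move, steps_left - 1)

def rollout(state):
    value, steps_left = state
    for _ in range(steps_left):          # random playout to the end
        value += random.choice([+1, -1])
    return value

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.total = [], 0, 0.0

    def ucb(self, c=1.4):                # exploration/exploitation balance
        if self.visits == 0:
            return float("inf")
        return self.total / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def mcts(root, iterations=2000):
    for _ in range(iterations):
        node = root
        while node.children:                             # 1. selection
            node = max(node.children, key=Node.ucb)
        for move in legal_moves(node.state):             # 2. expansion
            node.children.append(Node(play(node.state, move), parent=node))
        leaf = random.choice(node.children) if node.children else node
        result = rollout(leaf.state)                     # 3. simulation
        while leaf:                                      # 4. backpropagation
            leaf.visits += 1
            leaf.total += result
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).state

print(mcts(Node((0, 4))))  # tends toward the +1 branch: (1, 3)
```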
16/ Now Go, chess and tic-tac-toe are perfect-information games: the current game state is perfectly known to both players, or to anyone who looks at the board. John von Neumann once said, "Life is not like Go, where you can see all of the pieces on the board. Real life is full of bluffing, deception, and speculation about what is going on in the minds of others. This is the Game Theory that I study."
17/ Brown and Sandholm built a poker-playing AI called Libratus that decisively beat four leading human professionals at heads-up no-limit Texas hold'em (HUNL), the two-player variant of poker. Texas hold'em is an imperfect-information game: players bluff and make bets that are not reflective of their hands. Instead of speculating about the opponent's strategy, Libratus focused on a single tactic: counterfactual regret minimisation to approach a Nash equilibrium, using simulation, regret and iteration. Check this wonderful article to get an understanding of the inner workings of Libratus.
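To make "simulation, regret, iteration" concrete, here is the classic regret-matching loop on rock-paper-scissors rather than poker: a sketch of the core idea inside CFR-style solvers, not Libratus itself. The averaged strategies drift toward the uniform Nash equilibrium.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    return 0 if a == b else (1 if BEATS[a] == b else -1)

def strategy_from(regrets):
    # Play each action in proportion to its accumulated positive regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / 3] * 3

def train(iterations=50_000):
    regrets = [[0.0] * 3, [0.0] * 3]
    avg = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strats = [strategy_from(r) for r in regrets]       # simulation
        moves = [random.choices(range(3), s)[0] for s in strats]
        for p in range(2):
            me, opp = moves[p], moves[1 - p]
            got = payoff(ACTIONS[me], ACTIONS[opp])
            for a in range(3):                             # regret update:
                regrets[p][a] += payoff(ACTIONS[a], ACTIONS[opp]) - got
                avg[p][a] += strats[p][a]
    # Time-averaged strategies approach (1/3, 1/3, 1/3).   # iteration
    return [[x / iterations for x in row] for row in avg]

print(train())
```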
19/ Over nearly three weeks, Libratus played 120,000 hands of HUNL against the human professionals, using a three-pronged approach that included precomputing an overall strategy, adapting the strategy to actual gameplay, and learning from its opponent. Noam Brown applied the same techniques to develop Pluribus, and later Cicero.
In the conversational engine we use several techniques to retrieve and re-rank the search results. One is to use human preferences to train a gradient-boosting ranker such as XGBoost; another is a simple decision tree of if-then-else conditions. A pointwise sketch of the former:
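This is a sketch under stated assumptions, not our production ranker: scikit-learn's gradient boosting stands in for XGBoost, and the features (embedding similarity, keyword score, chunk length) and tiny toy dataset are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy (query, chunk) feature rows: [embedding similarity, keyword score,
# chunk length]; labels are human preference grades.
X_train = np.array([
    [0.91, 12.3, 180],
    [0.42,  3.1, 950],
    [0.77,  8.8, 300],
    [0.30,  1.2, 120],
])
y_train = np.array([1.0, 0.1, 0.8, 0.0])

ranker = GradientBoostingRegressor().fit(X_train, y_train)

def rerank(candidates: list[str], features: list[list[float]]) -> list[str]:
    scores = ranker.predict(np.array(features))
    order = np.argsort(-scores)          # highest predicted relevance first
    return [candidates[i] for i in order]

print(rerank(["chunk-a", "chunk-b"], [[0.4, 2.0, 500], [0.9, 11.0, 200]]))
```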
20/ Note the following common denominators:
- Precomputing the overall strategy
- Adapting strategy to ongoing gameplay
- Learning from the opponent’s strategy
21/ Precomputing the overall strategy is the equivalent of bounding the problem space. How do you play the opening moves in chess? Take some of the most common openings: the Sicilian Defence, the Ruy Lopez, the Queen's Gambit. If you know from past history that there is a set of questions, plus similar, adjacent or follow-up questions in the conversation tree, use your knowledge source to pre-generate questions and answers. Think of this as the equivalent of generating FAQs.
22/ Next, a chunking strategy that creates multiple levels of precomputed information. Break documents into paragraphs, indicated either by sections or separators, and pass each chunk through an LLM to generate questions based on the passage. Then pair each question with its chunk to generate embeddings. Also generate embeddings for just the chunks and just the questions, and keep them as separate stores. These varying layers of embeddings can drive a threshold-driven cache and precompute strategy, as in this sketch:
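A sketch of the layered precompute described above; `llm` and `embed` are hypothetical model calls, and splitting on blank lines stands in for a real section- and separator-aware chunker.

```python
def llm(prompt: str) -> list[str]:
    raise NotImplementedError   # hypothetical: generated questions, one per item

def embed(text: str):
    raise NotImplementedError   # hypothetical: returns a dense vector

def build_stores(document: str):
    chunk_store, question_store, pair_store = [], [], []
    for chunk in document.split("\n\n"):
        chunk_store.append((embed(chunk), chunk))
        for q in llm(f"Write FAQ questions answered by this passage:\n{chunk}"):
            question_store.append((embed(q), q, chunk))
            pair_store.append((embed(q + "\n" + chunk), q, chunk))
    # Three stores of varying granularity: the overlapping cheese slices.
    return chunk_store, question_store, pair_store
```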
23/ The mental model for this intuition of having multiple overlapping segments of information of varying lengths is the Swiss cheese model of system design. Our mechanism for preventing Covid was multi-pronged; in the same way, our ability to answer customer questions with the lowest latency, the lowest cost, the highest accuracy, the broadest range of topics and the best data freshness needs to be multi-pronged.
24/ The same technique can be applied to LLM caching as well. Data feedback loops that let the system adapt to the opponent's strategy are paramount here; in our case this is both a cost concern and a method to improve latency and factuality and to reduce hallucinations. Here is a concept of how this would work in a multi-user conversation flow.
25/ A 3-layered semantic cache:
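A hedged sketch of one way to layer it: exact string match first, then semantic similarity over embeddings, then a full LLM call as the last resort. `embed`, `llm` and the 0.92 threshold are assumptions, not fixed choices.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError   # hypothetical embedding call

def llm(prompt: str) -> str:
    raise NotImplementedError   # hypothetical model call

exact_cache: dict[str, str] = {}
semantic_cache: list[tuple[np.ndarray, str]] = []

def cached_answer(query: str, threshold: float = 0.92) -> str:
    if query in exact_cache:                      # layer 1: free lookup
        return exact_cache[query]
    q = embed(query)
    for vec, answer in semantic_cache:            # layer 2: near-duplicate hit
        sim = float(np.dot(q, vec) /
                    (np.linalg.norm(q) * np.linalg.norm(vec)))
        if sim >= threshold:
            return answer
    answer = llm(query)                           # layer 3: pay for inference
    exact_cache[query] = answer
    semantic_cache.append((q, answer))
    return answer
```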
26/ There are other interesting ideas that don't share much with gameplaying AI but are still relevant to adapting strategy based on ongoing gameplay, such as selecting parts of the prompt sent to the LLM based on the given user query, or at least on classes of user query.
27/ The same principles of augmenting the working memory of the large language model, which is scarce, expensive and mostly unreliable, apply to the analytical and abstract tasks that govern a conversation. Assume your friendly chatbot has to answer questions about a phone bill, a bank statement or a cloud invoice: there are explicit facts and implicit facts. For example, the bill may contain each day's data usage but not the average data usage per day.
28/ In these cases it is worth asking: what's easier, teaching the large language model mathematics, or teaching it to use a calculator with an 'average' button?
This is why Bard's implicit code execution and ChatGPT's Code Interpreter are exciting. Creating sandboxed code-execution environments with fast compute provisioning could be critical to unlocking the next wave of innovation. A toy version of the calculator idea:
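A toy sketch of the "calculator with an average button": the model is asked to emit a tool call instead of doing arithmetic itself. `llm()` is a hypothetical model call, and the one-line regex parser is for illustration only.

```python
import re

def llm(prompt: str) -> str:
    raise NotImplementedError   # hypothetical: replies e.g. "average(2.1, 3.4)"

TOOLS = {
    "average": lambda *xs: sum(xs) / len(xs),
    "total":   lambda *xs: sum(xs),
}

def answer_with_tools(question: str, daily_usage: list[float]) -> str:
    reply = llm(f"Question: {question}\nDaily usage: {daily_usage}\n"
                f"Respond with one call to a tool in {list(TOOLS)}.")
    match = re.match(r"(\w+)\((.*)\)", reply.strip())
    name = match.group(1)
    args = [float(x) for x in match.group(2).split(",") if x.strip()]
    return f"{name} = {TOOLS[name](*args)}"  # the tool, not the model, computes
```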
29/ A simple way to understand this intuition: a code-generator model that can iteratively improve its responses without fine-tuning, through an internal monologue with the terminal's output, where it executes the code before sending it back to the user who sent the query. A sketch of that loop:
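A sketch of the execute-and-retry loop, assuming a hypothetical `llm()` call; a real deployment would run this inside a sandbox rather than on the host interpreter.

```python
import subprocess, sys, tempfile

def llm(prompt: str) -> str:
    raise NotImplementedError   # hypothetical: returns candidate Python source

def generate_working_code(task: str, attempts: int = 3) -> str:
    feedback = ""
    for _ in range(attempts):
        code = llm(f"Write a Python program for: {task}\n{feedback}")
        with tempfile.NamedTemporaryFile("w", suffix=".py",
                                         delete=False) as f:
            f.write(code)
            path = f.name
        run = subprocess.run([sys.executable, path],
                             capture_output=True, text=True, timeout=10)
        if run.returncode == 0:
            return code              # ran cleanly; send it back to the user
        # Feed the traceback back in: the "internal monologue".
        feedback = f"Previous attempt failed with:\n{run.stderr}\nFix it."
    raise RuntimeError("no working program within the attempt budget")
```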
Who knows, perhaps the next big leap is understanding the world through code execution and tools. I'm not the only person who's thinking about this.
To end the article, here's the AI meme of the year. Leave your comments below.