What does a Generative AI Chatbot have in common with Gameplaying AI

This delightful article by Nabeel Qureshi on puzzles sent me down the proverbial rabbit hole.

https://open.substack.com/pub/nabeelqu/p/notes-on-puzzles

There is a direct correlation between how skilled you are as a chess player, and how much time you spend falsifying your ideas

The authors find that grandmasters spend longer falsifying their idea for a move than they do coming up with the move in the first place, whereas amateur players tend to identify a solution and then play it shortly after, without trying their hardest to falsify it first. (Often amateurs find reasons for playing the move -- 'hope chess'.)

1/ There could be a lot of wrong, speculative, even misunderstood and incomplete ideas in this post. They come from production-deployment experience with a new kind of system in the field.

2/ Think of this post as an experiment in Cunningham’s Law, which states that the best way to get the right answer on the internet is not to ask a question, but to post the wrong answer.


TL;DR

Game-playing AI agents and generative conversational agents both break intractably complex problems into Search, Retrieval, Re-Ranking and Synthesis stages. A Texas Hold’em poker-playing AI agent has a lot in common with your garden-variety customer chatbot.

Every landmark demonstration of advancement in intelligence, outside of scaling-law gains (model size or depth, data size and quality) or a novel neural-network architecture (Transformers, CNNs, GANs, etc.), has come from applying the pattern of Search, Retrieval, Re-Ranking and Synthesis at scale: from checkers to Pong, Deep Blue, AlphaGo, Libratus, Cicero, AlphaStar and AlphaZero.

Types of Problems By Complexity Classes

The way to read this chart:

  • N - nondeterministic
  • P - polynomial
  • L - logarithmic
  • EXP - exponential

These prefixes qualify a Space or Time bound, meaning we are either space-constrained or time-constrained in solving complex problems.

Finding whether there is a way to get from City A to City B, given a series of intermediate points between them, is a nondeterministic log-space (NL) problem, whereas finding the shortest route between the two cities, if a path exists, is a polynomial-time (P) problem.
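To make the two problem classes concrete, here is a minimal sketch. The city names, road map and distances are invented for illustration: breadth-first search answers the reachability question, while Dijkstra's algorithm answers the shortest-route question in polynomial time.

```python
import heapq
from collections import deque

# A toy weighted road map: city -> [(neighbour, distance), ...]
ROADS = {"A": [("X", 1), ("Y", 4)], "X": [("B", 5)], "Y": [("B", 1)]}

def reachable(graph, start, goal):
    """Breadth-first search: is there *any* path from start to goal?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt, _ in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def shortest_cost(graph, start, goal):
    """Dijkstra's algorithm: cost of the cheapest path, or None."""
    frontier, done = [(0, start)], set()
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        if node in done:
            continue
        done.add(node)
        for nxt, weight in graph.get(node, []):
            heapq.heappush(frontier, (cost + weight, nxt))
    return None
```

Here `reachable(ROADS, "A", "B")` is true, and the cheapest A-to-B route goes via Y at total cost 5.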

LLM Aphorisms

For context, humans take longer to think when presented with a far more complex problem. But, as the aphorism goes, a language model takes exactly the same amount of time to answer two questions from opposite ends of the difficulty spectrum, assuming both answers contain the same number of words.

What is the capital of India?
How many minutes did the world champion of Go take to respond to a move that shocked experts?

But what if we had an opportunity to let the model have the internal monologue that we humans have, taking the time and providing the memory or space it needs?

Tree-of-Thought prompting gives the model an opportunity to expand its inference compute by running various combinations of trial and error and picking the best outcome.

Sample Tree of Thought Prompt

Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking, then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is as follows

What really happens here is illustrated in this prompt engineering guide

https://www.promptingguide.ai/techniques/tot

The pattern observed here is as follows: generate trees of coherent language sequences, search across them for the best possible outcome, and present the information. But what if the problem required more space or time?

We could use retrieval-augmented generation as a tactic to bring in an external knowledge source: a document database, API, file, knowledge graph, vector database, web browser, search engine, etc.
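A minimal retrieval sketch, under assumptions: `embed` below is a crude bag-of-letters stand-in for a real embedding model (which would normally be an API call), and the chunks are invented. The point is the shape of the pattern: embed the query, rank the knowledge-source chunks by similarity, and stuff the winners into the prompt as grounding context.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def embed(text):
    """Crude stand-in for a real embedding model: a bag-of-letters
    vector, used here only to make the sketch runnable."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def retrieve(query, chunks, top_k=2):
    """Rank knowledge-source chunks by similarity to the query; the
    winners would then be passed to the LLM as grounding context."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]
```

Swap `embed` for a real embedding model and `chunks` for a vector-database lookup and this becomes the retrieval half of a RAG pipeline.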

https://bennycheung.github.io/ask-a-book-questions-with-langchain-openai

But to build a fully functional conversational system, several other components need to work: the ability to split questions into Eigen questions, the ability to loop through them sequentially and in parallel, traditional conversational flow paths to bound the problem size, a classifier that acts as both a router and a language firewall, and so on.



3/ Before I get on with those ideas, here is a question that has nagged me for years.

Why do AI research labs care so much about developing game-playing AI?

4/ John McCarthy, the scientist who coined the term “artificial intelligence” in the 1950s, once referred to chess as the “Drosophila of AI,” a reference to the breakthrough genetic research on fruit flies in the early 20th century.


5/ Why did IBM spend almost a million dollars (in the 1990s!) on compute for Deep Blue to beat the world champion? History-making aside, it helped them advance alpha-beta pruning, an extension of the minimax search algorithm.


6/ Trivia: minimax derives its name from the evaluation function used to estimate outcomes in chess. If you're playing White, the KPI for winning is to maximize the number of White pieces and minimize the number of Black pieces, hence minimax.

7/ These algorithms find applications in optimally scheduling university timetables based on constraints such as:

  • no more than 'x' hours of classes for students
  • no more than 'y' hours of classes by teachers
  • no more than 'z' students in a classroom
  • break times
  • accounting for multiple subject preferences from students
  • fixed capacity of professors in the class
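Constraint problems like the timetable above can be attacked with the same prune-early idea as game-tree search. A minimal backtracking sketch follows; the classes, slots and constraint below are invented examples, and a real timetable would use a proper constraint solver.

```python
def schedule(classes, slots, constraints):
    """Backtracking search: assign each class a slot, abandoning any
    partial assignment that violates a constraint -- the same
    prune-early idea behind game-tree search."""
    assignment = {}

    def backtrack(i):
        if i == len(classes):
            return dict(assignment)          # every class placed: done
        for slot in slots:
            assignment[classes[i]] = slot
            if all(check(assignment) for check in constraints):
                result = backtrack(i + 1)
                if result is not None:
                    return result
            del assignment[classes[i]]       # undo and try the next slot
        return None                          # no slot works: backtrack

    return backtrack(0)
```

Constraints are just predicates over the partial assignment, e.g. "two classes taught by the same professor cannot share a slot".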


8/ Games are tests of intelligence; they're test beds for examining intelligence. This pattern is recurrent: the techniques AI learns in gameplay get applied elsewhere. The harder the game, the more sophisticated the technique, and the better the consequential breakthrough in a completely unobvious, unrelated area. This is the same reason I believe Proximal Policy Optimisation, leveraged by OpenAI, may have applications in self-driving cars.

9/ Traditional methods like minimax and its variant alpha-beta pruning ruled the AI gaming world. These algorithms evaluate all possible moves to a certain depth and choose the most promising one.
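Here is a compact sketch of that idea. The game tree is a toy: leaves are payoff numbers, internal nodes are lists of children, and branches that cannot change the outcome get cut off (pruned).

```python
def alphabeta(node, maximizing=True, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning over a toy game tree.
    Leaves are static evaluations; lists are positions with children."""
    if not isinstance(node, list):          # leaf: return its evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:               # opponent won't allow this line
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value
```

For the tree `[[3, 5], [2, 9]]` with the maximizer to move, the value is 3: the minimizer holds the first branch to 3 and the second to 2, and once the second branch reveals the 2, its remaining children are pruned.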

10/ However, this exhaustive search becomes computationally infeasible for games with vast possibility spaces like Go. To understand this, take tic-tac-toe: a simple game with 9 squares organized as a 3x3 grid, each holding one of 3 possible values (X, O or empty), makes 3^9 = 19,683 possible combinations.


11/ However, several illegal positions are included in this count because of the turn-by-turn nature of the game, which brings the total down from 19,683 to 5,478. But even among these 5,478 possible combinations there are duplicates: substitute X into O's positions and O into X's, and then there are reflections -- mirror the 3x3 grid or rotate it. When we account for some of these duplicates, the total number of unique game states is 765.

12/ And the game tree is also quite simple. There are 9 ways a game can start, 8 ways to respond to move 1, 7 ways to respond to move 2, and so on, which enumerates to 255,168 games; apply reflections, rotations, etc. and you end up with 26,830 unique games.

13/ It's easy to store this information: 9 positions (think of an array or list in Python), 2 bits per position to store X, O or empty, gives 18 bits per game; 26,830 possible games × 18 bits ≈ 59 KB. Brute-forcing things here is quite simple: a Python program, or even a hobbyist 8-bit microcontroller program with zero strategy, can examine ~59 KB of state to draw the game.
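The 255,168 figure is small enough to check by brute force, exactly as the paragraph above suggests. This sketch enumerates every distinct tic-tac-toe game, ending each line the moment someone wins or the board fills:

```python
def count_games(board=None, player="X"):
    """Exhaustively enumerate every distinct tic-tac-toe game.
    A game ends as soon as a line is completed or the board is full."""
    if board is None:
        board = [None] * 9
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
             (0, 4, 8), (2, 4, 6)]                 # diagonals
    total = 0
    for i in range(9):
        if board[i] is None:
            board[i] = player
            won = any(board[a] == board[b] == board[c] == player
                      for a, b, c in lines)
            if won or all(cell is not None for cell in board):
                total += 1                          # one completed game
            else:
                total += count_games(board, "O" if player == "X" else "X")
            board[i] = None                         # undo the move
    return total
```

Running `count_games()` returns 255,168, matching the figure above, and it finishes in well under a second: the whole game is trivially brute-forceable.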

In a way, this narrows the problem space and the search using the current state, almost like a SQL filter on an indexed column. This is also why an LLM router that classifies a query and routes it into specialised branches reduces the chance of adversarial prompting causing unintended behaviour.



14/ But let's think about chess. The game starts with a board of 64 squares, unlike tic-tac-toe's 9, and tic-tac-toe has a maximum game length of 9 moves: at the 9th move the board is full. In chess this isn't the case. To understand and formalise this fascinating space, check out this Wikipedia article:

https://en.wikipedia.org/wiki/Game_complexity

15/ Monte Carlo Tree Search, instead of examining all possibilities, taps into randomised simulations and selects the best possible outcome based on the observations it makes. It's a fascinating leap from brute force to statistical estimation, a shift that would not have happened without the labyrinthine world of Go. The MCTS methodology has since sparked interest in various fields with large search spaces, including the breakthrough AlphaFold publication in Nature.
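The move-selection idea can be sketched in a few lines. This is the "flat" Monte Carlo variant, a deliberate simplification: full MCTS additionally grows a search tree and balances exploration against exploitation with a UCB-style rule. Every function argument here is a caller-supplied stand-in.

```python
def monte_carlo_move(state, legal_moves, apply_move, rollout_value, n_sims=100):
    """Flat Monte Carlo move selection: estimate each candidate move by
    averaging the results of n_sims playouts from the resulting state,
    then pick the move with the best average. `rollout_value` stands in
    for a (typically randomised) playout-to-the-end evaluation."""
    best_move, best_avg = None, float("-inf")
    for move in legal_moves(state):
        total = sum(rollout_value(apply_move(state, move))
                    for _ in range(n_sims))
        avg = total / n_sims
        if avg > best_avg:
            best_move, best_avg = move, avg
    return best_move
```

Statistical estimation replaces exhaustive search: instead of expanding every branch, we sample each branch and trust the averages.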

16/ Now Go, chess and tic-tac-toe are perfect-information games: the current game state is perfectly known to both players, or to anyone who looks at the board. John von Neumann once said, "Life is not like Go, where you can see all of the pieces on the board. Real life is full of bluffing, deception, and speculation about what is going on in the minds of others. This is the Game Theory that I study."

17/ Brown and Sandholm built a poker-playing AI called Libratus that decisively beat four leading human professionals in the two-player variant of poker called heads-up no-limit Texas hold'em (HUNL). Texas Hold’em is an imperfect-information game: players bluff and make bets that are not reflective of their hands. Instead of speculating about the opponent's strategy, Libratus focused on a single tactic: applying the Counterfactual Best Response algorithm to find the Nash equilibrium, through simulation, regret and iteration. Check out this wonderful article for an understanding of the inner workings of the poker AI Libratus:

https://www.alibabacloud.com/blog/libratus-the-ai-that-defeated-humans-in-the-game-of-poker_560826

https://www.pokerlistings.com/from-loki-to-libratus-a-look-at-20-years-of-poker-ai-development

18/ Over nearly three weeks, Libratus played 120,000 hands of HUNL against the human professionals, using a three-pronged approach: precomputing an overall strategy, adapting that strategy during actual gameplay, and learning from its opponents. Noam Brown applied the same techniques to develop Pluribus, and later Cicero.

In the conversational engine we use several techniques to retrieve and re-rank the search results:

  • Rewrite the search query to improve semantic-search hits through embedding-similarity search
  • Perform a typical full-text search
  • Extract entities to search graph triples in a graph database or knowledge graph

Then use human preferences to build gradient-boosting rankers like XGBoost to re-rank the results, or use a simple decision tree with if-then-else conditions.
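The merge-and-re-rank step can be sketched as below. The feature names and weights are invented stand-ins for a learned ranker (e.g. gradient boosting trained on human preference labels); each candidate carries whatever scores the retrievers that surfaced it produced.

```python
def rerank(candidates):
    """Re-rank merged search results with a hand-written scoring rule,
    standing in for a learned ranker. Candidates are dicts carrying
    scores from whichever retrievers surfaced them."""
    def score(c):
        s = 2.0 * c.get("semantic_score", 0.0)   # embedding similarity
        s += 1.0 * c.get("fulltext_score", 0.0)  # keyword-match strength
        s += 1.5 * c.get("graph_score", 0.0)     # knowledge-graph triple hit
        if c.get("fresh"):                       # simple if-then-else boost
            s += 0.5
        return s
    return sorted(candidates, key=score, reverse=True)
```

In production the hand-tuned weights would be replaced by a model fitted to human preference data, but the interface stays the same: features in, ordering out.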



19/ Note the following common denominators:

- Precomputing the overall strategy

- Adapting strategy to ongoing gameplay

- Learning from the opponent’s strategy

20/ Precomputing the overall strategy is the equivalent of bounding the problem space. How do you play the opening moves in chess? Take some of the most common openings, like the Sicilian Defence, the Ruy Lopez or the Queen’s Gambit. If you know from past history that there is a set of questions, with similar or adjacent questions and follow-up questions in the conversation tree, use your knowledge source to pre-generate questions and answers. Think of this as the equivalent of generating FAQs.

21/ Chunking strategy: create multiple levels of precomputed information. Break documents into paragraphs, indicated either by sections or separators, and pass each chunk through an LLM to generate questions based on the passage. Then pair each question with its chunk to generate embeddings. Also generate embeddings for just the chunks and for just the questions, and keep them as separate stores. These varying layers of embeddings can be used for a threshold-driven cache and precomputed strategy.
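The chunking strategy above can be sketched as a small pipeline. Both `generate_questions` (the LLM call) and `embed` (the embedding model) are assumptions passed in by the caller, not real APIs:

```python
def build_stores(document, generate_questions, embed):
    """Build the three embedding stores described above from one document:
    chunks alone, generated questions alone, and question+chunk pairs.
    `generate_questions` stands in for an LLM call and `embed` for an
    embedding model; each store maps a vector back to its source chunk."""
    chunk_store, question_store, pair_store = [], [], []
    for chunk in document.split("\n\n"):            # split on paragraph breaks
        chunk = chunk.strip()
        if not chunk:
            continue
        chunk_store.append((embed(chunk), chunk))
        for q in generate_questions(chunk):
            question_store.append((embed(q), chunk))
            pair_store.append((embed(q + "\n" + chunk), chunk))
    return chunk_store, question_store, pair_store
```

At query time, each store can be searched with its own similarity threshold, which is what makes the layered, threshold-driven cache possible.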


22/ The mental model for this intuition, having multiple overlapping segments of information of varying length, is the Swiss-cheese model of system design. Our mechanism for preventing Covid was multi-pronged; in the same way, our ability to answer customer questions with the lowest latency, the lowest cost and the highest accuracy, across the broadest range of topics and with the best data freshness, needs to be multi-pronged.


23/ The same technique can be applied to LLM caching as well. Data feedback loops that get the system adapting to the opponent's strategy are paramount here. In our case this is both a cost concern and a method to improve latency and factuality and to reduce hallucinations. Here is a concept of how this would work in a multi-user conversation flow.


24/ A three-layered semantic cache.

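One interpretation of such a cache, sketched under assumptions (the layer order and threshold are my reading, and `embed` is a caller-supplied stand-in for an embedding model): layer 1 is an exact-string hit, layer 2 a normalized-string hit, layer 3 an embedding-similarity hit above a threshold, and anything that misses all three falls through to the LLM.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Three-layer query cache: exact string, normalized string,
    then embedding similarity above a threshold."""

    def __init__(self, embed, threshold=0.95):
        self.embed = embed            # stand-in for a real embedding model
        self.threshold = threshold
        self.exact = {}
        self.normalized = {}
        self.semantic = []            # list of (vector, answer) pairs

    @staticmethod
    def _norm(text):
        return " ".join(text.lower().split())

    def get(self, query):
        if query in self.exact:                       # layer 1: exact
            return self.exact[query]
        norm = self._norm(query)
        if norm in self.normalized:                   # layer 2: normalized
            return self.normalized[norm]
        q = self.embed(query)                         # layer 3: semantic
        for vec, answer in self.semantic:
            if cosine(q, vec) >= self.threshold:
                return answer
        return None                                   # miss: call the LLM

    def put(self, query, answer):
        self.exact[query] = answer
        self.normalized[self._norm(query)] = answer
        self.semantic.append((self.embed(query), answer))
```

The cheap layers answer the common case in microseconds; only genuinely new questions pay for an embedding lookup, and only true misses pay for an LLM call.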

25/ There are other interesting things that do not necessarily share much with game-playing AI but are still relevant to adapting strategy based on ongoing gameplay, such as selecting the parts of the prompt sent to the LLM based on the given user query, or at least on classes of user query.


26/ The same principles of augmenting the large language model's working memory, which is scarce, expensive and mostly unreliable, apply to the analytical and abstract tasks that govern a conversation. Assume your friendly chatbot has to answer questions about a phone bill, a bank statement or a cloud invoice: there are explicit facts and there are implicit facts. For example, the bill may contain everyday data usage but perhaps not average data usage per day.

27/ In these cases it is worth asking: what's easier to teach the large language model, mathematics, or how to use a calculator that has an 'average' button?
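A toy version of the "calculator button" idea, with every name invented for illustration: instead of asking the model to do arithmetic, the model emits a structured call such as `{"tool": "average", "field": "data_gb"}`, and a deterministic calculator computes the implicit fact from the bill rows.

```python
def run_tool_call(call, bill_rows):
    """Toy tool runtime for a billing chatbot. The LLM emits a
    structured call naming a tool and a field; this function (the
    'calculator') computes the answer deterministically."""
    values = [row[call["field"]] for row in bill_rows]
    if call["tool"] == "average":
        return sum(values) / len(values)
    if call["tool"] == "total":
        return sum(values)
    raise ValueError("unknown tool: " + call["tool"])
```

The model's job shrinks from "do the maths" to "pick the right button", which is a far easier behaviour to elicit reliably.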

This is why Bard's implicit code execution and ChatGPT's Code Interpreter are exciting. Creating sandboxed code-execution environments with fast compute provisioning could be critical to unlocking the next wave of innovation.

28/ A simple way to understand this intuition: a code-generating model that iteratively improves its responses, without fine-tuning, through an internal monologue with the terminal's output where its code executes, before sending the final answer back to the user who sent the query.
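That generate-execute-refine monologue can be sketched as a loop. Here `generate` stands in for an LLM call and `execute` for a sandboxed runner returning `(ok, output_or_error)`; both are assumptions, not real APIs.

```python
def refine_until_it_runs(generate, execute, max_rounds=3):
    """Generate code, run it in a sandbox, and feed any error text back
    into the next generation attempt -- the internal monologue between
    model and terminal described above."""
    feedback = None
    for _ in range(max_rounds):
        code = generate(feedback)
        ok, output = execute(code)
        if ok:
            return code, output          # working code and its result
        feedback = output                # the error becomes the next prompt
    return None, feedback                # gave up after max_rounds
```

The key design choice is that the error message, not a human, closes the loop: the terminal's stderr is simply another conversational turn.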

Who knows, perhaps the next big leap is understanding the world through code execution and tools. I'm not the only person thinking about this.

To end the article, here's the AI meme of the year. Leave your comments below.

