What does a Generative AI Chatbot have in common with Gameplaying AI?
This delightful article by Nabeel Qureshi on puzzles sent me down the proverbial rabbit hole.
There is a direct correlation between how skilled you are as a chess player, and how much time you spend falsifying your ideas
The authors find that grandmasters spend longer falsifying their idea for a move than they do coming up with the move in the first place, whereas amateur players tend to identify a solution and then play it shortly after without trying their hardest to falsify it first. (Often amateurs find reasons for playing the move -- 'hope chess'.)
1/ There could be a lot of wrong, speculative, even misunderstood and incomplete ideas in this post. They're the result of production deployment experience with a new kind of system in the field.
2/ Think of this post as an experiment in Cunningham's Law, which holds that the fastest way to get the right answer is not to ask a question but to post the wrong answer.
TL;DR
Gameplaying AI agents and generative conversational agents both break intractably complex problems into Search, Retrieval, Re-Ranking and Distillation stages. A Texas Hold'em poker-playing AI agent has a lot in common with your garden-variety customer chatbot.
Every landmark demonstration of advancing intelligence outside of scaling-law gains (model size or depth, data size and quality) or a novel neural-network architecture (Transformers, CNNs, GANs, etc.) has come from applying the pattern of Search, Retrieval, Re-Ranking and Synthesis at scale: from Checkers and Pong to Deep Blue, AlphaGo, Libratus, Cicero, AlphaStar for StarCraft II, and AlphaZero.
The way to read this graph: complex problems are either space constrained or time constrained.
Finding whether there is any route from City-A to City-B through a series of intermediate points is a Non-Deterministic Log-Space (NL) problem, whereas finding the shortest route between the two cities, if a path exists, is a Polynomial-Time (P) problem.
LLM Aphorisms
For context, humans take longer to think when presented with a far more complex problem. But an aphorism about language models is that they take exactly the same amount of time to answer two questions at opposite ends of the difficulty spectrum, assuming both answers contain the same number of words:
What is the capital of India?
How many minutes did the world champion of Go take to respond to a move that shocked experts?
But what if we had an opportunity to let the model have the internal monologue that we humans have, taking the time and using the memory or space it needs?
Tree-of-Thought prompting gives a model the opportunity to expand its inference-time compute, running various combinations of trial and error and picking the best outcome.
Sample Tree of Thought Prompt
Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking, then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is as follows
What really happens here is illustrated in this prompt engineering guide.
The pattern observed here is as follows: generate trees of coherent language sequences, search across them for the best possible outcome, and present that information. A minimal sketch:
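Here is one way the pattern could look in code. This is a minimal sketch, not any lab's actual implementation: `propose` and `score` are hypothetical stand-ins for model calls you would wire to a real provider, and the beam width and depth are made up.

```python
# Minimal Tree-of-Thought-style search sketch. `propose` and `score` are
# hypothetical stand-ins for model calls; wire them to a real provider.
def propose(question: str, chain: str, n: int = 3) -> list[str]:
    """Hypothetical LLM call returning n candidate next reasoning steps."""
    raise NotImplementedError

def score(question: str, chain: str) -> float:
    """Hypothetical LLM self-evaluation: how promising is this chain?"""
    raise NotImplementedError

def tree_of_thought(question: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [""]  # partial reasoning chains, like the "experts"
    for _ in range(depth):
        candidates = [
            chain + "\n" + step
            for chain in frontier
            for step in propose(question, chain)
        ]
        # Keep only the most promising branches: "wrong experts leave".
        frontier = sorted(candidates, key=lambda c: score(question, c),
                          reverse=True)[:beam]
    return frontier[0]
```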
But what if the problem requires more space or time? We could use retrieval-augmented generation as a tactic to bring in an external knowledge source: a document database, an API, a file, a knowledge graph, a vector database, a web browser, a search engine and so on.
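A minimal retrieval-augmented sketch of that idea, assuming hypothetical `embed()` and `llm()` calls and a plain in-memory list where a real system would use a vector database or search engine:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError  # hypothetical: returns a dense vector

def llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical model call

def build_index(docs: list[str]) -> list[tuple[str, np.ndarray]]:
    # Precompute one embedding per knowledge-source chunk.
    return [(d, embed(d)) for d in docs]

def retrieve(index, query: str, k: int = 3) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: float(np.dot(q, pair[1])),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]

def answer(index, query: str) -> str:
    context = "\n\n".join(retrieve(index, query))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```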
But to build a fully functional conversational system, several other components need to work: the ability to split questions into eigen questions, the ability to loop through them sequentially and in parallel, traditional conversational flow paths that bound the problem size, and a classifier that acts as both a router and a language firewall. Here is a sketch of the decomposition step:
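A sketch only, under the assumption of a hypothetical `llm()` call; the splitting prompt and the parallel fan-out are illustrative choices, not a prescribed design.

```python
from concurrent.futures import ThreadPoolExecutor

def llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical model call

def decompose_and_answer(query: str) -> str:
    # Split a compound query into independent "eigen questions".
    subquestions = llm(
        f"Split into independent sub-questions, one per line:\n{query}"
    ).splitlines()
    # Answer the sub-questions in parallel rather than sequentially.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda q: llm(f"Answer concisely: {q}"),
                                subquestions))
    # Fold the partial answers back into a single reply.
    return llm("Write one reply synthesising:\n" + "\n".join(answers))
```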
3/ Before I get on with those ideas, here's a question that has nagged me for years.
Why do AI research labs care so much about developing gameplaying AI?
4/ John McCarthy, the scientist who coined the term “artificial intelligence” in the 1950s, once referred to chess as the “Drosophila of AI,” a reference to the breakthrough genetic research on fruit flies in the early 20th century.
5/ Why did IBM spend almost a million dollars (in 1990s money) on compute for Deep Blue to beat the grand champion? History-making aside, it drove industrial-scale refinements of alpha-beta pruning, the classic extension to the minimax search algorithm.
6/ Trivia: minimax derives its name from the evaluation function used to estimate outcomes in chess. If you're playing white, the KPI for winning is to maximize the number of white pieces and minimize the number of black pieces, hence minimax.
7/ These algorithms also find applications in optimally scheduling university timetables based on constraints.
8/ Games are tests of intelligence; they're test beds for examining intelligence. The pattern is recurrent: techniques learnt in gameplaying AI get applied elsewhere. The harder the game, the more sophisticated the technique, and the better the consequential breakthrough in a completely unobvious, unrelated area. That's the same reason I believe proximal policy optimisation, as leveraged by OpenAI, may have applications in self-driving cars.
9/ Traditional methods like minimax and its variant alpha-beta pruning long ruled the AI gaming world. These algorithms evaluate all possible moves to a certain depth and choose the most promising one. Here is the textbook shape of it:
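A textbook sketch of minimax with alpha-beta pruning. The toy "game" is just a nested list whose leaves are evaluation scores; a real engine would supply board states, move generation and an evaluation function instead.

```python
# Minimax with alpha-beta pruning over a toy game tree: inner lists are
# positions, leaves are evaluation scores (e.g. material balance).
def alphabeta(state, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(state, list):          # leaf: return its evaluation
        return state
    if maximizing:
        value = float("-inf")
        for child in state:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                # opponent never allows this line,
                break                        # so prune the remaining siblings
        return value
    value = float("inf")
    for child in state:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

print(alphabeta([[3, 5], [6, [9, 1]], [1, 2]]))  # -> 6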
10/ However, this exhaustive search becomes computationally infeasible for games with vast possibility spaces like Go. To see why, take tic-tac-toe: a simple game with 9 squares organized as a 3x3 grid and 3 possible values per square (X, O or empty), giving 3^9 = 19,683 possible combinations.
11/ However, several of these are illegal positions, because of the turn-by-turn nature of the game; that brings the total down from 19,683 to 5,478 reachable positions. And even among those there are duplicates: rotate the 3x3 grid or mirror it and you get an equivalent position. Accounting for these symmetries, the number of essentially different game states is 765.
12/ And the game tree is also quite simple. There are 9 ways a game can start, 8 replies to move 1, 7 replies to move 2 and so on, for at most 362,880 move sequences; stop each line as soon as someone wins and you get 255,168 possible games. Apply the same rotations and reflections and you end up with 26,830 unique games.
13/ It's easy to store this information: 9 positions (think of an array or list in Python) at 2 bits per position (X, O or empty) is 18 bits per board, and 26,830 games x 18 bits is roughly 58 KB. Brute-forcing is trivial here: a Python program, or even a hobbyist 8-bit microcontroller program with zero strategy, can examine 58 KB worth of state and force at least a draw. The enumeration is short enough to verify directly:
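As a sanity check on these figures, here is a self-contained brute-force enumeration; it counts every complete game (no symmetry reduction) and prints 255,168.

```python
# Brute-force enumeration of every complete tic-tac-toe game.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def count_games(board=None, player="X"):
    board = board or [None] * 9
    if winner(board) or all(board):         # game over: a win or a draw
        return 1
    total = 0
    for i in range(9):
        if board[i] is None:
            board[i] = player               # try the move...
            total += count_games(board, "O" if player == "X" else "X")
            board[i] = None                 # ...then undo it
    return total

print(count_games())  # 255168
```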
In a way, narrowing the problem space and the search using the current state is almost like a SQL filter on an indexed column. This is why an LLM router that classifies a query and routes it into a specialised branch also reduces the chance of adversarial prompting causing unintended behaviour. A sketch of such a router:
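A hedged sketch of the router-as-language-firewall idea; `llm()` is a hypothetical model call and the branch names and prompts are invented for illustration.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical model call

# Each branch gets a narrow system prompt; nothing outside its scope leaks in.
BRANCHES = {
    "billing":   "Answer billing questions from the invoice context only.",
    "technical": "Troubleshoot connectivity issues step by step.",
    "other":     "Politely decline anything outside billing or tech support.",
}

def route(query: str) -> str:
    label = llm(f"Classify into one of {list(BRANCHES)} with one word:\n"
                f"{query}").strip().lower()
    branch = BRANCHES.get(label, BRANCHES["other"])  # unknown -> safe default
    return llm(f"{branch}\n\nUser: {query}")
```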
14/ But let's think about chess. The game is played on a board of 64 squares, unlike tic-tac-toe's 9, and tic-tac-toe has a maximum game length of 9 moves: at the 9th move the board is full. In chess this isn't the case. To understand and formalise this fascinating space, check out this Wikipedia article.
15/ Monte Carlo Tree Search, instead of examining all possibilities, runs randomised simulations and selects the best outcome based on the observations it makes. It's a fascinating leap from brute force to statistical estimation, a shift that might never have happened without the labyrinthine world of Go. MCTS has since sparked interest in many fields with large search spaces, including the breakthrough AlphaFold publication in Nature. Here is the four-step loop in miniature:
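A compact UCT-flavoured sketch of the selection, expansion, simulation and backpropagation loop. The toy single-player "game" (walk up or down for a few steps, payoff is the final value) is made up purely so the loop runs end to end; a Go engine would plug in real states and moves.

```python
import math, random

def legal_moves(state):
    value, steps_left = state
    return [] if steps_left == 0 else [+1, -1]

def play(state, move):
    value, steps_left = state
    return (value + move, steps_left - 1)

def rollout(state):
    value, steps_left = state
    for _ in range(steps_left):          # random playout to the end
        value += random.choice([+1, -1])
    return value

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.total = [], 0, 0.0

    def ucb(self, c=1.4):                # exploration/exploitation balance
        if self.visits == 0:
            return float("inf")
        return self.total / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def mcts(root, iterations=2000):
    for _ in range(iterations):
        node = root
        while node.children:                             # 1. selection
            node = max(node.children, key=Node.ucb)
        for move in legal_moves(node.state):             # 2. expansion
            node.children.append(Node(play(node.state, move), parent=node))
        leaf = random.choice(node.children) if node.children else node
        result = rollout(leaf.state)                     # 3. simulation
        while leaf:                                      # 4. backpropagation
            leaf.visits += 1
            leaf.total += result
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).state

print(mcts(Node((0, 4))))  # tends toward the +1 branch: (1, 3)
```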
16/ Now Go, chess and tic-tac-toe are perfect-information games: the current game state is perfectly known to both players, or to anyone who looks at the board. John von Neumann once said, "Life is not like Go, where you can see all of the pieces on the board. Real life is full of bluffing, deception, and speculation about what is going on in the minds of others. This is the Game Theory that I study."
17/ Brown and Sandholm built a poker-playing AI called Libratus that decisively beat four leading human professionals at heads-up no-limit Texas hold'em (HUNL), the two-player variant of poker. Texas hold'em is an imperfect-information game: players bluff and make bets that are not reflective of their hands. Instead of speculating about the opponent's strategy, Libratus focused on a single tactic: counterfactual regret minimisation to approach a Nash equilibrium, using simulation, regret and iteration. Check this wonderful article to get an understanding of the inner workings of Libratus.
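To make "simulation, regret, iteration" concrete, here is the classic regret-matching loop on rock-paper-scissors rather than poker: a sketch of the core idea inside CFR-style solvers, not Libratus itself. The averaged strategies drift toward the uniform Nash equilibrium.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    return 0 if a == b else (1 if BEATS[a] == b else -1)

def strategy_from(regrets):
    # Play each action in proportion to its accumulated positive regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / 3] * 3

def train(iterations=50_000):
    regrets = [[0.0] * 3, [0.0] * 3]
    avg = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strats = [strategy_from(r) for r in regrets]       # simulation
        moves = [random.choices(range(3), s)[0] for s in strats]
        for p in range(2):
            me, opp = moves[p], moves[1 - p]
            got = payoff(ACTIONS[me], ACTIONS[opp])
            for a in range(3):                             # regret update:
                regrets[p][a] += payoff(ACTIONS[a], ACTIONS[opp]) - got
                avg[p][a] += strats[p][a]
    # Time-averaged strategies approach (1/3, 1/3, 1/3).   # iteration
    return [[x / iterations for x in row] for row in avg]

print(train())
```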
19/ Over nearly three weeks, Libratus played 120,000 hands of HUNL against the human professionals, using a three-pronged approach that included precomputing an overall strategy, adapting the strategy to actual gameplay, and learning from its opponent. Noam Brown applied the same techniques to develop Pluribus, and later Cicero.
In the conversational engine we use several techniques to retrieve and re-rank the search results. One is to use human preferences to train a gradient-boosting ranker such as XGBoost; another is a simple decision tree of if-then-else conditions. A pointwise sketch of the former:
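This is a sketch under stated assumptions, not our production ranker: scikit-learn's gradient boosting stands in for XGBoost, and the features (embedding similarity, keyword score, chunk length) and tiny toy dataset are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy (query, chunk) feature rows: [embedding similarity, keyword score,
# chunk length]; labels are human preference grades.
X_train = np.array([
    [0.91, 12.3, 180],
    [0.42,  3.1, 950],
    [0.77,  8.8, 300],
    [0.30,  1.2, 120],
])
y_train = np.array([1.0, 0.1, 0.8, 0.0])

ranker = GradientBoostingRegressor().fit(X_train, y_train)

def rerank(candidates: list[str], features: list[list[float]]) -> list[str]:
    scores = ranker.predict(np.array(features))
    order = np.argsort(-scores)          # highest predicted relevance first
    return [candidates[i] for i in order]

print(rerank(["chunk-a", "chunk-b"], [[0.4, 2.0, 500], [0.9, 11.0, 200]]))
```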
20/ Note the following common denominators:
- Precomputing the overall strategy
- Adapting strategy to ongoing gameplay
- Learning from the opponent’s strategy
21/ Precomputing the overall strategy is the equivalent of bounding the problem space. How do you play the opening moves in chess? Take some of the most common openings: the Sicilian Defence, the Ruy Lopez, the Queen's Gambit. If you know from past history that there is a set of questions, plus similar, adjacent or follow-up questions in the conversation tree, use your knowledge source to pre-generate questions and answers. Think of this as the equivalent of generating FAQs.
22/ Next, a chunking strategy that creates multiple levels of precomputed information. Break documents into paragraphs, indicated either by sections or separators, and pass each chunk through an LLM to generate questions based on the passage. Then pair each question with its chunk to generate embeddings. Also generate embeddings for just the chunks and just the questions, and keep them as separate stores. These varying layers of embeddings can drive a threshold-driven cache and precompute strategy, as in this sketch:
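A sketch of the layered precompute described above; `llm` and `embed` are hypothetical model calls, and splitting on blank lines stands in for a real section- and separator-aware chunker.

```python
def llm(prompt: str) -> list[str]:
    raise NotImplementedError   # hypothetical: generated questions, one per item

def embed(text: str):
    raise NotImplementedError   # hypothetical: returns a dense vector

def build_stores(document: str):
    chunk_store, question_store, pair_store = [], [], []
    for chunk in document.split("\n\n"):
        chunk_store.append((embed(chunk), chunk))
        for q in llm(f"Write FAQ questions answered by this passage:\n{chunk}"):
            question_store.append((embed(q), q, chunk))
            pair_store.append((embed(q + "\n" + chunk), q, chunk))
    # Three stores of varying granularity: the overlapping cheese slices.
    return chunk_store, question_store, pair_store
```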
23/ The mental model for this intuition of having multiple overlapping segments of information of varying lengths is the Swiss cheese model of system design. Our mechanism for preventing Covid was multi-pronged; in the same way, our ability to answer customer questions with the lowest latency, the lowest cost, the highest accuracy, the broadest range of topics and the best data freshness needs to be multi-pronged.
24/ The same technique can be applied to LLM caching as well. Data feedback loops that let the system adapt to the opponent's strategy are paramount here; in our case this is both a cost concern and a method to improve latency and factuality and to reduce hallucinations. Here is a concept of how this would work in a multi-user conversation flow.
25/ A 3-layered semantic cache:
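A hedged sketch of one way to layer it: exact string match first, then semantic similarity over embeddings, then a full LLM call as the last resort. `embed`, `llm` and the 0.92 threshold are assumptions, not fixed choices.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError   # hypothetical embedding call

def llm(prompt: str) -> str:
    raise NotImplementedError   # hypothetical model call

exact_cache: dict[str, str] = {}
semantic_cache: list[tuple[np.ndarray, str]] = []

def cached_answer(query: str, threshold: float = 0.92) -> str:
    if query in exact_cache:                      # layer 1: free lookup
        return exact_cache[query]
    q = embed(query)
    for vec, answer in semantic_cache:            # layer 2: near-duplicate hit
        sim = float(np.dot(q, vec) /
                    (np.linalg.norm(q) * np.linalg.norm(vec)))
        if sim >= threshold:
            return answer
    answer = llm(query)                           # layer 3: pay for inference
    exact_cache[query] = answer
    semantic_cache.append((q, answer))
    return answer
```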
26/ There are other interesting ideas that don't share much with gameplaying AI but are still relevant to adapting strategy based on ongoing gameplay, such as selecting parts of the prompt sent to the LLM based on the given user query, or at least on classes of user query.
27/ The same principles of augmenting the working memory of the large language model, which is scarce, expensive and mostly unreliable, apply to the analytical and abstract tasks that govern a conversation. Assume your friendly chatbot has to answer questions about a phone bill, a bank statement or a cloud invoice: there are explicit facts and implicit facts. For example, the bill may contain each day's data usage but not the average data usage per day.
28/ In these cases it is worth asking: what's easier, teaching the large language model mathematics, or teaching it to use a calculator with an 'average' button?
This is why Bard's implicit code execution and ChatGPT's Code Interpreter are exciting. Creating sandboxed code-execution environments with fast compute provisioning could be critical to unlocking the next wave of innovation. A toy version of the calculator idea:
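A toy sketch of the "calculator with an average button": the model is asked to emit a tool call instead of doing arithmetic itself. `llm()` is a hypothetical model call, and the one-line regex parser is for illustration only.

```python
import re

def llm(prompt: str) -> str:
    raise NotImplementedError   # hypothetical: replies e.g. "average(2.1, 3.4)"

TOOLS = {
    "average": lambda *xs: sum(xs) / len(xs),
    "total":   lambda *xs: sum(xs),
}

def answer_with_tools(question: str, daily_usage: list[float]) -> str:
    reply = llm(f"Question: {question}\nDaily usage: {daily_usage}\n"
                f"Respond with one call to a tool in {list(TOOLS)}.")
    match = re.match(r"(\w+)\((.*)\)", reply.strip())
    name = match.group(1)
    args = [float(x) for x in match.group(2).split(",") if x.strip()]
    return f"{name} = {TOOLS[name](*args)}"  # the tool, not the model, computes
```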
29/ A simple way to understand this intuition: a code-generator model that can iteratively improve its responses without fine-tuning, through an internal monologue with the terminal's output, where it executes the code before sending it back to the user who sent the query. A sketch of that loop:
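A sketch of the execute-and-retry loop, assuming a hypothetical `llm()` call; a real deployment would run this inside a sandbox rather than on the host interpreter.

```python
import subprocess, sys, tempfile

def llm(prompt: str) -> str:
    raise NotImplementedError   # hypothetical: returns candidate Python source

def generate_working_code(task: str, attempts: int = 3) -> str:
    feedback = ""
    for _ in range(attempts):
        code = llm(f"Write a Python program for: {task}\n{feedback}")
        with tempfile.NamedTemporaryFile("w", suffix=".py",
                                         delete=False) as f:
            f.write(code)
            path = f.name
        run = subprocess.run([sys.executable, path],
                             capture_output=True, text=True, timeout=10)
        if run.returncode == 0:
            return code              # ran cleanly; send it back to the user
        # Feed the traceback back in: the "internal monologue".
        feedback = f"Previous attempt failed with:\n{run.stderr}\nFix it."
    raise RuntimeError("no working program within the attempt budget")
```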
Who knows, perhaps the next big leap is understanding the world through code execution and tools. I'm not the only person who's thinking about this.
To end the article, here's the AI meme of the year. Leave your comments below.