Monte Carlo Tree Search
Yeshwanth Nagaraj
Democratizing Math and Core AI // Levelling playfield for the future
Monte Carlo Tree Search (MCTS) is an algorithm for finding optimal decisions in a given domain by incrementally building a search tree from random sampling of the decision space.
MCTS has gained widespread attention because of its simplicity and effectiveness, particularly in scenarios where the decision space is too large to exhaustively search. It is also a suitable choice for problems where the transition model and reward function are not known or too complex to describe with a simple formula.
How MCTS Works
The MCTS algorithm consists of four main steps repeated multiple times:
- Selection: Starting at the root node (representing the current game state), a tree policy guides the selection of child nodes down to a leaf node. The tree policy is typically based on a balance between exploiting nodes with high average reward and exploring less-visited nodes.
- Expansion: Once a leaf node is reached, one or more child nodes are added to expand the search tree, based on the available actions.
- Simulation: A simulation is run from the new leaf node according to a default policy to produce an outcome. This step is also called a "rollout."
- Backpropagation: The result of the rollout is backpropagated up the tree to update the information stored in the nodes visited during the selection phase.
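A common concrete tree policy for the selection step is UCB1, the rule behind the widely used UCT variant of MCTS. The sketch below is a minimal illustration, assuming each node tracks a cumulative value and a visit count; the exploration constant `c` is a tunable parameter (sqrt(2) ≈ 1.41 is a common default), not something prescribed by MCTS itself:

```python
import math

def ucb1(child_value, child_visits, parent_visits, c=1.41):
    """UCB1 score: average reward plus an exploration bonus.

    Higher `c` favors exploration; unvisited children get
    infinite priority so every action is tried at least once.
    """
    if child_visits == 0:
        return float("inf")
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore
```

During selection, the parent would pick the child maximizing this score, which naturally shifts from exploration to exploitation as visit counts grow.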
After a predefined number of iterations (or when a time budget is exhausted), the algorithm returns the action leading to the most promising child of the root, typically the one with the highest visit count.
Example Python Code
Here's a simplified Python snippet illustrating a very basic MCTS. Note that this example is minimal and doesn't include the optimizations you might find in a full-fledged implementation.
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0

def rollout(node):
    # Simulate a game from this node to a terminal state and return the reward.
    # For now, return a random reward.
    return random.choice([-1, 1])

def select(node):
    # Tree policy: select a child node from the given node.
    # For simplicity, choose a random child.
    return random.choice(node.children) if node.children else None

def expand(node):
    # Generate a child node and add it to the children of the current node.
    # Here the state is an integer and the action is to add 1 or 2.
    new_state = node.state + random.choice([1, 2])
    child = Node(new_state, parent=node)
    node.children.append(child)
    return child

def backpropagate(node, reward):
    # Propagate the reward and visit count back up to the root.
    while node:
        node.visits += 1
        node.value += reward
        node = node.parent

def mcts(root, iterations=1000):
    for _ in range(iterations):
        # Selection: walk down the tree until a leaf is reached.
        node = root
        while node.children:
            node = select(node)
        # Expansion and simulation: roll out directly from unvisited
        # leaves; expand previously visited ones first.
        if node.visits == 0:
            reward = rollout(node)
        else:
            node = expand(node)
            reward = rollout(node)
        # Backpropagation: update statistics along the visited path.
        backpropagate(node, reward)

# Example usage
root = Node(state=0)
mcts(root)
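Once the search finishes, the recommended move is read off the root's statistics. A minimal sketch, assuming (as in the example above) that each node exposes `children` and `visits` attributes; preferring the most-visited child is a common, robust choice:

```python
def best_child(root):
    # Recommend the most-visited child of the root; visit counts are
    # usually a more stable signal than raw average values.
    if not root.children:
        return None
    return max(root.children, key=lambda c: c.visits)
```

With the example above, calling `best_child(root).state` after `mcts(root)` gives the state reached by the most-explored action.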