Reinforcement Learning: Monte Carlo Method
Baijayanta Roy
Key characteristics of the Monte Carlo (MC) method:
There is no model (the agent does not know the MDP state transitions)
The agent learns from sampled experience
The agent learns the state value v(s) under policy π by averaging the returns observed from s across all sampled episodes (value = average return)
Values are updated only after an episode is complete, which makes convergence slow
There is no bootstrapping
It can only be used in episodic problems, because a return is only known once the episode terminates
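
The points above can be illustrated with a minimal sketch of first-visit MC prediction in Python. This is one common variant, not necessarily the one the author has in mind. The names first_visit_mc_prediction and sample_episode, the five-state random walk, and its reward of 1 for leaving on the right are all assumptions made for illustration. Each episode is a list of (state, reward) pairs, where the reward is the one received on leaving that state.

```python
import random
from collections import defaultdict


def sample_episode(rng, n_states=5):
    """Sample one episode from a hypothetical random walk (illustration only).

    The agent starts in the middle state and moves left or right with equal
    probability. The episode ends when it steps off either end; stepping off
    the right end yields reward 1, every other step yields reward 0.
    """
    state = n_states // 2
    episode = []
    while 0 <= state < n_states:
        next_state = state + rng.choice((-1, 1))
        reward = 1.0 if next_state == n_states else 0.0
        episode.append((state, reward))
        state = next_state
    return episode


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate v(s) as the average return following the first visit to s."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        # Record the time step at which each state is first visited.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)

        # Walk backwards through the finished episode, accumulating the
        # return G_t = r_t + gamma * G_{t+1}. No value estimate of a
        # successor state is used, so there is no bootstrapping.
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = reward + gamma * g
            if first_visit[state] == t:
                returns_sum[state] += g
                returns_count[state] += 1

    # value = average return
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


if __name__ == "__main__":
    rng = random.Random(0)
    episodes = [sample_episode(rng) for _ in range(5000)]
    values = first_visit_mc_prediction(episodes)
    for state in sorted(values):
        print(state, round(values[state], 3))
```

The function only ever sees complete episodes and never queries a transition model, which mirrors the "no model" and "update after a complete episode" points. For this particular random walk the true values are (s + 1) / 6, so the estimates should approach those numbers as more episodes are sampled.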