toyproblems.xyz

  • Value Estimation with Monte Carlo Methods: Learning from Experience

    We are now going to tackle for the first time the challenge to determine the value function when we don’t have access to a model. This will be very useful to move on to more complex environments where the interactions are much more complex than the Corridor example we have been working with so far.…

  • Understanding the Value Function in Reinforcement Learning: A Corridor Example

    Value functions are a fundamental concept in Reinforcement Learning (RL). A solid grasp of value functions is essential for understanding more advanced RL algorithms. In this post, we explore value functions through a simple, custom environment to make their core ideas intuitive and accessible. We also provide the python code to replicate our results here.…

  • Finding a Reinforcement Learning Policy with a Markov Decision Process: Generalized Policy Iteration (GPI)

    How to find a policy when you have a model of your Markov Decison Process (MDP)? There is a number of methods to do this that all fall under the umbrella of Generalized Policy Iteration (GPI). Here we will go through the two most notable methods – Policy Iteration and Value Iteration – and we…


Newsletter

Subscribe to stay up-to-date with my latest creative projects, insights, and tips.

I consent to use of my email address for the purpose of receiving newsletters as described in Privacy Policy, which I have read. I may withdraw my consent at any time.