The Markov Decision Process (MDP)
- Overview
A Markov Decision Process (MDP) is a mathematical framework that models decision-making for dynamic systems in which outcomes are partly random and partly under the control of a decision maker.
MDPs are discrete-time stochastic control processes: they model sequential decision-making in discrete, stochastic environments. The model centers on a decision maker, or agent, who inhabits an environment whose state changes randomly in response to the agent's actions.
MDPs consist of four essential elements: states, actions, a transition model (the probability of reaching each next state given the current state and action), and rewards.
The agent's goal is to learn a policy, a mapping that dictates the action to be taken in each state, so as to maximize the expected cumulative reward.
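As a minimal sketch of these four elements and a policy, the snippet below defines a toy two-state MDP and rolls out a fixed policy for a few steps. All state names, actions, transition probabilities, and rewards here are illustrative assumptions, not taken from the article.

```python
import random

random.seed(0)  # fixed seed so the rollout is reproducible

# Toy two-state MDP (illustrative numbers only).
# Transition model: P[state][action] = [(probability, next_state, reward), ...]
P = {
    "low":  {"wait": [(1.0, "low", 0.0)],
             "work": [(0.7, "high", 1.0), (0.3, "low", 0.0)]},
    "high": {"wait": [(1.0, "high", 2.0)],
             "work": [(1.0, "low", 0.5)]},
}

# A deterministic policy maps each state to an action.
policy = {"low": "work", "high": "wait"}

def step(state, action):
    """Sample one transition; returns (next_state, reward)."""
    r = random.random()
    cum = 0.0
    for prob, nxt, reward in P[state][action]:
        cum += prob
        if r < cum:
            return nxt, reward
    return nxt, reward  # floating-point edge case fallback

# Roll the policy out for 10 steps and accumulate reward.
state, total = "low", 0.0
for _ in range(10):
    state, reward = step(state, policy[state])
    total += reward
print(state, total)
```

The rollout shows the agent/environment loop in miniature: the policy picks an action, the transition model samples the next state, and the reward is accumulated.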
MDPs underlie most reinforcement learning (RL) problem formulations. Real-world examples of MDPs include harvesting, agriculture, and water-resource management.
The Bellman equation is a fundamental recursive equation in dynamic programming and reinforcement learning that defines the optimal value function of an MDP. The equation is named after Richard Bellman, who first proposed it in the 1950s.
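To make this concrete, the sketch below runs value iteration, which repeatedly applies the Bellman optimality update V(s) = max_a Σ_s' P(s'|s,a)·[R(s,a,s') + γ·V(s')] to a toy two-state MDP until the values converge, then reads off a greedy policy. The MDP itself (states, actions, probabilities, rewards) is an illustrative assumption.

```python
# Toy MDP (illustrative numbers): P[state][action] = [(prob, next_state, reward), ...]
P = {
    "low":  {"wait": [(1.0, "low", 0.0)],
             "work": [(0.7, "high", 1.0), (0.3, "low", 0.0)]},
    "high": {"wait": [(1.0, "high", 2.0)],
             "work": [(1.0, "low", 0.5)]},
}
gamma = 0.9  # discount factor

# Value iteration: apply the Bellman optimality update until convergence.
V = {s: 0.0 for s in P}
for _ in range(1000):
    new_V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        )
        for s in P
    }
    done = max(abs(new_V[s] - V[s]) for s in P) < 1e-10
    V = new_V
    if done:
        break

# Extract the greedy (optimal) policy from the converged values.
policy = {
    s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
    for s in P
}
print(V, policy)
```

Because the update is a contraction (for γ < 1), the loop converges to the unique optimal value function; here the optimal policy works in the low-reward state and waits in the high-reward state.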
[More to come ...]