The Markov Decision Process (MDP)
- Overview
A Markov Decision Process (MDP) is a mathematical framework for modeling decision-making problems in which an agent interacts with an environment by taking actions in different states, receives rewards for those actions, and aims to maximize the cumulative reward over time. It is the foundational model behind Reinforcement Learning (RL), a machine learning technique in which an agent learns to make optimal decisions in such an environment by trial and error, guided by the rewards it receives in each state.
The core principle of an MDP is the "Markov property," which states that the next state of the system depends only on the current state and the action taken, not on the entire history of previous states. For example, in a board game the probability distribution over the next position depends only on the current position and the chosen move, not on how the game reached that position.
Most reinforcement learning (RL) problems can be formalized as MDPs.
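To make the interaction loop concrete, here is a minimal Python sketch of an agent acting in an environment by trial and error. The GridEnv class and its reset/step methods are hypothetical stand-ins invented for this illustration, and the random action choice is only a placeholder for a learned policy.

```python
import random

# Minimal agent-environment loop: the agent observes a state, takes an action,
# and receives a reward. GridEnv is a hypothetical toy environment invented for
# this sketch (a 1-D corridor of states 0..4; reaching state 4 ends the episode).
class GridEnv:
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # The next state depends only on the current state and action (Markov property).
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

env = GridEnv()
state = env.reset()
total_reward = 0.0
for t in range(20):                   # interact for at most 20 time steps
    action = random.choice([-1, +1])  # naive trial and error: pick a move at random
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
print("cumulative reward:", total_reward)
```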
- Main Components of an MDP
A Markov Decision Process (MDP) is a mathematical framework for modeling decision-making in dynamic systems where outcomes are partly random and partly under the control of a decision maker.
MDPs are discrete-time stochastic control processes: they model decision making in discrete, stochastic, sequential environments. The model centers on a decision maker, or agent, who inhabits an environment whose state changes stochastically in response to the agent's actions.
MDPs consist of four essential elements: States, Actions, Transition probabilities (the model), and Rewards. The agent's goal is to learn a policy that dictates the action to take in each state so as to maximize cumulative reward.
- States (S): Different possible situations or configurations the agent can be in.
- Actions (A): The choices the agent can make in a given state.
- Rewards (R): The feedback the agent receives after taking an action in a state.
- Transition probabilities (P): The probability of moving from one state to another after taking a specific action.
- Goal in RL: The goal of RL is to learn a "policy," which maps each state to the best action to take in order to maximize the expected cumulative reward over time (a minimal worked sketch follows this list).
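To make these four components and the notion of a policy concrete, the sketch below writes out a tiny two-state MDP as plain Python dictionaries and applies value iteration to extract a greedy policy. The state names, actions, probabilities, rewards, and discount factor are invented example values, not taken from the text above.

```python
# A tiny MDP written out explicitly: states S, actions A,
# transition probabilities P[s][a] -> {next_state: prob}, and rewards R[s][a].
# All numbers here are illustrative.
S = ["idle", "working"]
A = ["rest", "work"]

P = {
    "idle":    {"rest": {"idle": 1.0},
                "work": {"working": 0.9, "idle": 0.1}},
    "working": {"rest": {"idle": 1.0},
                "work": {"working": 0.8, "idle": 0.2}},
}
R = {
    "idle":    {"rest": 0.0, "work": -1.0},
    "working": {"rest": 0.0, "work": 2.0},
}
gamma = 0.9  # discount factor for future rewards

# Value iteration: repeatedly apply the Bellman optimality update
#   V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) * V(s') ]
V = {s: 0.0 for s in S}
for _ in range(100):
    V = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                for a in A)
         for s in S}

# The policy maps each state to the action with the highest expected return.
policy = {s: max(A, key=lambda a: R[s][a] + gamma *
                 sum(p * V[s2] for s2, p in P[s][a].items()))
          for s in S}
print(policy)  # e.g. {'idle': 'work', 'working': 'work'}
```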
[More to come ...]