The Markov Decision Process (MDP)
- Overview
A Markov Decision Process (MDP) is a mathematical framework that models decision-making for dynamic systems in which outcomes are partly random and partly under the control of a decision maker.
MDPs are discrete-time stochastic control processes: they model sequential decision-making in discrete, stochastic environments. The model centers on a decision maker, or agent, who inhabits an environment whose state changes randomly in response to the agent's actions.
MDPs consist of four essential elements: states, actions, a transition model (the probability of reaching each next state given the current state and action), and rewards.
The agent's goal is to learn a policy, a mapping that dictates the action to be taken in each state, so as to maximize the expected cumulative reward.
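As a minimal sketch of these four elements and a policy, the snippet below defines a toy two-state MDP and rolls out a fixed policy for a few steps. All state names, actions, transition probabilities, and rewards here are illustrative assumptions, not taken from the article.

```python
import random

random.seed(0)  # fixed seed so the rollout is reproducible

# Toy two-state MDP (illustrative numbers only).
# Transition model: P[state][action] = [(probability, next_state, reward), ...]
P = {
    "low":  {"wait": [(1.0, "low", 0.0)],
             "work": [(0.7, "high", 1.0), (0.3, "low", 0.0)]},
    "high": {"wait": [(1.0, "high", 2.0)],
             "work": [(1.0, "low", 0.5)]},
}

# A deterministic policy maps each state to an action.
policy = {"low": "work", "high": "wait"}

def step(state, action):
    """Sample one transition; returns (next_state, reward)."""
    r = random.random()
    cum = 0.0
    for prob, nxt, reward in P[state][action]:
        cum += prob
        if r < cum:
            return nxt, reward
    return nxt, reward  # floating-point edge case fallback

# Roll the policy out for 10 steps and accumulate reward.
state, total = "low", 0.0
for _ in range(10):
    state, reward = step(state, policy[state])
    total += reward
print(state, total)
```

The rollout shows the agent/environment loop in miniature: the policy picks an action, the transition model samples the next state, and the reward is accumulated.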
MDPs underlie most reinforcement learning (RL) problem formulations. Real-world examples of MDPs include harvesting, agriculture, and water-resource management.
The Bellman equation is a fundamental recursive equation in dynamic programming and reinforcement learning that defines the optimal value function of an MDP. The equation is named after Richard Bellman, who first proposed it in the 1950s.
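To make this concrete, the sketch below runs value iteration, which repeatedly applies the Bellman optimality update V(s) = max_a Σ_s' P(s'|s,a)·[R(s,a,s') + γ·V(s')] to a toy two-state MDP until the values converge, then reads off a greedy policy. The MDP itself (states, actions, probabilities, rewards) is an illustrative assumption.

```python
# Toy MDP (illustrative numbers): P[state][action] = [(prob, next_state, reward), ...]
P = {
    "low":  {"wait": [(1.0, "low", 0.0)],
             "work": [(0.7, "high", 1.0), (0.3, "low", 0.0)]},
    "high": {"wait": [(1.0, "high", 2.0)],
             "work": [(1.0, "low", 0.5)]},
}
gamma = 0.9  # discount factor

# Value iteration: apply the Bellman optimality update until convergence.
V = {s: 0.0 for s in P}
for _ in range(1000):
    new_V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        )
        for s in P
    }
    done = max(abs(new_V[s] - V[s]) for s in P) < 1e-10
    V = new_V
    if done:
        break

# Extract the greedy (optimal) policy from the converged values.
policy = {
    s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
    for s in P
}
print(V, policy)
```

Because the update is a contraction (for γ < 1), the loop converges to the unique optimal value function; here the optimal policy works in the low-reward state and waits in the high-reward state.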
[More to come ...]