Main Categories of RL Methods
- (Interlaken, Switzerland - Alvin Wei-Cheng Wong)
- Overview
Reinforcement learning (RL) is a machine learning (ML) technique that trains software to make decisions that achieve the best results over time. It is a sub-field of ML that allows AI-based systems to take actions in a dynamic environment.
RL is based on rewarding desired behaviors and penalizing undesired ones. It is a learning paradigm for optimizing sequential decisions, such as daily stock replenishment decisions.
In RL, the entity being trained, called the RL agent, perceives and interprets its environment, takes actions, and learns through trial and error.
The agent is trained to make a sequence of decisions. It receives either a reward or a penalty for each action it performs, and its goal is to maximize the total (cumulative) reward.
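As a rough illustration of this loop, the sketch below (Python, with purely hypothetical names and payout probabilities) pairs a two-armed bandit environment with an agent that simply picks arms at random: the agent acts, the environment returns a reward, and the cumulative reward is tracked over 100 steps.

```python
import random

class TwoArmedBandit:
    """Hypothetical environment: two slot-machine arms with different payout odds."""
    def step(self, action):
        payout_prob = 0.3 if action == 0 else 0.7   # arm 1 pays off more often than arm 0
        return 1.0 if random.random() < payout_prob else 0.0

class RandomAgent:
    """Agent that explores purely by trial and error (no learning yet)."""
    def act(self):
        return random.choice([0, 1])

env = TwoArmedBandit()
agent = RandomAgent()
total_reward = 0.0
for step in range(100):
    action = agent.act()        # the agent takes an action ...
    reward = env.step(action)   # ... the environment returns a reward ...
    total_reward += reward      # ... and the goal is to maximize the running total
print("cumulative reward after 100 steps:", total_reward)
```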
RL mimics the trial-and-error learning process that humans use to achieve their goals. RL methods fall into three main categories:
- Policy-based RL: Learns a policy (a mapping from states to actions, deterministic or stochastic) directly and adjusts it to maximize the cumulative reward
- Value-based RL: Learns a value function that estimates the expected cumulative reward of states or state-action pairs, and derives its behavior by acting greedily with respect to that function
- Model-based RL: Learns or is given a model of the environment's dynamics, and the agent uses that model to plan and choose actions within the environment's constraints
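To make the value-based category concrete, here is a minimal tabular Q-learning sketch on a hypothetical five-state corridor (all names and hyperparameters are illustrative, not taken from any particular library). The agent learns a table of state-action values and acts greedily with respect to it; a policy-based method would instead parameterize the policy directly, and a model-based method would learn the step function itself and plan with it.

```python
import random

# Hypothetical corridor: states 0..4, goal at state 4, actions 0 = left, 1 = right.
N_STATES, GOAL, ACTIONS = 5, 4, [0, 1]
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}  # state-action value table

def step(state, action):
    """Environment dynamics: move left/right, reward 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit current value estimates, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted best next value.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# Greedy policy read off the learned values (should point right toward the goal).
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```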
Please refer to the following for more information:
- Wikipedia: Reinforcement Learning
- The Three Approaches of RL
The three approaches to RL are value-based, policy-based, and model-based learning. Important terms used across these methods include the agent, the state, the action, the reward, the environment, the value function, and the model of the environment.
The goal of an RL agent is to choose its actions in such a way that the cumulative reward is maximized. The action with the best immediate reward may therefore not be the best decision in the long run; in other words, the greedy approach may not be optimal.
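A small worked example of this point, using an illustrative discount factor of 0.9: the discounted return is G = r_0 + gamma*r_1 + gamma^2*r_2 + ..., and a path that forgoes the immediate reward for a larger delayed one can still come out ahead.

```python
gamma = 0.9  # discount factor (illustrative value)

def discounted_return(rewards, gamma):
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

greedy_path = [1.0, 0.0, 0.0, 0.0]    # grab a small reward now, nothing afterwards
patient_path = [0.0, 0.0, 0.0, 10.0]  # forgo the immediate reward for a larger delayed one

print(discounted_return(greedy_path, gamma))   # 1.0
print(discounted_return(patient_path, gamma))  # 10 * 0.9**3 = 7.29, better despite the wait
```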
In RL, an agent takes actions in an environment with the aim of maximizing its cumulative reward; desired behavior is rewarded and undesired behavior is penalized. Instead of one input producing one fixed output, the algorithm considers multiple possible outputs and is trained to choose the right one based on the current state of the environment.
Through trial-and-error interactions with a dynamic environment, the agent learns to make a sequence of decisions that maximizes the reward metric for the task, without human intervention and without being explicitly programmed to complete the task.
[More to come ...]