Reinforcement Learning Concepts
Agent
The entity that makes decisions and interacts with the environment to achieve certain goals.
Environment
The external system the agent interacts with, which provides states and rewards as feedback.
State
A representation of the situation that the agent is in at a specific time. The state space includes all possible states.
Action
A specific move or decision made by the agent that affects the state of the environment.
Reward
A scalar feedback signal from the environment that quantifies how good the agent's most recent action was.
Policy
A strategy followed by the agent to determine its actions based on the current state.
Value Function
A function that estimates how good it is for the agent to be in a given state, measured as the expected cumulative (typically discounted) future reward when following a given policy.
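One common way to make "future reward" concrete is the discounted return of a trajectory; a state's value is then the expectation of that return. The sketch below computes the return for a single sampled reward sequence. The function name and the default `gamma` are illustrative assumptions, not part of the flashcard set.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for one reward sequence.

    `gamma` (the discount factor) is an assumed parameter, 0 <= gamma <= 1.
    """
    total = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

Averaging this quantity over many trajectories that start in the same state gives an estimate of that state's value.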
Q-value (Action-Value Function)
A function that estimates the value of taking a particular action from a particular state, based on expected future rewards.
Exploration vs. Exploitation
The dilemma of choosing between exploring new actions to gather more information about the environment and exploiting the actions currently believed to yield the highest reward.
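A widely used way to balance the two is an epsilon-greedy rule: act randomly with a small probability and greedily otherwise. This is a minimal sketch, assuming the agent's current estimates are held in a plain list `q_values` indexed by action; the function name and parameters are illustrative.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    With probability `epsilon` a uniformly random action is chosen (explore);
    otherwise the action with the highest estimate is chosen (exploit).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

Setting `epsilon=0.0` gives pure exploitation and `epsilon=1.0` gives pure exploration; in practice the value is often decayed over time.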
Reinforcement Learning
A type of machine learning where an agent learns to make decisions by performing actions and receiving rewards or penalties.
Markov Decision Process (MDP)
A mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker.
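As an illustration, a small MDP can be written down as a table of transition probabilities and rewards. The two-state example below is entirely hypothetical (state names, actions, and numbers are made up); it only shows how an outcome can depend partly on the chosen action and partly on chance.

```python
import random

# Hypothetical MDP: TRANSITIONS[state][action] is a list of
# (probability, next_state, reward) outcomes.
TRANSITIONS = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "hot", 2.0)],
    },
    "hot": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(1.0, "hot", -10.0)],
    },
}


def step(state, action, rng=random):
    """Sample (next_state, reward) for taking `action` in `state`."""
    outcomes = TRANSITIONS[state][action]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward
```

Here the agent controls the action ("slow" or "fast"), while the environment decides randomly which outcome follows.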
Temporal Difference (TD) Learning
A method where the value function is updated incrementally based on the difference between consecutive predictions, without waiting for a final outcome.
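The simplest instance is the TD(0) update, which nudges the estimate for the current state toward the observed reward plus the discounted estimate for the next state. The sketch assumes values are stored in a dict keyed by state; `alpha` (step size) and `gamma` (discount factor) are assumed parameters.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to `values` in place and return the TD error."""
    # Difference between the new one-step prediction and the old prediction.
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error
```

Because the update uses only one transition, it can be applied after every step rather than at the end of an episode.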
Monte Carlo Methods
A class of computational algorithms that rely on repeated random sampling to obtain numerical results. In reinforcement learning, they estimate value functions by averaging the returns observed over many complete sampled episodes.
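A minimal sketch of Monte Carlo value estimation, using the first-visit variant. It assumes each episode is recorded as a list of `(state, reward)` pairs, where the reward is the one received after leaving that state; this data layout and the function name are illustrative assumptions.

```python
from collections import defaultdict


def mc_value_estimate(episodes, gamma=1.0):
    """First-visit Monte Carlo estimate of state values from complete episodes."""
    returns = defaultdict(list)
    for episode in episodes:
        g = 0.0
        first_visit_return = {}
        # Walk backwards, accumulating the return that follows each step.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            # Earlier visits are processed later and overwrite later visits,
            # so the stored value ends up being the first-visit return.
            first_visit_return[state] = g
        for state, g_first in first_visit_return.items():
            returns[state].append(g_first)
    # The value estimate is the average sampled return per state.
    return {state: sum(gs) / len(gs) for state, gs in returns.items()}
```

Unlike temporal difference updates, this estimate can only be computed once each episode has finished.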
Bellman Equation
A recursive relationship that expresses the value of a state under a policy as the expected immediate reward plus the discounted value of the successor states, providing the basis for dynamic programming approaches.
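The recursion can be turned directly into an algorithm by applying it repeatedly until the values stop changing (iterative policy evaluation). The sketch below assumes a deterministic policy and a transition table of the form `transitions[state][action] -> [(probability, next_state, reward), ...]`; both the layout and the fixed number of sweeps are illustrative choices.

```python
def evaluate_policy(transitions, policy, gamma=0.9, sweeps=200):
    """Estimate state values for a fixed policy by repeated Bellman backups."""
    values = {state: 0.0 for state in transitions}
    for _ in range(sweeps):
        # Each new value is the expected reward plus the discounted value
        # of the successor state, computed from the previous sweep's values.
        values = {
            state: sum(
                prob * (reward + gamma * values[next_state])
                for prob, next_state, reward in transitions[state][policy[state]]
            )
            for state in transitions
        }
    return values
```

For example, a single state that loops to itself with reward 1 and `gamma=0.5` converges to a value of 2, the solution of V = 1 + 0.5 V.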
Deep Reinforcement Learning
Combines deep neural networks with reinforcement learning principles to learn policies directly from high-dimensional sensory inputs.
© Hypatia.Tech. 2024 All rights reserved.