
Reinforcement Learning Concepts
15 Flashcards
Environment
The external system the agent interacts with, which provides states and rewards as feedback.
Monte Carlo Methods
A class of computational algorithms that rely on repeated random sampling to obtain numerical results. In reinforcement learning, they estimate value functions by averaging the returns observed over complete sampled episodes.
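In reinforcement learning, the random-sampling idea is commonly applied by averaging the discounted returns of sampled episodes. The following is a minimal sketch, assuming each episode is a list of rewards collected from the same starting state; the function names and the sample data are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], accumulated from the last step backward."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def mc_value_estimate(episodes, gamma=0.9):
    """Average the discounted returns of all sampled episodes."""
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


# Hypothetical reward sequences from two episodes starting in the same state.
episodes = [[1.0, 0.0, 2.0], [0.0, 0.0, 1.0]]
estimate = mc_value_estimate(episodes, gamma=0.5)
```

With more sampled episodes, the average tends toward the expected return of the starting state under the policy that generated them.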
Temporal Difference (TD) Learning
A method where the value function is updated incrementally based on the difference between consecutive predictions, without waiting for a final outcome.
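The simplest form of this idea is often written as the TD(0) update. The sketch below assumes a tabular value function stored in a dictionary; the name `td0_update`, the state labels, and the step-size and discount values are illustrative assumptions.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Move values[state] toward the one-step target reward + gamma * values[next_state]."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Hypothetical two-state example: one observed transition A -> B with reward 0.5.
values = {"A": 0.0, "B": 1.0}
error = td0_update(values, "A", reward=0.5, next_state="B", alpha=0.5, gamma=0.5)
```

The update uses the current estimate for the next state as a stand-in for the rest of the return, which is why no final outcome is needed.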
Deep Reinforcement Learning
Combines deep neural networks with reinforcement learning principles to learn policies directly from high-dimensional sensory inputs.
State
A representation of the situation that the agent is in at a specific time. The state space is the set of all possible states.
Bellman Equation
A recursive equation that expresses the value of a state under a policy in terms of the immediate reward and the values of its successor states, providing the basis for dynamic programming approaches.
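For reference, one common way to write this recursion for the state-value function is shown below. The notation is an assumption, since the card defines no symbols: \(\pi(a \mid s)\) is the policy, \(P(s' \mid s, a)\) the transition probability, \(R(s, a, s')\) the reward, and \(\gamma\) the discount factor.

```latex
\[
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma \, V^{\pi}(s') \right]
\]
```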
Reinforcement Learning
A type of machine learning where an agent learns to make decisions by performing actions and receiving rewards or penalties.
Policy
A strategy followed by the agent to determine its actions based on the current state.
Action
A specific move or decision made by the agent that affects the state of the environment.
Value Function
A function that estimates how good it is for the agent to be in a given state, in terms of expected future rewards.
Agent
The entity that makes decisions and interacts with the environment to achieve certain goals.
Exploration vs. Exploitation
The dilemma of choosing between exploring new actions to gain more information about the environment and exploiting known actions to maximize reward based on current knowledge.
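A widely used heuristic for balancing the two is epsilon-greedy action selection, which the card itself does not name. The sketch below assumes action-value estimates are held in a list indexed by action; `epsilon_greedy` and the sample estimates are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action (explore); otherwise pick the best-known one (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical estimates for three actions; a seeded generator keeps runs repeatable.
rng = random.Random(0)
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng)
```

A larger `epsilon` explores more often; setting it to 0 makes the agent purely greedy.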
Markov Decision Process (MDP)
A mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker.
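As an illustration, a small MDP can be stored as a table of transitions, with the agent choosing the action and chance choosing the outcome. The states, actions, probabilities, and rewards below are invented for this sketch.

```python
import random

# transitions[state][action] -> list of (probability, next_state, reward)
transitions = {
    "cool": {
        "run": [(0.5, "cool", 2.0), (0.5, "hot", 2.0)],
        "rest": [(1.0, "cool", 1.0)],
    },
    "hot": {
        "run": [(1.0, "hot", -1.0)],
        "rest": [(1.0, "cool", 0.0)],
    },
}


def step(state, action, rng):
    """Sample one outcome for the chosen action according to its probabilities."""
    outcomes = transitions[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


rng = random.Random(0)
next_state, reward = step("cool", "run", rng)
```

Here the decision maker controls `action`, while the sampled outcome captures the part that is random.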
Reward
The feedback from the environment that quantifies the success of an agent's actions.
Q-value (Action-Value Function)
A function that estimates the value of taking a particular action in a particular state, based on the expected future rewards that follow.
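One common way to learn such estimates is the tabular Q-learning update, which is not named in this set and is shown here only as an example. The sketch assumes Q-values are stored in a dictionary keyed by `(state, action)`; all names and numbers are hypothetical.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Move q[(state, action)] toward reward + gamma * best Q-value in next_state."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


# Unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
actions = ["left", "right"]
q[("s1", "right")] = 2.0
q_learning_update(q, "s0", "right", reward=1.0, next_state="s1",
                  actions=actions, alpha=0.5, gamma=0.5)
```

After the call, the estimate for taking `"right"` in `"s0"` has moved halfway from 0.0 toward the target of 2.0.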
© Hypatia.Tech. 2024 All rights reserved.