2️⃣ SARSA
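SARSA is a popular on-policy reinforcement learning algorithm. Its name stands for State-Action-Reward-State-Action, after the quintuple (S, A, R, S', A') that each update uses: the current state and action, the reward received, the next state, and the action the agent chooses in that next state. The algorithm estimates the action-value function Q(s, a) of the policy the agent is currently following, that is, the expected return from taking action a in state s and following that same policy afterwards.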
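In standard notation (the symbols below are conventional choices, not taken from the original text), the action-value function of a policy π, with discount factor γ ∈ [0, 1] and reward R, can be written as:

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s,\ A_t = a\right]
```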
SARSA is similar to Q-Learning, another popular reinforcement learning algorithm, but the two differ in which policy they evaluate. Q-Learning is off-policy: it estimates the action-value function of the optimal (greedy) policy, regardless of the exploratory policy the agent actually uses to act. SARSA is on-policy: it estimates the action-value function of the policy the agent is following, exploration included.
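The main idea behind SARSA is to update the Q-table using the value of the action the agent will actually take in the next state, rather than the action with the highest Q-value there (which is what Q-Learning uses). Because that next action comes from the same policy, exploratory moves included (for example, under ε-greedy action selection), the learned values reflect how the current policy really behaves and shift as the policy changes. This is why, in the classic cliff-walking task, SARSA tends to learn a safer route away from the cliff edge than Q-Learning does.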
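The difference is easiest to see in the two update rules side by side. Here α is the step size, γ the discount factor, and A_{t+1} is the action the current policy (for example ε-greedy) actually selects in S_{t+1}; if S_{t+1} is terminal, its Q-value is taken to be 0. The notation is the conventional one, assumed rather than taken from the original text:

```latex
\begin{aligned}
\text{SARSA:}\quad & Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\bigl[R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)\bigr] \\
\text{Q-Learning:}\quad & Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\bigl[R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t)\bigr]
\end{aligned}
```

The only change is the bootstrap term: SARSA uses the value of the action it will take next, while Q-Learning uses the best available value whether or not the agent takes that action.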
Example
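As an illustration with made-up numbers, consider a single update where Q(s, a) = 0.5, the reward is 1, γ = 0.9 and α = 0.1. Suppose the ε-greedy policy happens to pick an exploratory next action a' with Q(s', a') = 0.2, while the best action in s' has value 0.8. The two rules then give:

```latex
\begin{aligned}
\text{SARSA:}\quad & 0.5 + 0.1\,\bigl[1 + 0.9 \times 0.2 - 0.5\bigr] = 0.5 + 0.1 \times 0.68 = 0.568 \\
\text{Q-Learning:}\quad & 0.5 + 0.1\,\bigl[1 + 0.9 \times 0.8 - 0.5\bigr] = 0.5 + 0.1 \times 1.22 = 0.622
\end{aligned}
```

SARSA's smaller increase reflects the fact that the policy being evaluated sometimes takes the worse action.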
Python code
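The following is a minimal tabular SARSA sketch, not a definitive implementation. It assumes a hypothetical 1-D corridor environment (the `step` function, `N_STATES`, and all hyperparameter values are illustrative choices, not from the original text): the agent starts in cell 0, each move costs -1, and the episode ends at the rightmost cell. Actions are chosen ε-greedily, and the update bootstraps from the next action that same policy picks.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# a 1-D corridor of N_STATES cells; reaching the last cell ends the episode.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done); every move costs -1."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(GOAL, state + 1)
    return next_state, -1.0, next_state == GOAL


def epsilon_greedy(q, state, epsilon, rng):
    """With probability epsilon act randomly, otherwise pick a greedy action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties randomly so an untrained agent does not always pick action 0.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def sarsa(episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, initialised to 0.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        while True:
            next_state, reward, done = step(state, action)
            if done:
                # Terminal state has value 0, so the target is just the reward.
                q[state][action] += alpha * (reward - q[state][action])
                break
            # On-policy: pick the next action with the SAME epsilon-greedy
            # policy and bootstrap from its value (not the max over actions).
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            td_target = reward + gamma * q[next_state][next_action]
            q[state][action] += alpha * (td_target - q[state][action])
            state, action = next_state, next_action
    return q


if __name__ == "__main__":
    q_table = sarsa()
    greedy = ["R" if row[1] > row[0] else "L" for row in q_table[:GOAL]]
    print("Greedy action per non-terminal state:", greedy)
```

Choosing `next_action` before the update and then reusing it as the action actually taken is what makes this SARSA rather than Q-Learning; replacing `q[next_state][next_action]` with `max(q[next_state])` would turn it into Q-Learning.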