3️⃣DDPG
DDPG (Deep Deterministic Policy Gradient) is a model-free, online, off-policy reinforcement learning algorithm. It combines the actor-critic architecture with the deterministic policy gradient (DPG) algorithm and is used to learn continuous actions.
The main idea behind DDPG is to use a deep neural network, called the actor network, to learn the policy function, which maps states to actions. The actor network is trained to maximize the expected cumulative reward. The algorithm also uses a second neural network, called the critic network, to learn the action-value function Q(s, a). The critic's estimate of the value of the actor's chosen actions provides the gradient signal used to update the actor network.
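As a rough illustration of this actor-critic setup, here is a minimal PyTorch sketch (not from the original text): names such as `state_dim`, `action_dim`, and `max_action` are assumed placeholders for a continuous-control environment, and the network sizes are arbitrary choices.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a state to a single continuous action."""
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # squash output to [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Action-value function: maps a (state, action) pair to a scalar Q-value."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))

# Actor update: gradient ascent on the critic's value of the actor's actions,
# i.e. minimize  actor_loss = -critic(state, actor(state)).mean()
```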
DDPG uses two key techniques to improve the stability and performance of the algorithm:
Experience replay: DDPG uses a replay buffer to store the experiences of the agent. This allows the agent to learn from previous experiences and helps to break the correlation between consecutive samples.
Target networks: DDPG uses target networks for both the actor and critic networks. These are slowly updated copies of the original networks (refreshed with a soft-update rule rather than trained directly), which provide a stable target for the learning process. A sketch of both techniques follows this list.
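The sketch below (a hedged illustration, assuming the `Actor`/`Critic` classes above; the capacity, `tau`, and the `gamma`/`done` names in the comment are illustrative hyperparameters, not prescribed by the text) shows a simple replay buffer and the soft update used to keep the target networks close to, but lagging behind, the online networks.

```python
import random
from collections import deque

import numpy as np
import torch

class ReplayBuffer:
    """Stores transitions so updates can sample uncorrelated mini-batches."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (torch.as_tensor(np.asarray(states), dtype=torch.float32),
                torch.as_tensor(np.asarray(actions), dtype=torch.float32),
                torch.as_tensor(rewards, dtype=torch.float32).unsqueeze(1),
                torch.as_tensor(np.asarray(next_states), dtype=torch.float32),
                torch.as_tensor(dones, dtype=torch.float32).unsqueeze(1))

def soft_update(target_net, online_net, tau=0.005):
    """Move each target parameter a small step toward its online counterpart."""
    for target_param, param in zip(target_net.parameters(), online_net.parameters()):
        target_param.data.copy_(tau * param.data + (1.0 - tau) * target_param.data)

# The target networks are used to form a stable TD target for the critic:
#   y = r + gamma * (1 - done) * critic_target(s', actor_target(s'))
```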
DDPG is well suited for problems with high-dimensional, continuous action spaces, such as robotic control, game playing and simulations.