DDPG: Deep RL for Continuous Control

DQN works with discrete actions: at each step the agent picks one option from a small, fixed set, such as left or right.

Many control problems need continuous actions instead. A robot joint may need a torque value such as 0.37, not just action 0 or action 1.

DDPG (Deep Deterministic Policy Gradient) is an Actor-Critic method designed for this kind of continuous control.

The Continuous Action Problem

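If the action is continuous, we cannot simply take the maximum over all actions, as DQN does when it selects an action and when it builds its learning target:

max_action Q(state, action)

There are infinitely many possible actions, so they cannot all be evaluated and compared.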
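To see why this matters, compare a finite max with a continuous one. The sketch below uses a hypothetical toy Q-function `q` whose best action is `0.37 * state`. For discrete actions the max is a short loop. For continuous actions, the only generic fallback is to search over many candidate actions. That search is approximate, its precision depends on how many candidates are tried, and it must be repeated at every step.

```python
def q(state, action):
    # Hypothetical toy Q-function: the best action is 0.37 * state
    return -(action - 0.37 * state) ** 2


# Discrete actions: the max is a loop over a short, finite list
discrete_actions = [-1.0, 1.0]  # e.g. "left" and "right"
best_discrete = max(discrete_actions, key=lambda a: q(1.0, a))


def grid_argmax(state, low, high, n):
    # Continuous actions: approximate the max by evaluating n evenly spaced
    # candidate actions. More candidates give more precision but cost more.
    step = (high - low) / (n - 1)
    candidates = [low + i * step for i in range(n)]
    return max(candidates, key=lambda a: q(state, a))


print(best_discrete)                               # 1.0
print(grid_argmax(1.0, -1.0, 1.0, 5))              # 0.5 (coarse grid)
print(round(grid_argmax(1.0, -1.0, 1.0, 201), 2))  # 0.37 (fine grid)
```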

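DDPG solves this by training an actor network that outputs the action directly. The actor learns to approximate the action that maximizes Q, so no search over actions is needed.

state -> actor -> continuous action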

The critic then evaluates that action:

state + action -> critic -> Q-value

The Two Networks

DDPG has two main networks:

  • Actor: chooses a continuous action.
  • Critic: estimates how good that action is.

The actor is deterministic. For the same state, it gives the same action unless we add exploration noise.

action = actor(state)
q_value = critic(state, action)

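The actor is trained to choose actions that the critic scores highly. Its weights are adjusted by gradient ascent on critic(state, actor(state)), with the gradient passing from the critic through the action into the actor. The critic, in turn, is trained much like a DQN Q-network: it regresses toward a target built from the reward and the estimated value of the next state.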
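The actor update can be sketched as gradient ascent through the critic, using the chain rule. The toy example below makes several simplifying assumptions:

- The critic is hypothetical, already trained, and held fixed.
- The actor is linear with a single weight `w`.
- The gradients are written by hand.

A real DDPG implementation uses neural networks and lets an autodiff library compute these gradients.

```python
def critic(state, action):
    # Hypothetical fixed critic: scores are highest when action == 0.37 * state
    return -(action - 0.37 * state) ** 2


def critic_grad_action(state, action):
    # dQ/da for the critic above (an autodiff library would compute this)
    return -2.0 * (action - 0.37 * state)


def actor(w, state):
    # Hypothetical linear deterministic actor with a single weight w
    return w * state


def train_actor(w, states, lr=0.1, steps=200):
    for _ in range(steps):
        grad = 0.0
        for s in states:
            a = actor(w, s)
            # Chain rule: dQ/dw = dQ/da * da/dw, and da/dw = s for this actor
            grad += critic_grad_action(s, a) * s
        w += lr * grad / len(states)  # gradient ascent: move w to raise Q
    return w


w_trained = train_actor(0.0, states=[0.5, 1.0, -1.0])
print(round(w_trained, 4))  # 0.37
```

Only the actor changes here. In full DDPG the critic is updated as well, and the two updates alternate on minibatches from the replay buffer.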

Replay Buffer and Target Networks

DDPG borrows two ideas from DQN:

  • Experience replay.
  • Target networks.

Experience replay stores transitions:

(state, action, reward, next_state)
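A replay buffer can be as simple as a bounded queue with uniform random sampling. The sketch below is a hypothetical helper, not a library API. It also stores a `done` flag, which most implementations add so that the learning target does not look past the end of an episode.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal FIFO replay buffer (an illustrative sketch)."""

    def __init__(self, capacity):
        # Oldest transitions are dropped automatically once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, which breaks the correlation
        # between consecutive environment steps
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.add(state=t, action=0.1 * t, reward=-1.0, next_state=t + 1, done=False)

print(len(buf))                      # 3: only the newest transitions remain
print([tr[0] for tr in buf.buffer])  # [2, 3, 4]
batch = buf.sample(2)
```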

Target networks make the learning target more stable. DDPG usually updates target networks slowly:

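target <- tau * online + (1 - tau) * target

This is called a soft update. The constant tau is small (the original DDPG paper used 0.001), so each target network trails its online network slowly instead of being copied all at once.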
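Target networks appear in the critic's learning target. The target actor chooses the next action, standing in for DQN's max, and the target critic scores that action. The sketch below makes these assumptions:

- Parameters are plain lists of floats.
- `target_actor` and `target_critic` are hypothetical callables.
- `gamma` is the discount factor.
- Transitions record a `done` flag marking episode ends.

```python
def soft_update(target_params, online_params, tau):
    # target <- tau * online + (1 - tau) * target, parameter by parameter
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_params, online_params)]


def critic_target(reward, next_state, done, gamma, target_actor, target_critic):
    # TD target for the critic, computed with the slow-moving target networks
    if done:
        return reward  # no bootstrapping past the end of an episode
    next_action = target_actor(next_state)
    return reward + gamma * target_critic(next_state, next_action)


# Soft update: with tau = 0.1 the target moves 10% of the remaining gap per call
target = [0.0, 0.0]
online = [1.0, -1.0]
for _ in range(3):
    target = soft_update(target, online, tau=0.1)
print([round(p, 3) for p in target])  # [0.271, -0.271]

# Critic target with toy stand-ins for the target networks
y = critic_target(1.0, 2.0, False, 0.99,
                  target_actor=lambda s: 0.5 * s,
                  target_critic=lambda s, a: s + a)
print(round(y, 2))  # 3.97
```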

Exploration

Because the actor is deterministic, exploration does not happen automatically.

DDPG usually adds noise to the action:

action = actor(state) + noise

The noisy action is usually clipped to the valid action range. A common schedule starts with larger noise and reduces it as the policy improves.
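Decaying exploration noise can be implemented as sketched below. `GaussianExploration` is a hypothetical helper that does three things:

- Adds Gaussian noise to the actor's action.
- Clips the result to the action bounds (here Pendulum's [-2, 2] torque range).
- Shrinks the noise scale multiplicatively, down to a floor.

The original DDPG paper used temporally correlated Ornstein-Uhlenbeck noise instead. Independent Gaussian noise is a common, simpler substitute.

```python
import random


class GaussianExploration:
    """Gaussian action noise with a decaying scale (illustrative sketch)."""

    def __init__(self, sigma_start, sigma_end, decay, low, high):
        self.sigma = sigma_start
        self.sigma_end = sigma_end
        self.decay = decay  # multiplicative decay applied by each step() call
        self.low, self.high = low, high

    def __call__(self, action):
        noisy = action + random.gauss(0.0, self.sigma)
        # Clip so the noisy action stays inside the valid action range
        return min(max(noisy, self.low), self.high)

    def step(self):
        # Shrink the noise scale, but never below sigma_end
        self.sigma = max(self.sigma * self.decay, self.sigma_end)


random.seed(0)
explore = GaussianExploration(sigma_start=0.5, sigma_end=0.05, decay=0.99,
                              low=-2.0, high=2.0)
actions = []
for t in range(300):
    actions.append(explore(0.37))  # 0.37 stands in for actor(state)
    explore.step()
print(explore.sigma)  # 0.05: the decay has reached the floor
```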

Example: Pendulum

In the Pendulum environment, the agent controls torque. The goal is to swing the pendulum upright and keep it balanced.
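To make "upright and balanced" concrete, the sketch below re-implements the per-step reward of Gymnasium's `Pendulum-v1`, assuming that environment version. The reward is the negative of a cost with three terms: the angle away from upright, the angular velocity, and the applied torque. Torque is first clipped to [-2, 2]. This is a standalone illustration, not the Gymnasium source.

```python
import math

MAX_TORQUE = 2.0  # Pendulum-v1 clips torque to [-2, 2]


def angle_normalize(theta):
    # Wrap an angle into [-pi, pi) so that 0 always means "upright"
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def pendulum_reward(theta, theta_dot, torque):
    # Highest reward (0) when upright, motionless, and using no torque
    u = min(max(torque, -MAX_TORQUE), MAX_TORQUE)
    cost = angle_normalize(theta) ** 2 + 0.1 * theta_dot ** 2 + 0.001 * u ** 2
    return -cost


print(round(pendulum_reward(math.pi, 0.0, 0.0), 3))  # -9.87: hanging straight down
print(round(pendulum_reward(0.0, 0.0, 5.0), 3))      # -0.004: torque clipped to 2.0
```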

This is a natural DDPG problem because the action is continuous:

action = torque value

A discrete method would need to split torque into bins. DDPG can output a smooth torque value directly.
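The binning trade-off can be measured directly. The sketch below splits a torque range of [-2, 2] (assumed here, as in Gymnasium's `Pendulum-v1`) into a hypothetical number of evenly spaced bins. It then reports how far the nearest bin is from a desired torque of 0.37.

```python
def torque_bins(n, low=-2.0, high=2.0):
    # n evenly spaced torque values that a discrete method could choose from
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def nearest_bin_error(desired, n):
    # Distance between the desired torque and the closest available bin
    return min(abs(desired - b) for b in torque_bins(n))


for n in (5, 9, 41):
    print(n, round(nearest_bin_error(0.37, n), 3))
# prints: 5 0.37 / 9 0.13 / 41 0.03
```

More bins reduce the error, but with several action dimensions the number of bin combinations grows as n to the power of the number of dimensions. A continuous actor avoids that growth.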

When to Use DDPG

Use DDPG when:

  • Actions are continuous.
  • You need precise control.
  • You can train in simulation.
  • A deterministic policy is acceptable.

Avoid DDPG when the action space is small and discrete. DQN is usually simpler there.

Key Takeaways

  • DDPG is for continuous action spaces.
  • It uses an actor to output actions and a critic to score them.
  • It borrows replay buffers and target networks from DQN.
  • Exploration is added as noise on top of the actor’s action.
  • It is useful for control tasks such as pendulum balancing or robotics simulation.

Master continuous control with DDPG implementations in the RL-Tutorial-Series repository

This post is licensed under CC BY 4.0 by the author.