Discussing the article: "Neural networks made easy (Part 49): Soft Actor-Critic"


Check out the new article: Neural networks made easy (Part 49): Soft Actor-Critic.

We continue our discussion of reinforcement learning algorithms for solving continuous action space problems. In this article, I will present the Soft Actor-Critic (SAC) algorithm. The main advantage of SAC is its ability to find optimal policies that not only maximize the expected reward but also have maximum entropy (diversity) of actions.

In this article, we will focus our attention on another algorithm: Soft Actor-Critic (SAC). It was first presented in the paper "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor" (January 2018). The method appeared almost simultaneously with TD3 and shares some of its ideas, although the algorithms differ in important ways. The main goal of SAC is to maximize the expected reward while keeping the entropy of the policy as high as possible, which allows it to find a variety of optimal solutions in stochastic environments.
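For reference, the maximum-entropy objective from the original SAC paper can be written as

J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]

where H(π(·|s_t)) is the entropy of the policy in state s_t and α is the temperature coefficient that weights the entropy bonus against the reward; setting α = 0 recovers the standard reinforcement learning objective.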

Soft Actor-Critic uses an Actor with a stochastic policy. This means that in state S the Actor can choose any action A' from the entire action space, each with a certain probability P(A'). In other words, in each specific state the Actor's policy does not prescribe one particular optimal action, but assigns a probability to every possible action. During training, the Actor learns the probability distribution that yields the maximum reward.
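As an illustration only (a minimal PyTorch sketch, not the article's MQL5 implementation, and the GaussianActor class with its dimensions is hypothetical): the network outputs the mean and standard deviation of a Gaussian over actions, samples an action from that distribution, and returns the log-probability of the sampled action, which is exactly what the entropy term in the SAC objective operates on.

import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Stochastic policy: outputs a distribution over actions for each state."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, action_dim)       # mean of the Gaussian
        self.log_std = nn.Linear(hidden, action_dim)  # log of its standard deviation

    def forward(self, state):
        h = self.body(state)
        mu = self.mu(h)
        std = self.log_std(h).clamp(-20, 2).exp()
        dist = torch.distributions.Normal(mu, std)
        raw_action = dist.rsample()                   # a different action on each call
        action = torch.tanh(raw_action)               # squash into [-1, 1]
        # log-probability of the squashed action (change-of-variables correction)
        log_prob = dist.log_prob(raw_action) - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(-1)

# The same state can yield different actions, each with a known log-probability
actor = GaussianActor(state_dim=4, action_dim=2)
state = torch.randn(1, 4)
action, log_prob = actor(state)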

This property of a stochastic Actor policy allows us to explore different strategies and discover optimal solutions that may remain hidden when using a deterministic policy. In addition, a stochastic Actor policy takes into account the uncertainty of the environment. In the presence of noise or random factors, such policies can be more resilient and adaptive, since they can generate a variety of actions to interact effectively with the environment.

Author: Dmitriy Gizlyk
