Discussing the article: "Neural networks made easy (Part 55): Contrastive intrinsic control (CIC)"

 

Check out the new article: Neural networks made easy (Part 55): Contrastive intrinsic control (CIC).

Contrastive learning is an unsupervised method of representation learning. Its goal is to train a model to highlight similarities and differences within a data set. In this article, we will talk about using contrastive learning approaches to explore different Actor skills.

The Contrastive Intrinsic Control algorithm begins by training the Agent in the environment using feedback and collecting trajectories of states and actions. Representations are then trained using Contrastive Predictive Coding (CPC), which motivates the Agent to extract key features from states and actions. The resulting representations take into account the dependencies between successive states.
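To make the contrastive step more concrete, here is a minimal sketch of an InfoNCE-style loss, the objective commonly associated with CPC. It is not taken from the article: the function names `cosine` and `info_nce_loss`, the use of cosine similarity, and the `temperature` value are all assumptions made for illustration. Each query embedding is expected to match the key at the same index (the positive pair), and every other key in the batch serves as a negative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def info_nce_loss(queries, keys, temperature=0.5):
    """Mean InfoNCE loss over a batch (hypothetical helper, not the article's code).

    queries[i] and keys[i] form the positive pair; keys[j], j != i, are negatives.
    """
    if len(queries) != len(keys):
        raise ValueError("queries and keys must have the same length")
    total = 0.0
    for i, query in enumerate(queries):
        logits = [cosine(query, key) / temperature for key in keys]
        # Numerically stable log-sum-exp over all keys in the batch.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(v - top) for v in logits))
        # Negative log-probability of picking the positive key.
        total += log_denominator - logits[i]
    return total / len(queries)


if __name__ == "__main__":
    aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < shuffled)
```

The loss is small when each query is closest to its own key and grows when the pairs are mismatched, which is what pushes the encoder toward features that distinguish one transition from another.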

Intrinsic rewards play an important role in determining which behavioral strategies should be maximized. CIC maximizes the entropy of transitions between states, which promotes diversity in Agent behavior. This allows the Agent to explore and create a variety of behavioral strategies.
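One common way to turn "maximize entropy" into a per-sample reward is a particle-based estimate: an embedding that lies far from its nearest neighbors in the batch sits in a sparsely visited region and earns a larger bonus. The sketch below assumes that estimator. The function name `knn_entropy_rewards`, the choice of `k`, and the `log(1 + mean distance)` form are illustrative assumptions, not the article's implementation.

```python
import math


def knn_entropy_rewards(embeddings, k=3):
    """Intrinsic reward per embedding from its k nearest neighbors.

    Hypothetical helper: reward_i = log(1 + mean distance from embedding i
    to its k nearest neighbors within the batch).
    """
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two embeddings")
    k = min(k, n - 1)
    rewards = []
    for i, current in enumerate(embeddings):
        # Euclidean distances to every other embedding, nearest first.
        distances = sorted(
            math.dist(current, other)
            for j, other in enumerate(embeddings)
            if j != i
        )
        rewards.append(math.log(1.0 + sum(distances[:k]) / k))
    return rewards


if __name__ == "__main__":
    batch = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
    rewards = knn_entropy_rewards(batch, k=2)
    # The isolated point [5.0, 5.0] receives the largest reward.
    print(rewards.index(max(rewards)))
```

Under this scheme, transitions that land in already crowded regions of the embedding space earn little, so the Agent is nudged toward behavior it has not yet tried.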

After generating a variety of skills and strategies, the CIC algorithm uses the Discriminator to make the skill representations more specific. The Discriminator aims to ensure that states are predictable and stable. In this way, the Agent learns to "use" skills in predictable situations.

Combining exploration motivated by intrinsic rewards with the use of skills for predictable actions yields a balanced approach to building varied and effective strategies.

As a result, the Contrastive Intrinsic Control algorithm encourages the Agent to discover and learn a wide range of behavioral strategies, while ensuring stable learning. Below is the custom algorithm visualization.

Custom algorithm visualization

Author: Dmitriy Gizlyk
