Discussing the article: "Neural networks made easy (Part 56): Using nuclear norm to drive research"

Check out the new article: Neural networks made easy (Part 56): Using nuclear norm to drive research.

Exploring the environment is a pressing problem in reinforcement learning. We have already considered several approaches to it. In this article, we will look at yet another method, based on maximizing the nuclear norm, which allows the Agent to identify environment states with a high degree of novelty and diversity.

Reinforcement learning is based on the paradigm of the Agent independently exploring the environment. The Agent acts on the environment, causing it to change, and receives some reward in return.

This highlights the two main problems of reinforcement learning: environment exploration and the reward function. A well-designed reward function encourages the Agent to explore the environment and search for optimal behavioral strategies.

However, most practical problems confront us with sparse external rewards. To overcome this barrier, so-called internal rewards were proposed. They allow the Agent to master new skills that may prove useful for obtaining external rewards later. However, internal rewards can be noisy due to environmental stochasticity, and directly using noisy predicted values of observations as a reward can degrade the efficiency of Agent policy training. Moreover, many methods measure the novelty of exploration with the L2 norm or the variance, both of which amplify noise because of the squaring operation.
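To make the last point concrete, here is a minimal Python sketch (the article itself is implemented in MQL5) of a generic prediction-error intrinsic reward built on the squared L2 norm; the function name, shapes, and the state encoder it presupposes are illustrative assumptions, not the article's code:

```python
import numpy as np

def l2_intrinsic_reward(pred_next_emb: np.ndarray,
                        true_next_emb: np.ndarray) -> float:
    """Generic curiosity bonus: r_int = ||phi(s') - phi_hat(s')||^2.

    Both arguments are 1-D embeddings of the next state: the prediction
    of a learned dynamics model and the actual observation. Squaring the
    error amplifies large per-dimension deviations, so purely stochastic
    transitions inflate the bonus - the noise problem described above.
    """
    diff = true_next_emb - pred_next_emb
    return float(np.dot(diff, diff))  # squared L2 norm
```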


To solve this problem, the article "Nuclear Norm Maximization Based Curiosity-Driven Learning" proposes a new algorithm for stimulating the Agent's curiosity based on nuclear norm maximization (NNM). Such an internal reward evaluates the novelty of environment exploration more accurately and, at the same time, remains highly robust to noise and outliers.
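As a rough sketch of the idea (not the article's MQL5 implementation), the snippet below computes an intrinsic reward from the nuclear norm of a matrix whose rows are recent state embeddings; the Frobenius-norm normalization and the `scale` factor are assumptions in the spirit of the NNM paper's noise-robustness argument:

```python
import numpy as np

def nnm_intrinsic_reward(state_embs: np.ndarray, scale: float = 1.0) -> float:
    """Nuclear-norm-based curiosity bonus over an (n, d) matrix of embeddings.

    The nuclear norm ||S||_* (the sum of singular values) grows with both
    the magnitude of the rows (novelty) and their linear independence
    (diversity), so maximizing it pushes the Agent toward new AND varied
    states. Dividing by the Frobenius norm ||S||_F bounds the ratio by
    sqrt(rank(S)) and damps the effect of a single large outlier.
    """
    sv = np.linalg.svd(state_embs, compute_uv=False)  # singular values
    nuclear = sv.sum()                                # ||S||_*
    frobenius = np.sqrt((sv ** 2).sum())              # ||S||_F
    return scale * nuclear / (frobenius + 1e-8)

# Example: stack the last 8 encoded states and score their joint novelty.
recent_states = np.random.randn(8, 32)
r_int = nnm_intrinsic_reward(recent_states)
```

Note that, unlike the squared L2 error above, this bonus is computed over a batch of states at once, so a single noisy observation has limited influence on the reward.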

Author: Dmitriy Gizlyk
