Check out the new article: Neural networks made easy (Part 56): Using nuclear norm to drive research.
Exploring the environment remains a pressing problem in reinforcement learning. We have already looked at some approaches to it. In this article, we look at another method, based on maximizing the nuclear norm, which allows agents to identify environmental states with a high degree of novelty and diversity.
Reinforcement learning is based on the paradigm of independent exploration of the environment by the Agent. The Agent acts on the environment, which changes as a result. In return, the Agent receives some kind of reward.
This is where the two main problems of reinforcement learning come to light: environment exploration and the reward function. A correctly structured reward function encourages the Agent to explore the environment and search for optimal behavioral strategies.
However, in most practical problems, external rewards are sparse. To overcome this barrier, the use of so-called internal rewards was proposed. They allow the Agent to master new skills that may be useful for obtaining external rewards in the future. However, internal rewards may be noisy due to environmental stochasticity. Directly applying noisy forecast values to observations can negatively impact the efficiency of Agent policy training. Moreover, many methods use the L2 norm or variance to measure the novelty of exploration, which amplifies noise because of the squaring operation.
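To make the idea of the nuclear norm as a diversity measure more concrete, here is a minimal sketch in Python with NumPy. It is not the article's implementation: the helper names `nuclear_norm` and `frobenius_norm` and the two toy matrices are assumptions introduced purely for illustration. Each matrix row stands in for a hypothetical encoding of one visited state. The nuclear norm is the sum of the singular values of a matrix, so it grows when the rows point in different directions, while the Frobenius (L2) norm depends only on the squared magnitudes of the elements.

```python
import numpy as np

def nuclear_norm(m: np.ndarray) -> float:
    """Sum of the singular values of a 2-D matrix."""
    return float(np.linalg.svd(m, compute_uv=False).sum())

def frobenius_norm(m: np.ndarray) -> float:
    """Square root of the sum of squared elements (matrix L2 norm)."""
    return float(np.sqrt((m ** 2).sum()))

# Hypothetical state encodings: one row per visited state.
# "repeated": the same state seen four times (rank-1 matrix).
repeated = np.full((4, 4), 0.5)
# "diverse": four mutually orthogonal states (full-rank matrix).
diverse = np.eye(4)

# Both matrices have the same Frobenius norm,
# so an L2-style measure cannot tell them apart.
print(frobenius_norm(repeated), frobenius_norm(diverse))

# The nuclear norm is larger for the diverse set of states.
print(nuclear_norm(repeated), nuclear_norm(diverse))
```

In this toy case both matrices have a Frobenius norm of 2, while the nuclear norm is 2 for the repeated states and 4 for the diverse ones. This is only meant to show why maximizing the nuclear norm can favor novel and diverse states; how the method builds the matrix and turns the norm into an internal reward is described in the article itself.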
Author: Dmitriy Gizlyk