Discussing the article: "Neural networks made easy (Part 53): Reward decomposition"

 

Check out the new article: Neural networks made easy (Part 53): Reward decomposition.

We have repeatedly discussed the importance of choosing the reward function correctly: we use it to encourage the desired behavior of the Agent by adding rewards or penalties for individual actions. But it remains an open question how the Agent interprets these signals. In this article, we will talk about reward decomposition as a way of passing individual signals to the trained Agent.

We continue to explore reinforcement learning methods. As you know, all algorithms for training models in this area of machine learning are based on the paradigm of maximizing rewards from the environment. The reward function plays a key role in the model training process, yet its signals are often ambiguous.

To encourage the Agent to show the desired behavior, we introduce additional bonuses and penalties into the reward function. For example, we have often made the reward function more complex to push the Agent to explore the environment, and we have introduced penalties for inaction. At the same time, both the model architecture and the reward function remain the product of the model architect's subjective judgment.


During training, the model may encounter various difficulties even when it has been designed carefully. The Agent may fail to achieve the desired results for many different reasons. But how can we tell whether the Agent correctly interprets the signals in our reward function? One natural way to investigate this is to divide the reward into separate components. Using decomposed rewards and analyzing the influence of individual components can be very useful in finding ways to optimize model training. This allows us to better understand how different aspects influence the Agent's behavior, identify the causes of issues, and effectively adjust the model architecture, the training process or the reward function.
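
To make the idea of splitting a reward into components concrete, here is a minimal sketch in Python. It is not the article's implementation: the component names (profit, inaction, exploration), the function names and the constants are all hypothetical and chosen only for illustration. It keeps each part of the reward separate, so the parts can be averaged and inspected individually instead of being hidden inside a single number.

```python
from dataclasses import dataclass


@dataclass
class RewardParts:
    """One step's reward, kept as separate (hypothetical) components."""
    profit: float       # change in account balance on this step
    inaction: float     # penalty applied when no position is open
    exploration: float  # bonus that shrinks as an action is repeated

    def total(self) -> float:
        # The scalar reward a conventional setup would use.
        return self.profit + self.inaction + self.exploration


def decompose_reward(balance_change: float, has_position: bool,
                     action_count: int) -> RewardParts:
    # Assumed constants: -0.1 for inaction, 1 / (1 + n) for exploration.
    inaction = 0.0 if has_position else -0.1
    exploration = 1.0 / (1 + action_count)
    return RewardParts(balance_change, inaction, exploration)


def mean_parts(history: list) -> dict:
    # Average each component over a non-empty history to see which
    # signal dominates what the Agent actually receives.
    n = len(history)
    return {
        "profit": sum(p.profit for p in history) / n,
        "inaction": sum(p.inaction for p in history) / n,
        "exploration": sum(p.exploration for p in history) / n,
    }


if __name__ == "__main__":
    history = [
        decompose_reward(2.0, has_position=False, action_count=3),
        decompose_reward(-1.0, has_position=True, action_count=0),
    ]
    print(mean_parts(history))
```

If, for example, the averaged exploration bonus turned out to be much larger than the averaged profit, that would suggest the Agent is being rewarded mainly for novelty rather than for results, which is the kind of diagnosis a single summed reward cannot provide.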

Author: Dmitriy Gizlyk

 

Thank you Dmitriy, I clicked on your seller profile hoping to find some NN EAs I could test.

I have taken a Udemy MQL5 course on NNs and am now trying to go deeper. I am starting with your series of articles.