Discussing the article: "Neural networks made easy (Part 45): Training state exploration skills"

 

Check out the new article: Neural networks made easy (Part 45): Training state exploration skills.

Training useful skills without an explicit reward function is one of the main challenges in hierarchical reinforcement learning. We have already looked at two algorithms for solving this problem, but the question of how completely the environment is explored remains open. This article demonstrates a different approach to training skills, whose use depends directly on the current state of the system.

The first results fell short of our expectations. On the positive side, the skills used in the test sample were distributed fairly uniformly. However, this is where the positive results end. After a number of training iterations of the autoencoder and the agent, we were still unable to obtain a model capable of generating a profit on the training set. Apparently, the problem is the autoencoder's inability to predict states with sufficient accuracy. As a result, the balance curve is far from the desired result.

To test this assumption, we created an alternative agent training EA, "EDL\StudyActor2.mq5". Its only difference from the previously considered version is the reward generation algorithm. We still use the loop to predict changes in the account state, but this time the reward is the relative change in balance.

      // Reward vector: one value per possible action
      ActorResult = vector<float>::Zeros(NActions);
      for(action = 0; action < NActions; action++)
        {
         // Predict the account state after executing the action
         reward = GetNewState(Buffer[tr].States[i].account, action, prof_1l);
         // Reward = relative change of the balance
         ActorResult[action] = reward[0] / PrevBalance - 1.0f;
        }
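
For illustration only, here is a small sketch of how such a per-action reward vector could be inspected. The ArgMax call and the indexing are standard MQL5 vector operations; treating the maximum entry as the "best" action is an assumption made for demonstration, not necessarily how the article's StudyActor2 EA consumes ActorResult.

      // Hypothetical usage sketch (not the article's exact code):
      // pick the action with the largest predicted relative balance change.
      ulong best_action = ActorResult.ArgMax();      // index of the best action
      float best_reward = ActorResult[best_action];  // its expected relative gain
      PrintFormat("Best action: %d, expected relative balance change: %.4f",
                  (int)best_action, best_reward);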

The agent trained with the modified reward function showed a fairly steady growth in profitability throughout the testing period.

Test results: balance curve graph on the test sample

Author: Dmitriy Gizlyk