Discussing the article: "Neural networks made easy (Part 46): Goal-conditioned reinforcement learning (GCRL)"

 

Check out the new article: Neural networks made easy (Part 46): Goal-conditioned reinforcement learning (GCRL).

In this article, we will take a look at yet another reinforcement learning approach called goal-conditioned reinforcement learning (GCRL). In this approach, an agent is trained to achieve different goals in specific scenarios.

In this work, we decided to abandon separate training of the variational autoencoder and to include its encoder directly in the Agent model. It should be said that this approach somewhat violates the principles of autoencoder training: the main idea of any autoencoder is to compress data without reference to a specific task. Here, however, we do not need an encoder trained to solve several problems from the same source data.

In addition, we supply only the current state of the environment to the encoder input. In our case, this is historical data on the instrument's price movement and the parameters of the analyzed indicators. In other words, we exclude information about the account status. We assume that the scheduler (in this case, the encoder) will form the skill to be used based on the historical data alone. This can be a policy for working in a rising, falling or flat market.

Based on information about the account status, we will create a subtask for the Agent to search for an entry or exit point.
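As a minimal sketch of how these two input streams could be combined (the names and the one-hot goal encoding below are illustrative assumptions, not the article's exact code), the encoder's skill vector is concatenated with a subtask vector derived from the account state before being fed to the Agent:

//--- Sketch (assumed names, not the article's exact code): the encoder/scheduler
//--- output, built from market data only, is concatenated with a goal vector
//--- derived from the account state before being passed to the Agent.
void BuildAgentInput(const double &skill[],            // encoder output (skill vector)
                     const bool    has_open_position,  // taken from the account state
                     double       &agent_input[])
  {
   int skill_size = ArraySize(skill);
   ArrayResize(agent_input, skill_size + 2);
   ArrayCopy(agent_input, skill, 0, 0, skill_size);
   //--- goal as a simple one-hot subtask: [search for an entry, search for an exit]
   agent_input[skill_size]     = has_open_position ? 0.0 : 1.0;
   agent_input[skill_size + 1] = has_open_position ? 1.0 : 0.0;
  }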

Test graph


The positive aspects of using the GCRL method include a reduction in position holding time. During the test, the maximum holding time was 21 hours and 15 minutes, and the average holding time was 5 hours and 49 minutes. As you may remember, for failing to complete the task of closing a position we set a penalty of 1/10 of the accumulated profit for each hour of holding. In other words, after 10 hours of holding, the penalty exceeds the income from the position.
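For reference, this penalty can be written as a simple function (an assumed form reconstructed from the description above, not the article's code): the total penalty grows linearly with holding time and catches up with the accumulated profit at the 10-hour mark.

//--- Assumed form of the holding-time penalty described above: 1/10 of the
//--- accumulated profit for each hour an unclosed position is held.
double HoldingPenalty(const double accumulated_profit, const int hours_held)
  {
   return 0.1 * accumulated_profit * hours_held;
  }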

Author: Dmitriy Gizlyk

 
I could not reproduce your results using the mql5 download files and the historical and test data date ranges.
 

Nice article.

Nigel, you are not the only one.

It's been presented in a way that prevents reproducibility unless you spend quite a long time debugging the code or discovering its proper usage.

For example:

"After completing work on the EA for collecting the example database "GCRL\Research.mq5", we launch it in the slow optimization mode of the strategy tester"

The simple question, actually, is: which parameters are to be optimized?

 
Chris #:

Nice article.

Nigel, you are not the only one.

It's been presented in a way that prevents reproducibility unless you spend quite a long time debugging the code or discovering its proper usage.

For example:

"After completing work on the EA for collecting the example database "GCRL\Research.mq5", we launch it in the slow optimization mode of the strategy tester"

The simple question, actually, is: which parameters are to be optimized?

All parameters are at their defaults. You only need to set the Agent number for optimization; it is used to set the number of tester iterations.
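In other words, a common pattern in this series (an assumption here, not a verified excerpt from GCRL\Research.mq5) is a dummy Agent input that does not affect the EA logic at all; enumerating it in the strategy tester optimizer simply produces the desired number of independent passes, each adding trajectories to the example database:

input int Agent = 1;   // enumerate e.g. 1..N in the optimizer to get N collection passes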

 
Dmitriy Gizlyk #:

All parameters are at their defaults. You only need to set the Agent number for optimization; it is used to set the number of tester iterations.

Hi Dmitriy,

There must be something wrong with your library. In several tests I obtained the same results, with the same drawbacks.

The Test strategy generates two series of orders separated in time: first buy orders, then sell orders.

Sell orders are never closed until the moment the testing period is over.

The same behaviour can be observed when testing your other strategies, so the bug must be in a class common to your strategies.

Another potential reason is some susceptibility to the initial state of the tests.

Find attached a report of my test.

Files: