Discussing the article: "Neural networks made easy (Part 50): Soft Actor-Critic (model optimization)"
Research does not produce any positive trades. There are no files in the data directory, and the SoftAC.bd file in the shared data folder appears to be empty (12 bytes). Study will not attach to the chart. Can you tell me what to do?
1. The absence of positive deals in the first Research passes is quite natural, because completely random networks are used, so the trades are just as random. When launching the Expert Advisor, set MinProfit as negative as possible. The SoftAC.bd file will then be filled with examples for the initial training of the model.
2. There is a check in the Study Expert Advisor: it does not run when there are no examples in SoftAC.bd, because it would have no data to train the model on.
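For illustration, that start-up guard can be sketched roughly as follows (the file header layout and messages here are assumptions, not the article's verbatim code):

```mql5
//--- Sketch of the kind of guard Study performs on start-up.
int OnInit()
  {
   int handle = FileOpen("SoftAC.bd", FILE_READ | FILE_BIN | FILE_COMMON);
   int total  = 0;
   if(handle != INVALID_HANDLE)
     {
      total = (int)FileReadLong(handle);   // assumed header: number of stored trajectories
      FileClose(handle);
     }
   if(total <= 0)
     {
      Print("No training examples in SoftAC.bd - run Research with a low MinProfit first");
      return(INIT_FAILED);                 // the Expert Advisor is removed from the chart
     }
   //--- ... load or create the networks and start the training loop ...
   return(INIT_SUCCEEDED);
  }
```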
Thanks, it helped. The green dots started to appear. I didn't realise that MinProfit could be set to a negative value. But there is one more question: do I have to delete the initial database manually, and which file is it?
The entire example database is in SoftAC.bd. I deleted the first one manually. But the Expert Advisor has a MaxReplayBuffer constant of 500, which limits the example database to the last 500 trajectories. If you want a different limit, you can change the constant and recompile the files.
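The constant itself is an ordinary preprocessor definition compiled into the Expert Advisors, something along these lines (the exact header file it lives in may differ):

```mql5
//--- maximum number of trajectories kept in the example base (SoftAC.bd);
//--- change the value and recompile to keep more or fewer trajectories
#define        MaxReplayBuffer  500
```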
Thanks again.
Dmitry, when the example database is full and new examples are added, are the oldest examples deleted, or are examples removed at random? Or is it necessary to size the example database for the entire number of passes, taking the additions during training into account, that is a base of, say, 1000 trajectories?
Also, you wrote that you ran the Test.mqh Expert Advisor 10 times (after 500,000 training iterations), if I understood correctly. Then you said you performed the collection - training - test cycle 15 more times and obtained a working model. What I don't understand is whether you ran the Test.mqh Expert Advisor 10 times at each stage of the cycle. My problem is that when I do this, more negative examples end up in the base, and the Expert Advisor eventually starts trading at a loss.
Let's say I have collected a base of 200 trajectories and trained on it for 100,000 iterations. Then I add 10 passes from the test, and Research.mqh adds 10-15 new examples to the database with the MinProfit boundary set to, for example, -3000.
I do the next training (100,000 iterations). Again I add 10 test passes and 10-15 examples from Research.mqh, with MinProfit set to, for example, -2500.
I train again (100,000 iterations). Again 10 tests and 10-15 examples from Research.mqh, with MinProfit = -2000.
And so on. Have I understood correctly? What confuses me is that the test often produces very large negative passes of -7000 or even -9000, and there will be a lot of them in the base. Won't the network effectively be trained to trade at a loss?
And what should I do if the test passes give a worse result than the previous time? Should I move MinProfit further into the negative? And what should I do if Research.mqh cannot find anything to add to the database over 100 passes with the specified limit (for example, MinProfit = -500)?
The oldest ones are deleted. It is organised according to the FIFO principle (first in, first out).
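In code terms, that FIFO trimming can be sketched roughly as below (STrajectory is a placeholder for the article's trajectory structure, assumed copyable by assignment; MaxReplayBuffer is the constant mentioned above):

```mql5
//--- Sketch: keep only the newest MaxReplayBuffer trajectories (FIFO).
struct STrajectory { double total_reward; };   // placeholder for the article's trajectory structure

void TrimReplayBuffer(STrajectory &buffer[])
  {
   int total  = ArraySize(buffer);
   int excess = total - MaxReplayBuffer;       // the constant from the Expert Advisor
   if(excess <= 0)
      return;                                  // still within the limit
   for(int i = 0; i < total - excess; i++)     // shift the newer trajectories to the head...
      buffer[i] = buffer[i + excess];
   ArrayResize(buffer, total - excess);        // ...and drop the oldest from the tail
  }
```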
Roughly speaking, I make 10 single runs in the strategy tester to estimate the spread of the model's results, and then take the upper quantile as the cutoff for selecting the best trajectories during the subsequent trajectory collection in optimisation mode.
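As an illustration, such a cutoff could be computed from the results of the single runs like this (the 0.75 level and the function itself are my own example, not the article's code):

```mql5
//--- Sketch: take an upper-quantile profit from several single test runs
//--- and use it as the MinProfit cutoff for the next trajectory collection.
double UpperQuantileProfit(const double &results[], const double quantile = 0.75)
  {
   double sorted[];
   ArrayCopy(sorted, results);
   ArraySort(sorted);                                          // ascending order
   int idx = (int)MathFloor(quantile * (ArraySize(sorted) - 1));
   return(sorted[idx]);
  }

//--- usage with illustrative numbers:
// double runs[10] = { -7300, -4100, -2500, -900, -300, 150, 600, 1200, 1800, 2400 };
// double threshold = UpperQuantileProfit(runs);               // 0.75 quantile -> 600
```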

Check out the new article: Neural networks made easy (Part 50): Soft Actor-Critic (model optimization).
In the previous article, we implemented the Soft Actor-Critic algorithm, but were unable to train a profitable model. Here we will optimize the previously created model to obtain the desired results.
We continue to study the Soft Actor-Critic algorithm. In the previous article, we implemented the algorithm but were unable to train a profitable model. Today we will consider possible solutions. A similar question has already been raised in the article "Model procrastination, reasons and solutions". I propose to expand our knowledge in this area and consider new approaches using our Soft Actor-Critic model as an example.
Before we move directly to optimizing the model we built, let me remind you that Soft Actor-Critic is a reinforcement learning algorithm for stochastic models in a continuous action space. The main feature of this method is the introduction of an entropy component into the reward function.
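For reference, the maximum-entropy objective underlying Soft Actor-Critic is usually written as follows (this is the standard formulation from the SAC literature, not a quote from the article):

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
```

Here r(s_t, a_t) is the reward, H(π(·|s_t)) is the entropy of the policy in state s_t, and the temperature α balances reward maximization against exploration.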
Using a stochastic Actor policy makes the model more flexible and capable of solving problems in complex environments where some actions are uncertain or where clear rules cannot be defined. Such a policy is often more robust when dealing with noisy data, since it takes the probabilistic component into account and is not tied to rigid rules.
Author: Dmitriy Gizlyk