Discussing the article: "Neural networks made easy (Part 66): Exploration problems in offline learning"

 

Check out the new article: Neural networks made easy (Part 66): Exploration problems in offline learning.

Models are trained offline using data from a prepared training dataset. While providing certain advantages, its negative side is that information about the environment is greatly compressed to the size of the training dataset. Which, in turn, limits the possibilities of exploration. In this article, we will consider a method that enables the filling of a training dataset with the most diverse data possible.

The ExORL method can be divided into 3 main stages. The first stage is the collection of unlabeled exploratory data. This stage can use various unsupervised learning algorithms. The authors of the method do not limit the range of applicable algorithms. Moreover, in the process of interaction with the environment, at each episode, we use a policy π, depending on the history of previous interactions. Each episode is saved in the dataset as the sequence of a state St, action At and subsequent state St+1. The collection of training data continues until the training dataset is completely filled. The size of this training dataset is limited by the technical specifications or available resources.

After collecting a dataset of states and actions, the next stage is to relable the data using a given reward function. This stage implies the evaluation of the reward for each tuple in the dataset.

Practical experience shows the possibility of parallel use in one replay buffer collected by different methods. I used both the trajectories collected by the previously discussed EA Research.mq5 and the EA ResearchExORL.mq5. The first one indicates the advantages and disadvantages of the learned Actor policy. The second allows us to explore the environment as much as possible and evaluate unaccounted opportunities.

In the process of iterative model training, I managed to improve its performance.

Test results

Test results

While there was a general decrease in the number of trades during the test period by 3 times (56 versus 176), profits increased almost 3 times. The amount of the maximum profitable trade has more than doubled. And the average profitable trade increased by 5 times. Furthermore, we observe an increase in the balance throughout the entire testing period. As a result, the profit factor of the model has increased from 1.3 to 2.96. 

Author: Dmitriy Gizlyk

 
Dmitry hello. It says that you used 5 agents and collected 100 passes with them. Did you run the collection 20 times? Simply otherwise 5 agents will collect only 5 passes. Besides, if you run the Expert Advisor with the same parameters, it does not recalculate, but pulls the result from the cache. Or did you shift these 5 agents at each subsequent collection? Please explain this point.
 
Viktor Kudriavtsev #:
Dmitry hello. It says that you used 5 agents and collected 100 passes with them. Did you run the collection 20 times? Simply otherwise 5 agents will collect only 5 passes. Besides, if you run the Expert Advisor with the same parameters, it does not recalculate, but pulls the result from the cache. Or did you shift these 5 agents at each subsequent collection? Please explain this point.

I ran it 20 times, but cleared the cache before running it. You can't shift agents because the model file is bound to the agent number. Therefore, if you shift agents each time a new random model will be created and we will not get the desired effect.

 
Dmitriy Gizlyk #:

I ran it 20 times, but cleared the cache before starting. You can't shift agents because the model file is linked to the agent number. Therefore, if the agents are shifted, a new random model will be created each time and we will not get the desired effect.

Hi Dmitriy, I did it by using this... will it have the same effect?

input ENUM_TIMEFRAMES TimeFrame = PERIOD_H1;

input double MinProfit = 10;
input int Agent = 1;
input int Optimisation = 1;

then set agent to 5 and Optimisation to 20
Total of 100...


 
JimReaper #:
Hi Dmitriy, I did it by using this... will it have the same effect?

input ENUM_TIMEFRAMES TimeFrame = PERIOD_H1;

input double MinProfit = 10;
input int Agent = 1;
input int Optimisation = 1;

then set agent to 5 and Optimisation to 20
Total of 100...


Hi,
How many cores have you used?

 
Dmitriy Gizlyk #:

Hi,
How many cores have you used?

I only used 4 cores.


Files:
Dimi_1.png  2 kb
Dimi_2.png  5 kb
 
JimReaper #:
I only used 4 cores.


I don't know how MetaTrader Tester selects inputs for each core. Main idea in online study use pretrained model from one pass to another. But if Tester run Optimithation 1..4 to Agent 1 at one pass the they all use random (not pretrained) model.

 
Dmitriy Gizlyk #:

I don't know how MetaTrader Tester selects inputs for each core. Main idea in online study use pretrained model from one pass to another. But if Tester run Optimithation 1..4 to Agent 1 at one pass the they all use random (not pretrained) model.

Understood! Thank you Very Much!

I also added some indicators and Parameters, total 27 BarDescr.... Momentum, Bands & Ichimoku Kinko Hyo =)

int OnInit()

{

Set symbol and refresh

if(! Symb.Name(_Symbol))

return INIT_FAILED;

Symb.Refresh();

//---

if(! RSI. Create(Symb.Name(), TimeFrame, RSIPeriod, RSIPrice))

return INIT_FAILED;

//---

if(! CCI.Create(Symb.Name(), TimeFrame, CCIPeriod, CCIPrice))

return INIT_FAILED;

//---

if(! ATR. Create(Symb.Name(), TimeFrame, ATRPeriod))

return INIT_FAILED;

//---

if(! MACD. Create(Symb.Name(), TimeFrame, FastPeriod, SlowPeriod, SignalPeriod, MACDPrice)))

return INIT_FAILED;

//---

if (! Momentum.Create(Symb.Name(), TimeFrame, MomentumMaPeriod, MomentumApplied))

return INIT_FAILED;

Initialise the Ichimoku Kinko Hyo indicator

if (! Ichimoku.Create(Symb.Name(), TimeFrame, Ichimokutenkan_senPeriod, Ichimokukijun_senPeriod, Ichimokusenkou_span_bPeriod)))

return INIT_FAILED;

//---

if (! Bands.Create(Symb.Name(), TimeFrame, BandsMaPeriod, BandsMaShift, BandsDeviation, BandsApplied))

return INIT_FAILED;

//---

if(! RSI. BufferResize(HistoryBars) || ! CCI.BufferResize(HistoryBars) ||

! ATR. BufferResize(HistoryBars) || ! MACD. BufferResize(HistoryBars))

{

PrintFormat("%s -> %d", __FUNCTION__, __LINE__);

return INIT_FAILED;

}

//---


void OnTick()

{

//---

if(! IsNewBar())

return;

//---

int bars = CopyRates(Symb.Name(), TimeFrame, iTime(Symb.Name(), TimeFrame, 1), HistoryBars, Rates);

if(! ArraySetAsSeries(Rates, true))

return;

//---

RSI. Refresh();

CCI.Refresh();

ATR. Refresh();

MACD. Refresh();

Symb.Refresh();

Momentum.Refresh();

Bands.Refresh();

Symb.RefreshRates();

Refresh Ichimoku values for the current bar

Ichimoku.Refresh();

--- History Data

float atr = 0;

for (int b = 0; b < (int)HistoryBars; b++)

{

float open = (float)Rates[b].open;

float close = (float)Rates[b].close;

float rsi = (float)RSI. Main(b);

float cci = (float)CCI.Main(b);

atr = (float)ATR. Main(b);

float macd = (float)MACD. Main(b);

float sign = (float)MACD. Signal(b);

float mome = (float)Momentum.Main(b);

float bandzup = (float)Bands.Upper(b);

float bandzb = (float)Bands.Base(b);

float bandzlo = (float)Bands.Lower(b);

float tenkan = (float)Ichimoku.TenkanSen(0); Use the calculated value

float kijun = (float)Ichimoku.KijunSen(1); Use the calculated value

float senkasa = (float)Ichimoku.SenkouSpanA(2); Use the calculated value

float senkb = (float)Ichimoku.SenkouSpanB(3); Use the calculated value

Check for EMPTY_VALUE and division by zero

if (rsi == EMPTY_VALUE || cci == EMPTY_VALUE || atr == EMPTY_VALUE || macd == EMPTY_VALUE ||

sign == EMPTY_VALUE || mome == EMPTY_VALUE || bandzup == EMPTY_VALUE || bandzb == EMPTY_VALUE || bandzb == EMPTY_VALUE ||

bandzlo == EMPTY_VALUE || tenkan == EMPTY_VALUE || kijun == EMPTY_VALUE || senkasa == EMPTY_VALUE || senkasa == EMPTY_VALUE ||

senkb == EMPTY_VALUE || kijun == 0.0 || senkb == 0.0)

{

continue;

}

Ensure buffers are not resized within the loop

int shift = b * BarDescr;

sState.state[shift] = (float)(Rates[b].close - open);

sState.state[shift + 1] = ((float)(Rates[b].close - open) + (tenkan - kijun)) / 2.0f;

sState.state[shift + 2] = (float)(Rates[b].high - open);

sState.state[shift + 3] = (float)(Rates[b].low - open);

sState.state[shift + 4] = (float)(Rates[b].high - close);

sState.state[shift + 5] = (float)(Rates[b].low - close);

sState.state[shift + 6] = (tenkan - kijun);

sState.state[shift + 7] = (float)(Rates[b].tick_volume / 1000.0f);

sState.state[shift + 8] = ((float)(Rates[b].high) - (float)(Rates[b].low));

sState.state[shift + 9] = (bandzup - bandzlo);

sState.state[shift + 10] = rsi;

sState.state[shift + 11] = cci;

sState.state[shift + 12] = atr;

sState.state[shift + 13] = macd;

sState.state[shift + 14] = sign;

sState.state[shift + 15] = mome;

sState.state[shift + 16] = (float)(Rates[b].open - tenkan);

sState.state[shift + 17] = (float)(Rates[b].open - kijun);

sState.state[shift + 18] = (float)(Rates[b].open - bandzb);

sState.state[shift + 19] = (float)(Rates[b].open - senkasa);

sState.state[shift + 20] = (float)(Rates[b].open - senkb);

sState.state[shift + 21] = (float)(Rates[b].close - tenkan);

sState.state[shift + 22] = (float)(Rates[b].close - kijun);

sState.state[shift + 23] = (float)(Rates[b].close - bandzb);

sState.state[shift + 24] = (float)(Rates[b].close - senkasa);

sState.state[shift + 25] = (float)(Rates[b].close - senkb);

sState.state[shift + 26] = senkasa - senkb;

//---

RSI.Refresh();

CCI.Refresh();

ATR.Refresh();

MACD.Refresh();

Symb.Refresh();

Momentum.Refresh();

Bands.Refresh();

Symb.RefreshRates();

// Refresh Ichimoku values for the current bar

Ichimoku.Refresh();

//---

Print("State 0: ", sState.state[shift]);

Print("State 1: ", sState.state[shift + 1]);

Print("State 2: ", sState.state[shift + 2]);

Print("State 3: ", sState.state[shift + 3]);

Print("State 4: ", sState.state[shift + 4]);

Print("State 5: ", sState.state[shift + 5]);

Print("State 6: ", sState.state[shift + 6]);

Print("State 7: ", sState.state[shift + 7]);

Print("State 8: ", sState.state[shift + 8]);

Print("State 9: ", sState.state[shift + 9]);

Print("State 10: ", sState.state[shift + 10]);

Print("State 11: ", sState.state[shift + 11]);

Print("State 12: ", sState.state[shift + 12]);

Print("State 13: ", sState.state[shift + 13]);

Print("State 14: ", sState.state[shift + 14]);

Print("State 15: ", sState.state[shift + 15]);

Print("State 16: ", sState.state[shift + 16]);

Print("State 17: ", sState.state[shift + 17]);

Print("State 18: ", sState.state[shift + 18]);

Print("State 19: ", sState.state[shift + 19]);

Print("State 20: ", sState.state[shift + 20]);

Print("State 21: ", sState.state[shift + 21]);

Print("State 22: ", sState.state[shift + 22]);

Print("State 23: ", sState.state[shift + 23]);

Print("State 24: ", sState.state[shift + 24]);

Print("State 25: ", sState.state[shift + 25]);

Print("State 26: ", sState.state[shift + 26]);

Print("Tenkan Sen: ", tenkan);

Print("Kijun Sen: ", kijun);

Print("Senkou Span A: ", senkasa);

Print("Senkou Span B: ", senkb);

}

bState.AssignArray(sState.state);


Files:
 
JimReaper #:
Understood! Thank you Very Much!

I also added some indicators and Parameters, total 27 BarDescr.... Momentum, Bands & Ichimoku Kinko Hyo =)

int OnInit()

{

Set symbol and refresh

if(! Symb.Name(_Symbol))

return INIT_FAILED;

Symb.Refresh();

//---

if(! RSI. Create(Symb.Name(), TimeFrame, RSIPeriod, RSIPrice))

return INIT_FAILED;

//---

if(! CCI.Create(Symb.Name(), TimeFrame, CCIPeriod, CCIPrice))

return INIT_FAILED;

//---

if(! ATR. Create(Symb.Name(), TimeFrame, ATRPeriod))

return INIT_FAILED;

//---

if(! MACD. Create(Symb.Name(), TimeFrame, FastPeriod, SlowPeriod, SignalPeriod, MACDPrice))

return INIT_FAILED;

//---

if (! Momentum.Create(Symb.Name(), TimeFrame, MomentumMaPeriod, MomentumApplied))

return INIT_FAILED;

Initialize the Ichimoku Kinko Hyo indicator

If (! Ichimoku.Create(Symb.Name(), TimeFrame, Ichimokutenkan_senPeriod, Ichimokukijun_senPeriod, Ichimokusenkou_span_bPeriod))

return INIT_FAILED;

//---

if (! Bands.Create(Symb.Name(), TimeFrame, BandsMaPeriod, BandsMaShift, BandsDeviation, BandsApplied))

return INIT_FAILED;

//---

if(! RSI. BufferResize(HistoryBars) || ! CCI.BufferResize(HistoryBars) ||

! ATR. BufferResize(HistoryBars) || ! MACD. BufferResize(HistoryBars))

{

PrintFormat("%s -> %d", __FUNCTION__, __LINE__);

return INIT_FAILED;

}

//---


void OnTick()

{

//---

if(! IsNewBar())

return;

//---

int bars = CopyRates(Symb.Name(), TimeFrame, iTime(Symb.Name(), TimeFrame, 1), HistoryBars, Rates);

if(! ArraySetAsSeries(Rates, true))

return;

//---

RSI. Refresh();

CCI.Refresh();

ATR. Refresh();

MACD. Refresh();

Symb.Refresh();

Momentum.Refresh();

Bands.Refresh();

Symb.RefreshRates();

Refresh Ichimoku values for the current bar

Ichimoku.Refresh();

--- History Data

float atr = 0;

for (int b = 0; b < (int)HistoryBars; b++)

{

float open = (float)Rates[b].open;

float close = (float)Rates[b].close;

float rsi = (float)RSI. Main(b);

float cci = (float)CCI.Main(b);

atr = (float)ATR. Main(b);

float macd = (float)MACD. Main(b);

float sign = (float)MACD. Signal(b);

float mome = (float)Momentum.Main(b);

float bandzup = (float)Bands.Upper(b);

float bandzb = (float)Bands.Base(b);

float bandzlo = (float)Bands.Lower(b);

float tenkan = (float)Ichimoku.TenkanSen(0); Use the calculated value

float kijun = (float)Ichimoku.KijunSen(1); Use the calculated value

float senkasa = (float)Ichimoku.SenkouSpanA(2); Use the calculated value

float senkb = (float)Ichimoku.SenkouSpanB(3); Use the calculated value

Check for EMPTY_VALUE and division by zero

if (rsi == EMPTY_VALUE || cci == EMPTY_VALUE || atr == EMPTY_VALUE || macd == EMPTY_VALUE ||

sign == EMPTY_VALUE || mome == EMPTY_VALUE || bandzup == EMPTY_VALUE || bandzb == EMPTY_VALUE || bandzb == EMPTY_VALUE ||

bandzlo == EMPTY_VALUE || tenkan == EMPTY_VALUE || kijun == EMPTY_VALUE || senkasa == EMPTY_VALUE || senkasa == EMPTY_VALUE ||

senkb == EMPTY_VALUE || kijun == 0.0 || senkb == 0.0)

{

continue;

}

Ensure buffers are not resized within the loop

int shift = b * BarDescr;

sState.state[shift] = (float)(Rates[b].close - open);

sState.state[shift + 1] = ((float)(Rates[b].close - open) + (tenkan - kijun)) / 2.0f;

sState.state[shift + 2] = (float)(Rates[b].high - open);

sState.state[shift + 3] = (float)(Rates[b].low - open);

sState.state[shift + 4] = (float)(Rates[b].high - close);

sState.state[shift + 5] = (float)(Rates[b].low - close);

sState.state[shift + 6] = (tenkan - kijun);

sState.state[shift + 7] = (float)(Rates[b].tick_volume / 1000.0f);

sState.state[shift + 8] = ((float)(Rates[b].high) - (float)(Rates[b].low));

sState.state[shift + 9] = (bandzup - bandzlo);

sState.state[shift + 10] = rsi;

sState.state[shift + 11] = cci;

sState.state[shift + 12] = atr;

sState.state[shift + 13] = macd;

sState.state[shift + 14] = sign;

sState.state[shift + 15] = mome;

sState.state[shift + 16] = (float)(Rates[b].open - tenkan);

sState.state[shift + 17] = (float)(Rates[b].open - kijun);

sState.state[shift + 18] = (float)(Rates[b].open - bandzb);

sState.state[shift + 19] = (float)(Rates[b].open - senkasa);

sState.state[shift + 20] = (float)(Rates[b].open - senkb);

sState.state[shift + 21] = (float)(Rates[b].close - tenkan);

sState.state[shift + 22] = (float)(Rates[b].close - kijun);

sState.state[shift + 23] = (float)(Rates[b].close - bandzb);

sState.state[shift + 24] = (float)(Rates[b].close - senkasa);

sState.state[shift + 25] = (float)(Rates[b].close - senkb);

sState.state[shift + 26] = senkasa - senkb;

//---

RSI.Refresh();

CCI.Refresh();

ATR.Refresh();

MACD.Refresh();

Symb.Refresh();

Momentum.Refresh();

Bands.Refresh();

Symb.RefreshRates();

// Refresh Ichimoku values for the current bar

Ichimoku.Refresh();

//---

Print("State 0: ", sState.state[shift]);

Print("State 1: ", sState.state[shift + 1]);

Print("State 2: ", sState.state[shift + 2]);

Print("State 3: ", sState.state[shift + 3]);

Print("State 4: ", sState.state[shift + 4]);

Print("State 5: ", sState.state[shift + 5]);

Print("State 6: ", sState.state[shift + 6]);

Print("State 7: ", sState.state[shift + 7]);

Print("State 8: ", sState.state[shift + 8]);

Print("State 9: ", sState.state[shift + 9]);

Print("State 10: ", sState.state[shift + 10]);

Print("State 11: ", sState.state[shift + 11]);

Print("State 12: ", sState.state[shift + 12]);

Print("State 13: ", sState.state[shift + 13]);

Print("State 14: ", sState.state[shift + 14]);

Print("State 15: ", sState.state[shift + 15]);

Print("State 16: ", sState.state[shift + 16]);

Print("State 17: ", sState.state[shift + 17]);

Print("State 18: ", sState.state[shift + 18]);

Print("State 19: ", sState.state[shift + 19]);

Print("State 20: ", sState.state[shift + 20]);

Print("State 21: ", sState.state[shift + 21]);

Print("State 22: ", sState.state[shift + 22]);

Print("State 23: ", sState.state[shift + 23]);

Print("State 24: ", sState.state[shift + 24]);

Print("State 25: ", sState.state[shift + 25]);

Print("State 26: ", sState.state[shift + 26]);

Print("Tenkan Sen: ", tenkan);

Print("Kijun Sen: ", kijun);

Print("Senkou Span A: ", senkasa);

Print("Senkou Span B: ", senkb);

}

bState.AssignArray(sState.state);


JimReaper - How many cycles did you study your version before getting the result in your picture? (data collection - training). And how long did it take?


What is your computer configuration (processor, video card, RAM)?


Thank you

 
Dear guys, it takes me around 8 hours to collect from 5 agents. I used 8 cores processors. Is it too slow or normal? Plz share.
 
JimReaper #:
Hi Dmitriy, I did it by using this... will it have the same effect?

input ENUM_TIMEFRAMES TimeFrame = PERIOD_H1;

input double MinProfit = 10;
input int Agent = 1;
input int Optimisation = 1;

then set agent to 5 and Optimisation to 20
Total of 100...


Hi Jim

I see Agent referenced in the code, but I don't see  Optimisation. Was there a further addition to the code, which you made, to use this new parameter?
Thanks
Paul
Reason: