Cloud sync errors - page 3

 
Clock:

This sounds like a great idea and I'm thankful for that hint.

However, 3 things about that:

1) As I mentioned above, I also have the problem of the "endless" loop, but since I understood from this thread that "endless loop" is just the best guess for "one event took longer than ten minutes" I accept that it might be my code. I use quite complex indicators, and since (at least I think so) they calculate their whole history when their handle is being created, this might (on slow computers) take more than ten minutes.

2) However! Usually my cloud crashed after 10-15 minutes. But last night it worked perfectly for 8 hours. Not a single crash, although I didn't change the code at all. Weird!

3) And most importantly, because it relates to your approach: when you reject an agent based on its memory, the agent (and therefore the whole cloud optimization) doesn't crash, I get that. But I don't think a more powerful machine will try the same parameter set again, so you basically lose optimization data points, am I correct? Would you say this is the price we'll have to pay?


I'll be curious to see if my agents still work once I'm back from work...

Hi Clock,

1) Firstly, unless you're using Indicators on say, 10 years of 1m data and the Indicators are incredibly complex, I'd be very surprised in this day and age of processing power that anything would take even 5 minutes on a NORMAL system. The reason I stress normal here is that I still suspect there are a number of agents in the cloud that are running on machines that are either extremely loaded up or simply have a bad case of the Windows (post traumatic stress) slowdown blues. And it only takes one dud agent to kill your optimisation....

2) I had exactly the same as you - i.e. after an optimisation was started, a number of the cloud agents would return results without any problems. Then after 5-20 minutes or quite a bit longer sometimes, an agent would throw the dreaded error and BANG - end of optimisation. And I also had the occasional optimisation where it completed without any problems. VERY frustrating as you don't have any access whatsoever to the agents log files, system details, CPU usage etc to be able to see what's going on.

3) That's a very interesting point you make. From my understanding of things, the optimiser only considers a particular combination of parameters "used" when it's got the results for that particular combination of parameters although I could be wrong with this. Perhaps someone at MetaQuotes could comment on this point?

Anyway, I hope you're making progress! :)

 
angevoyageur:
How many agents are available when you reject all those with less than 32G of ram ?

Hi,

It does seem like a large amount of RAM for consumer-grade PCs, but the cloud doesn't seem to have any problems finding machines with these specs. When I start an optimisation, the optimiser easily finds the initial 64 agents and then ramps up pretty quickly to 128 (depending on the parameter set configuration, of course). I initially tried 8GB - the optimisation ran for longer and often completed, but I still regularly had an agent produce the error and, as a result, kill the optimisation. I then tried 16GB - again, better but not flawless. I didn't bother trying 24GB - thought I'd go straight to 32GB and see what happened. :) And voilà - flawless optimisations.

I wanted to play around a lot more and see if I could hone the agent configuration requirements a bit better but when you're getting charged for playing around, the incentive quickly disappears.  :)
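For anyone who wants to try the same workaround, this is roughly what my check looks like - just a sketch, and the 32GB threshold is simply the value that ended up working for me:

```mql5
//+------------------------------------------------------------------+
//| Sketch: reject cloud agents below a minimum RAM threshold.       |
//| The 32GB figure is just the value that worked for me.            |
//+------------------------------------------------------------------+
int OnInit() {

    // TERMINAL_MEMORY_PHYSICAL reports the machine's physical RAM in MB
    long physicalRamMB = TerminalInfoInteger(TERMINAL_MEMORY_PHYSICAL);

    if (physicalRamMB < 32 * 1024)
        return(INIT_AGENT_NOT_SUITABLE);   // agent rejected, no error raised

    return(INIT_SUCCEEDED);
}
```

The nice thing about INIT_AGENT_NOT_SUITABLE is that the agent is skipped quietly rather than throwing an error that kills the whole optimisation.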

 
cowil:

Hi,

It does seem like a large amount of RAM for consumer-grade PCs, but the cloud doesn't seem to have any problems finding machines with these specs. When I start an optimisation, the optimiser easily finds the initial 64 agents and then ramps up pretty quickly to 128 (depending on the parameter set configuration, of course). I initially tried 8GB - the optimisation ran for longer and often completed, but I still regularly had an agent produce the error and, as a result, kill the optimisation. I then tried 16GB - again, better but not flawless. I didn't bother trying 24GB - thought I'd go straight to 32GB and see what happened. :) And voilà - flawless optimisations.

I wanted to play around a lot more and see if I could hone the agent configuration requirements a bit better but when you're getting charged for playing around, the incentive quickly disappears.  :)

It would be interesting to have some feedback from MetaQuotes. If a machine with 16 GB of RAM isn't enough to run some optimizations, there is something to investigate. If I understood correctly, when you run your optimization locally you don't have any problem, so why is so much memory needed when using the cloud?
 
angevoyageur:
It would be interesting to have some feedback from MetaQuotes. If a machine with 16 GB of RAM isn't enough to run some optimizations, there is something to investigate. If I understood correctly, when you run your optimization locally you don't have any problem, so why is so much memory needed when using the cloud?

I have absolutely no idea. My local machine is an 8GB i7 machine, on which MT5 installed 8 local agents (it's only a 4-core processor, but with Hyper-Threading, Windows and thus MT5 of course see it as an 8-core processor). When an optimisation is being carried out, the agents appear to use about 400MB of memory each, which works out to about 3.2GB of required memory for the 8 agents. Nowhere near 32GB...

The other thing I was thinking about that might actually be the root cause of this issue, is the fact that one "bad" cloud agent terminates the whole optimisation. What may in fact be happening is that when the cloud server allocates agents for an optimisation job (without memory requirements being stated), the same "bad" agent(s) is/are selected. When the memory requirements are stated in OnInit(), the "bad" agents are bypassed because the boxes they are running on don't meet the requirements and only good agents are selected. Thinking about it, I suspect this is probably more the case.

And yep, I've registered this issue with MetaQuotes but haven't heard anything back as yet. 

 

If OnInit() (or any other function) executes for more than 10 minutes, even on a slow agent, it is considered an endless loop that is harmful to the MQL5 Cloud Network (note that there is no such limitation for local and remote agents).

For such situations, we have implemented the return code INIT_AGENT_NOT_SUITABLE for the OnInit() function. Using it, a cloud network user can check and reject unsuitable agents at the very start of a test run.

You can consider this comment as an official reply to your Service Desk ticket. We know that you are acquainted with the information given above.

In addition: in any case, any function is considered abnormal, ineffective and non-optimal if its execution takes more than 10 minutes, even on the slowest PC.
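A sketch of how an Expert could enforce such a budget on itself, so that a slow agent rejects the pass cleanly instead of hitting the 10-minute limit (the 5-minute budget and the loop body here are only illustrative):

```mql5
// Sketch: time-box heavy initialisation so a slow agent rejects itself
// well before the cloud network's 10-minute endless-loop limit.
int OnInit() {

    uint start    = GetTickCount();     // milliseconds since system start
    uint budgetMs = 5 * 60 * 1000;      // self-imposed 5-minute budget (arbitrary)

    for (int i = 0; i < 100000; i++) {
        // ... expensive per-iteration initialisation work here ...

        if (GetTickCount() - start > budgetMs)
            return(INIT_AGENT_NOT_SUITABLE);  // too slow - reject this agent cleanly
    }

    return(INIT_SUCCEEDED);
}
```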

 
MetaQuotes:

If OnInit() (or any other function) executes for more than 10 minutes, even on a slow agent, it is considered an endless loop that is harmful to the MQL5 Cloud Network (note that there is no such limitation for local and remote agents).

For such situations, we have implemented the return code INIT_AGENT_NOT_SUITABLE for the OnInit() function. Using it, a cloud network user can check and reject unsuitable agents at the very start of a test run.

You can consider this comment as an official reply to your Service Desk ticket. We know that you are acquainted with the information given above.

In addition: in any case, any function is considered abnormal, ineffective and non-optimal if its execution takes more than 10 minutes, even on the slowest PC.

Hi MetaQuotes,

Firstly, thanks very much for your comments - much appreciated.

The problem that I face (and others apparently, if you look back through this thread) is that when an optimisation is carried out using my local agents, the optimisation runs fine - i.e. the "percentage complete" status on each agent steadily increases as the optimisation progresses. If the OnTick() event handler in my Expert contained any code that took more than even a minute (let alone 10 minutes) to complete, surely those "percentage complete" figures would pause at those times? Shouldn't I be seeing pauses of 10 minutes (or more) in these status percentages if my code contained the forms of endless looping that the cloud agent errors allude to?

 

Well, after many hours of whittling down my Expert, it appears that I've found a/the source of the problems with the OnInit() or OnTick() "endless loop" errors - this being the SymbolInfoInteger() function. I don't know if SymbolInfoDouble() or SymbolInfoTick() cause the same problems, as I haven't yet had a chance to experiment further. If anyone wants to try this out, run the following Expert in the Optimiser, using cloud agents:

//+------------------------------------------------------------------+
//|                                              MultiSymbolTest.mq5 |
//|                                                                  |
//+------------------------------------------------------------------+
input double var1 = 45;
input double var2 = 54;

input bool onInit = true;
input bool onTick = false;


//+------------------------------------------------------------------+
//| expert initialization function                                   |
//+------------------------------------------------------------------+
int OnInit() { 

    
    if (onInit) {
    
        string pairsToTrade[] = {"AUDUSD","EURJPY","EURUSD","GBPUSD","AUDJPY","USDJPY","AUDCAD"};
        for (int i=0; i<ArraySize(pairsToTrade); i++) {
            int digits = (int)SymbolInfoInteger(pairsToTrade[i], SYMBOL_DIGITS); // explicit cast: returns long
            if (digits == -1)
                return(INIT_FAILED);
        }
    }           

    // Return...
    return(INIT_SUCCEEDED);
}



//+------------------------------------------------------------------+
//| expert start function                                            |
//+------------------------------------------------------------------+
void OnTick() {

    if (onTick) {
    
        string pairsToTrade[] = {"AUDUSD","EURJPY","EURUSD","GBPUSD","AUDJPY","USDJPY","AUDCAD"};
        for (int i=0; i<ArraySize(pairsToTrade); i++) {
            int digits = (int)SymbolInfoInteger(pairsToTrade[i], SYMBOL_DIGITS); // explicit cast: returns long
            if (digits == -1)
                return;
        }
    }           

    ExpertRemove();
}    

Select whether you want to test either OnInit() or OnTick(), give var1 and var2 sufficient start/step/stop values to generate about 1000 combinations (probably less will do but this is what I've been using) and fire up the optimiser. In approximately 10 minutes, you'll see the "endless loop detected" error.

Oh, and the reason I put the "ExpertRemove()" at the end of OnTick() was to show that it only takes one iteration of OnTick() to generate the error. 

Needless to say, I've reported this to the service desk as well... 

 
Oh, and I forgot to mention that, for whatever reason, the memory fix I provided above seems to fix the issue most of the time, but not always. How or why this works, I don't know. It must tickle something in the bowels of MT5... :)
 

I can confirm this issue :

2013.05.20 14:22:31    MQL5 Cloud Europe 2    genetic pass (0, 22) tested with error "endless loop detected in OnInit function, expert rejected by MQL5 Cloud Network" in 602 sec (PR 140)

 
Need to think...