Question for developers - using all computational cores during optimisation - page 4

 
Boris Egorov:

You will go far with such a message... Slava, by the way, is one of the main developers of MT, he knows how the algorithm works.

I'm telling you, just give me optimisation settings for a standard EA that reproduce the same results. Before, you had one set of parameters optimised, and now a different one. Perhaps the automatic switch to the genetic algorithm, with everything that implies, has been triggered.

Be a bit more constructive and the problem will get solved much faster.

 
Andrey Khatimlianskii:

If you really want things to change and not just grumble (like me), why not give the developers a reproducible example in which cores are disabled or idle?

Perhaps based on a standard EA (if the problem reproduces there too), but with as much detail as possible, so that they can reproduce the behaviour themselves.


Is that clearer?

Only local agents are used; 6 out of 8 are enabled, and 3 drop out immediately after the first batch of jobs.
 
Sergey Chalyshev:

>Is that clearer?

>Only local agents are used; 6 out of 8 are enabled, and 3 drop out immediately after the first batch of jobs.

That's much more constructive.

Attach the tester's log and the log of one of the agents that finished early.


 
Andrey Khatimlianskii:

>That's much more constructive.

>Attach the tester's log and the log of one of the agents that finished early.


Here are the logs of the tester, a working agent and one that dropped out.

 
Sergey Chalyshev:

>Here are the logs of the tester, a working agent and one that dropped out.

Now we wait for @Slava's response.

It looks like the genetic algorithm stopped using some of the cores after generation 3:

01:00:50.723    Tester  Best result 5681.165275 produced at generation 1. Next generation 4

Did it decide there was no point?

 

>Slava, by the way, one of the main developers of MT

Well then, Slava, all our hopes are on you: we pray and raise our voices... save us from non-working network agents :-)

I would also like to thank Andrey Khatimlianskii for the logs.

 
Boris Egorov:

>Slava, by the way, one of the main developers of MT

>Well then, Slava, all our hopes are on you: we pray and raise our voices... save us from non-working network agents :-)

>I would also like to thank Andrey Khatimlianskii for the logs.

We are working on it. Renat promised this on page 2.
 
Andrey Khatimlianskii:

>Now we wait for @Slava's response.

>It looks like the genetic algorithm stopped using some of the cores after generation 3:

>Did it decide there was no point?

No.

There is another entry in the log:

NQ      3       01:02:43.436    Tester  stopped by user

This is confirmed by the agent logs:

FL      0       01:02:43.434    127.0.0.1       tester forced to stop
JJ      0       01:02:43.439    Tester  29 of 85 passes processed (29 successfully finished) in 0:00:06.976
 

I would like to point out that there are actually two problems with downtime.

With genetic optimisation, agents that finish early have to wait until the whole generation has been calculated. It is not clear whether the job batch can be rebalanced in this case.

With slow (complete) optimisation, idle time for agents that finish early could be avoided by reallocating jobs dynamically. The developers did not do this: jobs are currently distributed once, at the start of optimisation. The reason is that the same distribution algorithm is used for cloud agents, and taking jobs away from them is "inappropriate". It would be worth separating the approach for local agents from the one for cloud agents.
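The difference between the two approaches can be illustrated with a small simulation. This is only a sketch under stated assumptions: the tester's real distribution algorithm is not described in this thread, so the round-robin split in `static_makespan` and the shared queue in `dynamic_makespan` (both hypothetical names) merely stand in for "jobs handed out at the start" and "jobs reallocated dynamically", and the job durations are made up.

```python
import heapq


def static_makespan(durations, n_agents):
    """Hand out all jobs up front (round-robin, an assumption);
    the run ends when the most heavily loaded agent finishes."""
    loads = [0.0] * n_agents
    for i, d in enumerate(durations):
        loads[i % n_agents] += d
    return max(loads)


def dynamic_makespan(durations, n_agents):
    """Whichever agent becomes free first takes the next job
    from a shared queue."""
    free_at = [0.0] * n_agents  # min-heap of times at which agents become free
    heapq.heapify(free_at)
    for d in durations:
        t = heapq.heappop(free_at)      # earliest free agent
        heapq.heappush(free_at, t + d)  # it is busy until t + d
    return max(free_at)


# Hypothetical job durations: two slow passes among fast ones.
durations = [10, 1, 1, 1, 10, 1, 1, 1]
print(static_makespan(durations, 4))   # 20.0
print(dynamic_makespan(durations, 4))  # 11.0
```

With these made-up numbers, the static split leaves three agents idle for most of the run, while the shared queue finishes in roughly half the time.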

Relatively recently, the developers did improve the scheme slightly by keeping a small reserve of jobs for agents that finish early. Unfortunately, this does not always help: the reserve is the remainder of dividing the number of jobs by the number of agents, so it can be zero.
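A short calculation shows why the reserve can vanish. The sketch below assumes, as described above, that the reserve is simply the remainder of an integer division; `split_with_reserve` and the job counts are hypothetical and are not taken from the tester.

```python
def split_with_reserve(n_jobs, n_agents):
    """Return (jobs handed to each agent up front, jobs kept in reserve)."""
    per_agent, reserve = divmod(n_jobs, n_agents)
    return per_agent, reserve


print(split_with_reserve(100, 8))  # (12, 4): four jobs left for early finishers
print(split_with_reserve(96, 8))   # (12, 0): nothing left in reserve
```

Whenever the number of jobs is an exact multiple of the number of agents, an agent that finishes early has nothing left to pick up.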

 
Slava:

>No.

>There is another entry in the log

>Confirmed by the agent's logs.

But that happens afterwards, at the very end. The agents dropped out earlier, at 01:00:50; you can see it in both the log and the video.