EA Genetic optimization needs better load balancing for local network farm agents

 

I brought two old PCs back to my home yesterday to extend my local farm and accelerate EA optimization. I was expecting the PCs to each bring in about 10% more processing power and to shorten the optimization time accordingly.

However, the old computers, being quad cores with DDR2 memory, actually slowed the optimization down tremendously. During genetic optimization, the master PC shares a batch of 512 runs across all computers/agents with no regard for their speed. The result was that the fastest computers would run for a while and then sit idle for about 50% of the time, waiting for the older computers to finish. On the next batch, I would have expected the master computer to send fewer tasks to the slower agents so that everyone finishes at the same time, but no! The old computers got the same amount of work, and the faster computers spent a lot of time idling.

Genetic EA optimization needs a better load-balancing algorithm, for example a simple function that monitors each agent's average run time and adjusts the number of tasks dispatched to it accordingly. Even better, if a cache kept a history of the performance of local farm agents, MT5 would remember the performance from past optimizations and spread the load correctly right from the start.
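The balancing rule suggested here can be sketched very simply. This is a hypothetical illustration of the idea, not MT5 code (function and variable names are my own): split each batch in proportion to every agent's measured throughput, so all agents are expected to finish at roughly the same time.

```python
# Sketch of the proposed idea (not MT5 internals): divide a batch in
# proportion to each agent's measured speed (1 / average seconds per task).

def split_batch(batch_size, avg_run_times):
    """avg_run_times: average seconds per task for each agent,
    taken from previous batches or a cached history."""
    speeds = [1.0 / t for t in avg_run_times]           # tasks per second
    total = sum(speeds)
    shares = [int(batch_size * s / total) for s in speeds]
    # Hand any rounding leftovers to the fastest agents first.
    leftover = batch_size - sum(shares)
    for i in sorted(range(len(speeds)), key=lambda i: -speeds[i])[:leftover]:
        shares[i] += 1
    return shares

# Example: two fast agents (2 s/task) and one old quad core (10 s/task).
print(split_batch(512, [2.0, 2.0, 10.0]))  # → [233, 233, 46]
```

With this split, each agent's share divided by its speed gives roughly the same wall-clock time, instead of every agent receiving an equal share of the 512 runs.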

I hope this is the right place to share ideas like these. Cheers,


Marc

 
Marc-Antoine Lalonde:

I brought two old PCs back to my home yesterday to extend my local farm and accelerate EA optimization. I was expecting the PCs to each bring in about 10% more processing power and to shorten the optimization time accordingly.

However, the old computers, being quad cores with DDR2 memory, actually slowed the optimization down tremendously. During genetic optimization, the master PC shares a batch of 512 runs across all computers/agents with no regard for their speed. The result was that the fastest computers would run for a while and then sit idle for about 50% of the time, waiting for the older computers to finish. On the next batch, I would have expected the master computer to send fewer tasks to the slower agents so that everyone finishes at the same time, but no! The old computers got the same amount of work, and the faster computers spent a lot of time idling.

Genetic EA optimization needs a better load-balancing algorithm, for example a simple function that monitors each agent's average run time and adjusts the number of tasks dispatched to it accordingly. Even better, if a cache kept a history of the performance of local farm agents, MT5 would remember the performance from past optimizations and spread the load correctly right from the start.

I hope this is the right place to share ideas like these. Cheers,


Marc

I have not done this in the MetaTrader scenario, but I have done a lot of load balancing and optimization in enterprise systems. So I was wondering whether it would be possible to install an external load balancer in front of your servers?

Worth checking if that is an option.

 
No, that is not the type of load balancing requested.

Your suggestion is about network load balancing, while an MT5 optimization run is about computational power and the composition of the batches distributed to the agents.

It can't be done with a network load-balancing approach.
 
Dominik Christian Egert #:
No, that is not the type of load balancing requested.

Your suggestion is about network load balancing, while an MT5 optimization run is about computational power and the composition of the batches distributed to the agents.

It can't be done with a network load-balancing approach.

Not exactly: application-level load balancing is what I am talking about.

As I said, I have not looked into how MetaTrader sends requests out to its distributed agents, but if you have information on how it does, please share it; it would be interesting to see whether this is feasible.

 
It is not feasible.

There is no load balancer capable of reading MQL's own application-layer protocol and making decisions based on it that would give you an approximation of the runtime an agent needs to complete a test run.

Maybe some research on what was actually being asked, before giving any answer, would have stopped you from offering advice that is simply out of scope.

Let's see what you will answer to my other post...

Edit:

Application-level load balancing still rests on a network-based approach.
 
To clarify the process involved:

A "master" terminal, which gathers all network agents, composes workload batches and distributes them to the available agents.

If for some reason an agent finishes its work early, it needs to inform the master that it is done.

The master would then need to retract workload from agents that still have unfinished work and reassign it to the free agents.

Show me a load balancer that's capable of doing that.
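For illustration, the retract-and-reassign step described above is essentially work stealing. A minimal sketch of the master-side logic, assuming the master keeps a queue of not-yet-started tasks per agent (hypothetical names, not MT5 internals):

```python
# Sketch of the work-stealing step described above (hypothetical, not MT5
# code): when an agent reports it is idle, the master pulls queued-but-not-
# started tasks from the most loaded agent and hands them to the idle one.

from collections import deque

def rebalance(queues, idle_agent):
    """queues: agent name -> deque of pending (not yet started) tasks."""
    donor = max(queues, key=lambda a: len(queues[a]))
    if donor == idle_agent or not queues[donor]:
        return 0                      # nothing worth stealing
    steal = len(queues[donor]) // 2   # take half of the donor's backlog
    for _ in range(steal):
        queues[idle_agent].append(queues[donor].pop())
    return steal

queues = {"fast": deque(), "slow": deque(range(10))}
moved = rebalance(queues, "fast")
print(moved, len(queues["slow"]), len(queues["fast"]))  # 5 5 5
```

The point being: this logic needs access to the master's internal task queues, which is exactly what an external network device cannot see.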
 
"Maybe, before stating any answer, some research on what has been asked, would have stopped your thinking process before giving an advice..."

Actually, I did search but did not find anything, so I put forward the question (rather than advice) in case someone with more knowledge could help explore the area. From your response I thought you might have some useful material on this, so I asked if you could share it; please do if you have.

Secondly, if you read my original post, I did not give advice; I posed a question. Is that not the purpose of a discussion forum?

Thirdly, from your statement that "application level load balancing still takes place on a network based approach", I really wonder what experience you have in scaling large, complex systems; please share that too, in case it helps the conversation.


 

Please, before suggesting anything complicated, zoom out and look at it sensibly (instead of deeply). There is no way for the GA to know beforehand how long the passes will take, at least for the first generation, so it cannot balance the load time-wise.

Example: among the optimized parameters there is a Timeframe parameter. The range is set from the daily down to the 1-minute timeframe, and the data set is, say, 2 years. It is obvious to us that the minute timeframe takes longer to compute than the daily timeframe, but there is no way for the GA to know this. Therefore, when dealing out tasks to various agents, even on identical hardware, there will be situations where one core is done and waits for the rest of the generation to complete.

There is no fix.
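To put rough numbers on the timeframe example above (bar counts only, assuming a forex-style week of roughly 5.5 trading days; real pass cost also depends on tick modelling and the EA's logic):

```python
# Rough bar counts for a 2-year test range, to show how much the cost of a
# single pass can swing with one optimized input.
trading_days = 2 * 52 * 5.5           # ~572 trading days in 2 years
bars_d1 = trading_days                # one bar per trading day
bars_m1 = trading_days * 24 * 60      # one bar per minute
print(int(bars_d1), int(bars_m1), int(bars_m1 / bars_d1))  # 572 823680 1440
```

A single input changing from D1 to M1 multiplies the number of bars to process by about 1440, and the GA has no way to predict that before the pass has actually run.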

 
Enrique Dangeroux #:

Please, before suggesting anything complicated, zoom out and look at it sensibly (instead of deeply). There is no way for the GA to know beforehand how long the passes will take, at least for the first generation, so it cannot balance the load time-wise.

Example: among the optimized parameters there is a Timeframe parameter. The range is set from the daily down to the 1-minute timeframe, and the data set is, say, 2 years. It is obvious to us that the minute timeframe takes longer to compute than the daily timeframe, but there is no way for the GA to know this. Therefore, when dealing out tasks to various agents, even on identical hardware, there will be situations where one core is done and waits for the rest of the generation to complete.

There is no fix.


A better and more versatile answer than mine. Thank you.
 
R4tna C #:

Actually I did search but did not find anything so I put forward the question (rather than advice) in case someone with more knowledge could help explore the area. From your response I thought you may have some useful material on this so I asked if you could share it, please do if you have.

Secondly, if you read my original post, I did not give advice, I posed a question - is that not the purpose of a discussion forum?

Thirdly, from your statement about "Application level load balancing still takes place on a network based approach" I really wonder what experience you have in scaling large complex systems, please do share that too in case it helps the conversation.




I understood "... worth checking if that is an option" as advice to go forward on the OP's issue.

I cannot see your question, somehow; maybe you could give a hint as to what you asked.

I gave all details relevant to the ongoing exchange of information, and all I know about that matter.

My experience in scaling will not provide any further details on the matter of genetic algorithms.

But to be clear, and to underline my experience: scaling in massive environments can take place at different levels, all of which involve networked equipment. No matter whether you scale a system at OSI layer 2, 3, 4, 5, 6 or 7, all of them, without exception, involve nodes of some type, which must be linked to each other using shared memory, shared solid-state storage, interprocess synchronization over InfiniBand, Fibre Channel, iSCSI, plain networking, non-NUMA board links, PCI Express... you name it.

All serve one purpose: synchronization and data sharing.

Depending on the level, you can use a device in front of your resources, or on those resources. This is all differentiated only by your definition of a unit, a unit representing a node.

A node, defined by the layer of abstraction within the whole system.

So a node can be a CPU core, a server, a group of servers, or a data center; it depends on your definition. And depending on this definition, you get to scale, and this scaling can be done at different levels.

Take a look at how availability is defined in SLAs; that is how it is done by industry standard.

None of these can be applied to MT. Because the level at which you could scale is not accessible to us users.

So it can happen that one single CPU core blocks the progress of the whole genetic algorithm, because the underlying implementation of that GA is "turn based", or generation based. It is not a floating, ongoing process and does not support overlapping generations.

Sad, but that's how it is.

By the way, the GA of MT is limited to 64 bits, which makes it (most probably) the smallest production GA out there...

But you could implement your own GA, like I did, and utilize the terminal's ability to distribute jobs over a network to nodes to create your own GA environment.

Then you also have control over distributing your jobs as you like.
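A bare-bones sketch of such a custom generational GA loop, with a toy fitness function standing in for a tester pass (everything here is hypothetical; in a real setup, `evaluate` would dispatch a pass to a farm agent of your choosing, which is exactly where you regain control over distribution):

```python
# Minimal generational GA skeleton (hypothetical, not MT5's algorithm).
# evaluate() stands in for sending one parameter set to an agent; since
# the loop is yours, you decide which agent evaluates which individuals.
import random

def evaluate(x):                       # toy fitness: maximize -(x - 3)^2
    return -(x - 3.0) ** 2

def run_ga(pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    pop = [rng.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=evaluate, reverse=True)
        parents = scored[: pop_size // 2]          # truncation selection
        pop = parents + [
            (rng.choice(parents) + rng.choice(parents)) / 2  # crossover
            + rng.gauss(0, 0.1)                              # mutation
            for _ in range(pop_size - len(parents))
        ]
    return max(pop, key=evaluate)

print(run_ga())  # converges near the optimum at x = 3
```

In a distributed version, the inner evaluation step is the only part that touches the agents, so the time-aware batching discussed earlier in the thread can be bolted on there.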
 
Enrique Dangeroux #:

Please, before suggesting anything complicated, zoom out and look at it sensibly (instead of deeply). There is no way for the GA to know beforehand how long the passes will take, at least for the first generation, so it cannot balance the load time-wise.

Example: among the optimized parameters there is a Timeframe parameter. The range is set from the daily down to the 1-minute timeframe, and the data set is, say, 2 years. It is obvious to us that the minute timeframe takes longer to compute than the daily timeframe, but there is no way for the GA to know this. Therefore, when dealing out tasks to various agents, even on identical hardware, there will be situations where one core is done and waits for the rest of the generation to complete.

There is no fix.

The problem is that my view of these systems is the "zoom out" view, having spent decades tuning much larger systems with far more options and far more throughput, and comparatively far less time working with MetaTrader, which I am still learning about. So please understand that when I ask a question (and it was a question rather than a suggestion or advice), it really is to learn more, which I have now done, having finally found the MetaTrader application options and related documentation.

I can see what you mean; it is pretty limited in terms of options and nowhere near as sophisticated as other scalable distributed processing stacks. But hey, nothing to get so excited about, as Mr Egert seems to like to do in his responses...