OpenCL: internal implementation tests in MQL5 - page 4

 
WChas:
If I understand correctly, one GPU is one very powerful agent? Is it possible to disable the CPU agents in that case (given their low speed relative to the video card)? And again: is it possible to run two ATI cards without CrossFire?

AMD strongly discourages using CrossFire for OpenCL: with CrossFire there are physically two GPUs, but the driver HAS to split ONE graphics job between them. For OpenCL 1.1, by contrast, the video card driver as yet has no way of splitting a single OpenCL job between two GPUs (see above).

The same applies to NVIDIA SLI.

 

How will the inclusion of OpenCL affect the cost of computation in the cloud?

Will the cost of an agent with a GPU be higher than that of an agent without one, since the estimated computation time of the former will be significantly shorter?

Given that only one agent on the local machine will be given a GPU, it turns out that the cost of different agents on the same local machine will be different (and pre-calculated)?

 
hrenfx:

How will the inclusion of OpenCL affect the cost of computing in the cloud?

Will the cost of an agent with GPU be higher than the cost of an agent without GPU because the computation time of the first agent will be significantly shorter?

Given that only one agent on the local machine will be given GPU, it turns out that the cost of different agents on the same local machine will be different (and pre-calculated)?

"When performing a task, the work done by a test agent is counted as the product of its PR and the time spent."
 
I wouldn't even consider OpenCL 1.0: you would have to be a real risk-taker to use it for double-precision calculations given the serious driver problems. In fact, on detecting old drivers the terminal will disable OpenCL and display a message telling you to upgrade to the latest version. Otherwise, hangs are inevitable even during the most harmless operations.

By default, the terminal/agent selects the most powerful GPU device by its feature set at startup. So far, there are no plans to allow device selection from MQL5, as this would only complicate the code and lead to additional errors.

Instead, there is a more elegant idea: automatically allocating each physical GPU to a separate agent, which will allow the GPUs to be used to their full potential.
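
The allocation idea can be pictured with a small sketch. This is only an illustration under stated assumptions, not the terminal's actual logic: the agent and device names are hypothetical, and the rule "first agents get the GPUs, the rest stay CPU-only" is an assumption made for the example.

```python
def assign_gpus_to_agents(agents, gpus):
    """Give each physical GPU to a distinct agent; remaining agents get None."""
    assignment = {agent: None for agent in agents}
    # zip stops at the shorter list, so no GPU is ever shared between agents.
    for agent, gpu in zip(agents, gpus):
        assignment[agent] = gpu
    return assignment


# Hypothetical local machine: four agents, two physical GPUs.
agents = ["agent-1", "agent-2", "agent-3", "agent-4"]
gpus = ["GPU-0", "GPU-1"]
assignment = assign_gpus_to_agents(agents, gpus)
```

Here `agent-1` and `agent-2` each own one GPU, while `agent-3` and `agent-4` remain CPU-only.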
 

Let's say we have an EX5 (using OpenCL) that optimises 20 times faster on agents with a GPU than on agents without one.

We run an optimization in the cloud. Clearly, it is most beneficial (in terms of time taken) to run the optimization primarily on those agents that have a physical GPU, giving them the bulk of the parameter combinations to test. This is equivalent to their PR being 20 times higher. BUT their PR measured without the GPU is the same as before. That is, a new PR calculation, PR_gpu, needs to be introduced.

Mischek:
"When a task is performed, the work done by the test agent is counted as the product of its PR and the time spent."

It follows from this formula that if there is no PR_gpu, the entire optimization in the cloud will cost less with GPUs than without.

Unfortunately, introducing an alternative PR calculation based on a single test pass of the EX5 file being optimised has a huge number of pitfalls, which make it impossible to use universally.
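
A rough arithmetic sketch of this point, using the quoted rule (work = PR x time) and the 20x figure from the example above. The PR value and the duration are made-up numbers, and `work_done` is a hypothetical helper, not a cloud API.

```python
def work_done(pr, hours):
    """Work counted for an agent: its PR multiplied by the time spent."""
    return pr * hours


PR = 100.0        # hypothetical PR, measured without using the GPU
CPU_HOURS = 20.0  # hypothetical duration of the job without a GPU
SPEEDUP = 20.0    # the 20x speed-up from the example above

# Same job on a CPU-only agent.
cost_cpu = work_done(PR, CPU_HOURS)
# Same job on a GPU agent whose PR was NOT adjusted (no PR_gpu).
cost_gpu_same_pr = work_done(PR, CPU_HOURS / SPEEDUP)
# Same job on a GPU agent whose PR is scaled by the speed-up (a PR_gpu).
cost_gpu_scaled_pr = work_done(PR * SPEEDUP, CPU_HOURS / SPEEDUP)
```

With these numbers the unadjusted GPU run is counted as 20 times less work than the CPU run, while scaling the PR by the speed-up makes the two equal again.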

 
hrenfx:


It follows from this formula that if there is no PR_gpu, the cost of the whole optimization on the cloud with GPU will be cheaper than without.

Unfortunately, introducing an alternative PR calculation based on a single test pass of the EX5 file being optimised has a huge number of pitfalls, which make it impossible to use universally.

Without going into details, which I don't know: if the actual PR is not revised, there is no point in joining the cloud with a GPU.

Also, when you introduce a new concept (which you like to do), it is better to define it right away; otherwise the reader has to guess that, in this case, it refers to the user side.

 

Currently, PR = Const Koef / (time taken to complete the benchmark task), where Const Koef is a constant coefficient.

The cost of optimisation is equal to the number of benchmark tasks that could be computed in the time that the optimisation lasted. That is, the cost of computation does not depend on how the cloud allocates tasks among agents. But the final optimization duration depends on proper allocation, which is the most important metric.

Clearly, the cloud allocates tasks to agents in proportion to the pre-computed PR in order to reduce the computation time (but not the cost, which is constant).
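
As an illustration of the two statements above (PR is inversely proportional to the benchmark time, and tasks are spread in proportion to PR), here is a minimal sketch. The constant, the agent names, the benchmark times and the rounding rule are all assumptions for the example; the cloud's real scheduler is not described in this thread.

```python
CONST_KOEF = 1000.0  # hypothetical value of the constant coefficient


def pr_from_benchmark(seconds):
    """PR = Const Koef / time taken to complete the benchmark task."""
    return CONST_KOEF / seconds


def allocate_passes(total_passes, prs):
    """Split passes between agents in proportion to PR (largest remainder)."""
    total_pr = sum(prs.values())
    shares = {a: total_passes * pr / total_pr for a, pr in prs.items()}
    alloc = {a: int(s) for a, s in shares.items()}
    # Hand out passes lost to rounding, largest fractional part first.
    leftover = total_passes - sum(alloc.values())
    by_fraction = sorted(shares, key=lambda a: shares[a] - alloc[a], reverse=True)
    for a in by_fraction[:leftover]:
        alloc[a] += 1
    return alloc


# Hypothetical benchmark times in seconds for three agents.
benchmark_seconds = {"fast": 5.0, "medium": 10.0, "slow": 20.0}
prs = {a: pr_from_benchmark(s) for a, s in benchmark_seconds.items()}
alloc = allocate_passes(700, prs)
```

The agent that finishes the benchmark four times faster receives four times as many passes, so all agents should finish at roughly the same time.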

 
Of course, we will automatically increase PR when the GPU is actually used. But first we will release the beta in public tests.

OpenCL tasks will of course be given to agents with the right GPU first.
 
hrenfx:

the cost of the calculation does not depend on how the cloud allocates tasks between agents.

Unfortunately, introducing an alternative PR calculation - a single test run of an optimized EX5 file - contains a huge number of pitfalls that make it impossible to use universally.

Since the cost to the person running the optimization is always constant, while the duration depends strongly on a PR calculation that is adequate for the given task, perhaps that person should be allowed, on their own responsibility, to include a PR_calculate function in their EX5 code. In it, based on the features of the algorithm, they would define the distribution PR best suited to the task.

For example, if the task is purely computational, the emphasis in PR_calculate will be on mathematics.

If the task handles huge amounts of data, the emphasis will be on the memory handling speed.

If the GPU is used only a little, reflect that in your PR_calculate.

If the GPU is used throughout the EX5, write a corresponding PR_calculate (with corresponding use of the GPU).

Under this scheme, those who rent out power to the cloud do not lose anything, and those who use the cloud power get a chance to speed up their calculations significantly.

If PR_calculate is not specified, the already accepted universal reference task is used.

P.S. PR_calculate is only used to allocate tasks in the cloud. For cost calculation, the same old PR is still used.

P.P.S. There are, of course, pitfalls: PR_calculate needs to be pre-run on all free agents, and it may be written incorrectly, for example with an infinite loop. So the time spent computing PR_calculate also has to be charged for according to the classic PR scheme. And so on.
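
The proposal can be summarised in a short sketch. Everything here is hypothetical: `PR_calculate` is only a suggestion made in this post, and the function names, signatures and numbers below are invented for illustration. The sketch also ignores the pitfalls listed above (pre-running on every agent, infinite loops).

```python
def distribution_pr(classic_pr, pr_calculate=None):
    """PR used ONLY for allocating tasks; falls back to the classic PR."""
    if pr_calculate is None:
        # No user-supplied benchmark: use the universal reference task result.
        return classic_pr
    return pr_calculate()


def billed_work(classic_pr, hours):
    """Cost is still computed from the classic PR, whatever PR_calculate says."""
    return classic_pr * hours


# Hypothetical GPU agent: classic PR of 100, user benchmark reports 2000.
gpu_agent_alloc_pr = distribution_pr(100.0, pr_calculate=lambda: 2000.0)
cpu_agent_alloc_pr = distribution_pr(100.0)
gpu_agent_cost = billed_work(100.0, 2.0)
```

The GPU agent is weighted 20 times more heavily when tasks are handed out, yet both agents are billed at the same classic PR per hour.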

 
hrenfx:

Since the cost to the person running the optimization is always constant, while the duration depends strongly on a PR calculation that is adequate for the given task, perhaps that person should be allowed, on their own responsibility, to include a PR_calculate function in their EX5 code. In it, based on the features of the algorithm, they would define the distribution PR best suited to the task.

Unfortunately, an option where the programmer calculates the PR himself will not work.