OpenCL: internal implementation tests in MQL5 - page 50

 
Mathemat:
This is what the trick is all about. The argument inside CLContextCreate() cannot explicitly select CPU if there is at least one external GPU.

It shouldn't be like this - https://www.mql5.com/ru/docs/opencl/clcontextcreate

Parameters

device

[in] The ordinal number of the OpenCL device in the system. One of the following values can be specified instead of a specific number: CL_USE_ANY - any available OpenCL-enabled device may be used; CL_USE_GPU_ONLY - OpenCL emulation is disabled and only dedicated OpenCL devices (video cards) may be used.

 
Rosh: It shouldn't be like this - https://www.mql5.com/ru/docs/opencl/clcontextcreate

I think so, it shouldn't be. But where to get this number and then choose the right one?

https://www.mql5.com/ru/docs/opencl/clgetinfointeger? But there is only one property - number of devices, CL_DEVICE_COUNT.

I have one in my system, that's why it returns 1.

 
Remember to upgrade to the latest build (619) and recompile your programs. The way OpenCL is initialised has changed in this build.
 
Mathemat:

I think so, it shouldn't be. But where do you get this number and then choose the right one?



Try specifying CL_USE_GPU_ONLY: the video card will then be selected, and no device number is needed.
 
Rosh: You should try specifying CL_USE_GPU_ONLY, then the video card is taken and the number is not needed.
That part is clear. But I want to select the CPU while a discrete graphics card is installed. How do I do that?
 

Awesome! So the HD4200 is not an OpenCL device, is it?

Looks like there really is pure CPU emulation going on here. I like it!

Now if only I could get the i3 to do this too...

I am beginning to suspect why the AMD APP SDK does not work properly in MT5 on an i3: not all of its cores are real, and AMD has no Hyper-Threading technology. That is probably why the driver installs incorrectly.

Or maybe the catch is in MT5, which formally recognizes the device but cannot actually use it?

 
Mathemat:
Here is the trick. The argument inside CLContextCreate() cannot explicitly select a CPU if there is at least one external GPU.

This is on CPU (in my case it's device 1):

2012.04.08 21:03:01     ParallelTester_00-02-(16 x7x3) (USDJPY,M30)      CpuTime/GpuTime = 74.73506433823529
2012.04.08 21:03:01     ParallelTester_00-02-(16 x7x3) (USDJPY,M30)      Result on Cpu МахResult==4.54091 at 2233 pass
2012.04.08 21:03:01     ParallelTester_00-02-(16 x7x3) (USDJPY,M30)      Соunt inticators = 16; Count history bars = 50000; Count pass = 4096
2012.04.08 21:03:01     ParallelTester_00-02-(16 x7x3) (USDJPY,M30)      CPU time = 325247 ms
2012.04.08 20:57:36     ParallelTester_00-02-(16 x7x3) (USDJPY,M30)      Result on Gpu МахResult==4.54091 at 2233 pass
2012.04.08 20:57:36     ParallelTester_00-02-(16 x7x3) (USDJPY,M30)      Соunt inticators = 16; Count history bars = 50000; Count pass = 4096
2012.04.08 20:57:36     ParallelTester_00-02-(16 x7x3) (USDJPY,M30)      GPU time = 4352 ms
2012.04.08 20:57:32     ParallelTester_00-02-(16 x7x3) (USDJPY,M30)      OpenCL init OK!

I moved the CLContextCreate(device) argument into a script input parameter, so you can try out all the options.

// Maybe there is still some secret combination? :))

 

2012.04.08 22:01:08    Terminal    CPU: GenuineIntel  Intel(R) Pentium(R) CPU G840 @ 2.80GHz with OpenCL 1.2 (2 units, 2793 MHz, 8040 Mb, version 2.0 (sse2))

2012.04.08 22:05:59    ParallelTester_00-02-316x7x3j_080412 (EURUSD,H1)    CpuTime/GpuTime = 26.95192501511792
2012.04.08 22:05:59    ParallelTester_00-02-316x7x3j_080412 (EURUSD,H1)    Result on Cpu МахResult==4.98137 at 1628 pass
2012.04.08 22:05:59    ParallelTester_00-02-316x7x3j_080412 (EURUSD,H1)    Соunt inticators = 16; Count history bars = 50000; Count pass = 4096
2012.04.08 22:05:59    ParallelTester_00-02-316x7x3j_080412 (EURUSD,H1)    CPU time = 267417 ms
2012.04.08 22:01:32    ParallelTester_00-02-316x7x3j_080412 (EURUSD,H1)    Result on Gpu МахResult==4.98137 at 1628 pass
2012.04.08 22:01:32    ParallelTester_00-02-316x7x3j_080412 (EURUSD,H1)    Соunt inticators = 16; Count history bars = 50000; Count pass = 4096
2012.04.08 22:01:32    ParallelTester_00-02-316x7x3j_080412 (EURUSD,H1)    GPU time = 9922 ms
2012.04.08 22:01:22    ParallelTester_00-02-316x7x3j_080412 (EURUSD,H1)    OpenCL init OK!


Thanks, I didn't know that...

Well, I had no doubt that the ratio would be around 3 to 1: you have 3 times as many cores, and 74.735/26.952 ~ 2.77. And that's good news: more cores means more gain! AMD pulled ahead here as well, thanks to its core count!

But I wonder what others will get, say, someone with an i5 Sandy Bridge?

And if someone has a Bulldozer, it could be really cool... although maybe not, since it has some issues with the FPU.
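
The comparison above is easy to re-check. The following is only a small arithmetic sketch in Python (not MQL5); the variable names are illustrative, and the two values are the CpuTime/GpuTime figures quoted in the post, rounded to three decimals.

```python
# CpuTime/GpuTime ratios taken from the two logs quoted in the thread (rounded).
ratio_first_log = 74.735    # log stamped 2012.04.08 21:03:01
ratio_second_log = 26.952   # log stamped 2012.04.08 22:05:59

# How many times larger the first speed-up is than the second.
relative_gain = ratio_first_log / ratio_second_log
print(f"relative gain = {relative_gain:.2f}")
```

The result is close to, but somewhat below, the 3:1 core-count ratio mentioned in the post.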

 
MetaDriver:

And in this test, for some reason, my hardware beat yours.

2012.04.09 01:09:36        ParallelTester_00-02-316x7x3j (EURUSD,H1)          CpuTime/GpuTime = 161.0007722007722

2012.04.09 01:09:36    ParallelTester_00-02-316x7x3j (EURUSD,H1)    Result on Cpu МахResult==4.85831 at 2497 pass
2012.04.09 01:09:36    ParallelTester_00-02-316x7x3j (EURUSD,H1)    Соunt inticators = 16; Count history bars = 50000; Count pass = 4096
2012.04.09 01:09:36    ParallelTester_00-02-316x7x3j (EURUSD,H1)    CPU time = 208496 ms
2012.04.09 01:06:08    ParallelTester_00-02-316x7x3j (EURUSD,H1)    Result on Gpu МахResult==4.85831 at 2497 pass
2012.04.09 01:06:08    ParallelTester_00-02-316x7x3j (EURUSD,H1)    Соunt inticators = 16; Count history bars = 50000; Count pass = 4096
2012.04.09 01:06:08    ParallelTester_00-02-316x7x3j (EURUSD,H1)    GPU time = 1295 ms

2012.04.09 01:06:07        ParallelTester_00-02-316x7x3j (EURUSD,H1)           OpenCL init OK!


And here is the same test on the OpenCL CPU device (all 4 cores are 100% loaded):

2012.04.09 01:11:15    ParallelTester_00-02-316x7x3j (EURUSD,H1)    GPU time = 68547 ms

2012.04.09 01:10:07    ParallelTester_00-02-316x7x3j (EURUSD,H1)    OpenCL init OK!


The GPU has 480 cores and the CPU has 4, a difference of 120 times, meaning the GPU can do 120 times more calculations at once. Yet the CPU lagged behind the GPU by only 68547/1295 = 52.9 times. That means a single CPU thread is faster than a single GPU thread, and the GPU wins only because it can run more tasks simultaneously. Do I get it right?
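
The reasoning in this post can be written out as arithmetic. This is a rough sketch in Python, assuming (as the post does) that the work is spread evenly over all cores; the variable names are illustrative and the numbers are copied from the logs above.

```python
# Figures quoted in the post above.
gpu_cores, cpu_cores = 480, 4
gpu_ms = 1295        # "GPU time" of the run on the video card
cl_cpu_ms = 68547    # "GPU time" of the run on the OpenCL CPU device

core_ratio = gpu_cores / cpu_cores   # how many times more cores the GPU has
time_ratio = cl_cpu_ms / gpu_ms      # how many times slower the CPU run was

# Under the even-spread assumption, one CPU core does the work
# of this many GPU cores in the same time.
per_core_advantage = core_ratio / time_ratio
print(f"cores {core_ratio:.0f}x, time {time_ratio:.1f}x, "
      f"per-core advantage {per_core_advantage:.2f}x")
```

So, under that assumption, one CPU core handles a bit more than twice the work of one GPU core in this test.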


Scaling on the CPU is well below 100% (about 76%): instead of a 4x speed-up there is only 3x, since 208496/68547 = 3.04.
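
The 76% figure follows from dividing the measured speed-up by the number of cores. A minimal sketch in Python, assuming the "CPU time" line is a single-threaded run (the post implies this by expecting 4x); the names are illustrative.

```python
cpu_single_ms = 208496   # "CPU time" from the log (assumed single-threaded)
cpu_opencl_ms = 68547    # same test on the OpenCL CPU device, 4 cores loaded
cores = 4

speedup = cpu_single_ms / cpu_opencl_ms   # measured acceleration
efficiency = speedup / cores              # fraction of the ideal 4x
print(f"speed-up {speedup:.2f}x, efficiency {efficiency:.0%}")
```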

Although, the specifics of this test may be affecting it.

 

Andrei, I think you've mixed something up: MD posted this test here only on the CPU, without the graphics card, i.e. by directly selecting CPU emulation. As I understand it, your first test ran on the graphics card.

And the result of your OpenCL emulation on the CPU (your second test) looks suspiciously poor even compared with mine (I had about 10 seconds on the "GPU").

If I've misunderstood something, correct me.