OpenCL: internal implementation tests in MQL5 - page 25

 
Ashes:

2012.03.05 17:43:16 Terminal CPU: GenuineIntel Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz with OpenCL 1.1 (4 units, 3092 MHz, 4008 Mb, version 2.0)

2012.03.05 17:45:23 ParallelTester_00-01x (EURUSD,M1) CpuTime/GpuTime = 0.734767766287369

[...] Maybe I have "grenades of the wrong system"? (I mean AMD SDK version).

Awesome. I bought the CPU, motherboard, memory and cooler from the most ordinary shop I could find online. The whole thing cost about 6k. I guess they accidentally slipped me a CPU from the future.

Maybe it is because of the SDK. I doubt the memory is at fault: it's plain DDR3 by default. Single-channel or not certainly has some effect, but not that dramatic.

 
Mathemat:

Interesting. fyords' card is stronger (GeForce GT 440), yet the calculation time is an order of magnitude longer. ...

I wonder whether calculation speed depends on resolution (I have 1920x1080), since the desktop also eats something (the Aero theme)?

Maybe that is why it takes longer to calculate.

 
fyords: And I wonder whether the calculation speed depends on resolution (I have 1920x1080), since the desktop also eats something (the Aero theme)?

Yes, it might. What does your Windows Aero performance index show? Here's mine:


I don't need any of those frills, so they are turned off.

P.S. Checked. No effect whatsoever.

 

The last test (3.12.2011) passed normally, but now some glitch comes up, even though the video score is still 6.8. Maybe the glitch itself is the problem. No new hardware has been installed. Perhaps it's a problem with the new build; in that case I'll wait for the next release and run the test again.

 

[Attached: CPU-Z screenshots, x32 and x64]

 

I added a dumb genetic algorithm and looped it to maximize the result. What came out is an optimizer that fits a single-layer net on simulated data equivalent to 500 days of history of five-minute open prices (16 indicators per input).

Please test.

At this point I consider my campaigning successfully completed, since a mere test without an optimizer is inadequate. ;)

Those who see a slowdown (compared to the others) now know what to save up for ;-)

So, my results:

2012.03.06 03:44:23     ParallelOptimazer_00-02 (EURUSD,M30)    Full time of optimization == 14 sec 305 ms
2012.03.06 03:44:23     ParallelOptimazer_00-02 (EURUSD,M30)    Optimization is closing. Best result == 1.91356 at 92 generation.
2012.03.06 03:44:23     ParallelOptimazer_00-02 (EURUSD,M30)    Generation 92: MaxResult==1.91356
2012.03.06 03:44:23     ParallelOptimazer_00-02 (EURUSD,M30)    Generation 91: MaxResult==1.91356
2012.03.06 03:44:23     ParallelOptimazer_00-02 (EURUSD,M30)    Generation 90: MaxResult==1.91356
2012.03.06 03:44:23     ParallelOptimazer_00-02 (EURUSD,M30)    Generation 89: MaxResult==1.91356
.............
.........

Let me explain one last time, in simple terms: this is a full optimization cycle of a primitive neural-network Expert Advisor.

// Well, almost an Expert Advisor. With a smart implementation, the extra calculations THIS "Expert Advisor" would need

// in order to estimate profit realistically enough will not exceed (in running time) the calculations already there. I.e. the slowdown would be, say, about twofold.
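For readers unfamiliar with what such a cycle looks like, here is a minimal sketch of this kind of "dumb genetics" in plain Python: a single-layer net with 16 inputs fitted to simulated data by keeping the best individual each generation and refilling the population with its mutants. All sizes, names and the toy fitness function are illustrative assumptions, not MetaDriver's actual code (which evaluates fitness on the GPU via OpenCL).

```python
import random

N_INPUTS = 16        # indicators per input, as in the post
POP_SIZE = 16        # illustrative, tiny
N_GENERATIONS = 20   # illustrative
N_BARS = 500         # stand-in for "500 days of five-minute opens"

random.seed(1)

# Simulated data: each bar is a vector of 16 indicator readings plus the
# price change the single-layer net tries to trade on.
bars = [([random.gauss(0, 1) for _ in range(N_INPUTS)], random.gauss(0, 1))
        for _ in range(N_BARS)]

def fitness(weights):
    """Toy 'profit': go long or short by the sign of the net's output."""
    profit = 0.0
    for inputs, delta in bars:
        signal = sum(w * x for w, x in zip(weights, inputs))
        profit += delta if signal > 0 else -delta
    return profit

def mutate(weights):
    return [w + random.gauss(0, 0.1) for w in weights]

# "Dumb" genetics: elitism plus mutation, nothing else.
population = [[random.gauss(0, 1) for _ in range(N_INPUTS)]
              for _ in range(POP_SIZE)]
best = max(population, key=fitness)
for gen in range(N_GENERATIONS):
    population = [best] + [mutate(best) for _ in range(POP_SIZE - 1)]
    best = max(population, key=fitness)

print("best simulated profit:", round(fitness(best), 2))
```

Because the best individual is always carried over, the best fitness never decreases from one generation to the next; the GPU's job in the real optimizer is the fitness evaluation, which dominates the running time.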

 
MetaDriver:

I added a dumb genetic algorithm and looped it to maximize the result. What came out is an optimizer that fits a single-layer net on simulated data equivalent to 500 days of history of five-minute open prices (16 indicators per input).

Please test.

...

Thank you.))

...

NJ 0 ParallelOptimazer_00-02 (GBPJPY,M5) 03:46:06 Generation 36: MaxResult==2.29423
NO 0 ParallelOptimazer_00-02 (GBPJPY,M5) 03:46:08 Generation 37: MaxResult==2.29426
FE 0 ParallelOptimazer_00-02 (GBPJPY,M5) 03:46:10 Generation 38: MaxResult==2.29426
PJ 0 ParallelOptimazer_00-02 (GBPJPY,M5) 03:46:14 Generation 39: MaxResult==2.29427
HO 0 ParallelOptimazer_00-02 (GBPJPY,M5) 03:46:14 Generation 40: MaxResult==2.29427
QE 0 ParallelOptimazer_00-02 (GBPJPY,M5) 03:46:14 Generation 41: MaxResult==2.29427
EJ 0 ParallelOptimazer_00-02 (GBPJPY,M5) 03:46:15 Generation 42: MaxResult==2.29427
LP 0 ParallelOptimazer_00-02 (GBPJPY,M5) 03:46:15 Generation 43: MaxResult==2.29427
KE 0 ParallelOptimazer_00-02 (GBPJPY,M5) 03:46:15 Generation 44: MaxResult==2.29427
DI 0 ParallelOptimazer_00-02 (GBPJPY,M5) 03:46:15 Optimization is closing. Best result == 2.29427 at 44 generation.
KO 0 ParallelOptimazer_00-02 (GBPJPY,M5) 03:46:15 Full time of optimization == 81 sec 917 ms 
 

MetaDriver, thank you very much. You've done your promotional job to a solid A. At the same time you gave me a chance to test my "horseless" configuration, with no discrete graphics card, and to be very, very surprised.

I ran your optimizer. Here is how it went:

2012.03.05 23:00:12     Terminal        CPU: GenuineIntel  Intel(R) Pentium(R) CPU G840 @ 2.80 GHz with OpenCL 1.1 (2 units, 2793 MHz, 7912 Mb, version 2.0)
2012.03.06 04:24:34     ParallelOptimazer_00-02 (EURUSD,H1)     Full time of optimization == 58 sec 141 ms
2012.03.06 04:24:34     ParallelOptimazer_00-02 (EURUSD,H1)     Optimization is closing. Best result == 1.87689 at 60 generation.
2012.03.06 04:24:34     ParallelOptimazer_00-02 (EURUSD,H1)     Generation 60: MaxResult==1.87689
2012.03.06 04:24:33     ParallelOptimazer_00-02 (EURUSD,H1)     Generation 59: MaxResult==1.87689
2012.03.06 04:24:32     ParallelOptimazer_00-02 (EURUSD,H1)     Generation 58: MaxResult==1.87689
2012.03.06 04:24:31     ParallelOptimazer_00-02 (EURUSD,H1)     Generation 57: MaxResult==1.87689

Assuming each generation takes about the same amount of time, my jalopy turns out to be about 6.5-7 times slower than your Ferrari, just like last time (58.14 sec for 60 generations ≈ 0.97 sec/generation).
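For what it's worth, the per-generation arithmetic can be checked directly from the two logs in this thread (a quick sketch; the exact ratio depends on which runs and generation counts are compared):

```python
# Figures copied from the two optimizer logs quoted above.
slow_total_s = 58.141   # Pentium G840 run, "Full time of optimization"
slow_gens = 60
fast_total_s = 14.305   # MetaDriver's run
fast_gens = 92

slow_per_gen = slow_total_s / slow_gens   # seconds per generation
fast_per_gen = fast_total_s / fast_gens
ratio = slow_per_gen / fast_per_gen

print("slow: %.2f s/gen, fast: %.3f s/gen, ratio: %.1f"
      % (slow_per_gen, fast_per_gen, ratio))
```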

2 Graff: if you could disable the discrete graphics card for the duration of the test and run it on the CPU's integrated graphics, that would be very interesting (we would also get to see Intel's progress in this area in the transition from Lynnfield to Sandy Bridge).

But to do that you would have to install AMD's OpenCL SDK. It won't make anything worse for your discrete card (that has its own driver), and you would get OpenCL 1.1 support on the CPU, although I'm not 100% sure it will work.

But if you refuse, I will not be offended.

 

Re-post from the mql4 forum.

Colleagues, you're going to have a lot of confusion and hitches with OpenCL. Don't expect easy results.

There are a lot of variables here, because OpenCL is a software technology layered on top of the video driver. In effect, the video driver becomes a small operating system. Everything that hooks into it along the way (UltraVNC, MSI Afterburner, Agnitum Outpost's interactive web control, and a thousand other programs) can interfere with the normal operation of OpenCL.

That said, even if you manage to make OpenCL work for simple threaded calculations, another obstacle remains: for complex calculations, the technological (32-bit, only partial IEEE support) and operational (loss of precision when overclocking a gaming card) precision of the GPU is still insufficient for serious scientific computing. Moreover, while nVidia GPUs offer 64-bit double-precision processing on almost all modern video cards, AMD cards have it only on some top-of-the-line series. nVidia has a flaw of its own, of a different kind: they are in league with Microsoft, so their much-touted CUDA (and OpenCL) do not actually work on, say, Server 2003, while working fine on Server 2008 and even on old Win XP, purely for Microsoft's marketing reasons.

OpenCL is for fast streaming inaccurate 32-bit computations like convolution or filtering.
 
AlexEros: Loss of accuracy when overclocking a gaming card

Who is going to overclock it? Gain 10-15% in calculation speed, but risk a calculation error because just one "bee" dies? We are not in a game here, where failing to draw a hundred vertices hardly affects anything...

OpenCL is for fast streaming imprecise 32-bit calculations such as convolution or filtering.

Let's add a caveat: calculations must be really massive and heavy for the loss of precision to matter. Integration, i.e. massive summation and multiplication, is such a heavy case.

It is one thing to add two numbers, multiply the sum by a third (each operation carried out with 32-bit precision, which is sufficient for practical purposes), write the result to some cell as the outcome of one loop iteration, and not use it any further inside the GPU.

Quite another thing is to compute pi over a billion iterations using the slowest-converging Leibniz series (the series popularly used to demonstrate OpenMP technology):

pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
In the second case, when using 32-bit numbers, there is a real danger of losing precision, because a billion numbers are summed. Probability theory says that in 99.7% of cases the resulting error will not exceed the precision of a single number multiplied by 2*10^5. That is already serious, and not always acceptable.
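The danger can be illustrated without a GPU at all. Here is a small Python/NumPy sketch, where NumPy's float32 stands in for the GPU's 32-bit arithmetic and a million terms stand in for the billion, comparing the same sequential summation in 32-bit and 64-bit:

```python
import math
import numpy as np

N = 1_000_000  # a million terms here instead of a billion, for speed

acc32 = np.float32(0.0)  # emulates a 32-bit accumulator
acc64 = 0.0              # 64-bit reference on the "honest CPU"
for k in range(N):
    term = 1.0 / (2 * k + 1) if k % 2 == 0 else -1.0 / (2 * k + 1)
    acc32 += np.float32(term)  # rounded to a 24-bit mantissa at every step
    acc64 += term

pi32 = 4.0 * float(acc32)
pi64 = 4.0 * acc64
print("64-bit estimate off by:", abs(pi64 - math.pi))  # series truncation only
print("extra error from 32-bit rounding:", abs(pi32 - pi64))
```

The 64-bit run is limited only by how early the series is truncated, while the 32-bit run accumulates an additional rounding error at every step; with a billion terms instead of a million, that extra error grows further still.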

Nothing stops the developer from doing a few control calculations at full precision on an honest CPU to estimate the real errors.

It seems to me that, with reasonable precautions, this technology can be used. And with the modern processors from both manufacturers (AMD's Llano APU or even Intel's Sandy Bridge) you can forget about a discrete graphics card for a while: what difference does it make whether I am 100 times faster or only 25... And no bees will die, since I'm not going to overclock the processor either.