How to use OpenCL - I managed to get a 200x speed increase on my card compared to a single core - General

Vladimir Gomonov 2012.03.01 15:06 #201

Urain:

Yes, I think I get it. You are not happy with

1. the extra algorithmic complexity and the excess memory consumption this approach causes,

2. and you want to be able to apply the offset at the copy stage,

so that you don't have to copy 100000 elements only to then apply a 998000 offset.

3. But the offset variant we have now should be kept, because it lets us avoid copying the same data many times: a new task can take its data from an already existing CL buffer with a new offset.

1. No. I just don't like wasting time on extra copying. Although if we use float, we will have to copy through an intermediate array anyway.

2. Yes.

3. Yes.

Vladimir Gomonov 2012.03.01 22:09 #202

First pancakes: https://www.mql5.com/ru/forum/138292/page7#601897

Dedicated to world-conspiracy believers and total paranoiacs, or simply to those who like to play at managing the price! ;) - MQL4 forum


Anatoli Kazharski 2012.03.02 01:22 #203

MetaDriver:

First pancakes: https://www.mql5.com/ru/forum/138292/page7#601897

Nice. Impressive. Delicious pancakes.

//---

Will there be an article on OpenCL? I haven't touched this topic in practice yet, but I would be very interested to read about it in the future. Or at least a couple of example scripts in the help showing how to use it. There isn't enough information available.


Vladimir Gomonov 2012.03.04 19:20 #204

I managed to get a 200x speed increase on my card compared to a single CPU core.

2012.03.04 23:01:32 ParallelTester_00-01 x (EURUSD,D1)  CpuTime/GpuTime = 216.0292397660819
2012.03.04 23:01:32 ParallelTester_00-01 x (EURUSD,D1)  Result on Cpu МахResult==1.3431 at 819 pass
2012.03.04 23:01:32 ParallelTester_00-01 x (EURUSD,D1)  Соunt inticators = 16; Count history bars = 144000; Count pass = 1280
2012.03.04 23:01:32 ParallelTester_00-01 x (EURUSD,D1)  CPU time = 36941 ms
2012.03.04 23:00:55 ParallelTester_00-01 x (EURUSD,D1)  Result on Gpu МахResult==1.3431 at 819 pass
2012.03.04 23:00:55 ParallelTester_00-01 x (EURUSD,D1)  Соunt inticators = 16; Count history bars = 144000; Count pass = 1280
2012.03.04 23:00:55 ParallelTester_00-01 x (EURUSD,D1)  GPU time = 171 ms
2012.03.04 23:00:55 ParallelTester_00-01 x (EURUSD,D1)  OpenCL init OK!

Please test and post results.

CPU: AuthenticAMD AMD Phenom(tm) II X6 1100 T Processor with OpenCL 1.1 (6 units, 3311 MHz, 16345 Mb, version 2.0)
GPU: Advanced Micro Devices, Inc. Cayman with OpenCL 1.1 (20 units, 750 MHz, 1024 Mb, version CAL 1.4.1664 (VM))

If the card runs out of memory, reduce the history length (CountBars) or, less desirably, the number of passes (CountPass).

A trailer for the multichannel tester.

Files:

ParallelTester_00-01x.mq5 14 kb


Dmitriy Parfenovich 2012.03.04 20:30 #205

2012.03.04 22:24:07     ParallelTester_00-01 x (EURUSD,D1)       OpenCL init OK!
2012.03.04 22:24:08     ParallelTester_00-01 x (EURUSD,D1)       GPU time = 1513 ms
2012.03.04 22:24:08     ParallelTester_00-01 x (EURUSD,D1)       Соunt inticators = 16; Count history bars = 144000; Count pass = 1280
2012.03.04 22:24:08     ParallelTester_00-01 x (EURUSD,D1)       Result on Gpu МахResult==1.80839 at 1002 pass
2012.03.04 22:24:52     ParallelTester_00-01 x (EURUSD,D1)       CPU time = 44055 ms
2012.03.04 22:24:52     ParallelTester_00-01 x (EURUSD,D1)       Соunt inticators = 16; Count history bars = 144000; Count pass = 1280
2012.03.04 22:24:52     ParallelTester_00-01 x (EURUSD,D1)       Result on Cpu МахResult==1.80839 at 1002 pass
2012.03.04 22:24:52     ParallelTester_00-01 x (EURUSD,D1)       CpuTime/GpuTime = 29.11764705882353

2012.03.04 22:27:16     Terminal        GPU: NVIDIA Corporation GeForce GT 440 with OpenCL 1.1 (2 units, 1660 MHz, 1024 Mb, version 295.73)
2012.03.04 22:27:16     Terminal        CPU: AuthenticAMD AMD Athlon(tm) II X4 630 Processor with OpenCL 1.1 (4 units, 2812 MHz, 2048 Mb, version 2.0)

Even on my modest hardware the gain is noticeable. A useful test, thank you. :)

Renat Fatkhullin 2012.03.04 21:25 #206

My result, a 133x speed-up:

2012.03.04 23:23:30     ParallelTester_00-01 x (EURUSD,D1)       CpuTime/GpuTime = 133.8285714285714
2012.03.04 23:23:30     ParallelTester_00-01 x (EURUSD,D1)       Result on Cpu МахResult==1.24101 at 1079 pass
2012.03.04 23:23:30     ParallelTester_00-01 x (EURUSD,D1)       Соunt inticators = 16; Count history bars = 144000; Count pass = 1280
2012.03.04 23:23:30     ParallelTester_00-01 x (EURUSD,D1)       CPU time = 18736 ms
2012.03.04 23:23:11     ParallelTester_00-01 x (EURUSD,D1)       Result on Gpu МахResult==1.24101 at 1079 pass
2012.03.04 23:23:11     ParallelTester_00-01 x (EURUSD,D1)       Соunt inticators = 16; Count history bars = 144000; Count pass = 1280
2012.03.04 23:23:11     ParallelTester_00-01 x (EURUSD,D1)       GPU time = 140 ms
2012.03.04 23:23:11     ParallelTester_00-01 x (EURUSD,D1)       OpenCL init OK!

2012.03.04 23:21:47     Terminal        CPU: GenuineIntel  Intel(R) Core(TM) i7-2600 CPU @ 3.40 GHz with OpenCL 1.1 (8 units, 3392 MHz, 16366 Mb, version 2.0)
2012.03.04 23:21:47     Terminal        GPU: Advanced Micro Devices, Inc. Barts with OpenCL 1.1 (14 units, 900 MHz, 1024 Mb, version CAL 1.4.1664 (VM))

Sceptic Philozoff 2012.03.04 21:46 #207

Amazingly, on a single core the i7 is twice as fast as the X6 1100T, at comparable clock speeds (the i7 runs at around 3.8 GHz, the 1100T at 3.7). Granted, this is a specific kind of computation, but the single-thread difference in CPU speed is enormous.
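The "twice as fast" figure can be checked against the single-core CPU times in the two logs above (X6 1100T: 36941 ms, i7-2600: 18736 ms):

```latex
\frac{T_{\text{CPU}}(\text{X6 1100T})}{T_{\text{CPU}}(\text{i7-2600})} = \frac{36941\ \text{ms}}{18736\ \text{ms}} \approx 1.97
```

Both runs used the same settings (16 indicators, 144000 bars, 1280 passes), although the different MaxResult values suggest the input history was not identical on the two machines.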


Vladimir Gomonov 2012.03.04 22:08 #208

Mathemat:
Amazingly, on a single core the i7 is twice as fast as the X6 1100T, at comparable clock speeds (the i7 runs at around 3.8 GHz, the 1100T at 3.7). Granted, this is a specific kind of computation, but the single-thread difference in CPU speed is enormous.

I've been thinking about it a lot and reading up on Google.

I've been scratching my head.

Either they're keeping the most advanced MQL compiler/optimizer for themselves and not giving it to us, or I don't know what to think anymore.

It doesn't work like that. "I don't believe it!" (c) Stanislavsky.

And most likely their code generator is optimized for Intel.

It's an outrage anyway! I'll complain to the UN.


Aleksey Lebedev 2012.03.04 22:29 #209

Entry-level Intel :)

2012.03.05 02:03:33     ParallelTester_00-01 x (EURUSD,D1)       OpenCL init OK!
2012.03.05 02:03:33     ParallelTester_00-01 x (EURUSD,D1)       GPU time = 234 ms
2012.03.05 02:03:33     ParallelTester_00-01 x (EURUSD,D1)       Соunt inticators = 16; Count history bars = 144000; Count pass = 1280
2012.03.05 02:03:33     ParallelTester_00-01 x (EURUSD,D1)       Result on Gpu МахResult==1.03434 at 315 pass
2012.03.05 02:04:01     ParallelTester_00-01 x (EURUSD,D1)       CPU time = 27471 ms
2012.03.05 02:04:01     ParallelTester_00-01 x (EURUSD,D1)       Соunt inticators = 16; Count history bars = 144000; Count pass = 1280
2012.03.05 02:04:01     ParallelTester_00-01 x (EURUSD,D1)       Result on Cpu МахResult==1.03434 at 315 pass
2012.03.05 02:04:01     ParallelTester_00-01 x (EURUSD,D1)       CpuTime/GpuTime = 117.3974358974359

2012.03.05 01:54:17     Terminal        GPU: NVIDIA Corporation GeForce GT 520 with OpenCL 1.1 (1 units, 1620 MHz, 512 Mb, version 285.62)

I couldn't find a line about the CPU in the logs for some reason.

Intel Celeron G530 2.4GHz

Dmitriy Parfenovich 2012.03.04 22:40 #210

Here's what I don't get:

I have a GeForce GT 440 with OpenCL 1.1 (2 units, 1660 MHz, 1024 Mb, version 295.73): GPU time = 1513 ms

You have a GeForce GT 520 with OpenCL 1.1 (1 units, 1620 MHz, 512 Mb, version 285.62): GPU time = 234 ms

How is this possible?

I compared the specifications of the GeForce GT 440 and the GeForce GT 520: mine is ahead on every parameter, yet its run time is 6.5 times longer.
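The 6.5 figure is the ratio of the two GPU times quoted above:

```latex
\frac{T_{\text{GPU}}(\text{GT 440})}{T_{\text{GPU}}(\text{GT 520})} = \frac{1513\ \text{ms}}{234\ \text{ms}} \approx 6.47
```

For comparison, the CPU times from the same two logs (44055 ms and 27471 ms) differ by only about 1.6x, so most of the gap comes from the GPU run itself.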



OpenCL: internal implementation tests in MQL5 - page 21