World conspirators, complete paranoiacs, and everyone who simply likes to control the price: welcome! ;)

 
Mathemat:
So you're showing off your GPGPU calculations. Is there any speed-up?

Where would a laptop get a speed-up from? Copying to the buffer takes more time than the calculation itself.

It's just a test run. The speed-up will show once the calculation is reasonably heavy.

I'm just finishing a script that computes the correlation with a given set of patterns (512 patterns in parallel) over the whole history, and by my estimate the gain there should be about a hundredfold. Time will tell how it really turns out.

 

And anyway, I'm not bragging, I'm popularising OpenCL. So there! :)

It's also fun.

 
Mathemat:

Well, any speed-up?

Oh yes, there is!!!

2012.03.02 01:15:10     Tester-512_Test_001 (EURUSD,M1) СPU time = 7223 ms
2012.03.02 01:15:10     Tester-512_Test_001 (EURUSD,M1) Result on Cpu МахResult==1.01871 at 49 pass
2012.03.02 01:15:03     Tester-512_Test_001 (EURUSD,M1) GPU time = 312 ms
2012.03.02 01:15:03     Tester-512_Test_001 (EURUSD,M1) Result on Gpu МахResult==1.01871 at 49 pass
2012.03.02 01:15:02     Tester-512_Test_001 (EURUSD,M1) OpenCL init OK!

The results match (that's the correctness check), but the times are very different. In this case the difference is 23 times. So it was worth it after all.

I ran a single-layer perceptron over a history of 144,000 bars, 512 passes in one go. I like it. ;)
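To make the cost structure concrete, here is a hypothetical sketch of what one "pass" might compute. The inputs (the last `n_in` price increments), the fitness (sum of next-bar increments taken in the neuron's direction) and the names `perceptron_pass` and `best_pass` are my assumptions for illustration, not the script from this thread.

```c
#include <stddef.h>

/* HYPOTHETICAL fitness of one perceptron pass: the neuron sees the last
   n_in price increments; the sign of its output picks a direction, and the
   next increment is added (long) or subtracted (short). */
double perceptron_pass(const double *w, size_t n_in,
                       const double *price, size_t n_bars)
{
    double result = 0.0;
    for (size_t t = n_in + 1; t < n_bars; t++) {
        double out = 0.0;
        for (size_t i = 0; i < n_in; i++)           /* n_in multiply-adds */
            out += w[i] * (price[t - 1 - i] - price[t - 2 - i]);
        double next = price[t] - price[t - 1];
        if (out > 0.0)      result += next;
        else if (out < 0.0) result -= next;
    }
    return result;
}

/* Evaluate n_passes weight vectors (n_in values each, back to back) and
   return the index of the best one. Requires n_passes >= 1. */
size_t best_pass(const double *weights, size_t n_passes, size_t n_in,
                 const double *price, size_t n_bars, double *best_result)
{
    size_t best = 0;
    *best_result = perceptron_pass(weights, n_in, price, n_bars);
    for (size_t p = 1; p < n_passes; p++) {
        double r = perceptron_pass(weights + p * n_in, n_in, price, n_bars);
        if (r > *best_result) { *best_result = r; best = p; }
    }
    return best;
}
```

Under this model each pass costs about bars × inputs multiply-adds and the passes are independent (one GPU work-item each), so the sequential CPU time should roughly double with twice the inputs or twice the passes. That is consistent with the later logs: about 7.2 s with 8 inputs, about 14.6 s with 16 inputs, and about 29.3 s for 1024 passes.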

 
MetaDriver: In this case the difference is 23 times. So it was worth it after all.
That's great!
 
Mathemat:
That's pretty cool.

As it turns out, the typical times are even a bit better. Here are a couple of reruns:

2012.03.02 01:26:59     Tester-512_Test_001 (EURUSD,M1) СPU time = 7238 ms
2012.03.02 01:26:59     Tester-512_Test_001 (EURUSD,M1) Result on Cpu МахResult==1.80004 at 320 pass
2012.03.02 01:26:51     Tester-512_Test_001 (EURUSD,M1) GPU time = 281 ms
2012.03.02 01:26:51     Tester-512_Test_001 (EURUSD,M1) Result on Gpu МахResult==1.80004 at 320 pass
2012.03.02 01:26:51     Tester-512_Test_001 (EURUSD,M1) OpenCL init OK!
2012.03.02 01:26:48     Tester-512_Test_001 (EURUSD,M1) СPU time = 7270 ms
2012.03.02 01:26:48     Tester-512_Test_001 (EURUSD,M1) Result on Cpu МахResult==1.48404 at 207 pass
2012.03.02 01:26:41     Tester-512_Test_001 (EURUSD,M1) GPU time = 281 ms
2012.03.02 01:26:41     Tester-512_Test_001 (EURUSD,M1) Result on Gpu МахResult==1.48404 at 207 pass
2012.03.02 01:26:41     Tester-512_Test_001 (EURUSD,M1) OpenCL init OK!

That's more than 25 times. Oh yes, very much so. :)

That was a neuron with 8 inputs. Now look at one with 16 inputs:

2012.03.02 01:32:32     Tester-512_Test_001 (EURUSD,M1) СPU time = 14618 ms
2012.03.02 01:32:32     Tester-512_Test_001 (EURUSD,M1) Result on Cpu МахResult==1.22936 at 78 pass
2012.03.02 01:32:18     Tester-512_Test_001 (EURUSD,M1) GPU time = 327 ms
2012.03.02 01:32:18     Tester-512_Test_001 (EURUSD,M1) Result on Gpu МахResult==1.22936 at 78 pass
2012.03.02 01:32:17     Tester-512_Test_001 (EURUSD,M1) OpenCL init OK!
2012.03.02 01:32:01     Tester-512_Test_001 (EURUSD,M1) СPU time = 14618 ms
2012.03.02 01:32:01     Tester-512_Test_001 (EURUSD,M1) Result on Cpu МахResult==1.21085 at 143 pass
2012.03.02 01:31:46     Tester-512_Test_001 (EURUSD,M1) GPU time = 327 ms
2012.03.02 01:31:46     Tester-512_Test_001 (EURUSD,M1) Result on Gpu МахResult==1.21085 at 143 pass
2012.03.02 01:31:46     Tester-512_Test_001 (EURUSD,M1) OpenCL init OK!

That's a 45-fold difference.

Which makes sense: the heavier the computation, the smaller the relative share of the constant overhead (sending arrays back and forth).
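A rough way to see this, assuming the transfer and kernel-launch cost $T_0$ stays fixed regardless of the workload (a simplification), with $T_{\mathrm{CPU}}$ the sequential time and $T_{\mathrm{kernel}}$ the pure GPU compute time:

```latex
S = \frac{T_{\mathrm{CPU}}}{T_0 + T_{\mathrm{kernel}}}
  = \frac{T_{\mathrm{CPU}} / T_{\mathrm{kernel}}}{1 + T_0 / T_{\mathrm{kernel}}}
```

As the computation gets heavier, $T_{\mathrm{kernel}}$ grows, $T_0 / T_{\mathrm{kernel}} \to 0$, and the speed-up $S$ approaches the pure compute ratio $T_{\mathrm{CPU}} / T_{\mathrm{kernel}}$.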

 

And here are 1024 passes in parallel:

2012.03.02 01:45:04     Tester-512_Test_001 (EURUSD,M1) СPU time = 29282 ms
2012.03.02 01:45:04     Tester-512_Test_001 (EURUSD,M1) Result on Cpu МахResult==0.73802 at 802 pass
2012.03.02 01:44:35     Tester-512_Test_001 (EURUSD,M1) GPU time = 327 ms
2012.03.02 01:44:35     Tester-512_Test_001 (EURUSD,M1) Result on Gpu МахResult==0.73802 at 802 pass
2012.03.02 01:46:36     Tester-512_Test_001 (EURUSD,M1) СPU time = 29265 ms
2012.03.02 01:46:36     Tester-512_Test_001 (EURUSD,M1) Result on Cpu МахResult==1.58618 at 821 pass
2012.03.02 01:46:06     Tester-512_Test_001 (EURUSD,M1) GPU time = 328 ms
2012.03.02 01:46:06     Tester-512_Test_001 (EURUSD,M1) Result on Gpu МахResult==1.58618 at 821 pass
2012.03.02 01:46:06     Tester-512_Test_001 (EURUSD,M1) OpenCL init OK!

Note that on the GPU the time is practically the same as with 512 passes: it has 1280 cores, i.e. the whole task fits into a single pass.

The CPU check, of course, runs sequentially.
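A simplified model of this behaviour, assuming each processing unit runs one pass at a time (real GPU scheduling into wavefronts and compute units is more involved): N passes on P units need ⌈N/P⌉ waves, while the CPU check does all N passes one after another. The function names and the overhead/wave-time parameters are hypothetical.

```c
#include <stddef.h>

/* Simplified model: each processing unit runs one pass at a time, so
   n_passes independent passes need ceil(n_passes / n_units) waves.
   n_units must be greater than zero. */
size_t gpu_waves(size_t n_passes, size_t n_units)
{
    return (n_passes + n_units - 1) / n_units; /* integer ceiling division */
}

/* Rough GPU time: fixed overhead (transfers, launch) plus one wave time
   per wave. */
double gpu_time_estimate(double overhead_ms, double wave_ms,
                         size_t n_passes, size_t n_units)
{
    return overhead_ms + wave_ms * (double)gpu_waves(n_passes, n_units);
}

/* The sequential CPU check scales linearly with the number of passes. */
double cpu_time_estimate(double pass_ms, size_t n_passes)
{
    return pass_ms * (double)n_passes;
}
```

With 1280 units, 512 and 1024 passes both fit into one wave, so the model predicts an unchanged GPU time while the CPU time doubles. On a GPU with 400 units, 512 passes would already need two waves.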

So the hundredfold speed-up I was aiming for is practically achieved: the difference is 89.5474 times (!)
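For reference, a small C fragment that recomputes the speed-up ratios (CPU time / GPU time) from the tester logs in this thread. The timings are copied from the logs; the run labels are my own grouping.

```c
#include <stdio.h>

/* CPU and GPU timings (ms) copied from the tester logs above. */
struct run { const char *label; double cpu_ms; double gpu_ms; };

static const struct run runs[] = {
    { "first run, 512 passes", 7223.0,  312.0 },
    { "8 inputs, 512 passes",  7238.0,  281.0 },
    { "8 inputs, 512 passes",  7270.0,  281.0 },
    { "16 inputs, 512 passes", 14618.0, 327.0 },
    { "16 inputs, 512 passes", 14618.0, 327.0 },
    { "1024 passes",           29282.0, 327.0 },
    { "1024 passes",           29265.0, 328.0 },
};

/* Speed-up factor of the GPU run relative to the CPU run. */
double speedup(double cpu_ms, double gpu_ms) { return cpu_ms / gpu_ms; }

void print_speedups(void)
{
    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-22s %8.0f ms / %4.0f ms = %6.2fx\n", runs[i].label,
               runs[i].cpu_ms, runs[i].gpu_ms,
               speedup(runs[i].cpu_ms, runs[i].gpu_ms));
}
```

By this arithmetic the ratios go from about 23 (first run) through about 26 and 45 up to about 89.5 for 1024 passes.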

 

Yes, heavy computations inside loops are exactly what OpenCL is best at.

Do you or someone you know happen to have an A8-3850-based computer lying around? It has 400 graphics pipelines (integrated), by the way!

 
Mathemat:

Do you or someone you know happen to have an A8-3850-based computer lying around? It has 400 graphics pipelines (integrated), by the way!

Are you sure you're not mixing something up? Here: http://kazan.kompiko.info/priceshop.php?desc_id=111255

It looks like nothing special: just four cores, and not a word about pipelines.

 

You can read the description of the chip here. I very much doubt that this chip (or rather its GPU part) lacks OpenCL support.

As a CPU, the chip is not very impressive. But it has decent integrated graphics, and in "good" cases you can count on a speed-up of tens of times without any discrete graphics monsters. Isn't that an economical supercomputer, eh?

And with the "five" (MT5) as it is, the power of a multi-core chip turns out to be almost useless. Except, of course, for optimization: but what difference does it make whether you have 4 or 6 cores if the optimization has to run 24 hours a day anyway? You might as well use the Cloud, even from a not-so-fast but honest dual-core Celeron G530...

P.S. I'm not an AMD fan, just in case. I'm just trying to work out where this whole AMD story will eventually lead.

 

Volodya, please run this script and report the results.

Thanks to MQL5 for the help.
