Which is faster - Floating-Point or Integer arithmetic? - page 12

 

On further research into more modern CPUs, and after following up on several sources, I found that several experts on the matter consider that one has to be very careful about assuming that floating point arithmetic is executed faster than Integer arithmetic.

As a general rule, Integer arithmetic is ALWAYS faster, but due to something called "speculative execution" and the pipeline effect, the end results could end up making floating point seem faster. This is probably the reason why the CPUs with more cores displayed these favorable results. In other words, since these CPUs were not being pushed to their full potential, the extra available processing power was used to carry out extra anticipated calculations and, in so doing, optimise the results for the floating point arithmetic. When the CPU has fewer cores or is under high utilization, this effect is less pronounced or noticeable.

Please note that my explanation of these facts is still quite "amateurish", as I am still learning about how all this works.

So the idea that held for a great deal of this thread, that floating point is faster, was in fact just "smoke and mirrors" due to hardware optimisation. Even though there was an obvious attempt by the coder to make sure that the test code did not fall into the trap of compiler optimisation, we did fall into the trap of hardware optimisation.

So the conclusion here is that Integer arithmetic is ALWAYS faster, but the various levels of optimisation, at both the compiler level and the hardware level, can make floating point seem faster. The only true way to know is to test the EA in the various environments in which one wants it to perform well, in order to evaluate which will be the better choice:

  • For VPS environments, where one wants to cut costs and where the underlying hardware can actually be under quite high usage stress, it seems that Integer arithmetic will be your ultimate ally.
  • During back-tests, especially during optimisations, where there is again extra strain on the CPU, Integer arithmetic can also be a better choice.
  • For more lax environments or on super powerful CPUs, floating point will almost certainly work faster.

So whatever the reason for needing the performance boost in your EA, take the time to test it in the correct environment so that you can evaluate which method will be better or faster for you.

Overall, if your objective is to pack as many EAs as you can on a single machine, then using Integer arithmetic will be your better choice - but don't just accept that - test it for yourself and find out!

NB! One more very important fact! Given the recent vulnerability issues with speculative execution, and the work-around fixes and patches for both hardware and software to prevent attacks via this method, the floating-point advantage could be severely limited, making Integer arithmetic even more attractive. This is, however, just a supposition, as I currently do not know how these patches and fixes will affect said performance, but it is a possibility! Only time will tell!

 

Overall, if your objective is to pack as many EAs as you can on a single machine, then using Integer arithmetic will be your better choice - but don't just accept that - test it for yourself and find out!

We're jumping the gun big-time! In order to come to this conclusion we need a way to quantify the number of int operations it takes on a compact VPS (the only scenario where int outperformed) to offset a single double->int conversion. 

if((number_of_int_operations_on_former_doubles) > (cost_of_double_conversion * number_of_doubles_converted))
   optimization = true;
else
   optimization = false;
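
One way to make this break-even test concrete is to compare the estimated time saved against the estimated time spent on conversions. The sketch below is in C++ rather than MQL5, and `CostModel`, `conversion_pays_off`, and both per-operation costs are hypothetical names and inputs that would have to come from measurements on the target machine.

```cpp
#include <cstdint>

// Hypothetical cost model: both values must be measured or estimated by the user.
struct CostModel {
    double ns_saved_per_int_op;  // time saved per operation done in int instead of double
    double ns_per_conversion;    // cost of one double -> int conversion
};

// Returns true when the total time saved by the integer operations
// exceeds the total time spent converting doubles to ints.
bool conversion_pays_off(const CostModel& m,
                         std::uint64_t int_ops_on_former_doubles,
                         std::uint64_t doubles_converted) {
    const double saved = m.ns_saved_per_int_op * static_cast<double>(int_ops_on_former_doubles);
    const double spent = m.ns_per_conversion * static_cast<double>(doubles_converted);
    return saved > spent;
}
```

For example, with an assumed saving of 0.5 ns per operation and an assumed 2 ns per conversion, 1000 integer operations on 100 converted values would pay off (500 ns saved against 200 ns spent), while 100 operations on the same 100 values would not.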
 
Fernando Carreiro:

Good news, as I was able to replicate your results on the VPS with only a single core:

...

However, the test did solve the "weird" riddle, as now we know that the number of cores severely affects the results. For 2 cores or fewer, it seems that Integer arithmetic is much faster, but on a 4-core CPU floating point is faster (but not by much). The number of cores alone may not be the true reason, and another factor may be the cause, but it does give an indication of where the "weird" results come from.

...

So I ran the scripts (1E8 iterations), once again :-D

We can easily see when the script started and finished, and that all 4 logical processors were used at 100%. I also checked the MT5 threads, and I confirm there was only 1 additional thread while the script was running.

Conclusion: 1 thread can use 100% of all cores. I was not aware of that.


What I couldn't understand was why the execution time is multiplied by 100 when the iterations are just multiplied by 10 (hardcoded in script, so recompiled between the 2 runs).

2018.01.16 17:54:06.223 224626_2 (NZDUSD,H1) <int>: 37 ms for 10000000 iterations

2018.01.16 17:54:06.251 224626_2 (NZDUSD,H1) <double>: 27 ms for 10000000 iterations


2018.01.16 17:59:14.606 224626_2 (NZDUSD,H1) <int>: 3545 ms for 100000000 iterations

2018.01.16 17:59:18.672 224626_2 (NZDUSD,H1) <double>: 4062 ms for 100000000 iterations

So I changed the script a bit to use an input parameter to select the iteration count, instead of it being hardcoded. And surprise:

2018.01.16 18:04:02.855 224626_2 (NZDUSD,H1) <int>: 34 ms for 10000000 iterations

2018.01.16 18:04:02.887 224626_2 (NZDUSD,H1) <double>: 31 ms for 10000000 iterations


2018.01.16 18:03:53.974 224626_2 (NZDUSD,H1) <int>: 413 ms for 100000000 iterations

2018.01.16 18:03:54.423 224626_2 (NZDUSD,H1) <double>: 449 ms for 100000000 iterations

So there was also an MT5 compiler issue. :-)

P.S: My system workload was very low during these tests.
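
The gap between the hard-coded and the input-parameter runs suggests that the compiler treated a compile-time constant loop bound differently. As a minimal sketch of that idea, here is a timing loop in C++ (not the MQL5 script from this thread; `timed_sum` and its checksum are hypothetical names) that takes the iteration count at run time and returns its result, so the loop has an observable effect:

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Runs `iterations` additions in type T and returns {checksum, elapsed milliseconds}.
// The iteration count is a run-time argument, and the checksum is handed back to
// the caller so the compiler has a reason to keep the loop.
template <typename T>
std::pair<T, std::int64_t> timed_sum(std::uint64_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    T sum = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sum += static_cast<T>(i % 7);  // small addend derived from the loop counter
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    return {sum, static_cast<std::int64_t>(ms)};
}
```

Calling `timed_sum<int>(n)` and `timed_sum<double>(n)` with `n` supplied by the user gives the same shape of comparison as the logs above. An optimising C++ compiler may still transform such a loop, so the timings are only meaningful for the build settings actually tested.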

 
Alain Verleyen:

So I ran the scripts (1E8 iterations), once again :-D

We can easily see when the script started and finished, and that all 4 logical processors were used at 100%. I also checked the MT5 threads, and I confirm there was only 1 additional thread while the script was running.

Conclusion: 1 thread can use 100% of all cores. I was not aware of that.


What I couldn't understand was why the execution time is multiplied by 100 when the iterations are just multiplied by 10 (hardcoded in script, so recompiled between the 2 runs).

So I changed the script a bit to use an input parameter to select the iteration count, instead of it being hardcoded. And surprise:

So there was also an MT5 compiler issue. :-)

P.S: My system workload was very low during these tests.

Maybe it is the speculative execution at play here that consumes all the threads/cores.

Instead of using a CArray, try using standard arrays, and see how it holds up both for the CPU utilization and for the hard-coded vs parameter versions.

 
Fernando Carreiro:

On further research into more modern CPUs, and after following up on several sources, I found that several experts on the matter consider that one has to be very careful about assuming that floating point arithmetic is executed faster than Integer arithmetic.

As a general rule, Integer arithmetic is ALWAYS faster, but due to something called "speculative execution" and the pipeline effect, the end results could end up making floating point seem faster. This is probably the reason why the CPUs with more cores displayed these favorable results. In other words, since these CPUs were not being pushed to their full potential, the extra available processing power was used to carry out extra anticipated calculations and, in so doing, optimise the results for the floating point arithmetic. When the CPU has fewer cores or is under high utilization, this effect is less pronounced or noticeable.

Please note that my explanation of these facts is still quite "amateurish", as I am still learning about how all this works.

So the idea that held for a great deal of this thread, that floating point is faster, was in fact just "smoke and mirrors" due to hardware optimisation. Even though there was an obvious attempt by the coder to make sure that the test code did not fall into the trap of compiler optimisation, we did fall into the trap of hardware optimisation.

So the conclusion here is that Integer arithmetic is ALWAYS faster, but the various levels of optimisation, at both the compiler level and the hardware level, can make floating point seem faster. The only true way to know is to test the EA in the various environments in which one wants it to perform well, in order to evaluate which will be the better choice:

  • For VPS environments, where one wants to cut costs and where the underlying hardware can actually be under quite high usage stress, it seems that Integer arithmetic will be your ultimate ally.
  • During back-tests, especially during optimisations, where there is again extra strain on the CPU, Integer arithmetic can also be a better choice.
  • For more lax environments or on super powerful CPUs, floating point will almost certainly work faster.

So whatever the reason for needing the performance boost in your EA, take the time to test it in the correct environment so that you can evaluate which method will be better or faster for you.

Overall, if your objective is to pack as many EAs as you can on a single machine, then using Integer arithmetic will be your better choice - but don't just accept that - test it for yourself and find out!

NB! One more very important fact! Given the recent vulnerability issues with speculative execution, and the work-around fixes and patches for both hardware and software to prevent attacks via this method, the floating-point advantage could be severely limited, making Integer arithmetic even more attractive. This is, however, just a supposition, as I currently do not know how these patches and fixes will affect said performance, but it is a possibility! Only time will tell!

From my last tests (see above), I would conclude it's not worth worrying about. Given how small the difference is, speed should not be a criterion for choosing between int and double if you have arithmetic operations to code. Other criteria mentioned in this thread are more important, as is the coder's own preference.

With, maybe, an exception on a heavily loaded (and "old"?) computer; that has to be tested and confirmed.

 
Fernando Carreiro:

Maybe it is the speculative execution at play here that consumes all the threads/cores.

Instead of using a CArray, try using standard arrays, and see how it holds up both for the CPU utilization and for the hard-coded vs parameter versions.

Hmm... not sure what we could learn from such a new test. It's a good thing that all the available CPU "power" is used. Or...?

PS: If you have code using standard arrays, I will happily run it, but I think all is clear for me.

 

hello,

a bump. I can see you are comparing a 32-bit type to a 64-bit type, which is not a fair comparison (apples to oranges :)), although I can see that the double scores are equal to the int scores. That would suggest that we are dealing with "real" 64-bit x86 CPUs. For example, in the page I link below, we can see that CPUs from 5 years ago were half as fast with 64-bit types as with 32-bit types (int speed was equal to float speed, long speed to double speed).


http://nicolas.limare.net/pro/notes/2014/12/12_arit_speed/ 

Integer and Floating-Point Arithmetic Speed vs Precision
  • nicolas.limare.net
IMPORTANT: Useful feedback revealed that some of these measures are seriously flawed. A major update is on the way.
 
Demos Stogios:

hello,

a bump. I can see you are comparing a 32-bit type to a 64-bit type, which is not a fair comparison (apples to oranges :)), although I can see that the double scores are equal to the int scores. That would suggest that we are dealing with "real" 64-bit x86 CPUs. For example, in the page I link below, we can see that CPUs from 5 years ago were half as fast with 64-bit types as with 32-bit types (int speed was equal to float speed, long speed to double speed).


http://nicolas.limare.net/pro/notes/2014/12/12_arit_speed/ 

Not sure where you saw a 32-bit type compared to a 64-bit one? Maybe I missed it. All my tests were under 64 bits, and Fernando ran tests on both 32 and 64 bits (MT4/MT5) but didn't compare them, as far as I know. And all the CPUs are 64-bit.

 
Alain Verleyen:

Not sure where you saw a 32-bit type compared to a 64-bit one? Maybe I missed it. All my tests were under 64 bits, and Fernando ran tests on both 32 and 64 bits (MT4/MT5) but didn't compare them, as far as I know. And all the CPUs are 64-bit.


Sorry, I mean that ints are 32-bit and doubles are 64-bit; I am not talking about the MetaTrader version. So we cannot compare int arithmetic to double arithmetic, only int to float, or (long) long to double.

As for the CPUs, from the link I posted it is obvious that early 64-bit CPUs are somehow not actual 64-bit machines, as their 64-bit performance, maybe due to memory bandwidth issues or whatever, is lacking compared to their 32-bit performance. That holds for x86 CPUs; some CPUs from IBM did not have that characteristic.
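
The width pairing described here can be checked directly. The following is a small C++ sketch, where the fixed-width types `std::int32_t` and `std::int64_t` stand in for MQL5's 4-byte `int` and 8-byte `long`; `bits_of` and the three flags are hypothetical names, and the `float`/`double` widths are those of mainstream x86-64 toolchains rather than something the C++ standard guarantees.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Width in bits of a type, used to pair integer and floating-point types of equal size.
template <typename T>
constexpr std::size_t bits_of() { return sizeof(T) * CHAR_BIT; }

// Like-for-like pairs: 32-bit integer vs float, 64-bit integer vs double.
constexpr bool int_matches_float   = bits_of<std::int32_t>() == bits_of<float>();
constexpr bool long_matches_double = bits_of<std::int64_t>() == bits_of<double>();

// The pairing questioned above: 32-bit integer against 64-bit double.
constexpr bool int_matches_double  = bits_of<std::int32_t>() == bits_of<double>();
```

On a typical x86-64 build the first two flags are true and the third is false, which is the sense in which int vs double is a comparison of different widths.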
 
Demos Stogios:

Sorry, I mean that ints are 32-bit and doubles are 64-bit; I am not talking about the MetaTrader version. So we cannot compare int arithmetic to double arithmetic, only int to float, or (long) long to double.

As for the CPUs, from the link I posted it is obvious that early 64-bit CPUs are somehow not actual 64-bit machines, as their 64-bit performance, maybe due to memory bandwidth issues or whatever, is lacking compared to their 32-bit performance. That holds for x86 CPUs; some CPUs from IBM did not have that characteristic.

OK! Here it is and I will leave it up to you to draw your own conclusions:

// MetaTrader 5 x64 build 1745 - Windows 10 (build 16299) x64, IE 11, UAC, Intel Core i7-4790T  @ 2.70GHz, Memory: 9042 / 16274 Mb, Disk: 353 / 893 Gb, GMT+0

2018.01.17 11:39:33.401 224626_3 (EURUSD.m,H1)  <double>: 201 ms for 10000^2 iterations (result = 99740000 sum=3.29216e+12)
2018.01.17 11:39:33.659 224626_3 (EURUSD.m,H1)  <float>:  258 ms for 10000^2 iterations (result = 99740000 sum=3.29216e+12)
2018.01.17 11:39:33.896 224626_3 (EURUSD.m,H1)  <long>:   236 ms for 10000^2 iterations (result = 99740000 sum=3292155440000)
2018.01.17 11:39:34.107 224626_3 (EURUSD.m,H1)  <int>:    211 ms for 10000^2 iterations (result = 99740000 sum=3292155440000)

2018.01.17 11:40:01.820 224626_3 (EURUSD.m,H1)  <double>: 20109 ms for 100000^2 iterations (result = 9982000000 sum=3.27964e+14)
2018.01.17 11:40:27.365 224626_3 (EURUSD.m,H1)  <float>:  25545 ms for 100000^2 iterations (result = 9982000000 sum=3.27964e+14)
2018.01.17 11:40:50.187 224626_3 (EURUSD.m,H1)  <long>:   22821 ms for 100000^2 iterations (result = 9982000000 sum=327963572900000)
2018.01.17 11:41:10.165 224626_3 (EURUSD.m,H1)  <int>:    19977 ms for 100000^2 iterations (result = 9982000000 sum=327963572900000)
Files:
224626_3.mq5  4 kb
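
For readers who want to put numbers on the logs above, here is a small C++ sketch that expresses each timing relative to the `<double>` timing of the same run. The millisecond values are copied from the log lines; `small_run_ms`, `large_run_ms`, and `relative_to_double` are hypothetical names, and this is not code from the attached script.

```cpp
#include <array>
#include <cstddef>

// Timings in milliseconds copied from the log lines above,
// in the order double, float, long, int.
const std::array<double, 4> small_run_ms{{201.0, 258.0, 236.0, 211.0}};          // 10000^2 iterations
const std::array<double, 4> large_run_ms{{20109.0, 25545.0, 22821.0, 19977.0}};  // 100000^2 iterations

// Ratio of one timing to the <double> timing of the same run (index 0).
double relative_to_double(const std::array<double, 4>& run, std::size_t i) {
    return run[i] / run[0];
}
```

With these figures, `<float>` takes roughly 27-28% longer than `<double>` in both runs, `<long>` roughly 13-17% longer, and `<int>` lands within about 5% of `<double>`. The 100x increase in iterations corresponds to roughly a 100x increase in time (20109 ms against 201 ms for `<double>`).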