Alternative implementations of standard functions/approaches

 

NormalizeDouble

#define  EPSILON (1.0e-7 + 1.0e-13)
#define  HALF_PLUS  (0.5 + EPSILON)

double MyNormalizeDouble( const double Value, const int digits )
{
  // Adding static makes the code three times faster (Optimize=0)!
  static const double Points[] = {1.0e-0, 1.0e-1, 1.0e-2, 1.0e-3, 1.0e-4, 1.0e-5, 1.0e-6, 1.0e-7, 1.0e-8};

  return((int)((Value > 0) ? Value / Points[digits] + HALF_PLUS : Value / Points[digits] - HALF_PLUS) * Points[digits]);
}

ulong Bench( const int Amount = 1.0e8 )
{
  double Price = 1.23456;
  const double point = 0.00001;
  
  const ulong StartTime = GetMicrosecondCount();
  
  int Tmp = 0;
  
  for (int i = 0; i < Amount; i++)
  {
    Price = NormalizeDouble(Price + point, 5); // replace with MyNormalizeDouble and feel the difference
    
    // Remove this and the total execution time becomes zero for any Amount (Optimize=1) - cool! The NormalizeDouble variant gets no such optimization.
    if (i + i > Amount + Amount)
      return(0);
  }
  
  return(GetMicrosecondCount() - StartTime);
}

void OnStart( void )
{
  Print(Bench());
    
  return;
};

The result is 1123275 vs 1666643 microseconds in favour of MyNormalizeDouble (Optimize=1). Without optimization, it is four times faster (from memory).
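For readers who want to check the rounding rule outside MetaTrader, here is a minimal C++ sketch of what MyNormalizeDouble computes. C++ is used only because MQL5 cannot run outside the terminal, and the name `my_normalize_double` is hypothetical. The function divides by the point, adds or subtracts HALF_PLUS, and relies on the integer cast truncating toward zero. Two assumptions differ from the posted MQL5 code. First, digits is checked with an assert instead of being unchecked. Second, the cast goes through `int64_t` rather than `int`, because with 8 digits any |Value| above about 21.47 makes Value / 1e-8 exceed the 32-bit `int` range.

```cpp
#include <cassert>
#include <cstdint>

// Same constants as the MQL5 script: the bias on top of 0.5 pushes values that
// land just below an x.5 boundary (due to binary representation error) over it.
constexpr double kEpsilon  = 1.0e-7 + 1.0e-13;
constexpr double kHalfPlus = 0.5 + kEpsilon;

// Sketch of MyNormalizeDouble: round value to `digits` decimal places.
double my_normalize_double(double value, unsigned digits) {
    static const double kPoints[] = {1.0e-0, 1.0e-1, 1.0e-2, 1.0e-3, 1.0e-4,
                                     1.0e-5, 1.0e-6, 1.0e-7, 1.0e-8};
    assert(digits <= 8);  // precondition; the posted MQL5 code does not check it
    const double point = kPoints[digits];
    // The cast truncates toward zero, so add half for positive values
    // and subtract half for the others.
    const double scaled = (value > 0) ? value / point + kHalfPlus
                                      : value / point - kHalfPlus;
    return static_cast<double>(static_cast<std::int64_t>(scaled)) * point;
}
```

For example, 1.234565 rounds to 1.23457 and -1.234565 to -1.23457, while 1001.0 with 8 digits no longer overflows the cast.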


 

If you replace

static const double Points[] = {1.0e-0, 1.0e-1, 1.0e-2, 1.0e-3, 1.0e-4, 1.0e-5, 1.0e-6, 1.0e-7, 1.0e-8};

with the switch variant, you can see the quality of the switch implementation in numbers.
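The switch variant itself is not shown in the post. Below is a minimal C++ sketch of what it might look like, next to the table it replaces. Both function names are hypothetical. Clamping out-of-range digits to 8 is an assumption of this sketch and is not part of the original.

```cpp
#include <cassert>

// Table lookup, as in MyNormalizeDouble (plus a clamp to stay in bounds).
double point_by_table(unsigned digits) {
    static const double kPoints[] = {1.0e-0, 1.0e-1, 1.0e-2, 1.0e-3, 1.0e-4,
                                     1.0e-5, 1.0e-6, 1.0e-7, 1.0e-8};
    return kPoints[digits > 8 ? 8 : digits];
}

// Switch-based variant: dense, ordered case labels 0..8, the shape a compiler
// can turn into a jump table or back into a constant table.
double point_by_switch(unsigned digits) {
    switch (digits) {
        case 0: return 1.0e-0;
        case 1: return 1.0e-1;
        case 2: return 1.0e-2;
        case 3: return 1.0e-3;
        case 4: return 1.0e-4;
        case 5: return 1.0e-5;
        case 6: return 1.0e-6;
        case 7: return 1.0e-7;
        default: return 1.0e-8;  // 8 and anything larger
    }
}
```

Both functions return the same values. How fast the switch version runs depends entirely on how the compiler lowers it, and that is what the comparison measures.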

 

Consider a cleaned-up version of the script with NormalizeDouble:

#define  EPSILON (1.0e-7 + 1.0e-13)
#define  HALF_PLUS  (0.5 + EPSILON)
//+------------------------------------------------------------------+
//|                                                                  |
//+------------------------------------------------------------------+
double MyNormalizeDouble(const double Value,const int digits)
  {
   static const double Points[]={1.0e-0,1.0e-1,1.0e-2,1.0e-3,1.0e-4,1.0e-5,1.0e-6,1.0e-7,1.0e-8};

   return((int)((Value > 0) ? Value / Points[digits] + HALF_PLUS : Value / Points[digits] - HALF_PLUS) * Points[digits]);
  }
//+------------------------------------------------------------------+
//|                                                                  |
//+------------------------------------------------------------------+
ulong BenchStandard(const int Amount=1.0e8)
  {
   double       Price=1.23456;
   const double point=0.00001;
   const ulong  StartTime=GetMicrosecondCount();
//---
   for(int i=0; i<Amount;i++)
     {
      Price=NormalizeDouble(Price+point,5);
     }
   
   Print("Result: ",Price);   // deliberately print the result so that the loop is not optimized away
//---
   return(GetMicrosecondCount() - StartTime);
  }
//+------------------------------------------------------------------+
//|                                                                  |
//+------------------------------------------------------------------+
ulong BenchCustom(const int Amount=1.0e8)
  {
   double       Price=1.23456;
   const double point=0.00001;
   const ulong  StartTime=GetMicrosecondCount();
//---
   for(int i=0; i<Amount;i++)
     {
      Price=MyNormalizeDouble(Price+point,5);
     }
   
   Print("Result: ",Price);   // deliberately print the result so that the loop is not optimized away
//---
   return(GetMicrosecondCount() - StartTime);
  }
//+------------------------------------------------------------------+
//|                                                                  |
//+------------------------------------------------------------------+
void OnStart(void)
  {
   Print("Standard: ",BenchStandard()," msc");
   Print("Custom:   ",BenchCustom(),  " msc");
  }

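Results (GetMicrosecondCount() returns microseconds, so the values are in microseconds despite the "msc" label):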

Custom:   1110255 msc
Result:   1001.23456

Standard: 1684165 msc
Result:   1001.23456

Immediate remarks and explanations:

  1. static is necessary here so that the compiler takes this array outside the function and doesn't construct it on the stack every time the function is called. The C++ compiler does the same.
    static const double Points
  2. To prevent the compiler from throwing the loop away because it is useless, we should use the results of calculations. For example, Print the variable Price.

  3. There is an error in your function: the bounds of digits are not checked, which can easily lead to an array overrun.

    For example, call it as MyNormalizeDouble(Price+point,10) and catch the error:
    array out of range in 'BenchNormalizeDouble.mq5' (19,45)
    
    The method of speeding up by not checking is acceptable, but not in our case. We must handle any erroneous data input.

  4. Let's add a simple condition for an index greater than 8. To simplify the code, change the type of digits to uint, so that a single comparison (>8) suffices instead of an additional <0 check.
    //+------------------------------------------------------------------+
    //|                                                                  |
    //+------------------------------------------------------------------+
    double MyNormalizeDouble(const double Value,uint digits)
      {
       static const double Points[]={1.0e-0,1.0e-1,1.0e-2,1.0e-3,1.0e-4,1.0e-5,1.0e-6,1.0e-7,1.0e-8};
    //---
       if(digits>8)
          digits=8;
    //---
       return((int)((Value > 0) ? Value / Points[digits] + HALF_PLUS : Value / Points[digits] - HALF_PLUS) * Points[digits]);
      }
    

  5. Let's run the code and... We are surprised!
    Custom:   1099705 msc
    Result:   1001.23456
    
    Standard: 1695662 msc
    Result:   1001.23456
    
    Your code has overtaken the standard NormalizeDouble function even more!

    Moreover, adding the condition even reduces the time (in fact, the difference is within the margin of error). Why is there such a difference in speed?

  6. All of this comes down to a standard mistake of performance testers.

    When writing tests, you should keep in mind the full list of optimizations the compiler can apply. You need to be clear about what input data you are using and how the compiler may eliminate it when you write a simplified test.

    Let's go through the whole set of optimizations our compiler performs and apply them step by step.

  7. Let's start with constant propagation: overlooking it is one of the major mistakes in this test.

    Half of your input data are constants. Let's rewrite the example with their propagation in mind.

    ulong BenchStandard(void)
      {
       double      Price=1.23456;
       const ulong StartTime=GetMicrosecondCount();
    //---
       for(int i=0; i<1.0e8;i++)
         {
          Price=NormalizeDouble(Price + 0.00001,5);
         }
    
       Print("Result: ",Price);
    //---
       return(GetMicrosecondCount() - StartTime);
      }
    
    ulong BenchCustom(void)
      {
       double      Price=1.23456;
       const ulong StartTime=GetMicrosecondCount();
    //---
       for(int i=0; i<1.0e8;i++)
         {
          Price=MyNormalizeDouble(Price + 0.00001,5);
         }
    
       Print("Result: ",Price," ",1.0e8);
    //---
       return(GetMicrosecondCount() - StartTime);
      }
    
    Running it changes nothing, as expected.

  8. Next, inline your code (our NormalizeDouble cannot be inlined).

    This is what your function effectively looks like after the inevitable inlining: the call is saved, the array fetch is saved, and the checks are removed thanks to constant analysis:
    ulong BenchCustom(void)
      {
       double              Price=1.23456;
       const ulong         StartTime=GetMicrosecondCount();
    //---
       for(int i=0; i<1.0e8;i++)
         {
          //--- this code is removed entirely, since digits is known to be the constant 5
          //if(digits>8)
          //   digits=8;
          //--- propagate variables and aggressively substitute constants
          if((Price+0.00001)>0)
             Price=int((Price+0.00001)/1.0e-5+(0.5+1.0e-7+1.0e-13))*1.0e-5;
          else
             Price=int((Price+0.00001)/1.0e-5-(0.5+1.0e-7+1.0e-13))*1.0e-5;
         }
    
       Print("Result: ",Price);
    //---
       return(GetMicrosecondCount() - StartTime);
      }
    
    I didn't fold the pure constants by hand so as not to waste time; they are all guaranteed to collapse at compile time.

    Run the code and get the same time as in the original version:
    Custom:   1149536 msc
    Standard: 1767592 msc
    
    Don't mind the jitter in the numbers: at the microsecond level, with timer error and fluctuating load on the computer, this is within normal limits. The proportion is fully preserved.

  9. Look at the code you were actually testing, given the fixed input data.

    Because the compiler optimizes very aggressively, the task you were benchmarking was effectively simplified.


  10. So how should you test for performance?

    By understanding how the compiler works, you need to prevent it from applying pre-optimizations and simplifications.

    For example, let's make the digits parameter variable:

    #define  EPSILON (1.0e-7 + 1.0e-13)
    #define  HALF_PLUS  (0.5 + EPSILON)
    //+------------------------------------------------------------------+
    //|                                                                  |
    //+------------------------------------------------------------------+
    double MyNormalizeDouble(const double Value,uint digits)
      {
       static const double Points[]={1.0e-0,1.0e-1,1.0e-2,1.0e-3,1.0e-4,1.0e-5,1.0e-6,1.0e-7,1.0e-8};
    //---
       if(digits>8)
          digits=8;
    //---   
       return((int)((Value > 0) ? Value / Points[digits] + HALF_PLUS : Value / Points[digits] - HALF_PLUS) * Points[digits]);
      }
    //+------------------------------------------------------------------+
    //|                                                                  |
    //+------------------------------------------------------------------+
    ulong BenchStandard(const int Amount=1.0e8)
      {
       double       Price=1.23456;
       const double point=0.00001;
       const ulong  StartTime=GetMicrosecondCount();
    //---
       for(int i=0; i<Amount;i++)
         {
          Price=NormalizeDouble(Price+point,2+(i&15));
         }
    
       Print("Result: ",Price);   // deliberately print the result so that the loop is not optimized away
    //---
       return(GetMicrosecondCount() - StartTime);
      }
    //+------------------------------------------------------------------+
    //|                                                                  |
    //+------------------------------------------------------------------+
    ulong BenchCustom(const int Amount=1.0e8)
      {
       double       Price=1.23456;
       const double point=0.00001;
       const ulong  StartTime=GetMicrosecondCount();
    //---
       for(int i=0; i<Amount;i++)
         {
          Price=MyNormalizeDouble(Price+point,2+(i&15));
         }
    
       Print("Result: ",Price);   // deliberately print the result so that the loop is not optimized away
    //---
       return(GetMicrosecondCount() - StartTime);
      }
    //+------------------------------------------------------------------+
    //|                                                                  |
    //+------------------------------------------------------------------+
    void OnStart(void)
      {
       Print("Standard: ",BenchStandard()," msc");
       Print("Custom:   ",BenchCustom()," msc");
      }
    
    Run it and... we get the same speed result as before.

    Your code still takes about 35% less time, as before.

  11. So why is it so?

    We still cannot protect the test from the optimization that inlining provides. Saving 100,000,000 calls, each passing data through the stack into our NormalizeDouble (whose implementation is similar), may well account for the same speedup.

    We also suspect that our NormalizeDouble is not bound through the direct_call mechanism when the function relocation table is loaded in an MQL5 program.

    We'll check it in the morning, and if so, we'll move it to direct_call and measure the speed again.
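The two rules from this list are to keep the inputs from being compile-time constants and to consume the result. They apply to any optimizing compiler. Below is a minimal C++ sketch of the same benchmark structure. It assumes `std::chrono::steady_clock` as the timer in place of GetMicrosecondCount. The names are hypothetical, and MyNormalizeDouble is reproduced with the uint clamp and an assumed 64-bit cast.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

constexpr double kHalfPlus = 0.5 + (1.0e-7 + 1.0e-13);

// MyNormalizeDouble with the uint clamp (64-bit cast is an assumption).
double my_normalize_double(double value, unsigned digits) {
    static const double kPoints[] = {1.0e-0, 1.0e-1, 1.0e-2, 1.0e-3, 1.0e-4,
                                     1.0e-5, 1.0e-6, 1.0e-7, 1.0e-8};
    if (digits > 8) digits = 8;
    const double point = kPoints[digits];
    const double scaled = (value > 0) ? value / point + kHalfPlus
                                      : value / point - kHalfPlus;
    return static_cast<double>(static_cast<std::int64_t>(scaled)) * point;
}

struct BenchResult {
    double price;          // final value: the caller must use it (e.g. print it)
    long long elapsed_us;  // wall time of the loop in microseconds
};

// Same loop as BenchCustom with variable digits: digits depends on i, so the
// table lookup cannot be folded to a constant, and the final price is
// returned so that the loop is not dead code.
BenchResult bench_custom(int amount) {
    assert(amount >= 0);
    double price = 1.23456;
    const double point = 0.00001;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < amount; i++)
        price = my_normalize_double(price + point, 2 + (i & 15));
    const auto stop = std::chrono::steady_clock::now();
    return {price,
            std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count()};
}
```

Printing `r.price` after `BenchResult r = bench_custom(100000000);` plays the same role as Print("Result: ",Price) in the MQL5 version. Inlining of `my_normalize_double` is still possible here, which is exactly the remaining effect item 11 describes.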

This is the NormalizeDouble study.

Our MQL5 compiler has beaten our own system function, which shows that it holds its own when compared with the speed of C++ code.

 
fxsaber:

If you replace

with the switch variant, you can see the quality of the switch implementation in numbers.

You are confusing direct indexed access to a static array by a constant index (which degenerates into a constant from a field) and switch.

Switch can't really compete with such a case. Switch has several frequently used optimizations of the form:

  • "notoriously ordered and short values are put into a static array and indexed" - the simplest and fastest, can compete with the static array, but not always
  • "several arrays by ordered and close chunks of values with zone boundary checks" - this already has a brake
  • "we check too few values through if" - no speed, but it is the programmer's own fault, he uses switch inappropriately
  • "very sparse ordered table with binary search" - very slow for the worst cases

In fact, the best strategy for switch is when the developer deliberately tried to make a compact set of values in the lower set of numbers.
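To make the first and last strategies concrete, the C++ sketch below writes both lowerings out by hand for a hypothetical mapping; the case values and results are made up for illustration. When case labels are sparse, a compiler can lower the switch into a sorted key table searched with binary search. A compact range starting near zero can instead become a plain indexed table. This shows the general technique, not the code the MQL5 compiler actually emits.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Hypothetical sparse switch:
//   switch (code) { case 3: ... case 40: ... case 512: ... case 9000: ... case 70000: ... }
// The labels are too far apart for a jump table, so keep them sorted and
// search them in O(log n).
int lookup_sparse(int code) {
    static const int kKeys[]   = {3, 40, 512, 9000, 70000};
    static const int kValues[] = {10, 20, 30, 40, 50};
    const int* first = std::begin(kKeys);
    const int* last = std::end(kKeys);
    const int* it = std::lower_bound(first, last, code);
    if (it == last || *it != code) return -1;  // the switch's default branch
    return kValues[it - first];
}

// The same results behind a compact index 0..4: this is the shape that can
// become a direct table lookup with a single bounds check.
int lookup_dense(unsigned index) {
    switch (index) {
        case 0: return 10;
        case 1: return 20;
        case 2: return 30;
        case 3: return 40;
        case 4: return 50;
        default: return -1;
    }
}
```

The sparse version pays several comparisons per lookup; the dense one pays one range check and one indexed load, which is why remapping values into a compact low range pays off.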

 
Renat Fatkhullin:

Consider the cleaned up version of the script with NormalizeDouble:

Results:


Immediate remarks and explanations:

  1. static is needed here for the compiler to put this array outside the function and not build it on the stack at each function call. The C++ compiler does the same thing.
With "Optimize=0" this is the case. With "Optimize=1" you can even drop it: the optimizing compiler turns out to be smart.
  2. To prevent the compiler from throwing out the loop due to its uselessness, we must use the results of calculations. For example, Print the variable Price.
What a cool trick!
  3. There is an error in your function: it does not check the bounds of digits, which can easily lead to array overruns.

    For example, call it as MyNormalizeDouble(Price+point,10) and catch the error:
    The method of speeding up by not checking is acceptable, but not in our case. We must handle any erroneous data input.

  4. Let's add a simple condition for an index greater than 8. To simplify the code, let's change the type of digits to uint, so that a single comparison (>8) suffices instead of an additional <0 check.
This seems more optimal!
double MyNormalizeDouble( const double Value, const uint digits )
{
  static const double Points[] = {1.0e-0, 1.0e-1, 1.0e-2, 1.0e-3, 1.0e-4, 1.0e-5, 1.0e-6, 1.0e-7, 1.0e-8};
  const double point = digits > 8 ? 1.0e-8 : Points[digits];

  return((int)((Value > 0) ? Value / point + HALF_PLUS : Value / point - HALF_PLUS) * point);
}
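A C++ sketch of this ternary variant makes the effect of the uint parameter easy to check. The name is hypothetical, and the 64-bit cast in place of (int) is an assumption of the sketch. The clamp happens while choosing point, so Points[] is never indexed past 8. A negative argument converted to unsigned becomes a huge value and is clamped as well.

```cpp
#include <cassert>
#include <cstdint>

constexpr double kHalfPlus = 0.5 + (1.0e-7 + 1.0e-13);

// Ternary variant: choose the point first, mapping any digits > 8 to 1e-8,
// so the array is only read with a valid index.
double my_normalize_double(double value, unsigned digits) {
    static const double kPoints[] = {1.0e-0, 1.0e-1, 1.0e-2, 1.0e-3, 1.0e-4,
                                     1.0e-5, 1.0e-6, 1.0e-7, 1.0e-8};
    const double point = digits > 8 ? 1.0e-8 : kPoints[digits];
    const double scaled = (value > 0) ? value / point + kHalfPlus
                                      : value / point - kHalfPlus;
    return static_cast<double>(static_cast<std::int64_t>(scaled)) * point;
}
```

Here `my_normalize_double(x, 10)` and `my_normalize_double(x, static_cast<unsigned>(-1))` both behave exactly like `my_normalize_double(x, 8)` instead of reading outside the array.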
  6. This is a standard mistake of performance testers.

    When writing tests we should keep in mind the full list of optimizations that can be applied by the compiler. You need to be clear about what input data you are using and how it will be destroyed when you write a simplified sample test.
  10. So how should you test for performance?

    By understanding how the compiler works, you need to prevent it from applying pre-optimizations and simplifications.

    For example, let's make the digits parameter variable:
Thank you very much for the thorough explanation of how to set up the compiler's performance measurements correctly! I really didn't take constant optimization into account.

This is the NormalizeDouble study.

Our MQL5 compiler beat our system function, which shows its adequacy when compared to the speed of C++ code.

Yes, this result is a matter of pride.
 
Renat Fatkhullin:

You are confusing direct indexed access to a static array by a constant index (which degenerates into a constant from a field) and switch.

Switch can't really compete with such a case. Switch has some commonly used optimizations of the kind:

  • The "deliberately ordered and short values are put into a static array and indexed by switch" is the simplest and fastest, and can compete with a static array, but not always.

This is exactly such a case of ordering.

In fact, the best case for switch is when the developer has deliberately made the set of values compact and placed it at the low end of the number range.

I tried it on a 32-bit system. There, replacing the array with the switch in the above example caused serious slowdowns. I haven't tested it on the new machine.
 
fxsaber:

This is exactly such a case of ordering.

We'll have to check it separately, but later.


I tried it on a 32-bit system. Switching to switch in the example above caused serious slowdowns. I haven't checked it on the new machine.

Every MQL5 program actually contains two compiled versions: a simplified one for 32 bits and a maximally optimized one for 64 bits. In 32-bit MT5 the new optimizer doesn't apply at all, and the code for 32-bit operations is as simple as MQL4 code in MT4.

The full efficiency of the compiler, which can generate code ten times faster, is available only when running in the 64-bit version of MT5: https://www.mql5.com/ru/forum/58241

We are fully focused on 64-bit versions of the platform.

 

On the subject of NormalizeDouble there is this nonsense

Forum on trading, automated trading systems and strategy testing

How do I go through an enumeration consistently?

fxsaber, 2016.08.26 16:08

There is this note in the function description

This is only true for symbols whose minimum price step is 10^N, where N is a non-positive integer. If the minimum price step has a different value, then normalizing price levels before OrderSend is a meaningless operation, and in most cases OrderSend will return false.


It would be a good idea to correct the outdated statements in the help.

NormalizeDouble is completely discredited. Not only is its implementation slow, it is also meaningless on many exchange symbols (e.g. RTS, MIX, etc.).

NormalizeDouble was originally created by you for Order* operations, mainly for prices and lots. But then non-standard TickSize and VolumeStep values appeared, and the function is simply obsolete. Because of it, people write slow code. An example from the standard library:
double CTrade::CheckVolume(const string symbol,double volume,double price,ENUM_ORDER_TYPE order_type)
  {
//--- check
   if(order_type!=ORDER_TYPE_BUY && order_type!=ORDER_TYPE_SELL)
      return(0.0);
   double free_margin=AccountInfoDouble(ACCOUNT_FREEMARGIN);
   if(free_margin<=0.0)
      return(0.0);
//--- clean
   ClearStructures();
//--- setting request
   m_request.action=TRADE_ACTION_DEAL;
   m_request.symbol=symbol;
   m_request.volume=volume;
   m_request.type  =order_type;
   m_request.price =price;
//--- action and return the result
   if(!::OrderCheck(m_request,m_check_result) && m_check_result.margin_free<0.0)
     {
      double coeff=free_margin/(free_margin-m_check_result.margin_free);
      double lots=NormalizeDouble(volume*coeff,2);
      if(lots<volume)
        {
         //--- normalize and check limits
         double stepvol=SymbolInfoDouble(symbol,SYMBOL_VOLUME_STEP);
         if(stepvol>0.0)
            volume=stepvol*(MathFloor(lots/stepvol)-1);
         //---
         double minvol=SymbolInfoDouble(symbol,SYMBOL_VOLUME_MIN);
         if(volume<minvol)
            volume=0.0;
        }
     }
   return(volume);
  }

Surely it shouldn't be done so clumsily! It could be many times faster if you forget about NormalizeDouble.

// HALF_PLUS is the (0.5 + EPSILON) macro defined earlier in the thread
double NormalizePrice( const double dPrice, double dPoint = 0 )
{
  if (dPoint == 0) 
    dPoint = ::SymbolInfoDouble(::Symbol(), SYMBOL_TRADE_TICK_SIZE);

  return((int)((dPrice > 0) ? dPrice / dPoint + HALF_PLUS : dPrice / dPoint - HALF_PLUS) * dPoint);
}

And for the volume, do

volume = NormalizePrice(volume, stepvol);

For prices, do

NormalizePrice(Price, TickSize)

It seems right to add something similar as an overload of the standard NormalizeDouble, where the second parameter "digits" would be a double instead of an int.
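As a rough C++ illustration of the proposed overload, consider the function name `normalize`. It is hypothetical, not an existing MQL5 function. One overload rounds to a number of decimal digits. The other rounds to an arbitrary step such as TickSize or VolumeStep, using the same HALF_PLUS rounding as NormalizePrice. The sketch also shows why digit-based rounding is not enough: with a tick size of 0.05, rounding to 2 digits can leave a price that is not a multiple of the tick.

```cpp
#include <cassert>
#include <cstdint>

constexpr double kHalfPlus = 0.5 + (1.0e-7 + 1.0e-13);

// Round value to the nearest multiple of step (same rule as NormalizePrice).
double normalize(double value, double step) {
    assert(step > 0);
    const double scaled = (value > 0) ? value / step + kHalfPlus
                                      : value / step - kHalfPlus;
    return static_cast<double>(static_cast<std::int64_t>(scaled)) * step;
}

// Round value to `digits` decimal places (clamped to 0..8), i.e. to a step of 10^-digits.
double normalize(double value, int digits) {
    static const double kPoints[] = {1.0e-0, 1.0e-1, 1.0e-2, 1.0e-3, 1.0e-4,
                                     1.0e-5, 1.0e-6, 1.0e-7, 1.0e-8};
    if (digits < 0) digits = 0;
    if (digits > 8) digits = 8;
    return normalize(value, kPoints[digits]);
}
```

With a tick size of 0.05, `normalize(1234.37, 2)` still returns 1234.37, while `normalize(1234.37, 0.05)` returns 1234.35. A tick size of 10 maps 123457.0 to 123460.0, and a volume step of 0.01 maps 0.537 to 0.54. One caveat of this overload set in C++: `normalize(price, 10)` selects the digits overload, whereas `normalize(price, 10.0)` rounds to a step of 10.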

 

By 2016, most C++ compilers had reached the same levels of optimisation.

MSVC surprises with improvements in every update, while Intel C++ has dropped out as a compiler: it still hasn't recovered from its "internal error" on large projects.

Another improvement to our compiler in build 1400 is faster compilation of complex projects.

 

Back on topic. You have to create alternatives to standard functions because they sometimes give wrong output. Here is an example of an alternative to SymbolInfoTick:

// Gets the tick that actually triggered the latest NewTick event
bool MySymbolInfoTick( const string Symb, MqlTick &Tick, const uint Type = COPY_TICKS_ALL )
{
  MqlTick Ticks[];
  const int Amount = ::CopyTicks(Symb, Ticks, Type, 0, 1);
  const bool Res = (Amount > 0);
  
  if (Res)
    Tick = Ticks[Amount - 1];
  
  return(Res);
}

// Returns exactly what SymbolInfoTick returns
bool CloneSymbolInfoTick( const string Symb, MqlTick &Tick )
{
  MqlTick TickAll, TickTrade, TickInfo;
  const bool Res = (MySymbolInfoTick(Symb, TickAll) &&
                    MySymbolInfoTick(Symb, TickTrade, COPY_TICKS_TRADE) &&
                    MySymbolInfoTick(Symb, TickInfo, COPY_TICKS_INFO));
  
  if (Res)
  {
    Tick = TickInfo;

    Tick.time = TickAll.time;
    Tick.time_msc = TickAll.time_msc;
    Tick.flags = TickAll.flags;
    
    Tick.last = TickTrade.last;
    Tick.volume = TickTrade.volume;    
  }
  
  return(Res);
}

You might want to call SymbolInfoTick on every NewTick event in the tester and sum the volume field to get the exchange turnover. But no, you can't! You have to write the much more logical MySymbolInfoTick instead.
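The pitfall can be illustrated with a simplified C++ simulation. It assumes, as CloneSymbolInfoTick above suggests, that a SymbolInfoTick-style snapshot keeps the volume of the most recent trade tick. Under that assumption, a NewTick event caused by a bid/ask-only change reports that trade's volume again. The `Tick` struct and both functions are hypothetical stand-ins, not MQL5 types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified tick: only what is needed here.
struct Tick {
    bool is_trade;  // true: a trade (last/volume changed); false: bid/ask change only
    long volume;    // trade volume, meaningful only when is_trade is true
};

// Reads a "latest state" snapshot on every event: the snapshot keeps the
// volume of the most recent trade, so info-only events count it again.
long turnover_from_snapshots(const std::vector<Tick>& events) {
    long snapshot_volume = 0, sum = 0;
    for (const Tick& t : events) {
        if (t.is_trade) snapshot_volume = t.volume;
        sum += snapshot_volume;
    }
    return sum;
}

// Sums volume only over the ticks that are actually trades.
long turnover_from_trades(const std::vector<Tick>& events) {
    long sum = 0;
    for (const Tick& t : events)
        if (t.is_trade) sum += t.volume;
    return sum;
}
```

For the event sequence trade 5, info, info, trade 3, info, the snapshot sum is 21 while the real turnover is 8. Counting only trade ticks, which CopyTicks with COPY_TICKS_TRADE is meant to return, avoids this double counting.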

 
fxsaber:

On the subject of NormalizeDouble there is this nonsense

NormalizeDouble was originally created by you for Order* operations, mainly for prices and lots. But non-standard TickSize and VolumeStep values appeared, and the function is simply obsolete. Because of it, people write slow code. Here is an example from the standard library

Surely it shouldn't be this clumsy! You can make it many times faster by forgetting about NormalizeDouble.

And for the volume, do

For prices, do

It seems right to add something like this as an overload of the standard NormalizeDouble, where the second parameter "digits" would be a double instead of an int.

You can optimize everything in sight.

This is an endless process. But in 99% of cases it doesn't pay off economically.