Looking for insight into AVX support in MQL5

 

Hello there,

I've been experimenting with AVX support (and its successors AVX2 and AVX512) in MQL5. Assuming, and to some extent having verified, that the compiler makes use of AVX when it is enabled, we should see a significant speed increase for some types of operations. Due to the lack of documentation, I had to make a few simple assumptions to find any way to "get access" to SIMD (Single Instruction, Multiple Data) operations, of which AVX is one x86 implementation.

In this case I focused on the simple vectorf data type, since it seems sensible to assume that this type is the most likely candidate to be used as input to a SIMD operation. To check whether the compiler actually uses SIMD, I have written the following code:


//+------------------------------------------------------------------+
//|                                                     AVX_Test.mq5 |
//+------------------------------------------------------------------+

#define ENUM_DEINIT_REASON  mqlplus_ENUM_DEINIT_REASON
enum ENUM_DEINIT_REASON
{
    mqlplus_REASON_PROGRAM      = REASON_PROGRAM,
    mqlplus_REASON_REMOVE       = REASON_REMOVE,
    mqlplus_REASON_RECOMPILE    = REASON_RECOMPILE,
    mqlplus_REASON_CHARTCHANGE  = REASON_CHARTCHANGE,
    mqlplus_REASON_CHARTCLOSE   = REASON_CHARTCLOSE,
    mqlplus_REASON_PARAMETERS   = REASON_PARAMETERS,
    mqlplus_REASON_ACCOUNT      = REASON_ACCOUNT,
    mqlplus_REASON_TEMPLATE     = REASON_TEMPLATE,
    mqlplus_REASON_INITFAILED   = REASON_INITFAILED,
    mqlplus_REASON_CLOSE        = REASON_CLOSE
};

//+------------------------------------------------------------------+
//| Expert initialization function                                   |
//+------------------------------------------------------------------+
int OnInit()
{
    // CPU instruction set support
    printf("\nCPU: %s; SIMD support: %s\nEX5 compiled for: %s", TerminalInfoString(TERMINAL_CPU_NAME), TerminalInfoString(TERMINAL_CPU_ARCHITECTURE), __CPU_ARCHITECTURE__);


    // Vector calculation
    
        vectorf vector_a = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 };
        vectorf vector_b = { 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0 };
        vectorf vector_r = vector_a + vector_b;
        Print(vector_r);    
    
        const ulong start = GetMicrosecondCount();
        for(int cnt = 0; cnt < 8192; cnt++)
        {
            vector_r = vector_r + vector_a;
        }
        const ulong stop = GetMicrosecondCount();
        Print(vector_r);
        printf("Execution time: %llu micros", stop - start);

    // Warmup done
    printf("%s", "Warmup finished");

    // Return
    return(INIT_SUCCEEDED);
}


//+------------------------------------------------------------------+
//| Expert deinitialization function                                 |
//+------------------------------------------------------------------+
void OnDeinit(const int reason)
{
    const ENUM_DEINIT_REASON exit_reason = (ENUM_DEINIT_REASON)reason;
    printf("Expert exit reason: %s", EnumToString(exit_reason));

    return;
}


//+------------------------------------------------------------------+
//| Expert tick function                                             |
//+------------------------------------------------------------------+
void OnTick()
{
    // Vector calculation
    
        vectorf vector_a = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 };
        vectorf vector_b = { 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0 };
        vectorf vector_r = vector_a + vector_b;
        Print(vector_r);    
    
        const ulong start_a = GetMicrosecondCount();
        for(int cnt = 0; cnt < 8192; cnt++)
        {
            vector_r += vector_a;
        }
        const ulong stop_a = GetMicrosecondCount();
        Print(vector_r);
        printf("Execution time (r += a): %llu micros", stop_a - start_a);


        const ulong start_b = GetMicrosecondCount();
        for(int cnt = 0; cnt < 8192; cnt++)
        {
            vector_r = vector_r + vector_b;
        }
        const ulong stop_b = GetMicrosecondCount();
        Print(vector_r);
        printf("Execution time (r = r + b): %llu micros", stop_b - start_b);


    // Kill expert
    ExpertRemove();

    // Return
    return;
}
//+------------------------------------------------------------------+


Judging by the printed results, I think I got the compiler to use SIMD instructions at least once to optimize the operations. I cannot actually identify or validate this from within MQL5, but given the size of the speedup, it should not be caused by efficiency-core/performance-core differences.

The measurements show an increase in speed, though I cannot work out how it actually comes about, or rather, when and how the compiler applies SIMD optimizations.

See screenshot:



These are the results of four different runs, each freshly compiled with the desired optimization setting. The third run shows that the compiler actually applied the optimization. In the second run, the optimization was apparently applied only partially.
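To put such timings into perspective, one option is to time the same addition once with vectorf and once with a plain element-wise loop over a float array, and to repeat both several times within one run. The following is only a minimal sketch of that idea, not a proof of SIMD usage: the names VEC_SIZE, LOOP_COUNT and PASSES are placeholders I made up for illustration, and I am assuming that printing the results is enough to keep the compiler from removing the loops.

```mql5
//+------------------------------------------------------------------+
//| Sketch: vectorf addition vs. plain float loop (script)           |
//+------------------------------------------------------------------+
#define VEC_SIZE    8       // hypothetical: elements per vector
#define LOOP_COUNT  8192    // same iteration count as in the test above
#define PASSES      5       // hypothetical: repetitions per run

void OnStart()
{
    // Report what the CPU offers and what this EX5 was compiled for
    printf("SIMD support: %s; EX5 compiled for: %s", TerminalInfoString(TERMINAL_CPU_ARCHITECTURE), __CPU_ARCHITECTURE__);

    for(int pass = 0; pass < PASSES; pass++)
    {
        // Variant 1: vectorf, addition handled by the built-in type
        vectorf vec_a = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 };
        vectorf vec_r = vec_a;

        const ulong vec_start = GetMicrosecondCount();
        for(int cnt = 0; cnt < LOOP_COUNT; cnt++)
        {
            vec_r += vec_a;
        }
        const ulong vec_time = GetMicrosecondCount() - vec_start;

        // Variant 2: plain float array, element-wise addition
        float arr_a[VEC_SIZE] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f };
        float arr_r[VEC_SIZE];
        for(int i = 0; i < VEC_SIZE; i++)
        {
            arr_r[i] = arr_a[i];
        }

        const ulong arr_start = GetMicrosecondCount();
        for(int cnt = 0; cnt < LOOP_COUNT; cnt++)
        {
            for(int i = 0; i < VEC_SIZE; i++)
            {
                arr_r[i] += arr_a[i];
            }
        }
        const ulong arr_time = GetMicrosecondCount() - arr_start;

        // Print both results so that neither loop is dead code
        printf("Pass %d: vectorf %llu micros (last = %.1f), float[] %llu micros (last = %.1f)", pass, vec_time, vec_r[VEC_SIZE - 1], arr_time, arr_r[VEC_SIZE - 1]);
    }
}
```

Two caveats apply. The compiler may well vectorize the plain loop too, so similar timings in both variants do not show that SIMD is absent. And with only 8192 iterations the measured times are short, so differences between passes may be timer or scheduling noise rather than an effect of the compile target.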

I would like to get some more insight into this subject, as it is currently a more or less undocumented feature, and I see no way to figure out how to make use of the instruction sets the CPUs provide.

Some feedback from an MQL developer with more insight would be very welcome. Currently I have the impression that this feature is more or less broken as it stands.

As a side note, AVX2 mainly extended the 256-bit instructions to integer data types, which the MQL5 vector data types do not offer. It would also be nice to know how to access the available SIMD instructions in a deterministic way.


PS: An enumeration for the CPU architecture would be a nice addition as well.

Here is a manual and incomplete implementation of one:

#define ENUM_CPU_TYPE           mqlplus_ENUM_CPU_TYPE
#define CPU_TYPE_UNKNOWN            mqlplus_CPU_TYPE_UNKNOWN
#define CPU_TYPE_INTEL              mqlplus_CPU_TYPE_INTEL
#define CPU_TYPE_AMD                mqlplus_CPU_TYPE_AMD
enum ENUM_CPU_TYPE
{
    mqlplus_CPU_TYPE_UNKNOWN    = 0,
    mqlplus_CPU_TYPE_INTEL      = 1,
    mqlplus_CPU_TYPE_AMD        = 2
};


#define ENUM_CPU_ARCHITECTURE   mqlplus_ENUM_CPU_ARCHITECTURE
#define CPU_ARCHITECTURE_UNKNOWN    mqlplus_CPU_ARCHITECTURE_UNKNOWN
#define CPU_ARCHITECTURE_X64        mqlplus_CPU_ARCHITECTURE_X64
#define CPU_ARCHITECTURE_AVX        mqlplus_CPU_ARCHITECTURE_AVX
#define CPU_ARCHITECTURE_AVX2       mqlplus_CPU_ARCHITECTURE_AVX2
#define CPU_ARCHITECTURE_AVX512     mqlplus_CPU_ARCHITECTURE_AVX512
enum ENUM_CPU_ARCHITECTURE
{
    mqlplus_CPU_ARCHITECTURE_UNKNOWN    = 0,
    mqlplus_CPU_ARCHITECTURE_X64        = 1,
    mqlplus_CPU_ARCHITECTURE_AVX        = 2,
    mqlplus_CPU_ARCHITECTURE_AVX2       = 3,
    mqlplus_CPU_ARCHITECTURE_AVX512     = 4
};


#define ENUM_EX5_ARCHITECTURE   mqlplus_ENUM_EX5_ARCHITECTURE
#define EX5_ARCHITECTURE_UNKNOWN    mqlplus_EX5_ARCHITECTURE_UNKNOWN
#define EX5_ARCHITECTURE_X64        mqlplus_EX5_ARCHITECTURE_X64
#define EX5_ARCHITECTURE_AVX        mqlplus_EX5_ARCHITECTURE_AVX
#define EX5_ARCHITECTURE_AVX2       mqlplus_EX5_ARCHITECTURE_AVX2
#define EX5_ARCHITECTURE_AVX512     mqlplus_EX5_ARCHITECTURE_AVX512
enum ENUM_EX5_ARCHITECTURE
{
    mqlplus_EX5_ARCHITECTURE_UNKNOWN    = 0,
    mqlplus_EX5_ARCHITECTURE_X64        = 1,
    mqlplus_EX5_ARCHITECTURE_AVX        = 2,
    mqlplus_EX5_ARCHITECTURE_AVX2       = 3,
    mqlplus_EX5_ARCHITECTURE_AVX512     = 4
};


struct mqlplus_cpu_architecture
{
    public:
    // Local storage

        const ENUM_CPU_TYPE         cpu_type;
        const ENUM_CPU_ARCHITECTURE cpu_architecture;
        const ENUM_EX5_ARCHITECTURE program_arch;


    public:
    // Constructor    

        mqlplus_cpu_architecture() :
            cpu_type            (__cpu_id()),
            cpu_architecture    (__cpu_arch()),
            program_arch        (__program_arch())
        { };


    // Operators

        const bool operator==(const ENUM_CPU_TYPE p_in) const           { return(cpu_type == p_in); };
        const bool operator!=(const ENUM_CPU_TYPE p_in) const           { return(cpu_type != p_in); };
        const bool operator>=(const ENUM_CPU_TYPE p_in) const           { return(cpu_type >= p_in); };
        const bool operator<=(const ENUM_CPU_TYPE p_in) const           { return(cpu_type <= p_in); };
        const bool operator> (const ENUM_CPU_TYPE p_in) const           { return(cpu_type >  p_in); };
        const bool operator< (const ENUM_CPU_TYPE p_in) const           { return(cpu_type <  p_in); };

        const bool operator==(const ENUM_CPU_ARCHITECTURE p_in) const   { return(cpu_architecture == p_in); };
        const bool operator!=(const ENUM_CPU_ARCHITECTURE p_in) const   { return(cpu_architecture != p_in); };
        const bool operator>=(const ENUM_CPU_ARCHITECTURE p_in) const   { return(cpu_architecture >= p_in); };
        const bool operator<=(const ENUM_CPU_ARCHITECTURE p_in) const   { return(cpu_architecture <= p_in); };
        const bool operator> (const ENUM_CPU_ARCHITECTURE p_in) const   { return(cpu_architecture >  p_in); };
        const bool operator< (const ENUM_CPU_ARCHITECTURE p_in) const   { return(cpu_architecture <  p_in); };

        const bool operator==(const ENUM_EX5_ARCHITECTURE p_in) const   { return(program_arch == p_in); };
        const bool operator!=(const ENUM_EX5_ARCHITECTURE p_in) const   { return(program_arch != p_in); };
        const bool operator>=(const ENUM_EX5_ARCHITECTURE p_in) const   { return(program_arch >= p_in); };
        const bool operator<=(const ENUM_EX5_ARCHITECTURE p_in) const   { return(program_arch <= p_in); };
        const bool operator> (const ENUM_EX5_ARCHITECTURE p_in) const   { return(program_arch >  p_in); };
        const bool operator< (const ENUM_EX5_ARCHITECTURE p_in) const   { return(program_arch <  p_in); };


    private:
    // Local helper functions

        const ENUM_CPU_TYPE __cpu_id()
        {
            const string cpu_name = TerminalInfoString(TERMINAL_CPU_NAME);
            
            if(StringFind(cpu_name, "Intel") != -1)
            { return(mqlplus_CPU_TYPE_INTEL); }
    
            if(StringFind(cpu_name, "AMD") != -1)
            { return(mqlplus_CPU_TYPE_AMD); }
    
            return(mqlplus_CPU_TYPE_UNKNOWN);
        };
    
        const ENUM_CPU_ARCHITECTURE __cpu_arch()
        {
            const string cpu_arch = TerminalInfoString(TERMINAL_CPU_ARCHITECTURE);
    
            if(StringFind(cpu_arch, "AVX512") != -1)
            { return(mqlplus_CPU_ARCHITECTURE_AVX512); }
    
            if(StringFind(cpu_arch, "AVX2") != -1)
            { return(mqlplus_CPU_ARCHITECTURE_AVX2); }
    
            if(StringFind(cpu_arch, "AVX") != -1)
            { return(mqlplus_CPU_ARCHITECTURE_AVX); }
    
            if(StringFind(cpu_arch, "Regular") != -1)
            { return(mqlplus_CPU_ARCHITECTURE_X64); }
    
            return(mqlplus_CPU_ARCHITECTURE_UNKNOWN);
        };
    
        const ENUM_EX5_ARCHITECTURE __program_arch()
        {
            if(StringFind(__CPU_ARCHITECTURE__, "AVX512") != -1)
            { return(mqlplus_EX5_ARCHITECTURE_AVX512); }
    
            if(StringFind(__CPU_ARCHITECTURE__, "AVX2") != -1)
            { return(mqlplus_EX5_ARCHITECTURE_AVX2); }
    
            if(StringFind(__CPU_ARCHITECTURE__, "AVX") != -1)
            { return(mqlplus_EX5_ARCHITECTURE_AVX); }
    
            if(StringFind(__CPU_ARCHITECTURE__, "Regular") != -1)
            { return(mqlplus_EX5_ARCHITECTURE_X64); }
    
            return(mqlplus_EX5_ARCHITECTURE_UNKNOWN);
        };
} __cpu_architecture;
#define CPU_TYPE            __cpu_architecture.cpu_type
#define CPU_ARCHITECTURE    int(__cpu_architecture.cpu_architecture)
#define EX5_ARCHITECTURE    int(__cpu_architecture.program_arch)


//+------------------------------------------------------------------+
//| Expert initialization function                                   |
//+------------------------------------------------------------------+
int OnInit()
{
    // CPU instruction set support
    printf("\nCPU: %s; SIMD support: %s\nEX5 compiled for: %s", TerminalInfoString(TERMINAL_CPU_NAME), TerminalInfoString(TERMINAL_CPU_ARCHITECTURE), __CPU_ARCHITECTURE__);

    switch(CPU_TYPE)
    {
        case CPU_TYPE_INTEL:            printf("Intel CPU"); break;
        case CPU_TYPE_AMD:              printf("AMD CPU"); break;
        case CPU_TYPE_UNKNOWN:          printf("Unknown CPU"); break;
    }

    switch(CPU_ARCHITECTURE)
    {
        case CPU_ARCHITECTURE_X64:      printf("CPU x64 Regular"); break;
        case CPU_ARCHITECTURE_AVX:      printf("CPU AVX Support"); break;
        case CPU_ARCHITECTURE_AVX2:     printf("CPU AVX2 Support"); break;
        case CPU_ARCHITECTURE_AVX512:   printf("CPU AVX512 Support"); break;
        case CPU_ARCHITECTURE_UNKNOWN:  printf("CPU Unsupported"); break;
    }

    switch(EX5_ARCHITECTURE)
    {
        case EX5_ARCHITECTURE_X64:      printf("EX5 x64 Regular"); break;
        case EX5_ARCHITECTURE_AVX:      printf("EX5 AVX Support"); break;
        case EX5_ARCHITECTURE_AVX2:     printf("EX5 AVX2 Support"); break;
        case EX5_ARCHITECTURE_AVX512:   printf("EX5 AVX512 Support"); break;
        case EX5_ARCHITECTURE_UNKNOWN:  printf("EX5 Unsupported"); break;
    }

    if(CPU_ARCHITECTURE > EX5_ARCHITECTURE)
    { printf("%s", "This CPU supports more than the EX5 binary has been compiled for."); }

    // Return
    return(INIT_SUCCEEDED);
}
 
Thank you for the links, I checked them.

The only thing to be found there is the confirmation that the MQL5 vector data type is the one to use, as I assumed.

I will try to link my post to the Russian thread and see if I get any meaningful response.