Understand and Efficiently use OpenCL API by Recreating built-in support as DLL on Linux (Part 2): OpenCL Simple DLL implementation


MetaTrader 5 / Examples | 6 April 2023, 15:57
Wasin Thonkaew

Introduction

In this part, we take the work from Part 1 of the series, where we built a successful standalone OpenCL test, and abstract it into a DLL that an MQL5 program can use on MetaTrader 5.

Together, this prepares us for developing full-fledged OpenCL support as a DLL in the next part.


Key Points

My articles can be long enough that readers lose track along the way, so from now on I will include a Key Points section highlighting the points most worth paying attention to.

The following are the key points readers will take away from this article

  1. How to properly pass a string from a DLL to an MQL5 program. Notice that we need to make sure the encoding is UTF-16, as MetaTrader 5 uses it for printing via Print().
  2. How to build a DLL that is usable by an MQL5 program on MetaTrader 5.
  3. How to use the key APIs offered by the OpenCL C++ API, mainly for getting platform/device information and for executing a kernel function, from initialization until getting the result back.


Simple DLL Implementation

What we need to do is abstract the code from the previous article into a simple, proper library built as a DLL that we can consume later from MQL5.

The project file structure is as follows

File - Definition
  • Makefile - Cross-compilation for both Windows and Linux via Mingw64. It automatically copies the resultant openclsimple.dll into the Libraries/ directory, which MetaTrader 5 uses as its DLL search path.
  • Makefile-g++ - Native Linux compilation for testing purposes.
  • openclsimple.h & openclsimple.cpp - Main header and implementation of the openclsimple DLL library.
  • util.h & util.cpp - Part of the openclsimple library. It provides utility functions, notably string conversion, for the library.
  • main.cpp - Cross-platform main testing program.
  • mql5/OpenCLSimple.mqh & mql5/TestCLSimple.mq5 - MQL5 header and script testing program for MetaTrader 5.


Our DLL will expose the following 2 functions, which we will implement

  1. clsimple_listall(char* out, int len, bool utf16=true)
    Lists all platforms and devices along with notable information

  2. clsimple_compute(const int arr_1[], const int arr_2[], int arr_3[], int num_elem)
    Computes the element-wise sum of two input arrays and writes it into the specified output array


Let's start by implementing a header file.
As usual, we will show the full source code first, then go through it chunk by chunk.

openclsimple.h

#pragma once

#ifdef WINDOWS
        #ifdef CLSIMPLE_API_EXPORT
                #define CLSIMPLE_API __declspec(dllexport)
        #else
                #define CLSIMPLE_API __declspec(dllimport)
        #endif
#else
        #define CLSIMPLE_API
#endif

/**
 * We didn't define CL_HPP_ENABLE_EXCEPTIONS thus there would be no exceptions thrown
 * from any OpenCL related API.
 */
extern "C" {
        /**
         * List all platforms, and devices available.
         * If any error occurs during the operation of this function, it will
         * print the error to standard error. The resultant text output is still maintained
         * separately.
         *
         * # Arguments
         * - out - output c-string to be filled
         * - len - length of output c-string to be filled
         * - utf16 - whether or not to convert string to UTF-16 encoding. Default is true.
         *                       If used on MetaTrader 5, this flag should be set to true.
         */
        CLSIMPLE_API void clsimple_listall(char* out, int len, bool utf16=true) noexcept;

        /**
         * Compute a summation of two input arrays then output into 3rd array limiting
         * by the number of elements specified.
         *
         * # Arguments
         * - arr_1 - first read-only array input holding integers
         * - arr_2 - second read-only array input holding integers
         * - arr_3 - output integer array to be filled with result of summation of both arr_1 and arr_2
         * - num_elem - number of element to be processed for both arr_1 and arr_2
         *
         * # Return
         * Returned code for result of operation. 0 means success, otherwise means failure.
         */
        CLSIMPLE_API [[nodiscard]] int clsimple_compute(const int arr_1[], const int arr_2[], int arr_3[], int num_elem) noexcept;
};


The #ifdef section should be familiar to readers by now, as it is required on Windows to export functions from a DLL. The WINDOWS and CLSIMPLE_API_EXPORT definitions play a key role in explicitly exporting each function. This will become clear when we look at the Makefile later.

The extern "C" section wraps the public API, i.e. the functions that can be called by a consuming program. It gives those functions C linkage, so their names are exported without C++ name mangling.
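As a minimal sketch of the same pattern in isolation, the following uses a hypothetical DEMO_API macro, a hypothetical DEMO_API_EXPORT definition, and a hypothetical demo_sum() function (none of them are part of openclsimple). It also assumes the compiler-predefined _WIN32 macro instead of the WINDOWS definition that our Makefile supplies. On a non-Windows build the macro expands to nothing, so the snippet compiles as ordinary C++.

```cpp
// Hypothetical names for illustration only: DEMO_API, DEMO_API_EXPORT, demo_sum.
#ifdef _WIN32
    #ifdef DEMO_API_EXPORT
        #define DEMO_API __declspec(dllexport)   // defined while building the DLL itself
    #else
        #define DEMO_API __declspec(dllimport)   // seen by programs consuming the DLL
    #endif
#else
    #define DEMO_API                             // no decoration needed elsewhere
#endif

// extern "C" gives the function C linkage: the symbol is exported as plain
// "demo_sum" rather than a mangled C++ name, so it can be located by name.
extern "C" {
    DEMO_API int demo_sum(int a, int b) noexcept;
}

// In a real library this definition would live in the .cpp file, compiled
// with DEMO_API_EXPORT defined on Windows.
int demo_sum(int a, int b) noexcept {
    return a + b;
}
```

The header is shared by both sides: the library build defines the export macro, while a consumer includes the very same header without it.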


openclsimple.cpp

#include "openclsimple.h"
#include "util.h"

#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_MINIMUM_OPENCL_VERSION 120

#include <CL/cl2.hpp>

#include <iostream>
#include <vector>
#include <sstream>

#ifdef ENABLE_DEBUG
#include <cstdarg>
#endif

#ifdef ENABLE_DEBUG
const int LOG_BUFFER_SIZE = 2048;
char log_buffer[LOG_BUFFER_SIZE];

inline void DLOG(const char* ctx, const char* format, ...) {
        va_list args;
        va_start(args, format);
        std::vsnprintf(log_buffer, LOG_BUFFER_SIZE-1, format, args);
        va_end(args);

        std::cerr << "[DEBUG] [" << ctx << "] " << log_buffer << std::endl;
}
#else
        #define DLOG(...)
#endif

CLSIMPLE_API void clsimple_listall(char* out, int len, bool utf16) noexcept {
        // Get the platform
        std::vector<cl::Platform> platforms;
        int ret_code = cl::Platform::get(&platforms);
        if (ret_code != CL_SUCCESS) {
                std::cerr << "Error cl::Platform::get(), code=" << ret_code << std::endl;
                return;
        }

        std::stringstream output_str;

        for (size_t i=0; i<platforms.size(); ++i) {
                auto& p = platforms[i];

                std::string tmp_str;

                ret_code = p.getInfo(CL_PLATFORM_NAME, &tmp_str);
                if (ret_code != CL_SUCCESS)
                        std::cerr << "Error cl::Platform::getInfo(), code=" << ret_code << std::endl;
                else
                        output_str << "[" << i << "] Platform: " << tmp_str << std::endl;
                
                ret_code = p.getInfo(CL_PLATFORM_VENDOR, &tmp_str);
                if (ret_code != CL_SUCCESS)
                        std::cerr << "Error cl::Platform::getInfo(), code=" << ret_code << std::endl;
                else
                        output_str << "Vendor: " << tmp_str << std::endl;

                std::vector<cl::Device> devices;
                ret_code = p.getDevices(CL_DEVICE_TYPE_ALL, &devices);
                if (ret_code != CL_SUCCESS) {
                        std::cerr << "Error cl::Platform::getDevices(), code=" << ret_code << std::endl;
                        continue;
                }

                for (size_t j=0; j<devices.size(); ++j) {
                        const auto& d = devices[j];
                        cl_device_type tmp_device_type;

                        ret_code = d.getInfo(CL_DEVICE_NAME, &tmp_str);
                        if (ret_code != CL_SUCCESS)
                                std::cerr << "Error cl::Device::getInfo(), code=" << ret_code << std::endl;
                        else
                                output_str << " -[" << j << "] Device name: " << tmp_str << std::endl;

                        ret_code = d.getInfo(CL_DEVICE_TYPE, &tmp_device_type);
                        if (ret_code != CL_SUCCESS)
                                std::cerr << "Error cl::Device::getInfo(), code=" << ret_code << std::endl;
                        else {
                                if (tmp_device_type & CL_DEVICE_TYPE_GPU)
                                        output_str << " -Type: GPU" << std::endl;
                                else if (tmp_device_type & CL_DEVICE_TYPE_CPU)
                                        output_str << " -Type: CPU" << std::endl;
                                else if (tmp_device_type & CL_DEVICE_TYPE_ACCELERATOR)
                                        output_str << " -Type: Accelerator" << std::endl;
                                else
                                        output_str << " -Type: Unknown" << std::endl;
                        }
                }
        }

        // keep a copy of the string from stringstream
        std::string copy_str = output_str.str();
        if (utf16)
                util::str_to_cstr_u16(copy_str, out, len);
        else
                util::str_to_cstr(copy_str, out, len);
}

CLSIMPLE_API int clsimple_compute(const int arr_1[], const int arr_2[], int arr_3[], int num_elem) noexcept {
        cl_int ret_code = CL_SUCCESS;

        // Get the platform
        std::vector<cl::Platform> platforms;
        ret_code = cl::Platform::get(&platforms);
        if (ret_code != CL_SUCCESS) {
                std::cerr << "Error cl::Platform::get(), code=" << ret_code << std::endl;
                return ret_code;
        }
        DLOG(__FUNCTION__, "found %d platform(s)", static_cast<int>(platforms.size()));

        if (platforms.empty()) {
                std::cerr << "Error found 0 platform." << std::endl;
                return CL_DEVICE_NOT_FOUND;             // reuse this error value
        }

        cl::Platform platform = platforms[0];
        DLOG(__FUNCTION__, "%s", "passed getting platforms");

        // Get the device
        std::vector<cl::Device> devices;
        ret_code = platform.getDevices(CL_DEVICE_TYPE_GPU, &devices);
        if (ret_code != CL_SUCCESS) {
                std::cerr << "Error cl::Platform::getDevices(), code=" << ret_code << std::endl;
                return ret_code;
        }
        DLOG(__FUNCTION__, "found %d GPU device(s)", static_cast<int>(devices.size()));

        if (devices.empty()) {
                std::cerr << "Error found 0 device." << std::endl;
                return CL_DEVICE_NOT_FOUND;
        }
        cl::Device device = devices[0];
        DLOG(__FUNCTION__, "%s", "passed getting a GPU device");

        // Create the context
        cl::Context context(device);

        DLOG(__FUNCTION__, "%s", "passed creating a context");

        // Create the command queue
        cl::CommandQueue queue(context, device);

        DLOG(__FUNCTION__, "%s", "passed creating command queue");

        // Create the kernel
        std::string kernelCode = "__kernel void add(__global int* a, __global int* b, __global int* c, int size) { "
                                                         "              int i = get_global_id(0);"
                                                         "              if (i < size)"
                                                         "                      c[i] = a[i] + b[i];"
                                                         "}";
        cl::Program::Sources sources;
        sources.push_back({kernelCode.c_str(), kernelCode.length()});
        cl::Program program(context, sources);
        ret_code = program.build({device});
        if (ret_code != CL_SUCCESS) {
                std::cerr << "Error cl::Program::build(), code=" << ret_code << std::endl;
                return ret_code;
        }

        DLOG(__FUNCTION__, "%s", "passed building a kernel program");

        cl::Kernel kernel(program, "add");

        DLOG(__FUNCTION__, "%s", "passed adding kernel function");

        // Create buffers
        cl::Buffer buffer_a(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * num_elem, const_cast<int*>(arr_1));
        cl::Buffer buffer_b(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * num_elem, const_cast<int*>(arr_2));
        cl::Buffer buffer_c(context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * num_elem, arr_3);

        kernel.setArg(0, buffer_a);
        kernel.setArg(1, buffer_b);
        kernel.setArg(2, buffer_c);
        kernel.setArg(3, num_elem);

        DLOG(__FUNCTION__, "%s", "passed setting all arguments");

        // execute the kernel function
        // NOTE: enqueueNDRangeKernel() is asynchronous; it only enqueues the command
        // and returns. Even once the work is done, that doesn't mean the result buffer
        // has been written back to the host at the same time.
        //
        ret_code = queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(num_elem), cl::NullRange);
        if (ret_code != CL_SUCCESS) {
                std::cerr << "Error cl::CommandQueue::enqueueNDRangeKernel(), code=" << ret_code << std::endl;
                return ret_code;
        }

        // CL_TRUE to make it blocking call
        // it requires for moving data from device back to host
        // NOTE: Important to call this function to make sure the result is sent back to host.
        ret_code = queue.enqueueReadBuffer(buffer_c, CL_TRUE, 0, sizeof(int) * num_elem, arr_3);
        if (ret_code != CL_SUCCESS)
                std::cerr << "Error cl::CommandQueue::enqueueReadBuffer(), code=" << ret_code << std::endl;

        return ret_code;
}


There are 3 main parts to look at in the code above.

  1. DLOG utility for debug logging purposes
  2. clsimple_listall()
    • util.h & util.cpp - string conversion utility that makes the string ready to pass from DLL to MQL5
  3. clsimple_compute()

    DLOG

    #ifdef ENABLE_DEBUG
    #include <cstdarg>
    #endif
    
    #ifdef ENABLE_DEBUG
    const int LOG_BUFFER_SIZE = 2048;
    char log_buffer[LOG_BUFFER_SIZE];
    
    inline void DLOG(const char* ctx, const char* format, ...) {
            va_list args;
            va_start(args, format);
            std::vsnprintf(log_buffer, LOG_BUFFER_SIZE-1, format, args);
            va_end(args);
    
            std::cerr << "[DEBUG] [" << ctx << "] " << log_buffer << std::endl;
    }
    #else
            #define DLOG(...)
    #endif
    

    This is a logging utility that prints to standard error. The #ifdef guard checks whether the project is built with ENABLE_DEBUG defined. If so, we include the required header and DLOG() becomes a real, usable function. Otherwise, DLOG() expands to nothing.

    The log buffer has a fixed size of 2048 bytes and is reused on every call. We don't expect debug messages that long, so it is quite enough for us.
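The formatting step of DLOG() can be tried on its own. The sketch below is a hypothetical helper named format_debug_line() (not part of openclsimple) that builds the same "[DEBUG] [ctx] message" line but returns it as a std::string instead of printing it, and formats into a local buffer rather than the shared global one.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper for illustration: same formatting as DLOG(), but the
// line is returned to the caller and the buffer is local to each call.
std::string format_debug_line(const char* ctx, const char* format, ...) {
    char buffer[2048];

    va_list args;
    va_start(args, format);
    // vsnprintf writes at most sizeof(buffer) bytes including the null
    // terminator, so an overly long message is truncated, never overflowed.
    std::vsnprintf(buffer, sizeof(buffer), format, args);
    va_end(args);

    return std::string("[DEBUG] [") + ctx + "] " + buffer;
}
```

A local buffer costs some stack space per call, but it means two threads logging at the same time cannot overwrite each other's message, which the single global log_buffer does not guarantee.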


    clsimple_listall()

    A function to list all devices across all platforms, i.e. the devices available for use with OpenCL.

    It all starts with cl::Platform, from which we get the other information.

            ...
            // Get the platform
            std::vector<cl::Platform> platforms;
            int ret_code = cl::Platform::get(&platforms);
            if (ret_code != CL_SUCCESS) {
                    std::cerr << "Error cl::Platform::get(), code=" << ret_code << std::endl;
                    return;
            }
            ...

    We will see this pattern of error handling a lot in this function. First, we get a vector of platforms. In case of error, we return immediately as we cannot do anything further (the function returns void, so no error code is passed back to the caller).

    The success return code when working with the OpenCL API is CL_SUCCESS. Notice that in case of error, we always print an error message to standard error.
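The error messages above print only a numeric code. As a small sketch of how such codes could be made more readable, the hypothetical helper describe_cl_error() below maps a few common values to their names. The constants are mirrored by hand from the OpenCL headers so that the snippet compiles without them; real code would include the OpenCL header and use the CL_* macros directly.

```cpp
#include <string>

// Values mirrored from the OpenCL headers (assumption: kept local so this
// sketch builds without OpenCL installed).
constexpr int kClSuccess             = 0;    // CL_SUCCESS
constexpr int kClDeviceNotFound      = -1;   // CL_DEVICE_NOT_FOUND
constexpr int kClBuildProgramFailure = -11;  // CL_BUILD_PROGRAM_FAILURE
constexpr int kClInvalidValue        = -30;  // CL_INVALID_VALUE

// Hypothetical helper: translate a return code into a readable name.
std::string describe_cl_error(int code) {
    switch (code) {
        case kClSuccess:             return "CL_SUCCESS";
        case kClDeviceNotFound:      return "CL_DEVICE_NOT_FOUND";
        case kClBuildProgramFailure: return "CL_BUILD_PROGRAM_FAILURE";
        case kClInvalidValue:        return "CL_INVALID_VALUE";
        default:                     return "unknown error (" + std::to_string(code) + ")";
    }
}
```

Only four codes are covered here; the default branch keeps the raw number so that nothing is lost for codes the table does not know.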

    Iterate over all platforms, and over the devices of each, to get the information we need

            ...
            std::stringstream output_str;
    
            for (size_t i=0; i<platforms.size(); ++i) {
                    ...
    
                    for (size_t j=0; j<devices.size(); ++j) {
                            ...
                    }
            }
            ...

    This function writes its string output to the specified c-string pointer. We build the result in a std::stringstream to avoid creating a temporary std::string and copying it every time we append to the current result.

    cl::Platform is the starting point for getting other information, and each platform contains one or more cl::Device. So we use a nested for-loop to do our work.
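The shape of that nested loop can be sketched without an OpenCL runtime. FakePlatform, FakeDevice and format_listing() below are hypothetical stand-ins, not part of the library; they only mimic the platform-contains-devices relationship and the layout of the listing text.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for cl::Device / cl::Platform.
struct FakeDevice   { std::string name; };
struct FakePlatform { std::string name; std::vector<FakeDevice> devices; };

// Append every line to one stream, then materialize the string once at the end.
std::string format_listing(const std::vector<FakePlatform>& platforms) {
    std::ostringstream out;
    for (std::size_t i = 0; i < platforms.size(); ++i) {
        out << "[" << i << "] Platform: " << platforms[i].name << '\n';

        const std::vector<FakeDevice>& devices = platforms[i].devices;
        for (std::size_t j = 0; j < devices.size(); ++j)
            out << " -[" << j << "] Device name: " << devices[j].name << '\n';
    }
    return out.str();
}
```

The sketch uses '\n' rather than std::endl since there is nothing to flush when writing into an in-memory stream.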

    Inside the loop

            ...
            for (size_t i=0; i<platforms.size(); ++i) {
                    auto& p = platforms[i];
    
                    // temporary variable to hold temporary platform/device information
                    std::string tmp_str;
    
                    ret_code = p.getInfo(CL_PLATFORM_NAME, &tmp_str);
                    if (ret_code != CL_SUCCESS)
                            std::cerr << "Error cl::Platform::getInfo(), code=" << ret_code << std::endl;
                    else
                            output_str << "[" << i << "] Platform: " << tmp_str << std::endl;
                    
                    ret_code = p.getInfo(CL_PLATFORM_VENDOR, &tmp_str);
                    if (ret_code != CL_SUCCESS)
                            std::cerr << "Error cl::Platform::getInfo(), code=" << ret_code << std::endl;
                    else
                            output_str << "Vendor: " << tmp_str << std::endl;
    
                    std::vector<cl::Device> devices;
                    ret_code = p.getDevices(CL_DEVICE_TYPE_ALL, &devices);
                    if (ret_code != CL_SUCCESS) {
                            std::cerr << "Error cl::Platform::getDevices(), code=" << ret_code << std::endl;
                            continue;
                    }
    
                    for (size_t j=0; j<devices.size(); ++j) {
                            cl_device_type tmp_device_type;
                            const auto& d = devices[j];
    
                            ret_code = d.getInfo(CL_DEVICE_NAME, &tmp_str);
                            if (ret_code != CL_SUCCESS)
                                    std::cerr << "Error cl::Device::getInfo(), code=" << ret_code << std::endl;
                            else
                                    output_str << " -[" << j << "] Device name: " << tmp_str << std::endl;
    
                            ret_code = d.getInfo(CL_DEVICE_TYPE, &tmp_device_type);
                            if (ret_code != CL_SUCCESS)
                                    std::cerr << "Error cl::Device::getInfo(), code=" << ret_code << std::endl;
                            else {
                                    if (tmp_device_type & CL_DEVICE_TYPE_GPU)
                                            output_str << " -Type: GPU" << std::endl;
                                    else if (tmp_device_type & CL_DEVICE_TYPE_CPU)
                                            output_str << " -Type: CPU" << std::endl;
                                    else if (tmp_device_type & CL_DEVICE_TYPE_ACCELERATOR)
                                            output_str << " -Type: Accelerator" << std::endl;
                                    else
                                            output_str << " -Type: Unknown" << std::endl;
                            }
                    }
            }
            ...

    Inside the loop, we have the temporary variables tmp_str and tmp_device_type to hold information acquired from the platform or device.
    Along the way, if something goes wrong, we print the error to standard error; otherwise we append the resulting string to output_str.

    Notice that we also print the ordinal index of both platform and device. This can be useful if we want users to choose a specific platform and device to work with, so that we don't have to find the right device every single time. This is optional, and serves as an idea for future expansion of the library.

    The information we retrieve, which is enough to decide later which device to use with OpenCL, is as follows

    • CL_PLATFORM_NAME - name of the platform
    • CL_PLATFORM_VENDOR - name of the vendor e.g. AMD, Nvidia, the pocl project, etc
    • CL_DEVICE_NAME - device name e.g. code name of GPU, or name of CPU
    • CL_DEVICE_TYPE - device type e.g. GPU, CPU
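Note that CL_DEVICE_TYPE is a bitfield, which is why the code tests it with & rather than ==. The sketch below shows that decoding step in isolation. The bit values are mirrored by hand from the OpenCL header so the snippet builds without it, and device_type_name() is a hypothetical helper, not part of the library.

```cpp
#include <cstdint>
#include <string>

// Bit values mirrored from the OpenCL header (assumption: kept local so the
// sketch compiles without OpenCL installed).
constexpr std::uint64_t kTypeCpu         = 1u << 1;  // CL_DEVICE_TYPE_CPU
constexpr std::uint64_t kTypeGpu         = 1u << 2;  // CL_DEVICE_TYPE_GPU
constexpr std::uint64_t kTypeAccelerator = 1u << 3;  // CL_DEVICE_TYPE_ACCELERATOR

// Hypothetical helper: same priority order as in clsimple_listall().
std::string device_type_name(std::uint64_t type) {
    if (type & kTypeGpu)         return "GPU";
    if (type & kTypeCpu)         return "CPU";
    if (type & kTypeAccelerator) return "Accelerator";
    return "Unknown";
}
```

Because several bits may be set at once, the order of the checks decides which label wins; here GPU takes priority, as in the listing function.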

    There is a whole bunch of information relating to platforms and devices. Developers can take a peek at the header file CL/cl.h in your system's include path, e.g. /usr/include/. An excerpt follows


    Platform information

    ...
    /* cl_platform_info */
    #define CL_PLATFORM_PROFILE                         0x0900
    #define CL_PLATFORM_VERSION                         0x0901
    #define CL_PLATFORM_NAME                            0x0902
    #define CL_PLATFORM_VENDOR                          0x0903
    #define CL_PLATFORM_EXTENSIONS                      0x0904
    #ifdef CL_VERSION_2_1
    #define CL_PLATFORM_HOST_TIMER_RESOLUTION           0x0905
    #endif
    ...

     

    Device information

    ...
    /* cl_device_info */
    #define CL_DEVICE_TYPE                                   0x1000
    #define CL_DEVICE_VENDOR_ID                              0x1001
    #define CL_DEVICE_MAX_COMPUTE_UNITS                      0x1002
    #define CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS               0x1003
    #define CL_DEVICE_MAX_WORK_GROUP_SIZE                    0x1004
    #define CL_DEVICE_MAX_WORK_ITEM_SIZES                    0x1005
    #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR            0x1006
    #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT           0x1007
    #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT             0x1008
    #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG            0x1009
    #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT           0x100A
    #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE          0x100B
    #define CL_DEVICE_MAX_CLOCK_FREQUENCY                    0x100C
    #define CL_DEVICE_ADDRESS_BITS                           0x100D
    #define CL_DEVICE_MAX_READ_IMAGE_ARGS                    0x100E
    #define CL_DEVICE_MAX_WRITE_IMAGE_ARGS                   0x100F
    #define CL_DEVICE_MAX_MEM_ALLOC_SIZE                     0x1010
    #define CL_DEVICE_IMAGE2D_MAX_WIDTH                      0x1011
    #define CL_DEVICE_IMAGE2D_MAX_HEIGHT                     0x1012
    #define CL_DEVICE_IMAGE3D_MAX_WIDTH                      0x1013
    #define CL_DEVICE_IMAGE3D_MAX_HEIGHT                     0x1014
    #define CL_DEVICE_IMAGE3D_MAX_DEPTH                      0x1015
    #define CL_DEVICE_IMAGE_SUPPORT                          0x1016
    #define CL_DEVICE_MAX_PARAMETER_SIZE                     0x1017
    #define CL_DEVICE_MAX_SAMPLERS                           0x1018
    #define CL_DEVICE_MEM_BASE_ADDR_ALIGN                    0x1019
    #define CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE               0x101A
    #define CL_DEVICE_SINGLE_FP_CONFIG                       0x101B
    #define CL_DEVICE_GLOBAL_MEM_CACHE_TYPE                  0x101C
    #define CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE              0x101D
    #define CL_DEVICE_GLOBAL_MEM_CACHE_SIZE                  0x101E
    #define CL_DEVICE_GLOBAL_MEM_SIZE                        0x101F
    #define CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE               0x1020
    #define CL_DEVICE_MAX_CONSTANT_ARGS                      0x1021
    #define CL_DEVICE_LOCAL_MEM_TYPE                         0x1022
    #define CL_DEVICE_LOCAL_MEM_SIZE                         0x1023
    #define CL_DEVICE_ERROR_CORRECTION_SUPPORT               0x1024
    #define CL_DEVICE_PROFILING_TIMER_RESOLUTION             0x1025
    #define CL_DEVICE_ENDIAN_LITTLE                          0x1026
    #define CL_DEVICE_AVAILABLE                              0x1027
    #define CL_DEVICE_COMPILER_AVAILABLE                     0x1028
    #define CL_DEVICE_EXECUTION_CAPABILITIES                 0x1029
    #define CL_DEVICE_QUEUE_PROPERTIES                       0x102A    /* deprecated */
    #ifdef CL_VERSION_2_0
    #define CL_DEVICE_QUEUE_ON_HOST_PROPERTIES               0x102A
    #endif
    #define CL_DEVICE_NAME                                   0x102B
    #define CL_DEVICE_VENDOR                                 0x102C
    #define CL_DRIVER_VERSION                                0x102D
    #define CL_DEVICE_PROFILE                                0x102E
    #define CL_DEVICE_VERSION                                0x102F
    #define CL_DEVICE_EXTENSIONS                             0x1030
    #define CL_DEVICE_PLATFORM                               0x1031
    #ifdef CL_VERSION_1_2
    #define CL_DEVICE_DOUBLE_FP_CONFIG                       0x1032
    #endif
    ...

    We just need to know which information is useful for our use case.


    Conversion of string into UTF-16 to be consumed by MetaTrader 5

            ...
            // keep a copy of the string from stringstream
            std::string copy_str = output_str.str();
            if (utf16)
                    util::str_to_cstr_u16(copy_str, out, len);
            else
                    util::str_to_cstr(copy_str, out, len);

    Lastly, the string output from output_str needs to be converted into UTF-16 encoding because MetaTrader 5 uses it to display text on its Experts tab.


    Now is a good time to see how util::str_to_cstr and util::str_to_cstr_u16 are implemented.
    Note that util.h & util.cpp are never meant to be exposed to or used by users of the DLL. They are used internally within the library only. Thus they have no need to be C-compliant, i.e. wrapped in extern "C", as is required for the exported functions that users consume from MetaTrader 5.


    util.h

    #pragma once
    
    #include <string>
    
    namespace util {
            /**
             * Convert via copying from std::string to C-string.
             *
             * # Arguments
             * - str - input string
             * - out - destination c-string pointer to copy the content of string to
             * - len - capacity of the destination buffer, including the null terminator
             */
            void str_to_cstr(const std::string& str, char* out, unsigned len);
    
            /**
             * Convert via copying from std::string to UTF-16 string.
             *
             * # Arguments
             * - str - input string
             * - out - destination c-string pointer to copy the content of converted string
             *                 of UTF-16 to
             * - len - capacity of the destination buffer in 2-byte characters, including the null terminator
             */
            void str_to_cstr_u16(const std::string& str, char* out, unsigned len);
    };
    

    These functions do the conversion if needed, then copy the result to the destination c-string pointer, whose buffer length is the one specified.


    util.cpp

    #include "util.h"
    
    #include <cuchar>
    #include <locale>
    #include <codecvt>
    #include <cstring>
    
    namespace util {
            /* converter of byte character to UTF-16 (2 bytes) */
            std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> ch_converter;
    
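            void str_to_cstr(const std::string& str, char* out, unsigned len) {
                    if (out == nullptr || len == 0)
                            return;

                    const char* str_cstr = str.c_str();
                    // cap at len-1 characters to leave room for the null terminator
                    const size_t capped_len = strlen(str_cstr) <= (len-1) ? strlen(str_cstr) : (len-1);
                    std::memcpy(out, str_cstr, capped_len);
                    // terminate explicitly so that a truncated copy is still a valid c-string
                    out[capped_len] = '\0';
            }
    
            void str_to_cstr_u16(const std::string& str, char* out, unsigned len) {
                    if (out == nullptr || len == 0)
                            return;

                    std::u16string u16_str = ch_converter.from_bytes(str);
                    const char16_t* u16_str_cstr = u16_str.c_str();
                    
                    // len is taken to count 2-byte characters here; cap by the length of the converted string
                    const size_t capped_len = u16_str.length() <= (len-1) ? u16_str.length() : (len-1);
                    std::memcpy(out, u16_str_cstr, capped_len*2);
                    // UTF-16 needs a 2-byte null terminator
                    out[capped_len*2] = '\0';
                    out[capped_len*2+1] = '\0';
            }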
    };
    

    The capped_len lines cap the number of characters to copy. If the string fits within the specified len, we just use the length of the string. Otherwise, we use len-1.
    We subtract 1 to leave room for the null terminator at the end of the output buffer.

    In str_to_cstr_u16(), the std::memcpy() size is multiplied by 2 because each UTF-16 character takes 2 bytes, double the single byte that each ASCII character takes in UTF-8, the normal encoding used by a C++ program here.

    The ch_converter line constructs a converter for us which will convert

    • from UTF-8 to UTF-16 via std::wstring_convert::from_bytes() function
    • from UTF-16 to UTF-8 via std::wstring_convert::to_bytes() function

    The two template arguments for std::wstring_convert<_Codecvt, _Elem> can be described as follows

    • _Codecvt - the conversion facet that performs the conversion, here std::codecvt_utf8<char16_t>, which reads and writes UTF-8 on the byte side
    • _Elem - the character type of the wide (converted) string, here char16_t

    Finally, we use std::memcpy() to copy the bytes of the converted string to the destination c-string pointer.
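To make the size relationship between the two encodings concrete, here is a hand-rolled UTF-8 to UTF-16 decoder. It is a sketch swapped in for std::wstring_convert purely for illustration, and the function name utf8_to_utf16() is hypothetical. It assumes well-formed UTF-8 input and performs no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode well-formed UTF-8 into UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;    // decoded code point
        std::size_t n;  // number of bytes in this UTF-8 sequence
        if (lead < 0x80)              { cp = lead;        n = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; n = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; n = 3; }
        else                          { cp = lead & 0x07; n = 4; }

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < n && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += n;

        if (cp >= 0x10000) {
            // Outside the Basic Multilingual Plane: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

For ASCII text such as "GPU", 3 bytes of UTF-8 become 3 code units, i.e. 6 bytes of UTF-16, which matches the multiply-by-2 rule. A 2-byte UTF-8 character still becomes a single 2-byte code unit, so doubling the UTF-8 byte count is an upper bound rather than an exact size once the text is no longer pure ASCII.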


    The following is example output as tested on my machine.
    We will revisit this fully in the testing section.

    [Figure: Example of calling clsimple_listall() on MetaTrader 5, as seen from the Journal tab]


    clsimple_compute()

    Firstly, let's see the function signature.

    CLSIMPLE_API int clsimple_compute(const int arr_1[], const int arr_2[], int arr_3[], int num_elem) noexcept {
              ...
    }


    The aim is to abstract the code we wrote in the previous part of the series into a function. Fortunately, in this case, most of the code can just be moved into a single function.
    We defer a fuller abstraction of the code to the full-fledged development later. For now, we mostly test that the DLL works properly in a full loop with MQL5 on MetaTrader 5.

    Thus the code will look very similar.

    clsimple_compute function accepts the following arguments

    • arr_1 - array of read-only integer numbers
    • arr_2 - array of read-only integer numbers
    • arr_3 - output array for summation of arr_1 and arr_2
    • num_elem - number of elements to process from both input arrays

    The function returns the return code, not the result of the summation of both arrays.

    In practice, we could change the element type of the input/output arrays to float or double in order to accommodate asset prices in floating-point format. But we stick with a simple concept in this implementation.
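The calling convention described above (inputs read-only, result written into arr_3, status returned as an integer) can be sketched as a plain CPU function. reference_sum() below is a hypothetical helper, not part of openclsimple; it is templated on the element type only to illustrate how int could be swapped for float or double, and it could double as a reference for checking results.

```cpp
#include <cstddef>

// Hypothetical CPU-side reference with the same argument layout as
// clsimple_compute(): the result goes into arr_3, the return value is a status.
template <typename T>
int reference_sum(const T arr_1[], const T arr_2[], T arr_3[], int num_elem) {
    if (arr_1 == nullptr || arr_2 == nullptr || arr_3 == nullptr || num_elem < 0)
        return -1;  // non-zero signals failure (the value -1 is an arbitrary choice here)

    for (int i = 0; i < num_elem; ++i)
        arr_3[i] = arr_1[i] + arr_2[i];

    return 0;       // 0 signals success
}
```

A C-style exported function cannot be a template, so a real DLL would need one exported function per element type; the template here is only a compact way to show the idea.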


    Get the platform

            ...
            cl_int ret_code = CL_SUCCESS;
    
            // Get the platform
            std::vector<cl::Platform> platforms;
            ret_code = cl::Platform::get(&platforms);
            if (ret_code != CL_SUCCESS) {
                    std::cerr << "Error cl::Platform::get(), code=" << ret_code << std::endl;
                    return ret_code;
            }
            DLOG(__FUNCTION__, "found %d platform(s)", static_cast<int>(platforms.size()));
    
            if (platforms.empty()) {
                    std::cerr << "Error found 0 platform." << std::endl;
                    return CL_DEVICE_NOT_FOUND;             // reuse this error value
            }
    
            cl::Platform platform = platforms[0];
            DLOG(__FUNCTION__, "%s", "passed getting platforms");
            ...
    


    What has improved over the previous part of the series is more extensive error handling. We print an error message along with returning the error code.
    The lines with DLOG() can be ignored, but they are useful when we build with ENABLE_DEBUG for debugging purposes.

    In this case, we hard-code the use of the first platform. But we could change the function to accept the ordinal index of the platform to use, based on the listing output from the first function, clsimple_listall().
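If the function did accept such an index, it would have to be validated before use. A minimal sketch of that check follows, using a hypothetical generic helper select_by_index() that is not part of the library and works on any std::vector, so it applies equally to a vector of platforms or of devices.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: return a pointer to the item at a caller-supplied
// ordinal index, or nullptr when the index is out of range.
template <typename T>
const T* select_by_index(const std::vector<T>& items, int index) {
    if (index < 0 || static_cast<std::size_t>(index) >= items.size())
        return nullptr;
    return &items[static_cast<std::size_t>(index)];
}
```

Returning nullptr lets the caller report an error code, in the same spirit as the CL_DEVICE_NOT_FOUND value reused above, instead of indexing past the end of the vector.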

    Get the device

            ...
            // Get the device
            std::vector<cl::Device> devices;
            ret_code = platform.getDevices(CL_DEVICE_TYPE_GPU, &devices);
            if (ret_code != CL_SUCCESS) {
                    std::cerr << "Error cl::Platform::getDevices(), code=" << ret_code << std::endl;
                    return ret_code;
            }
            DLOG(__FUNCTION__, "found %d GPU device(s)", static_cast<int>(devices.size()));
    
            if (devices.empty()) {
                    std::cerr << "Error found 0 device." << std::endl;
                    return CL_DEVICE_NOT_FOUND;
            }
            cl::Device device = devices[0];
            DLOG(__FUNCTION__, "%s", "passed getting a GPU device");
            ...


    In this case, we look only for GPU devices on that platform, and use the first one found.


    Create the context

            ...
            // Create the context
            cl::Context context(device);
    
            DLOG(__FUNCTION__, "%s", "passed creating a context");
            ...
    


    Create the command queue

            ...
            // Create the command queue
            cl::CommandQueue queue(context, device);
    
            DLOG(__FUNCTION__, "%s", "passed creating command queue");
            ...
    


    Create the kernel

            ...
            // Create the kernel
            std::string kernelCode = "__kernel void add(__global int* a, __global int* b, __global int* c, int size) { "
                                                             "              int i = get_global_id(0);"
                                                             "              if (i < size)"
                                                             "                      c[i] = a[i] + b[i];"
                                                             "}";
            cl::Program::Sources sources;
            sources.push_back({kernelCode.c_str(), kernelCode.length()});
            cl::Program program(context, sources);
            ret_code = program.build({device});
            if (ret_code != CL_SUCCESS) {
                    std::cerr << "Error cl::Program::build(), code=" << ret_code << std::endl;
                    return ret_code;
            }
    
            DLOG(__FUNCTION__, "%s", "passed building a kernel program");
    
            cl::Kernel kernel(program, "add");
    
            DLOG(__FUNCTION__, "%s", "passed adding kernel function");
            ...
    
    To create a kernel function, we first construct cl::Program::Sources from the source string, create a cl::Program from it and build it for the device, then pass the built program together with the kernel function name to cl::Kernel's constructor.


    Create buffers

            ...
            // Create buffers
            cl::Buffer buffer_a(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * num_elem, const_cast<int*>(arr_1));
            cl::Buffer buffer_b(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * num_elem, const_cast<int*>(arr_2));
            cl::Buffer buffer_c(context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * num_elem, arr_3);
            ...
    

    There are 3 buffers.

    1. buffer_a - first input array. It's read-only, and uses memory allocated on the host which the device can access as well.
    2. buffer_b - second input array. Same as 1.
    3. buffer_c - resultant array. It's write-only, and uses memory allocated on the host which the device can access as well.

    You can refer to the meaning of flags used in creating OpenCL buffer in the previous part of the series.

    Notice that for arr_1 and arr_2 we use const_cast<int*> to remove const from those pointers. The function receives them as const, which assures users that we won't modify their data.
    However, the constructor of cl::Buffer requires a non-const pointer, so we need to satisfy it. Both buffers are created with CL_MEM_READ_ONLY and the kernel only reads from them, so we trust the constructor not to modify anything.
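
    The same pattern can be shown in isolation. In the sketch below, sum_bytes() is a hypothetical stand-in for an API that takes a non-const pointer but only reads through it (the role played by the cl::Buffer constructor above). Removing const at the call boundary is fine as long as the callee never writes through the pointer; writing to an object that was originally declared const would be undefined behavior.

```cpp
#include <cstddef>

// Stand-in for an API that takes a non-const pointer but only reads from it
// (hypothetical; it plays the role of the cl::Buffer constructor here).
int sum_bytes(void* host_ptr, std::size_t size) {
    const unsigned char* p = static_cast<const unsigned char*>(host_ptr);
    int sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += p[i];
    return sum;
}

// The caller keeps its const-correct signature and removes const only at
// the point where the non-const API is called.
int checksum(const int* arr, int num_elem) {
    return sum_bytes(const_cast<int*>(arr),
                     sizeof(int) * static_cast<std::size_t>(num_elem));
}
```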


    Set argument to kernel function

            ...
            kernel.setArg(0, buffer_a);
            kernel.setArg(1, buffer_b);
            kernel.setArg(2, buffer_c);
            kernel.setArg(3, num_elem);
    
            DLOG(__FUNCTION__, "%s", "passed setting all arguments");
            ...
    

    Set arguments properly according to the kernel function signature as seen in OpenCL kernel code above.


    Execute the kernel function, and wait for result to be written back

            ...
            // execute the kernel function
            // NOTE: enqueueNDRangeKernel() only enqueues the kernel; it returns without
            // waiting for the work to be done. Even when the work is done, it doesn't mean
            // that the result buffer has been written back to the host at the same time.
            //
            ret_code = queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(num_elem), cl::NullRange);
            if (ret_code != CL_SUCCESS) {
                    std::cerr << "Error cl::CommandQueue::enqueueNDRangeKernel(), code=" << ret_code << std::endl;
                    return ret_code;
            }
    
            // CL_TRUE makes this a blocking call,
            // which is required for moving data from device back to host
            // NOTE: It is important to call this function to make sure the result is sent back to host.
            ret_code = queue.enqueueReadBuffer(buffer_c, CL_TRUE, 0, sizeof(int) * num_elem, arr_3);
            if (ret_code != CL_SUCCESS)
                    std::cerr << "Error cl::CommandQueue::enqueueReadBuffer(), code=" << ret_code << std::endl;
    
            return ret_code;
    

    cl::NDRange(num_elem) specifies the global dimension used to execute the kernel. In this case, it is the number of elements in the input arrays. We specify cl::NullRange for the local dimension to let OpenCL determine the value automatically.

    It's important to call queue.enqueueReadBuffer(), as we need to wait for the output (now stored on the device, e.g. the GPU) to be written back to the host (our machine). If we skip this, the result may not be ready to be read yet after returning from this function.
    Notice that this function call is blocking, as we specified CL_TRUE.
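
    To make the role of the global dimension and of the if (i < size) guard in the kernel more concrete, here is a CPU-only emulation of a 1-D launch. It is a sketch under stated assumptions, not how OpenCL schedules work: work-items run sequentially, and emulate_ndrange() is a hypothetical name. It also rounds the global size up to a multiple of an explicit local size, which the code above does not need to do because it passes cl::NullRange; the rounding is included only to show a situation in which some work-items fall outside the array and the guard matters.

```cpp
#include <cstddef>
#include <vector>

// CPU stand-in for the body of the "add" kernel: one call per work-item.
void add_work_item(std::size_t gid, const int* a, const int* b, int* c, int size) {
    if (gid < static_cast<std::size_t>(size))   // same guard as in the kernel
        c[gid] = a[gid] + b[gid];
}

// Emulates a 1-D launch sequentially. The global size is rounded up to a
// multiple of local_size, so trailing work-items rely on the guard above.
// Assumes b and c hold at least a.size() elements.
void emulate_ndrange(const std::vector<int>& a, const std::vector<int>& b,
                     std::vector<int>& c, std::size_t local_size) {
    if (local_size == 0)
        local_size = 1;
    const std::size_t n = a.size();
    const std::size_t global_size = (n + local_size - 1) / local_size * local_size;
    for (std::size_t gid = 0; gid < global_size; ++gid)
        add_work_item(gid, a.data(), b.data(), c.data(), static_cast<int>(n));
}
```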


    Makefile

    .PHONY: all clean openclsimple.dll main.exe
    
    COMPILER := x86_64-w64-mingw32-g++-posix
    FLAGS := -O2 -fno-rtti -std=c++17 -Wall -Wextra
    MORE_FLAGS ?=
    
    all: openclsimple.dll main.exe
            cp -afv $< ~/.mt5/drive_c/Program\ Files/MetaTrader\ 5/MQL5/Libraries/
    
    openclsimple.dll: util.o openclsimple.o
            @# check if symbolic link file to wine's opencl.dll exists, if not then create one
            test -h opencl.dll && echo "opencl.dll exists, no need to create symbolic link again" || ln -s ~/.mt5/drive_c/windows/system32/opencl.dll ./opencl.dll
            $(COMPILER) -shared $(FLAGS) $(MORE_FLAGS) -fPIC -o $@ $^ -L. -lopencl
    
    openclsimple.o: openclsimple.cpp openclsimple.h
            $(COMPILER) $(FLAGS) $(MORE_FLAGS) -DCLSIMPLE_API_EXPORT -DWINDOWS -I. -fPIC -o $@ -c $<
    
    util.o: util.cpp util.h
            $(COMPILER) $(FLAGS) $(MORE_FLAGS) -I. -fPIC -o $@ -c $<
    
    main.exe: main.cpp openclsimple.dll
            $(COMPILER) $(FLAGS) $(MORE_FLAGS) -I. -DWINDOWS -o $@ $< -L. -lopenclsimple
    
    clean:
            rm -f openclsimple.dll main.exe opencl.dll util.o openclsimple.o
    

    The line starting with @# shows a technique for not printing a comment line during the build: prefix the comment line with @.

    The test -h opencl.dll line is an improvement over the Makefile from the previous article of the series. Previously we packaged and delivered a symlink file pointing to opencl.dll inside the wine prefix where MetaTrader 5 is installed (~/.mt5). The problem was that the home directory path contains the user name, which differs from user to user. Now we create the symlink at build time if it does not exist yet, so it points to the correct path for the user's name and home directory, without the need to overwrite the path of a packaged symlink.

    The cp -afv line copies the resultant DLL file, openclsimple.dll, to the Libraries/ directory that MetaTrader 5 searches for DLLs at run-time. This saves us a ton of time during development.


    Makefile-g++

    .PHONY: all clean libopenclsimple.so main.out
    
    COMPILER := g++
    FLAGS := -O2 -fno-rtti -std=c++17 -Wall -Wextra
    MORE_FLAGS ?=
    
    all: libopenclsimple.so main.out
    
    libopenclsimple.so: util.o openclsimple.o
            @# NOTE: notice capital letters in -lOpenCL
            $(COMPILER) -shared $(FLAGS) $(MORE_FLAGS) -I. -fPIC -o $@ $^ -lOpenCL
    
    openclsimple.o: openclsimple.cpp openclsimple.h util.h
            $(COMPILER) $(FLAGS) $(MORE_FLAGS) -I. -fPIC -o $@ -c $<
    
    util.o: util.cpp util.h
            $(COMPILER) $(FLAGS) $(MORE_FLAGS) -I. -fPIC -o $@ -c $<
    
    main.out: main.cpp libopenclsimple.so
            $(COMPILER) $(FLAGS) $(MORE_FLAGS) -I. -o $@ $< -L. -lopenclsimple
    
    clean:
            rm -f libopenclsimple.so main.out util.o openclsimple.o
    

    Makefile-g++ is intended to be used natively on a Linux system for quick testing. Its content is similar, with one notable difference: we link with the OpenCL library installed on the system, whose name (-lOpenCL) differs from the one used for Windows (-lopencl).


    Testing on Linux and for Windows (via Wine)

    The build system is ready. We can now test at least natively on Linux, and for Windows (via Wine).


    Linux

    Execute the following command

    make -f Makefile-g++

    We will have the following output files

    • libopenclsimple.so
    • main.out


    We can execute the test program with the following command

    ./main.out


    We shall see output similar to the following

    [Figure: output from testing main.out on Linux, as built from Makefile-g++]

    The output is correct: I have no on-board GPU, but I do have a graphics card, and of course a CPU.


    Windows (via Wine)

    Execute the following command

    make

    We will have the following output files

    • openclsimple.dll
    • main.exe


    We can execute the test program with the following command

    WINEPREFIX=~/.mt5 wine ./main.exe


    We shall see output similar to the following

    [Figure: output from testing main.exe for Windows (via Wine)]

    We always use WINEPREFIX=~/.mt5 because that is the wine prefix where MetaTrader 5 is installed by default, so we test in the same environment that MetaTrader 5 runs in.
    The output is the same as previously tested on Linux.


    Readers can also take the output files built from Makefile and test them natively on Windows. They should work and produce a similar result. This is left as an exercise for readers.


    Testing with MetaTrader 5

    We are ready to test with MQL5 on MetaTrader 5 now.


    mql5/OpenCLSimple.mqh

    //+------------------------------------------------------------------+
    //|                                                      OpenCLX.mqh |
    //|                                          Copyright 2022, haxpor. |
    //|                                                 https://wasin.io |
    //+------------------------------------------------------------------+
    #property copyright "Copyright 2022, haxpor."
    #property link      "https://wasin.io"
    
    #import "openclsimple.dll"
    void clsimple_listall(string& out, int len);
    int clsimple_compute(const int& arr_1[], const int& arr_2[], int& arr_3[], int num_elem);
    #import
    


    Compare this with the signature of clsimple_listall() as exposed from the DLL, shown below. The function itself has 3 parameters

    CLSIMPLE_API void clsimple_listall(char* out, int len, bool utf16=true) noexcept;

    We don't need to include the utf16 argument in the .mqh file because, when used from MQL5, we always want that argument to be true: the string must be converted to UTF-16 to be printable on the Experts tab of MetaTrader 5.
    Declaring only the first two parameters is enough.
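
    To give an idea of what such a conversion involves, here is a minimal sketch that widens an ASCII string into UTF-16LE inside a caller-provided buffer. It is not the article's util.cpp: the function name ascii_to_utf16le() is hypothetical, len is assumed to be the buffer size in bytes, and anything outside ASCII is replaced by '?' rather than properly decoded.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch, not the article's util.cpp: widens an ASCII string to
// UTF-16LE code units and writes them into a byte buffer of `len` bytes,
// always leaving room for a two-byte terminator. Returns the number of
// characters written (excluding the terminator). Input is truncated if the
// buffer is too small; non-ASCII bytes are replaced by '?'.
int ascii_to_utf16le(const std::string& src, char* out, int len) {
    if (out == nullptr || len < 2)
        return 0;
    const std::size_t max_chars = static_cast<std::size_t>(len) / 2 - 1;
    const std::size_t n = src.size() < max_chars ? src.size() : max_chars;
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char ch = static_cast<unsigned char>(src[i]);
        out[2 * i]     = static_cast<char>(ch < 0x80 ? ch : '?');  // low byte
        out[2 * i + 1] = 0;                                        // high byte
    }
    out[2 * n]     = 0;  // UTF-16 null terminator, low byte
    out[2 * n + 1] = 0;  // UTF-16 null terminator, high byte
    return static_cast<int>(n);
}
```

    Each ASCII character becomes two bytes with the high byte set to zero, which is why a plain C-string copied as-is would not display correctly on the MQL5 side.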


    mql5/TestCLSimple.mq5

    //+------------------------------------------------------------------+
    //|                                                  TestOpenCLX.mq5 |
    //|                                          Copyright 2022, haxpor. |
    //|                                                 https://wasin.io |
    //+------------------------------------------------------------------+
    #property copyright "Copyright 2022, haxpor."
    #property link      "https://wasin.io"
    #property version   "1.00"
    
    #include "OpenCLSimple.mqh"
    
    #define STR_BUFFER_LEN 2048
    
    //+------------------------------------------------------------------+
    //| Script program start function                                    |
    //+------------------------------------------------------------------+
    void OnStart()
    {
            // 1: test clsimple_listall()
            // construct a string to hold resultant of platforms/devices listing
            string listall_str;
            StringInit(listall_str, STR_BUFFER_LEN, '\0');
    
            // get platforms/devices and print the result
            clsimple_listall(listall_str, STR_BUFFER_LEN);
            Print(listall_str);
    
            // 2: test clsimple_compute()
            int arr_1[];
            int arr_2[];
            int arr_3[];
    
            ArrayResize(arr_1, 10000000);
            ArrayFill(arr_1, 0, ArraySize(arr_1), 1);
    
            ArrayResize(arr_2, 10000000);
            ArrayFill(arr_2, 0, ArraySize(arr_2), 1);
    
            ArrayResize(arr_3, 10000000);
    
            uint start_time = GetTickCount();
    
            int ret_code = clsimple_compute(arr_1, arr_2, arr_3, ArraySize(arr_1));
            if (ret_code != 0) {
                    Print("Error occurs, code=", ret_code);
                    return;
            }
            Print("Elapsed time: " + (string)(GetTickCount() - start_time) + " ms");
    
            bool is_valid = true;
            for (int i=0; i<ArraySize(arr_3); ++i) {
                    if (arr_3[i] != 2) {
                            Print("Something is wrong at index=" + (string)i);
                            is_valid = false;
                    }
            }
    
            if (is_valid) {
                    Print("Passed test");
            }
    }
    

    Notice that in order to receive a string output from the DLL (which copies the C-string into the buffer we pass in), we need to define a string variable and initialize its capacity to the maximum length we support.

    Preparing the call to clsimple_compute() takes a little more effort. We need to declare the integer input arrays, fill them with proper values, and declare an integer array for the output. In a real application, we would read such input data tick by tick from the asset's price, and would only need to clean or prepare the data slightly before supplying it as arguments whenever we call clsimple_compute().

    Finally, we validate the result by checking the value of each element in the output array. If all went well, it will print out

    Passed test

    So place the .mqh file in a proper location, either next to the .mq5 file or in the Include/ directory of the MetaTrader 5 installation path, then compile the .mq5 source, and finally drag-and-drop the built program onto a chart in MetaTrader 5.
    We should see output similar to the following on the Experts tab.

    [Figure: output from testing the MQL5 program (a Script type) on MetaTrader 5, as seen on the Experts tab]
    Notice that text is shown properly thanks to our working string conversion utility.



    Download Source code

    Readers can download the source code from the zip file at the bottom of this article, or from the GitHub repository at github.com/haxpor/opencl-simple (look at the simple/ directory; standalonetest/ is for the previous part of the series).



    Continue next part...

    Part 2 of the series walked us through abstracting the previous work into a proper library implementation, and building a DLL that can be consumed by a normal C++ program on Linux and Windows (via Wine, or natively), as well as by MQL5 on MetaTrader 5.

    It also emphasized how to properly pass a string from the DLL to an MQL5 program: we need to convert it to UTF-16, the encoding MetaTrader 5 itself uses for display, at least on the Experts tab. If the string displayed on MetaTrader 5 is correct, then we know we have done things correctly.

    In the next part of the series, we will dive deeply into the OpenCL C++ API to develop full-fledged OpenCL support as a DLL to be used with MQL5.

    Along the way, we will learn what is required to work efficiently with the OpenCL API, and transfer that knowledge into developing a high-performance MQL5 program powered by OpenCL.
