Understand and Efficiently use OpenCL API by Recreating built-in support as DLL on Linux (Part 2): OpenCL Simple DLL implementation
Contents
- Introduction
- Key Points
- Simple DLL Implementation
- openclsimple.h & openclsimple.cpp
- DLOG
- clsimple_listall()
- clsimple_compute()
- util.h & util.cpp
- Makefile
- Makefile-g++
- Testing on Linux and for Windows (via Wine)
- Testing with MetaTrader 5
- mql5/OpenCLSimple.mqh
- mql5/TestCLSimple.mq5
- Download Source code
Introduction
This part walks through abstracting the work from Part 1 of the series, a successful standalone OpenCL test, into a DLL that an MQL5 program can use on MetaTrader 5.
Together, these two parts prepare us for developing full-fledged OpenCL support as a DLL in the next part.
Key Points
My articles can be long enough that readers get lost along the way, so from now on I will include a Key Points section that highlights what deserves attention.
The key points of this article are as follows
- How to properly pass a string from a DLL to an MQL5 program. The string must be encoded as UTF-16, because that is what MetaTrader 5 uses when printing via Print().
- How to build a DLL that is usable by an MQL5 program on MetaTrader 5.
- How to use the most important parts of the OpenCL C++ API: getting platform/device information, and executing a kernel function from initialization until getting the result back.
Simple DLL Implementation
What we need to do is to abstract the code we've done in the previous article into a proper simple library as DLL that we can consume later with MQL5.
The project file structure is as follows
File | Definition |
---|---|
Makefile | Cross-compilation for both Windows and Linux via Mingw64. It automatically copies the resultant openclsimple.dll into the Libraries/ directory, which MetaTrader 5 uses as its DLL search path. |
Makefile-g++ | Native Linux compilation for testing purposes. |
openclsimple.h & openclsimple.cpp | Main header and implementation of the openclsimple DLL library. |
util.h & util.cpp | Part of the openclsimple library. It provides utility functions for string conversion. |
main.cpp | Cross-platform main testing program. |
mql5/OpenCLSimple.mqh & mql5/TestCLSimple.mq5 | MQL5 header and script testing program on MetaTrader 5 |
Our DLL will expose the following 2 functions, which we will implement
- clsimple_listall(char* out, int len, bool utf16=true)
List notable information for all platforms and devices
- clsimple_compute(const int arr_1[], const int arr_2[], int arr_3[], int num_elem)
Compute the element-wise sum of two input arrays, then write it into the specified output array
Let's start by implementing a header file.
As usual, we will show the full source code first, then we go through chunk by chunk for its explanation.
openclsimple.h
#pragma once

#ifdef WINDOWS
    #ifdef CLSIMPLE_API_EXPORT
        #define CLSIMPLE_API __declspec(dllexport)
    #else
        #define CLSIMPLE_API __declspec(dllimport)
    #endif
#else
    #define CLSIMPLE_API
#endif

/**
 * We didn't define CL_HPP_ENABLE_EXCEPTIONS thus there would be no exceptions thrown
 * from any OpenCL related API.
 */
extern "C" {
    /**
     * List all platforms, and devices available.
     * If there any error occurs during the operation of this function, it will
     * print error onto standard error. The resultant text output is still maintained
     * separately.
     *
     * # Arguments
     * - out - output c-string to be filled
     * - len - length of output c-string to be filled
     * - utf16 - whether or not to convert string to UTF-16 encoding. Default is true.
     *           If used on MetaTrader 5, this flag should be set to true.
     */
    CLSIMPLE_API void clsimple_listall(char* out, int len, bool utf16=true) noexcept;

    /**
     * Compute a summation of two input arrays then output into 3rd array limiting
     * by the number of elements specified.
     *
     * # Arguments
     * - arr_1 - first read-only array input holding integers
     * - arr_2 - second read-only array input holding integers
     * - arr_3 - output integer array to be filled with result of summation of both arr_1 and arr_2
     * - num_elem - number of element to be processed for both arr_1 and arr_2
     *
     * # Return
     * Returned code for result of operation. 0 means success, otherwise means failure.
     */
    CLSIMPLE_API [[nodiscard]] int clsimple_compute(const int arr_1[], const int arr_2[], int arr_3[], int num_elem) noexcept;
};
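The #ifdef section should be familiar to readers by now, as it is required on Windows to export functions from a DLL. The WINDOWS and CLSIMPLE_API_EXPORT definitions play a key role in explicitly exporting each function: when both are defined, CLSIMPLE_API expands to __declspec(dllexport); with only WINDOWS defined, it expands to __declspec(dllimport) for consumers of the DLL. This will become clear when we look at the Makefile later.
The extern "C" section wraps the public API, giving the functions C linkage (unmangled names) so that other programs can call them.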
openclsimple.cpp
#include "openclsimple.h" #include "util.h" #define CL_HPP_TARGET_OPENCL_VERSION 120 #define CL_HPP_MINIMUM_OPENCL_VERSION 120 #include <CL/cl2.hpp> #include <iostream> #include <vector> #include <sstream> #ifdef ENABLE_DEBUG #include <cstdarg> #endif #ifdef ENABLE_DEBUG const int LOG_BUFFER_SIZE = 2048; char log_buffer[LOG_BUFFER_SIZE]; inline void DLOG(const char* ctx, const char* format, ...) { va_list args; va_start(args, format); std::vsnprintf(log_buffer, LOG_BUFFER_SIZE-1, format, args); va_end(args); std::cerr << "[DEBUG] [" << ctx << "] " << log_buffer << std::endl; } #else #define DLOG(...) #endif CLSIMPLE_API void clsimple_listall(char* out, int len, bool utf16) noexcept { // Get the platform std::vector<cl::Platform> platforms; int ret_code = cl::Platform::get(&platforms); if (ret_code != CL_SUCCESS) { std::cerr << "Error cl::Platform::get(), code=" << ret_code << std::endl; return; } std::stringstream output_str; for (size_t i=0; i<platforms.size(); ++i) { auto& p = platforms[i]; std::string tmp_str; ret_code = p.getInfo(CL_PLATFORM_NAME, &tmp_str); if (ret_code != CL_SUCCESS) std::cerr << "Error cl::Platform::getInfo(), code=" << ret_code << std::endl; else output_str << "[" << i << "] Platform: " << tmp_str << std::endl; ret_code = p.getInfo(CL_PLATFORM_VENDOR, &tmp_str); if (ret_code != CL_SUCCESS) std::cerr << "Error cl::Platform::getInfo(), code=" << ret_code << std::endl; else output_str << "Vendor: " << tmp_str << std::endl; std::vector<cl::Device> devices; ret_code = p.getDevices(CL_DEVICE_TYPE_ALL, &devices); if (ret_code != CL_SUCCESS) { std::cerr << "Error cl::Platform::getDevices(), code=" << ret_code << std::endl; continue; } for (size_t j=0; j<devices.size(); ++j) { const auto& d = devices[j]; cl_device_type tmp_device_type; ret_code = d.getInfo(CL_DEVICE_NAME, &tmp_str); if (ret_code != CL_SUCCESS) std::cerr << "Error cl::Device::getInfo(), code=" << ret_code << std::endl; else output_str << " -[" << j << "] Device name: " << tmp_str << 
std::endl; ret_code = d.getInfo(CL_DEVICE_TYPE, &tmp_device_type); if (ret_code != CL_SUCCESS) std::cerr << "Error cl::Device::getInfo(), code=" << ret_code << std::endl; else { if (tmp_device_type & CL_DEVICE_TYPE_GPU) output_str << " -Type: GPU" << std::endl; else if (tmp_device_type & CL_DEVICE_TYPE_CPU) output_str << " -Type: CPU" << std::endl; else if (tmp_device_type & CL_DEVICE_TYPE_ACCELERATOR) output_str << " -Type: Accelerator" << std::endl; else output_str << " -Type: Unknown" << std::endl; } } } // keep a copy of the string from stringstream std::string copy_str = output_str.str(); if (utf16) util::str_to_cstr_u16(copy_str, out, len); else util::str_to_cstr(copy_str, out, len); } CLSIMPLE_API int clsimple_compute(const int arr_1[], const int arr_2[], int arr_3[], int num_elem) noexcept { cl_int ret_code = CL_SUCCESS; // Get the platform std::vector<cl::Platform> platforms; ret_code = cl::Platform::get(&platforms); if (ret_code != CL_SUCCESS) { std::cerr << "Error cl::Platform::get(), code=" << ret_code << std::endl; return ret_code; } DLOG(__FUNCTION__, "found %d platform(s)", platforms.size()); if (platforms.empty()) { std::cerr << "Error found 0 platform." << std::endl; return CL_DEVICE_NOT_FOUND; // reuse this error value } cl::Platform platform = platforms[0]; DLOG(__FUNCTION__, "%s", "passed getting platforms"); // Get the device std::vector<cl::Device> devices; ret_code = platform.getDevices(CL_DEVICE_TYPE_GPU, &devices); if (ret_code != CL_SUCCESS) { std::cerr << "Error cl::Platform::getDevices(), code=" << ret_code << std::endl; return ret_code; } DLOG(__FUNCTION__, "found %d GPU device(s)", devices.size()); if (devices.empty()) { std::cerr << "Error found 0 device." 
<< std::endl; return CL_DEVICE_NOT_FOUND; } cl::Device device = devices[0]; DLOG(__FUNCTION__, "%s", "passed getting a GPU device"); // Create the context cl::Context context(device); DLOG(__FUNCTION__, "%s", "passed creating a context"); // Create the command queue cl::CommandQueue queue(context, device); DLOG(__FUNCTION__, "%s", "passed creating command queue"); // Create the kernel std::string kernelCode = "__kernel void add(__global int* a, __global int* b, __global int* c, int size) { " " int i = get_global_id(0);" " if (i < size)" " c[i] = a[i] + b[i];" "}"; cl::Program::Sources sources; sources.push_back({kernelCode.c_str(), kernelCode.length()}); cl::Program program(context, sources); ret_code = program.build({device}); if (ret_code != CL_SUCCESS) { std::cerr << "Error cl::Program::build(), code=" << ret_code << std::endl; return ret_code; } DLOG(__FUNCTION__, "%s", "passed building a kernel program"); cl::Kernel kernel(program, "add"); DLOG(__FUNCTION__, "%s", "passed adding kernel function"); // Create buffers cl::Buffer buffer_a(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * num_elem, const_cast<int*>(arr_1)); cl::Buffer buffer_b(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * num_elem, const_cast<int*>(arr_2)); cl::Buffer buffer_c(context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * num_elem, arr_3); kernel.setArg(0, buffer_a); kernel.setArg(1, buffer_b); kernel.setArg(2, buffer_c); kernel.setArg(3, num_elem); DLOG(__FUNCTION__, "%s", "passed setting all arguments"); // execute the kernel function // NOTE: this is a blocking call although enqueuing is async call but the current thread // will be blocked until he work is done. Work is done doesn't mean that the result buffer // will be written back at the same time. 
// ret_code = queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(num_elem), cl::NullRange); if (ret_code != CL_SUCCESS) { std::cerr << "Error cl::CommandQueue::enqueueNDRangeKernel(), code=" << ret_code << std::endl; return ret_code; } // CL_TRUE to make it blocking call // it requires for moving data from device back to host // NOTE: Important to call this function to make sure the result is sent back to host. ret_code = queue.enqueueReadBuffer(buffer_c, CL_TRUE, 0, sizeof(int) * num_elem, arr_3); if (ret_code != CL_SUCCESS) std::cerr << "Error cl::CommandQueue::enqueueReadBuffer(), code=" << ret_code << std::endl; return ret_code; }
There are 4 main parts to look at in the code above.
- DLOG utility for debug logging purpose
- clsimple_listall()
- util.h & util.cpp - string conversion utility making it ready to pass string from DLL to MQL5
- clsimple_compute()
DLOG
#ifdef ENABLE_DEBUG
#include <cstdarg>
#endif

#ifdef ENABLE_DEBUG
const int LOG_BUFFER_SIZE = 2048;
char log_buffer[LOG_BUFFER_SIZE];

inline void DLOG(const char* ctx, const char* format, ...) {
    va_list args;
    va_start(args, format);
    std::vsnprintf(log_buffer, LOG_BUFFER_SIZE-1, format, args);
    va_end(args);

    std::cerr << "[DEBUG] [" << ctx << "] " << log_buffer << std::endl;
}
#else
#define DLOG(...)
#endif
This is a logging utility that prints to the standard error output. The #ifdef guard checks whether the project is built with ENABLE_DEBUG defined. If so, we include the required header and DLOG() becomes a real, usable function. Otherwise, DLOG() expands to nothing.
The log buffer has a fixed size of 2048 bytes per call. We don't expect debug messages that long, so this is plenty for us.
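As a minimal sketch of the same variadic formatting technique, the variant below formats into a local buffer and returns the line instead of printing it, which avoids the shared global buffer. The name format_debug_line is hypothetical and not part of the library.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical variant of DLOG (the name format_debug_line is ours): it formats
// into a local buffer and returns the finished line instead of printing it.
std::string format_debug_line(const char* ctx, const char* format, ...) {
    char buffer[2048] = {0};  // same fixed size as the article's log_buffer

    va_list args;
    va_start(args, format);
    // vsnprintf writes at most sizeof(buffer) bytes including the terminator,
    // so over-long messages are truncated rather than overflowing the buffer.
    std::vsnprintf(buffer, sizeof(buffer), format, args);
    va_end(args);

    return std::string("[DEBUG] [") + ctx + "] " + buffer;
}
```

The returned string could then be sent to std::cerr exactly as DLOG() does.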
clsimple_listall()
A function to list all devices across all platforms. Those devices are available to be used with OpenCL.
It all starts with cl::Platform to get other information.
...
// Get the platform
std::vector<cl::Platform> platforms;
int ret_code = cl::Platform::get(&platforms);
if (ret_code != CL_SUCCESS) {
    std::cerr << "Error cl::Platform::get(), code=" << ret_code << std::endl;
    return;
}
...
We will see this error-handling pattern a lot in this function. First, we get a vector of platforms. In case of an error, we return immediately as we cannot do anything further; since clsimple_listall() returns void, the error is reported only through the printed message.
The success return code when working with the OpenCL API is CL_SUCCESS. Notice that in case of an error, we always print an error message to the standard error.
Iterate all platforms and devices for each to get information we need
...
std::stringstream output_str;
for (size_t i=0; i<platforms.size(); ++i) {
    ...
    for (size_t j=0; j<devices.size(); ++j) {
        ...
    }
}
...
This function ultimately writes its string output to the specified C-string pointer. We build that output with std::stringstream, which lets us append pieces as we go without creating a temporary std::string and copying it on every append.
cl::Platform is the starting point for getting other information, and each platform contains one or more cl::Device. So we use a nested for-loop to do our work.
Inside the loop
... for (size_t i=0; i<platforms.size(); ++i) { auto& p = platforms[i]; // temporary variables to hold temporary platform/device informatin std::string tmp_str; ret_code = p.getInfo(CL_PLATFORM_NAME, &tmp_str); if (ret_code != CL_SUCCESS) std::cerr << "Error cl::Platform::getInfo(), code=" << ret_code << std::endl; else output_str << "[" << i << "] Platform: " << tmp_str << std::endl; ret_code = p.getInfo(CL_PLATFORM_VENDOR, &tmp_str); if (ret_code != CL_SUCCESS) std::cerr << "Error cl::Platform::getInfo(), code=" << ret_code << std::endl; else output_str << "Vendor: " << tmp_str << std::endl; std::vector<cl::Device> devices; ret_code = p.getDevices(CL_DEVICE_TYPE_ALL, &devices); if (ret_code != CL_SUCCESS) { std::cerr << "Error cl::Platform::getDevices(), code=" << ret_code << std::endl; continue; } for (size_t j=0; j<devices.size(); ++j) { cl_device_type tmp_device_type; const auto& d = devices[j]; ret_code = d.getInfo(CL_DEVICE_NAME, &tmp_str); if (ret_code != CL_SUCCESS) std::cerr << "Error cl::Device::getInfo(), code=" << ret_code << std::endl; else output_str << " -[" << j << "] Device name: " << tmp_str << std::endl; ret_code = d.getInfo(CL_DEVICE_TYPE, &tmp_device_type); if (ret_code != CL_SUCCESS) std::cerr << "Error cl::Device::getInfo(), code=" << ret_code << std::endl; else { if (tmp_device_type & CL_DEVICE_TYPE_GPU) output_str << " -Type: GPU" << std::endl; else if (tmp_device_type & CL_DEVICE_TYPE_CPU) output_str << " -Type: CPU" << std::endl; else if (tmp_device_type & CL_DEVICE_TYPE_ACCELERATOR) output_str << " -Type: Accelerator" << std::endl; else output_str << " -Type: Unknown" << std::endl; } } } ...
Inside the loop, we have the temporary variables tmp_str and tmp_device_type to hold information acquired from the platform or device.
Along the way, if something goes wrong, we print an error to the standard error; otherwise we append the result to output_str.
Notice that we also print an ordinal index number for both platform and device. This can be useful if we want users to specify exactly which platform and device to work with, so that we don't have to find the right device every single time. This is optional, and serves as an idea for future expansion of the library.
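The shape of the listing, including the ordinal indexes, can be tried out without OpenCL installed. The sketch below uses stand-in types (FakePlatform and FakeDevice are hypothetical) in place of cl::Platform and cl::Device and builds the text with std::stringstream in the same way.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Stand-in types for illustration only; the real code queries cl::Platform
// and cl::Device instead.
struct FakeDevice { std::string name; };
struct FakePlatform {
    std::string name;
    std::vector<FakeDevice> devices;
};

// Builds the listing text, prefixing platforms and devices with their index.
std::string list_all(const std::vector<FakePlatform>& platforms) {
    std::stringstream out;
    for (std::size_t i = 0; i < platforms.size(); ++i) {
        out << "[" << i << "] Platform: " << platforms[i].name << '\n';
        const auto& devices = platforms[i].devices;
        for (std::size_t j = 0; j < devices.size(); ++j)
            out << " -[" << j << "] Device name: " << devices[j].name << '\n';
    }
    return out.str();
}
```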
The information we retrieve, which is enough to decide later which platform and device to use with OpenCL, is as follows
- CL_PLATFORM_NAME - name of the platform
- CL_PLATFORM_VENDOR - name of the vendor e.g. AMD, Nvidia, the pocl project, etc
- CL_DEVICE_NAME - device name e.g. code name of GPU, or name of CPU
- CL_DEVICE_TYPE - device type e.g. GPU, CPU
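CL_DEVICE_TYPE is a bit field, which is why the code tests it with the & operator rather than ==. As a sketch, the hypothetical helper below names every type bit that is set; the constants repeat the values of CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU and CL_DEVICE_TYPE_ACCELERATOR from CL/cl.h only so that the example builds without the OpenCL headers.

```cpp
#include <string>

// Bit values mirror CL_DEVICE_TYPE_CPU / _GPU / _ACCELERATOR from CL/cl.h;
// they are repeated here only so the sketch builds without OpenCL headers.
using device_type_bits = unsigned long long;  // stand-in for cl_device_type
constexpr device_type_bits kCpu = 1u << 1;
constexpr device_type_bits kGpu = 1u << 2;
constexpr device_type_bits kAccelerator = 1u << 3;

// Hypothetical helper: names every type bit that is set, e.g. "GPU|CPU".
std::string describe_device_type(device_type_bits type) {
    std::string text;
    auto append = [&text](const char* name) {
        if (!text.empty()) text += "|";
        text += name;
    };
    if (type & kGpu) append("GPU");
    if (type & kCpu) append("CPU");
    if (type & kAccelerator) append("Accelerator");
    return text.empty() ? "Unknown" : text;
}
```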
There is a whole range of other information available for platforms and devices. Developers can take a peek at the header file CL/cl.h in the system's include path, e.g. /usr/include/. An excerpt follows
Platform information
... /* cl_platform_info */ #define CL_PLATFORM_PROFILE 0x0900 #define CL_PLATFORM_VERSION 0x0901 #define CL_PLATFORM_NAME 0x0902 #define CL_PLATFORM_VENDOR 0x0903 #define CL_PLATFORM_EXTENSIONS 0x0904 #ifdef CL_VERSION_2_1 #define CL_PLATFORM_HOST_TIMER_RESOLUTION 0x0905 #endif ...
Device information
... /* cl_device_info */ #define CL_DEVICE_TYPE 0x1000 #define CL_DEVICE_VENDOR_ID 0x1001 #define CL_DEVICE_MAX_COMPUTE_UNITS 0x1002 #define CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS 0x1003 #define CL_DEVICE_MAX_WORK_GROUP_SIZE 0x1004 #define CL_DEVICE_MAX_WORK_ITEM_SIZES 0x1005 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR 0x1006 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT 0x1007 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT 0x1008 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG 0x1009 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT 0x100A #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE 0x100B #define CL_DEVICE_MAX_CLOCK_FREQUENCY 0x100C #define CL_DEVICE_ADDRESS_BITS 0x100D #define CL_DEVICE_MAX_READ_IMAGE_ARGS 0x100E #define CL_DEVICE_MAX_WRITE_IMAGE_ARGS 0x100F #define CL_DEVICE_MAX_MEM_ALLOC_SIZE 0x1010 #define CL_DEVICE_IMAGE2D_MAX_WIDTH 0x1011 #define CL_DEVICE_IMAGE2D_MAX_HEIGHT 0x1012 #define CL_DEVICE_IMAGE3D_MAX_WIDTH 0x1013 #define CL_DEVICE_IMAGE3D_MAX_HEIGHT 0x1014 #define CL_DEVICE_IMAGE3D_MAX_DEPTH 0x1015 #define CL_DEVICE_IMAGE_SUPPORT 0x1016 #define CL_DEVICE_MAX_PARAMETER_SIZE 0x1017 #define CL_DEVICE_MAX_SAMPLERS 0x1018 #define CL_DEVICE_MEM_BASE_ADDR_ALIGN 0x1019 #define CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE 0x101A #define CL_DEVICE_SINGLE_FP_CONFIG 0x101B #define CL_DEVICE_GLOBAL_MEM_CACHE_TYPE 0x101C #define CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE 0x101D #define CL_DEVICE_GLOBAL_MEM_CACHE_SIZE 0x101E #define CL_DEVICE_GLOBAL_MEM_SIZE 0x101F #define CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE 0x1020 #define CL_DEVICE_MAX_CONSTANT_ARGS 0x1021 #define CL_DEVICE_LOCAL_MEM_TYPE 0x1022 #define CL_DEVICE_LOCAL_MEM_SIZE 0x1023 #define CL_DEVICE_ERROR_CORRECTION_SUPPORT 0x1024 #define CL_DEVICE_PROFILING_TIMER_RESOLUTION 0x1025 #define CL_DEVICE_ENDIAN_LITTLE 0x1026 #define CL_DEVICE_AVAILABLE 0x1027 #define CL_DEVICE_COMPILER_AVAILABLE 0x1028 #define CL_DEVICE_EXECUTION_CAPABILITIES 0x1029 #define CL_DEVICE_QUEUE_PROPERTIES 0x102A /* deprecated */ #ifdef CL_VERSION_2_0 
#define CL_DEVICE_QUEUE_ON_HOST_PROPERTIES 0x102A #endif #define CL_DEVICE_NAME 0x102B #define CL_DEVICE_VENDOR 0x102C #define CL_DRIVER_VERSION 0x102D #define CL_DEVICE_PROFILE 0x102E #define CL_DEVICE_VERSION 0x102F #define CL_DEVICE_EXTENSIONS 0x1030 #define CL_DEVICE_PLATFORM 0x1031 #ifdef CL_VERSION_1_2 #define CL_DEVICE_DOUBLE_FP_CONFIG 0x1032 #endif ...
We just need to know which pieces of information are useful for our use case.
Conversion of string into UTF-16 to be consumed by MetaTrader 5
...
// keep a copy of the string from stringstream
std::string copy_str = output_str.str();
if (utf16)
    util::str_to_cstr_u16(copy_str, out, len);
else
    util::str_to_cstr(copy_str, out, len);
Lastly, the string from output_str needs to be converted into UTF-16 encoding because MetaTrader 5 uses it to display text on its Experts tab.
Now is a good time to see how util::str_to_cstr and util::str_to_cstr_u16 are implemented.
Note that util.h & util.cpp are never meant to be exposed to and used by users of the DLL. They are used internally within the library only. Thus they have no need to be C-compliant, i.e. wrapped in extern "C", as is the case for the exported functions that users consume from the DLL on MetaTrader 5.
util.h
#pragma once

#include <string>

namespace util {
    /**
     * Convert via copying from std::string to C-string.
     *
     * # Arguments
     * - str - input string
     * - out - destination c-string pointer to copy the content of string to
     * - len - length of string to copy from
     */
    void str_to_cstr(const std::string& str, char* out, unsigned len);

    /**
     * Convert via copying from std::string to UTF-16 string.
     *
     * # Arguments
     * - str - input string
     * - out - destination c-string pointer to copy the content of converted string
     *         of UTF-16 to
     * - len - length of string to copy from
     */
    void str_to_cstr_u16(const std::string& str, char* out, unsigned len);
};
These functions perform the conversion if needed, then copy the result to the destination C-string pointer, whose buffer length is given by len.
util.cpp
#include "util.h"

#include <cuchar>
#include <locale>
#include <codecvt>
#include <cstring>

namespace util {
    /* converter of byte character to UTF-16 (2 bytes) */
    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> ch_converter;

    void str_to_cstr(const std::string& str, char* out, unsigned len) {
        if (out == nullptr || len == 0)
            return;
        // cap the number of characters, leaving room for the null terminator
        const size_t capped_len = str.length() <= (len-1) ? str.length() : (len-1);
        std::memcpy(out, str.c_str(), capped_len);
        out[capped_len] = '\0';
    }

    void str_to_cstr_u16(const std::string& str, char* out, unsigned len) {
        // len counts UTF-16 code units here, so out needs len*2 bytes
        if (out == nullptr || len == 0)
            return;
        std::u16string u16_str = ch_converter.from_bytes(str);
        // cap by the number of UTF-16 code units (not UTF-8 bytes)
        const size_t capped_len = u16_str.length() <= (len-1) ? u16_str.length() : (len-1);
        // each UTF-16 code unit occupies 2 bytes
        std::memcpy(out, u16_str.c_str(), capped_len*2);
        // the null terminator is 2 bytes wide as well
        out[capped_len*2] = '\0';
        out[capped_len*2 + 1] = '\0';
    }
};
The capped_len computation caps the length of the string we work with. If the string is shorter than the specified len, we use the string's own length. Otherwise, we use len-1.
We subtract 1 to leave room for the null terminator that ends the copied string.
In str_to_cstr_u16, the byte count passed to std::memcpy() is multiplied by 2 because each UTF-16 code unit occupies 2 bytes, twice the size of the 1-byte char units of the UTF-8 std::string that C++ programs normally work with.
The ch_converter declaration constructs a converter for us which will convert
- from UTF-8 to UTF-16 via std::wstring_convert::from_bytes() function
- from UTF-16 to UTF-8 via std::wstring_convert::to_bytes() function
The two template arguments for std::wstring_convert<_Codecvt, _Elem> can be described as follows
- _Codecvt - the conversion facet that performs the conversion, here std::codecvt_utf8<char16_t>, which converts between UTF-8 bytes and char16_t
- _Elem - the wide character type of the converted string, here char16_t
Finally we use std::memcpy() to copy the stream of bytes from the converted source string to the destination C-string pointer.
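The capping and termination rules can be exercised in isolation. The following is a minimal sketch, not the library's code: copy_as_utf16 is a hypothetical name, the capacity is assumed to be given in UTF-16 code units, and it uses std::codecvt_utf8_utf16 (which also produces surrogate pairs). Note that std::wstring_convert is deprecated since C++17 and that from_bytes() throws std::range_error on invalid input.

```cpp
#include <codecvt>
#include <cstddef>
#include <cstring>
#include <locale>
#include <string>

// Hypothetical helper (not part of the article's util.cpp): converts UTF-8 to
// UTF-16 and copies it into `out`, which must hold `capacity_units` UTF-16
// code units (capacity_units * 2 bytes), terminator included.
// Returns the number of code units copied, excluding the terminator.
std::size_t copy_as_utf16(const std::string& utf8, char* out, std::size_t capacity_units) {
    if (out == nullptr || capacity_units == 0)
        return 0;  // no room even for the terminator

    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    const std::u16string u16 = conv.from_bytes(utf8);

    // Keep one code unit free for the terminator.
    const std::size_t max_units = capacity_units - 1;
    const std::size_t units = u16.size() < max_units ? u16.size() : max_units;

    std::memcpy(out, u16.data(), units * sizeof(char16_t));
    const char16_t terminator = u'\0';  // 2 bytes wide, unlike '\0'
    std::memcpy(out + units * sizeof(char16_t), &terminator, sizeof(terminator));
    return units;
}
```

Truncation here works on whole code units, so a string cut at the capacity limit could still end in the first half of a surrogate pair.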
The following is the example output as tested on my machine.
We will revisit this fully at testing section.
Figure: Example of calling clsimple_listall() on MetaTrader 5
clsimple_compute()
Firstly, let's see the function signature.
CLSIMPLE_API int clsimple_compute(const int arr_1[], const int arr_2[], int arr_3[], int num_elem) noexcept {
    ...
}
The aim is to abstract the code we wrote in the previous part of the series into a function. Fortunately, in this case most of the code can simply be moved into a single function.
We defer a fuller abstraction of the code to the full-fledged development later. For now, we mainly test that the DLL works properly end to end with MQL5 on MetaTrader 5.
Thus the code will look very similar.
clsimple_compute function accepts the following arguments
- arr_1 - array of read-only integer numbers
- arr_2 - array of read-only integer numbers
- arr_3 - output array for summation of arr_1 and arr_2
- num_elem - number of elements to process from both input array
The function returns the return code, not the result of the summation of both arrays.
In reality we could change it to float or double for the type of input/output array in order to accommodate the price of asset in floating-point format. But we go with simple concept in this implementation.
Get the platform
... cl_int ret_code = CL_SUCCESS; // Get the platform std::vector<cl::Platform> platforms; ret_code = cl::Platform::get(&platforms); if (ret_code != CL_SUCCESS) { std::cerr << "Error cl::Platform::get(), code=" << ret_code << std::endl; return ret_code; } DLOG(__FUNCTION__, "found %d platform(s)", platforms.size()); if (platforms.empty()) { std::cerr << "Error found 0 platform." << std::endl; return CL_DEVICE_NOT_FOUND; // reuse this error value } cl::Platform platform = platforms[0]; DLOG(__FUNCTION__, "%s", "passed getting platforms"); ...
What has improved since the previous part of the series is the extensive error handling. We print an error message and return the error code.
The lines with DLOG() can be ignored, but they are useful for debugging when we build with ENABLE_DEBUG.
In this case, we hard-code the use of the first platform. But we could change the function to accept the ordinal index of the platform to use, based on the listing produced by the first function, clsimple_listall().
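If the function were extended to accept an ordinal index, the index coming from the caller would need a bounds check before use. A minimal sketch, with the hypothetical name pick_by_index, could look like this:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a pointer to items[index], or nullptr when the
// index is out of range (including negative values passed by the caller).
template <typename T>
const T* pick_by_index(const std::vector<T>& items, int index) {
    if (index < 0 || static_cast<std::size_t>(index) >= items.size())
        return nullptr;
    return &items[static_cast<std::size_t>(index)];
}
```

Applied to the platforms vector, a nullptr result would be handled like the existing "found 0 platform" case.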
Get the device
... // Get the device std::vector<cl::Device> devices; ret_code = platform.getDevices(CL_DEVICE_TYPE_GPU, &devices); if (ret_code != CL_SUCCESS) { std::cerr << "Error cl::Platform::getDevices(), code=" << ret_code << std::endl; return ret_code; } DLOG(__FUNCTION__, "found %d GPU device(s)", devices.size()); if (devices.empty()) { std::cerr << "Error found 0 device." << std::endl; return CL_DEVICE_NOT_FOUND; } cl::Device device = devices[0]; DLOG(__FUNCTION__, "%s", "passed getting a GPU device"); ...
In this case, we look only for GPU devices on that platform, and use the first one found.
Create the context
...
// Create the context
cl::Context context(device);

DLOG(__FUNCTION__, "%s", "passed creating a context");
...
Create the command queue
...
// Create the command queue
cl::CommandQueue queue(context, device);

DLOG(__FUNCTION__, "%s", "passed creating command queue");
...
Create the kernel
...
// Create the kernel
std::string kernelCode = "__kernel void add(__global int* a, __global int* b, __global int* c, int size) { "
                         "    int i = get_global_id(0);"
                         "    if (i < size)"
                         "        c[i] = a[i] + b[i];"
                         "}";
cl::Program::Sources sources;
sources.push_back({kernelCode.c_str(), kernelCode.length()});
cl::Program program(context, sources);
ret_code = program.build({device});
if (ret_code != CL_SUCCESS) {
    std::cerr << "Error cl::Program::build(), code=" << ret_code << std::endl;
    return ret_code;
}

DLOG(__FUNCTION__, "%s", "passed building a kernel program");

cl::Kernel kernel(program, "add");

DLOG(__FUNCTION__, "%s", "passed adding kernel function");
...
To create a kernel function, we first construct cl::Program::Sources from the source string and use it to create a cl::Program. After building the program for our device, we pass it, together with the kernel function's name, to the constructor of cl::Kernel.
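To see what the add kernel computes, and to have something to compare the DLL's output against, here is a plain CPU reference sketch. The name add_reference and the global_size parameter are ours: each loop iteration plays the role of one work-item.

```cpp
// CPU reference for the `add` kernel: each loop iteration plays the role of
// one work-item, with `i` standing in for get_global_id(0).
void add_reference(const int* a, const int* b, int* c, int size, int global_size) {
    for (int i = 0; i < global_size; ++i) {
        if (i < size)  // same guard as the kernel: surplus work-items do nothing
            c[i] = a[i] + b[i];
    }
}
```

The guard only matters when more work-items are launched than there are elements; with a global size equal to size it is always true.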
Create buffers
... // Create buffers cl::Buffer buffer_a(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * num_elem, const_cast<int*>(arr_1)); cl::Buffer buffer_b(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * num_elem, const_cast<int*>(arr_2)); cl::Buffer buffer_c(context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * num_elem, arr_3); ...
There are 3 buffers.
- buffer_a - first input array. It's read-only, and allocated on the host in a way that allows access from the device as well.
- buffer_b - second input array. Same as buffer_a.
- buffer_c - resultant array. It's write-only, and allocated on the host in a way that allows access from the device as well.
You can refer to the meaning of flags used in creating OpenCL buffer in the previous part of the series.
Notice that for arr_1 and arr_2 we use const_cast<int*> to remove const from those variables. Our function receives them as const, which promises users that we won't modify their data.
But the constructor of cl::Buffer takes a non-const pointer, so we need to satisfy it. Since both buffers are created with CL_MEM_READ_ONLY and the kernel only reads from them, we trust that nothing is written through the pointer.
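The same situation can be reproduced in miniature. In this sketch, legacy_sum is a hypothetical stand-in for an API that takes a non-const pointer but only reads through it, just as cl::Buffer's host_ptr parameter is declared as void*.

```cpp
// Stand-in for an API that takes a non-const pointer but only reads through it
// (cl::Buffer's host_ptr parameter is declared as void*).
int legacy_sum(int* data, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += data[i];
    return total;
}

// Const-correct wrapper: const_cast only changes the pointer's type.
// This stays well-defined as long as nothing writes through the pointer.
int sum_readonly(const int* data, int count) {
    return legacy_sum(const_cast<int*>(data), count);
}
```

Writing through such a pointer to an object that was originally declared const would be undefined behavior, so the cast is only as safe as the callee's behavior.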
Set argument to kernel function
...
kernel.setArg(0, buffer_a);
kernel.setArg(1, buffer_b);
kernel.setArg(2, buffer_c);
kernel.setArg(3, num_elem);

DLOG(__FUNCTION__, "%s", "passed setting all arguments");
...
Set arguments properly according to the kernel function signature as seen in OpenCL kernel code above.
Execute the kernel function, and wait for result to be written back
...
// execute the kernel function
// NOTE: enqueueNDRangeKernel() only enqueues the work and returns without waiting
// for the kernel to finish. The result is guaranteed to be available on the host
// only after the blocking read below completes.
ret_code = queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(num_elem), cl::NullRange);
if (ret_code != CL_SUCCESS) {
    std::cerr << "Error cl::CommandQueue::enqueueNDRangeKernel(), code=" << ret_code << std::endl;
    return ret_code;
}

// CL_TRUE to make it blocking call
// it requires for moving data from device back to host
// NOTE: Important to call this function to make sure the result is sent back to host.
ret_code = queue.enqueueReadBuffer(buffer_c, CL_TRUE, 0, sizeof(int) * num_elem, arr_3);
if (ret_code != CL_SUCCESS)
    std::cerr << "Error cl::CommandQueue::enqueueReadBuffer(), code=" << ret_code << std::endl;

return ret_code;
cl::NDRange(num_elem) specifies the global dimension used to execute the kernel. In this case, it is the number of elements in the input arrays. We pass cl::NullRange for the local dimension to let OpenCL determine a suitable value for us automatically.
It's important to call queue.enqueueReadBuffer() because we need to wait for the output (now stored on the device, e.g. GPU) to be written back to the host (our machine). If we skip this, the result may not be ready to read when the function returns.
Notice that this call is blocking because we pass CL_TRUE.
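If an explicit local size were passed instead of cl::NullRange, OpenCL 1.x requires the global size to be a multiple of it, and this is where the kernel's if (i < size) guard becomes necessary. A sketch of the usual rounding, with the hypothetical name round_up_global_size:

```cpp
#include <cstddef>

// Hypothetical helper: smallest multiple of local_size that is >= num_elem.
std::size_t round_up_global_size(std::size_t num_elem, std::size_t local_size) {
    if (local_size == 0)
        return num_elem;  // treat 0 as "no explicit local size"
    const std::size_t remainder = num_elem % local_size;
    return remainder == 0 ? num_elem : num_elem + (local_size - remainder);
}
```

For example, 1000 elements with an assumed local size of 64 would be launched as 1024 work-items, and the 24 surplus work-items would be filtered out by the guard in the kernel.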
Makefile
.PHONY: all clean openclsimple.dll main.exe

COMPILER := x86_64-w64-mingw32-g++-posix
FLAGS := -O2 -fno-rtti -std=c++17 -Wall -Wextra
MORE_FLAGS ?=

all: openclsimple.dll main.exe
	cp -afv $< ~/.mt5/drive_c/Program\ Files/MetaTrader\ 5/MQL5/Libraries/

openclsimple.dll: util.o openclsimple.o
	@# check if symbolic link file to wine's opencl.dll exists, if not then create one
	test -h opencl.dll && echo "opencl.dll exists, no need to create symbolic link again" || ln -s ~/.mt5/drive_c/windows/system32/opencl.dll ./opencl.dll
	$(COMPILER) -shared $(FLAGS) $(MORE_FLAGS) -fPIC -o $@ $^ -L. -lopencl

openclsimple.o: openclsimple.cpp openclsimple.h
	$(COMPILER) $(FLAGS) $(MORE_FLAGS) -DCLSIMPLE_API_EXPORT -DWINDOWS -I. -fPIC -o $@ -c $<

util.o: util.cpp util.h
	$(COMPILER) $(FLAGS) $(MORE_FLAGS) -I. -fPIC -o $@ -c $<

main.exe: main.cpp openclsimple.dll
	$(COMPILER) $(FLAGS) $(MORE_FLAGS) -I. -DWINDOWS -o $@ $< -L. -lopenclsimple

clean:
	rm -f openclsimple.dll main.exe opencl.dll util.o openclsimple.o
The line starting with @# shows a technique: prefixing a comment line in a recipe with @ keeps make from echoing it during the build.
The test -h line is an improvement over the Makefile from the previous article of the series. Previously we packaged a symlink pointing to opencl.dll inside the Wine prefix where MetaTrader 5 is installed (~/.mt5); since that path contains the home directory, it breaks for any other user name. Now the symlink is created at build time if it does not already exist, so it points to the correct path for the user's own home directory, with no need to overwrite a packaged symlink.
The cp line under the all target copies the resultant DLL file, openclsimple.dll, to the Libraries/ directory where MetaTrader 5 looks for DLLs at run-time. This saves us a ton of time during development.
Makefile-g++
.PHONY: all clean libopenclsimple.so main.out

COMPILER := g++
FLAGS := -O2 -fno-rtti -std=c++17 -Wall -Wextra
MORE_FLAGS ?=

all: libopenclsimple.so main.out

libopenclsimple.so: util.o openclsimple.o
	@# NOTE: notice capital letters in -lOpenCL
	$(COMPILER) -shared $(FLAGS) $(MORE_FLAGS) -I. -fPIC -o $@ $^ -lOpenCL

openclsimple.o: openclsimple.cpp openclsimple.h util.h
	$(COMPILER) $(FLAGS) $(MORE_FLAGS) -I. -fPIC -o $@ -c $<

util.o: util.cpp util.h
	$(COMPILER) $(FLAGS) $(MORE_FLAGS) -I. -fPIC -o $@ -c $<

main.out: main.cpp libopenclsimple.so
	$(COMPILER) $(FLAGS) $(MORE_FLAGS) -I. -o $@ $< -L. -lopenclsimple

clean:
	rm -f libopenclsimple.so main.out util.o openclsimple.o
Makefile-g++ is similar, but it is intended for building natively on a Linux system for quick testing. The notable difference is that it links against the OpenCL library installed on the system, whose name (-lOpenCL, with capital letters) differs from the one used in the Windows build (-lopencl).
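Both Makefiles steer the build with preprocessor definitions: the DLL objects are compiled with -DCLSIMPLE_API_EXPORT -DWINDOWS, main.exe with -DWINDOWS only, and the native Linux build with neither. The sketch below shows the usual way such flags are turned into an export/import macro. It is only an illustration of the pattern and not a copy of openclsimple.h; DEMO_API and demo_add are hypothetical names.

```cpp
// Sketch of a typical export/import macro driven by the Makefile's -D flags.
// DEMO_API and demo_add are hypothetical; they are not part of openclsimple.h.
#if defined(WINDOWS)
  #if defined(CLSIMPLE_API_EXPORT)
    // Compiling the DLL itself: export the symbol.
    #define DEMO_API __declspec(dllexport)
  #else
    // Compiling a client such as main.exe: import the symbol from the DLL.
    #define DEMO_API __declspec(dllimport)
  #endif
#else
  // Native Linux build: keep the symbol visible in the shared object.
  #define DEMO_API __attribute__((visibility("default")))
#endif

// extern "C" turns off C++ name mangling, so the symbol is exported under
// the plain name "demo_add" and can be looked up by that name.
extern "C" DEMO_API int demo_add(int a, int b) noexcept {
    return a + b;
}
```

With this arrangement the same header can serve the DLL, its C++ clients, and the Linux shared library, and only the compiler flags differ between builds.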
Testing on Linux and for Windows (via Wine)
The build system is now ready, so we can at least test natively on Linux and for Windows (via Wine).
Linux
Execute the following command
make -f Makefile-g++
We will have the following output files
- libopenclsimple.so
- main.out
We can execute the test program with the following command
./main.out
We shall see output similar to the following
Output from testing main.out on Linux as built from Makefile-g++
The output is correct for my machine: I have no on-board GPU, but I do have a graphics card and, of course, a CPU.
Windows (via Wine)
Execute the following command
make
We will have the following output files
- openclsimple.dll
- main.exe
We can execute the test program with the following command
WINEPREFIX=~/.mt5 wine ./main.exe
We shall see output similar to the following
Output from testing main.exe for Windows (via Wine)
We always use WINEPREFIX=~/.mt5 because that is the wine prefix where MetaTrader 5 is installed by default, so we test in the same environment MetaTrader 5 itself runs in.
The output is the same as in the earlier test on Linux.
Readers can also take the output files built from Makefile and test them natively on Windows. They should work and produce similar output. This is left as an exercise for readers.
Testing with MetaTrader 5
We are ready to test with MQL5 on MetaTrader 5 now.
mql5/OpenCLSimple.mqh
//+------------------------------------------------------------------+
//|                                                      OpenCLX.mqh |
//|                                          Copyright 2022, haxpor. |
//|                                                 https://wasin.io |
//+------------------------------------------------------------------+
#property copyright "Copyright 2022, haxpor."
#property link "https://wasin.io"

#import "openclsimple.dll"
void clsimple_listall(string& out, int len);
int clsimple_compute(const int& arr_1[], const int& arr_2[], int& arr_3[], int num_elem);
#import
Compare this with the signature of clsimple_listall() as exposed from the DLL, shown below. The function itself takes 3 parameters, and the last one, utf16, has a default value:
CLSIMPLE_API void clsimple_listall(char* out, int len, bool utf16=true) noexcept;
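We don't include the utf16 parameter in the .mqh file because, when used from MQL5, we always want it to be true: the string must be converted to UTF-16 to be printable on the Experts tab of MetaTrader 5. Keep in mind that a C++ default argument is filled in by the C++ compiler at the call site, so if the flag ever needs to be controlled reliably from MQL5, it should be declared in the import and passed explicitly.
For our purpose here, declaring only the first two parameters is enough.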
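To make the UTF-16 requirement more concrete, here is a minimal sketch of widening a narrow string into UTF-16 code units. It assumes the text is plain 7-bit ASCII, which is typical for platform and device names, and it is not the conversion utility from util.cpp; ascii_to_utf16 is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper (not the article's util function): widen a 7-bit ASCII
// string into UTF-16 code units. Any byte >= 0x80 is replaced with '?'
// because proper UTF-8 decoding is out of scope for this sketch.
std::u16string ascii_to_utf16(const std::string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        out.push_back(c < 0x80 ? static_cast<char16_t>(c) : u'?');
    }
    return out;
}
```

Each ASCII character becomes one 16-bit code unit, which is why a buffer handed over from MQL5 has to be treated as 2 bytes per character rather than 1.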
mql5/TestCLSimple.mq5
//+------------------------------------------------------------------+
//|                                                  TestOpenCLX.mq5 |
//|                                          Copyright 2022, haxpor. |
//|                                                 https://wasin.io |
//+------------------------------------------------------------------+
#property copyright "Copyright 2022, haxpor."
#property link "https://wasin.io"
#property version "1.00"

#include "OpenCLSimple.mqh"

#define STR_BUFFER_LEN 2048

//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
{
   // 1: test clsimple_listall()
   // construct a string to hold resultant of platforms/devices listing
   string listall_str;
   StringInit(listall_str, STR_BUFFER_LEN, '\0');

   // get platforms/devices and print the result
   clsimple_listall(listall_str, STR_BUFFER_LEN);
   Print(listall_str);

   // 2: test clsimple_compute()
   int arr_1[];
   int arr_2[];
   int arr_3[];

   ArrayResize(arr_1, 10000000);
   ArrayFill(arr_1, 0, ArraySize(arr_1), 1);

   ArrayResize(arr_2, 10000000);
   ArrayFill(arr_2, 0, ArraySize(arr_2), 1);

   ArrayResize(arr_3, 10000000);

   uint start_time = GetTickCount();

   int ret_code = clsimple_compute(arr_1, arr_2, arr_3, ArraySize(arr_1));
   if (ret_code != 0) {
      Print("Error occurs, code=", ret_code);
      return;
   }
   Print("Elapsed time: " + (string)(GetTickCount() - start_time) + " ms");

   bool is_valid = true;
   for (int i=0; i<ArraySize(arr_3); ++i) {
      if (arr_3[i] != 2) {
         Print("Something is wrong at index=" + (string)i);
         is_valid = false;
      }
   }

   if (is_valid) {
      Print("Passed test");
   }
}
Notice that in order to receive string output from the DLL (which copies its result into the buffer behind the c-string pointer we pass in), we need to define a string variable and initialize its capacity to the maximum length we want to support.
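On the DLL side, writing into a caller-owned buffer means never exceeding the capacity the caller announced. The following is a minimal sketch of that idea, assuming the capacity is counted in UTF-16 code units; whether the real clsimple_listall() interprets len this way is not shown in this section, and copy_to_caller_buffer is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: copy UTF-16 text into a caller-owned buffer that holds
// `capacity` code units, always leaving room for the terminating zero.
// Returns the number of code units written, excluding the terminator.
std::size_t copy_to_caller_buffer(const std::u16string& text,
                                  char16_t* out, std::size_t capacity) {
    if (out == nullptr || capacity == 0) {
        return 0;  // nothing can be written safely
    }
    // Truncate when the text does not fit instead of overrunning the buffer.
    const std::size_t n = std::min(text.size(), capacity - 1);
    std::copy_n(text.data(), n, out);
    out[n] = u'\0';
    return n;
}
```

Truncating silently keeps the caller's memory intact. A fuller implementation might also report that the text was cut so the caller can retry with a larger buffer.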
Preparing to call clsimple_compute() takes a little more effort. We need to declare the input arrays of integers, fill them with proper values, and declare an array of integers for the output. In a real program, we would read such input data tick by tick from the asset's price and only need to clean or prepare it slightly before supplying it as arguments whenever we call clsimple_compute().
Finally, we validate the result by checking the value of each element in the output array. If all went well, it prints
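The same check can be written on the C++ side, for example in main.cpp. The sketch below assumes that the kernel adds the two inputs element-wise, which is what the expected value of 2 in the script suggests; first_mismatch is a hypothetical helper and not part of the library.

```cpp
// Hypothetical helper: compare the output against arr_1[i] + arr_2[i].
// Returns the index of the first mismatching element, or -1 when every
// element matches. Assumes all three arrays hold at least num_elem items.
int first_mismatch(const int* arr_1, const int* arr_2,
                   const int* arr_3, int num_elem) {
    for (int i = 0; i < num_elem; ++i) {
        if (arr_3[i] != arr_1[i] + arr_2[i]) {
            return i;
        }
    }
    return -1;
}
```

Stopping at the first mismatch keeps the log short when something goes wrong across millions of elements.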
Passed test
Place the .mqh file in a proper location, either next to the .mq5 file or in the Include/ directory of the MetaTrader 5 installation path, then compile the .mq5 source, and finally drag-and-drop the built program onto a chart in MetaTrader 5.
We should see output similar to the following on the Experts tab.
Output from testing MQL5 (a Script type) program on MetaTrader 5.
Notice that text is shown properly thanks to our working string conversion utility.
Download Source code
Readers can download the source code from the zip file at the bottom of this article, or from the GitHub repository at github.com/haxpor/opencl-simple (look at the simple/ directory; standalonetest/ is for the previous part of the series).
Continue next part...
Part 2 of the series walked us through abstracting the work from the previous part into a proper library implementation, and building a DLL that can be consumed by a normal C++ program on Linux and on Windows (via Wine, or natively), as well as by MQL5 on MetaTrader 5.
It also emphasized how to properly pass a string from the DLL to an MQL5 program: we need to convert it to UTF-16, the encoding MetaTrader 5 uses for display, at least on the Experts tab. If the string displayed in MetaTrader 5 is correct, we know we have done things correctly.
In the next part of the series, we will dive deeper into the OpenCL C++ API to develop full-fledged OpenCL support as a DLL to be used with MQL5.
Along the way, we will learn the requirements for working as efficiently as possible with the OpenCL API, and then transfer that knowledge into developing a high-performance MQL5 program powered by OpenCL.