Регрессионные модели библиотеки Scikit-learn и их экспорт в ONNX

MetaQuotes | 8 ноября, 2023

ONNX (Open Neural Network Exchange) представляет собой формат для описания и обмена моделями машинного обучения, обеспечивая возможность переноса моделей между различными фреймворками машинного обучения. В глубоком обучении и нейронных сетях часто используются числа типа данных float32. Они широко применяются, поскольку обычно обеспечивают приемлемую точность и эффективность для обучения моделей глубокого обучения.

Некоторые классические модели машинного обучения трудно представить в виде ONNX-операторов, поэтому для их реализации в ONNX были введены дополнительные ML-операторы (ai.onnx.ml). Следует отметить, что согласно спецификации ONNX, ключевые операторы из этого набора (LinearRegressor, SVMRegressor, TreeEnsembleRegressor) могут принимать различные типы входных данных(tensor(float), tensor(double), tensor(int64), tensor(int32)), но на выходе всегда возвращают тип tensor(float). При этом параметризация этих операторов также производится при помощи чисел float, что может ограничивать точность вычислений, особенно если для определения параметров оригинальной модели использовались числа double.

Это может привести к потере точности при преобразовании моделей или использовании различных типов данных в процессе конвертации и обработки данных в ONNX. Здесь многое зависит от конвертера, как мы увидим далее, для некоторых моделей удается обойти эти ограничения и обеспечить полную переносимость ONNX-моделей и работать с ними в double без потери точности. Важно учитывать эти особенности при работе с моделями и их представлении в ONNX, особенно в случаях, когда точность представления данных имеет значение.

Scikit-learn является одной из наиболее популярных и широко используемых библиотек для машинного обучения в сообществе Python. Она предлагает широкий выбор алгоритмов, простой в использовании интерфейс и хорошую документацию. В предыдущей статье "Классификационные модели библиотеки Scikit-learn и их экспорт в ONNX" рассматривались модели классификации.

В данной статье мы рассмотрим применение регрессионных моделей пакета Scikit-learn, рассчитаем их параметры с точностью double для тестового набора данных, попробуем их сконвертировать в ONNX-формат для float и double и использовать полученные модели в программах на MQL5. Также мы сравним точность работы оригинальных моделей и их ONNX-версий для float и double. Кроме того, мы рассмотрим ONNX-представление регресионных моделей, это позволит лучше понять их внутреннее устройство и принцип работы.

Содержание

If it bothers you, welcome to contribute

На форуме разработчиков ONNX Runtime один из пользователей сообщил о том, что при исполнении модели через ONNX Runtime он столкнулся с ошибкой "[ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for the node LinearRegressor:LinearRegressor(1)":

Hi all, getting this error when trying to inferance a linear regression model. PLease help me resolve this.

Сообщение об ошибке "NOT_IMPLEMENTED : Could not find an implementation for the node LinearRegressor:LinearRegressor(1)" с форума разработчиков ONNX Runtime

Ответ разработчика:

It is because we only implemented it for float32, not float64. But your model needs float64.

See:
https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/core/providers/cpu/ml/linearregressor.cc#L16

If it bothers you, welcome to contribute.

В ONNX-модели пользователя вызывается оператор ai.onnx.ml.LinearRegressor с double (float64), и сообщение об ошибке возникает из-за того, что в ONNX Runtime поддержка оператора LinearRegressor() с double не реализована.

Согласно спецификации оператора ai.onnx.ml.LinearRegressor, входной тип данных double возможен (T: tensor(float), tensor(double), tensor(int64), tensor(int32)), однако разработчики сознательно отказались от его реализации.

Причина в том, что на выходе всегда возвращается значение Y : tensor(float). Более того, расчетные параметры задаются в числах float (coefficients : list of floats, intercepts : list of floats).

Получается, что при расчетах в double этот оператор понижает точность до float и его реализация в расчетах на double имеет сомнительную ценность.

Описание оператора ai.onnx.ml.LinearRegressor

Таким образом, понижение точности до float в параметрах и выходном значении делают невозможной полноценную работу ai.onnx.ml.LinearRegressor с числами double (float64). Вероятно, по этой причине разработчики ONNXRuntime решили отказаться от его реализации для типа double.

Способ "добавления поддержки double" разработчики показали в комментариях к коду (выделено желтым).

В ONNX Runtime его расчет производится при помощи класса LinearRegressor (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cpu/ml/linearregressor.h).

Параметры оператора coefficients_ и intercepts_ храняться в виде std::vector<float>:

#pragma once

#include "core/common/common.h"
#include "core/framework/op_kernel.h"
#include "core/util/math_cpuonly.h"
#include "ml_common.h"

namespace onnxruntime {
namespace ml {

class LinearRegressor final : public OpKernel {
 public:
  LinearRegressor(const OpKernelInfo& info);
  Status Compute(OpKernelContext* context) const override;

 private:
  int64_t num_targets_;
  std::vector<float> coefficients_;
  std::vector<float> intercepts_;
  bool use_intercepts_;
  POST_EVAL_TRANSFORM post_transform_;
};

}  // namespace ml
}  // namespace onnxruntime

Реализация расчета LinearRegressor (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cpu/ml/linearregressor.cc)

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

#include "core/providers/cpu/ml/linearregressor.h"
#include "core/common/narrow.h"
#include "core/providers/cpu/math/gemm.h"

namespace onnxruntime {
namespace ml {

ONNX_CPU_OPERATOR_ML_KERNEL(
    LinearRegressor,
    1,
    // KernelDefBuilder().TypeConstraint("T", std::vector<MLDataType>{
    //                                            DataTypeImpl::GetTensorType<float>(),
    //                                            DataTypeImpl::GetTensorType<double>()}),
    KernelDefBuilder().TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    LinearRegressor);

LinearRegressor::LinearRegressor(const OpKernelInfo& info)
    : OpKernel(info),
      intercepts_(info.GetAttrsOrDefault<float>("intercepts")),
      post_transform_(MakeTransform(info.GetAttrOrDefault<std::string>("post_transform", "NONE"))) {
  ORT_ENFORCE(info.GetAttr<int64_t>("targets", &num_targets_).IsOK());
  ORT_ENFORCE(info.GetAttrs<float>("coefficients", coefficients_).IsOK());

  // use the intercepts_ if they're valid
  use_intercepts_ = intercepts_.size() == static_cast<size_t>(num_targets_);
}

// Use GEMM for the calculations, with broadcasting of intercepts
// https://github.com/onnx/onnx/blob/main/docs/Operators.md#Gemm
//
// X: [num_batches, num_features]
// coefficients_: [num_targets, num_features]
// intercepts_: optional [num_targets].
// Output: X * coefficients_^T + intercepts_: [num_batches, num_targets]
template <typename T>
static Status ComputeImpl(const Tensor& input, ptrdiff_t num_batches, ptrdiff_t num_features, ptrdiff_t num_targets,
                          const std::vector<float>& coefficients,
                          const std::vector<float>* intercepts, Tensor& output,
                          POST_EVAL_TRANSFORM post_transform,
                          concurrency::ThreadPool* threadpool) {
  const T* input_data = input.Data<T>();
  T* output_data = output.MutableData<T>();

  if (intercepts != nullptr) {
    TensorShape intercepts_shape({num_targets});
    onnxruntime::Gemm<T>::ComputeGemm(CBLAS_TRANSPOSE::CblasNoTrans, CBLAS_TRANSPOSE::CblasTrans,
                                      num_batches, num_targets, num_features,
                                      1.f, input_data, coefficients.data(), 1.f,
                                      intercepts->data(), &intercepts_shape,
                                      output_data,
                                      threadpool);
  } else {
    onnxruntime::Gemm<T>::ComputeGemm(CBLAS_TRANSPOSE::CblasNoTrans, CBLAS_TRANSPOSE::CblasTrans,
                                      num_batches, num_targets, num_features,
                                      1.f, input_data, coefficients.data(), 1.f,
                                      nullptr, nullptr,
                                      output_data,
                                      threadpool);
  }

  if (post_transform != POST_EVAL_TRANSFORM::NONE) {
    ml::batched_update_scores_inplace(gsl::make_span(output_data, SafeInt<size_t>(num_batches) * num_targets),
                                      num_batches, num_targets, post_transform, -1, false, threadpool);
  }
  return Status::OK();
}

Status LinearRegressor::Compute(OpKernelContext* ctx) const {
  Status status = Status::OK();

  const auto& X = *ctx->Input<Tensor>(0);
  const auto& input_shape = X.Shape();

  if (input_shape.NumDimensions() > 2) {
    return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Input shape had more than 2 dimension. Dims=",
                           input_shape.NumDimensions());
  }

  ptrdiff_t num_batches = input_shape.NumDimensions() <= 1 ? 1 : narrow<ptrdiff_t>(input_shape[0]);
  ptrdiff_t num_features = input_shape.NumDimensions() <= 1 ? narrow<ptrdiff_t>(input_shape.Size())
                                                            : narrow<ptrdiff_t>(input_shape[1]);
  Tensor& Y = *ctx->Output(0, {num_batches, num_targets_});
  concurrency::ThreadPool* tp = ctx->GetOperatorThreadPool();

  auto element_type = X.GetElementType();

  switch (element_type) {
    case ONNX_NAMESPACE::TensorProto_DataType_FLOAT: {
      status = ComputeImpl<float>(X, num_batches, num_features, narrow<ptrdiff_t>(num_targets_), coefficients_,
                                  use_intercepts_ ? &intercepts_ : nullptr,
                                  Y, post_transform_, tp);

      break;
    }
    case ONNX_NAMESPACE::TensorProto_DataType_DOUBLE: {
      // TODO: Add support for 'double' to the scoring functions in ml_common.h
      // once that is done we can just call ComputeImpl<double>...
      // Alternatively we could cast the input to float.
    }
    default:
      status = ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Unsupported data type of ", element_type);
  }

  return status;
}

}  // namespace ml
}  // namespace onnxruntime

Получается, что есть вариант использовать числа double как входные значения и расчет оператора производить c float параметрами. Также можно было бы понизить точность входных данных до float. Однако все эти варианты нельзя считать нормальным решением.

Спецификация оператора ai.onnx.ml.LinearRegressor ограничивает возможность полноценной работы с числами double, поскольку параметры и выходное значение ограничены типом float.

Аналогичная ситуация и с другими ML-операторами ONNX, например ai.onnx.ml.SVMRegressor и ai.onnx.ml.TreeEnsembleRegressor.

Таким образом, все разработчики сред исполнения ONNX-моделей в double сталкиваются с этим ограничением спецификации. Вариантом решения может быть расширение спецификации ONNX (или добавление аналогичных операторов LinearRegressor64, SVMRegressor64 и TreeEnsembleRegressor64 с параметрами и выходными значениями double), однако пока эта проблема остается нерешенной.

Здесь многое зависит от конвертера в ONNX, для моделей с расчетом в double лучше отказаться от использования этих операторов (однако это возможно не всегда), в данном случае с моделью пользователя конвертер в ONNX сработал не лучшим образом.

Как мы увидим далее, конвертеру sklearn-onnx удается обойти ограничение LinearRegressor: для ONNX-моделей double вместо него используются ONNX-операторы MatMul() и Add(). Благодаря такому способу множество регрессионных моделей библиотеки Scikit-learn удается успешно конвертировать в ONNX-модели с расчетом в double, и эти модели показывают сохранение точности оригинальных моделей в double.

1. Тестовый набор данных

Для запуска примеров понадобится установить Python (мы использовали версию 3.10.8), дополнительные библиотеки (pip install -U scikit-learn numpy matplotlib onnx onnxruntime skl2onnx) и указать путь к Python в редакторе MetaEditor (в меню Tools->Options->Compilers->Python).

В качестве тестового набора данных будем использовать сгенерированные значения функции y = 4*X + 10*sin(X*0.5).

Для отображения графика подобной функции откроем MetaEditor, создадим файл RegressionData.py, скопируем текст скрипта и запустим его, нажав на кнопку "Compile".

Скрипт отображения тестового набора данных

# RegressionData.py
# The code plots the synthetic data, used for all regression models
# Copyright 2023, MetaQuotes Ltd.
# https://mql5.com

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

# set the figure size
plt.figure(figsize=(8,5))

# plot the initial data for regression
plt.scatter(X, y, label='Regression Data', marker='o')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Regression data')
plt.show()

В результате будет выведен график функции, которую мы будем использовать для проверки методов регрессии.

Рис.1. Функция для тестирования методов регрессии

2. Модели регрессии

Цель задачи регрессии заключается в поиске такой математической функции или модели, которая наилучшим образом описывает зависимость между признаками и целевой переменной, чтобы предсказать числовое значение для новых данных. Это позволяет сделать прогнозы, оптимизировать решения и принимать информированные решения на основе данных.

Рассмотрим основные модели регресии пакета scikit-learn.

2.0. Список моделей регрессии пакета Scikit-learn

Для вывода списка доступных моделей регрессии Scikit-learn можно использовать скрипт:

# ScikitLearnRegressors.py
# The script lists all the regression algorithms available inb scikit-learn
# Copyright 2023, MetaQuotes Ltd.
# https://mql5.com

# print Python version
from platform import python_version  
print("The Python version is ", python_version()) 

# print scikit-learn version
import sklearn
print('The scikit-learn version is {}.'.format(sklearn.__version__))

# print scikit-learn regression models
from sklearn.utils import all_estimators

regressors = all_estimators(type_filter='regressor')
for index, (name, RegressorClass) in enumerate(regressors, start=1):
    print(f"Regressor {index}: {name}")

Результат:

The Python version is 3.10.8
The scikit-learn version is 1.3.2.
Regressor 1: ARDRegression
Regressor 2: AdaBoostRegressor
Regressor 3: BaggingRegressor
Regressor 4: BayesianRidge
Regressor 5: CCA
Regressor 6: DecisionTreeRegressor
Regressor 7: DummyRegressor
Regressor 8: ElasticNet
Regressor 9: ElasticNetCV
Regressor 10: ExtraTreeRegressor
Regressor 11: ExtraTreesRegressor
Regressor 12: GammaRegressor
Regressor 13: GaussianProcessRegressor
Regressor 14: GradientBoostingRegressor
Regressor 15: HistGradientBoostingRegressor
Regressor 16: HuberRegressor
Regressor 17: IsotonicRegression
Regressor 18: KNeighborsRegressor
Regressor 19: KernelRidge
Regressor 20: Lars
Regressor 21: LarsCV
Regressor 22: Lasso
Regressor 23: LassoCV
Regressor 24: LassoLars
Regressor 25: LassoLarsCV
Regressor 26: LassoLarsIC
Regressor 27: LinearRegression
Regressor 28: LinearSVR
Regressor 29: MLPRegressor
Regressor 30: MultiOutputRegressor
Regressor 31: MultiTaskElasticNet
Regressor 32: MultiTaskElasticNetCV
Regressor 33: MultiTaskLasso
Regressor 34: MultiTaskLassoCV
Regressor 35: NuSVR
Regressor 36: OrthogonalMatchingPursuit
Regressor 37: OrthogonalMatchingPursuitCV
Regressor 38: PLSCanonical
Regressor 39: PLSRegression
Regressor 40: PassiveAggressiveRegressor
Regressor 41: PoissonRegressor
Regressor 42: QuantileRegressor
Regressor 43: RANSACRegressor
Regressor 44: RadiusNeighborsRegressor
Regressor 45: RandomForestRegressor
Regressor 46: RegressorChain
Regressor 47: Ridge
Regressor 48: RidgeCV
Regressor 49: SGDRegressor
Regressor 50: SVR
Regressor 51: StackingRegressor
Regressor 52: TheilSenRegressor
Regressor 53: TransformedTargetRegressor
Regressor 54: TweedieRegressor
Regressor 55: VotingRegressor

Для удобства в этом списке моделей регрессии они выделены разными цветами. Модели, которые требуют базовых моделей, выделены серым цветом, в то время как остальные модели могут использоваться самостоятельно. Отметим, что зеленым цветом отмечены модели, которые удалось успешно экспортировать в формат ONNX, красным цветом отмечены модели, при конвертации которых в текущей версии scikit-learn 1.2.2 возникают ошибки. Синим цветом выделены методы, которые не подходят для рассматриваемой нами тестовой задачи.

Для анализа качества моделей используются метрики регрессии, это функции от истинных и предсказанных значений. В языке MQL5 доступно несколько различных метрик, подробнее о них в статье "Оценка ONNX-моделей при помощи регрессионных метрик".

В этой статье для сравнения качества различных моделей будут использоваться 3 метрики:

Коэффициент детерминации R-квадрат (R-squared, R2);
Средняя абсолютная ошибка (Mean Absolute Error, MAE);
Среднеквадратичная ошибка (Mean Squared Error, MSE).

2.1. Регрессионные модели библиотеки Scikit-learn, которые конвертируются в ONNX-модели float и double

В этом разделе приводятся модели регрессии, которые успешно конвертируются в ONNX в форматах float и double.

Все рассматриваемые далее регрессионные модели приведены в виде:

Описание модели, принцип работы, преимущества и ограничения
Скрипт на Python для создания модели, ее экспорта в ONNX-файлы в форматах float и double, и исполнения полученных моделей через ONNX Runtime в Python. Для оценки качества оригинальной и ONNX-моделей используются метрики R^2, MAE, MSE, рассчитываемые при помощи sklearn.metrics.
Скрипт MQL5 для исполнения ONNX-моделей (float и double) через ONNX Runtime, метрики рассчитываются через RegressionMetric().
ONNX-представление моделей в Netron для float и double.

2.1.1. sklearn.linear_model.ARDRegression

ARDRegression (Automatic Relevance Determination Regression) - это метод регрессии, который предназначен для решения задачи регрессии, при этом автоматически определяет важность (релевантность) признаков и устанавливает их веса в процессе обучения модели.

ARDRegression позволяет обнаруживать и использовать только наиболее важные признаки для построения регрессионной модели, что может быть полезным в случаях, когда имеется большое количество признаков.

Принцип работы ARDRegression:

Линейная регрессия: ARDRegression основана на линейной регрессии, где предполагается линейная зависимость между независимыми переменными (признаками) и целевой переменной.
Автоматическое определение важности признаков: Основное отличие ARDRegression заключается в том, что он автоматически определяет, какие признаки наиболее важны для предсказания целевой переменной. Это достигается путем введения априорных распределений (регуляризации) над весами которые позволяют модели автоматически установить нулевые веса для малозначимых признаков.
Оценка апостериорных вероятностей: ARDRegression вычисляет апостериорные вероятности для каждого признака, что позволяет определить их важность. Признаки с высокими апостериорными вероятностями считаются релевантными и получают ненулевые веса, в то время как признаки с низкими апостериорными вероятностями получают нулевые веса.
Снижение размерности: Таким образом, ARDRegression может привести к снижению размерности данных, удаляя незначимые признаки.

Преимущества ARDRegression:

Автоматическое определение важных признаков: Метод автоматически определяет и использует только наиболее важные признаки, что может улучшить производительность модели и снизить риск переобучения.
Устойчивость к мультиколлинеарности: ARDRegression хорошо справляется с мультиколлинеарностью, когда признаки сильно коррелированы между собой.

Ограничения ARDRegression:

Требует выбора априорных распределений: Выбор подходящих априорных распределений может потребовать экспериментов.
Вычислительная сложность: Обучение ARDRegression может быть вычислительно затратным, особенно для больших наборов данных.

ARDRegression - это метод регрессии, который автоматически определяет важность признаков и устанавливает их веса на основе апостериорных вероятностей. Этот метод полезен, когда требуется учитывать только важные признаки для построения регрессионной модели и снижения размерности данных.

2.1.1.1. Код создания модели ARDRegression и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.ARDRegression, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# ARDRegression.py
# The code demonstrates the process of training ARDRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import ARDRegression
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name="ARDRegression"
onnx_model_filename = data_path + "ard_regression"

# create an ARDRegression model
regression_model = ARDRegression()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)

print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)

print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Скрипт создает и обучает модель sklearn.linear_model.ARDRegression (оригинальная модель считается в double), затем производится экспорт модели в ONNX для float и double (ard_regression_float.onnx и ard_regression_double.onnx) и сравнивается точность ее работы.

Он также создает файлы ARDRegression_plot_float.png и ARDRegression_plot_double.png, позволяющие визуально оценить результаты работы ONNX-моделей для float и double (рис. 2-3).

Рис.2. Результат работы скрипта ARDRegression.py (float)

Рис.3. Результат работы скрипта ARDRegression.py (double)

Визуально ONNX-модели для float и double выглядят одинаково (рис.2-3), во вкладке Journal можно посмотреть детали:

Python  ARDRegression Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382628120845
Python  Mean Absolute Error: 6.347568012853758
Python  Mean Squared Error: 49.77815934891289
Python  
Python  ARDRegression ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ard_regression_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382627587808
Python  Mean Absolute Error: 6.347568283744705
Python  Mean Squared Error: 49.778160054267204
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  ONNX: MSE matching decimal places:  4
Python  float ONNX model precision:  6
Python  
Python  ARDRegression ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ard_regression_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382628120845
Python  Mean Absolute Error: 6.347568012853758
Python  Mean Squared Error: 49.77815934891289
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

В данном примере оригинальная модель считалась в double, затем она была экспортирована в ONNX-модели ard_regression_float.onnx и ard_regression_double.onnx для float и double соответственно.

Если точность модели оценивать по порядку средней абсолютной ошибки (Mean Absolute Error), то точность ONNX-модели для float составляет не более 6 знаков после запятой, ONNX-модель с использованием double показала сохранение точности в 15 знаков после запятой в соответствии с точностью оригинальной модели.

Свойства ONNX-моделей можно посмотреть в MetaEditor (рис.4-5).

Рис.4. ONNX-модель ARDRegression (float) в MetaEditor

Рис.4. ONNX-модель ard_regression_float.onnx в MetaEditor

Рис.5. ONNX-модель ARDRegression (double) в MetaEditor

Рис.5. ONNX-модель ard_regression_double.onnx в MetaEditor

Сравнение float и double ONNX-моделей показывает, что в данном случае расчет ONNX-моделей ARDRegression происходит по-разному: для чисел float используется оператор LinearRegressor() из ONNX-ML, для чисел double используются ONNX-операторы MatMul(), Add() и Reshape().

Реализация модели в ONNX зависит от конвертера, в примерах для экспорта в ONNX будет использоваться функция skl2onnx.convert_sklearn() библиотеки skl2onnx.

2.1.1.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели ard_regression_float.onnx и ard_regression_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                ARDRegression.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ARDRegression"
#define   ONNXFilenameFloat  "ard_regression_float.onnx"
#define   ONNXFilenameDouble "ard_regression_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

ARDRegression (EURUSD,H1)       Testing ONNX float: ARDRegression (ard_regression_float.onnx)
ARDRegression (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382627587808
ARDRegression (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3475682837447049
ARDRegression (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781600542671896
ARDRegression (EURUSD,H1)       
ARDRegression (EURUSD,H1)       Testing ONNX double: ARDRegression (ard_regression_double.onnx)
ARDRegression (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382628120845
ARDRegression (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3475680128537597
ARDRegression (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781593489128795

Сравнение с оргинальной моделью double в Python:

Testing ONNX float: ARDRegression (ard_regression_float.onnx)
Python  Mean Absolute Error: 6.347568012853758
MQL5:   Mean Absolute Error: 6.3475682837447049
       
Testing ONNX double: ARDRegression (ard_regression_double.onnx)
Python  Mean Absolute Error: 6.347568012853758
MQL5:   Mean Absolute Error: 6.3475680128537597

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 14 знаков после запятой.

2.1.1.3. ONNX-представление моделей ard_regression_float.onnx и ard_regression_double.onnx

Netron (web-версия) - это инструмент для визуализации моделей и анализа графов вычислений, который может использоваться для моделей формата ONNX (Open Neural Network Exchange).

Netron представляет графы моделей и их архитектуру в наглядной и интерактивной форме, позволяя исследовать структуру и параметры моделей глубокого обучения, включая модели, созданные с использованием ONNX.

Основные возможности Netron включают:

Визуализацию графов: Netron отображает архитектуру модели в виде графа, позволяя вам видеть слои, операции и связи между ними. Вы можете легко понять структуру и поток данных внутри модели.
Интерактивное исследование: Вы можете выбирать узлы в графе, чтобы получить дополнительную информацию о каждом операторе и его параметрах.
Поддержка различных форматов: Netron поддерживает множество форматов моделей глубокого обучения, включая ONNX, TensorFlow, PyTorch, CoreML и другие.
Возможность анализа параметров: Вы можете просматривать параметры и веса модели, что полезно для понимания, какие значения используются в различных частях модели.

Netron удобен для разработчиков и исследователей в области машинного обучения и глубокого обучения, так как он облегчает визуализацию и анализ моделей, что помогает в понимании и отладке сложных нейронных сетей.

Этот инструмент позволяет быстро посмотреть модели, исследовать их структуру и параметры, что упрощает работу с глубокими нейронными сетями.

Подробнее про Netron в статьях: Visualizing your Neural Network with Netron и Visualize Keras Neural Networks with Netron.

Видео про Netron:

ONNX-модель ard_regression_float.onnx представлена на рис.6:

Рис.6. ONNX-представление модели ARDRegression (float) в Netron

Рис.6. ONNX-представление модели ard_regression_float.onnx в Netron

ONNX-оператор ai.onnx.ml LinearRegressor() представляет собой часть стандарта ONNX, который описывает модель для задачи регрессии. Этот оператор применяется для выполнения регрессии, то есть предсказания числовых значений (непрерывных) на основе входных признаков.

Он принимает на вход параметры модели, такие как веса и смещение, а также входные признаки, и выполняет линейную регрессию. Линейная регрессия оценивает параметры (веса) для каждого входного признака и затем выполняет линейную комбинацию этих признаков с весами для получения предсказания.

Этот оператор выполняет следующие шаги:

Принимает веса и смещение модели, а также входные признаки.
Для каждого примера входных данных выполняет линейную комбинацию весов с соответствующими признаками.
Добавляет смещение к полученному значению.

Результат представляет собой прогноз целевой переменной в задаче регрессии.

Параметры оператора LinearRegressor() представлены на рис.7.

Рис.7. Свойства оператора LinearRegressor модели ARDRegression (float) в Netron

Рис.7. Свойства оператора LinearRegressor() модели ard_regression_float.onnx в Netron

ONNX-модель ard_regression_double.onnx изображена на рис.8:

Рис.8. ONNX-представление модели ARDRegression (double) в Netron

Рис.8. ONNX-представление модели ard_regression_double.onnx в Netron

Параметры ONNX-операторов MatMul(), Add() и Reshape() приведены на рис.9-11.

Рис.9. Свойства оператора MatMul модели ARDRegression (double) в Netron

Рис.9. Свойства оператора MatMul модели ard_regression_double.onnx в Netron

ONNX-оператор MatMul (матричное умножение) выполняет умножение двух матриц.

Он принимает два входа: две матрицы и возвращает их матричное произведение.

Если у вас есть две матрицы A и B, то результат Matmul(A, B) - это матрица C, где каждый элемент C[i][j] вычисляется как сумма произведений элементов строки i матрицы A на элементы столбца j матрицы B.

Рис.10. Свойства оператора Add модели ARDRegression (double) в Netron

Рис.10. Свойства оператора Add модели ard_regression_double.onnx в Netron

ONNX-оператор Add (сложение) выполняет покомпонентное сложение двух тензоров или массивов одинаковой формы.

Он принимает два входа и возвращает результат, где каждый элемент результирующего тензора равен сумме соответствующих элементов входных тензоров.

Рис.11. Свойства оператора Reshape модели ARDRegression (double) в Netron

Рис.11. Свойства оператора Reshape модели ard_regression_double.onnx в Netron

ONNX-оператор Reshape(-1,1) используется для изменения формы (или размерности) входных данных. В этом операторе значение -1 для размерности означает, что размер данной размерности должен быть автоматически вычислен на основе остальных размерностей, чтобы обеспечить согласованность данных.

Значение 1 второй размерности указывает, что после преобразования формы каждый элемент будет иметь одну единственную подразмерность.

2.1.2. sklearn.linear_model.BayesianRidge

BayesianRidge - это метод регрессии, который использует байесовский подход для оценки параметров модели. Этот метод позволяет моделировать априорное распределение параметров и обновлять его с учетом данных, чтобы получить апостериорное распределение параметров.

BayesianRidge представляет собой байесовский метод регрессии, предназначенный для предсказания зависимой переменной на основе одной или нескольких независимых переменных.

Принцип работы BayesianRidge:

Априорное распределение параметров: Начинается с определения априорного распределения параметров модели. Это распределение представляет собой априорные знания или предположения о параметрах модели до учета данных. В случае BayesianRidge используются априорные распределения с Гауссовской формой.
Обновление распределения параметров: После того как априорное распределение параметров задано, оно обновляется на основе данных. Это делается с использованием метода байесовской теории, где апостериорное распределение параметров вычисляется с учетом данных. Важным аспектом является оценка гиперпараметров, которые влияют на форму апостериорного распределения.
Предсказание: После оценки апостериорного распределения параметров, можно делать предсказания для новых наблюдений. В результате получается распределение прогнозов, а не просто одно точечное значение, что позволяет учесть неопределенность в предсказаниях.

Преимущества BayesianRidge:

Учет неопределенности: BayesianRidge позволяет учитывать неопределенность в параметрах и предсказаниях модели. Вместо точечных прогнозов предоставляются доверительные интервалы.
Регуляризация: Байесовский метод регрессии может быть полезен для регуляризации моделей, что помогает предотвратить переобучение.
Автоматический отбор признаков: BayesianRidge может автоматически определять важность признаков, уменьшая веса незначительных признаков.

Ограничения BayesianRidge:

Вычислительная сложность: Метод требует вычислительных ресурсов для оценки параметров и вычисления апостериорного распределения.
Высокий уровень абстракции: Для понимания и использования BayesianRidge может потребоваться более глубокое понимание байесовской статистики.
Не всегда наилучший выбор: BayesianRidge может быть не самым подходящим методом в некоторых задачах регрессии, особенно если данных мало.

BayesianRidge полезен в задачах регрессии, где неопределенность параметров и предсказаний важна, и в случаях, когда требуется регуляризация модели.

2.1.2.1. Код создания модели BayesianRidge и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.BayesianRidge, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# BayesianRidge.py
# The code demonstrates the process of training BayesianRidge model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "BayesianRidge"
onnx_model_filename = data_path + "bayesian_ridge"

# create a Bayesian Ridge regression model
regression_model = BayesianRidge()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ", compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  BayesianRidge Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382628120845
Python  Mean Absolute Error: 6.347568012853758
Python  Mean Squared Error: 49.77815934891288
Python  
Python  BayesianRidge ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bayesian_ridge_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382627587808
Python  Mean Absolute Error: 6.347568283744705
Python  Mean Squared Error: 49.778160054267204
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  6
Python  
Python  BayesianRidge ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bayesian_ridge_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382628120845
Python  Mean Absolute Error: 6.347568012853758
Python  Mean Squared Error: 49.77815934891288
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Рис.12. Результат работы скрипта BayesianRidge.py (float)

Рис.12. Результат работы скрипта BayesianRidge.py (float ONNX)

2.1.2.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели bayesian_ridge_float.onnx и bayesian_ridge_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                BayesianRidge.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "BayesianRidge"
#define   ONNXFilenameFloat  "bayesian_ridge_float.onnx"
#define   ONNXFilenameDouble "bayesian_ridge_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

BayesianRidge (EURUSD,H1)       Testing ONNX float: BayesianRidge (bayesian_ridge_float.onnx)
BayesianRidge (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382627587808
BayesianRidge (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3475682837447049
BayesianRidge (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781600542671896
BayesianRidge (EURUSD,H1)       
BayesianRidge (EURUSD,H1)       Testing ONNX double: BayesianRidge (bayesian_ridge_double.onnx)
BayesianRidge (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382628120845
BayesianRidge (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3475680128537624
BayesianRidge (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781593489128866

Сравнение с оригинальной моделью:

Testing ONNX float: BayesianRidge (bayesian_ridge_float.onnx)
Python  Mean Absolute Error: 6.347568012853758
MQL5:   Mean Absolute Error: 6.3475682837447049

Testing ONNX double: BayesianRidge (bayesian_ridge_double.onnx)
Python  Mean Absolute Error: 6.347568012853758
MQL5:   Mean Absolute Error: 6.3475680128537624

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 13 знаков после запятой.

2.1.2.3. ONNX-представление моделей bayesian_ridge_float.onnx и bayesian_ridge_double.onnx

Рис.13. ONNX-представление модели bayesian_ridge_float.onnx в Netron

Рис.14. ONNX-представление модели bayesian_ridge_double.onnx в Netron

Примечание к методам ElasticNet и ElasticNetCV

ElasticNet и ElasticNetCV - это два связанных метода в машинном обучении, используемые для регуляризации моделей регрессии, особенно линейной регрессии. Они имеют общий функционал, но есть различия в способе использования и применения.

ElasticNet (Elastic Net Regression):

Принцип работы: ElasticNet - это метод регрессии, который объединяет в себе регуляризацию Лассо (L1-регуляризация) и регуляризацию Риджа (L2-регуляризация). Он добавляет к функции потерь два компонента регуляризации: один, который штрафует модель за большие абсолютные значения коэффициентов (как Лассо), и другой, который штрафует модель за большие квадраты коэффициентов (как Ридж).
Преимущества: ElasticNet обычно используется, когда в данных присутствует мультиколлинеарность (когда признаки сильно коррелированы) и когда необходимо сокращение размерности, а также контроль над значениями коэффициентов.

ElasticNetCV (Elastic Net Cross-Validation):

Принцип работы: ElasticNetCV - это расширение ElasticNet, которое включает в себя автоматический подбор оптимальных гиперпараметров альфа (коэффициент смешивания между L1 и L2 регуляризацией) и лямбда (сила регуляризации) с использованием кросс-валидации. Он выполняет перебор различных значений альфа и лямбда и выбирает комбинацию, которая дает наилучшую производительность на кросс-валидации.
Преимущества: ElasticNetCV автоматически настраивает параметры модели на основе кросс-валидации, что позволяет выбрать оптимальные значения гиперпараметров без необходимости ручной настройки. Это делает его более удобным для использования и помогает предотвратить переобучение модели.

Итак, основное отличие между ElasticNet и ElasticNetCV заключается в том, что ElasticNet - это сам метод регрессии, который применяется к данным, а ElasticNetCV - это инструмент, который автоматически находит оптимальные значения гиперпараметров для модели ElasticNet с использованием кросс-валидации. ElasticNetCV полезен, когда вам нужно найти лучшие параметры модели и сделать процесс настройки более автоматизированным.

2.1.3. sklearn.linear_model.ElasticNet

ElasticNet - это метод регрессии, который представляет собой комбинацию L1 (Lasso) и L2 (Ridge) регуляризации.

Этот метод используется для регрессии, то есть для предсказания числовых значений целевой переменной на основе набора признаков. ElasticNet помогает управлять переобучением и учитывает как L1, так и L2 штрафы на коэффициенты модели.

Принцип работы ElasticNet:

Входные данные: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие значения целевой переменной.
Целевая функция: ElasticNet минимизирует функцию потерь, которая включает две компоненты - среднеквадратичное отклонение (MSE) и две регуляризации: L1 (Lasso) и L2 (Ridge).
Это означает, что целевая функция имеет следующий вид:
Целевая функция = MSE + α * L1 + β * L2
Где α и β - гиперпараметры, которые контролируют веса L1 и L2 регуляризации соответственно.
Подбор оптимальных α и β: Для нахождения наилучших значений α и β обычно используется метод перекрестной проверки (кросс-валидации). Это позволяет выбрать оптимальные значения, которые обеспечивают хороший баланс между уменьшением переобучения и сохранением важных признаков.
Обучение модели: ElasticNet обучает модель с учетом оптимальных α и β, минимизируя целевую функцию.
Предсказание: После обучения модели ElasticNet может использоваться для предсказания значений целевой переменной для новых данных.

Преимущества ElasticNet:

Способность к отбору признаков: ElasticNet способен автоматически выбирать наиболее важные признаки, устанавливая веса при незначимых признаках на ноль (аналогично Lasso).
Управление переобучением: ElasticNet позволяет управлять переобучением благодаря регуляризации L1 и L2.
Работа с мультиколлинеарностью: Этот метод полезен при наличии мультиколлинеарности (высокой корреляции между признаками), так как L2-регуляризация может уменьшить влияние мультиколлинеарных признаков.

Ограничения ElasticNet:

Требуется настройка гиперпараметров α и β, что может быть нетривиальной задачей.
В зависимости от выбора параметров, ElasticNet может оставить слишком мало или слишком много признаков, что может повлиять на качество модели.

ElasticNet - это мощный метод регрессии, который может быть полезен в задачах, где важен отбор признаков и управление переобучением.

2.1.3.1. Код создания модели ElasticNet и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.ElasticNet, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# ElasticNet.py
# The code demonstrates the process of training ElasticNet model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "ElasticNet"
onnx_model_filename = data_path + "elastic_net"

# create an ElasticNet model
regression_model = ElasticNet()

# fit the model to the data
regression_model.fit(X,y)

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  ElasticNet Original model (double)
Python  R-squared (Coefficient of determination): 0.9962377031744798
Python  Mean Absolute Error: 6.344394662876524
Python  Mean Squared Error: 49.78556489812415
Python  
Python  ElasticNet ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\elastic_net_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962377032416807
Python  Mean Absolute Error: 6.344395027824294
Python  Mean Squared Error: 49.78556400887057
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  6
Python  float ONNX model precision:  5
Python  
Python  ElasticNet ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\elastic_net_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962377031744798
Python  Mean Absolute Error: 6.344394662876524
Python  Mean Squared Error: 49.78556489812415
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Рис.15. Результат работы скрипта ElasticNet.py (float ONNX)

2.1.3.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели elastic_net_double.onnx и elastic_net_float.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                   ElasticNet.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ElasticNet"
#define   ONNXFilenameFloat  "elastic_net_float.onnx"
#define   ONNXFilenameDouble "elastic_net_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

ElasticNet (EURUSD,H1)  Testing ONNX float: ElasticNet (elastic_net_float.onnx)
ElasticNet (EURUSD,H1)  MQL5:   R-Squared (Coefficient of determination): 0.9962377032416807
ElasticNet (EURUSD,H1)  MQL5:   Mean Absolute Error: 6.3443950278242944
ElasticNet (EURUSD,H1)  MQL5:   Mean Squared Error: 49.7855640088705869
ElasticNet (EURUSD,H1)  
ElasticNet (EURUSD,H1)  Testing ONNX double: ElasticNet (elastic_net_double.onnx)
ElasticNet (EURUSD,H1)  MQL5:   R-Squared (Coefficient of determination): 0.9962377031744798
ElasticNet (EURUSD,H1)  MQL5:   Mean Absolute Error: 6.3443946628765220
ElasticNet (EURUSD,H1)  MQL5:   Mean Squared Error: 49.7855648981241217

Сравнение с оригинальной моделью:

Testing ONNX float: ElasticNet (elastic_net_float.onnx)
Python  Mean Absolute Error: 6.344394662876524
MQL5:   Mean Absolute Error: 6.3443950278242944
  
Testing ONNX double: ElasticNet (elastic_net_double.onnx)
Python  Mean Absolute Error: 6.344394662876524
MQL5:   Mean Absolute Error: 6.3443946628765220

Точность MAE ONNX float: 5 знаков после запятой, точность MAE ONNX double 14 знаков после запятой.

2.1.3.3. ONNX-представление моделей elastic_net_float.onnx и elastic_net_double.onnx

Рис.16. ONNX-представление модели elastic_net_float.onnx в Netron

Рис.17. ONNX-представление модели elastic_net_double.onnx в Netron

2.1.4. sklearn.linear_model.ElasticNetCV

ElasticNetCV - это расширение метода ElasticNet, предназначенное для автоматического подбора оптимальных значений гиперпараметров α и β (L1 и L2 регуляризации) с использованием метода кросс-валидации.

Это позволяет находить наилучшее сочетание регуляризаций для модели ElasticNet без необходимости ручной настройки параметров.

Принцип работы ElasticNetCV:

Входные данные: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие значения целевой переменной.
Определение диапазона α и β: Пользователь указывает диапазон значений α и β, которые следует рассмотреть в процессе оптимизации. Эти значения обычно выбираются на логарифмической шкале.
Разбиение данных: Набор данных разбивается на несколько фолдов для кросс-валидации. Каждый фолд используется как тестовый набор данных, в то время как остальные фолды используются для обучения.
Кросс-валидация: Для каждой комбинации α и β из указанного диапазона выполняется кросс-валидация. Модель ElasticNet обучается на обучающих данных, а затем оценивается на тестовых данных.
Оценка производительности: Для каждой комбинации α и β вычисляется средняя ошибка на тестовых наборах данных в рамках кросс-валидации.
Выбор оптимальных параметров: Находятся значения α и β, соответствующие минимальной средней ошибке, полученной в результате кросс-валидации.
Обучение модели с оптимальными параметрами: Модель ElasticNetCV обучается с использованием найденных оптимальных значений α и β.
Предсказание: После обучения модель может использоваться для предсказания значений целевой переменной для новых данных.

Преимущества ElasticNetCV:

Автоматический выбор гиперпараметров: ElasticNetCV автоматически находит оптимальные значения α и β, что упрощает настройку модели.
Предотвращение переобучения: Кросс-валидация помогает выбрать модель с хорошей обобщающей способностью.
Устойчивость к шуму: Этот метод устойчив к шумам в данных и может найти наилучшее сочетание регуляризаций, учитывая шум.

Ограничения ElasticNetCV:

Вычислительная сложность: Проведение кросс-валидации для большого диапазона параметров может потребовать значительного времени.
Оптимальные параметры зависят от выбора диапазона: Результаты могут зависеть от того, какой диапазон α и β был выбран, поэтому важно тщательно настраивать этот диапазон.

ElasticNetCV - это мощный инструмент для автоматической настройки регуляризации в модели ElasticNet и повышения ее производительности.

2.1.4.1. Код создания модели ElasticNetCV и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.ElasticNetCV, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# ElasticNetCV.py
# The code demonstrates the process of training ElasticNetCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "ElasticNetCV"
onnx_model_filename = data_path + "elastic_net_cv"

# create an ElasticNetCV model
regression_model = ElasticNetCV()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  ElasticNetCV Original model (double)
Python  R-squared (Coefficient of determination): 0.9962137763338385
Python  Mean Absolute Error: 6.334487104423225
Python  Mean Squared Error: 50.10218299945999
Python  
Python  ElasticNetCV ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\elastic_net_cv_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962137770260989
Python  Mean Absolute Error: 6.334486542922601
Python  Mean Squared Error: 50.10217383894468
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  5
Python  
Python  ElasticNetCV ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\elastic_net_cv_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962137763338385
Python  Mean Absolute Error: 6.334487104423225
Python  Mean Squared Error: 50.10218299945999
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

< Рис.18. Результат работы скрипта ElasticNetCV.py (float ONNX)

Рис.18. Результат работы скрипта ElasticNetCV.py (float ONNX)

2.1.4.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели elastic_net_cv_float.onnx и elastic_net_cv_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                 ElasticNetCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ElasticNetCV"
#define   ONNXFilenameFloat  "elastic_net_cv_float.onnx"
#define   ONNXFilenameDouble "elastic_net_cv_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

ElasticNetCV (EURUSD,H1)        Testing ONNX float: ElasticNetCV (elastic_net_cv_float.onnx)
ElasticNetCV (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9962137770260989
ElasticNetCV (EURUSD,H1)        MQL5:   Mean Absolute Error: 6.3344865429226038
ElasticNetCV (EURUSD,H1)        MQL5:   Mean Squared Error: 50.1021738389446938
ElasticNetCV (EURUSD,H1)        
ElasticNetCV (EURUSD,H1)        Testing ONNX double: ElasticNetCV (elastic_net_cv_double.onnx)
ElasticNetCV (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9962137763338385
ElasticNetCV (EURUSD,H1)        MQL5:   Mean Absolute Error: 6.3344871044232205
ElasticNetCV (EURUSD,H1)        MQL5:   Mean Squared Error: 50.1021829994599983

Сравнение с оригинальной моделью:

Testing ONNX float: ElasticNetCV (elastic_net_cv_float.onnx)
Python  Mean Absolute Error: 6.334487104423225
MQL5:   Mean Absolute Error: 6.3344865429226038

Testing ONNX double: ElasticNetCV (elastic_net_cv_double.onnx)
Python  Mean Absolute Error: 6.334487104423225
MQL5:   Mean Absolute Error: 6.3344871044232205

Точность MAE ONNX float: 5 знаков после запятой, точность MAE ONNX double 14 знаков после запятой.

2.1.4.3. ONNX-представление моделей elastic_net_cv_float.onnx и elastic_net_cv_double.onnx

Рис.19. ONNX-представление модели elastic_net_cv_float.onnx в Netron

Рис.20. ONNX-представление модели elastic_net_cv_double.onnx в Netron

2.1.5. sklearn.linear_model.HuberRegressor

HuberRegressor - это метод машинного обучения, используемый для задачи регрессии, который является модификацией метода наименьших квадратов (Ordinary Least Squares, OLS) и спроектирован для быть устойчивым к выбросам в данных.

В отличие от OLS, который минимизирует квадраты ошибок, HuberRegressor минимизирует комбинацию квадратов ошибок и абсолютных значений ошибок. Это позволяет методу более устойчиво работать в случае наличия выбросов в данных.

Принцип работы HuberRegressor:

Входные данные: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие значения целевой переменной.
Функция потерь Huber: HuberRegressor использует функцию потерь Huber, которая объединяет квадратичную функцию потерь для малых ошибок и линейную функцию потерь для больших ошибок. Это делает метод более устойчивым к выбросам.
Обучение модели: Модель обучается на данных с использованием функции потерь Huber. Во время обучения она настраивает веса (коэффициенты) для каждого признака и смещение.
Предсказание: После обучения модель может использоваться для предсказания значений целевой переменной для новых данных.

Преимущества HuberRegressor:

Устойчивость к выбросам: HuberRegressor более устойчив к выбросам в данных по сравнению с OLS, что делает его полезным в задачах, где данные могут содержать аномальные значения.
Оценка ошибок: Функция потерь Huber способствует оценке ошибок предсказания, что может быть полезно для анализа результатов модели.
Уровень регуляризации: HuberRegressor также может включать в себя уровень регуляризации, что может уменьшить переобучение.

Ограничения HuberRegressor:

Не так точен, как OLS в случае отсутствия выбросов: В случае, если в данных нет выбросов, OLS может дать более точные результаты.
Настройка параметра: HuberRegressor имеет параметр, который определяет, какой порог считать "большим" для переключения на линейную функцию потерь. Этот параметр требует настройки.

HuberRegressor полезен в задачах регрессии, где данные могут содержать выбросы и требуется модель, которая устойчива к таким аномалиям.

2.1.5.1. Код создания модели HuberRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.HuberRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# HuberRegressor.py
# The code demonstrates the process of training HuberRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import HuberRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "HuberRegressor"
onnx_model_filename = data_path + "huber_regressor"

# create a Huber Regressor model
huber_regressor_model = HuberRegressor()

# fit the model to the data
huber_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = huber_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(huber_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(huber_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  HuberRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9962363935647066
Python  Mean Absolute Error: 6.341633708569641
Python  Mean Squared Error: 49.80289464784336
Python  
Python  HuberRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\huber_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962363944236795
Python  Mean Absolute Error: 6.341633300252807
Python  Mean Squared Error: 49.80288328126165
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  6
Python  ONNX: MSE matching decimal places:  4
Python  float ONNX model precision:  6
Python  
Python  HuberRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\huber_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962363935647066
Python  Mean Absolute Error: 6.341633708569641
Python  Mean Squared Error: 49.80289464784336
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Рис.21. Результат работы скрипта HuberRegressor.py (float ONNX)

2.1.5.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели huber_regressor_float.onnx и huber_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                               HuberRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "HuberRegressor"
#define   ONNXFilenameFloat  "huber_regressor_float.onnx"
#define   ONNXFilenameDouble "huber_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

HuberRegressor (EURUSD,H1)      Testing ONNX float: HuberRegressor (huber_regressor_float.onnx)
HuberRegressor (EURUSD,H1)      MQL5:   R-Squared (Coefficient of determination): 0.9962363944236795
HuberRegressor (EURUSD,H1)      MQL5:   Mean Absolute Error: 6.3416333002528074
HuberRegressor (EURUSD,H1)      MQL5:   Mean Squared Error: 49.8028832812616571
HuberRegressor (EURUSD,H1)      
HuberRegressor (EURUSD,H1)      Testing ONNX double: HuberRegressor (huber_regressor_double.onnx)
HuberRegressor (EURUSD,H1)      MQL5:   R-Squared (Coefficient of determination): 0.9962363935647066
HuberRegressor (EURUSD,H1)      MQL5:   Mean Absolute Error: 6.3416337085696410
HuberRegressor (EURUSD,H1)      MQL5:   Mean Squared Error: 49.8028946478433525

Сравнение с оригинальной моделью:

Testing ONNX float: HuberRegressor (huber_regressor_float.onnx)
Python  Mean Absolute Error: 6.341633708569641
MQL5:   Mean Absolute Error: 6.3416333002528074
      
Testing ONNX double: HuberRegressor (huber_regressor_double.onnx)
Python  Mean Absolute Error: 6.341633708569641
MQL5:   Mean Absolute Error: 6.3416337085696410

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 14 знаков после запятой.

2.1.5.3. ONNX-представление моделей huber_regressor_float.onnx и huber_regressor_double.onnx

Рис.22. ONNX-представление модели huber_regressor_float.onnx в Netron

Рис.23. ONNX-представление модели huber_regressor_double.onnx в Netron

2.1.6. sklearn.linear_model.Lars

LARS (Least Angle Regression) - это метод машинного обучения, используемый для задачи регрессии. Он представляет собой алгоритм, который строит модель линейной регрессии, выбирая активные признаки (переменные) по мере обучения.

LARS пытается найти наименьшее количество признаков, которые дают наилучшее приближение к целевой переменной.

Принцип работы LARS:

Входные данные: Начинаем с исходного набора данных, включая признаки (независимые переменные) и соответствующие значения целевой переменной.
Инициализация: Начинается с нулевой модели, то есть без активных признаков. Все коэффициенты устанавливаются в ноль.
Выбор признака: На каждом шаге LARS выбирает признак, который наиболее коррелирует с остатками модели. Затем этот признак добавляется в модель, и соответствующий коэффициент настраивается с использованием метода наименьших квадратов.
Регрессия вдоль активных признаков: После добавления признака в модель, LARS обновляет коэффициенты всех активных признаков, чтобы учесть изменения в новой модели.
Шаги повторяются: Этот процесс повторяется до тех пор, пока не будут выбраны все признаки или достигнут заданный критерий останова.
Получение прогноза: После обучения модели, она может использоваться для предсказания значений целевой переменной для новых данных.

Преимущества LARS:

Эффективность: LARS может быть эффективным методом, особенно когда число признаков велико, но только небольшое количество из них действительно влияет на целевую переменную.
Интерпретируемость: Так как LARS стремится выбрать только самые информативные признаки, модель остается относительно интерпретируемой.

Ограничения LARS:

Линейная модель: LARS строит линейную модель, что может быть недостаточно для моделирования сложных нелинейных зависимостей.
Неустойчивость к шуму: Метод может быть чувствителен к выбросам в данных.
Не способен обрабатывать мультиколлинеарность: Если признаки сильно коррелированы, LARS может столкнуться с проблемой мультиколлинеарности.

LARS полезен в задачах регрессии, где важно выбрать самые информативные признаки и построить линейную модель с минимальным количеством признаков.

2.1.6.1. Код создания модели Lars и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.Lars, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# Lars.py
# The code demonstrates the process of training Lars model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lars
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "Lars"
onnx_model_filename = data_path + "lars"

# create a Lars Regressor model
lars_regressor_model = Lars()

# fit the model to the data
lars_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = lars_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(lars_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(lars_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  Lars Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336425
Python  Mean Squared Error: 49.778140171281784
Python  
Python  Lars ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lars_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382641628886
Python  Mean Absolute Error: 6.3477377671679385
Python  Mean Squared Error: 49.77814147404787
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  Lars ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lars_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336425
Python  Mean Squared Error: 49.778140171281784
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  15
Python  double ONNX model precision:  15

Рис.24. Результат работы скрипта Lars.py (float ONNX)

2.1.6.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели lars_cv_float.onnx и lars_cv_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                         Lars.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "Lars"
#define   ONNXFilenameFloat  "lars_float.onnx"
#define   ONNXFilenameDouble "lars_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

Lars (EURUSD,H1)        Testing ONNX float: Lars (lars_float.onnx)
Lars (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9962382641628886
Lars (EURUSD,H1)        MQL5:   Mean Absolute Error: 6.3477377671679385
Lars (EURUSD,H1)        MQL5:   Mean Squared Error: 49.7781414740478638
Lars (EURUSD,H1)        
Lars (EURUSD,H1)        Testing ONNX double: Lars (lars_double.onnx)
Lars (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9962382642613388
Lars (EURUSD,H1)        MQL5:   Mean Absolute Error: 6.3477379263364302
Lars (EURUSD,H1)        MQL5:   Mean Squared Error: 49.7781401712817768

Сравнение с оригинальной моделью:

Testing ONNX float: Lars (lars_float.onnx)
Python  Mean Absolute Error: 6.347737926336425
MQL5:   Mean Absolute Error: 6.3477377671679385

Testing ONNX double: Lars (lars_double.onnx)
Python  Mean Absolute Error: 6.347737926336425
MQL5:   Mean Absolute Error: 6.3477379263364302

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 13 знаков после запятой.

2.1.6.3. ONNX-представление моделей lars_float.onnx и lars_double.onnx

Рис.25. ONNX-представление модели lars_float.onnx в Netron

Рис.26. ONNX-представление модели lars_double.onnx в Netron

2.1.7. sklearn.linear_model.LarsCV

LarsCV - это вариант метода LARS (Least Angle Regression), который автоматически выбирает оптимальное количество признаков для включения в модель, используя кросс-валидацию.

Этот метод позволяет найти баланс между моделью, которая хорошо обобщает данные, и моделью, которая использует минимальное количество признаков.

Принцип работы LarsCV:

Входные данные: Начинаем с исходного набора данных, включая признаки (независимые переменные) и соответствующие значения целевой переменной.
Инициализация: Начинается с нулевой модели, то есть без активных признаков. Все коэффициенты устанавливаются в ноль.
Кросс-валидация: LarsCV выполняет кросс-валидацию для разного количества включенных признаков. Это позволяет оценить производительность модели при разных наборах признаков.
Выбор оптимального числа признаков: LarsCV выбирает количество признаков, при котором достигается наилучшая производительность модели, как определено с использованием кросс-валидации.
Обучение модели: Модель обучается с использованием выбранного количества признаков и соответствующих коэффициентов.
Предсказание: После обучения модель может использоваться для предсказания значений целевой переменной для новых данных.

Преимущества LarsCV:

Автоматический выбор признаков: LarsCV позволяет автоматически выбирать оптимальное количество признаков, что упрощает процесс настройки модели.
Интерпретируемость: Как и обычный LARS, LarsCV сохраняет относительно высокую интерпретируемость модели.
Эффективность: Метод может быть эффективным, особенно когда набор данных имеет много признаков, но только небольшое количество из них важно.

Ограничения LarsCV:

Линейная модель: LarsCV строит линейную модель, что может быть недостаточно для моделирования сложных нелинейных зависимостей.
Неустойчивость к шуму: Метод может быть чувствителен к выбросам в данных.
Не способен обрабатывать мультиколлинеарность: Если признаки сильно коррелированы, LarsCV может столкнуться с проблемой мультиколлинеарности.

LarsCV полезен в задачах регрессии, где важно автоматически выбрать наилучший набор признаков, используемых в модели, и при этом сохранить интерпретируемость модели.

2.1.7.1. Код создания модели LarsCV и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.LarsCV, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# LarsCV.py
# The code demonstrates the process of training LarsCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LarsCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "LarsCV"
onnx_model_filename = data_path + "lars_cv"

# create a LarsCV Regressor model
larscv_regressor_model = LarsCV()

# fit the model to the data
larscv_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = larscv_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(larscv_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(larscv_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  LarsCV Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642612767
Python  Mean Absolute Error: 6.3477379221400145
Python  Mean Squared Error: 49.77814017210321
Python  
Python  LarsCV ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lars_cv_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382640824089
Python  Mean Absolute Error: 6.347737845846069
Python  Mean Squared Error: 49.778142539016564
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  ONNX: MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  LarsCV ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lars_cv_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642612767
Python  Mean Absolute Error: 6.3477379221400145
Python  Mean Squared Error: 49.77814017210321
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  16

Рис.27. Результат работы скрипта LarsCV.py (float ONNX)

2.1.7.2. Код на MQL5 для исполнения ONNX-моделей

//+------------------------------------------------------------------+
//|                                                       LarsCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LarsCV"
#define   ONNXFilenameFloat  "lars_cv_float.onnx"
#define   ONNXFilenameDouble "lars_cv_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

LarsCV (EURUSD,H1)      Testing ONNX float: LarsCV (lars_cv_float.onnx)
LarsCV (EURUSD,H1)      MQL5:   R-Squared (Coefficient of determination): 0.9962382640824089
LarsCV (EURUSD,H1)      MQL5:   Mean Absolute Error: 6.3477378458460691
LarsCV (EURUSD,H1)      MQL5:   Mean Squared Error: 49.7781425390165566
LarsCV (EURUSD,H1)      
LarsCV (EURUSD,H1)      Testing ONNX double: LarsCV (lars_cv_double.onnx)
LarsCV (EURUSD,H1)      MQL5:   R-Squared (Coefficient of determination): 0.9962382642612767
LarsCV (EURUSD,H1)      MQL5:   Mean Absolute Error: 6.3477379221400145
LarsCV (EURUSD,H1)      MQL5:   Mean Squared Error: 49.7781401721031642

Сравнение с оригинальной моделью:

Testing ONNX float: LarsCV (lars_cv_float.onnx)
Python  Mean Absolute Error: 6.3477379221400145
MQL5:   Mean Absolute Error: 6.3477378458460691

Testing ONNX double: LarsCV (lars_cv_double.onnx)
Python  Mean Absolute Error: 6.3477379221400145
MQL5:   Mean Absolute Error: 6.3477379221400145

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 16 знаков после запятой.

2.1.7.3. ONNX-представление моделей lars_cv_float.onnx и lars_cv_double.onnx

Рис.28. ONNX-представление модели lars_cv_float.onnx в Netron

Рис.29. ONNX-представление модели lars_cv_double.onnx в Netron

2.1.8. sklearn.linear_model.Lasso

Lasso (Least Absolute Shrinkage and Selection Operator) является методом регрессии, который применяется для выбора наиболее важных признаков и уменьшения размерности модели.

Он достигает этого путем добавления штрафа на сумму абсолютных значений коэффициентов (L1-регуляризация) в задаче оптимизации линейной регрессии.

Принцип работы Lasso:

Входные данные: Начинаем с исходного набора данных, включая признаки (независимые переменные) и соответствующие значения целевой переменной.
Целевая функция: Целевая функция в Lasso включает в себя сумму квадратов ошибок регрессии, а также штраф на сумму абсолютных значений коэффициентов при признаках.
Оптимизация: Модель Lasso обучается путем минимизации целевой функции, что приводит к тому, что некоторые коэффициенты становятся равными нулю, что эквивалентно исключению соответствующих признаков из модели.
Выбор оптимального значения штрафа: В Lasso существует гиперпараметр, который определяет силу регуляризации. Выбор оптимального значения этого гиперпараметра может потребовать кросс-валидации.
Получение прогноза: После обучения модель может использоваться для предсказания значений целевой переменной для новых данных.

Преимущества Lasso:

Выбор признаков: Lasso автоматически выбирает наиболее важные признаки, исключая менее важные из модели. Это позволяет уменьшить размерность данных и упростить модель.
Регуляризация: Штраф на сумму абсолютных значений коэффициентов помогает предотвратить переобучение модели и увеличить ее обобщающую способность.
Интерпретируемость: Поскольку Lasso исключает некоторые признаки, модель остается относительно интерпретируемой.

Ограничения Lasso:

Линейная модель: Lasso строит линейную модель, что может быть недостаточно для моделирования сложных нелинейных зависимостей.
Неустойчивость к шуму: Метод может быть чувствителен к выбросам в данных.
Не способен обрабатывать мультиколлинеарность: Если признаки сильно коррелированы, Lasso может столкнуться с проблемой мультиколлинеарности.

Lasso полезен в задачах регрессии, где важно выбрать наиболее важные признаки и уменьшить размерность модели, при этом сохраняя интерпретируемость.

2.1.8.1. Код создания модели Lasso и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.Lasso, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# Lasso.py
# The code demonstrates the process of training Lasso model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "Lasso"
onnx_model_filename = data_path + "lasso"

# create a Lasso model
lasso_model = Lasso()

# fit the model to the data
lasso_model.fit(X, y)

# predict values for the entire dataset
y_pred = lasso_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(lasso_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(lasso_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  Lasso Original model (double)
Python  R-squared (Coefficient of determination): 0.9962381735682287
Python  Mean Absolute Error: 6.346393791922984
Python  Mean Squared Error: 49.77934029129379
Python  
Python  Lasso ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962381720269486
Python  Mean Absolute Error: 6.346395056911361
Python  Mean Squared Error: 49.77936068668213
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  5
Python  
Python  Lasso ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962381735682287
Python  Mean Absolute Error: 6.346393791922984
Python  Mean Squared Error: 49.77934029129379
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Рис.30. Результат работы скрипта Lasso.py (float ONNX)

2.1.8.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели lasso_float.onnx и lasso_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                        Lasso.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "Lasso"
#define   ONNXFilenameFloat  "lasso_float.onnx"
#define   ONNXFilenameDouble "lasso_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

Lasso (EURUSD,H1)       Testing ONNX float: Lasso (lasso_float.onnx)
Lasso (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962381720269486
Lasso (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3463950569113612
Lasso (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7793606866821037
Lasso (EURUSD,H1)       
Lasso (EURUSD,H1)       Testing ONNX double: Lasso (lasso_double.onnx)
Lasso (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962381735682287
Lasso (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3463937919229840
Lasso (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7793402912937850

Сравнение с оригинальной моделью:

Testing ONNX float: Lasso (lasso_float.onnx)
Python  Mean Absolute Error: 6.346393791922984
MQL5:   Mean Absolute Error: 6.3463950569113612

Testing ONNX double: Lasso (lasso_double.onnx)
Python  Mean Absolute Error: 6.346393791922984
MQL5:   Mean Absolute Error: 6.3463937919229840

Точность MAE ONNX float: 5 знаков после запятой, точность MAE ONNX double 15 знаков после запятой.

2.1.8.3. ONNX-представление моделей lasso_float.onnx и lasso_double.onnx

Рис.31. ONNX-представление модели lasso_float.onnx в Netron

Рис.32. ONNX-представление модели lasso_double.onnx в Netron

2.1.9. sklearn.linear_model.LassoCV

LassoCV - это вариант метода Lasso (Least Absolute Shrinkage and Selection Operator), который автоматически выбирает оптимальное значение гиперпараметра регуляризации (alpha) с использованием кросс-валидации.

Этот метод позволяет найти баланс между уменьшением размерности модели (выбором важных признаков) и предотвращением переобучения, что делает его полезным для задач регрессии.

Принцип работы LassoCV:

Входные данные: Начинаем с исходного набора данных, включая признаки (независимые переменные) и соответствующие значения целевой переменной.
Инициализация: LassoCV инициализирует несколько различных значений гиперпараметра регуляризации (alpha), которые охватывают диапазон значений от низкого до высокого.
Кросс-валидация: Для каждого значения alpha, LassoCV выполняет кросс-валидацию, чтобы оценить производительность модели. Обычно используется метрика, такая как среднеквадратичная ошибка (MSE) или коэффициент детерминации (R^2).
Выбор оптимального alpha: LassoCV выбирает значение alpha, при котором модель достигает наилучшей производительности, как определено с использованием кросс-валидации.
Обучение модели: Модель Lasso обучается с использованием выбранного значения alpha, исключая менее важные признаки и применяя L1-регуляризацию.
Получение прогноза: После обучения модель может использоваться для предсказания значений целевой переменной для новых данных.

Преимущества LassoCV:

Автоматический выбор alpha: LassoCV позволяет автоматически выбирать оптимальное значение гиперпараметра alpha с использованием кросс-валидации, упрощая настройку модели.
Выбор признаков: LassoCV автоматически выбирает наиболее важные признаки, что уменьшает размерность модели и упрощает ее интерпретацию.
Регуляризация: Метод предотвращает переобучение модели с помощью L1-регуляризации.

Ограничения LassoCV:

Линейная модель: LassoCV строит линейную модель, что может быть недостаточно для моделирования сложных нелинейных зависимостей.
Неустойчивость к шуму: Метод может быть чувствителен к выбросам в данных.
Не способен обрабатывать мультиколлинеарность: Если признаки сильно коррелированы, LassoCV может столкнуться с проблемой мультиколлинеарности.

LassoCV полезен в задачах регрессии, где важно выбрать наиболее важные признаки и уменьшить размерность модели, при этом сохраняя интерпретируемость и предотвращая переобучение.

2.1.9.1. Код создания модели LassoCV и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.LassoCV, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# LassoCV.py
# The code demonstrates the process of training LassoCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "LassoCV"
onnx_model_filename = data_path + "lasso_cv"

# create a LassoCV Regressor model
lassocv_regressor_model = LassoCV()

# fit the model to the data
lassocv_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = lassocv_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(lassocv_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(lassocv_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  LassoCV Original model (double)
Python  R-squared (Coefficient of determination): 0.9962241428413416
Python  Mean Absolute Error: 6.33567334453819
Python  Mean Squared Error: 49.96500551028169
Python  
Python  LassoCV ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_cv_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.996224142876629
Python  Mean Absolute Error: 6.335673221332177
Python  Mean Squared Error: 49.96500504333324
Python  R^2 matching decimal places:  10
Python  MAE matching decimal places:  6
Python  ONNX: MSE matching decimal places:  6
Python  float ONNX model precision:  6
Python  
Python  LassoCV ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_cv_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962241428413416
Python  Mean Absolute Error: 6.33567334453819
Python  Mean Squared Error: 49.96500551028169
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  14
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  14

Рис.33. Результат работы скрипта LassoCV.py (float ONNX)

2.1.9.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели lasso_cv_float.onnx и lasso_cv_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                      LassoCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LassoCV"
#define   ONNXFilenameFloat  "lasso_cv_float.onnx"
#define   ONNXFilenameDouble "lasso_cv_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

2023.10.26 22:14:00.736 LassoCV (EURUSD,H1)     Testing ONNX float: LassoCV (lasso_cv_float.onnx)
2023.10.26 22:14:00.739 LassoCV (EURUSD,H1)     MQL5:   R-Squared (Coefficient of determination): 0.9962241428766290
2023.10.26 22:14:00.739 LassoCV (EURUSD,H1)     MQL5:   Mean Absolute Error: 6.3356732213321800
2023.10.26 22:14:00.739 LassoCV (EURUSD,H1)     MQL5:   Mean Squared Error: 49.9650050433332211
2023.10.26 22:14:00.748 LassoCV (EURUSD,H1)     
2023.10.26 22:14:00.748 LassoCV (EURUSD,H1)     Testing ONNX double: LassoCV (lasso_cv_double.onnx)
2023.10.26 22:14:00.753 LassoCV (EURUSD,H1)     MQL5:   R-Squared (Coefficient of determination): 0.9962241428413416
2023.10.26 22:14:00.753 LassoCV (EURUSD,H1)     MQL5:   Mean Absolute Error: 6.3356733445381899
2023.10.26 22:14:00.753 LassoCV (EURUSD,H1)     MQL5:   Mean Squared Error: 49.9650055102816992

Сравнение с оригинальной моделью:

Testing ONNX float: LassoCV (lasso_cv_float.onnx)
Python  Mean Absolute Error: 6.33567334453819
MQL5:   Mean Absolute Error: 6.3356732213321800
        
Testing ONNX double: LassoCV (lasso_cv_double.onnx)
Python  Mean Absolute Error: 6.33567334453819
MQL5:   Mean Absolute Error: 6.3356733445381899

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 13 знаков после запятой.

2.1.9.3. ONNX-представление моделей lasso_cv_float.onnx и lasso_cv_double.onnx

Рис.34. ONNX-представление модели lasso_cv_float.onnx в Netron

Рис.35. ONNX-представление модели lasso_cv_double.onnx в Netron

2.1.10. sklearn.linear_model.LassoLars

LassoLars - это комбинация двух методов: Lasso (Least Absolute Shrinkage and Selection Operator) и LARS (Least Angle Regression).

Этот метод используется для задач регрессии и объединяет преимущества обоих алгоритмов, что позволяет одновременно выбирать важные признаки и уменьшать размерность модели.

Принцип работы LassoLars:

Входные данные: Начинаем с исходного набора данных, включая признаки (независимые переменные) и соответствующие значения целевой переменной.
Инициализация: LassoLars начинает с нулевой модели, то есть без активных признаков. Все коэффициенты устанавливаются в ноль.
Пошаговый выбор признаков: Аналогично методу LARS, LassoLars на каждом шаге выбирает признак, который наиболее коррелирует с остатками модели, и добавляет его в модель. Затем коэффициент этого признака настраивается с использованием метода наименьших квадратов.
Применение L1-регуляризации: Одновременно с пошаговым выбором признаков LassoLars применяет L1-регуляризацию, добавляя штраф на сумму абсолютных значений коэффициентов. Это позволяет моделировать сложные зависимости и выбирать наиболее важные признаки.
Получение прогноза: После обучения модель может использоваться для предсказания значений целевой переменной для новых данных.

Преимущества LassoLars:

Выбор признаков: LassoLars автоматически выбирает наиболее важные признаки и уменьшает размерность модели, что помогает избежать переобучения и упростить интерпретацию.
Интерпретируемость: Метод сохраняет интерпретируемость модели, так как можно легко определить, какие признаки включены в модель и как они влияют на целевую переменную.
Регуляризация: LassoLars применяет L1-регуляризацию, что помогает предотвратить переобучение и повысить обобщающую способность модели.

Ограничения LassoLars:

Линейная модель: LassoLars строит линейную модель, что может быть недостаточно для моделирования сложных нелинейных зависимостей.
Неустойчивость к шуму: Метод может быть чувствителен к выбросам в данных.
Вычислительная сложность: Выбор признаков на каждом шаге и применение регуляризации может потребовать больше вычислительных ресурсов, чем простая линейная регрессия.

LassoLars полезен в задачах регрессии, где важно выбрать наиболее важные признаки, уменьшить размерность модели и сохранить интерпретируемость.

2.1.10.1. Код создания модели LassoLars и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.LassoLars, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# LassoLars.py
# The code demonstrates the process of training LassoLars model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoLars
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "LassoLars"
onnx_model_filename = data_path + "lasso_lars"

# create a LassoLars Regressor model
lassolars_regressor_model = LassoLars(alpha=0.1)

# fit the model to the data
lassolars_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = lassolars_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(lassolars_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(lassolars_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  LassoLars Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382633544077
Python  Mean Absolute Error: 6.3476035128950805
Python  Mean Squared Error: 49.778152172481896
Python  
Python  LassoLars ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382635045889
Python  Mean Absolute Error: 6.3476034814795375
Python  Mean Squared Error: 49.77815018516975
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  LassoLars ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382633544077
Python  Mean Absolute Error: 6.3476035128950805
Python  Mean Squared Error: 49.778152172481896
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  15
Python  double ONNX model precision:  16

Рис.36. Результат работы скрипта LassoLars.py (float)

2.1.10.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели lasso_lars_float.onnx и lasso_lars_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                    LassoLars.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LassoLars"
#define   ONNXFilenameFloat  "lasso_lars_float.onnx"
#define   ONNXFilenameDouble "lasso_lars_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

LassoLars (EURUSD,H1)   Testing ONNX float: LassoLars (lasso_lars_float.onnx)
LassoLars (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9962382635045889
LassoLars (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3476034814795375
LassoLars (EURUSD,H1)   MQL5:   Mean Squared Error: 49.7781501851697357
LassoLars (EURUSD,H1)   
LassoLars (EURUSD,H1)   Testing ONNX double: LassoLars (lasso_lars_double.onnx)
LassoLars (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9962382633544077
LassoLars (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3476035128950858
LassoLars (EURUSD,H1)   MQL5:   Mean Squared Error: 49.7781521724819029

Сравнение с оригинальной моделью:

Testing ONNX float: LassoLars (lasso_lars_float.onnx)
Python  Mean Absolute Error: 6.3476035128950805
MQL5:   Mean Absolute Error: 6.3476034814795375

Testing ONNX double: LassoLars (lasso_lars_double.onnx)
Python  Mean Absolute Error: 6.3476035128950805
MQL5:   Mean Absolute Error: 6.3476035128950858

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 14 знаков после запятой.

2.1.10.3. ONNX-представление моделей lasso_lars_float.onnx и lasso_lars_double.onnx

Рис.37. ONNX-представление модели lasso_lars_float.onnx в Netron

Рис.38. ONNX-представление модели lasso_lars_double.onnx в Netron

2.1.11. sklearn.linear_model.LassoLarsCV

LassoLarsCV представляет собой комбинацию методов Lasso (Least Absolute Shrinkage and Selection Operator) и LARS (Least Angle Regression) с автоматическим выбором оптимального значения гиперпараметра регуляризации (alpha) с использованием кросс-валидации.

Этот метод объединяет преимущества обоих алгоритмов и позволяет определить оптимальное значение alpha для модели с учетом выбора признаков и регуляризации.

Принцип работы LassoLarsCV:

Входные данные: Начинаем с исходного набора данных, включая признаки (независимые переменные) и соответствующие значения целевой переменной.
Инициализация: LassoLarsCV начинает с нулевой модели, где все коэффициенты равны нулю.
Определение диапазона alpha: Определяется диапазон значений гиперпараметра alpha, который будет рассмотрен в процессе выбора. Обычно используется логарифмическая шкала значений alpha.
Кросс-валидация: Для каждого значения alpha из выбранного диапазона LassoLarsCV выполняет кросс-валидацию для оценки производительности модели с этим значением alpha. Обычно используется метрика, такая как среднеквадратичная ошибка (MSE) или коэффициент детерминации (R^2).
Выбор оптимального alpha: LassoLarsCV выбирает значение alpha, при котором модель достигает наилучшей производительности на основе результатов кросс-валидации.
Обучение модели: Модель LassoLars обучается с использованием выбранного значения alpha, исключая менее важные признаки и применяя L1-регуляризацию.
Получение прогноза: После обучения модель может использоваться для предсказания значений целевой переменной для новых данных.

Преимущества LassoLarsCV:

Автоматический выбор alpha: LassoLarsCV автоматически выбирает оптимальное значение гиперпараметра alpha с использованием кросс-валидации, что упрощает настройку модели.
Выбор признаков: LassoLarsCV автоматически выбирает наиболее важные признаки и уменьшает размерность модели.
Регуляризация: Метод применяет L1-регуляризацию, что помогает предотвратить переобучение и увеличить обобщающую способность модели.

Ограничения LassoLarsCV:

Линейная модель: LassoLarsCV строит линейную модель, что может быть недостаточно для моделирования сложных нелинейных зависимостей.
Неустойчивость к шуму: Метод может быть чувствителен к выбросам в данных.
Вычислительная сложность: Выбор признаков на каждом шаге и применение регуляризации может потребовать больше вычислительных ресурсов, чем простая линейная регрессия.

LassoLarsCV полезен в задачах регрессии, где важно выбрать наиболее важные признаки, уменьшить размерность модели, предотвратить переобучение и автоматически настроить гиперпараметры модели.

2.1.11.1. Код создания модели LassoLarsCV и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.LassoLarsCV, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# LassoLarsCV.py
# The code demonstrates the process of training LassoLars model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "LassoLarsCV"
onnx_model_filename = data_path + "lasso_lars_cv"

# create a LassoLarsCV Regressor model
lassolars_cv_regressor_model = LassoLarsCV(cv=5)

# fit the model to the data
lassolars_cv_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = lassolars_cv_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(lassolars_cv_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(lassolars_cv_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  LassoLarsCV Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642612767
Python  Mean Absolute Error: 6.3477379221400145
Python  Mean Squared Error: 49.77814017210321
Python  
Python  LassoLarsCV ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_cv_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382640824089
Python  Mean Absolute Error: 6.347737845846069
Python  Mean Squared Error: 49.778142539016564
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  LassoLarsCV ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_cv_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642612767
Python  Mean Absolute Error: 6.3477379221400145
Python  Mean Squared Error: 49.77814017210321
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  16

Рис.39. Результат работы скрипта LassoLarsCV.py (float ONNX)

2.1.11.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели lasso_lars_cv_float.onnx и lasso_lars_cv_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                  LassoLarsCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LassoLarsCV"
#define   ONNXFilenameFloat  "lasso_lars_cv_float.onnx"
#define   ONNXFilenameDouble "lasso_lars_cv_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

LassoLarsCV (EURUSD,H1) Testing ONNX float: LassoLarsCV (lasso_lars_cv_float.onnx)
LassoLarsCV (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9962382640824089
LassoLarsCV (EURUSD,H1) MQL5:   Mean Absolute Error: 6.3477378458460691
LassoLarsCV (EURUSD,H1) MQL5:   Mean Squared Error: 49.7781425390165566
LassoLarsCV (EURUSD,H1) 
LassoLarsCV (EURUSD,H1) Testing ONNX double: LassoLarsCV (lasso_lars_cv_double.onnx)
LassoLarsCV (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9962382642612767
LassoLarsCV (EURUSD,H1) MQL5:   Mean Absolute Error: 6.3477379221400145
LassoLarsCV (EURUSD,H1) MQL5:   Mean Squared Error: 49.7781401721031642

Сравнение с оригиналом:

Testing ONNX float: LassoLarsCV (lasso_lars_cv_float.onnx)
Python  Mean Absolute Error: 6.3477379221400145
MQL5:   Mean Absolute Error: 6.3477378458460691
        
Testing ONNX double: LassoLarsCV (lasso_lars_cv_double.onnx)
Python  Mean Absolute Error: 6.3477379221400145
MQL5:   Mean Absolute Error: 6.3477379221400145

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 16 знаков после запятой.

2.1.11.3. ONNX-представление моделей lasso_lars_cv_float.onnx и lasso_lars_cv_double.onnx

Рис.40. ONNX-представление модели lasso_lars_cv_float.onnx в Netron

Рис.41. ONNX-представление модели lasso_lars_cv_double.onnx в Netron

2.1.12. sklearn.linear_model.LassoLarsIC

LassoLarsIC - это метод регрессии, который объединяет Lasso (Least Absolute Shrinkage and Selection Operator) и информационный критерий (IC) для автоматического выбора оптимального набора признаков.

Он использует информационные критерии, такие как AIC (Akaike Information Criterion) и BIC (Bayesian Information Criterion), чтобы определить, какие признаки следует включить в модель, и применяет L1-регуляризацию для оценки коэффициентов модели.

Принцип работы LassoLarsIC:

Входные данные: Начинаем с исходного набора данных, включая признаки (независимые переменные) и соответствующие значения целевой переменной.
Инициализация: LassoLarsIC начинает с нулевой модели, то есть без активных признаков. Все коэффициенты устанавливаются в ноль.
Выбор признаков с использованием информационного критерия: Метод оценивает информационный критерий (например, AIC или BIC) для разных наборов признаков, начиная с пустой модели и постепенно включая признаки в модель. Информационный критерий позволяет оценить качество модели, учитывая компромисс между подгонкой данных и сложностью модели.
Выбор оптимального набора признаков: LassoLarsIC выбирает набор признаков, для которого информационный критерий достигает наилучшего значения. Этот набор признаков будет включен в модель.
Применение L1-регуляризации: Для выбранных признаков применяется L1-регуляризация, что помогает оценить коэффициенты модели.
Получение прогноза: После обучения модель может использоваться для предсказания значений целевой переменной для новых данных.

Преимущества LassoLarsIC:

Автоматический выбор признаков: LassoLarsIC автоматически выбирает оптимальный набор признаков, уменьшая размерность модели и предотвращая переобучение.
Информационные критерии: Использование информационных критериев позволяет учитывать баланс между качеством модели и ее сложностью.
Регуляризация: Метод применяет L1-регуляризацию, что помогает предотвратить переобучение и повысить обобщающую способность модели.

Ограничения LassoLarsIC:

Линейная модель: LassoLarsIC строит линейную модель, что может быть недостаточно для моделирования сложных нелинейных зависимостей.
Неустойчивость к шуму: Метод может быть чувствителен к выбросам в данных.
Вычислительная сложность: Оценка информационных критериев для разных наборов признаков может потребовать дополнительных вычислительных ресурсов.

LassoLarsIC полезен в задачах регрессии, где важно автоматически выбирать наилучший набор признаков и уменьшать размерность модели с учетом информационного критерия.

2.1.12.1. Код создания модели LassoLarsIC и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.LassoLarsIC, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# LassoLarsIC.py
# The code demonstrates the process of training LassoLarsIC model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoLarsIC
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name="LassoLarsIC"
onnx_model_filename = data_path + "lasso_lars_ic"

# create a LassoLarsIC Regressor model
lasso_lars_ic_regressor_model = LassoLarsIC(criterion='aic')

# fit the model to the data
lasso_lars_ic_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = lasso_lars_ic_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(lasso_lars_ic_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(lasso_lars_ic_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  LassoLarsIC Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336425
Python  Mean Squared Error: 49.778140171281784
Python  
Python  LassoLarsIC ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_ic_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382641628886
Python  Mean Absolute Error: 6.3477377671679385
Python  Mean Squared Error: 49.77814147404787
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  LassoLarsIC ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_ic_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336425
Python  Mean Squared Error: 49.778140171281784
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  15
Python  double ONNX model precision:  15

Рис.42. Результат работы скрипта LassoLarsIC.py (float ONNX)

2.1.12.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели lasso_lars_ic_float.onnx и lasso_lars_ic_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                  LassoLarsIC.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LassoLarsIC"
#define   ONNXFilenameFloat  "lasso_lars_ic_float.onnx"
#define   ONNXFilenameDouble "lasso_lars_ic_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

LassoLarsIC (EURUSD,H1) Testing ONNX float: LassoLarsIC (lasso_lars_ic_float.onnx)
LassoLarsIC (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9962382641628886
LassoLarsIC (EURUSD,H1) MQL5:   Mean Absolute Error: 6.3477377671679385
LassoLarsIC (EURUSD,H1) MQL5:   Mean Squared Error: 49.7781414740478638
LassoLarsIC (EURUSD,H1) 
LassoLarsIC (EURUSD,H1) Testing ONNX double: LassoLarsIC (lasso_lars_ic_double.onnx)
LassoLarsIC (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9962382642613388
LassoLarsIC (EURUSD,H1) MQL5:   Mean Absolute Error: 6.3477379263364302
LassoLarsIC (EURUSD,H1) MQL5:   Mean Squared Error: 49.7781401712817768

Сравнение с оригинальной моделью:

Testing ONNX float: LassoLarsIC (lasso_lars_ic_float.onnx)
Python  Mean Absolute Error: 6.347737926336425
MQL5:   Mean Absolute Error: 6.3477377671679385
 
Testing ONNX double: LassoLarsIC (lasso_lars_ic_double.onnx)
Python  Mean Absolute Error: 6.347737926336425
MQL5:   Mean Absolute Error: 6.3477379263364302

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 13 знаков после запятой.

2.1.12.3. ONNX-представление моделей lasso_lars_ic_float.onnx и lasso_lars_ic_double.onnx

Рис.43. ONNX-представление модели lasso_lars_ic_float.onnx в Netron

Рис.44. ONNX-представление модели lasso_lars_ic_double.onnx в Netron

2.1.13. sklearn.linear_model.LinearRegression

LinearRegression - это один из наиболее простых и широко используемых методов в машинном обучении для задачи регрессии.

Он используется для построения линейных моделей, которые предсказывают числовые значения (непрерывные) целевой переменной на основе линейной комбинации входных признаков.

Принцип работы LinearRegression:

Линейная модель: Модель LinearRegression предполагает, что существует линейная зависимость между независимыми переменными (признаками) и целевой переменной. Эта зависимость может быть выражена уравнением линейной регрессии: y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ, где y - целевая переменная, β₀ - коэффициент сдвига (пересечения), β₁, β₂, ... βₚ - коэффициенты признаков, x₁, x₂, ... xₚ - значения признаков.
Оценка параметров: Задача LinearRegression состоит в оценке коэффициентов β₀, β₁, β₂, ... βₚ, которые наилучшим образом соответствуют данным. Обычно это делается с использованием метода наименьших квадратов (OLS), который минимизирует сумму квадратов разницы между фактическими и предсказанными значениями.
Оценка качества модели: Для оценки качества модели LinearRegression используются различные метрики, такие как средняя квадратическая ошибка (MSE), коэффициент детерминации (R²) и другие.

Преимущества LinearRegression:

Простота и интерпретируемость: LinearRegression - это простой метод с легкой интерпретируемостью, который позволяет анализировать влияние каждого признака на целевую переменную.
Высокая скорость обучения и предсказания: Модель линейной регрессии имеет высокую скорость обучения и предсказания, что делает ее хорошим выбором для больших наборов данных.
Применимость: LinearRegression может быть успешно применена к разнообразным задачам регрессии.

Ограничения LinearRegression:

Линейность: Этот метод предполагает линейность взаимосвязи между признаками и целевой переменной, что может быть недостаточным для моделирования сложных нелинейных зависимостей.
Чувствительность к выбросам: LinearRegression чувствительна к выбросам в данных, что может повлиять на качество модели.

LinearRegression - это простой и широко используемый метод регрессии, который строит линейную модель для предсказания числовых значений целевой переменной на основе линейной комбинации входных признаков. Он хорошо подходит для задач с линейной зависимостью и когда важна интерпретируемость модели.

2.1.13.1. Код создания модели LinearRegression и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.LinearRegression, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# LinearRegression.py
# The code demonstrates the process of training LinearRegression model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "LinearRegression"
onnx_model_filename = data_path + "linear_regression"

# create a Linear Regression model
linear_model = LinearRegression()

# fit the model to the data
linear_model.fit(X, y)

# predict values for the entire dataset
y_pred = linear_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(linear_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(linear_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  LinearRegression Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336427
Python  Mean Squared Error: 49.77814017128179
Python  
Python  LinearRegression ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\linear_regression_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382641628886
Python  Mean Absolute Error: 6.3477377671679385
Python  Mean Squared Error: 49.77814147404787
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  ONNX: MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  LinearRegression ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\linear_regression_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336427
Python  Mean Squared Error: 49.77814017128179
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Рис.45. Результат работы скрипта LinearRegression.py (float ONNX)

2.1.13.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели linear_regression_float.onnx и linear_regression_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                             LinearRegression.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LinearRegression"
#define   ONNXFilenameFloat  "linear_regression_float.onnx"
#define   ONNXFilenameDouble "linear_regression_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

LinearRegression (EURUSD,H1)    Testing ONNX float: LinearRegression (linear_regression_float.onnx)
LinearRegression (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9962382641628886
LinearRegression (EURUSD,H1)    MQL5:   Mean Absolute Error: 6.3477377671679385
LinearRegression (EURUSD,H1)    MQL5:   Mean Squared Error: 49.7781414740478638
LinearRegression (EURUSD,H1)    
LinearRegression (EURUSD,H1)    Testing ONNX double: LinearRegression (linear_regression_double.onnx)
LinearRegression (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9962382642613388
LinearRegression (EURUSD,H1)    MQL5:   Mean Absolute Error: 6.3477379263364266
LinearRegression (EURUSD,H1)    MQL5:   Mean Squared Error: 49.7781401712817768

Сравнение с оригинальной моделью:

Testing ONNX float: LinearRegression (linear_regression_float.onnx)
Python  Mean Absolute Error: 6.347737926336427
MQL5:   Mean Absolute Error: 6.3477377671679385

Testing ONNX double: LinearRegression (linear_regression_double.onnx)
Python  Mean Absolute Error: 6.347737926336427
MQL5:   Mean Absolute Error: 6.3477379263364266

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 14 знаков после запятой.

2.1.13.3. ONNX-представление моделей linear_regression_float.onnx и linear_regression_double.onnx

Рис.46. ONNX-представление модели linear_regression_float.onnx в Netron

Рис.47. ONNX-представление модели linear_regression_double.onnx в Netron

Примечание к методам Ridge и RidgeCV

Ridge и RidgeCV - это два связанных метода в машинном обучении, используемые для регрессии с регуляризацией методом Ридж. Они имеют общий функционал, но различаются в способе использования и настройки параметров.

Принцип работы Ridge (Ridge Regression):

Ridge - это метод регрессии, который включает в себя L2-регуляризацию. Это означает, что он добавляет сумму квадратов коэффициентов (L2-норму) в функцию потерь, которую минимизирует модель. Этот дополнительный член регуляризации помогает уменьшить величины коэффициентов модели и тем самым предотвращает переобучение.
Использование параметра alpha: В методе Ridge параметр alpha (также называемый силой регуляризации) задается заранее и не изменяется автоматически. Пользователь должен выбрать подходящее значение alpha на основе своих знаний о данных и экспериментов.

Принцип работы RidgeCV (Ridge Cross-Validation):

RidgeCV - это расширение метода Ridge, которое включает в себя автоматический подбор оптимального значения параметра alpha с использованием кросс-валидации. Вместо того чтобы пользователь задавал alpha вручную, RidgeCV выполняет перебор различных значений alpha и выбирает значение, которое дает наилучшую производительность на кросс-валидации.
Преимущество автоматической настройки: Основное преимущество RidgeCV заключается в том, что он автоматически находит оптимальное значение параметра alpha без необходимости вручную настраивать его. Это делает процесс настройки более удобным и предотвращает возможные ошибки выбора alpha.

Основное отличие между Ridge и RidgeCV заключается в том, что Ridge требует, чтобы пользователь явно задал значение параметра alpha, в то время как RidgeCV автоматически находит оптимальное значение alpha с использованием кросс-валидации. RidgeCV обычно является более предпочтительным выбором, если у вас есть большое количество данных и вы хотите избежать ручной настройки параметров.

2.1.14. sklearn.linear_model.Ridge

Ridge - это метод регрессии, используемый в машинном обучении для решения задач регрессии. Он является частью семейства линейных моделей и представляет собой регуляризованную линейную регрессию.

Главной особенностью Ridge-регрессии является добавление L2-регуляризации к стандартному методу наименьших квадратов (OLS).

Принцип работы Ridge-регрессии:

Линейная регрессия: Как и в обычной линейной регрессии, Ridge-регрессия стремится найти линейную связь между независимыми переменными (признаками) и целевой переменной.
L2-регуляризация: Главное отличие Ridge-регрессии заключается в том, что она добавляет L2-регуляризацию к функции потерь. Это означает, что к сумме квадратов разностей между фактическими и предсказанными значениями добавляется штраф за большие значения коэффициентов регрессии.
Штраф на коэффициенты: L2-регуляризация применяет штраф на значения коэффициентов регрессии. Это приводит к тому, что некоторые коэффициенты становятся ближе к нулю, что, в свою очередь, уменьшает переобучение и повышает устойчивость модели.
Гиперпараметр α: Одним из важных параметров Ridge-регрессии является гиперпараметр α (альфа), который определяет степень регуляризации. Большие значения α приводят к более сильной регуляризации и, как следствие, к более простым моделям с более низкими значениями коэффициентов.

Преимущества Ridge-регрессии:

Уменьшение переобучения: L2-регуляризация в Ridge помогает уменьшить переобучение, делая модель более устойчивой к шуму в данных.
Устойчивость к мультиколлинеарности: Ridge-регрессия хорошо справляется с проблемой мультиколлинеарности, когда признаки сильно коррелированы.
Решение проклятия размерности: Метод Ridge может помочь в задачах с большим количеством признаков, где OLS может быть неустойчив.

Ограничения Ridge-регрессии:

Не устраняет признаки: Ridge-регрессия не обнуляет коэффициенты признаков, она только уменьшает их, поэтому некоторые признаки всё равно могут оставаться в модели.
Выбор оптимального α: Выбор правильного значения для гиперпараметра α может потребовать кросс-валидации.

Ridge-регрессия - это метод регрессии, который добавляет L2-регуляризацию к стандартной линейной регрессии, с целью уменьшения переобучения, увеличения устойчивости и решения проблем мультиколлинеарности. Этот метод полезен, когда требуется баланс между точностью и стабильностью модели.

2.1.14.1. Код создания модели Ridge и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.Ridge, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# Ridge.py
# The code demonstrates the process of training Ridge model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "Ridge"
onnx_model_filename = data_path + "ridge"

# create a Ridge model
regression_model = Ridge()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  Ridge Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382641178552
Python  Mean Absolute Error: 6.347684462929819
Python  Mean Squared Error: 49.77814206996523
Python  
Python  Ridge ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ridge_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382634837793
Python  Mean Absolute Error: 6.347684915729416
Python  Mean Squared Error: 49.77815046053819
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  6
Python  
Python  Ridge ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ridge_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382641178552
Python  Mean Absolute Error: 6.347684462929819
Python  Mean Squared Error: 49.77814206996523
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Рис.49. Результат работы скрипта Ridge.py (float ONNX)

2.1.14.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели ridge_float.onnx и ridge_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                        Ridge.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "Ridge"
#define   ONNXFilenameFloat  "ridge_float.onnx"
#define   ONNXFilenameDouble "ridge_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

Ridge (EURUSD,H1)       Testing ONNX float: Ridge (ridge_float.onnx)
Ridge (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382634837793
Ridge (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3476849157294160
Ridge (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781504605381784
Ridge (EURUSD,H1)       
Ridge (EURUSD,H1)       Testing ONNX double: Ridge (ridge_double.onnx)
Ridge (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382641178552
Ridge (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3476844629298235
Ridge (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781420699652131

Сравнение с оригинальной моделью:

Testing ONNX float: Ridge (ridge_float.onnx)
Python  Mean Absolute Error: 6.347684462929819
MQL5:   Mean Absolute Error: 6.3476849157294160

       
Testing ONNX double: Ridge (ridge_double.onnx)
Python  Mean Absolute Error: 6.347684462929819
MQL5:   Mean Absolute Error: 6.3476844629298235

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 13 знаков после запятой.

2.1.14.3. ONNX-представление моделей ridge_float.onnx и ridge_double.onnx

Рис.50. ONNX-представление модели ridge_float.onnx в Netron

Рис.51. ONNX-представление модели ridge_double.onnx в Netron

2.1.15. sklearn.linear_model.RidgeCV

RidgeCV - это расширение Ridge-регрессии, которое включает в себя автоматический выбор наилучшего значения гиперпараметра α (альфа), который определяет степень регуляризации в Ridge-регрессии. Гиперпараметр α контролирует баланс между минимизацией суммы квадратов ошибок (как в обычной линейной регрессии) и минимизацией значения коэффициентов регрессии (регуляризацией). RidgeCV автоматически подбирает оптимальное значение α на основе заданных параметров и критериев. Вот ключевые аспекты RidgeCV:

Принцип работы RidgeCV:

Входные данные: RidgeCV принимает на вход данные, состоящие из признаков (независимых переменных) и целевой переменной (непрерывной).
Выбор α: Ridge-регрессия требует выбора гиперпараметра α, который определяет степень регуляризации. RidgeCV автоматически подбирает оптимальное значение α из заданного диапазона.
Кросс-валидация: RidgeCV использует кросс-валидацию, например, k-кратную перекрестную проверку (k-fold cross-validation), чтобы оценить, какое значение α обеспечивает наилучшее обобщение модели на независимых данных.
Оптимальное α: По завершении процесса обучения RidgeCV выбирает значение α, которое дает наилучшую производительность на кросс-валидации, и использует это значение для обучения окончательной Ridge-регрессионной модели.

Преимущества RidgeCV:

Автоматический выбор α: RidgeCV позволяет автоматически подобрать оптимальное значение гиперпараметра α, что упрощает процесс настройки модели.
Баланс между регуляризацией и производительностью: Этот метод помогает найти оптимальный баланс между регуляризацией (снижением переобучения) и производительностью модели.

Ограничения RidgeCV:

Вычислительная сложность: Кросс-валидация может потребовать значительных вычислительных ресурсов, особенно при использовании большого диапазона значений α.

RidgeCV - это метод Ridge-регрессии с автоматическим выбором оптимального значения гиперпараметра α с использованием кросс-валидации. Этот метод упрощает процесс выбора гиперпараметра и позволяет найти наилучший баланс между регуляризацией и производительностью модели.

2.1.15.1. Код создания модели RidgeCV и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.RidgeCV, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# RidgeCV.py
# The code demonstrates the process of training RidgeCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "RidgeCV"
onnx_model_filename = data_path + "ridge_cv"

# create a RidgeCV model
regression_model = RidgeCV()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  RidgeCV Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382499160807
Python  Mean Absolute Error: 6.34720334999352
Python  Mean Squared Error: 49.77832999861571
Python  
Python  RidgeCV ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ridge_cv_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382499108485
Python  Mean Absolute Error: 6.3472036427935485
Python  Mean Squared Error: 49.77833006785168
Python  R^2 matching decimal places:  11
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  6
Python  
Python  RidgeCV ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ridge_cv_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382499160807
Python  Mean Absolute Error: 6.34720334999352
Python  Mean Squared Error: 49.77832999861571
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  14
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  14

Рис.52. Результат работы скрипта RidgeCV.py (float ONNX)

2.1.15.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели ridge_cv_float.onnx и ridge_cv_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                      RidgeCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "RidgeCV"
#define   ONNXFilenameFloat  "ridge_cv_float.onnx"
#define   ONNXFilenameDouble "ridge_cv_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

RidgeCV (EURUSD,H1)     Testing ONNX float: RidgeCV (ridge_cv_float.onnx)
RidgeCV (EURUSD,H1)     MQL5:   R-Squared (Coefficient of determination): 0.9962382499108485
RidgeCV (EURUSD,H1)     MQL5:   Mean Absolute Error: 6.3472036427935485
RidgeCV (EURUSD,H1)     MQL5:   Mean Squared Error: 49.7783300678516909
RidgeCV (EURUSD,H1)     
RidgeCV (EURUSD,H1)     Testing ONNX double: RidgeCV (ridge_cv_double.onnx)
RidgeCV (EURUSD,H1)     MQL5:   R-Squared (Coefficient of determination): 0.9962382499160807
RidgeCV (EURUSD,H1)     MQL5:   Mean Absolute Error: 6.3472033499935216
RidgeCV (EURUSD,H1)     MQL5:   Mean Squared Error: 49.7783299986157246

Сравнение с оригинальной моделью:

Testing ONNX float: RidgeCV (ridge_cv_float.onnx)
Python  Mean Absolute Error: 6.34720334999352
MQL5:   Mean Absolute Error: 6.3472036427935485

Testing ONNX double: RidgeCV (ridge_cv_double.onnx)
Python  Mean Absolute Error: 6.34720334999352
MQL5:   Mean Absolute Error: 6.3472033499935216

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 14 знаков после запятой.

2.1.15.3. ONNX-представление моделей ridge_cv_float.onnx и ridge_cv_double.onnx

Рис.53. ONNX-представление модели ridge_cv_float.onnx в Netron

Рис.54. ONNX-представление модели ridge_cv_double.onnx в Netron

2.1.16. sklearn.linear_model.OrthogonalMatchingPursuit

OrthogonalMatchingPursuit (OMP) - это алгоритм для решения задачи отбора признаков и решения линейной регрессии.

Он является одним из методов решения проблемы подбора наиболее важных признаков, что может быть полезным для снижения размерности данных и улучшения обобщающей способности модели.

Принцип работы OrthogonalMatchingPursuit:

Входные данные: Начнем с набора данных, который включает признаки (независимые переменные) и значения целевой переменной (непрерывные).
Выбор числа признаков: Одним из первых шагов при использовании OrthogonalMatchingPursuit является определение количества признаков, которые вы хотите включить в модель. Это количество может быть задано заранее или выбрано с использованием критериев, таких как критерий информационного критерия Акаике (AIC) или критерий минимальной ошибки.
Итеративное добавление признаков: Алгоритм начинает с пустой модели и итеративно добавляет признаки, которые наилучшим образом объясняют остатки модели. В каждой итерации выбирается новый признак так, чтобы он был ортогонален к остальным выбранным признакам. Оптимальный признак выбирается на основе корреляции с остатками модели.
Обучение модели: После добавления заданного числа признаков модель обучается на данных с учетом только этих признаков.
Получение прогнозов: После обучения модель может быть использована для предсказания значений целевой переменной на новых данных.

Преимущества OrthogonalMatchingPursuit:

Уменьшение размерности: OMP может быть использован для снижения размерности данных, выбирая только наиболее информативные признаки.
Интерпретируемость: Поскольку OMP выбирает только небольшое количество признаков, модели, созданные с его использованием, могут быть более интерпретируемыми.

Ограничения OrthogonalMatchingPursuit:

Чувствительность к числу выбранных признаков: Количество выбранных признаков должно быть правильно настроено, и неправильный выбор может привести к переобучению или недообучению.
Неучет мультиколлинеарности: OMP может не учитывать мультиколлинеарность между признаками, что может повлиять на выбор оптимальных признаков.
Вычислительная сложность: OMP является вычислительно затратным, особенно для больших наборов данных.

OrthogonalMatchingPursuit - это алгоритм для решения задачи отбора признаков и линейной регрессии, который позволяет выбирать наиболее информативные признаки для модели. Этот метод может быть полезным для снижения размерности данных и улучшения интерпретируемости модели.

2.1.16.1. Код создания модели OrthogonalMatchingPursuit и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.OrthogonalMatchingPursuit, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# OrthogonalMatchingPursuit.py
# The code demonstrates the process of training OrthogonalMatchingPursuit model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "OrthogonalMatchingPursuit"
onnx_model_filename = data_path + "orthogonal_matching_pursuit"

# create an OrthogonalMatchingPursuit model
regression_model = OrthogonalMatchingPursuit()

# fit the model to the data
regression_model.fit(X, y)

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  OrthogonalMatchingPursuit Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642613388
Python  Mean Absolute Error: 6.3477379263364275
Python  Mean Squared Error: 49.778140171281784
Python  
Python  OrthogonalMatchingPursuit ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\orthogonal_matching_pursuit_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382641628886
Python  Mean Absolute Error: 6.3477377671679385
Python  Mean Squared Error: 49.77814147404787
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  OrthogonalMatchingPursuit ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\orthogonal_matching_pursuit_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642613388
Python  Mean Absolute Error: 6.3477379263364275
Python  Mean Squared Error: 49.778140171281784
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  15
Python  double ONNX model precision:  16

Рис.55. Результат работы скрипта OrthogonalMatchingPursuit.py (float ONNX)

2.1.16.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели orthogonal_matching_pursuit_float.onnx и orthogonal_matching_pursuit_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                    OrthogonalMatchingPursuit.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "OrthogonalMatchingPursuit"
#define   ONNXFilenameFloat  "orthogonal_matching_pursuit_float.onnx"
#define   ONNXFilenameDouble "orthogonal_matching_pursuit_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

OrthogonalMatchingPursuit (EURUSD,H1)   Testing ONNX float: OrthogonalMatchingPursuit (orthogonal_matching_pursuit_float.onnx)
OrthogonalMatchingPursuit (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9962382641628886
OrthogonalMatchingPursuit (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3477377671679385
OrthogonalMatchingPursuit (EURUSD,H1)   MQL5:   Mean Squared Error: 49.7781414740478638
OrthogonalMatchingPursuit (EURUSD,H1)   
OrthogonalMatchingPursuit (EURUSD,H1)   Testing ONNX double: OrthogonalMatchingPursuit (orthogonal_matching_pursuit_double.onnx)
OrthogonalMatchingPursuit (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9962382642613388
OrthogonalMatchingPursuit (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3477379263364275
OrthogonalMatchingPursuit (EURUSD,H1)   MQL5:   Mean Squared Error: 49.7781401712817768

Сравнение с оригинальной моделью:

Testing ONNX float: OrthogonalMatchingPursuit (orthogonal_matching_pursuit_float.onnx)
Python  Mean Absolute Error: 6.3477379263364275
MQL5:   Mean Absolute Error: 6.3477377671679385
        
Testing ONNX double: OrthogonalMatchingPursuit (orthogonal_matching_pursuit_double.onnx)
Python  Mean Absolute Error: 6.3477379263364275
MQL5:   Mean Absolute Error: 6.3477379263364275

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 16 знаков после запятой.

2.1.16.3. ONNX-представление моделей orthogonal_matching_pursuit_float.onnx и orthogonal_matching_pursuit_double.onnx

Рис.56. ONNX-представление модели orthogonal_matching_pursuit_float.onnx в Netron

Рис.57. ONNX-представление модели orthogonal_matching_pursuit_double.onnx в Netron

2.1.17. sklearn.linear_model.PassiveAggressiveRegressor

PassiveAggressiveRegressor - это метод машинного обучения, который используется для решения задачи регрессии.

Этот метод представляет собой вариант алгоритма Passive-Aggressive (PA), который может использоваться для обучения модели, способной предсказывать непрерывные значения целевой переменной.

Принцип работы PassiveAggressiveRegressor:

Входные данные: Начнем с набора данных, который включает в себя признаки (независимые переменные) и значения целевой переменной (непрерывные).
Обучение с учителем: PassiveAggressiveRegressor - это метод обучения с учителем, который обучается на парах (X, y), где X - признаки, y - соответствующие значения целевой переменной.
Адаптивное обучение: Основная идея метода Passive-Aggressive заключается в том, что модель обучается адаптивно, минимизируя ошибку прогноза на каждом обучающем примере. Модель обновляется путем коррекции весов, чтобы уменьшить ошибку предсказания.
Параметр C: PassiveAggressiveRegressor имеет гиперпараметр C, который контролирует то, насколько сильно модель адаптируется к ошибкам. Большее значение C означает более агрессивное обновление весов, в то время как меньшее значение C делает модель менее агрессивной.
Предсказание: После обучения модель может быть использована для предсказания значений целевой переменной на новых данных.

Преимущества PassiveAggressiveRegressor:

Адаптивность: Метод способен адаптироваться к изменениям данных и обновлять модель, чтобы минимизировать ошибку предсказания.
Эффективность для больших данных: PassiveAggressiveRegressor может быть эффективным методом для регрессии, особенно при обучении на больших объемах данных.

Ограничения PassiveAggressiveRegressor:

Чувствительность к выбору параметра C: Корректный выбор значения C может потребовать настройки и экспериментов.
Могут потребоваться дополнительные признаки: В некоторых случаях для успешного обучения модели могут потребоваться дополнительные инженерные признаки.

PassiveAggressiveRegressor - это метод машинного обучения для задачи регрессии, который обучается адаптивно, минимизируя ошибку предсказания на обучающих данных. Этот метод может быть полезным для обработки больших объемов данных и требует настройки параметра C для достижения оптимальной производительности.

2.1.17.1. Код создания модели PassiveAggressiveRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.PassiveAggressiveRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# PassiveAggressiveRegressor.py
# The code demonstrates the process of training PassiveAggressiveRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import PassiveAggressiveRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "PassiveAggressiveRegressor"
onnx_model_filename = data_path + "passive_aggressive_regressor"

# create a PassiveAggressiveRegressor model
regression_model = PassiveAggressiveRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  PassiveAggressiveRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9894376841493092
Python  Mean Absolute Error: 9.64524669506544
Python  Mean Squared Error: 139.76857373191007
Python  
Python  PassiveAggressiveRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\passive_aggressive_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9894376801868329
Python  Mean Absolute Error: 9.645248834431873
Python  Mean Squared Error: 139.76862616640122
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  3
Python  float ONNX model precision:  5
Python  
Python  PassiveAggressiveRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\passive_aggressive_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9894376841493092
Python  Mean Absolute Error: 9.64524669506544
Python  Mean Squared Error: 139.76857373191007
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  14
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  14

Рис.58. Результат работы скрипта PassiveAggressiveRegressor.py (double ONNX)

2.1.17.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели passive_aggressive_regressor_float.onnx и passive_aggressive_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                   PassiveAggressiveRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "PassiveAggressiveRegressor"
#define   ONNXFilenameFloat  "passive_aggressive_regressor_float.onnx"
#define   ONNXFilenameDouble "passive_aggressive_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

PassiveAggressiveRegressor (EURUSD,H1)  Testing ONNX float: PassiveAggressiveRegressor (passive_aggressive_regressor_float.onnx)
PassiveAggressiveRegressor (EURUSD,H1)  MQL5:   R-Squared (Coefficient of determination): 0.9894376801868329
PassiveAggressiveRegressor (EURUSD,H1)  MQL5:   Mean Absolute Error: 9.6452488344318716
PassiveAggressiveRegressor (EURUSD,H1)  MQL5:   Mean Squared Error: 139.7686261664012761
PassiveAggressiveRegressor (EURUSD,H1)  
PassiveAggressiveRegressor (EURUSD,H1)  Testing ONNX double: PassiveAggressiveRegressor (passive_aggressive_regressor_double.onnx)
PassiveAggressiveRegressor (EURUSD,H1)  MQL5:   R-Squared (Coefficient of determination): 0.9894376841493092
PassiveAggressiveRegressor (EURUSD,H1)  MQL5:   Mean Absolute Error: 9.6452466950654419
PassiveAggressiveRegressor (EURUSD,H1)  MQL5:   Mean Squared Error: 139.7685737319100667

Сравнение с оригинальной моделью:

Testing ONNX float: PassiveAggressiveRegressor (passive_aggressive_regressor_float.onnx)
Python  Mean Absolute Error: 9.64524669506544
MQL5:   Mean Absolute Error: 9.6452488344318716
        
Testing ONNX double: PassiveAggressiveRegressor (passive_aggressive_regressor_double.onnx)
Python  Mean Absolute Error: 9.64524669506544
MQL5:   Mean Absolute Error: 9.6452466950654419

Точность MAE ONNX float: 5 знаков после запятой, точность MAE ONNX double 14 знаков после запятой.

2.1.17.3. ONNX-представление моделей passive_aggressive_regressor_float.onnx и passive_aggressive_regressor_double.onnx

Рис.59. ONNX-представление модели passive_aggressive_regressor_float.onnx в Netron

Рис.60. ONNX-представление модели passive_aggressive_regressor_double.onnx в Netron

2.1.18. sklearn.linear_model.QuantileRegressor

QuantileRegressor - это метод машинного обучения, который используется для оценки квантилей (конкретных процентилей) целевой переменной в задачах регрессии.

Вместо предсказания среднего значения целевой переменной, как это делается в типичных задачах регрессии, QuantileRegressor предсказывает значения, соответствующие заданным квантилям, таким как медиана (50-й процентиль) или 25-й и 75-й процентили.

Принцип работы QuantileRegressor:

Входные данные: Начнем с набора данных, который включает в себя признаки (независимые переменные) и целевую переменную (непрерывную).
Цель - квантили: Вместо предсказания точечных значений целевой переменной, QuantileRegressor моделирует условное распределение целевой переменной и предсказывает значения для определенных квантилей этого распределения.
Обучение для разных квантилей: Обучение модели QuantileRegressor включает в себя обучение отдельных моделей для каждого желаемого квантиля. Каждая из этих моделей предсказывает значение, соответствующее своему квантилю.
Параметр квантиля: Основным параметром для этого метода является выбор желаемых квантилей, для которых вы хотите получить предсказания. Например, если вам нужны предсказания медианы, вам потребуется обучить модель на 50-й процентиль.
Предсказание квантилей: После обучения модель может быть использована для предсказания значений, соответствующих заданным квантилям, на новых данных.

Преимущества QuantileRegressor:

Гибкость: QuantileRegressor предоставляет гибкость в предсказании различных квантилей, что может быть полезно в задачах, где важны разные процентили распределения.
Устойчивость к выбросам: Подход, ориентированный на квантили, может быть устойчив к выбросам, так как не принимает в расчет среднее, которое может быть сильно повлияно экстремальными значениями.

Ограничения QuantileRegressor:

Потребность в выборе квантилей: Выбор оптимальных квантилей может потребовать некоторого знания о задаче.
Увеличение вычислительной сложности: Обучение отдельных моделей для разных квантилей может увеличить вычислительную сложность задачи.

QuantileRegressor - это метод машинного обучения, который предназначен для предсказания значений, соответствующих заданным квантилям целевой переменной. Этот метод может быть полезным в задачах, где интересуют различные процентили распределения, а также в задачах, где данные могут содержать выбросы.

2.1.18.1. Код создания модели QuantileRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.QuantileRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# QuantileRegressor.py
# The code demonstrates the process of training QuantileRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import QuantileRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "QuantileRegressor"
onnx_model_filename = data_path + "quantile_regressor"

# create a QuantileRegressor model
regression_model = QuantileRegressor(solver='highs')

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  QuantileRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9959915738839231
Python  Mean Absolute Error: 6.3693091850025185
Python  Mean Squared Error: 53.0425343337143
Python  
Python  QuantileRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\quantile_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9959915739158818
Python  Mean Absolute Error: 6.3693091422201125
Python  Mean Squared Error: 53.042533910812814
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  7
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  7
Python  
Python  QuantileRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\quantile_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9959915738839231
Python  Mean Absolute Error: 6.3693091850025185
Python  Mean Squared Error: 53.0425343337143
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  13
Python  double ONNX model precision:  16

Рис.61. Результат работы скрипта QuantileRegressor.py (float ONNX)

2.1.18.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели quantile_regressor_float.onnx и quantile_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                            QuantileRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "QuantileRegressor"
#define   ONNXFilenameFloat  "quantile_regressor_float.onnx"
#define   ONNXFilenameDouble "quantile_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

QuantileRegressor (EURUSD,H1)   Testing ONNX float: QuantileRegressor (quantile_regressor_float.onnx)
QuantileRegressor (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9959915739158818
QuantileRegressor (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3693091422201169
QuantileRegressor (EURUSD,H1)   MQL5:   Mean Squared Error: 53.0425339108128071
QuantileRegressor (EURUSD,H1)   
QuantileRegressor (EURUSD,H1)   Testing ONNX double: QuantileRegressor (quantile_regressor_double.onnx)
QuantileRegressor (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9959915738839231
QuantileRegressor (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3693091850025185
QuantileRegressor (EURUSD,H1)   MQL5:   Mean Squared Error: 53.0425343337142721

Сравнение с оригинальной моделью:

Testing ONNX float: QuantileRegressor (quantile_regressor_float.onnx)
Python  Mean Absolute Error: 6.3693091850025185
MQL5:   Mean Absolute Error: 6.3693091422201169

Testing ONNX double: QuantileRegressor (quantile_regressor_double.onnx)
Python  Mean Absolute Error: 6.3693091850025185
MQL5:   Mean Absolute Error: 6.3693091850025185

Точность MAE ONNX float: 7 знаков после запятой, точность MAE ONNX double 16 знаков после запятой.

2.1.18.3. ONNX-представление моделей quantile_regressor_float.onnx и quantile_regressor_double.onnx

Рис.62. ONNX-представление модели quantile_regressor_float.onnx в Netron

Рис.63. ONNX-представление модели quantile_regressor_double.onnx в Netron

2.1.19. sklearn.linear_model.RANSACRegressor

RANSACRegressor - это метод машинного обучения, который применяется для решения задачи регрессии с использованием метода RANSAC (Random Sample Consensus).

Метод RANSAC предназначен для обработки данных, которые содержат выбросы или несовершенства, и позволяет создать более устойчивую регрессионную модель путем исключения влияния выбросов.

Принцип работы RANSACRegressor:

Входные данные: Начнем с набора данных, который включает в себя признаки (независимые переменные) и целевую переменную (непрерывную).
Выбор случайных подмножеств: Метод RANSAC начинает работу с выбора случайных подмножеств данных, которые используются для обучения регрессионной модели. Эти подмножества называются "гипотезами".
Подгонка модели к гипотезам: Для каждой выбранной гипотезы обучается регрессионная модель. В случае RANSACRegressor, обычно используется линейная регрессия, и модель подгоняется к подмножеству данных.
Оценка выбросов: После обучения модели оценивается, насколько хорошо она соответствует всем данным. Для каждой точки данных вычисляется ошибка между предсказанным значением и фактическим значением.
Идентификация выбросов: Точки данных, которые имеют ошибку, превышающую некоторый заданный порог, считаются выбросами. Эти выбросы могут влиять на обучение модели и искажать результаты.
Обновление модели: Все точки данных, которые не считаются выбросами, используются для обновления регрессионной модели. Этот процесс может повторяться несколько раз с различными случайными гипотезами.
Финальная модель: После нескольких итераций, RANSACRegressor выбирает наилучшую модель, которая была обучена на подмножестве данных, и возвращает ее в качестве финальной регрессионной модели.

Преимущества RANSACRegressor:

Устойчивость к выбросам: RANSACRegressor является устойчивым к выбросам методом, так как выбросы исключаются из обучения.
Робастная регрессия: Этот метод позволяет создать более надежную регрессионную модель в случае, когда данные содержат выбросы или несовершенства.

Ограничения RANSACRegressor:

Чувствительность к порогу ошибки: Выбор порога ошибки, который определяет, какие точки считаются выбросами, может потребовать экспериментов.
Сложность выбора гипотез: Выбор хороших гипотез на начальном этапе может быть не тривиальной задачей.

RANSACRegressor - это метод машинного обучения, который используется для решения задачи регрессии, основанный на методе RANSAC. Этот метод позволяет создать более устойчивую регрессионную модель в случае, когда данные содержат выбросы или несовершенства, путем исключения их влияния на модель.

2.1.19.1. Код создания модели RANSACRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.RANSACRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# RANSACRegressor.py
# The code demonstrates the process of training RANSACRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import RANSACRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "RANSACRegressor"
onnx_model_filename = data_path + "ransac_regressor"

# create a RANSACRegressor model
regression_model = RANSACRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("ONNX: MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  RANSACRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336427
Python  Mean Squared Error: 49.77814017128179
Python  
Python  RANSACRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ransac_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382641628886
Python  Mean Absolute Error: 6.3477377671679385
Python  Mean Squared Error: 49.77814147404787
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  ONNX: MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  RANSACRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ransac_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336427
Python  Mean Squared Error: 49.77814017128179
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Рис.64. Результат работы скрипта RANSACRegressor.py (float ONNX)

2.1.19.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели ransac_regressor_float.onnx и ransac_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                              RANSACRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "RANSACRegressor"
#define   ONNXFilenameFloat  "ransac_regressor_float.onnx"
#define   ONNXFilenameDouble "ransac_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

RANSACRegressor (EURUSD,H1)     Testing ONNX float: RANSACRegressor (ransac_regressor_float.onnx)
RANSACRegressor (EURUSD,H1)     MQL5:   R-Squared (Coefficient of determination): 0.9962382641628886
RANSACRegressor (EURUSD,H1)     MQL5:   Mean Absolute Error: 6.3477377671679385
RANSACRegressor (EURUSD,H1)     MQL5:   Mean Squared Error: 49.7781414740478638
RANSACRegressor (EURUSD,H1)     
RANSACRegressor (EURUSD,H1)     Testing ONNX double: RANSACRegressor (ransac_regressor_double.onnx)
RANSACRegressor (EURUSD,H1)     MQL5:   R-Squared (Coefficient of determination): 0.9962382642613388
RANSACRegressor (EURUSD,H1)     MQL5:   Mean Absolute Error: 6.3477379263364266
RANSACRegressor (EURUSD,H1)     MQL5:   Mean Squared Error: 49.7781401712817768

Сравнение с оригинальной моделью:

Testing ONNX float: RANSACRegressor (ransac_regressor_float.onnx)
Python  Mean Absolute Error: 6.347737926336427
MQL5:   Mean Absolute Error: 6.3477377671679385
     
Testing ONNX double: RANSACRegressor (ransac_regressor_double.onnx)
Python  Mean Absolute Error: 6.347737926336427
MQL5:   Mean Absolute Error: 6.3477379263364266

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 14 знаков после запятой.

2.1.19.3. ONNX-представление моделей ransac_regressor_float.onnx и ransac_regressor_double.onnx

Рис.65. ONNX-представление модели ransac_regressor_float.onnx в Netron

Рис.66. ONNX-представление модели ransac_regressor_double.onnx в Netron

2.1.20. sklearn.linear_model.TheilSenRegressor

Theil-Sen регрессия (Theil-Sen estimator) - это метод оценки регрессии, который используется для приближенного поиска линейной зависимости между независимыми переменными и целевой переменной.

Он предлагает более устойчивую оценку, чем обычная линейная регрессия в случае наличия выбросов и шума в данных.

Принцип работы Theil-Sen регрессии:

Выбор точек: Для начала, метод Theil-Sen выбирает случайные пары точек данных из набора обучающих данных.
Расчет наклона (наклонов): Для каждой пары точек данных, метод вычисляет наклон линии, проходящей через эти точки. Это создает множество наклонов.
Медианный наклон: Затем, метод находит медианный наклон из множества наклонов. Этот медианный наклон используется в качестве оценки наклона линейной регрессии.
Медиана отклонений: Для каждой точки данных, метод вычисляет отклонение (разницу между фактическим значением и предсказанным значением на основе медианного наклона) и находит медиану этих отклонений. Это создает оценку для коэффициента сдвига (пересечения) линейной регрессии.
Окончательная оценка: Итоговые оценки наклона и коэффициента сдвига используются для построения линейной регрессионной модели.

Преимущества Theil-Sen регрессии:

Устойчивость к выбросам: Theil-Sen регрессия более устойчива к выбросам и шуму в данных по сравнению с обычной линейной регрессией.
Нестрогие предположения: Метод не требует строгих предположений о распределении данных и форме зависимости, что делает его более универсальным.
Подходит для данных с мультиколлинеарностью: Theil-Sen регрессия хорошо справляется с данными, где независимые переменные сильно коррелированы (проблема мультиколлинеарности).

Ограничения Theil-Sen регрессии:

Вычислительная сложность: Вычисление медианных наклонов для всех пар точек данных может потребовать времени, особенно для больших наборов данных.
Оценка коэффициента сдвига: Медианные отклонения используются для оценки коэффициента сдвига, что может привести к смещению при наличии выбросов.

Theil-Sen регрессия - это метод оценки регрессии, который предоставляет устойчивую оценку линейной зависимости между независимыми переменными и целевой переменной, особенно в случае наличия выбросов и шума в данных. Этот метод полезен, когда требуется устойчивая оценка в условиях реальных данных.

2.1.20.1. Код создания модели TheilSenRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.TheilSenRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# TheilSenRegressor.py
# The code demonstrates the process of training TheilSenRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import TheilSenRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "TheilSenRegressor"
onnx_model_filename = data_path + "theil_sen_regressor"

# create a TheilSen Regressor model
regression_model = TheilSenRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  TheilSenRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9962329196940459
Python  Mean Absolute Error: 6.338686004537594
Python  Mean Squared Error: 49.84886353898735
Python  
Python  TheilSenRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\theil_sen_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.996232919516505
Python  Mean Absolute Error: 6.338686370832071
Python  Mean Squared Error: 49.84886588834327
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  TheilSenRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\theil_sen_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962329196940459
Python  Mean Absolute Error: 6.338686004537594
Python  Mean Squared Error: 49.84886353898735
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Рис.67. Результат работы скрипта TheilSenRegressor.py (float ONNX)

2.1.20.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели theil_sen_regressor_float.onnx и theil_sen_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                            TheilSenRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "TheilSenRegressor"
#define   ONNXFilenameFloat  "theil_sen_regressor_float.onnx"
#define   ONNXFilenameDouble "theil_sen_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

TheilSenRegressor (EURUSD,H1)   Testing ONNX float: TheilSenRegressor (theil_sen_regressor_float.onnx)
TheilSenRegressor (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9962329195165051
TheilSenRegressor (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3386863708320735
TheilSenRegressor (EURUSD,H1)   MQL5:   Mean Squared Error: 49.8488658883432691
TheilSenRegressor (EURUSD,H1)   
TheilSenRegressor (EURUSD,H1)   Testing ONNX double: TheilSenRegressor (theil_sen_regressor_double.onnx)
TheilSenRegressor (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9962329196940459
TheilSenRegressor (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3386860045375943
TheilSenRegressor (EURUSD,H1)   MQL5:   Mean Squared Error: 49.8488635389873735

Сравнение с оригинальной моделью:

Testing ONNX float: TheilSenRegressor (theil_sen_regressor_float.onnx)
Python  Mean Absolute Error: 6.338686004537594
MQL5:   Mean Absolute Error: 6.3386863708320735
        
Testing ONNX double: TheilSenRegressor (theil_sen_regressor_double.onnx)
Python  Mean Absolute Error: 6.338686004537594
MQL5:   Mean Absolute Error: 6.3386860045375943

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 15 знаков после запятой.

2.1.20.3. ONNX-представление моделей theil_sen_regressor_float.onnx и theil_sen_regressor_double.onnx

Рис.68. ONNX-представление модели theil_sen_regressor_float.onnx в Netron

Рис.69. ONNX-представление модели theil_sen_regressor_double.onnx в Netron

2.1.21. sklearn.linear_model.LinearSVR

LinearSVR (Linear Support Vector Regression) - это модель машинного обучения для задачи регрессии, основанная на методе опорных векторов (Support Vector Machines, SVM).

Этот метод применяется для нахождения линейных отношений между признаками и целевой переменной с использованием линейного ядра.

Принцип работы LinearSVR:

Входные данные: LinearSVR начинает с набора данных, включающего признаки (независимые переменные) и соответствующие значения целевой переменной.
Выбор линейной модели: Модель предполагает, что существует линейная зависимость между признаками и целевой переменной, описываемая уравнением линейной регрессии.
Обучение модели: LinearSVR находит оптимальные значения коэффициентов модели, минимизируя функцию потерь, которая учитывает ошибку предсказания и допустимую ошибку (эпсилон).
Получение прогнозов: После обучения, модель может быть использована для предсказания значений целевой переменной для новых данных на основе найденных коэффициентов.

Преимущества LinearSVR:

Регрессия с поддержкой векторов: LinearSVR использует метод опорных векторов, который позволяет находить оптимальное разделение между данными с учетом допустимой ошибки.
Поддержка множества признаков: Модель способна работать с множеством признаков и обрабатывать данные в высоких размерностях.
Регуляризация: LinearSVR включает в себя регуляризацию, что помогает бороться с переобучением и обеспечивает более стабильные прогнозы.

Ограничения LinearSVR:

Линейность: LinearSVR ограничен использованием линейных отношений между признаками и целевой переменной. В случае сложных, нелинейных зависимостей, модель может быть недостаточно гибкой.
Чувствительность к выбросам: Модель может быть чувствительна к выбросам в данных и допустимой ошибке (эпсилон).
Неспособность улавливать сложные зависимости: LinearSVR, как и другие линейные модели, не способен улавливать сложные нелинейные зависимости между признаками и целевой переменной.

LinearSVR - это модель машинного обучения для регрессии, которая применяет метод опорных векторов для поиска линейных отношений между признаками и целевой переменной. Она поддерживает регуляризацию и может использоваться в задачах, где важен контроль допустимой ошибки. Однако модель ограничена линейной зависимостью и может быть чувствительна к выбросам.

2.1.21.1. Код создания модели LinearSVR и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.LinearSVR, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# LinearSVR.py
# The code demonstrates the process of training LinearSVR model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import LinearSVR
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "LinearSVR"
onnx_model_filename = data_path + "linear_svr"

# create a Linear SVR model
linear_svr_model = LinearSVR()

# fit the model to the data
linear_svr_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = linear_svr_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(linear_svr_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(linear_svr_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  LinearSVR Original model (double)
Python  R-squared (Coefficient of determination): 0.9944935515149387
Python  Mean Absolute Error: 7.026852359381935
Python  Mean Squared Error: 72.86550241109444
Python  
Python  LinearSVR ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\linear_svr_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9944935580726729
Python  Mean Absolute Error: 7.026849848037511
Python  Mean Squared Error: 72.86541563418206
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  4
Python  MSE matching decimal places:  3
Python  float ONNX model precision:  4
Python  
Python  LinearSVR ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\linear_svr_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9944935515149387
Python  Mean Absolute Error: 7.026852359381935
Python  Mean Squared Error: 72.86550241109444
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Рис.70. Результат работы скрипта LinearSVR.py (float ONNX)

2.1.21.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели linear_svr_float.onnx и linear_svr_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                    LinearSVR.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LinearSVR"
#define   ONNXFilenameFloat  "linear_svr_float.onnx"
#define   ONNXFilenameDouble "linear_svr_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

LinearSVR (EURUSD,H1)   Testing ONNX float: LinearSVR (linear_svr_float.onnx)
LinearSVR (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9944935580726729
LinearSVR (EURUSD,H1)   MQL5:   Mean Absolute Error: 7.0268498480375108
LinearSVR (EURUSD,H1)   MQL5:   Mean Squared Error: 72.8654156341820567
LinearSVR (EURUSD,H1)   
LinearSVR (EURUSD,H1)   Testing ONNX double: LinearSVR (linear_svr_double.onnx)
LinearSVR (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9944935515149387
LinearSVR (EURUSD,H1)   MQL5:   Mean Absolute Error: 7.0268523593819374
LinearSVR (EURUSD,H1)   MQL5:   Mean Squared Error: 72.8655024110944680

Сравнение с оригинальной моделью:

Testing ONNX float: LinearSVR (linear_svr_float.onnx)
Python  Mean Absolute Error: 7.026852359381935
MQL5:   Mean Absolute Error: 7.0268498480375108
   
Testing ONNX double: LinearSVR (linear_svr_double.onnx)
Python  Mean Absolute Error: 7.026852359381935
MQL5:   Mean Absolute Error: 7.0268523593819374

Точность MAE ONNX float: 4 знака после запятой, точность MAE ONNX double 14 знаков после запятой.

2.1.21.3. ONNX-представление моделей linear_svr_float.onnx и linear_svr_double.onnx

Рис.71. ONNX-представление модели linear_svr_float.onnx в Netron

Рис.72. ONNX-представление модели linear_svr_double.onnx в Netron

2.1.22. sklearn.neural_network.MLPRegressor

MLPRegressor (Multi-Layer Perceptron Regressor) - это модель машинного обучения, которая использует искусственные нейронные сети для задачи регрессии.

Она представляет собой многослойную нейронную сеть, состоящую из нескольких слоев нейронов (включая входной, скрытые и выходной слои), которая обучается предсказывать непрерывные значения целевой переменной.

Принцип работы MLPRegressor:

Входные данные: Начнем с набора данных, который включает признаки (независимые переменные) и соответствующие значения целевой переменной.
Создание многослойной нейронной сети: MLPRegressor использует многослойную нейронную сеть с несколькими скрытыми слоями нейронов. Эти нейроны связаны взвешенными связями и активационными функциями.
Обучение модели: MLPRegressor обучает нейронную сеть, настраивая веса и смещения так, чтобы минимизировать функцию потерь, которая измеряет разницу между прогнозами сети и реальными значениями целевой переменной. Это происходит с использованием алгоритмов обратного распространения ошибки (backpropagation).
Получение прогнозов: После обучения модель может быть использована для предсказания значений целевой переменной для новых данных.

Преимущества MLPRegressor:

Гибкость: Многослойные нейронные сети могут моделировать сложные нелинейные зависимости между признаками и целевой переменной.
Универсальность: MLPRegressor может использоваться для разнообразных задач регрессии, включая задачи временных рядов, аппроксимации функций и многие другие.
Способность к обобщению: Нейронные сети обучаются на данных и могут обобщать зависимости, найденные в обучающих данных, на новые данные.

Ограничения MLPRegressor:

Сложность исходной модели: Большие нейронные сети могут быть вычислительно затратными и требовать большого объема данных для обучения.
Настройка гиперпараметров: Выбор оптимальных гиперпараметров (количество слоев, количество нейронов в каждом слое, скорость обучения и т. д.) может потребовать экспериментов.
Чувствительность к переобучению: Большие нейронные сети могут быть склонны к переобучению, если недостаточно данных или недостаточно регуляризации.

MLPRegressor представляет собой мощную модель машинного обучения, основанную на многослойных нейронных сетях, и может быть использована для решения разнообразных задач регрессии. Эта модель гибкая, но требует тщательной настройки и обучения на больших объемах данных для достижения оптимальных результатов.

2.1.22.1. Код создания модели MLPRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.neural_network.MLPRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# MLPRegressor.py
# The code demonstrates the process of training MLPRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "MLPRegressor"
onnx_model_filename = data_path + "mlp_regressor"

# create an MLP Regressor model
mlp_regressor_model = MLPRegressor(hidden_layer_sizes=(100, 50), activation='relu', max_iter=1000)

# fit the model to the data
mlp_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = mlp_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(mlp_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)
# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(mlp_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  MLPRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9874070836467945
Python  Mean Absolute Error: 10.62249788982753
Python  Mean Squared Error: 166.63901957615224
Python  
Python  MLPRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\mlp_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9874070821340352
Python  Mean Absolute Error: 10.62249972216809
Python  Mean Squared Error: 166.63903959413219
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  5
Python  
Python  MLPRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\mlp_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9874070836467945
Python  Mean Absolute Error: 10.622497889827532
Python  Mean Squared Error: 166.63901957615244
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  14
Python  MSE matching decimal places:  12
Python  double ONNX model precision:  14

Рис.73. Результат работы скрипта MLPRegressor.py (float ONNX)

2.1.22.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели mlp_regressor_float.onnx и mlp_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                 MLPRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "MLPRegressor"
#define   ONNXFilenameFloat  "mlp_regressor_float.onnx"
#define   ONNXFilenameDouble "mlp_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

MLPRegressor (EURUSD,H1)        Testing ONNX float: MLPRegressor (mlp_regressor_float.onnx)
MLPRegressor (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9875198695654352
MLPRegressor (EURUSD,H1)        MQL5:   Mean Absolute Error: 10.5596681685341309
MLPRegressor (EURUSD,H1)        MQL5:   Mean Squared Error: 165.1465507645494597
MLPRegressor (EURUSD,H1)        
MLPRegressor (EURUSD,H1)        Testing ONNX double: MLPRegressor (mlp_regressor_double.onnx)
MLPRegressor (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9875198617341387
MLPRegressor (EURUSD,H1)        MQL5:   Mean Absolute Error: 10.5596715833884609
MLPRegressor (EURUSD,H1)        MQL5:   Mean Squared Error: 165.1466543942046599

Сравнение с оригинальной моделью:

Testing ONNX float: MLPRegressor (mlp_regressor_float.onnx)
Python  Mean Absolute Error: 10.62249788982753
MQL5:   Mean Absolute Error: 10.6224997221680901

Testing ONNX double: MLPRegressor (mlp_regressor_double.onnx)
Python  Mean Absolute Error: 10.62249788982753
MQL5:   Mean Absolute Error: 10.6224978898275282

Точность MAE ONNX float: 5 знаков после запятой, точность MAE ONNX double 13 знаков после запятой.

2.1.22.3. ONNX-представление моделей mlp_regressor_float.onnx и mlp_regressor_double.onnx

Рис.74. ONNX-представление модели mlp_regressor_float.onnx в Netron

Рис.75. ONNX-представление модели mlp_regressor_double.onnx в Netron

2.1.23. sklearn.cross_decomposition.PLSRegression

PLSRegression (Partial Least Squares Regression) - это метод машинного обучения, который используется для решения задачи регрессии.

Он является частью семейства методов PLS и применяется для анализа и моделирования отношений между двумя наборами переменных, где один набор является предикторами, а другой - целевыми переменными.

Принцип работы PLSRegression:

Входные данные: Начнем с двух наборов данных, обозначенных как X и Y. Набор X содержит независимые переменные (предикторы), а набор Y содержит целевые переменные (зависимые).
Подбор линейных комбинаций: PLSRegression находит линейные комбинации (компоненты) в наборах X и Y, которые максимизируют ковариацию между ними. Эти компоненты называются PLS-компонентами.
Поиск максимальной ковариации: Основная цель PLSRegression - найти такие PLS-компоненты, которые максимизируют ковариацию между X и Y. Это позволяет выделить наиболее информативные связи между предикторами и целевыми переменными.
Обучение модели: Как только PLS-компоненты найдены, вы можете использовать их для создания модели, которая может прогнозировать значения Y на основе X.
Получение прогнозов: После обучения модель может быть использована для предсказания значений Y на новых данных, используя соответствующие значения X.

Преимущества PLSRegression:

Изучение корреляции: PLSRegression позволяет анализировать и моделировать корреляции между двумя наборами переменных, что может быть полезно для понимания взаимосвязей между предикторами и целевыми переменными.
Уменьшение размерности: Метод также может использоваться для снижения размерности данных, выделяя наиболее важные PLS-компоненты.

Ограничения PLSRegression:

Чувствительность к выбору числа компонент: Выбор оптимального числа PLS-компонентов может потребовать некоторых экспериментов.
Зависимость от структуры данных: Результаты PLSRegression могут сильно зависеть от структуры данных и корреляций между ними.

PLSRegression - это метод машинного обучения, используемый для анализа и моделирования корреляций между двумя наборами переменных, где один набор служит предикторами, а другой - целевыми переменными. Этот метод позволяет изучать взаимосвязи между данными и может быть полезным для снижения размерности данных и предсказания значений целевых переменных на основе предикторов.

2.1.23.1. Код создания модели PLSRegression и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.cross_decomposition.PLSRegression, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# PLSRegression.py
# The code demonstrates the process of training PLSRegression model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "PLSRegression"
onnx_model_filename = data_path + "pls_regression"

# create a PLSRegression model
pls_model = PLSRegression(n_components=1)

# fit the model to the data
pls_model.fit(X, y)

# predict values for the entire dataset
y_pred = pls_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(pls_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(pls_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  PLSRegression Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642613388
Python  Mean Absolute Error: 6.3477379263364275
Python  Mean Squared Error: 49.778140171281805
Python  
Python  PLSRegression ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\pls_regression_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382638567003
Python  Mean Absolute Error: 6.3477379221400145
Python  Mean Squared Error: 49.778145525764096
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  8
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  8
Python  
Python  PLSRegression ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\pls_regression_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642613388
Python  Mean Absolute Error: 6.3477379263364275
Python  Mean Squared Error: 49.778140171281805
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  15
Python  double ONNX model precision:  16

Рис.76. Результат работы скрипта PLSRegression.py (float ONNX)

2.1.23.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели pls_regression_float.onnx и pls_regression_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                PLSRegression.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "PLSRegression"
#define   ONNXFilenameFloat  "pls_regression_float.onnx"
#define   ONNXFilenameDouble "pls_regression_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

PLSRegression (EURUSD,H1)       Testing ONNX float: PLSRegression (pls_regression_float.onnx)
PLSRegression (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382638567003
PLSRegression (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3477379221400145
PLSRegression (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781455257640815
PLSRegression (EURUSD,H1)       
PLSRegression (EURUSD,H1)       Testing ONNX double: PLSRegression (pls_regression_double.onnx)
PLSRegression (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382642613388
PLSRegression (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3477379263364275
PLSRegression (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781401712817839

Сравнение с оригинальной моделью:

Testing ONNX float: PLSRegression (pls_regression_float.onnx)
Python  Mean Absolute Error: 6.3477379263364275
MQL5:   Mean Absolute Error: 6.3477379221400145
       
Testing ONNX double: PLSRegression (pls_regression_double.onnx)
Python  Mean Absolute Error: 6.3477379263364275
MQL5:   Mean Absolute Error: 6.3477379263364275

Точность MAE ONNX float: 8 знаков после запятой, точность MAE ONNX double 16 знаков после запятой.

2.1.23.3. ONNX-представление моделей pls_regression_float.onnx и pls_regression_double.onnx

Рис.77. ONNX-представление модели pls_regression_float.onnx в Netron

Рис.78. ONNX-представление модели pls_regression_double.onnx в Netron

2.1.24. sklearn.linear_model.TweedieRegressor

TweedieRegressor - это метод регрессии, который предназначен для решения задачи регрессии с использованием распределения Твиди (Tweedie distribution). Распределение Твиди - это вероятностное распределение, которое может описывать широкий спектр данных, включая данные с разной структурой дисперсии. TweedieRegressor применяется в задачах регрессии, где целевая переменная имеет характеристики, которые соответствуют распределению Твиди.

Принцип работы TweedieRegressor:

Целевая переменная и распределение Твиди: TweedieRegressor предполагает, что целевая переменная имеет распределение Твиди. Распределение Твиди зависит от параметра p, который определяет форму распределения и степень дисперсии.
Обучение модели: TweedieRegressor обучает регрессионную модель, которая предсказывает целевую переменную на основе независимых переменных (признаков). Модель максимизирует правдоподобие для данных, соответствующих распределению Твиди.
Выбор параметра p: Выбор параметра p является важным аспектом при использовании TweedieRegressor. Этот параметр определяет форму распределения и степень дисперсии. Разные значения p могут соответствовать различным типам данных, например, p=1 соответствует распределению Пуассона, а p=2 - нормальному распределению.
Преобразование ответов: Иногда модель может требовать преобразования ответов (целевой переменной) перед обучением. Это преобразование связано с параметром p и может включать логарифмирование или другие преобразования, чтобы обеспечить соответствие распределению Твиди.

Преимущества TweedieRegressor:

Способность моделировать данные с разной дисперсией: Распределение Твиди может адаптироваться к данным с разной структурой дисперсии, что полезно в реальных данных, где дисперсия может изменяться.
Разнообразие параметров p: Возможность выбора различных значений параметра p позволяет моделировать разные типы данных.

Ограничения TweedieRegressor:

Сложность выбора параметра p: Выбор правильного значения параметра p может потребовать знаний о данных и экспериментов.
Требуется соответствие распределению Твиди: Для успешного применения TweedieRegressor целевая переменная должна соответствовать распределению Твиди. Несоответствие может привести к плохой производительности модели.

TweedieRegressor - это метод регрессии, который использует распределение Твиди для моделирования данных с разной структурой дисперсии. Этот метод полезен в задачах регрессии, где целевая переменная соответствует распределению Твиди и может быть настроен с различными значениями параметра p для лучшей адаптации к данным.

2.1.24.1. Код создания модели TweedieRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.TweedieRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# TweedieRegressor.py
# The code demonstrates the process of training TweedieRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import TweedieRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "TweedieRegressor"
onnx_model_filename = data_path + "tweedie_regressor"

# create a Tweedie Regressor model
regression_model = TweedieRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

2023.10.31 11:39:36.223 Python  TweedieRegressor Original model (double)
2023.10.31 11:39:36.223 Python  R-squared (Coefficient of determination): 0.9962368328117072
2023.10.31 11:39:36.223 Python  Mean Absolute Error: 6.342397897667562
2023.10.31 11:39:36.223 Python  Mean Squared Error: 49.797082198408745
2023.10.31 11:39:36.223 Python  
2023.10.31 11:39:36.223 Python  TweedieRegressor ONNX model (float)
2023.10.31 11:39:36.223 Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\tweedie_regressor_float.onnx
2023.10.31 11:39:36.253 Python  Information about input tensors in ONNX:
2023.10.31 11:39:36.253 Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
2023.10.31 11:39:36.253 Python  Information about output tensors in ONNX:
2023.10.31 11:39:36.253 Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
2023.10.31 11:39:36.253 Python  R-squared (Coefficient of determination) 0.9962368338709323
2023.10.31 11:39:36.253 Python  Mean Absolute Error: 6.342397072978867
2023.10.31 11:39:36.253 Python  Mean Squared Error: 49.797068181938165
2023.10.31 11:39:36.253 Python  R^2 matching decimal places:  8
2023.10.31 11:39:36.253 Python  MAE matching decimal places:  6
2023.10.31 11:39:36.253 Python  MSE matching decimal places:  4
2023.10.31 11:39:36.253 Python  float ONNX model precision:  6
2023.10.31 11:39:36.613 Python  
2023.10.31 11:39:36.613 Python  TweedieRegressor ONNX model (double)
2023.10.31 11:39:36.613 Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\tweedie_regressor_double.onnx
2023.10.31 11:39:36.613 Python  Information about input tensors in ONNX:
2023.10.31 11:39:36.613 Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
2023.10.31 11:39:36.613 Python  Information about output tensors in ONNX:
2023.10.31 11:39:36.628 Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
2023.10.31 11:39:36.628 Python  R-squared (Coefficient of determination) 0.9962368328117072
2023.10.31 11:39:36.628 Python  Mean Absolute Error: 6.342397897667562
2023.10.31 11:39:36.628 Python  Mean Squared Error: 49.797082198408745
2023.10.31 11:39:36.628 Python  R^2 matching decimal places:  16
2023.10.31 11:39:36.628 Python  MAE matching decimal places:  15
2023.10.31 11:39:36.628 Python  MSE matching decimal places:  15
2023.10.31 11:39:36.628 Python  double ONNX model precision:  15

Рис.79. Результат работы скрипта TweedieRegressor.py (float ONNX)

2.1.24.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели tweedie_regressor_float.onnx и tweedie_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                             TweedieRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "TweedieRegressor"
#define   ONNXFilenameFloat  "tweedie_regressor_float.onnx"
#define   ONNXFilenameDouble "tweedie_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

2023.10.31 11:42:20.113 TweedieRegressor (EURUSD,H1)    Testing ONNX float: TweedieRegressor (tweedie_regressor_float.onnx)
2023.10.31 11:42:20.119 TweedieRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9962368338709323
2023.10.31 11:42:20.119 TweedieRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 6.3423970729788666
2023.10.31 11:42:20.119 TweedieRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 49.7970681819381653
2023.10.31 11:42:20.125 TweedieRegressor (EURUSD,H1)    
2023.10.31 11:42:20.125 TweedieRegressor (EURUSD,H1)    Testing ONNX double: TweedieRegressor (tweedie_regressor_double.onnx)
2023.10.31 11:42:20.130 TweedieRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9962368328117072
2023.10.31 11:42:20.130 TweedieRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 6.3423978976675608
2023.10.31 11:42:20.130 TweedieRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 49.7970821984087593

Сравнение с оригинальной моделью:

Testing ONNX float: TweedieRegressor (tweedie_regressor_float.onnx)
Python  Mean Absolute Error: 6.342397897667562
MQL5:   Mean Absolute Error: 6.3423970729788666

Testing ONNX double: TweedieRegressor (tweedie_regressor_double.onnx)
Python  Mean Absolute Error: 6.342397897667562
MQL5:   Mean Absolute Error: 6.3423978976675608

Точность MAE ONNX float: 6 знаков после запятой, точность MAE ONNX double 14 знаков после запятой.

2.1.24.3. ONNX-представление моделей tweedie_regressor_float.onnx и tweedie_regressor_double.onnx

Рис.80. ONNX-представление модели tweedie_regressor_float.onnx в Netron

Рис.81. ONNX-представление модели tweedie_regressor_double.onnx в Netron

2.1.25. sklearn.linear_model.PoissonRegressor

PoissonRegressor - это метод машинного обучения, который применяется для решения задачи регрессии, основанной на распределении Пуассона.

Этот метод подходит, когда зависимая переменная (целевая переменная) является счетной, то есть представляет собой количество событий, произошедших в фиксированный период времени или в фиксированном интервале пространства. PoissonRegressor моделирует связь между предикторами (независимыми переменными) и целевой переменной, предполагая, что эта связь соответствует распределению Пуассона.

Принцип работы PoissonRegressor:

Входные данные: Начнем с набора данных, который включает в себя признаки (независимые переменные) и целевую переменную, представляющую количество событий.
Распределение Пуассона: Метод PoissonRegressor моделирует целевую переменную, предполагая, что она следует распределению Пуассона. Распределение Пуассона подходит для моделирования событий, которые происходят с фиксированной средней интенсивностью в заданном временном интервале или пространственном отрезке.
Обучение модели: PoissonRegressor обучает модель, которая оценивает параметры распределения Пуассона, учитывая предикторы. Модель старается найти наилучшую подгонку для наблюдаемых данных с использованием функции правдоподобия, соответствующей распределению Пуассона.
Предсказание счетных значений: После обучения модель может использоваться для предсказания счетных значений (количество событий) на новых данных, и эти предсказания также следуют распределению Пуассона.

Преимущества PoissonRegressor:

Подходит для счетных данных: PoissonRegressor подходит для задач, в которых целевая переменная представляет собой счетные данные, такие как количество заказов, количество звонков и т. д.
Специфичность распределения: Поскольку модель соответствует распределению Пуассона, она может быть более точной для данных, которые хорошо описываются этим распределением.

Ограничения PoissonRegressor:

Подходит только для счетных данных: PoissonRegressor не подходит для регрессии, где целевая переменная является непрерывной и несчетной.
Зависимость от выбора признаков: Качество модели может сильно зависеть от выбора признаков и их инженерии.

PoissonRegressor - это метод машинного обучения, используемый для решения задачи регрессии, когда целевая переменная представляет собой счетные данные, и она моделируется с использованием распределения Пуассона. Этот метод полезен для задач, связанных с событиями, которые происходят с фиксированной интенсивностью в определенных интервалах времени или пространства.

2.1.25.1. Код создания модели PoissonRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.PoissonRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# PoissonRegressor.py
# The code demonstrates the process of training PoissonRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "PoissonRegressor"
onnx_model_filename = data_path + "poisson_regressor"

# create a PoissonRegressor model
regression_model = PoissonRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  PoissonRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9204304782362495
Python  Mean Absolute Error: 27.59790466048524
Python  Mean Squared Error: 1052.9242570153044
Python  
Python  PoissonRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\poisson_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9204305082536851
Python  Mean Absolute Error: 27.59790825165078
Python  Mean Squared Error: 1052.9238598018305
Python  R^2 matching decimal places:  6
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  2
Python  float ONNX model precision:  5
Python  
Python  PoissonRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\poisson_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9204304782362495
Python  Mean Absolute Error: 27.59790466048524
Python  Mean Squared Error: 1052.9242570153044
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  14
Python  MSE matching decimal places:  13
Python  double ONNX model precision:  14

Рис.82. Результат работы скрипта PoissonRegressor.py (float ONNX)

2.1.25.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели poisson_regressor_float.onnx и poisson_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                             PoissonRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "PoissonRegressor"
#define   ONNXFilenameFloat  "poisson_regressor_float.onnx"
#define   ONNXFilenameDouble "poisson_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

PoissonRegressor (EURUSD,H1)    Testing ONNX float: PoissonRegressor (poisson_regressor_float.onnx)
PoissonRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9204305082536851
PoissonRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 27.5979082516507788
PoissonRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 1052.9238598018305311
PoissonRegressor (EURUSD,H1)    
PoissonRegressor (EURUSD,H1)    Testing ONNX double: PoissonRegressor (poisson_regressor_double.onnx)
PoissonRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9204304782362493
PoissonRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 27.5979046604852343
PoissonRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 1052.9242570153051020

Сравнение с оригинальной моделью:

Testing ONNX float: PoissonRegressor (poisson_regressor_float.onnx)
Python  Mean Absolute Error: 27.59790466048524
MQL5:   Mean Absolute Error: 27.5979082516507788
    
Testing ONNX double: PoissonRegressor (poisson_regressor_double.onnx)
Python  Mean Absolute Error: 27.59790466048524
MQL5:   Mean Absolute Error: 27.5979046604852343

Точность MAE ONNX float: 5 знаков после запятой, точность MAE ONNX double 13 знаков после запятой.

2.1.25.3. ONNX-представление моделей poisson_regressor_float.onnx и poisson_regressor_double.onnx

Рис.83. ONNX-представление модели poisson_regressor_float.onnx в Netron

Рис.84. ONNX-представление модели poisson_regressor_double.onnx в Netron

2.1.26. sklearn.neighbors.RadiusNeighborsRegressor

RadiusNeighborsRegressor - это метод машинного обучения, который используется для задачи регрессии. Он является вариантом метода ближайших соседей (k-NN) и предназначен для прогнозирования значений целевой переменной на основе ближайших соседей в пространстве признаков. Однако вместо фиксированного числа соседей (как в методе k-NN), RadiusNeighborsRegressor использует фиксированный радиус для определения соседей для каждого примера. Вот ключевые аспекты RadiusNeighborsRegressor:

Принцип работы RadiusNeighborsRegressor:

Входные данные: Начнем с набора данных, который включает в себя признаки (независимые переменные) и целевую переменную (непрерывную).
Задание радиуса: RadiusNeighborsRegressor требует задания фиксированного радиуса, который будет использоваться для определения ближайших соседей для каждого примера в пространстве признаков.
Определение соседей: Для каждого примера определяются все точки данных, находящиеся внутри заданного радиуса. Эти точки становятся соседями данного примера.
Взвешенное усреднение: Для прогноза значения целевой переменной для каждого примера используются значения целевой переменной его соседей. Часто это делается с использованием взвешенного усреднения, где веса зависят от расстояния между примерами.
Предсказание: После обучения модель может быть использована для предсказания значений целевой переменной на новых данных, основываясь на ближайших соседях в пространстве признаков.

Преимущества RadiusNeighborsRegressor:

Универсальность: RadiusNeighborsRegressor может использоваться для задач регрессии, особенно в случаях, когда количество соседей может сильно варьироваться в зависимости от радиуса.
Устойчивость к выбросам: Подход, основанный на ближайших соседях, может быть устойчив к выбросам, так как модель учитывает только близкие точки данных.

Ограничения RadiusNeighborsRegressor:

Зависимость от выбора радиуса: Выбор правильного радиуса может потребовать настройки и экспериментов.
Вычислительная сложность: Обработка больших наборов данных может потребовать значительных вычислительных ресурсов.

RadiusNeighborsRegressor - это метод машинного обучения, который используется для задачи регрессии на основе метода ближайших соседей с фиксированным радиусом. Этот метод может быть полезным в ситуациях, где количество соседей может меняться в зависимости от радиуса, и в случаях, когда данные содержат выбросы.

2.1.26.1. Код создания модели RadiusNeighborsRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.neighbors.RadiusNeighborsRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# RadiusNeighborsRegressor.py
# The code demonstrates the process of training RadiusNeighborsRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import RadiusNeighborsRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "RadiusNeighborsRegressor"
onnx_model_filename = data_path + "radius_neighbors_regressor"

# create a RadiusNeighborsRegressor model
regression_model = RadiusNeighborsRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  RadiusNeighborsRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9999521132921395
Python  Mean Absolute Error: 0.591458244376554
Python  Mean Squared Error: 0.6336732353950723
Python  
Python  RadiusNeighborsRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\radius_neighbors_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9999999999999971
Python  Mean Absolute Error: 4.393654615473253e-06
Python  Mean Squared Error: 3.829042036424747e-11
Python  R^2 matching decimal places:  4
Python  MAE matching decimal places:  0
Python  MSE matching decimal places:  0
Python  float ONNX model precision:  0
Python  
Python  RadiusNeighborsRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\radius_neighbors_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 1.0
Python  Mean Absolute Error: 0.0
Python  Mean Squared Error: 0.0
Python  R^2 matching decimal places:  0
Python  MAE matching decimal places:  0
Python  MSE matching decimal places:  0
Python  double ONNX model precision:  0

Рис.85. Результат работы скрипта RadiusNeighborsRegressor.py (float ONNX)

2.1.26.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели radius_neighbors_regressor_float.onnx и radius_neighbors_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                     RadiusNeighborsRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "RadiusNeighborsRegressor"
#define   ONNXFilenameFloat  "radius_neighbors_regressor_float.onnx"
#define   ONNXFilenameDouble "radius_neighbors_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

RadiusNeighborsRegressor (EURUSD,H1)    Testing ONNX float: RadiusNeighborsRegressor (radius_neighbors_regressor_float.onnx)
RadiusNeighborsRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9999999999999971
RadiusNeighborsRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 0.0000043936546155
RadiusNeighborsRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 0.0000000000382904
RadiusNeighborsRegressor (EURUSD,H1)    
RadiusNeighborsRegressor (EURUSD,H1)    Testing ONNX double: RadiusNeighborsRegressor (radius_neighbors_regressor_double.onnx)
RadiusNeighborsRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 1.0000000000000000
RadiusNeighborsRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 0.0000000000000000
RadiusNeighborsRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 0.0000000000000000

2.1.26.3. ONNX-представление моделей radius_neighbors_regressor_float.onnx и radius_neighbors_regressor_double.onnx

Рис.86. ONNX-представление модели radius_neighbors_regressor_float.onnx в Netron

Рис.87. ONNX-представление модели radius_neighbors_regressor_double.onnx в Netron

2.1.27. sklearn.neighbors.KNeighborsRegressor

KNeighborsRegressor - это метод машинного обучения, используемый для задачи регрессии.

Он относится к категории алгоритмов на основе ближайших соседей (k-Nearest Neighbors, k-NN), и используется для предсказания числовых значений целевой переменной на основе близости (сходства) между объектами в обучающем наборе данных.

Принцип работы KNeighborsRegressor:

Входные данные: Начинаем с исходного набора данных, включая признаки (независимые переменные) и соответствующие значения целевой переменной.
Выбор числа соседей (k): Необходимо выбрать количество ближайших соседей (k), которые будут учитываться при прогнозировании. Это число является одним из гиперпараметров модели.
Расчет близости: Для новых данных (точек, для которых необходимо сделать предсказания), рассчитывается расстояние или сходство между этими данными и всеми объектами обучающего набора данных.
Выбор k ближайших соседей: Выбираются k объектов из обучающего набора данных, которые наиболее близки к новым данным.
Прогноз: Для задачи регрессии, предсказание значения целевой переменной для новых данных вычисляется как среднее значение целевых переменных k ближайших соседей.

Преимущества KNeighborsRegressor:

Простота в использовании: KNeighborsRegressor - это простой алгоритм, который не требует сложной предварительной обработки данных.
Непараметричность: Метод не предполагает определенной функциональной формы зависимости между признаками и целевой переменной, что позволяет моделировать разнообразные отношения.
Воспроизводимость: Результаты модели KNeighborsRegressor можно воспроизвести, так как предсказания основаны на близости данных.

Ограничения KNeighborsRegressor:

Вычислительная сложность: Подсчет расстояний до всех точек обучающего набора данных может быть вычислительно затратным при большом объеме данных.
Чувствительность к выбору числа соседей: Выбор оптимального значения k требует настройки и может сильно влиять на производительность модели.
Чувствительность к шуму: Метод может быть чувствителен к шуму в данных и выбросам.

KNeighborsRegressor полезен в задачах регрессии, когда важно учитывать окрестность объектов для предсказания целевой переменной. Он может быть особенно полезен в ситуациях, где зависимость между признаками и целевой переменной нелинейна и сложна.

2.1.27.1. Код создания модели KNeighborsRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.neighbors.KNeighborsRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# KNeighborsRegressor.py
# The code demonstrates the process of training KNeighborsRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "KNeighborsRegressor"
onnx_model_filename = data_path + "kneighbors_regressor"

# create a KNeighbors Regressor model
kneighbors_model = KNeighborsRegressor(n_neighbors=5)

# fit the model to the data
kneighbors_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = kneighbors_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(kneighbors_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(kneighbors_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  KNeighborsRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9995599863346534
Python  Mean Absolute Error: 1.7414210057117578
Python  Mean Squared Error: 5.822594523532273
Python  
Python  KNeighborsRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\kneighbors_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9995599867417418
Python  Mean Absolute Error: 1.7414195457976402
Python  Mean Squared Error: 5.8225891366283875
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  4
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  4
Python  
Python  KNeighborsRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\kneighbors_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9995599863346534
Python  Mean Absolute Error: 1.7414210057117583
Python  Mean Squared Error: 5.822594523532269
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  14
Python  MSE matching decimal places:  13
Python  double ONNX model precision:  14

Рис.88. Результат работы скрипта KNeighborsRegressor.py (float ONNX)

2.1.27.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели kneighbors_regressor_float.onnx и kneighbors_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                          KNeighborsRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "KNeighborsRegressor"
#define   ONNXFilenameFloat  "kneighbors_regressor_float.onnx"
#define   ONNXFilenameDouble "kneighbors_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

KNeighborsRegressor (EURUSD,H1) Testing ONNX float: KNeighborsRegressor (kneighbors_regressor_float.onnx)
KNeighborsRegressor (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9995599860116634
KNeighborsRegressor (EURUSD,H1) MQL5:   Mean Absolute Error: 1.7414200607817711
KNeighborsRegressor (EURUSD,H1) MQL5:   Mean Squared Error: 5.8225987975798184
KNeighborsRegressor (EURUSD,H1) 
KNeighborsRegressor (EURUSD,H1) Testing ONNX double: KNeighborsRegressor (kneighbors_regressor_double.onnx)
KNeighborsRegressor (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9995599863346534
KNeighborsRegressor (EURUSD,H1) MQL5:   Mean Absolute Error: 1.7414210057117601
KNeighborsRegressor (EURUSD,H1) MQL5:   Mean Squared Error: 5.8225945235322705

Сравнение с оригинальной моделью:

Testing ONNX float: KNeighborsRegressor (kneighbors_regressor_float.onnx)
Python  Mean Absolute Error: 1.7414210057117578
MQL5:   Mean Absolute Error: 1.7414200607817711
 
Testing ONNX double: KNeighborsRegressor (kneighbors_regressor_double.onnx)
Python  Mean Absolute Error: 1.7414210057117578
MQL5:   Mean Absolute Error: 1.7414210057117601

Точность MAE ONNX float: 5 знаков после запятой, точность MAE ONNX double 13 знаков после запятой.

2.1.27.3. ONNX-представление моделей kneighbors_regressor_float.onnx и kneighbors_regressor_double.onnx

Рис.89. ONNX-представление модели kneighbors_regressor_float.onnx в Netron

Рис.90. ONNX-представление модели kneighbors_regressor_double.onnx в Netron

2.1.28. sklearn.gaussian_process.GaussianProcessRegressor

GaussianProcessRegressor (регрессия на основе гауссовских процессов) - это метод машинного обучения, используемый для задачи регрессии, который позволяет моделировать неопределенность в предсказаниях.

Gaussian Process (GP) является мощным инструментом байесовского машинного обучения и используется для моделирования сложных функций и предсказания значений целевой переменной с учетом неопределенности.

Принцип работы GaussianProcessRegressor:

Ввод данных: Начинаем с исходного набора данных, включая признаки (независимые переменные) и соответствующие значения целевой переменной.
Моделирование гауссовского процесса: В методе Gaussian Process используется гауссовский процесс, который представляет собой коллекцию случайных переменных, описываемых вероятностным распределением Гаусса (нормальным распределением). GP моделирует не только средние значения для каждой точки данных, но и ковариацию (или похожесть) между этими точками.
Выбор ковариационной функции: Одним из важных аспектов GP является выбор ковариационной функции (или ядра), которая определяет, какие точки данных взаимосвязаны между собой и насколько сильно. Различные ковариационные функции могут использоваться в зависимости от характера данных и задачи.
Обучение модели: GaussianProcessRegressor обучает GP с использованием обучающих данных. Во время обучения модель настраивает параметры ковариационной функции и оценивает неопределенность в предсказаниях.
Предсказание: После обучения модель может использоваться для предсказания значений целевой переменной для новых данных. Важной особенностью GP является то, что она предсказывает не только среднее значение, но и доверительный интервал, который оценивает степень уверенности в предсказаниях.

Преимущества GaussianProcessRegressor:

Моделирование неопределенности: GP позволяет учитывать неопределенность в предсказаниях, что полезно в задачах, где важно знать, насколько мы уверены в предсказанных значениях.
Гибкость: GP может моделировать разнообразные функции, и его ковариационные функции могут быть настроены для разных типов данных.
Малое количество гиперпараметров: GP имеет относительно небольшое количество гиперпараметров, что упрощает настройку модели.

Ограничения GaussianProcessRegressor:

Вычислительная сложность: GP может быть вычислительно затратным, особенно при большом объеме данных.
Неэффективность в высокоразмерных пространствах: GP может потерять эффективность в задачах с большим числом признаков, из-за проклятия размерности.

GaussianProcessRegressor полезен в задачах регрессии, где важно моделировать неопределенность и обеспечивать надежные предсказания. Этот метод часто используется в байесовском машинном обучении и мета-анализе.

2.1.28.1. Код создания модели GaussianProcessRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.gaussian_process.GaussianProcessRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# GaussianProcessRegressor.py
# The code demonstrates the process of training GaussianProcessRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "GaussianProcessRegressor"
onnx_model_filename = data_path + "gaussian_process_regressor"

# create a GaussianProcessRegressor model
kernel = 1.0 * RBF()
gp_model = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)

# fit the model to the data
gp_model.fit(X, y)

# predict values for the entire dataset
y_pred = gp_model.predict(X, return_std=False)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(gp_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("ONNX: MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(gp_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  GaussianProcessRegressor Original model (double)
Python  R-squared (Coefficient of determination): 1.0
Python  Mean Absolute Error: 3.504041501400934e-13
Python  Mean Squared Error: 1.6396606443650807e-25
Python  
Python  GaussianProcessRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gaussian_process_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: GPmean, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9999999999999936
Python  Mean Absolute Error: 6.454076974495848e-06
Python  Mean Squared Error: 8.493606782250733e-11
Python  R^2 matching decimal places:  0
Python  MAE matching decimal places:  0
Python  MSE matching decimal places:  0
Python  float ONNX model precision:  0
Python  
Python  GaussianProcessRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gaussian_process_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: GPmean, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 1.0
Python  Mean Absolute Error: 3.504041501400934e-13
Python  Mean Squared Error: 1.6396606443650807e-25
Python  R^2 matching decimal places:  1
Python  MAE matching decimal places:  19
Python  MSE matching decimal places:  20
Python  double ONNX model precision:  19

Рис.91. Результат работы скрипта GaussianProcessRegressor.py (float ONNX)

2.1.28.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели gaussian_process_regressor_float.onnx и gaussian_process_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                     GaussianProcessRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "GaussianProcessRegressor"
#define   ONNXFilenameFloat  "gaussian_process_regressor_float.onnx"
#define   ONNXFilenameDouble "gaussian_process_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

GaussianProcessRegressor (EURUSD,H1)    Testing ONNX float: GaussianProcessRegressor (gaussian_process_regressor_float.onnx)
GaussianProcessRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9999999999999936
GaussianProcessRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 0.0000064540769745
GaussianProcessRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 0.0000000000849361
GaussianProcessRegressor (EURUSD,H1)    
GaussianProcessRegressor (EURUSD,H1)    Testing ONNX double: GaussianProcessRegressor (gaussian_process_regressor_double.onnx)
GaussianProcessRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 1.0000000000000000
GaussianProcessRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 0.0000000000003504
GaussianProcessRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 0.0000000000000000

2.1.28.3. ONNX-представление моделей gaussian_process_regressor_float.onnx и gaussian_process_regressor_double.onnx

Рис.92. ONNX-представление модели gaussian_process_regressor_float.onnx в Netron

Рис.93. ONNX-представление модели gaussian_process_regressor_double.onnx в Netron

2.1.29. sklearn.linear_model.GammaRegressor

GammaRegressor - это метод машинного обучения, предназначенный для задачи регрессии, в которой целевая переменная следует гамма-распределению.

Гамма-распределение является вероятностным распределением, используемым для моделирования положительных, непрерывных случайных величин. Этот метод позволяет моделировать и предсказывать положительные числовые значения, такие как стоимость, время или доли.

Принцип работы GammaRegressor:

Входные данные: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие значения целевой переменной, которая следует гамма-распределению.
Выбор функции потерь: GammaRegressor использует функцию потерь, которая соответствует гамма-распределению и учитывает особенности этого распределения. Это позволяет моделировать данные с учетом неотрицательности и правого наклона гамма-распределения.
Обучение модели: Модель обучается на данных с использованием выбранной функции потерь. Во время обучения она настраивает параметры модели таким образом, чтобы минимизировать функцию потерь.
Предсказание: После обучения модель может использоваться для предсказания значений целевой переменной для новых данных.

Преимущества GammaRegressor:

Моделирование положительных значений: Этот метод специально разработан для моделирования положительных числовых значений, что может быть полезным в задачах, где целевая переменная ограничена снизу.
Учет формы гамма-распределения: GammaRegressor учитывает особенности гамма-распределения, что позволяет более точно моделировать данные, следующие этому распределению.
Полезность в эконометрике и медицинских исследованиях: Гамма-распределение часто используется для моделирования стоимости, времени ожидания и других положительных случайных величин в эконометрике и медицинских исследованиях.

Ограничения GammaRegressor:

Ограничение на тип данных: Этот метод подходит только для задач регрессии, в которых целевая переменная следует гамма-распределению или подобным ему распределениям. Для данных, которые не соответствуют такому распределению, этот метод может быть неэффективным.
Требуется выбор функции потерь: Выбор подходящей функции потерь может потребовать знания о распределении целевой переменной и ее особенностях.

GammaRegressor полезен в задачах, где необходимо моделировать и предсказывать положительные числовые значения, соответствующие гамма-распределению.

2.1.29.1. Код создания модели GammaRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.GammaRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# GammaRegressor.py
# The code demonstrates the process of training GammaRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import GammaRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 10+4*X + 10*np.sin(X*0.5)

model_name = "GammaRegressor"
onnx_model_filename = data_path + "gamma_regressor"

# create a Gamma Regressor model
regression_model = GammaRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  GammaRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.7963797339354436
Python  Mean Absolute Error: 37.266200319422815
Python  Mean Squared Error: 2694.457784927322
Python  
Python  GammaRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gamma_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.7963795030042045
Python  Mean Absolute Error: 37.266211754095956
Python  Mean Squared Error: 2694.4608407846144
Python  R^2 matching decimal places:  6
Python  MAE matching decimal places:  4
Python  MSE matching decimal places:  1
Python  float ONNX model precision:  4
Python  
Python  GammaRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gamma_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.7963797339354436
Python  Mean Absolute Error: 37.266200319422815
Python  Mean Squared Error: 2694.457784927322
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  12
Python  double ONNX model precision:  15

Рис.94. Результат работы скрипта GammaRegressor.py (float ONNX)

2.1.29.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели gamma_regressor_float.onnx и gamma_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                               GammaRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "GammaRegressor"
#define   ONNXFilenameFloat  "gamma_regressor_float.onnx"
#define   ONNXFilenameDouble "gamma_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(10+4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

GammaRegressor (EURUSD,H1)      Testing ONNX float: GammaRegressor (gamma_regressor_float.onnx)
GammaRegressor (EURUSD,H1)      MQL5:   R-Squared (Coefficient of determination): 0.7963795030042045
GammaRegressor (EURUSD,H1)      MQL5:   Mean Absolute Error: 37.2662117540959628
GammaRegressor (EURUSD,H1)      MQL5:   Mean Squared Error: 2694.4608407846144473
GammaRegressor (EURUSD,H1)      
GammaRegressor (EURUSD,H1)      Testing ONNX double: GammaRegressor (gamma_regressor_double.onnx)
GammaRegressor (EURUSD,H1)      MQL5:   R-Squared (Coefficient of determination): 0.7963797339354435
GammaRegressor (EURUSD,H1)      MQL5:   Mean Absolute Error: 37.2662003194228220
GammaRegressor (EURUSD,H1)      MQL5:   Mean Squared Error: 2694.4577849273218817

Сравнение с оригинальной моделью:

Testing ONNX float: GammaRegressor (gamma_regressor_float.onnx)
Python  Mean Absolute Error: 37.266200319422815
MQL5:   Mean Absolute Error: 37.2662117540959628
      
Testing ONNX double: GammaRegressor (gamma_regressor_double.onnx)
Python  Mean Absolute Error: 37.266200319422815
MQL5:   Mean Absolute Error: 37.2662003194228220

Точность MAE ONNX float: 4 знака после запятой, точность MAE ONNX double 13 знаков после запятой.

2.1.29.3. ONNX-представление моделей gamma_regressor_float.onnx и gamma_regressor_double.onnx

Рис.95. ONNX-представление модели gamma_regressor_float.onnx в Netron

Рис.96. ONNX-представление модели gamma_regressor_double.onnx в Netron

2.1.30. sklearn.linear_model.SGDRegressor

SGDRegressor представляет собой метод регрессии, который использует стохастический градиентный спуск (Stochastic Gradient Descent, SGD) для обучения регрессионной модели. Он является частью семейства линейных моделей и может использоваться для решения задач регрессии. Важными характеристиками SGDRegressor являются эффективность и способность работать с большими объемами данных.

Принцип работы SGDRegressor:

Линейная регрессия: SGDRegressor как и Ridge и Lasso, стремится найти линейную зависимость между независимыми переменными (признаками) и целевой переменной в задаче регрессии.
Стохастический градиентный спуск: Основой SGDRegressor является стохастический градиентный спуск. Вместо вычисления градиента на всем обучающем наборе данных, он обновляет модель на основе случайно выбранных мини-пакетов (batch) данных. Это позволяет обучать модель эффективно и работать с большими данными.
Регуляризация: SGDRegressor поддерживает L1 и L2 регуляризацию (Lasso и Ridge). Это позволяет контролировать переобучение и улучшать устойчивость модели.
Гиперпараметры: Как и в случае Ridge и Lasso, в SGDRegressor можно настраивать гиперпараметры, такие как параметр регуляризации (α, альфа) и тип регуляризации.

Преимущества SGDRegressor:

Эффективность: SGDRegressor работает хорошо с большими объемами данных и позволяет эффективно обучать модели на больших наборах данных.
Способность к регуляризации: Возможность применения L1 и L2 регуляризации делает этот метод подходящим для задач управления переобучением.
Адаптивный градиентный спуск: Стохастический градиентный спуск позволяет адаптироваться к изменяющимся данным и обучать модель на лету.

Ограничения SGDRegressor:

Чувствительность к выбору гиперпараметров: Настройка гиперпараметров, таких как скорость обучения и коэффициент регуляризации, может потребовать экспериментов.
Не всегда сходится к глобальному минимуму: Из-за стохастичности градиентного спуска SGDRegressor не всегда сходится к глобальному минимуму функции потерь.

SGDRegressor - это метод регрессии, который использует стохастический градиентный спуск для обучения регрессионной модели. Он эффективен и способен работать с большими данными, а также поддерживает регуляризацию для управления переобучением.

2.1.30.1. Код создания модели SGDRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.SGDRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# SGDRegressor2.py
# The code demonstrates the process of training SGDRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,10,0.1).reshape(-1,1)
y = 4*X + np.sin(X*10)

model_name = "SGDRegressor"
onnx_model_filename = data_path + "sgd_regressor"

# create an SGDRegressor model
regression_model = SGDRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  SGDRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9961197872743282
Python  Mean Absolute Error: 0.6405924406136998
Python  Mean Squared Error: 0.5169867345998348
Python  
Python  SGDRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\sgd_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9961197876338647
Python  Mean Absolute Error: 0.6405924014799271
Python  Mean Squared Error: 0.5169866866963753
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  7
Python  MSE matching decimal places:  6
Python  float ONNX model precision:  7
Python  
Python  SGDRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\sgd_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9961197872743282
Python  Mean Absolute Error: 0.6405924406136998
Python  Mean Squared Error: 0.5169867345998348
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  16
Python  double ONNX model precision:  16

Рис.97. Результат работы скрипта SGDRegressor.py (float ONNX)

2.1.30.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели sgd_regressor_float.onnx и sgd_rgressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                 SGDRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "SGDRegressor"
#define   ONNXFilenameFloat  "sgd_regressor_float.onnx"
#define   ONNXFilenameDouble "sgd_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i*0.1;
      y[i]=(double)(4*x[i] + sin(x[i]*10));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

SGDRegressor (EURUSD,H1)        Testing ONNX float: SGDRegressor (sgd_regressor_float.onnx)
SGDRegressor (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9961197876338647
SGDRegressor (EURUSD,H1)        MQL5:   Mean Absolute Error: 0.6405924014799272
SGDRegressor (EURUSD,H1)        MQL5:   Mean Squared Error: 0.5169866866963754
SGDRegressor (EURUSD,H1)        
SGDRegressor (EURUSD,H1)        Testing ONNX double: SGDRegressor (sgd_regressor_double.onnx)
SGDRegressor (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9961197872743282
SGDRegressor (EURUSD,H1)        MQL5:   Mean Absolute Error: 0.6405924406136998
SGDRegressor (EURUSD,H1)        MQL5:   Mean Squared Error: 0.5169867345998348

Сравнение с оригинальной моделью:

Testing ONNX float: SGDRegressor (sgd_regressor_float.onnx)
Python  Mean Absolute Error: 0.6405924406136998
MQL5:   Mean Absolute Error: 0.6405924014799272
        
Testing ONNX double: SGDRegressor (sgd_regressor_double.onnx)
Python  Mean Absolute Error: 0.6405924406136998
MQL5:   Mean Absolute Error: 0.6405924406136998

Точность MAE ONNX float: 7 знаков после запятой, точность MAE ONNX double 16 знаков после запятой.

2.1.30.3. ONNX-представление моделей sgd_regressor_float.onnx и sgd_regressor_double.onnx

Рис.98. ONNX-представление модели sgd_regressor_float.onnx в Netron

Рис.99. ONNX-представление модели sgd_rgressor_double.onnx в Netron

2.2. Регрессионные модели библиотеки Scikit-learn, которые конвертируются только во float ONNX-модели

В этом разделе рассматриваются модели, работа с которыми возможна только с точностью float. Конвертация в ONNX с точностью double приводит к ошибкам, связанным с ограничениями ONNX-операторов подмножества ai.onnx.ml.

2.2.1. sklearn.linear_model.AdaBoostRegressor

AdaBoostRegressor - это метод машинного обучения, который используется для регрессии, то есть для прогнозирования числовых значений (например, цены на недвижимость, объемы продаж и т. д.).

Этот метод является вариацией алгоритма AdaBoost (Adaptive Boosting), который был изначально разработан для задач классификации.

Принцип работы AdaBoostRegressor:

Исходный набор данных: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие целевые переменные (зависимые переменные, которые мы пытаемся предсказать).
Инициализация весов: В начале каждая точка данных (наблюдение) имеет равные веса, и модель строится на основе этого взвешенного набора данных.
Обучение слабых учеников: AdaBoostRegressor строит несколько слабых регрессионных моделей (например, решающие деревья), которые пытаются предсказать целевую переменную. Эти модели называются "слабыми учителями". Каждый слабый учитель обучается на данных, учитывая веса каждого наблюдения.
Выбор весов для слабых учителей: AdaBoostRegressor рассчитывает вес для каждого слабого учителя на основе того, насколько хорошо этот учитель справился с предсказаниями. Учителя, которые делают более точные предсказания, получают большие веса, и наоборот.
Обновление весов наблюдений: Веса наблюдений обновляются так, чтобы наблюдения, которые были неправильно предсказаны ранее, получали больший вес, таким образом, увеличивая их важность для следующей модели.
Финальное предсказание: AdaBoostRegressor комбинирует предсказания всех слабых учителей, присваивая им веса в зависимости от их производительности. Это финальное предсказание модели.

Преимущества AdaBoostRegressor:

Способность к адаптации: AdaBoostRegressor способен адаптироваться к сложным функциям и лучше справляется с нелинейными зависимостями.
Снижение переобучения: AdaBoostRegressor использует регуляризацию за счет обновления весов наблюдений, что помогает предотвратить переобучение.
Мощный ансамбль: Комбинируя несколько слабых моделей, AdaBoostRegressor может создавать сильные модели, которые могут достаточно точно предсказывать целевую переменную.

Ограничения AdaBoostRegressor:

Чувствительность к выбросам: AdaBoostRegressor чувствителен к выбросам в данных, что может повлиять на качество предсказаний.
Высокие вычислительные затраты: Построение множества слабых учителей может потребовать больше вычислительных ресурсов и времени.
Не всегда лучший выбор: AdaBoostRegressor не всегда является наилучшим выбором, и в некоторых случаях другие методы регрессии могут работать лучше.

AdaBoostRegressor - это полезный метод машинного обучения, который может быть применен в различных задачах регрессии, особенно в случаях, когда данные содержат сложные зависимости.

2.2.1.1. Код создания модели AdaBoostRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.AdaBoostRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# AdaBoostRegressor.py
# The code demonstrates the process of training AdaBoostRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "AdaBoostRegressor"
onnx_model_filename = data_path + "adaboost_regressor"

# create an AdaBoostRegressor model
regression_model = AdaBoostRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  AdaBoostRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9991257208809748
Python  Mean Absolute Error: 2.3678022748065457
Python  Mean Squared Error: 11.569124350863143
Python  
Python  AdaBoostRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\adaboost_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9991257199849699
Python  Mean Absolute Error: 2.36780399225718
Python  Mean Squared Error: 11.569136207480646
Python  R^2 matching decimal places:  7
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  5
Python  
Python  AdaBoostRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\adaboost_regressor_double.onnx

Здесь модель была экспортирована в ONNX-модели для float и double, ONNX-модель float исполнилась успешно, а с моделью double возникли проблемы (ошибки во вкладке Errors):

AdaBoostRegressor.py started    AdaBoostRegressor.py    1       1
Traceback (most recent call last):      AdaBoostRegressor.py    1       1
    onnx_session = ort.InferenceSession(onnx_filename)  AdaBoostRegressor.py    159     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     424     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\adaboost_regressor_double.onnx failed:Type Error:       onnxruntime_inference_collection.py     424     1
AdaBoostRegressor.py finished in 3207 ms                5       1

Рис.100. Результат работы скрипта AdaBoostRegressor.py (float ONNX)

2.2.1.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели adaboost_regressor_float.onnx и adaboost_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                            AdaBoostRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "AdaBoostRegressor"
#define   ONNXFilenameFloat  "adaboost_regressor_float.onnx"
#define   ONNXFilenameDouble "adaboost_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

AdaBoostRegressor (EURUSD,H1)   
AdaBoostRegressor (EURUSD,H1)   Testing ONNX float: AdaBoostRegressor (adaboost_regressor_float.onnx)
AdaBoostRegressor (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9991257199849699
AdaBoostRegressor (EURUSD,H1)   MQL5:   Mean Absolute Error: 2.3678039922571803
AdaBoostRegressor (EURUSD,H1)   MQL5:   Mean Squared Error: 11.5691362074806463
AdaBoostRegressor (EURUSD,H1)   
AdaBoostRegressor (EURUSD,H1)   Testing ONNX double: AdaBoostRegressor (adaboost_regressor_double.onnx)
AdaBoostRegressor (EURUSD,H1)   ONNX: cannot create session (OrtStatus: 1 'Type Error: Type parameter (T) of Optype (Mul) bound to different types (tensor(float) and tensor(double) in node (Mul).'), inspect code 'Scripts\Regression\AdaBoostRegressor.mq5' (133:16)
AdaBoostRegressor (EURUSD,H1)   model_name=AdaBoostRegressor OnnxCreate error 5800

Здесь мы видим, что ONNX-модель с расчетом во float исполнилась нормально, а при исполнении модели с double возникает ошибка.

2.2.1.3. ONNX-представление моделей adaboost_regressor_float.onnx и adaboost_regressor_double.onnx

Рис.101. ONNX-представление модели adaboost_regressor_float.onnx в Netron

Рис.102. ONNX-представление модели adaboost_regressor_double.onnx в Netron

2.2.2. sklearn.linear_model.BaggingRegressor

BaggingRegressor - это метод машинного обучения, который используется для задачи регрессии.

Он представляет собой ансамбльный метод, основанный на идее "бэггинга" (Bootstrap Aggregating), который состоит в построении нескольких базовых регрессионных моделей и комбинировании их предсказаний для получения более стабильного и точного результата.

Принцип работы BaggingRegressor:

Исходный набор данных: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие целевые переменные (зависимые переменные, которые мы пытаемся предсказать).
Генерация подвыборок: BaggingRegressor случайным образом создает несколько подвыборок (выборки с возвращением) из исходных данных. Каждая подвыборка содержит случайный набор наблюдений из исходных данных.
Обучение базовых регрессионных моделей: Для каждой подвыборки BaggingRegressor строит отдельную базовую регрессионную модель (например, дерево решений, случайный лес, регрессионная линейная модель и др.).
Прогнозы базовых моделей: Каждая базовая модель используется для предсказания целевой переменной на основе соответствующей подвыборки.
Усреднение или комбинирование: BaggingRegressor усредняет или комбинирует предсказания всех базовых моделей, чтобы получить окончательное регрессионное предсказание.

Преимущества BaggingRegressor:

Уменьшение дисперсии: BaggingRegressor позволяет уменьшить дисперсию модели, делая ее более устойчивой к разбросу в данных.
Снижение переобучения: Поскольку модель обучается на разных подвыборках данных, BaggingRegressor обычно снижает вероятность переобучения.
Улучшение обобщения: За счет комбинирования предсказаний нескольких моделей BaggingRegressor обычно дает более точные и стабильные прогнозы.
Широкий выбор базовых моделей: BaggingRegressor может использовать различные типы базовых регрессионных моделей, что делает его гибким методом.

Ограничения BaggingRegressor:

Он не всегда способен улучшить производительность в тех случаях, когда базовая модель уже достаточно хорошо работает на данных.
BaggingRegressor может потребовать больше вычислительных ресурсов и времени, чем обучение отдельной модели.

BaggingRegressor - это мощный метод машинного обучения, который может быть полезен в задачах регрессии, особенно при наличии шумовых данных и потребности в улучшении стабильности предсказаний.

2.2.2.1. Код создания модели BaggingRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.BaggingRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# BaggingRegressor.py
# The code demonstrates the process of training BaggingRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "BaggingRegressor"
onnx_model_filename = data_path + "bagging_regressor"

# create a Bagging Regressor model
regression_model = BaggingRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  
Python  BaggingRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9998128324923137
Python  Mean Absolute Error: 1.0257279210387649
Python  Mean Squared Error: 2.4767424083953005
Python  
Python  BaggingRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bagging_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9998128317934672
Python  Mean Absolute Error: 1.0257282792130034
Python  Mean Squared Error: 2.4767516560614187
Python  R^2 matching decimal laces:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  5
Python  
Python  BaggingRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bagging_regressor_double.onnx

Ошибки во вкладке Errors:

BaggingRegressor.py started     BaggingRegressor.py     1       1
Traceback (most recent call last):      BaggingRegressor.py     1       1
    onnx_session = ort.InferenceSession(onnx_filename)  BaggingRegressor.py     161     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     424     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bagging_regressor_double.onnx failed:Type Error: T      onnxruntime_inference_collection.py     424     1
BaggingRegressor.py finished in 3173 ms         5       1

Рис.103. Результат работы скрипта BaggingRegressor.py (float ONNX)

2.2.2.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели bagging_regressor_float.onnx и bagging_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                             BaggingRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "BaggingRegressor"
#define   ONNXFilenameFloat  "bagging_regressor_float.onnx"
#define   ONNXFilenameDouble "bagging_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

BaggingRegressor (EURUSD,H1)    Testing ONNX float: BaggingRegressor (bagging_regressor_float.onnx)
BaggingRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9998128317934672
BaggingRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 1.0257282792130034
BaggingRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 2.4767516560614196
BaggingRegressor (EURUSD,H1)    
BaggingRegressor (EURUSD,H1)    Testing ONNX double: BaggingRegressor (bagging_regressor_double.onnx)
BaggingRegressor (EURUSD,H1)    ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (ReduceMean) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\BaggingRegressor.mq5' (133:16)
BaggingRegressor (EURUSD,H1)    model_name=BaggingRegressor OnnxCreate error 5800

ONNX-модель с расчетом во float исполнилась нормально, а при исполнении модели с double возникает ошибка.

2.2.2.3. ONNX-представление моделей bagging_regressor_float.onnx и bagging_regressor_double.onnx

Рис.104. ONNX-представление модели bagging_regressor_float.onnx в Netron

Рис.105. ONNX-представление модели bagging_regressor_double.onnx в Netron

2.2.3. sklearn.linear_model.DecisionTreeRegressor

DecisionTreeRegressor - это метод машинного обучения, используемый для задачи регрессии, то есть для прогнозирования числовых значений целевой переменной на основе набора признаков (независимых переменных).

Этот метод основан на построении деревьев решений, которые разбивают пространство признаков на интервалы и предсказывают значение целевой переменной для каждого интервала.

Принцип работы DecisionTreeRegressor:

Начало построения: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие значения целевой переменной.
Выбор признака и разделение: Дерево решений выбирает признак и пороговое значение, которое разделяет данные на две или более подгруппы. Разделение выполняется таким образом, чтобы минимизировать среднеквадратичную ошибку (среднеквадратическое отклонение между предсказанными и фактическими значениями целевой переменной) в каждой подгруппе.
Рекурсивное построение: Процесс выбора признака и разделения повторяется для каждой подгруппы, создавая поддеревья. Этот процесс выполняется рекурсивно до тех пор, пока не выполнены определенные критерии остановки, такие как максимальная глубина дерева или минимальное количество образцов в узле.
Листовые узлы: Когда критерии остановки выполнены, создаются листовые узлы, которые предсказывают числовые значения целевой переменной для образцов, попавших в данный листовой узел.
Предсказание: Для новых данных дерево решений применяется, и новые наблюдения проходят через дерево, пока не достигнут листовой узел, который предсказывает числовое значение целевой переменной.

Преимущества DecisionTreeRegressor:

Простота интерпретации: Деревья решений легко понимать и визуализировать, что делает их полезными для объяснения принятия решений моделью.
Устойчивость к выбросам: Деревья решений могут быть устойчивы к выбросам в данных.
Работа с числовыми и категориальными данными: Деревья решений могут обрабатывать как числовые, так и категориальные признаки без дополнительной предварительной обработки.
Способность к автоматическому отбору признаков: Деревья могут автоматически отбирать важные признаки, игнорируя менее значимые.

Ограничения DecisionTreeRegressor:

Неустойчивость к переобучению: Деревья решений могут быть склонны к переобучению, особенно если они слишком глубокие.
Проблемы с обобщением: Деревья решений могут не обобщать хорошо на данные, которые не входили в обучающий набор.
Не всегда оптимальный выбор: В некоторых задачах другие методы регрессии, такие как линейная регрессия или метод ближайших соседей, могут работать лучше.

DecisionTreeRegressor - это полезный метод для задач регрессии, особенно когда важно понимать логику принятия решений моделью и визуализировать процесс.

2.2.3.1. Код создания модели DecisionTreeRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.linear_model.DecisionTreeRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# DecisionTreeRegressor.py
# The code demonstrates the process of training DecisionTreeRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "DecisionTreeRegressor"
onnx_model_filename = data_path + "decision_tree_regressor"

# create a Decision Tree Regressor model
regression_model = DecisionTreeRegressor()

# fit the model to the data
regression_model.fit(X, y)

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  DecisionTreeRegressor Original model (double)
Python  R-squared (Coefficient of determination): 1.0
Python  Mean Absolute Error: 0.0
Python  Mean Squared Error: 0.0
Python  
Python  DecisionTreeRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\decision_tree_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9999999999999971
Python  Mean Absolute Error: 4.393654615473253e-06
Python  Mean Squared Error: 3.829042036424747e-11
Python  R^2 matching decimal places:  0
Python  MAE matching decimal places:  0
Python  MSE matching decimal places:  0
Python  float ONNX model precision:  0
Python  
Python  DecisionTreeRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\decision_tree_regressor_double.onnx

Ошибки во вкладке Errors:

DecisionTreeRegressor.py started        DecisionTreeRegressor.py        1       1
Traceback (most recent call last):      DecisionTreeRegressor.py        1       1
    onnx_session = ort.InferenceSession(onnx_filename)  DecisionTreeRegressor.py        160     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     424     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\decision_tree_regressor_double.onnx failed:Type Er      onnxruntime_inference_collection.py     424     1
DecisionTreeRegressor.py finished in 2957 ms            5       1

Рис.106. Результат работы скрипта DecisionTreeRegressor.py (float ONNX)

2.2.3.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели decision_tree_regressor_float.onnx и decision_tree_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                        DecisionTreeRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "DecisionTreeRegressor"
#define   ONNXFilenameFloat  "decision_tree_regressor_float.onnx"
#define   ONNXFilenameDouble "decision_tree_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

DecisionTreeRegressor (EURUSD,H1)       Testing ONNX float: DecisionTreeRegressor (decision_tree_regressor_float.onnx)
DecisionTreeRegressor (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9999999999999971
DecisionTreeRegressor (EURUSD,H1)       MQL5:   Mean Absolute Error: 0.0000043936546155
DecisionTreeRegressor (EURUSD,H1)       MQL5:   Mean Squared Error: 0.0000000000382904
DecisionTreeRegressor (EURUSD,H1)       
DecisionTreeRegressor (EURUSD,H1)       Testing ONNX double: DecisionTreeRegressor (decision_tree_regressor_double.onnx)
DecisionTreeRegressor (EURUSD,H1)       ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\DecisionTreeRegressor.mq5' (133:16)
DecisionTreeRegressor (EURUSD,H1)       model_name=DecisionTreeRegressor OnnxCreate error 5800

ONNX-модель с расчетом во float исполнилась нормально, а при исполнении модели с double возникает ошибка.

2.2.3.3. ONNX-представление моделей decision_tree_regressor_float.onnx и decision_tree_regressor_double.onnx

Рис.107. ONNX-представление модели decision_tree_regressor_float.onnx в Netron

Рис.108. ONNX-представление модели decision_tree_regressor_double.onnx в Netron

2.2.4. sklearn.tree.ExtraTreeRegressor

ExtraTreeRegressor, или Extremely Randomized Trees Regressor, представляет собой регрессионный ансамбльный метод, который основан на решающих деревьях.

Этот метод является вариацией случайных лесов (Random Forest) и отличается тем, что вместо выбора наилучшего разделения для каждого узла дерева, он использует случайные разделения для каждого узла. Это делает его более случайным и быстрым, что может быть полезным в некоторых ситуациях.

Принцип работы ExtraTreeRegressor:

Начало построения: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие значения целевой переменной.
Случайность при разделениях: В отличие от обычных решающих деревьев, где выбирается наилучшее разделение, ExtraTreeRegressor использует случайные пороговые значения для разделения узлов дерева. Это делает процесс разделения более случайным и менее склонным к переобучению.
Строительство дерева: Дерево строится путем разделения узлов на основе случайных признаков и пороговых значений. Этот процесс продолжается до достижения критериев остановки, таких как максимальная глубина дерева или минимальное количество образцов в узле.
Ансамбль деревьев: ExtraTreeRegressor строит несколько таких случайных деревьев, их количество контролируется гиперпараметром "n_estimators".
Предсказание: Для предсказания целевой переменной для новых данных, ExtraTreeRegressor просто усредняет предсказания всех деревьев в ансамбле.

Преимущества ExtraTreeRegressor:

Снижение переобучения: Использование случайных разделений узлов делает метод менее склонным к переобучению, чем обычные решающие деревья.
Высокая параллелизация: Так как деревья строятся независимо друг от друга, ExtraTreeRegressor может быть легко параллелизирован для обучения на многих процессорах.
Быстрое обучение: В сравнении с некоторыми другими методами, такими как градиентный бустинг, ExtraTreeRegressor может обучаться быстрее.

Ограничения ExtraTreeRegressor:

Может быть менее точным: В некоторых случаях, особенно при наличии небольших наборов данных, ExtraTreeRegressor может быть менее точным по сравнению с более сложными методами.
Менее интерпретируем: По сравнению с линейными моделями, решающими деревьями и другими более простыми методами, ExtraTreeRegressor обычно менее интерпретируем.

ExtraTreeRegressor может быть полезным методом для регрессии в ситуациях, где требуется снижение переобучения и быстрое обучение.

2.2.4.1. Код создания модели ExtraTreeRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.tree.ExtraTreeRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# ExtraTreeRegressor.py
# The code demonstrates the process of training ExtraTreeRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import ExtraTreeRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "ExtraTreeRegressor"
onnx_model_filename = data_path + "extra_tree_regressor"

# create an ExtraTreeRegressor model
regression_model = ExtraTreeRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

2023.10.30 14:40:57.665 Python  ExtraTreeRegressor Original model (double)
2023.10.30 14:40:57.665 Python  R-squared (Coefficient of determination): 1.0
2023.10.30 14:40:57.665 Python  Mean Absolute Error: 0.0
2023.10.30 14:40:57.665 Python  Mean Squared Error: 0.0
2023.10.30 14:40:57.681 Python  
2023.10.30 14:40:57.681 Python  ExtraTreeRegressor ONNX model (float)
2023.10.30 14:40:57.681 Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_tree_regressor_float.onnx
2023.10.30 14:40:57.681 Python  Information about input tensors in ONNX:
2023.10.30 14:40:57.681 Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
2023.10.30 14:40:57.681 Python  Information about output tensors in ONNX:
2023.10.30 14:40:57.681 Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
2023.10.30 14:40:57.681 Python  R-squared (Coefficient of determination) 0.9999999999999971
2023.10.30 14:40:57.681 Python  Mean Absolute Error: 4.393654615473253e-06
2023.10.30 14:40:57.681 Python  Mean Squared Error: 3.829042036424747e-11
2023.10.30 14:40:57.681 Python  R^2 matching decimal places:  0
2023.10.30 14:40:57.681 Python  MAE matching decimal places:  0
2023.10.30 14:40:57.681 Python  MSE matching decimal places:  0
2023.10.30 14:40:57.681 Python  float ONNX model precision:  0
2023.10.30 14:40:58.011 Python  
2023.10.30 14:40:58.011 Python  ExtraTreeRegressor ONNX model (double)
2023.10.30 14:40:58.011 Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_tree_regressor_double.onnx

Ошибки во вкладке Errors:

ExtraTreeRegressor.py started   ExtraTreeRegressor.py   1       1
Traceback (most recent call last):      ExtraTreeRegressor.py   1       1
    onnx_session = ort.InferenceSession(onnx_filename)  ExtraTreeRegressor.py   159     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     424     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_tree_regressor_double.onnx failed:Type Error      onnxruntime_inference_collection.py     424     1
ExtraTreeRegressor.py finished in 2980 ms               5       1

Рис.109. Результат работы скрипта ExtraTreeRegressor.py (float ONNX)

2.2.4.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели extra_tree_regressor_float.onnx и extra_tree_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                           ExtraTreeRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ExtraTreeRegressor"
#define   ONNXFilenameFloat  "extra_tree_regressor_float.onnx"
#define   ONNXFilenameDouble "extra_tree_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

ExtraTreeRegressor (EURUSD,H1)  Testing ONNX float: ExtraTreeRegressor (extra_tree_regressor_float.onnx)
ExtraTreeRegressor (EURUSD,H1)  MQL5:   R-Squared (Coefficient of determination): 0.9999999999999971
ExtraTreeRegressor (EURUSD,H1)  MQL5:   Mean Absolute Error: 0.0000043936546155
ExtraTreeRegressor (EURUSD,H1)  MQL5:   Mean Squared Error: 0.0000000000382904
ExtraTreeRegressor (EURUSD,H1)  
ExtraTreeRegressor (EURUSD,H1)  Testing ONNX double: ExtraTreeRegressor (extra_tree_regressor_double.onnx)
ExtraTreeRegressor (EURUSD,H1)  ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\ExtraTreeRegressor.mq5' (133:16)
ExtraTreeRegressor (EURUSD,H1)  model_name=ExtraTreeRegressor OnnxCreate error 5800

ONNX-модель с расчетом во float исполнилась нормально, а при исполнении модели с double возникает ошибка.

2.2.4.3. ONNX-представление моделей extra_tree_regressor_float.onnx и extra_tree_regressor_double.onnx

Рис.110. ONNX-представление модели extra_tree_regressor_float.onnx в Netron

Рис.111. ONNX-представление модели extra_tree_regressor_double.onnx в Netron

2.2.5. sklearn.ensemble.ExtraTreesRegressor

ExtraTreesRegressor (Extremely Randomized Trees Regressor) - это метод машинного обучения, который представляет собой вариацию случайных лесов (Random Forests) для задачи регрессии.

Этот метод использует ансамбль деревьев решений для предсказания числовых значений целевой переменной на основе набора признаков.

Принцип работы ExtraTreesRegressor:

Начало построения: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие значения целевой переменной.
Случайность при разделениях: В отличие от обычных решающих деревьев, где для разделения узлов выбирается наилучший разделитель, ExtraTreesRegressor использует случайные пороговые значения для разделения узлов дерева. Это делает процесс разделения более случайным и менее склонным к переобучению.
Строительство деревьев: ExtraTreesRegressor строит несколько деревьев решений в ансамбле. Количество деревьев управляется гиперпараметром "n_estimators". Каждое дерево обучается на случайной подвыборке данных (с повторениями) и случайных подмножествах признаков.
Прогноз: Для предсказания целевой переменной для новых данных ExtraTreesRegressor агрегирует предсказания всех деревьев в ансамбле (обычно путем усреднения).

Преимущества ExtraTreesRegressor:

Снижение переобучения: Использование случайных разделений узлов и подвыборок данных делает метод менее склонным к переобучению, чем обычные решающие деревья.
Высокая параллелизация: Так как деревья строятся независимо друг от друга, ExtraTreesRegressor может быть легко параллелизирован для обучения на многих процессорах.
Устойчивость к выбросам: Метод обычно устойчив к выбросам в данных.
Возможность работы с числовыми и категориальными данными: ExtraTreesRegressor способен обрабатывать как числовые, так и категориальные признаки без дополнительной предварительной обработки.

Ограничения ExtraTreesRegressor:

Может потребовать тонкой настройки гиперпараметров: Хотя ExtraTreesRegressor обычно хорошо работает с параметрами по умолчанию, для достижения максимальной производительности может потребоваться настройка гиперпараметров.
Менее интерпретируем: Как и другие ансамблевые методы, ExtraTreesRegressor менее интерпретируем, чем простые модели, такие как линейная регрессия.

ExtraTreesRegressor может быть полезным методом для регрессии в различных задачах, особенно когда необходимо снизить переобучение и улучшить обобщающую способность модели.

2.2.5.1. Код создания модели ExtraTreesRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.ensemble.ExtraTreesRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# ExtraTreesRegressor.py
# The code demonstrates the process of training ExtraTreesRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "ExtraTreesRegressor"
onnx_model_filename = data_path + "extra_trees_regressor"

# create an Extra Trees Regressor model
regression_model = ExtraTreesRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  ExtraTreesRegressor Original model (double)
Python  R-squared (Coefficient of determination): 1.0
Python  Mean Absolute Error: 2.2302160118670144e-13
Python  Mean Squared Error: 8.41048471722451e-26
Python  
Python  ExtraTreesRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_trees_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9999999999998015
Python  Mean Absolute Error: 3.795239380975701e-05
Python  Mean Squared Error: 2.627067474763585e-09
Python  R^2 matching decimal places:  0
Python  MAE matching decimal places:  0
Python  MSE matching decimal places:  0
Python  float ONNX model precision:  0
Python  
Python  ExtraTreesRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_trees_regressor_double.onnx

Ошибки во вкладке Errors:

ExtraTreesRegressor.py started  ExtraTreesRegressor.py  1       1
Traceback (most recent call last):      ExtraTreesRegressor.py  1       1
    onnx_session = ort.InferenceSession(onnx_filename)  ExtraTreesRegressor.py  160     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     424     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_trees_regressor_double.onnx failed:Type Erro      onnxruntime_inference_collection.py     424     1
ExtraTreesRegressor.py finished in 4654 ms              5       1

Рис.112. Результат работы скрипта ExtraTreesRegressor.py (float ONNX)

2.2.5.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели extra_trees_regressor_float.onnx и extra_trees_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                          ExtraTreesRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ExtraTreesRegressor"
#define   ONNXFilenameFloat  "extra_trees_regressor_float.onnx"
#define   ONNXFilenameDouble "extra_trees_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

ExtraTreesRegressor (EURUSD,H1) Testing ONNX float: ExtraTreesRegressor (extra_trees_regressor_float.onnx)
ExtraTreesRegressor (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9999999999998015
ExtraTreesRegressor (EURUSD,H1) MQL5:   Mean Absolute Error: 0.0000379523938098
ExtraTreesRegressor (EURUSD,H1) MQL5:   Mean Squared Error: 0.0000000026270675
ExtraTreesRegressor (EURUSD,H1) 
ExtraTreesRegressor (EURUSD,H1) Testing ONNX double: ExtraTreesRegressor (extra_trees_regressor_double.onnx)
ExtraTreesRegressor (EURUSD,H1) ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\ExtraTreesRegressor.mq5' (133:16)
ExtraTreesRegressor (EURUSD,H1) model_name=ExtraTreesRegressor OnnxCreate error 5800

ONNX-модель с расчетом во float исполнилась нормально, а при исполнении модели с double возникает ошибка.

2.2.5.3. ONNX-представление моделей extra_trees_regressor_float.onnx и extra_trees_regressor_double.onnx

Рис.113. ONNX-представление модели extra_trees_regressor_float.onnx в Netron

Рис.114. ONNX-представление модели extra_trees_regressor_double.onnx в Netron

2.2.6. sklearn.svm.NuSVR

NuSVR - это метод машинного обучения, который используется для задачи регрессии. Этот метод основан на Support Vector Machine (SVM), но применяется к задачам регрессии вместо задач классификации.

NuSVR представляет собой вариант SVM, который позволяет решать задачи регрессии, предсказывая непрерывные значения целевой переменной.

Принцип работы NuSVR:

Входные данные: Начнем с набора данных, который включает признаки (независимые переменные) и значения целевой переменной (непрерывные).
Выбор ядра (kernel): NuSVR использует ядра, такие как линейное, полиномиальное или радиально-базисная функция (RBF), чтобы преобразовать данные в более высокоразмерное пространство, где можно найти линейную разделяющую гиперплоскость.
Определение параметра Nu: Параметр Nu (nu) в NuSVR контролирует сложность модели и определяет, как много обучающих примеров будет рассмотрено как выбросы. Значение Nu должно быть в диапазоне от 0 до 1, и оно влияет на количество опорных векторов.
Построение опорных векторов: NuSVR стремится найти оптимальную разделяющую гиперплоскость таким образом, чтобы максимизировать зазор между этой гиперплоскостью и ближайшими точками выборки.
Обучение модели: Модель обучается таким образом, чтобы минимизировать ошибку регрессии и удовлетворить ограничения, связанные с параметром Nu.
Получение прогнозов: После обучения модель может быть использована для предсказания значений целевой переменной на новых данных.

Преимущества NuSVR:

Обработка выбросов: NuSVR позволяет управлять выбросами с помощью параметра Nu, который регулирует количество обучающих примеров, рассматриваемых как выбросы.
Множество ядер: Метод поддерживает различные типы ядер, что позволяет моделировать сложные нелинейные зависимости.

Ограничения NuSVR:

Выбор параметра Nu: Выбор правильного значения для параметра Nu может потребовать некоторых экспериментов.
Чувствительность к масштабу данных: SVM, включая NuSVR, может быть чувствительным к масштабу данных, поэтому может потребоваться стандартизация или нормализация признаков.
Вычислительная сложность: Для больших наборов данных и сложных ядер NuSVR может быть вычислительно затратным.

NuSVR - это метод машинного обучения для задачи регрессии, основанный на методе Support Vector Machine (SVM). Он позволяет предсказывать непрерывные значения целевой переменной и может управлять выбросами с использованием параметра Nu.

2.2.6.1. Код создания модели NuSVR и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.svm.NuSVR, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# NuSVR.py
# The code demonstrates the process of training NuSVR model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import NuSVR
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "NuSVR"
onnx_model_filename = data_path + "nu_svr"

# create a NuSVR model
nusvr_model = NuSVR()

# fit the model to the data
nusvr_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = nusvr_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(nusvr_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(nusvr_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  NuSVR Original model (double)
Python  R-squared (Coefficient of determination): 0.2771437770527445
Python  Mean Absolute Error: 83.76666411704255
Python  Mean Squared Error: 9565.381751764757
Python  
Python  NuSVR ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\nu_svr_float.onnx
Python  Information about input tensors in ONNX:
1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.27714379657935495
Python  Mean Absolute Error: 83.766663385322
Python  Mean Squared Error: 9565.381493373838
Python  R^2 matching decimal places:  7
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  3
Python  float ONNX model precision:  5
Python  
Python  NuSVR ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\nu_svr_double.onnx

Ошибки во вкладке Errors:

NuSVR.py started        NuSVR.py        1       1
Traceback (most recent call last):      NuSVR.py        1       1
    onnx_session = ort.InferenceSession(onnx_filename)  NuSVR.py        159     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess.initialize_session(providers, provider_options, disabled_optimizers)   onnxruntime_inference_collection.py     435     1
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for SVMRegressor(1) node with name 'SVM'        onnxruntime_inference_collection.py     435     1
NuSVR.py finished in 2925 ms            5       1

Рис.115. Результат работы скрипта NuSVR.py (float ONNX)

2.2.6.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели nu_svr_float.onnx и nu_svr_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                        NuSVR.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "NuSVR"
#define   ONNXFilenameFloat  "nu_svr_float.onnx"
#define   ONNXFilenameDouble "nu_svr_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

NuSVR (EURUSD,H1)       Testing ONNX float: NuSVR (nu_svr_float.onnx)
NuSVR (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.2771437965793548
NuSVR (EURUSD,H1)       MQL5:   Mean Absolute Error: 83.7666633853219906
NuSVR (EURUSD,H1)       MQL5:   Mean Squared Error: 9565.3814933738358377
NuSVR (EURUSD,H1)       
NuSVR (EURUSD,H1)       Testing ONNX double: NuSVR (nu_svr_double.onnx)
NuSVR (EURUSD,H1)       ONNX: cannot create session (OrtStatus: 9 'Could not find an implementation for SVMRegressor(1) node with name 'SVM''), inspect code 'Scripts\Regression\NuSVR.mq5' (133:16)
NuSVR (EURUSD,H1)       model_name=NuSVR OnnxCreate error 5800

ONNX-модель с расчетом во float исполнилась нормально, а при исполнении модели с double возникает ошибка.

Сравнение с оригинальной моделью:

Testing ONNX float: NuSVR (nu_svr_float.onnx)
Python  Mean Absolute Error: 83.76666411704255
MQL5:   Mean Absolute Error: 83.7666633853219906

2.2.6.3. ONNX-представление моделей nu_svr_float.onnx и nu_svr_double.onnx

Рис.116. ONNX-представление модели nu_svr_float.onnx в Netron

Рис.117. ONNX-представление модели nu_svr_double.onnx в Netron

2.2.7. sklearn.ensemble.RandomForestRegressor

RandomForestRegressor - это метод машинного обучения, который применяется для решения задачи регрессии.

Он является одним из наиболее популярных методов, основанных на ансамблевом обучении, и использует алгоритм "Случайного леса" для создания мощных и устойчивых регрессионных моделей.

Принцип работы RandomForestRegressor:

Входные данные: Начнем с набора данных, который включает в себя признаки (независимые переменные) и целевую переменную (непрерывную).
Случайный лес: RandomForestRegressor использует ансамбль деревьев решений для решения задачи регрессии. В данном случае, каждое дерево в лесу работает над предсказанием значений целевой переменной.
Бутстрэп выборка: Для обучения каждого дерева используется бутстрэп выборка данных, то есть случайная выборка с возвратом из обучающего набора данных. Это позволяет разнообразить данные, на которых обучается каждое дерево.
Случайный выбор признаков: При построении каждого дерева также производится случайный выбор подмножества признаков, что делает модель более устойчивой и способствует снижению коррелированности между деревьями.
Усреднение прогнозов: Когда все деревья построены, RandomForestRegressor усредняет или комбинирует их прогнозы для получения окончательного регрессионного предсказания.

Преимущества RandomForestRegressor:

Мощность и устойчивость: RandomForestRegressor является мощным методом регрессии, который часто обеспечивает хорошую производительность.
Способность к обработке большого объема данных: Метод хорошо справляется с большими объемами данных и может обрабатывать множество признаков.
Устойчивость к переобучению: Благодаря бутстрэп выборке и случайному выбору признаков, случайный лес обычно устойчив к переобучению.
Возможность оценки важности признаков: Random Forest может предоставить информацию о важности каждого признака в задаче регрессии.

Ограничения RandomForestRegressor:

Неинтерпретируемость: Возможно, модель будет менее интерпретируемой по сравнению с линейными моделями.
Не всегда самая точная модель: В некоторых задачах более сложные ансамбли могут быть лишними, и линейные модели могут быть более подходящими.

RandomForestRegressor - это мощный метод машинного обучения для задачи регрессии, который использует ансамбль случайных деревьев решений для создания устойчивой и высокопроизводительной регрессионной модели. Этот метод особенно полезен в задачах с большими объемами данных и для оценки важности признаков.

2.2.7.1. Код создания модели RandomForestRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.ensemble.RandomForestRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# RandomForestRegressor.py
# The code demonstrates the process of training RandomForestRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "RandomForestRegressor"
onnx_model_filename = data_path + "random_forest_regressor"

# create a RandomForestRegressor model
regression_model = RandomForestRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  RandomForestRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9998854509605539
Python  Mean Absolute Error: 0.9186485980852603
Python  Mean Squared Error: 1.5157997632401086
Python  
Python  RandomForestRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\random_forest_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9998854516013125
Python  Mean Absolute Error: 0.9186420704511761
Python  Mean Squared Error: 1.515791284236419
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  5
Python  
Python  RandomForestRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\random_forest_regressor_double.onnx

Ошибки во вкладке Errors:

RandomForestRegressor.py started        RandomForestRegressor.py        1       1
Traceback (most recent call last):      RandomForestRegressor.py        1       1
    onnx_session = ort.InferenceSession(onnx_filename)  RandomForestRegressor.py        159     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     424     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\random_forest_regressor_double.onnx failed:Type Er      onnxruntime_inference_collection.py     424     1
RandomForestRegressor.py finished in 4392 ms            5       1

Рис.118. Результат работы скрипта RandomForestRegressor.py (float ONNX)

2.2.7.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели random_forest_regressor_float.onnx и random_forest_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                        RandomForestRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "RandomForestRegressor"
#define   ONNXFilenameFloat  "random_forest_regressor_float.onnx"
#define   ONNXFilenameDouble "random_forest_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

RandomForestRegressor (EURUSD,H1)       
RandomForestRegressor (EURUSD,H1)       Testing ONNX float: RandomForestRegressor (random_forest_regressor_float.onnx)
RandomForestRegressor (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9998854516013125
RandomForestRegressor (EURUSD,H1)       MQL5:   Mean Absolute Error: 0.9186420704511761
RandomForestRegressor (EURUSD,H1)       MQL5:   Mean Squared Error: 1.5157912842364190
RandomForestRegressor (EURUSD,H1)       
RandomForestRegressor (EURUSD,H1)       Testing ONNX double: RandomForestRegressor (random_forest_regressor_double.onnx)
RandomForestRegressor (EURUSD,H1)       ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\RandomForestRegressor.mq5' (133:16)
RandomForestRegressor (EURUSD,H1)       model_name=RandomForestRegressor OnnxCreate error 5800

ONNX-модель с расчетом во float исполнилась нормально, а при исполнении модели с double возникает ошибка.

2.2.7.3. ONNX-представление моделей random_forest_regressor_float.onnx и random_forest_regressor_double.onnx

Рис.119. ONNX-представление модели random_forest_regressor_float.onnx в Netron

Рис.120. ONNX-представление модели random_forest_regressor_double.onnx в Netron

2.2.8. sklearn.ensemble.GradientBoostingRegressor

GradientBoostingRegressor - это метод машинного обучения, который используется для задачи регрессии. Он является частью семейства ансамблевых методов и базируется на идее построения слабых моделей и комбинирования их в сильную модель с помощью градиентного бустинга.

Градиентный бустинг - это техника улучшения моделей путем итеративного добавления слабых моделей и коррекции ошибок предыдущих моделей.

Вот как работает GradientBoostingRegressor:

Начало построения: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие значения целевой переменной.
Первая модель: Начинаем с обучения первой модели, которая часто выбирается как простая регрессионная модель (например, решающее дерево) на исходных данных.
Остатки и антиградиент: Вычисляем остатки, то есть разницу между предсказанными значениями первой модели и фактическими значениями целевой переменной. Затем вычисляем антиградиент этой функции потерь, который указывает направление, в котором мы хотим улучшить нашу модель.
Построение следующей модели: Строим следующую модель, фокусируясь на предсказании антиградиента (ошибок первой модели). Эта модель обучается на остатках и добавляется к первой модели.
Итерации: Процесс построения новых моделей и коррекции остатков повторяется многократно. Каждая новая модель учитывает остатки предыдущих моделей и пытается улучшить предсказания.
Комбинирование моделей: Предсказания всех моделей комбинируются в итоговое предсказание путем усреднения или взвешивания их в соответствии с их важностью.

Преимущества GradientBoostingRegressor:

Высокая производительность: Градиентный бустинг является мощным методом и способен достичь высокой производительности в задачах регрессии.
Устойчивость к выбросам: Градиентный бустинг способен обрабатывать выбросы в данных и строить модели с учетом этой неопределенности.
Автоматический отбор признаков: Градиентный бустинг способен автоматически выбирать наиболее важные признаки для предсказания целевой переменной.
Работа с различными функциями потерь: Метод позволяет использовать различные функции потерь в зависимости от задачи.

Ограничения GradientBoostingRegressor:

Требуется настройка гиперпараметров: Для достижения максимальной производительности, гиперпараметры, такие как скорость обучения (learning rate), глубина деревьев и количество моделей, требуют настройки.
Вычислительно затратен: Градиентный бустинг может быть вычислительно затратным, особенно при больших объемах данных и большом количестве деревьев.

GradientBoostingRegressor является мощным методом регрессии и часто используется в практических задачах для достижения высокой производительности при настройке правильных гиперпараметров.

2.2.8.1. Код создания модели GradientBoostingRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.ensemble.GradientBoostingRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# GradientBoostingRegressor.py
# The code demonstrates the process of training GradientBoostingRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "GradientBoostingRegressor"
onnx_model_filename = data_path + "gradient_boosting_regressor"

# create a Gradient Boosting Regressor model
regression_model = GradientBoostingRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  GradientBoostingRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9999959514652565
Python  Mean Absolute Error: 0.15069342754017417
Python  Mean Squared Error: 0.053573282108575676
Python  
Python  GradientBoostingRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gradient_boosting_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9999959514739537
Python  Mean Absolute Error: 0.15069457426101718
Python  Mean Squared Error: 0.05357316702127665
Python  R^2 matching decimal places:  10
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  6
Python  float ONNX model precision:  5
Python  
Python  GradientBoostingRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gradient_boosting_regressor_double.onnx

Ошибки во вкладке Errors:

GradientBoostingRegressor.py started    GradientBoostingRegressor.py    1       1
Traceback (most recent call last):      GradientBoostingRegressor.py    1       1
    onnx_session = ort.InferenceSession(onnx_filename)  GradientBoostingRegressor.py    161     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     419     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     452     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gradient_boosting_regressor_double.onnx failed:Typ      onnxruntime_inference_collection.py     452     1
GradientBoostingRegressor.py finished in 3073 ms                5       1

Рис.121. Результат работы скрипта GradientBoostingRegressor.py (float ONNX)

2.2.8.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели gradient_boosting_regressor_float.onnx и gradient_boosting_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                    GradientBoostingRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "GradientBoostingRegressor"
#define   ONNXFilenameFloat  "gradient_boosting_regressor_float.onnx"
#define   ONNXFilenameDouble "gradient_boosting_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

GradientBoostingRegressor (EURUSD,H1)   Testing ONNX float: GradientBoostingRegressor (gradient_boosting_regressor_float.onnx)
GradientBoostingRegressor (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9999959514739537
GradientBoostingRegressor (EURUSD,H1)   MQL5:   Mean Absolute Error: 0.1506945742610172
GradientBoostingRegressor (EURUSD,H1)   MQL5:   Mean Squared Error: 0.0535731670212767
GradientBoostingRegressor (EURUSD,H1)   
GradientBoostingRegressor (EURUSD,H1)   Testing ONNX double: GradientBoostingRegressor (gradient_boosting_regressor_double.onnx)
GradientBoostingRegressor (EURUSD,H1)   ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\GradientBoostingRegressor.mq5' (133:16)
GradientBoostingRegressor (EURUSD,H1)   model_name=GradientBoostingRegressor OnnxCreate error 5800

ONNX-модель с расчетом во float исполнилась нормально, а при исполнении модели с double возникает ошибка.

Сравнение с оригинальной моделью:

Testing ONNX float: GradientBoostingRegressor (gradient_boosting_regressor_float.onnx)
Python  Mean Absolute Error: 0.15069342754017417
MQL5:   Mean Absolute Error: 0.1506945742610172

Точность MAE ONNX float: 5 знаков после запятой.

2.2.8.3. ONNX-представление моделей gradient_boosting_regressor_float.onnx и gradient_boosting_regressor_double.onnx

Рис.122. ONNX-представление модели gradient_boosting_regressor_float.onnx в Netron

Рис.123. ONNX-представление модели gradient_boosting_regressor_double.onnx в Netron

2.2.9. sklearn.ensemble.HistGradientBoostingRegressor

HistGradientBoostingRegressor - это метод машинного обучения, который представляет собой вариацию градиентного бустинга, оптимизированную для работы с большими наборами данных.

Этот метод используется для задачи регрессии, а его название "Hist" указывает на то, что он использует гистограммные методы для ускорения обучения.

Принцип работы HistGradientBoostingRegressor:

Начало построения: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие значения целевой переменной.
Гистограммные методы: Вместо того, чтобы использовать точное разделение данных на узлах деревьев, HistGradientBoostingRegressor использует гистограммные методы для эффективного представления данных в форме гистограмм. Это сильно ускоряет обучение, особенно на больших объемах данных.
Построение базовых деревьев: Метод строит набор базовых решающих деревьев, называемых "деревьями решений с гистограммами", используя гистограммные представления данных. Эти деревья строятся на основе градиентного бустинга и настраиваются на остатках предыдущей модели.
Постепенное обучение: HistGradientBoostingRegressor добавляет новые деревья в ансамбль постепенно, каждое дерево исправляет остатки предыдущих деревьев.
Комбинирование моделей: После построения базовых деревьев, предсказания всех деревьев комбинируются для получения итогового предсказания.

Преимущества HistGradientBoostingRegressor:

Высокая производительность: Этот метод оптимизирован для работы с большими объемами данных и способен достичь высокой производительности.
Устойчивость к шуму: HistGradientBoostingRegressor обычно хорошо работает в присутствии шума в данных.
Эффективность высокой размерности: Метод может обрабатывать задачи с большим количеством признаков (высокоразмерные данные).
Отличная параллелизация: Возможность эффективной параллелизации обучения на многих процессорах.

Ограничения HistGradientBoostingRegressor:

Требуется настройка гиперпараметров: Для достижения максимальной производительности, гиперпараметры, такие как глубина деревьев и количество моделей, требуют настройки.
Не так интерпретируем, как линейные модели: Как и другие ансамблевые методы, HistGradientBoostingRegressor менее интерпретируем, чем простые модели, такие как линейная регрессия.

HistGradientBoostingRegressor может быть полезным методом регрессии в задачах с большими наборами данных, где высокая производительность и эффективность высокой размерности данных важны.

2.2.9.1. Код создания модели HistGradientBoostingRegressor и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.ensemble.HistGradientBoostingRegressor, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# HistGradientBoostingRegressor.py
# The code demonstrates the process of training HistGradientBoostingRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "HistGradientBoostingRegressor"
onnx_model_filename = data_path + "hist_gradient_boosting_regressor"

# create a Histogram-Based Gradient Boosting Regressor model
hist_gradient_boosting_model = HistGradientBoostingRegressor()

# fit the model to the data
hist_gradient_boosting_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = hist_gradient_boosting_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(hist_gradient_boosting_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(hist_gradient_boosting_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  HistGradientBoostingRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9833421349506157
Python  Mean Absolute Error: 9.070567104488434
Python  Mean Squared Error: 220.4295035561544
Python  
Python  HistGradientBoostingRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\hist_gradient_boosting_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9833421351962779
Python  Mean Absolute Error: 9.07056497799043
Python  Mean Squared Error: 220.42950030536645
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  5
Python  
Python  HistGradientBoostingRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\hist_gradient_boosting_regressor_double.onnx

Ошибки во вкладке Errors:

HistGradientBoostingRegressor.py started        HistGradientBoostingRegressor.py        1       1
Traceback (most recent call last):      HistGradientBoostingRegressor.py        1       1
    onnx_session = ort.InferenceSession(onnx_filename)  HistGradientBoostingRegressor.py        161     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     419     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     452     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\hist_gradient_boosting_regressor_double.onnx faile      onnxruntime_inference_collection.py     452     1
HistGradientBoostingRegressor.py finished in 3100 ms            5       1

Рис.124. Результат работы скрипта HistGradientBoostingRegressor.py (float ONNX)

2.2.9.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели hist_gradient_boosting_regressor_float.onnx и hist_gradient_boosting_regressor_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                HistGradientBoostingRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "HistGradientBoostingRegressor"
#define   ONNXFilenameFloat  "hist_gradient_boosting_regressor_float.onnx"
#define   ONNXFilenameDouble "hist_gradient_boosting_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

HistGradientBoostingRegressor (EURUSD,H1)       Testing ONNX float: HistGradientBoostingRegressor (hist_gradient_boosting_regressor_float.onnx)
HistGradientBoostingRegressor (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9833421351962779
HistGradientBoostingRegressor (EURUSD,H1)       MQL5:   Mean Absolute Error: 9.0705649779904292
HistGradientBoostingRegressor (EURUSD,H1)       MQL5:   Mean Squared Error: 220.4295003053665312
HistGradientBoostingRegressor (EURUSD,H1)       
HistGradientBoostingRegressor (EURUSD,H1)       Testing ONNX double: HistGradientBoostingRegressor (hist_gradient_boosting_regressor_double.onnx)
HistGradientBoostingRegressor (EURUSD,H1)       ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\HistGradientBoostingRegressor.mq5' (133:16)
HistGradientBoostingRegressor (EURUSD,H1)       model_name=HistGradientBoostingRegressor OnnxCreate error 5800

ONNX-модель с расчетом во float исполнилась нормально, а при исполнении модели с double возникает ошибка.

Сравнение ONNX-float модели с оригинальной моделью:

Testing ONNX float: HistGradientBoostingRegressor (hist_gradient_boosting_regressor_float.onnx)
Python  Mean Absolute Error: 9.070567104488434
MQL5:   Mean Absolute Error: 9.0705649779904292

Точность MAE ONNX float: 5 знаков после запятой.

2.2.9.3. ONNX-представление моделей hist_gradient_boosting_regressor_float.onnx и hist_gradient_boosting_regressor_double.onnx

Рис.125. ONNX-представление модели hist_gradient_boosting_regressor_float.onnx в Netron

Рис.126. ONNX-представление модели hist_gradient_boosting_regressor_double.onnx в Netron

2.2.10. sklearn.svm.SVR

SVR (Support Vector Regression) - это метод машинного обучения, используемый для задачи регрессии. Он основан на той же идее, что и Support Vector Machine (SVM) для задачи классификации, но адаптирован для регрессии. Главная цель SVR - предсказать непрерывные значения целевой переменной, опираясь на максимальное среднее расстояние между точками данных и линией регрессии.

Принцип работы SVR:

Определение границ: Как и в случае SVM, SVR также строит границы, которые отделяют разные классы точек данных. Вместо того, чтобы разделять классы, SVR стремится построить "трубку" (tube) вокруг точек данных, где ширина трубки контролируется гиперпараметром.
Целевая переменная и функция потерь: Вместо использования классов, как в задаче классификации, SVR работает с непрерывными значениями целевой переменной. Он минимизирует ошибку предсказания, которая измеряется с использованием функции потерь, такой как квадрат разности между предсказанным и фактическим значением.
Регуляризация: SVR также поддерживает регуляризацию, которая помогает управлять сложностью модели и предотвращать переобучение.
Функции ядра: SVR обычно использует функции ядра (kernel functions), которые позволяют работать с нелинейными зависимостями между признаками и целевой переменной. Популярными ядровыми функциями являются радиальная базисная функция (RBF), полиномиальная и линейная функции.

Преимущества SVR:

Устойчивость к выбросам: SVR способен обрабатывать выбросы в данных, так как он стремится минимизировать ошибку предсказания.
Поддержка нелинейных зависимостей: Использование ядровых функций позволяет SVR моделировать сложные и нелинейные зависимости между признаками и целевой переменной.
Высокое качество предсказаний: В случае, если задача регрессии требует точных предсказаний, SVR может обеспечить хорошее качество результатов.

Ограничения SVR:

Чувствительность к гиперпараметрам: Выбор ядровой функции и параметров модели, таких как ширина трубки (гиперпараметры), может потребовать тщательной настройки и оптимизации.
Вычислительная сложность: Обучение SVR модели, особенно при использовании сложных ядровых функций и больших объемов данных, может быть вычислительно затратным.

SVR - это метод машинного обучения для задачи регрессии, основанный на идее построения "трубки" вокруг точек данных, чтобы минимизировать ошибку предсказания. Он обладает устойчивостью к выбросам и способностью работать с нелинейными зависимостями, что делает его полезным в различных задачах регрессии.

2.2.10.1. Код создания модели SVR и ее экспорта в ONNX для float и double

Этот код создает модель sklearn.svm.SVR, обучает ее на синтетических данных, сохраняет модель в формате ONNX и выполняет предсказания с использованием как float, так и double входных данных. Также он оценивает точность как исходной модели, так и моделей, экспортированных в ONNX.

# SVR.py
# The code demonstrates the process of training SVR model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "SVR"
onnx_model_filename = data_path + "svr"

# create an SVR model
regression_model = SVR()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  SVR Original model (double)
Python  R-squared (Coefficient of determination): 0.398243655775797
Python  Mean Absolute Error: 73.63683696034649
Python  Mean Squared Error: 7962.89631509593
Python  
Python  SVR ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\svr_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.3982436352100983
Python  Mean Absolute Error: 73.63683840363255
Python  Mean Squared Error: 7962.896587236852
Python  R^2 matching decimal places:  7
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  3
Python  float ONNX model precision:  5
Python  
Python  SVR ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\svr_double.onnx

Рис.127. Результат работы скрипта SVR.py (float ONNX)

2.2.10.2. Код на MQL5 для исполнения ONNX-моделей

Этот код исполняет сохраненные ONNX-модели svr_float.onnx и svr_double.onnx и демонстрирует использование регрессионных метрик в MQL5.

//+------------------------------------------------------------------+
//|                                                          SVR.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "SVR"
#define   ONNXFilenameFloat  "svr_float.onnx"
#define   ONNXFilenameDouble "svr_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is not incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Результат:

SVR (EURUSD,H1) Testing ONNX float: SVR (svr_float.onnx)
SVR (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.3982436352100981
SVR (EURUSD,H1) MQL5:   Mean Absolute Error: 73.6368384036325523
SVR (EURUSD,H1) MQL5:   Mean Squared Error: 7962.8965872368517012
SVR (EURUSD,H1) 
SVR (EURUSD,H1) Testing ONNX double: SVR (svr_double.onnx)
SVR (EURUSD,H1) ONNX: cannot create session (OrtStatus: 9 'Could not find an implementation for SVMRegressor(1) node with name 'SVM''), inspect code 'Scripts\R\SVR.mq5' (133:16)
SVR (EURUSD,H1) model_name=SVR OnnxCreate error 5800

ONNX-модель с расчетом во float исполнилась нормально, а при исполнении модели с double возникает ошибка.

Сравнение с оригинальной моделью:

Testing ONNX float: SVR (svr_float.onnx)
Python  Mean Absolute Error: 73.63683696034649
MQL5:   Mean Absolute Error: 73.6368384036325523

Точность MAE ONNX float: 5 знаков после запятой.

2.2.10.3. ONNX-представление моделей svr_float.onnx и svr_double.onnx

Рис.128. ONNX-представление модели svr_float.onnx в Netron

Рис.129. ONNX-представление модели svr_double.onnx в Netron

2.3. Регрессионные модели, которые столкнулись с проблемами при конвертации в ONNX

Некоторые регрессионные модели конвертер sklearn-onnx не смог конвертировать в ONNX-формат.

2.3.1. sklearn.dummy.DummyRegressor

DummyRegressor - это метод машинного обучения, который используется в задачах регрессии для создания базовой модели, которая предсказывает целевую переменную с использованием простых правил. DummyRegressor полезен для сравнения с другими более сложными моделями и оценки их производительности. Этот метод часто используется в контексте оценки качества других регрессионных моделей.

DummyRegressor предоставляет несколько стратегий для предсказания:

"mean" (по умолчанию): DummyRegressor предсказывает среднее значение целевой переменной из обучающего набора данных. Эта стратегия полезна, когда нужно определить, насколько другая модель лучше, чем просто предсказание среднего.
"median": DummyRegressor предсказывает медианное значение целевой переменной из обучающего набора данных.
"quantile": DummyRegressor предсказывает значение квантиля целевой переменной (задается параметром quantile) из обучающего набора данных.
"constant": DummyRegressor предсказывает постоянное значение, которое задается пользователем (с помощью параметра strategy).

Преимущества DummyRegressor:

Оценка производительности: DummyRegressor полезен для оценки производительности других более сложных моделей. Если ваша модель не может превзойти предсказания, сделанные DummyRegressor, это может свидетельствовать о проблемах в модели.
Сравнение с базовыми моделями: DummyRegressor позволяет сравнить производительность более сложных моделей с базовой линией (например, средним или медианным значением).
Простота использования: DummyRegressor легко внедряется и используется для сравнительного анализа.

Ограничения DummyRegressor:

Не предназначен для точного прогнозирования: DummyRegressor предоставляет только простые базовые предсказания и не предназначен для точного прогнозирования.
Не учитывает сложные зависимости: DummyRegressor игнорирует сложные структуры данных и зависимости между признаками.
Не подходит для задач, где нужен точный прогноз: В реальных задачах предсказания целевой переменной с использованием DummyRegressor недостаточно.

DummyRegressor полезен как инструмент для быстрой оценки и сравнения производительности других моделей регрессии, но сам по себе не представляет серьезной регрессионной модели.

2.3.1.1. Код создания модели DummyRegressor

# DummyRegressor.py
# The code demonstrates the process of training DummyRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "DummyRegressor"
onnx_model_filename = data_path + "dummy_regressor"

# create an Dummy Regressor model
regression_model = DummyRegressor(strategy="mean")

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  DummyRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.0
Python  Mean Absolute Error: 100.00329851715793
Python  Mean Squared Error: 13232.758393867645

Ошибки во вкладке "Errors":

DummyRegressor.py started       DummyRegressor.py       1       1
Traceback (most recent call last):      DummyRegressor.py       1       1
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)     DummyRegressor.py       87      1
    onnx_model = convert_topology(      convert.py      208     1
    topology.convert_operators(container=container, verbose=verbose)    _topology.py    1532    1
    self.call_shape_calculator(operator)        _topology.py    1348    1
    operator.infer_types()      _topology.py    1163    1
    raise MissingShapeCalculator(       _topology.py    629     1
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.dummy.DummyRegressor'>'. _topology.py    629     1
It usually means the pipeline being converted contains a        _topology.py    629     1
transformer or a predictor with no corresponding converter      _topology.py    629     1
implemented in sklearn-onnx. If the converted is implemented    _topology.py    629     1
in another library, you need to register        _topology.py    629     1
the converted so that it can be used by sklearn-onnx (function  _topology.py    629     1
update_registered_converter). If the model is not yet covered   _topology.py    629     1
by sklearn-onnx, you may raise an issue to      _topology.py    629     1
https://github.com/onnx/sklearn-onnx/issues     _topology.py    629     1
to get the converter implemented or even contribute to the      _topology.py    629     1
project. If the model is a custom model, a new converter must   _topology.py    629     1
be implemented. Examples can be found in the gallery.   _topology.py    629     1
DummyRegressor.py finished in 2565 ms           19      1

2.3.2. sklearn.kernel_ridge.KernelRidge

KernelRidge (ядерная регрессия) - это метод машинного обучения, используемый для задачи регрессии. Он является комбинацией ядерного метода опорных векторов (Kernel SVM) и регрессии. KernelRidge позволяет моделировать сложные, нелинейные отношения между признаками и целевой переменной, используя ядерные функции.

Принцип работы KernelRidge:

Входные данные: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие значения целевой переменной.
Ядерные функции: В KernelRidge используются ядерные функции (например, полиномиальные, RBF - радиальная базисная функция и др.), которые преобразуют данные в высокоразмерное пространство, где можно строить более сложные нелинейные зависимости.
Обучение модели: Модель обучается на данных, минимизируя среднеквадратичную ошибку между предсказанными значениями и фактическими значениями целевой переменной. При этом используются ядерные функции для учета сложных зависимостей.
Предсказание: После обучения модель может использоваться для предсказания значений целевой переменной для новых данных, используя те же ядерные функции.

Преимущества KernelRidge:

Моделирование сложных нелинейных зависимостей: KernelRidge позволяет моделировать сложные и нелинейные зависимости между признаками и целевой переменной.
Возможность выбора различных ядер: Вы можете выбирать разные ядра в зависимости от характера данных и задачи.
Регуляризация: Метод включает в себя регуляризацию, что помогает предотвратить переобучение модели.

Ограничения KernelRidge:

Неинтерпретируемость: Как и многие нелинейные методы, KernelRidge менее интерпретируем, чем линейные модели.
Вычислительная сложность: Использование ядерных функций может быть вычислительно затратным при большом объеме данных и/или высокой размерности.
Необходимость настройки параметров: Выбор подходящего ядра и параметров модели требует настройки и экспертного знания.

KernelRidge полезен в задачах регрессии, где данные имеют сложные, нелинейные зависимости, и требуется модель, способная их учесть. Он также полезен в задачах, где ядерные функции могут быть использованы для преобразования данных в более информативное представление.

2.3.2.1. Код создания модели KernelRidge

# KernelRidge.py
# The code demonstrates the process of training KernelRidge model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "KernelRidge"
onnx_model_filename = data_path + "kernel_ridge"

# create an KernelRidge model
regression_model = KernelRidge(alpha=1.0, kernel='linear')

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  KernelRidge Original model (double)
Python  R-squared (Coefficient of determination): 0.9962137909675411
Python  Mean Absolute Error: 6.36977985227399
Python  Mean Squared Error: 50.10198935520715

Ошибки во вкладке Errors:

KernelRidge.py started  KernelRidge.py  1       1
Traceback (most recent call last):      KernelRidge.py  1       1
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)     KernelRidge.py  87      1
    onnx_model = convert_topology(      convert.py      208     1
    topology.convert_operators(container=container, verbose=verbose)    _topology.py    1532    1
    self.call_shape_calculator(operator)        _topology.py    1348    1
    operator.infer_types()      _topology.py    1163    1
    raise MissingShapeCalculator(       _topology.py    629     1
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.kernel_ridge.KernelRidge'>'.     _topology.py    629     1
It usually means the pipeline being converted contains a        _topology.py    629     1
transformer or a predictor with no corresponding converter      _topology.py    629     1
implemented in sklearn-onnx. If the converted is implemented    _topology.py    629     1
in another library, you need to register        _topology.py    629     1
the converted so that it can be used by sklearn-onnx (function  _topology.py    629     1
update_registered_converter). If the model is not yet covered   _topology.py    629     1
by sklearn-onnx, you may raise an issue to      _topology.py    629     1
https://github.com/onnx/sklearn-onnx/issues     _topology.py    629     1
to get the converter implemented or even contribute to the      _topology.py    629     1
project. If the model is a custom model, a new converter must   _topology.py    629     1
be implemented. Examples can be found in the gallery.   _topology.py    629     1
KernelRidge.py finished in 2516 ms              19      1

2.3.3. sklearn.isotonic.IsotonicRegression

IsotonicRegression - это метод машинного обучения, используемый для задачи регрессии, который моделирует монотонную зависимость между признаками и целевой переменной. В данном контексте "монотонность" означает, что увеличение значения одного из признаков приводит к увеличению или уменьшению значения целевой переменной, сохраняя при этом направление изменения.

Принцип работы IsotonicRegression:

Входные данные: Начинаем с исходного набора данных, где у нас есть признаки (независимые переменные) и соответствующие значения целевой переменной.
Монотонная регрессия: IsotonicRegression стремится найти наилучшую монотонную функцию, которая описывает зависимость между признаками и целевой переменной. Эта функция может быть линейной или нелинейной, но она должна сохранять монотонность.
Обучение модели: Модель обучается на данных, чтобы определить параметры монотонной функции. В процессе обучения модель пытается минимизировать сумму квадратов ошибок между предсказаниями и фактическими значениями целевой переменной.
Предсказание: После обучения модель может использоваться для предсказания значений целевой переменной для новых данных, сохраняя монотонность взаимосвязи.

Преимущества IsotonicRegression:

Моделирование монотонных зависимостей: Этот метод является идеальным выбором, когда данные демонстрируют монотонные зависимости, и важно сохранить эту характеристику в модели.
Интерпретируемость: Монотонные модели могут быть более интерпретируемыми, поскольку они позволяют четко определить направление влияния каждого признака на целевую переменную.

Ограничения IsotonicRegression:

Не подходит для сложных, нелинейных зависимостей: Этот метод ограничивается моделированием монотонных связей, и, следовательно, он не подходит для моделирования сложных нелинейных зависимостей.
Настройка параметров: Некоторые реализации IsotonicRegression могут иметь параметры, которые требуют настройки, чтобы достичь оптимальной производительности.

IsotonicRegression полезен в задачах, где монотонность взаимосвязи между признаками и целевой переменной считается важным фактором, и требуется построение модели, которая сохраняет эту характеристику.

2.3.3.1. Код создания модели IsotonicRegression

# IsotonicRegression.py
# The code demonstrates the process of training IsotonicRegression model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "IsotonicRegression"
onnx_model_filename = data_path + "isotonic_regression"

# create an IsotonicRegression model
regression_model = IsotonicRegression()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  IsotonicRegression Original model (double)
Python  R-squared (Coefficient of determination): 0.9999898125037958
Python  Mean Absolute Error: 0.20093409873424467
Python  Mean Squared Error: 0.13480867590911208

Ошибки во вкладке Errors:

IsotonicRegression.py started   IsotonicRegression.py   1       1
Traceback (most recent call last):      IsotonicRegression.py   1       1
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)     IsotonicRegression.py   87      1
    onnx_model = convert_topology(      convert.py      208     1
    topology.convert_operators(container=container, verbose=verbose)    _topology.py    1532    1
    self.call_shape_calculator(operator)        _topology.py    1348    1
    operator.infer_types()      _topology.py    1163    1
    raise MissingShapeCalculator(       _topology.py    629     1
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.isotonic.IsotonicRegression'>'.  _topology.py    629     1
It usually means the pipeline being converted contains a        _topology.py    629     1
transformer or a predictor with no corresponding converter      _topology.py    629     1
implemented in sklearn-onnx. If the converted is implemented    _topology.py    629     1
in another library, you need to register        _topology.py    629     1
the converted so that it can be used by sklearn-onnx (function  _topology.py    629     1
update_registered_converter). If the model is not yet covered   _topology.py    629     1
by sklearn-onnx, you may raise an issue to      _topology.py    629     1
https://github.com/onnx/sklearn-onnx/issues     _topology.py    629     1
to get the converter implemented or even contribute to the      _topology.py    629     1
project. If the model is a custom model, a new converter must   _topology.py    629     1
be implemented. Examples can be found in the gallery.   _topology.py    629     1
IsotonicRegression.py finished in 2499 ms               19      1

2.3.4. sklearn.cross_decomposition.PLSCanonical

PLSCanonical (Partial Least Squares Canonical) - это метод машинного обучения, который используется для решения задачи канонической корреляции. Он является расширением метода частных наименьших квадратов (PLS) и применяется для анализа и моделирования связей между двумя наборами переменных.

Принцип работы PLSCanonical:

Входные данные: Начнем с двух наборов данных (X и Y), где каждый набор представляет собой множество переменных (признаков). Обычно X и Y содержат коррелированные данные, и задача - найти линейные комбинации признаков, которые максимизируют корреляцию между ними.
Подбор линейных комбинаций: PLSCanonical находит линейные комбинации (компоненты), как в X, так и в Y, таким образом, чтобы максимизировать корреляцию между компонентами двух наборов данных. Эти компоненты называются каноническими переменными.
Поиск максимальной корреляции: Основная цель PLSCanonical - найти такие канонические переменные, которые максимизируют корреляцию между X и Y. Это позволяет выделить наиболее информативные связи между двумя наборами данных.
Обучение модели: Как только канонические переменные найдены, вы можете использовать их для создания модели, которая может прогнозировать значения Y на основе X.
Получение прогнозов: После обучения модель может быть использована для предсказания значений Y на новых данных, используя соответствующие значения X.

Преимущества PLSCanonical:

Изучение корреляции: PLSCanonical позволяет анализировать и моделировать корреляции между двумя наборами данных, что может быть полезно для понимания связей между переменными.
Уменьшение размерности: Метод также может использоваться для снижения размерности данных, выделяя наиболее важные компоненты.

Ограничения PLSCanonical:

Чувствительность к выбору числа компонент: Выбор оптимального числа канонических переменных может потребовать некоторых экспериментов.
Зависимость от структуры данных: Результаты PLSCanonical могут сильно зависеть от структуры данных и корреляций между ними.

PLSCanonical - это метод машинного обучения, используемый для анализа и моделирования корреляций между двумя наборами переменных. Этот метод позволяет изучать взаимосвязи между данными и может быть полезным для снижения размерности данных и предсказания значений на основе коррелированных компонент.

2.3.4.1. Код создания модели PLSCanonical

# PLSCanonical.py
# The code demonstrates the process of training PLSCanonical model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_decomposition import PLSCanonical
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "PLSCanonical"
onnx_model_filename = data_path + "pls_canonical"

# create an PLSCanonical model
regression_model = PLSCanonical(n_components=1)

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  
Python  PLSCanonical Original model (double)
Python  R-squared (Coefficient of determination): 0.9962347199278333
Python  Mean Absolute Error: 6.3561407034365995
Python  Mean Squared Error: 49.82504148022689

Ошибки во вкладке Errors:

PLSCanonical.py started PLSCanonical.py 1       1
Traceback (most recent call last):      PLSCanonical.py 1       1
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)     PLSCanonical.py 87      1
    onnx_model = convert_topology(      convert.py      208     1
    topology.convert_operators(container=container, verbose=verbose)    _topology.py    1532    1
    self.call_shape_calculator(operator)        _topology.py    1348    1
    operator.infer_types()      _topology.py    1163    1
    raise MissingShapeCalculator(       _topology.py    629     1
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.cross_decomposition._pls.PLSCanonical'>'.        _topology.py    629     1
It usually means the pipeline being converted contains a        _topology.py    629     1
transformer or a predictor with no corresponding converter      _topology.py    629     1
implemented in sklearn-onnx. If the converted is implemented    _topology.py    629     1
in another library, you need to register        _topology.py    629     1
the converted so that it can be used by sklearn-onnx (function  _topology.py    629     1
update_registered_converter). If the model is not yet covered   _topology.py    629     1
by sklearn-onnx, you may raise an issue to      _topology.py    629     1
https://github.com/onnx/sklearn-onnx/issues     _topology.py    629     1
to get the converter implemented or even contribute to the      _topology.py    629     1
project. If the model is a custom model, a new converter must   _topology.py    629     1
be implemented. Examples can be found in the gallery.   _topology.py    629     1
PLSCanonical.py finished in 2513 ms             19      1

2.3.5. sklearn.cross_decomposition.CCA

Canonical Correlation Analysis (CCA), или канонический анализ корреляции, является мультивариативным методом статистического анализа, используемым для изучения связей между двумя наборами переменных (набором переменных X и набором переменных Y). Основная цель CCA - найти линейные комбинации переменных X и Y, которые максимизируют корреляцию между ними. Эти линейные комбинации называются каноническими переменными.

Принцип работы CCA:

Входные данные: Начинаем с двух наборов переменных X и Y. В этих наборах переменных может быть любое количество переменных, и CCA пытается найти линейные комбинации, которые максимизируют корреляцию между ними.
Построение канонических переменных: CCA находит канонические переменные в X и Y, которые максимизируют корреляцию между ними. Эти канонические переменные представляют собой линейные комбинации исходных переменных, по одной для каждого канонического показателя.
Оценка корреляции: CCA оценивает корреляцию между парами канонических переменных. Канонические переменные обычно упорядочены по убыванию корреляции, так что первая пара имеет наибольшую корреляцию, вторая - следующую по величине, и так далее.
Интерпретация: Канонические переменные могут быть интерпретированы с учетом их корреляции и весов переменных. Это позволяет понять, какие переменные из наборов X и Y наиболее сильно связаны.

Преимущества CCA:

Позволяет выявить скрытые связи: CCA может помочь выявить скрытые корреляции между двумя наборами переменных, которые не очевидны при первичном анализе.
Устойчивость к шуму: CCA может учесть шум в данных и сосредоточиться на наиболее значимых корреляциях.
Множественные применения: CCA может быть использован в различных областях, включая статистику, биоинформатику, финансы, и другие, для изучения связей между наборами переменных.

Ограничения CCA:

Требуется больше данных: CCA может требовать больший объем данных, чем другие методы анализа, чтобы надежно оценивать корреляции.
Линейные отношения: CCA предполагает линейные отношения между переменными, что может быть недостаточным в некоторых случаях.
Комплексность интерпретации: Интерпретация канонических переменных может быть сложной, особенно если есть много переменных в наборах X и Y.

CCA полезен в задачах, где требуется изучение взаимосвязи между двумя наборами переменных и выявление скрытых корреляций.

2.3.5.1. Код создания модели CCA

# CCA.py
# The code demonstrates the process of training CCA model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_decomposition import CCA
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name="CCA"
onnx_model_filename = data_path + "cca"

# create an CCA model
regression_model = CCA(n_components=1)

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Результат:

Python  CCA Original model (double)
Python  R-squared (Coefficient of determination): 0.9962347199278333
Python  Mean Absolute Error: 6.3561407034365995
Python  Mean Squared Error: 49.82504148022689

Ошибки во вкладке Errors:

CCA.py started  CCA.py  1       1
Traceback (most recent call last):      CCA.py  1       1
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)     CCA.py  87      1
    onnx_model = convert_topology(      convert.py      208     1
    topology.convert_operators(container=container, verbose=verbose)    _topology.py    1532    1
    self.call_shape_calculator(operator)        _topology.py    1348    1
    operator.infer_types()      _topology.py    1163    1
    raise MissingShapeCalculator(       _topology.py    629     1
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.cross_decomposition._pls.CCA'>'. _topology.py    629     1
It usually means the pipeline being converted contains a        _topology.py    629     1
transformer or a predictor with no corresponding converter      _topology.py    629     1
implemented in sklearn-onnx. If the converted is implemented    _topology.py    629     1
in another library, you need to register        _topology.py    629     1
the converted so that it can be used by sklearn-onnx (function  _topology.py    629     1
update_registered_converter). If the model is not yet covered   _topology.py    629     1
by sklearn-onnx, you may raise an issue to      _topology.py    629     1
https://github.com/onnx/sklearn-onnx/issues     _topology.py    629     1
to get the converter implemented or even contribute to the      _topology.py    629     1
project. If the model is a custom model, a new converter must   _topology.py    629     1
be implemented. Examples can be found in the gallery.   _topology.py    629     1
CCA.py finished in 2543 ms              19      1

Выводы

В статье рассмотрены 45 регрессионных моделей, доступных в библиотеке Scikit-learn версии 1.3.2.

1. Из этого набора 5 моделей столкнулись с трудностями при конвертации в ONNX-формат:

DummyRegressor (Фиктивный регрессор);
KernelRidge (Регрессия Ridge с использованием методов ядра);
IsotonicRegression (Изотоническая регрессия);
PLSCanonical (Канонический анализ частных наименьших квадратов);
CCA (Канонический анализ корреляции).

Возможно, эти модели слишком сложны по своей структуре или логике, они могут использовать специфические структуры данных или алгоритмы, которые не полностью совместимы с форматом ONNX.

2. Остальные 40 моделей были успешно конвертированы в ONNX с расчетом на float.

ARDRegression: Регрессия с автоматическим определением релевантности (ARD);
BayesianRidge: Байесовская линейная регрессия с регуляризацией;
ElasticNet: Комбинация регуляризации L1 и L2 для уменьшения переобучения;
ElasticNetCV: Эластичная сеть с автоматическим выбором параметра регуляризации;
HuberRegressor: Регрессия с уменьшенной чувствительностью к выбросам;
Lars: Линейная регрессия методом наименьших углов;
LarsCV: Кросс-валидация для метода наименьших углов;
Lasso: Регрессия с L1-регуляризацией для отбора признаков;
LassoCV: Кросс-валидация для Lasso-регрессии;
LassoLars: Комбинация Lasso и LARS для регрессии;
LassoLarsCV: Кросс-валидация для LassoLars-регрессии;
LassoLarsIC: Информационный критерий (IC) для подбора параметров LassoLars;
LinearRegression: Простая линейная регрессия;
Ridge: Линейная регрессия с L2-регуляризацией;
RidgeCV: Кросс-валидация для Ridge-регрессии;
OrthogonalMatchingPursuit: Регрессия с ортогональным отбором признаков;
PassiveAggressiveRegressor: Регрессия с пассивно-агрессивным подходом к обучению;
QuantileRegressor: Регрессия квантилей;
RANSACRegressor: Регрессия методом RANdom SAmple Consensus;
TheilSenRegressor: Нелинейная регрессия, основанная на методе Theil-Sen.
LinearSVR: Линейная регрессия для задачи опорных векторов;
MLPRegressor: Регрессия с использованием многослойного персептрона;
PLSRegression: Регрессия методом Partial Least Squares;
TweedieRegressor: Регрессия на основе распределения Твиди;
PoissonRegressor: Регрессия для моделирования данных с распределением Пуассона;
RadiusNeighborsRegressor: Регрессия на основе радиуса ближайших соседей;
KNeighborsRegressor: Регрессия на основе ближайших соседей;
GaussianProcessRegressor: Регрессия на основе гауссовских процессов;
GammaRegressor: Регрессия для моделирования данных с гамма-распределением;
SGDRegressor: Регрессия на основе стохастического градиентного спуска;
AdaBoostRegressor: Регрессия с использованием алгоритма AdaBoost;
BaggingRegressor: Регрессия с использованием метода Bagging;
DecisionTreeRegressor: Регрессия с деревом решений;
ExtraTreeRegressor: Регрессия с дополнительным деревом решений;
ExtraTreesRegressor: Регрессия с дополнительными деревьями решений;
NuSVR: Непрерывная линейная регрессия для опорных векторов (SVR);
RandomForestRegressor: Регрессия с ансамблем деревьев решений (Random Forest);
GradientBoostingRegressor: Регрессия с градиентным бустингом;
HistGradientBoostingRegressor: Регрессия с гистограммным градиентным бустингом;
SVR: Регрессия методом опорных векторов.

3. Также исследовалась возможность конвертации регрессионных моделей в ONNX с расчетом в double.

Серьезной проблемой при конвертации в ONNX модели с расчетом в double является ограничение ML-операторов ai.onnx.ml.LinearRegressor, ai.onnx.ml.SVMRegressor, ai.onnx.ml.TreeEnsembleRegressor: их параметры и выходное значение имеют тип float. Фактически это звенья понижения точности и их исполнение при расчетах с двойной точностью сомнительно. По этой причине в библиотеке ONNX Runtime не стали реализовывать некоторые операторы для ONNX-моделей с расчетом на double (в таких случаях возможны ошибки вида NOT_IMPLEMENTED :'Could not find an implementation for the node LinearRegressor:LinearRegressor(1), Could not find an implementation for SVMRegressor(1) node with name 'SVM' и т.п.). Таким образом, в рамках текущей спецификации ONNX полноценная работа в double для этих ML-операторов невозможна.

Для моделей линейной регрессии конвертеру sklearn-onnx удалось обойти ограничение LinearRegressor: для ONNX-моделей double вместо него используются ONNX-операторы MatMul() и Add(). Благодаря такому подходу первые 30 моделей предыдущего списка успешно конвертируются в ONNX-модели с расчетом в double, и эти модели показывают сохранение точности оригинальных моделей в double.

Однако для более сложных ML-операторов SVMRegressor и TreeEnsembleRegressor этого сделать пока не удалось, поэтому модели AdaBoostRegressor, BaggingRegressor, DecisionTreeRegressor, ExtraTreeRegressor, ExtraTreesRegressor, NuSVR, RandomForestRegressor, GradientBoostingRegressor, HistGradientBoostingRegressor и SVR сейчас доступны только в ONNX-моделях с расчетом во float.

Заключение

В статье рассмотрены 45 регрессионных моделей библиотеки Scikit-learn версии 1.3.2 и результаты их преобразования в формат ONNX для расчетах во float и double.

Из всех рассмотренных моделей, 5 оказались сложными для конвертации в формат ONNX. Эти модели включают в себя DummyRegressor, KernelRidge, IsotonicRegression, PLSCanonical и CCA. Вероятно, их сложная структура или логика требуют дополнительной адаптации для успешной конвертации в формат ONNX.

Остальные 40 регрессионных моделей были успешно преобразованы в формат ONNX для float. Из них 30 моделей также были успешно преобразованы в ONNX-формат для double и показали сохранение своей точности.

По причине ограничения ML-операторов SVMRegressor и TreeEnsembleRegressor модели AdaBoostRegressor, BaggingRegressor, DecisionTreeRegressor, ExtraTreeRegressor, ExtraTreesRegressor, NuSVR, RandomForestRegressor, GradientBoostingRegressor, HistGradientBoostingRegressor и SVR в настоящее время доступны только в ONNX-моделях с расчетами во float.

Все скрипты из статьи также доступны в публичном проекте MQL5\Shared Projects\Scikit.Regression.ONNX.