Regression models of the Scikit-learn library and their export to ONNX

MetaTrader 5 / Integration | 15 July 2024, 12:15

ONNX (Open Neural Network Exchange) is a format for describing and exchanging machine learning models, which makes it possible to transfer models between different machine learning frameworks. In deep learning and neural networks, the float32 data type is used very often, because it usually provides acceptable accuracy and efficiency for training deep learning models.

Some classical machine learning models are difficult to express with the standard ONNX operators, so additional ML operators (the ai.onnx.ml domain) were introduced to implement them in ONNX. Note that, according to the ONNX specification, the key operators of this set (LinearRegressor, SVMRegressor, TreeEnsembleRegressor) accept several input data types (tensor(float), tensor(double), tensor(int64), tensor(int32)) but always return tensor(float) as output. These operators are also parameterized with single-precision floating-point numbers, which can limit the accuracy of calculations, especially if the parameters of the original model were defined in double precision.

This can cause a loss of accuracy when models are converted, or when different data types are used during conversion and data processing in ONNX. Much depends on the converter, as we will see later: for some models the converter manages to bypass these limitations and ensures full portability of the ONNX models, allowing them to run in double precision without losing accuracy. These characteristics should be kept in mind when working with models and their ONNX representations, especially where the precision of the data representation matters.
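To make the precision issue concrete, here is a minimal sketch using only the Python standard library. It rounds a double-precision parameter to float32, which is what happens when a model parameter is stored in a float attribute. The coefficient value is an arbitrary illustration and is not taken from any model in this article.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]

# hypothetical model coefficient computed in double precision
coef_double = 4.000123456789012
coef_float = to_float32(coef_double)

# for round-to-nearest, the relative error is bounded by 2**-24 (about 6e-8)
rel_error = abs(coef_float - coef_double) / abs(coef_double)

print(f"double : {coef_double!r}")
print(f"float32: {coef_float!r}")
print(f"relative error: {rel_error:.3e}")
```

A relative error of this size is harmless in many applications, but it is far above the roughly 1e-16 resolution of double precision, so metrics computed from the two versions agree only in their first few decimal places.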

Scikit-learn is one of the most popular and widely used machine learning libraries in the Python community. It offers a wide range of algorithms, a user-friendly interface, and good documentation. The previous article, "Classification models in the Scikit-Learn library and their export to ONNX", covered classification models.

In this article, we will explore regression models from the Scikit-learn package, compute their parameters in double precision on a test dataset, attempt to convert them to ONNX in float and double precision, and use the resulting models in MQL5 programs. We will also compare the accuracy of the original models with their float and double ONNX versions, and examine the ONNX representation of the regression models to better understand their internal structure and operation.

If it bothers you, you are welcome to contribute

On the ONNX Runtime developer forum, a user reported the error "[ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for the node LinearRegressor:LinearRegressor(1)" when running a model through ONNX Runtime.

Hi all, I am getting this error while trying to run inference on a linear regression model. Please help me resolve this.

"NOT_IMPLEMENTED : Could not find an implementation for the node LinearRegressor:LinearRegressor(1)" error from the ONNX Runtime developer forum

The developers' reply:

It's because we only implemented it for float32, not float64. But your model needs float64.

See:
https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/core/providers/cpu/ml/linearregressor.cc#L16

If it bothers you, you are welcome to contribute.

In the user's ONNX model, the ai.onnx.ml.LinearRegressor operator is called with the double (float64) data type, and the error arises because ONNX Runtime does not implement LinearRegressor for double precision.

According to the specification of ai.onnx.ml.LinearRegressor, double input is allowed (T: tensor(float), tensor(double), tensor(int64), tensor(int32)); however, the developers intentionally chose not to implement it.

The reason is that the output is always Y: tensor(float). In addition, the operator's parameters are floats (coefficients: list of floats, intercepts: list of floats).

Consequently, even when calculations are performed in double precision, this operator reduces the precision to float, so implementing it for double-precision calculations is of questionable value.

Description of the ai.onnx.ml.LinearRegressor operator

Thus, the reduction of the parameters and the output value to float precision makes it impossible for ai.onnx.ml.LinearRegressor to work fully with double (float64) numbers. Presumably for this reason, the ONNX Runtime developers decided not to implement it for the double type.

The developers outlined how to "add double support" in code comments: see the commented-out TypeConstraint lines and the TODO comment in the implementation below.

In ONNX Runtime, the operator is computed by the LinearRegressor class (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cpu/ml/linearregressor.h).

The operator's parameters, coefficients_ and intercepts_, are stored as std::vector<float>:

#pragma once

#include "core/common/common.h"
#include "core/framework/op_kernel.h"
#include "core/util/math_cpuonly.h"
#include "ml_common.h"

namespace onnxruntime {
namespace ml {

class LinearRegressor final : public OpKernel {
 public:
  LinearRegressor(const OpKernelInfo& info);
  Status Compute(OpKernelContext* context) const override;

 private:
  int64_t num_targets_;
  std::vector<float> coefficients_;
  std::vector<float> intercepts_;
  bool use_intercepts_;
  POST_EVAL_TRANSFORM post_transform_;
};

}  // namespace ml
}  // namespace onnxruntime

The implementation of the LinearRegressor operator (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cpu/ml/linearregressor.cc):

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

#include "core/providers/cpu/ml/linearregressor.h"
#include "core/common/narrow.h"
#include "core/providers/cpu/math/gemm.h"

namespace onnxruntime {
namespace ml {

ONNX_CPU_OPERATOR_ML_KERNEL(
    LinearRegressor,
    1,
    // KernelDefBuilder().TypeConstraint("T", std::vector<MLDataType>{
    //                                            DataTypeImpl::GetTensorType<float>(),
    //                                            DataTypeImpl::GetTensorType<double>()}),
    KernelDefBuilder().TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    LinearRegressor);

LinearRegressor::LinearRegressor(const OpKernelInfo& info)
    : OpKernel(info),
      intercepts_(info.GetAttrsOrDefault<float>("intercepts")),
      post_transform_(MakeTransform(info.GetAttrOrDefault<std::string>("post_transform", "NONE"))) {
  ORT_ENFORCE(info.GetAttr<int64_t>("targets", &num_targets_).IsOK());
  ORT_ENFORCE(info.GetAttrs<float>("coefficients", coefficients_).IsOK());

  // use the intercepts_ if they're valid
  use_intercepts_ = intercepts_.size() == static_cast<size_t>(num_targets_);
}

// Use GEMM for the calculations, with broadcasting of intercepts
// https://github.com/onnx/onnx/blob/main/docs/Operators.md#Gemm
//
// X: [num_batches, num_features]
// coefficients_: [num_targets, num_features]
// intercepts_: optional [num_targets].
// Output: X * coefficients_^T + intercepts_: [num_batches, num_targets]
template <typename T>
static Status ComputeImpl(const Tensor& input, ptrdiff_t num_batches, ptrdiff_t num_features, ptrdiff_t num_targets,
                          const std::vector<float>& coefficients,
                          const std::vector<float>* intercepts, Tensor& output,
                          POST_EVAL_TRANSFORM post_transform,
                          concurrency::ThreadPool* threadpool) {
  const T* input_data = input.Data<T>();
  T* output_data = output.MutableData<T>();

  if (intercepts != nullptr) {
    TensorShape intercepts_shape({num_targets});
    onnxruntime::Gemm<T>::ComputeGemm(CBLAS_TRANSPOSE::CblasNoTrans, CBLAS_TRANSPOSE::CblasTrans,
                                      num_batches, num_targets, num_features,
                                      1.f, input_data, coefficients.data(), 1.f,
                                      intercepts->data(), &intercepts_shape,
                                      output_data,
                                      threadpool);
  } else {
    onnxruntime::Gemm<T>::ComputeGemm(CBLAS_TRANSPOSE::CblasNoTrans, CBLAS_TRANSPOSE::CblasTrans,
                                      num_batches, num_targets, num_features,
                                      1.f, input_data, coefficients.data(), 1.f,
                                      nullptr, nullptr,
                                      output_data,
                                      threadpool);
  }

  if (post_transform != POST_EVAL_TRANSFORM::NONE) {
    ml::batched_update_scores_inplace(gsl::make_span(output_data, SafeInt<size_t>(num_batches) * num_targets),
                                      num_batches, num_targets, post_transform, -1, false, threadpool);
  }
  return Status::OK();
}

Status LinearRegressor::Compute(OpKernelContext* ctx) const {
  Status status = Status::OK();

  const auto& X = *ctx->Input<Tensor>(0);
  const auto& input_shape = X.Shape();

  if (input_shape.NumDimensions() > 2) {
    return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Input shape had more than 2 dimension. Dims=",
                           input_shape.NumDimensions());
  }

  ptrdiff_t num_batches = input_shape.NumDimensions() <= 1 ? 1 : narrow<ptrdiff_t>(input_shape[0]);
  ptrdiff_t num_features = input_shape.NumDimensions() <= 1 ? narrow<ptrdiff_t>(input_shape.Size())
                                                            : narrow<ptrdiff_t>(input_shape[1]);
  Tensor& Y = *ctx->Output(0, {num_batches, num_targets_});
  concurrency::ThreadPool* tp = ctx->GetOperatorThreadPool();

  auto element_type = X.GetElementType();

  switch (element_type) {
    case ONNX_NAMESPACE::TensorProto_DataType_FLOAT: {
      status = ComputeImpl<float>(X, num_batches, num_features, narrow<ptrdiff_t>(num_targets_), coefficients_,
                                  use_intercepts_ ? &intercepts_ : nullptr,
                                  Y, post_transform_, tp);

      break;
    }
    case ONNX_NAMESPACE::TensorProto_DataType_DOUBLE: {
      // TODO: Add support for 'double' to the scoring functions in ml_common.h
      // once that is done we can just call ComputeImpl<double>...
      // Alternatively we could cast the input to float.
    }
    default:
      status = ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Unsupported data type of ", element_type);
  }

  return status;
}

}  // namespace ml
}  // namespace onnxruntime

It turns out that one option is to accept double values as input and perform the computation with float parameters. Another is to reduce the input data to float precision. Neither option, however, can be considered an adequate solution.
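The two workarounds can be compared numerically with a small standard-library sketch. The parameter and input values are hypothetical, and the float32 arithmetic is emulated by rounding each intermediate result, so this illustrates the effect rather than reproducing ONNX Runtime's actual computation.

```python
import struct

def f32(value: float) -> float:
    """Round a float64 value to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]

# hypothetical double-precision parameters and input value
coef = 3.9876543210987654
intercept = 1.2345678901234567
x = 57.123456789012345

# reference: parameters, input and arithmetic all in double
y_double = coef * x + intercept

# option 1: double input, but parameters stored as float32
y_opt1 = f32(coef) * x + f32(intercept)

# option 2: input cast to float32 as well, every step rounded to float32
product = f32(f32(coef) * f32(x))
y_opt2 = f32(product + f32(intercept))

err1 = abs(y_opt1 - y_double)
err2 = abs(y_opt2 - y_double)
print(f"option 1 absolute error: {err1:.3e}")
print(f"option 2 absolute error: {err2:.3e}")
```

In both cases the result deviates from the double reference by many orders of magnitude more than double rounding alone would explain, which is why neither workaround preserves the accuracy of a model trained in double.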

The specification of the ai.onnx.ml.LinearRegressor operator prevents full operation with double numbers, because the parameters and the output value are limited to the float type.

The situation is similar for other ONNX ML operators, such as ai.onnx.ml.SVMRegressor and ai.onnx.ml.TreeEnsembleRegressor.

As a result, every developer who runs ONNX models in double precision faces this limitation of the specification. One solution could be to extend the ONNX specification (or to add similar operators such as LinearRegressor64, SVMRegressor64 and TreeEnsembleRegressor64 with double parameters and output values). At present, however, the issue remains unresolved.

Much depends on the ONNX converter. For models computed in double, it may be preferable to avoid these operators (although this is not always possible). In this particular case, the converter to ONNX did not handle the user's model optimally.

As we will see later, the sklearn-onnx converter manages to bypass the LinearRegressor limitation: for double ONNX models it uses the ONNX operators MatMul() and Add() instead. Thanks to this approach, many regression models from the Scikit-learn library are successfully converted to ONNX models computed in double, preserving the accuracy of the original double models.
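The idea behind the MatMul() plus Add() decomposition can be sketched in NumPy. This is only an illustration of the arithmetic, not the converter's code: the parameter values are hypothetical, and np.matmul and + stand in for the ONNX operators.

```python
import numpy as np

# hypothetical fitted parameters of a linear model, kept in float64
coef = np.array([[3.9876543210987654]], dtype=np.float64)    # [n_features, n_targets]
intercept = np.array([1.2345678901234567], dtype=np.float64)  # [n_targets]

X = np.arange(0, 100, 1, dtype=np.float64).reshape(-1, 1)

# MatMul followed by Add, entirely in double precision
y_matmul_add = np.matmul(X, coef) + intercept

# the same model with parameters and inputs rounded to float32,
# mimicking an operator that only stores float parameters
y_float = (np.matmul(X.astype(np.float32), coef.astype(np.float32))
           + intercept.astype(np.float32))

# reference prediction computed directly in double
reference = X * 3.9876543210987654 + 1.2345678901234567

err_double = np.max(np.abs(y_matmul_add - reference))
err_float = np.max(np.abs(y_float.astype(np.float64) - reference))
print("max error, MatMul + Add in double:", err_double)
print("max error, float32 parameters    :", err_float)
```

Because MatMul and Add are generic tensor operators with double kernels, nothing in this path forces the parameters or the output down to float, which is what makes the double model faithful to the original.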

1. Test dataset

To run the examples, you will need to install Python (we used version 3.10.8) and additional libraries (pip install -U scikit-learn numpy matplotlib onnx onnxruntime skl2onnx), and specify the path to Python in MetaEditor (menu Tools>Options>Compilers>Python).

As a test dataset, we will use values generated by the function y = 4X + 10sin(X*0.5).

To plot this function, open MetaEditor, create a file named RegressionData.py, copy the script text into it, and run it by clicking the "Compile" button.

Script for displaying the test dataset

# RegressionData.py
# The code plots the synthetic data, used for all regression models
# Copyright 2023, MetaQuotes Ltd.
# https://mql5.com

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

# set the figure size
plt.figure(figsize=(8,5))

# plot the initial data for regression
plt.scatter(X, y, label='Regression Data', marker='o')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Regression data')
plt.show()

As a result, a plot of the function will be displayed; we will use it to test the regression methods.

Fig.1. Function for testing regression models

2. Regression models

The goal of a regression task is to find the mathematical function or model that best describes the relationship between the features and the target variable, so that numerical values can be predicted for new data. This makes it possible to forecast, optimize solutions, and make informed decisions based on data.

Let us consider the main regression models in the Scikit-learn package.

2.0. List of Scikit-learn regression models

To display a list of the available Scikit-learn regression models, you can use the following script:

# ScikitLearnRegressors.py
# The script lists all the regression algorithms available in scikit-learn
# Copyright 2023, MetaQuotes Ltd.
# https://mql5.com

# print Python version
from platform import python_version  
print("The Python version is ", python_version()) 

# print scikit-learn version
import sklearn
print('The scikit-learn version is {}.'.format(sklearn.__version__))

# print scikit-learn regression models
from sklearn.utils import all_estimators

regressors = all_estimators(type_filter='regressor')
for index, (name, RegressorClass) in enumerate(regressors, start=1):
    print(f"Regressor {index}: {name}")

Output:

The Python version is 3.10.8
The scikit-learn version is 1.3.2.
Regressor 1: ARDRegression
Regressor 2: AdaBoostRegressor
Regressor 3: BaggingRegressor
Regressor 4: BayesianRidge
Regressor 5: CCA
Regressor 6: DecisionTreeRegressor
Regressor 7: DummyRegressor
Regressor 8: ElasticNet
Regressor 9: ElasticNetCV
Regressor 10: ExtraTreeRegressor
Regressor 11: ExtraTreesRegressor
Regressor 12: GammaRegressor
Regressor 13: GaussianProcessRegressor
Regressor 14: GradientBoostingRegressor
Regressor 15: HistGradientBoostingRegressor
Regressor 16: HuberRegressor
Regressor 17: IsotonicRegression
Regressor 18: KNeighborsRegressor
Regressor 19: KernelRidge
Regressor 20: Lars
Regressor 21: LarsCV
Regressor 22: Lasso
Regressor 23: LassoCV
Regressor 24: LassoLars
Regressor 25: LassoLarsCV
Regressor 26: LassoLarsIC
Regressor 27: LinearRegression
Regressor 28: LinearSVR
Regressor 29: MLPRegressor
Regressor 30: MultiOutputRegressor
Regressor 31: MultiTaskElasticNet
Regressor 32: MultiTaskElasticNetCV
Regressor 33: MultiTaskLasso
Regressor 34: MultiTaskLassoCV
Regressor 35: NuSVR
Regressor 36: OrthogonalMatchingPursuit
Regressor 37: OrthogonalMatchingPursuitCV
Regressor 38: PLSCanonical
Regressor 39: PLSRegression
Regressor 40: PassiveAggressiveRegressor
Regressor 41: PoissonRegressor
Regressor 42: QuantileRegressor
Regressor 43: RANSACRegressor
Regressor 44: RadiusNeighborsRegressor
Regressor 45: RandomForestRegressor
Regressor 46: RegressorChain
Regressor 47: Ridge
Regressor 48: RidgeCV
Regressor 49: SGDRegressor
Regressor 50: SVR
Regressor 51: StackingRegressor
Regressor 52: TheilSenRegressor
Regressor 53: TransformedTargetRegressor
Regressor 54: TweedieRegressor
Regressor 55: VotingRegressor

For convenience, the regressors in this list are highlighted in different colors. Models that require a base regression model are highlighted in gray, while the other models can be used on their own. Models successfully exported to the ONNX format are marked in green, and models that encounter errors during conversion in the current version of Scikit-learn 1.2.2 are marked in red. Methods unsuitable for the test task under consideration are highlighted in blue.

Regression quality is analyzed using regression metrics, which are functions of the true and predicted values. Several different metrics are available in the MQL5 language; they are described in the article "Evaluating ONNX models using regression metrics".

This article uses three metrics to compare the quality of the different models:

Coefficient of determination, R-squared (R²);
Mean Absolute Error (MAE);
Mean Squared Error (MSE).
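For reference, the three metrics can be written out in a few lines of plain Python. The sketch below applies them to the article's test function, using an ordinary least-squares line as a simple stand-in model; that stand-in and the function names r2, mae and mse are choices made for this illustration. The scripts in this article use sklearn.metrics instead.

```python
import math

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average of (true - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# test function used in this article: y = 4X + 10sin(X*0.5), X = 0..99
x = [float(i) for i in range(100)]
y = [4 * xi + 10 * math.sin(0.5 * xi) for xi in x]

# closed-form least-squares line as a stand-in regression model
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
y_pred = [slope * xi + intercept for xi in x]

r2_value = r2(y, y_pred)
mae_value = mae(y, y_pred)
mse_value = mse(y, y_pred)
print("R-squared:", r2_value)
print("MAE:", mae_value)
print("MSE:", mse_value)
```

R² close to 1 means the model explains almost all of the variance in the target, while MAE and MSE are expressed in the units of y and of y squared, respectively, so they are the more sensitive indicators when comparing float and double versions of the same model.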

2.1. Scikit-learn regression models that convert to both float and double ONNX models

This section presents the regression models that were successfully converted to ONNX in both float and double precision.

All the regression models discussed below are presented in the following format:

Model description, principle of operation, advantages and limitations
Python script that creates the model, exports it to ONNX files in float and double formats, and runs the resulting models with ONNX Runtime in Python. Metrics such as R², MAE and MSE, calculated with sklearn.metrics, are used to evaluate the quality of the original and ONNX models.
MQL5 script that runs the ONNX models (float and double) through ONNX Runtime, with metrics calculated using RegressionMetric().
Representation of the ONNX model in Netron for float and double precision.

2.1.1. sklearn.linear_model.ARDRegression

ARDRegression (Automatic Relevance Determination Regression) is a regression method that automatically determines the importance (relevance) of the features and sets their weights while the model is being trained.

ARDRegression makes it possible to detect and use only the most important features when building a regression model, which can be beneficial when there are many features.

Principle of operation of ARDRegression:

Linear regression: ARDRegression is based on linear regression and assumes a linear relationship between the independent variables (features) and the target variable.
Automatic determination of feature importance: The main distinction of ARDRegression is that it automatically determines which features matter most for predicting the target variable. This is achieved by placing prior distributions (regularization) on the weights, which lets the model automatically drive the weights of less significant features to zero.
Posterior probability estimation: ARDRegression computes posterior probabilities for each feature, which makes it possible to determine its importance. Features with high posterior probabilities are considered relevant and receive non-zero weights, while features with low posterior probabilities receive zero weights.
Dimensionality reduction: ARDRegression can therefore reduce the dimensionality of the data by eliminating insignificant features.
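The relevance-determination behavior described in the list above can be observed on a small synthetic problem. In this sketch (the data, the random seed and the feature count are assumptions made for illustration, and default ARDRegression settings are used), only the first of five features influences the target.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# synthetic data: 5 features, of which only feature 0 is relevant
rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
X = rng.normal(size=(n_samples, n_features))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=n_samples)

# fit ARDRegression with default hyperparameters
model = ARDRegression()
model.fit(X, y)

# the weight of the relevant feature should be close to 3,
# while the weights of the irrelevant features are shrunk towards zero
for i, weight in enumerate(model.coef_):
    print(f"feature {i}: weight = {weight:.4f}")
```

The test function used in the rest of this article has a single feature, so this pruning effect does not show up there; with one feature, ARDRegression behaves much like a regularized linear fit.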

Advantages of ARDRegression:

Automatic determination of important features: The method automatically identifies and uses only the most important features, potentially improving model performance and reducing the risk of overfitting.
Resistance to multicollinearity: ARDRegression handles multicollinearity well, even when the features are highly correlated.

Limitations of ARDRegression:

Requires a choice of prior distributions: Choosing suitable prior distributions may require experimentation.
Computational complexity: Training ARDRegression can be computationally expensive, especially for large datasets.

ARDRegression is a regression method that automatically determines the importance of the features and sets their weights based on posterior probabilities. It is useful when only the significant features should be used to build a regression model and the dimensionality of the data needs to be reduced.

2.1.1.1. Code for creating the ARDRegression model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.ARDRegression model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# ARDRegression.py
# The code demonstrates the process of training ARDRegression model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import ARDRegression
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name="ARDRegression"
onnx_model_filename = data_path + "ard_regression"

# create an ARDRegression model
regression_model = ARDRegression()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 for the float ONNX model
X_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)

print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 for the double ONNX model
X_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)

print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression data (double ONNX predictions)
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

The script creates and trains the sklearn.linear_model.ARDRegression model (the original model is computed in double), then exports it to ONNX for float and double (ard_regression_float.onnx and ard_regression_double.onnx) and compares the accuracy of the results.

It also generates the files ARDRegression_plot_float.png and ARDRegression_plot_double.png, which allow a visual assessment of the results of the float and double ONNX models (Fig. 2-3).

Fig.2. Results of ARDRegression.py (float)

Fig.3. Results of ARDRegression.py (double)

Visually, the float and double ONNX models look the same (Fig. 2-3). Detailed information can be found in the Journal tab:

Python  ARDRegression Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382628120845
Python  Mean Absolute Error: 6.347568012853758
Python  Mean Squared Error: 49.77815934891289
Python  
Python  ARDRegression ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ard_regression_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382627587808
Python  Mean Absolute Error: 6.347568283744705
Python  Mean Squared Error: 49.778160054267204
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  ONNX: MSE matching decimal places:  4
Python  float ONNX model precision:  6
Python  
Python  ARDRegression ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ard_regression_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382628120845
Python  Mean Absolute Error: 6.347568012853758
Python  Mean Squared Error: 49.77815934891289
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

In this example, the original model was computed in double and then exported to the ONNX models ard_regression_float.onnx and ard_regression_double.onnx for float and double, respectively.

If model accuracy is assessed by the Mean Absolute Error (MAE), the float ONNX model matches the original to 6 decimal places, while the double ONNX model retains 15 decimal places, in line with the accuracy of the original model.
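To illustrate where a gap of this kind can come from, the following minimal sketch emulates float32 arithmetic in plain Python. It is not the article's script: an ordinary least squares fit stands in for the trained model (an assumption made only for illustration), and the helper names to_float32 and mean_absolute_error are hypothetical.

```python
import math
import struct


def to_float32(value):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# the same synthetic data as in the scripts: y = 4x + 10*sin(0.5x)
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(x * 0.5) for x in xs]

# ordinary least squares as a stand-in for the trained regressor (assumption)
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# double precision: parameters and arithmetic stay in 64 bits
pred_double = [slope * x + intercept for x in xs]

# float32: parameters, inputs and every intermediate result are rounded
slope32 = to_float32(slope)
intercept32 = to_float32(intercept)
pred_float = [to_float32(to_float32(slope32 * to_float32(x)) + intercept32)
              for x in xs]

mae_double = mean_absolute_error(ys, pred_double)
mae_float = mean_absolute_error(ys, pred_float)
print("MAE (double):", mae_double)
print("MAE (float32):", mae_float)
print("difference:", abs(mae_double - mae_float))
```

The printed difference is small compared with the MAE itself, but it is enough to change the digits after the first few decimal places, which is the effect the matching-decimals count measures.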

The properties of the ONNX models can be viewed in MetaEditor (Fig. 4-5).

Fig.4. The ard_regression_float.onnx ONNX model in MetaEditor

Fig.5. The ard_regression_double.onnx ONNX model in MetaEditor

A comparison of the float and double ONNX models shows that, in this case, the ARDRegression calculation is implemented differently: for float, the ONNX-ML operator LinearRegressor() is used, while for double, the ONNX operators MatMul(), Add(), and Reshape() are used.

The ONNX implementation of a model depends on the converter; the export examples in this article use the skl2onnx.convert_sklearn() function from the skl2onnx library.

2.1.1.2. MQL5 code for executing ONNX models

This code runs the saved ONNX models ard_regression_float.onnx and ard_regression_double.onnx and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                ARDRegression.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ARDRegression"
#define   ONNXFilenameFloat  "ard_regression_float.onnx"
#define   ONNXFilenameDouble "ard_regression_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

ARDRegression (EURUSD,H1)       Testing ONNX float: ARDRegression (ard_regression_float.onnx)
ARDRegression (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382627587808
ARDRegression (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3475682837447049
ARDRegression (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781600542671896
ARDRegression (EURUSD,H1)       
ARDRegression (EURUSD,H1)       Testing ONNX double: ARDRegression (ard_regression_double.onnx)
ARDRegression (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382628120845
ARDRegression (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3475680128537597
ARDRegression (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781593489128795

Comparison with the original double model in Python:

Testing ONNX float: ARDRegression (ard_regression_float.onnx)
Python  Mean Absolute Error: 6.347568012853758
MQL5:   Mean Absolute Error: 6.3475682837447049
       
Testing ONNX double: ARDRegression (ard_regression_double.onnx)
Python  Mean Absolute Error: 6.347568012853758
MQL5:   Mean Absolute Error: 6.3475680128537597

Accuracy of the float ONNX model by MAE: 6 decimal places; accuracy of the double ONNX model by MAE: 14 decimal places.
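The counts of 6 and 14 can be reproduced directly from the logged values. The sketch below compares the numbers as text, so the 16-decimal MQL5 output can be checked against the shorter Python output without any re-rounding; the function name matching_decimals is hypothetical and is not part of the article's scripts.

```python
def matching_decimals(text_a, text_b):
    """Count the leading decimal digits shared by two fixed-point number strings."""
    int_a, _, frac_a = text_a.partition(".")
    int_b, _, frac_b = text_b.partition(".")
    # different integer parts mean that no decimal place can be said to match
    if int_a != int_b:
        return 0
    count = 0
    for digit_a, digit_b in zip(frac_a, frac_b):
        if digit_a != digit_b:
            break
        count += 1
    return count


# MAE values copied from the logs above
python_mae = "6.347568012853758"
mql5_float_mae = "6.3475682837447049"
mql5_double_mae = "6.3475680128537597"

print(matching_decimals(python_mae, mql5_float_mae))   # 6
print(matching_decimals(python_mae, mql5_double_mae))  # 14
```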

2.1.1.3. ONNX representations of the ard_regression_float.onnx and ard_regression_double.onnx models

Netron (web version) is a tool for visualizing models and analyzing computational graphs; it can be used for models in the ONNX (Open Neural Network Exchange) format.

Netron presents model graphs and their architecture in a clear, interactive form, which makes it possible to explore the structure and parameters of deep learning models, including those created with ONNX.

The main features of Netron are:

Graph visualization: Netron displays the model architecture as a graph, showing the layers, operations, and connections between them. This makes the structure and data flow within the model easy to understand.
Interactive exploration: You can select nodes in the graph to get additional information about each operator and its parameters.
Support for various formats: Netron supports a range of deep learning model formats, such as ONNX, TensorFlow, PyTorch, and CoreML, among others.
Parameter analysis: You can view the model's parameters and weights, which helps in understanding the values used in different parts of the model.

Netron is convenient for developers and researchers in machine learning and deep learning, since it simplifies the visualization and analysis of models and helps in understanding and debugging complex neural networks.

The tool makes it possible to inspect models quickly and to explore their structure and parameters, which facilitates work with deep neural networks.

For more details about Netron, see the articles: Visualizing your Neural Network with Netron and Visualize Keras Neural Networks with Netron.

The ard_regression_float.onnx model is shown in Fig.6:

Fig.6. ONNX representation of the ard_regression_float.onnx model in Netron

The ai.onnx.ml LinearRegressor() operator is part of the ONNX standard and describes a model for regression tasks, that is, for predicting numerical (continuous) values from input features.

It uses the model parameters, such as the weights and the bias, together with the input features, and performs linear regression: each input feature has its own weight, and the prediction is the linear combination of the features with these weights.

This operator performs the following steps:

It takes the model's weights and bias, together with the input features.
For each input sample, it computes the linear combination of the weights with the corresponding features.
It adds the bias to the resulting value.

The result is the prediction of the target variable in the regression task.
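The steps above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the ONNX Runtime implementation: the function and parameter names are hypothetical, a single regression target is assumed, and the float32 rounding of the result is emulated with the struct module to reflect the tensor(float) output of the operator.

```python
import struct


def to_float32(value):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def linear_regressor(samples, coefficients, intercept):
    """Return one float32 prediction per sample, shaped as [N, 1]."""
    predictions = []
    for features in samples:
        # linear combination of the weights with the features of this sample
        total = sum(w * x for w, x in zip(coefficients, features))
        # add the bias and round the result to float32
        predictions.append([to_float32(total + intercept)])
    return predictions


# three samples with one feature each; weight 4.0 and bias 0.5 are made-up values
samples = [[0.0], [1.0], [2.0]]
print(linear_regressor(samples, coefficients=[4.0], intercept=0.5))
# [[0.5], [4.5], [8.5]]
```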

The parameters of LinearRegressor() are shown in Fig.7.

Fig.7. Properties of the LinearRegressor() operator of the ard_regression_float.onnx model in Netron

The ard_regression_double.onnx ONNX model is shown in Fig.8:

Fig.8. ONNX representation of the ard_regression_double.onnx model in Netron

The parameters of the ONNX operators MatMul(), Add(), and Reshape() are shown in Fig.9-11.

Fig.9. Properties of the MatMul operator in the ard_regression_double.onnx model in Netron

The ONNX MatMul (matrix multiplication) operator multiplies two matrices.

It takes two matrices as inputs and returns their matrix product.

Given two matrices A and B, the result of MatMul(A, B) is a matrix C in which each element C[i][j] is the sum of the products of the elements of row i of A and the elements of column j of B.

Fig.10. Properties of the Add operator in the ard_regression_double.onnx model in Netron

The ONNX Add() operator performs element-wise addition of two tensors whose shapes are equal or compatible for broadcasting.

It takes two inputs and returns a tensor in which each element equals the sum of the corresponding elements of the input tensors.

Fig.11. Properties of the Reshape operator in the ard_regression_double.onnx model in Netron

The ONNX Reshape(-1,1) operator changes the shape (dimensions) of the input data. The value -1 for a dimension means that its size is computed automatically from the other dimensions so that the total number of elements is preserved.

The value 1 for the second dimension specifies that, after reshaping, each row contains a single element.
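As a rough illustration of how these three operators combine into a linear prediction, here is a minimal sketch in plain Python using nested lists. The function names are hypothetical, the weight and bias are made-up values, and the broadcasting in add is reduced to the one case needed here (a single bias value added to every element).

```python
def matmul(a, b):
    """C[i][j] is the sum over k of a[i][k] * b[k][j]."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def add(matrix, bias):
    """Element-wise addition; the one-element bias is broadcast to every element."""
    return [[value + bias[0] for value in row] for row in matrix]


def reshape_to_column(matrix):
    """Equivalent of reshaping to (-1, 1): the row count is inferred from the data."""
    flat = [value for row in matrix for value in row]
    return [[value] for value in flat]


# three samples with one feature each; Python floats are doubles throughout
X = [[0.0], [1.0], [2.0]]
coefficients = [[4.0]]   # shape [1, 1]
intercept = [0.5]        # shape [1]

y = reshape_to_column(add(matmul(X, coefficients), intercept))
print(y)  # [[0.5], [4.5], [8.5]]
```

Because every value here is a 64-bit float, no rounding to float32 takes place at any step, which is consistent with the double model reproducing the original metrics.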

2.1.2. sklearn.linear_model.BayesianRidge

BayesianRidge is a regression method that uses a Bayesian approach to estimate the model parameters. It makes it possible to model a prior distribution of the parameters and update it with the data to obtain the posterior distribution of the parameters.

BayesianRidge is designed to predict the dependent variable from one or more independent variables.

How BayesianRidge works:

Prior distribution of parameters: The method starts by defining the prior distribution of the model parameters. This distribution represents prior knowledge or assumptions about the parameters before the data is considered. BayesianRidge uses Gaussian priors.
Updating the parameter distribution: Once the prior distribution is set, it is updated using the data. This relies on Bayes' theorem, by which the posterior distribution of the parameters is computed from the prior and the data. An essential part of this step is the estimation of the hyperparameters, which influence the shape of the posterior distribution.
Prediction: After the posterior distribution of the parameters has been estimated, predictions can be made for new observations. The result is a distribution of predictions rather than a single point value, which makes it possible to account for the uncertainty of the predictions.
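The three stages can be sketched for the simplest possible case: one feature, no intercept, and fixed hyperparameters. This is a simplification and an assumption of the sketch, since BayesianRidge estimates its hyperparameters from the data; the function names are hypothetical, alpha denotes the noise precision and lam the precision of the Gaussian prior on the weight.

```python
def fit_posterior(xs, ys, alpha, lam):
    """Posterior of the weight w in y = w*x + noise.

    Prior: w ~ N(0, 1/lam). Noise: N(0, 1/alpha).
    Returns the posterior mean and variance of w.
    """
    precision = lam + alpha * sum(x * x for x in xs)
    mean = alpha * sum(x * y for x, y in zip(xs, ys)) / precision
    return mean, 1.0 / precision


def predict(x_new, weight_mean, weight_variance, alpha):
    """Predictive mean and variance for a new observation."""
    predictive_mean = weight_mean * x_new
    # noise variance plus the uncertainty that remains in the weight
    predictive_variance = 1.0 / alpha + x_new * x_new * weight_variance
    return predictive_mean, predictive_variance


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
weight_mean, weight_variance = fit_posterior(xs, ys, alpha=1.0, lam=1.0)
print(weight_mean, weight_variance)
print(predict(1.0, weight_mean, weight_variance, alpha=1.0))
```

With these values the posterior mean of the weight is 28/15, slightly below the least squares value of 2, which shows the shrinking effect of the prior; the predictive variance grows with the distance of x_new from zero.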

Advantages of BayesianRidge:

Accounting for uncertainty: BayesianRidge takes into account the uncertainty in the model parameters and predictions. Besides point predictions, it can provide interval estimates for them.
Regularization: The Bayesian regression method can be useful for regularizing the model, which helps prevent overfitting.
Automatic feature selection: BayesianRidge can automatically determine the importance of features by reducing the weights of insignificant ones.

Limitations of BayesianRidge:

Computational complexity: The method requires computational resources to estimate the parameters and compute the posterior distribution.
High level of abstraction: A deeper knowledge of Bayesian statistics may be needed to understand and use BayesianRidge.
Not always the best choice: BayesianRidge may not be the most suitable method for some regression tasks, especially when data is limited.

BayesianRidge is useful in regression tasks where the uncertainty of the parameters and predictions matters and where model regularization is needed.

2.1.2.1. Code for creating the BayesianRidge model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.BayesianRidge model, trains it on synthetic data, saves the model in the ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# BayesianRidge.py
# The code demonstrates the process of training BayesianRidge model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "BayesianRidge"
onnx_model_filename = data_path + "bayesian_ridge"

# create a Bayesian Ridge regression model
regression_model = BayesianRidge()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32
X_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ", compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 (double)
X_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  BayesianRidge Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382628120845
Python  Mean Absolute Error: 6.347568012853758
Python  Mean Squared Error: 49.77815934891288
Python  
Python  BayesianRidge ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bayesian_ridge_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382627587808
Python  Mean Absolute Error: 6.347568283744705
Python  Mean Squared Error: 49.778160054267204
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  6
Python  
Python  BayesianRidge ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bayesian_ridge_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382628120845
Python  Mean Absolute Error: 6.347568012853758
Python  Mean Squared Error: 49.77815934891288
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Fig.12. Results of BayesianRidge.py (float ONNX)

2.1.2.2. MQL5 code for executing ONNX models

This code runs the saved ONNX models bayesian_ridge_float.onnx and bayesian_ridge_double.onnx and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                BayesianRidge.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "BayesianRidge"
#define   ONNXFilenameFloat  "bayesian_ridge_float.onnx"
#define   ONNXFilenameDouble "bayesian_ridge_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

BayesianRidge (EURUSD,H1)       Testing ONNX float: BayesianRidge (bayesian_ridge_float.onnx)
BayesianRidge (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382627587808
BayesianRidge (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3475682837447049
BayesianRidge (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781600542671896
BayesianRidge (EURUSD,H1)       
BayesianRidge (EURUSD,H1)       Testing ONNX double: BayesianRidge (bayesian_ridge_double.onnx)
BayesianRidge (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382628120845
BayesianRidge (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3475680128537624
BayesianRidge (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781593489128866

Comparison with the original double-precision model in Python:

Testing ONNX float: BayesianRidge (bayesian_ridge_float.onnx)
Python  Mean Absolute Error: 6.347568012853758
MQL5:   Mean Absolute Error: 6.3475682837447049

Testing ONNX double: BayesianRidge (bayesian_ridge_double.onnx)
Python  Mean Absolute Error: 6.347568012853758
MQL5:   Mean Absolute Error: 6.3475680128537624

Accuracy of ONNX float MAE: 6 decimal places; accuracy of ONNX double MAE: 13 decimal places.
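The gap between 6 and 13 matching decimal places is in line with the resolution of the two floating-point types. The following is a minimal sketch, not part of the original scripts, that rounds the Python MAE value to single precision and back using only the standard library. The helper name to_float32 is hypothetical. It only illustrates the storage limit of float32; in the real model the error also accumulates through the float arithmetic of the ONNX operators.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", value))[0]

# MAE of the original double-precision model, taken from the comparison above
mae_double = 6.347568012853758

# the same number after a round trip through single precision
mae_as_float32 = to_float32(mae_double)

# relative rounding error; for float32 it is bounded by 2**-24 (about 6e-8)
rel_error = abs(mae_as_float32 - mae_double) / mae_double

print("float64 value:      ", mae_double)
print("float32 round trip: ", mae_as_float32)
print("relative error:     ", rel_error)
```

A relative error of this size leaves roughly 7 significant digits intact, which is why the float model agrees with the original only to about 6 decimal places for a value of this magnitude.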

2.1.2.3. ONNX representation of bayesian_ridge_float.onnx and bayesian_ridge_double.onnx

Fig.13. ONNX representation of bayesian_ridge_float.onnx in Netron

Fig.14. ONNX representation of bayesian_ridge_double.onnx in Netron

Note on the ElasticNet and ElasticNetCV methods

ElasticNet and ElasticNetCV are two related machine learning methods used to regularize regression models, linear regression in particular. They share the same underlying model but differ in how they are used and applied.

ElasticNet (Elastic Net Regression):

Working principle: ElasticNet is a regression method that combines Lasso (L1 regularization) and Ridge (L2 regularization). It adds two regularization terms to the loss function: one penalizes large absolute values of the coefficients (as in Lasso), and the other penalizes large squared values of the coefficients (as in Ridge).
ElasticNet is typically used when the data exhibits multicollinearity (highly correlated features), when dimensionality reduction is needed, and when the magnitude of the coefficients must be controlled.

ElasticNetCV (Elastic Net Cross-Validation):

Working principle: ElasticNetCV is an extension of ElasticNet that automatically selects the optimal hyperparameters by cross-validation: the regularization strength (alpha in scikit-learn, often written lambda in the literature) and the mixing coefficient between L1 and L2 regularization (l1_ratio in scikit-learn, often written alpha in the literature). It iterates over candidate values and chooses the combination that performs best in cross-validation. Note that with default settings scikit-learn's ElasticNetCV searches only over alpha and keeps l1_ratio fixed at 0.5; a list of l1_ratio values must be passed to search over both.
Advantages: ElasticNetCV tunes the model parameters automatically based on cross-validation, so the optimal hyperparameter values can be selected without manual adjustment. This makes it more convenient to use and helps avoid overfitting.

Thus, the main difference between the two is that ElasticNet is the regression method itself, applied to the data with fixed hyperparameters, while ElasticNetCV is a tool that automatically finds the optimal hyperparameter values for the ElasticNet model using cross-validation. ElasticNetCV is useful when the best model parameters must be found and the tuning process automated.

2.1.3. sklearn.linear_model.ElasticNet

ElasticNet is a regression method that combines L1 (Lasso) and L2 (Ridge) regularization.

The method is used for regression, that is, predicting numerical values of a target variable from a set of features. ElasticNet helps control overfitting by applying both L1 and L2 penalties to the model coefficients.

How ElasticNet works:

Input data: We start with the original dataset, which contains features (independent variables) and the corresponding values of the target variable.
Objective function: ElasticNet minimizes a loss function that combines the Mean Squared Error (MSE) with two regularization terms, L1 (Lasso) and L2 (Ridge). The objective function therefore looks like this:
Objective function = MSE + α * L1 + β * L2
where α and β are hyperparameters that control the weights of the L1 and L2 regularization, respectively. In scikit-learn these are expressed through the parameters alpha and l1_ratio: α = alpha * l1_ratio and β = 0.5 * alpha * (1 - l1_ratio), with the squared-error term scaled by 1 / (2 * n_samples).
Finding the optimal α and β: Cross-validation is commonly used to find the best values of α and β. This makes it possible to choose values that balance reducing overfitting against preserving the essential features.
Model training: ElasticNet trains the model by minimizing the objective function for the chosen α and β.
Prediction: Once trained, the model can be used to predict target values for new data.
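The objective function described above can be written out directly. The following is a minimal sketch in plain Python, assuming the parameterization documented for scikit-learn's ElasticNet (squared error scaled by 1 / (2 * n_samples), L1 weight alpha * l1_ratio, L2 weight 0.5 * alpha * (1 - l1_ratio)). The function name elastic_net_objective and the toy data are hypothetical and serve only as illustration; the library itself minimizes this quantity with coordinate descent rather than evaluating it in a loop like this.

```python
def elastic_net_objective(X, y, w, b, alpha=1.0, l1_ratio=0.5):
    """Evaluate the elastic net loss for weights w and intercept b.

    X is a list of feature rows, y a list of targets.
    The intercept b is not penalized.
    """
    n = len(y)
    # sum of squared residuals over all samples
    residual_sq = 0.0
    for row, target in zip(X, y):
        prediction = b + sum(wj * xj for wj, xj in zip(w, row))
        residual_sq += (target - prediction) ** 2
    data_term = residual_sq / (2 * n)
    # L1 penalty: sum of absolute coefficient values (Lasso part)
    l1_penalty = sum(abs(wj) for wj in w)
    # L2 penalty: sum of squared coefficient values (Ridge part)
    l2_penalty = sum(wj * wj for wj in w)
    return (data_term
            + alpha * l1_ratio * l1_penalty
            + 0.5 * alpha * (1 - l1_ratio) * l2_penalty)

# toy data lying exactly on the line y = 2*x (illustrative only)
X_toy = [[0.0], [1.0], [2.0]]
y_toy = [0.0, 2.0, 4.0]

# a perfect fit has zero data error, so only the penalties remain
loss = elastic_net_objective(X_toy, y_toy, w=[2.0], b=0.0)
print("loss at the perfect fit:", loss)
```

Even a perfect fit has a non-zero loss here, because the penalties charge for the size of the coefficient. This is what pulls the fitted coefficients toward zero.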

Advantages of ElasticNet:

Feature selection capability: ElasticNet can automatically select the most important features by setting the weights of insignificant features to zero (similar to Lasso).
Overfitting control: ElasticNet makes it possible to control overfitting through L1 and L2 regularization.
Handling multicollinearity: The method is useful when multicollinearity is present (high correlation between features), since L2 regularization can reduce the influence of the multicollinear features.
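The way the L1 term sets weights exactly to zero can be seen in the single-feature case, where the minimizer has a closed form based on the soft-thresholding operator. The sketch below assumes one feature, no intercept (data already centered) and the scikit-learn style parameters alpha and l1_ratio. The function names and the toy data are hypothetical.

```python
def soft_threshold(rho, threshold):
    """Shrink rho toward zero by threshold; return 0.0 inside the dead zone."""
    if rho > threshold:
        return rho - threshold
    if rho < -threshold:
        return rho + threshold
    return 0.0

def fit_single_feature(x, y, alpha, l1_ratio):
    """Closed-form elastic net weight for one centered feature, no intercept."""
    n = len(x)
    rho = sum(xi * yi for xi, yi in zip(x, y)) / n   # feature/target covariance
    z = sum(xi * xi for xi in x) / n                 # feature variance
    # the L1 part thresholds the covariance, the L2 part enlarges the denominator
    return soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))

# a weakly informative feature: the least-squares slope would be 0.2
x_toy = [-1.0, 0.0, 1.0]
y_toy = [-0.2, 0.0, 0.2]

w_strong_penalty = fit_single_feature(x_toy, y_toy, alpha=1.0, l1_ratio=0.5)
w_weak_penalty = fit_single_feature(x_toy, y_toy, alpha=0.1, l1_ratio=0.5)

print("weight with alpha=1.0:", w_strong_penalty)
print("weight with alpha=0.1:", w_weak_penalty)
```

With the stronger penalty the covariance falls inside the threshold and the weight is exactly zero, so the feature is dropped. With the weaker penalty the weight survives but is shrunk below the least-squares slope.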

Limitations of ElasticNet:

It requires tuning the hyperparameters α and β, which can be a non-trivial task.
Depending on the chosen parameters, ElasticNet may retain too few or too many features, which affects the quality of the model.

ElasticNet is a powerful regression method that can be beneficial in tasks where feature selection and overfitting control are crucial.

2.1.3.1. Code for creating the ElasticNet model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.ElasticNet model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# ElasticNet.py
# The code demonstrates the process of training ElasticNet model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "ElasticNet"
onnx_model_filename = data_path + "elastic_net"

# create an ElasticNet model
regression_model = ElasticNet()

# fit the model to the data
regression_model.fit(X,y)

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  ElasticNet Original model (double)
Python  R-squared (Coefficient of determination): 0.9962377031744798
Python  Mean Absolute Error: 6.344394662876524
Python  Mean Squared Error: 49.78556489812415
Python  
Python  ElasticNet ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\elastic_net_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962377032416807
Python  Mean Absolute Error: 6.344395027824294
Python  Mean Squared Error: 49.78556400887057
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  6
Python  float ONNX model precision:  5
Python  
Python  ElasticNet ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\elastic_net_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962377031744798
Python  Mean Absolute Error: 6.344394662876524
Python  Mean Squared Error: 49.78556489812415
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Fig.15. Results of ElasticNet.py (float ONNX)
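The compare_decimal_places function used in the script compares the printed digits after the decimal point one by one and stops at the first mismatch. It ignores the integer part, and a carry (for example 1.9999 against 2.0001) makes it report zero matching places even though the values are close. As a complementary measure, the sketch below derives the number of agreeing decimal places from the absolute difference. The function name agreement_digits is hypothetical and is not part of the original script.

```python
import math

def agreement_digits(a: float, b: float) -> float:
    """Number of decimal places to which a and b agree, from |a - b|.

    Returns math.inf when the two values are identical.
    """
    if a == b:
        return math.inf
    return math.floor(-math.log10(abs(a - b)))

# MAE of the original model and of the float ONNX model, from the output above
mae_original = 6.344394662876524
mae_onnx_float = 6.344395027824294

print("float ONNX agreement:", agreement_digits(mae_original, mae_onnx_float))

# a carry between digits does not break the difference-based measure
print("1.9999 vs 2.0001:    ", agreement_digits(1.9999, 2.0001))
```

The two measures can differ by one place, because one looks at printed digits and the other at the magnitude of the difference. Either is adequate for the float versus double comparison made in this article.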

2.1.3.2. MQL5 code for executing ONNX models

This code executes the saved elastic_net_double.onnx and elastic_net_float.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                   ElasticNet.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ElasticNet"
#define   ONNXFilenameFloat  "elastic_net_float.onnx"
#define   ONNXFilenameDouble "elastic_net_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

ElasticNet (EURUSD,H1)  Testing ONNX float: ElasticNet (elastic_net_float.onnx)
ElasticNet (EURUSD,H1)  MQL5:   R-Squared (Coefficient of determination): 0.9962377032416807
ElasticNet (EURUSD,H1)  MQL5:   Mean Absolute Error: 6.3443950278242944
ElasticNet (EURUSD,H1)  MQL5:   Mean Squared Error: 49.7855640088705869
ElasticNet (EURUSD,H1)  
ElasticNet (EURUSD,H1)  Testing ONNX double: ElasticNet (elastic_net_double.onnx)
ElasticNet (EURUSD,H1)  MQL5:   R-Squared (Coefficient of determination): 0.9962377031744798
ElasticNet (EURUSD,H1)  MQL5:   Mean Absolute Error: 6.3443946628765220
ElasticNet (EURUSD,H1)  MQL5:   Mean Squared Error: 49.7855648981241217

Comparison with the original double-precision model in Python:

Testing ONNX float: ElasticNet (elastic_net_float.onnx)
Python  Mean Absolute Error: 6.344394662876524
MQL5:   Mean Absolute Error: 6.3443950278242944
  
Testing ONNX double: ElasticNet (elastic_net_double.onnx)
Python  Mean Absolute Error: 6.344394662876524
MQL5:   Mean Absolute Error: 6.3443946628765220

Accuracy of ONNX float MAE: 5 decimal places; accuracy of ONNX double MAE: 14 decimal places.

2.1.3.3. ONNX representation of elastic_net_float.onnx and elastic_net_double.onnx

Fig.16. ONNX representation of elastic_net_float.onnx in Netron

Fig.17. ONNX representation of elastic_net_double.onnx in Netron

2.1.4. sklearn.linear_model.ElasticNetCV

ElasticNetCV is an extension of the ElasticNet method designed to automatically select the optimal values of the hyperparameters α and β (the L1 and L2 regularization weights) using cross-validation.

This makes it possible to find the best combination of regularizations for the ElasticNet model without tuning the parameters manually.

How ElasticNetCV works:

Input data: We start with the original dataset, which contains features (independent variables) and the corresponding values of the target variable.
Defining the range of α and β: The user specifies the range of α and β values to consider during optimization. These values are usually chosen on a logarithmic scale.
Data splitting: The dataset is split into multiple folds for cross-validation. Each fold is used in turn as the test set, while the remaining folds are used for training.
Cross-validation: For each combination of α and β in the specified range, cross-validation is performed. The ElasticNet model is trained on the training data and then evaluated on the test data.
Performance evaluation: The mean error over the test sets is computed for each combination of α and β.
Selecting the optimal parameters: The values of α and β that give the minimum mean cross-validation error are chosen.
Training the model with the optimal parameters: The model is trained using the optimal values of α and β that were found.
Prediction: After training, the model can be used to predict target values for new data.
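The steps above can be sketched in plain Python for the one-feature synthetic dataset used throughout this article. This is an illustrative sketch under stated assumptions, not the algorithm scikit-learn runs. It uses a closed-form single-feature fit instead of coordinate descent, interleaved folds instead of the library's splitting strategy, a small hand-picked logarithmic grid, and the scikit-learn style parameters alpha and l1_ratio in place of α and β. All function names are hypothetical.

```python
import math

def soft_threshold(rho, threshold):
    """Shrink rho toward zero by threshold; return 0.0 inside the dead zone."""
    if rho > threshold:
        return rho - threshold
    if rho < -threshold:
        return rho + threshold
    return 0.0

def fit_elastic_net_1d(x, y, alpha, l1_ratio):
    """Closed-form elastic net fit for one feature plus an unpenalized intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    rho = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    z = sum((xi - mean_x) ** 2 for xi in x) / n
    w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
    return w, mean_y - w * mean_x

def cv_error(x, y, alpha, l1_ratio, k=5):
    """Mean test MSE over k interleaved folds for one (alpha, l1_ratio) pair."""
    n = len(x)
    fold_errors = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))        # every k-th sample is held out
        held_out = set(test_idx)
        x_train = [x[i] for i in range(n) if i not in held_out]
        y_train = [y[i] for i in range(n) if i not in held_out]
        w, b = fit_elastic_net_1d(x_train, y_train, alpha, l1_ratio)
        errors = [(y[i] - (w * x[i] + b)) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

def grid_search(x, y, alphas, l1_ratios, k=5):
    """Return (best_error, best_alpha, best_l1_ratio) over the whole grid."""
    candidates = [(cv_error(x, y, a, r, k), a, r)
                  for r in l1_ratios for a in alphas]
    return min(candidates)

# same synthetic data as in the article: y = 4x + 10*sin(0.5x), x = 0..99
x_data = [float(i) for i in range(100)]
y_data = [4 * xi + 10 * math.sin(xi * 0.5) for xi in x_data]

alphas = [10.0 ** e for e in range(-3, 3)]   # logarithmic grid 0.001 ... 100
l1_ratios = [0.1, 0.5, 0.9]

best_error, best_alpha, best_l1_ratio = grid_search(x_data, y_data, alphas, l1_ratios)
print("best mean CV error:", best_error)
print("selected alpha:", best_alpha, "selected l1_ratio:", best_l1_ratio)
```

In scikit-learn the selected values are available after fitting as the alpha_ and l1_ratio_ attributes of the ElasticNetCV object, and the final model is the one used for prediction.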

Advantages of ElasticNetCV:

Automatic hyperparameter selection: ElasticNetCV automatically finds the optimal values of α and β, which simplifies model tuning.
Overfitting prevention: Cross-validation helps select a model with good generalization ability.
Robustness to noise: The method is robust to noise in the data and can identify the best combination of regularizations while accounting for that noise.

Limitations of ElasticNetCV:

Computational complexity: Cross-validating over a wide range of parameters can be time-consuming.
Optimal parameters depend on the chosen range: The results may depend on the range of α and β values searched, so it is important to choose this range carefully.

ElasticNetCV is a powerful tool for automatically tuning the regularization of the ElasticNet model and improving its performance.

2.1.4.1. Code for creating the ElasticNetCV model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.ElasticNetCV model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# ElasticNetCV.py
# The code demonstrates the process of training ElasticNetCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "ElasticNetCV"
onnx_model_filename = data_path + "elastic_net_cv"

# create an ElasticNetCV model
regression_model = ElasticNetCV()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the model's input tensor type
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the model's input tensor type
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  ElasticNetCV Original model (double)
Python  R-squared (Coefficient of determination): 0.9962137763338385
Python  Mean Absolute Error: 6.334487104423225
Python  Mean Squared Error: 50.10218299945999
Python  
Python  ElasticNetCV ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\elastic_net_cv_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962137770260989
Python  Mean Absolute Error: 6.334486542922601
Python  Mean Squared Error: 50.10217383894468
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  5
Python  
Python  ElasticNetCV ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\elastic_net_cv_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962137763338385
Python  Mean Absolute Error: 6.334487104423225
Python  Mean Squared Error: 50.10218299945999
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Fig.18. Results of ElasticNetCV.py (float ONNX)

2.1.4.2. MQL5 code for executing ONNX models

This code runs the saved ONNX models elastic_net_cv_float.onnx and elastic_net_cv_double.onnx and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                 ElasticNetCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ElasticNetCV"
#define   ONNXFilenameFloat  "elastic_net_cv_float.onnx"
#define   ONNXFilenameDouble "elastic_net_cv_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

ElasticNetCV (EURUSD,H1)        Testing ONNX float: ElasticNetCV (elastic_net_cv_float.onnx)
ElasticNetCV (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9962137770260989
ElasticNetCV (EURUSD,H1)        MQL5:   Mean Absolute Error: 6.3344865429226038
ElasticNetCV (EURUSD,H1)        MQL5:   Mean Squared Error: 50.1021738389446938
ElasticNetCV (EURUSD,H1)        
ElasticNetCV (EURUSD,H1)        Testing ONNX double: ElasticNetCV (elastic_net_cv_double.onnx)
ElasticNetCV (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9962137763338385
ElasticNetCV (EURUSD,H1)        MQL5:   Mean Absolute Error: 6.3344871044232205
ElasticNetCV (EURUSD,H1)        MQL5:   Mean Squared Error: 50.1021829994599983

Comparison with the original double model in Python:

Testing ONNX float: ElasticNetCV (elastic_net_cv_float.onnx)
Python  Mean Absolute Error: 6.334487104423225
MQL5:   Mean Absolute Error: 6.3344865429226038

Testing ONNX double: ElasticNetCV (elastic_net_cv_double.onnx)
Python  Mean Absolute Error: 6.334487104423225
MQL5:   Mean Absolute Error: 6.3344871044232205

Accuracy of ONNX float MAE: 5 decimal places. Accuracy of ONNX double MAE: 14 decimal places.
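
The two figures above can be checked by hand. The following is a minimal sketch, not part of the original scripts: the helper `matching_decimals` is a hypothetical name, and it works on the values as strings copied from the Python and MQL5 logs so that no digits are lost to re-parsing.

```python
def matching_decimals(a: str, b: str) -> int:
    """Count the leading decimal digits shared by two numbers given as strings."""
    int_a, sep_a, frac_a = a.partition(".")
    int_b, sep_b, frac_b = b.partition(".")
    # no decimal point or different integer parts: nothing matches
    if not sep_a or not sep_b or int_a != int_b:
        return 0
    count = 0
    for digit_a, digit_b in zip(frac_a, frac_b):
        if digit_a != digit_b:
            break
        count += 1
    return count


# MAE values copied from the logs above
python_mae = "6.334487104423225"        # original scikit-learn model (double)
mql5_float_mae = "6.3344865429226038"   # elastic_net_cv_float.onnx in MQL5
mql5_double_mae = "6.3344871044232205"  # elastic_net_cv_double.onnx in MQL5

float_digits = matching_decimals(python_mae, mql5_float_mae)
double_digits = matching_decimals(python_mae, mql5_double_mae)
print("float ONNX MAE matching decimals:", float_digits)
print("double ONNX MAE matching decimals:", double_digits)
```

Comparing strings rather than floats is a deliberate choice here: the MQL5 log prints 16 decimal places, more than Python's default float formatting shows.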

2.1.4.3. ONNX representation of elastic_net_cv_float.onnx and elastic_net_cv_double.onnx

Fig.19. ONNX representation of elastic_net_cv_float.onnx in Netron

Fig.20. ONNX representation of elastic_net_cv_double.onnx in Netron

2.1.5. sklearn.linear_model.HuberRegressor

HuberRegressor is a machine learning method for regression tasks. It is a modification of ordinary least squares (OLS) designed to be robust to outliers in the data.

Unlike OLS, which minimizes the squared errors, HuberRegressor minimizes a combination of squared and absolute errors. This makes the method more stable when the data contains outliers.

How HuberRegressor works:

Input data: It starts with the original dataset, which contains features (independent variables) and the corresponding values of the target variable.
Huber loss function: HuberRegressor uses the Huber loss function, which is quadratic for small errors and linear for large errors. This makes the method more resistant to outliers.
Model training: The model is trained on the data using the Huber loss function. During training, it adjusts the weights (coefficients) of each feature and the bias (intercept).
Prediction: After training, the model can be used to predict target values for new data.
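
The quadratic-then-linear shape of the Huber loss can be sketched in a few lines. This is the textbook form of the loss, given here only as an illustration; the objective that scikit-learn actually optimizes also estimates a scale parameter and adds a regularization term, so the numbers below are not what the library computes internally. The function names and the threshold value `delta=1.35` are assumptions made for this example.

```python
def huber_loss(residual: float, delta: float = 1.35) -> float:
    """Textbook Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def squared_loss(residual: float) -> float:
    """Half squared error, the per-sample loss behind ordinary least squares."""
    return 0.5 * residual * residual


# a small residual is penalized identically by both losses,
# a large one (an outlier) is penalized far less by the Huber loss
for r in (0.5, 1.0, 10.0, 100.0):
    print(f"residual={r:6.1f}  squared={squared_loss(r):9.3f}  huber={huber_loss(r):9.3f}")
```

Because the penalty grows linearly past the threshold, a single large residual pulls the fitted line much less than it would under a squared loss.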

Advantages of HuberRegressor:

Robustness to outliers: HuberRegressor is more robust to outliers than OLS, which makes it useful in tasks where the data may contain anomalous values.
Error estimation: The Huber loss function contributes to the estimation of prediction errors, which can be useful for analyzing the model's results.
Regularization level: HuberRegressor can also include regularization (the alpha parameter in scikit-learn), which can reduce overfitting.

Limitations of HuberRegressor:

Not as accurate as OLS in the absence of outliers: When the data contains no outliers, OLS may give more accurate results.
Parameter tuning: HuberRegressor has a parameter (epsilon in scikit-learn) that sets the threshold above which an error counts as "large" and the loss switches to its linear form. This parameter requires tuning.

HuberRegressor is valuable in regression tasks where the data may contain outliers and a model that is robust to such anomalies is required.
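
The robustness claim can be illustrated with a small experiment. The sketch below reuses the synthetic data from the article's scripts and then, as an assumption made purely for this illustration, shifts the last five targets by 300 to imitate outliers. It compares how far the fitted slope of LinearRegression (OLS) and of HuberRegressor drifts from the underlying slope of 4. The value `max_iter=1000` is also an assumption, chosen to give the solver room to converge on unscaled inputs.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# same synthetic data as in the article's scripts
X = np.arange(0, 100, 1).reshape(-1, 1)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

# assumption for this illustration: corrupt the last five targets
y_outliers = y.copy()
y_outliers[-5:] += 300.0

# fit both models on the corrupted targets
ols = LinearRegression().fit(X, y_outliers)
huber = HuberRegressor(max_iter=1000).fit(X, y_outliers)

# distance of each fitted slope from the underlying slope of 4
ols_slope_error = abs(ols.coef_[0] - 4.0)
huber_slope_error = abs(huber.coef_[0] - 4.0)

print("OLS slope:  ", ols.coef_[0])
print("Huber slope:", huber.coef_[0])
```

On clean data the two models would be expected to give similar slopes. The difference appears only once the outliers are added, which is the situation HuberRegressor is designed for.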

2.1.5.1. Code for creating the HuberRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.HuberRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of the original model and of the models exported to ONNX.

# HuberRegressor.py
# The code demonstrates the process of training HuberRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import HuberRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "HuberRegressor"
onnx_model_filename = data_path + "huber_regressor"

# create a Huber Regressor model
huber_regressor_model = HuberRegressor()

# fit the model to the data
huber_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = huber_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(huber_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the model's input tensor type
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(huber_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the model's input tensor type
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  HuberRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9962363935647066
Python  Mean Absolute Error: 6.341633708569641
Python  Mean Squared Error: 49.80289464784336
Python  
Python  HuberRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\huber_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962363944236795
Python  Mean Absolute Error: 6.341633300252807
Python  Mean Squared Error: 49.80288328126165
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  6
Python  
Python  HuberRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\huber_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962363935647066
Python  Mean Absolute Error: 6.341633708569641
Python  Mean Squared Error: 49.80289464784336
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Fig.21. Results of HuberRegressor.py (float ONNX)

2.1.5.2. MQL5 code for executing ONNX models

This code runs the saved ONNX models huber_regressor_float.onnx and huber_regressor_double.onnx and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                               HuberRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "HuberRegressor"
#define   ONNXFilenameFloat  "huber_regressor_float.onnx"
#define   ONNXFilenameDouble "huber_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

HuberRegressor (EURUSD,H1)      Testing ONNX float: HuberRegressor (huber_regressor_float.onnx)
HuberRegressor (EURUSD,H1)      MQL5:   R-Squared (Coefficient of determination): 0.9962363944236795
HuberRegressor (EURUSD,H1)      MQL5:   Mean Absolute Error: 6.3416333002528074
HuberRegressor (EURUSD,H1)      MQL5:   Mean Squared Error: 49.8028832812616571
HuberRegressor (EURUSD,H1)      
HuberRegressor (EURUSD,H1)      Testing ONNX double: HuberRegressor (huber_regressor_double.onnx)
HuberRegressor (EURUSD,H1)      MQL5:   R-Squared (Coefficient of determination): 0.9962363935647066
HuberRegressor (EURUSD,H1)      MQL5:   Mean Absolute Error: 6.3416337085696410
HuberRegressor (EURUSD,H1)      MQL5:   Mean Squared Error: 49.8028946478433525

Comparison with the original double-precision model in Python:

Testing ONNX float: HuberRegressor (huber_regressor_float.onnx)
Python  Mean Absolute Error: 6.341633708569641
MQL5:   Mean Absolute Error: 6.3416333002528074
      
Testing ONNX double: HuberRegressor (huber_regressor_double.onnx)
Python  Mean Absolute Error: 6.341633708569641
MQL5:   Mean Absolute Error: 6.3416337085696410

ONNX float MAE precision: 6 decimal places; ONNX double MAE precision: 14 decimal places.
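
The roughly six matching decimal places of the float model are in line with what single-precision storage can hold. The following minimal sketch, using only the Python standard library, rounds the Python MAE value quoted above to float32 and back. The helper name `to_float32` is hypothetical, and the snippet illustrates only the size of float32 rounding, not the actual computation inside the ONNX runtime.

```python
import struct

def to_float32(value):
    # pack the Python float (a double) into 4 bytes and unpack it again,
    # which rounds it to the nearest single-precision value
    return struct.unpack("f", struct.pack("f", value))[0]

# MAE of the original double-precision model, as printed by the Python script
mae_double = 6.341633708569641
mae_float32 = to_float32(mae_double)

# relative rounding error introduced by storing the value as float32
relative_error = abs(mae_float32 - mae_double) / mae_double

print("double :", repr(mae_double))
print("float32:", repr(mae_float32))
print("relative error:", relative_error)
```

A relative error of this order corresponds to about seven significant decimal digits, which is why float models typically agree with the double reference only in the first six or so decimals.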

2.1.5.3. ONNX representation of huber_regressor_float.onnx and huber_regressor_double.onnx

Fig.22. ONNX representation of huber_regressor_float.onnx in Netron

Fig.23. ONNX representation of huber_regressor_double.onnx in Netron

2.1.6. sklearn.linear_model.Lars

LARS (Least Angle Regression) is a machine learning method used for regression tasks. It builds a linear regression model by selecting active features (variables) step by step during training. LARS tries to find the smallest number of features that gives the best approximation of the target variable.

How LARS works:

Input data: The method starts with the original dataset, which consists of features (independent variables) and the corresponding target values.
Initialization: It begins with a null model, that is, one with no active features. All coefficients are set to zero.
Feature selection: At each step, LARS selects the feature most correlated with the model's residuals. This feature is then added to the model, and its coefficient is adjusted using the least squares method.
Regression along the active features: After the feature is added, LARS updates the coefficients of all active features to account for the change in the model.
Repeated steps: The process continues until all features have been selected or a specified stopping criterion is met.
Prediction: Once trained, the model can be used to predict target values for new data.
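
The feature selection step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the scikit-learn implementation: the helper names (`center`, `abs_correlation`, `most_correlated_feature`) and the toy data are assumptions made for the example, and only the first selection from the null model is shown.

```python
def center(values):
    # subtract the mean so that correlations ignore constant offsets
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def abs_correlation(feature, residual):
    # absolute Pearson correlation between one feature and the residual
    f = center(feature)
    r = center(residual)
    num = sum(a * b for a, b in zip(f, r))
    den = (sum(a * a for a in f) * sum(b * b for b in r)) ** 0.5
    return abs(num / den) if den > 0 else 0.0

def most_correlated_feature(columns, residual):
    # return the index of the feature with the largest absolute correlation
    scores = [abs_correlation(col, residual) for col in columns]
    best = max(range(len(columns)), key=lambda j: scores[j])
    return best, scores

# toy data: the target depends strongly on feature 0 and weakly on feature 1
x0 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [4.0 * a + 0.1 * b for a, b in zip(x0, x1)]

# null model: all coefficients are zero, so the residual equals the target
best_index, scores = most_correlated_feature([x0, x1], y)
print("selected feature:", best_index)
print("absolute correlations:", scores)
```

In a full LARS run the residual would be recomputed after each coefficient update and the selection repeated on the remaining features.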

Advantages of LARS:

Efficiency: LARS can be an efficient method, especially when there are many features but only a few of them significantly affect the target variable.
Interpretability: Because LARS aims to select only the most informative features, the model remains relatively interpretable.

Limitations of LARS:

Linear model: LARS builds a linear model, which may be insufficient for modeling complex nonlinear relationships.
Sensitivity to noise: The method can be sensitive to outliers in the data.
Inability to handle multicollinearity: If features are highly correlated, LARS can run into multicollinearity problems.

LARS is useful in regression tasks where it is essential to select the most informative features and build a linear model with a minimal number of features.
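
The linear-model limitation is visible on the synthetic data used throughout this article, y = 4x + 10·sin(0.5x). The sketch below is plain Python with an ordinary least-squares line standing in for a linear model, which is an assumption for illustration and not a call to Lars itself. It shows that a straight line cannot capture the sine term, so the mean squared error stays near the mean square of that term, about 10²/2 = 50.

```python
import math

# same synthetic data as in the article's scripts
xs = [float(i) for i in range(100)]
ys = [4.0 * x + 10.0 * math.sin(0.5 * x) for x in xs]

# closed-form ordinary least squares for a single feature
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# mean squared error of the fitted line on the training data
mse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / n

# mean square of the nonlinear term that the line cannot represent
sine_mean_square = sum((10.0 * math.sin(0.5 * x)) ** 2 for x in xs) / n

print("slope:", slope, "intercept:", intercept)
print("MSE of linear fit:", mse)
print("mean square of sine term:", sine_mean_square)
```

The resulting error should land close to the Mean Squared Error values reported for the linear models in this section.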

2.1.6.1. Code for creating the Lars model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.Lars model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# Lars.py
# The code demonstrates the process of training Lars model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lars
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "Lars"
onnx_model_filename = data_path + "lars"

# create a Lars Regressor model
lars_regressor_model = Lars()

# fit the model to the data
lars_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = lars_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(lars_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(lars_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  Lars Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336425
Python  Mean Squared Error: 49.778140171281784
Python  
Python  Lars ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lars_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382641628886
Python  Mean Absolute Error: 6.3477377671679385
Python  Mean Squared Error: 49.77814147404787
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  Lars ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lars_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336425
Python  Mean Squared Error: 49.778140171281784
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  15
Python  double ONNX model precision:  15

Fig.24. Results of Lars.py (float ONNX)

2.1.6.2. MQL5 code for running the ONNX models

This code runs the saved lars_float.onnx and lars_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                         Lars.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "Lars"
#define   ONNXFilenameFloat  "lars_float.onnx"
#define   ONNXFilenameDouble "lars_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

Lars (EURUSD,H1)        Testing ONNX float: Lars (lars_float.onnx)
Lars (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9962382641628886
Lars (EURUSD,H1)        MQL5:   Mean Absolute Error: 6.3477377671679385
Lars (EURUSD,H1)        MQL5:   Mean Squared Error: 49.7781414740478638
Lars (EURUSD,H1)        
Lars (EURUSD,H1)        Testing ONNX double: Lars (lars_double.onnx)
Lars (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9962382642613388
Lars (EURUSD,H1)        MQL5:   Mean Absolute Error: 6.3477379263364302
Lars (EURUSD,H1)        MQL5:   Mean Squared Error: 49.7781401712817768
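
The three metrics printed above follow the standard definitions of R-squared, mean absolute error, and mean squared error. As a minimal sketch in plain Python, they can be written out as follows; the function names and the four-point toy data are assumptions for illustration and are not part of the MQL5 or scikit-learn APIs.

```python
def mean_absolute_error(y_true, y_pred):
    # average of absolute differences
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    # average of squared differences
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# small hand-checkable example
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print("MAE:", mae, "MSE:", mse, "R2:", r2)
```

For this toy input the absolute errors are 0.5, 0, 0.5 and 0, so MAE is 0.25, MSE is 0.125, and R-squared is 1 - 0.5/5 = 0.9.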

Comparison with the original double-precision model in Python:

Testing ONNX float: Lars (lars_float.onnx)
Python  Mean Absolute Error: 6.347737926336425
MQL5:   Mean Absolute Error: 6.3477377671679385

Testing ONNX double: Lars (lars_double.onnx)
Python  Mean Absolute Error: 6.347737926336425
MQL5:   Mean Absolute Error: 6.3477379263364302

ONNX float MAE precision: 6 decimal places; ONNX double MAE precision: 13 decimal places.
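
These counts can be checked directly from the printed values. The sketch below is a simplified, string-based variant of the compare_decimal_places function used in the Python scripts; the name `matching_decimals` is hypothetical, and the inputs are the MAE values copied from the output above.

```python
def matching_decimals(text_a, text_b):
    # compare the digits after the decimal point, stopping at the first mismatch
    frac_a = text_a.split(".")[1]
    frac_b = text_b.split(".")[1]
    count = 0
    for digit_a, digit_b in zip(frac_a, frac_b):
        if digit_a != digit_b:
            break
        count += 1
    return count

# MAE values as printed by Python and by the MQL5 script
python_mae = "6.347737926336425"
mql5_float_mae = "6.3477377671679385"
mql5_double_mae = "6.3477379263364302"

float_precision = matching_decimals(python_mae, mql5_float_mae)
double_precision = matching_decimals(python_mae, mql5_double_mae)
print("float :", float_precision)   # 6
print("double:", double_precision)  # 13
```

Working on the printed strings avoids any extra rounding that converting the values back to floating point might introduce.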

2.1.6.3. ONNX representation of lars_float.onnx and lars_double.onnx

Fig.25. ONNX representation of lars_float.onnx in Netron

Fig.26. ONNX representation of lars_double.onnx in Netron

2.1.7. sklearn.linear_model.LarsCV

LarsCV is a variation of the LARS (Least Angle Regression) method that uses cross-validation to automatically select the optimal number of features to include in the model.

This helps strike a balance between a model that generalizes well to the data and one that uses a minimal number of features.

How LarsCV works:

Input data: The method starts with the original dataset, which consists of features (independent variables) and the corresponding target values.
Initialization: It begins with a null model, meaning there are no active features. All coefficients are set to zero.
Cross-validation: LarsCV performs cross-validation for different numbers of included features. This evaluates the model's performance with different feature sets.
Selecting the optimal number of features: LarsCV chooses the number of features that yields the best model performance as determined by cross-validation.
Model training: The model is trained using the chosen number of features and their respective coefficients.
Prediction: Once trained, the model can be used to predict target values for new data.
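
The idea of choosing model size by cross-validation can be sketched in plain Python. This is a deliberately simplified illustration, not the LarsCV algorithm: it compares only two hypothetical candidates on the article's synthetic data, a model with no features (predict the mean) and a model with one feature (a least-squares line), and the fold assignment by index modulo k is an assumption made for brevity.

```python
import math

def fit_mean(xs, ys):
    # zero-feature model: always predict the training mean
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    # one-feature model: closed-form least-squares line
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def cv_mse(fit, xs, ys, k=5):
    # k-fold cross-validation: train on k-1 folds, score on the held-out fold
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [(ys[i] - predict(xs[i])) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

# same synthetic data as in the article's scripts
xs = [float(i) for i in range(100)]
ys = [4.0 * x + 10.0 * math.sin(0.5 * x) for x in xs]

# candidate models keyed by the number of features they use
candidates = {0: fit_mean, 1: fit_line}
cv_scores = {n_features: cv_mse(fit, xs, ys) for n_features, fit in candidates.items()}
best_n_features = min(cv_scores, key=cv_scores.get)

print("cross-validated MSE per candidate:", cv_scores)
print("selected number of features:", best_n_features)
```

With a single informative feature the one-feature model wins by a wide margin; on data with many features the same comparison would run over more candidates.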

Advantages of LarsCV:

Automatic feature selection: LarsCV automatically chooses the optimal number of features, simplifying model configuration.
Interpretability: Like regular LARS, LarsCV keeps the model relatively interpretable.
Efficiency: The method can be efficient, especially when datasets have many features but only a few are significant.

Limitations of LarsCV:

Linear model: LarsCV builds a linear model, which may be insufficient for modeling complex nonlinear relationships.
Sensitivity to noise: The method can be sensitive to outliers in the data.
Inability to handle multicollinearity: If features are highly correlated, LarsCV can run into multicollinearity problems.

LarsCV is useful in regression tasks where it is important to automatically choose the best set of features for the model while keeping the model interpretable.

2.1.7.1. Code for creating the LarsCV model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.LarsCV model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# LarsCV.py
# The code demonstrates the process of training LarsCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LarsCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "LarsCV"
onnx_model_filename = data_path + "lars_cv"

# create a LarsCV Regressor model
larscv_regressor_model = LarsCV()

# fit the model to the data
larscv_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = larscv_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(larscv_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(larscv_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the double ONNX input tensor
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  LarsCV Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642612767
Python  Mean Absolute Error: 6.3477379221400145
Python  Mean Squared Error: 49.77814017210321
Python  
Python  LarsCV ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lars_cv_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382640824089
Python  Mean Absolute Error: 6.347737845846069
Python  Mean Squared Error: 49.778142539016564
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  ONNX: MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  LarsCV ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lars_cv_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642612767
Python  Mean Absolute Error: 6.3477379221400145
Python  Mean Squared Error: 49.77814017210321
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  16

Fig.27. Results of LarsCV.py (float ONNX)

2.1.7.2. MQL5 code for executing ONNX models

This code runs the saved lars_cv_float.onnx and lars_cv_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                       LarsCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LarsCV"
#define   ONNXFilenameFloat  "lars_cv_float.onnx"
#define   ONNXFilenameDouble "lars_cv_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

LarsCV (EURUSD,H1)      Testing ONNX float: LarsCV (lars_cv_float.onnx)
LarsCV (EURUSD,H1)      MQL5:   R-Squared (Coefficient of determination): 0.9962382640824089
LarsCV (EURUSD,H1)      MQL5:   Mean Absolute Error: 6.3477378458460691
LarsCV (EURUSD,H1)      MQL5:   Mean Squared Error: 49.7781425390165566
LarsCV (EURUSD,H1)      
LarsCV (EURUSD,H1)      Testing ONNX double: LarsCV (lars_cv_double.onnx)
LarsCV (EURUSD,H1)      MQL5:   R-Squared (Coefficient of determination): 0.9962382642612767
LarsCV (EURUSD,H1)      MQL5:   Mean Absolute Error: 6.3477379221400145
LarsCV (EURUSD,H1)      MQL5:   Mean Squared Error: 49.7781401721031642

Comparison with the original double-precision model in Python:

Testing ONNX float: LarsCV (lars_cv_float.onnx)
Python  Mean Absolute Error: 6.3477379221400145
MQL5:   Mean Absolute Error: 6.3477378458460691

Testing ONNX double: LarsCV (lars_cv_double.onnx)
Python  Mean Absolute Error: 6.3477379221400145
MQL5:   Mean Absolute Error: 6.3477379221400145

ONNX float MAE accuracy: 6 decimal places; ONNX double MAE accuracy: 16 decimal places.
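
The gap between the float and double results is consistent with the resolution of the two types: a 32-bit float has a 24-bit significand, which corresponds to roughly 7 significant decimal digits. The sketch below is only an illustration of that resolution and does not reproduce the ONNX computation. It rounds the double-precision MAE from the output above to the nearest 32-bit float using the Python standard library. The helper name to_float32 is hypothetical and is not part of the article's scripts.

```python
import struct

def to_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack("f", struct.pack("f", value))[0]

# MAE of the original double-precision model, copied from the output above
mae_double = 6.3477379221400145

# the same number after a single round trip through float32
mae_as_float32 = to_float32(mae_double)

# relative rounding error introduced by one float32 conversion
relative_error = abs(mae_as_float32 - mae_double) / mae_double

print("double value :", mae_double)
print("float32 value:", mae_as_float32)
print("relative error:", relative_error)
```

A single conversion already perturbs the value somewhere around the seventh or eighth significant digit, so a model whose inputs, parameters and outputs are all stored as float cannot be expected to match the double-precision metrics much beyond that.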

2.1.7.3. ONNX representation of lars_cv_float.onnx and lars_cv_double.onnx

Fig.28. ONNX representation of lars_cv_float.onnx in Netron

Fig.29. ONNX representation of lars_cv_double.onnx in Netron

2.1.8. sklearn.linear_model.Lasso

Lasso (Least Absolute Shrinkage and Selection Operator) is a regression method used to select the most important features and reduce the dimensionality of the model.

It achieves this by adding a penalty on the sum of the absolute values of the coefficients (L1 regularization) to the linear regression optimization problem.

How Lasso works:

Input data: It starts with the original dataset, which includes the features (independent variables) and the corresponding values of the target variable.
Objective function: The Lasso objective function consists of the sum of squared regression errors plus a penalty on the sum of the absolute values of the feature coefficients.
Optimization: The Lasso model is trained by minimizing the objective function, which drives some coefficients to exactly zero and effectively excludes the corresponding features from the model.
Selecting the optimal penalty value: Lasso has a hyperparameter that sets the strength of the regularization. Choosing its optimal value may require cross-validation.
Generating predictions: After training, the model can be used to predict target values for new data.
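
The steps above can be sketched for the single-feature case used throughout this article. The sketch assumes the objective in the form documented by Scikit-learn, (1/(2n))·||y - X·w - b||² + alpha·||w||₁. With one feature and an intercept, the minimizer has a closed form based on soft-thresholding. The function names soft_threshold and fit_lasso_1d are hypothetical. Scikit-learn itself uses coordinate descent, so this is an illustration of the principle rather than the library's implementation.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_1d(x, y, alpha):
    """Closed-form Lasso fit for one feature with an intercept (illustrative helper)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # covariance of x and y, and variance of x, both with a 1/n factor
    cov_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - x_mean) ** 2 for xi in x) / n
    # the L1 penalty shifts the covariance towards zero before dividing
    coef = soft_threshold(cov_xy, alpha) / var_x
    intercept = y_mean - coef * x_mean
    return coef, intercept

# same synthetic data as in the scripts of this article
x = [float(i) for i in range(100)]
y = [4 * xi + 10 * math.sin(xi * 0.5) for xi in x]

for alpha in (0.0, 1.0, 1000.0, 5000.0):
    coef, intercept = fit_lasso_1d(x, y, alpha)
    print(f"alpha={alpha}: coef={coef}, intercept={intercept}")
```

With alpha equal to zero the result is the ordinary least-squares line. As alpha grows, the coefficient shrinks, and once alpha exceeds the covariance between x and y the coefficient becomes exactly zero and the model predicts the mean of y.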

Advantages of Lasso:

Feature selection: Lasso automatically selects the most important features and excludes the less significant ones from the model. This reduces the dimensionality of the data and simplifies the model.
Regularization: The penalty on the sum of the absolute values of the coefficients helps prevent overfitting and improves generalization.
Interpretability: Because Lasso excludes some features, the model remains relatively interpretable.

Limitations of Lasso:

Linear model: Lasso builds a linear model, which may be insufficient for modeling complex nonlinear relationships.
Sensitivity to noise: The method can be sensitive to outliers in the data.
Difficulty handling multicollinearity: When features are highly correlated, Lasso tends to keep one of them and shrink the others to zero, so the selected set of features can be unstable.

Lasso is useful in regression tasks where it is essential to select the most important features and reduce the dimensionality of the model while preserving interpretability.

2.1.8.1. Code for creating the Lasso model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.Lasso model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of the original model and of the models exported to ONNX.

# Lasso.py
# The code demonstrates the process of training Lasso model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "Lasso"
onnx_model_filename = data_path + "lasso"

# create a Lasso model
lasso_model = Lasso()

# fit the model to the data
lasso_model.fit(X, y)

# predict values for the entire dataset
y_pred = lasso_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(lasso_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the float ONNX input tensor
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(lasso_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the double ONNX input tensor
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  Lasso Original model (double)
Python  R-squared (Coefficient of determination): 0.9962381735682287
Python  Mean Absolute Error: 6.346393791922984
Python  Mean Squared Error: 49.77934029129379
Python  
Python  Lasso ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962381720269486
Python  Mean Absolute Error: 6.346395056911361
Python  Mean Squared Error: 49.77936068668213
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  5
Python  
Python  Lasso ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962381735682287
Python  Mean Absolute Error: 6.346393791922984
Python  Mean Squared Error: 49.77934029129379
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Fig.30. Results of Lasso.py (float ONNX)

2.1.8.2. MQL5 code for executing ONNX models

This code runs the saved lasso_float.onnx and lasso_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                        Lasso.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "Lasso"
#define   ONNXFilenameFloat  "lasso_float.onnx"
#define   ONNXFilenameDouble "lasso_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

Lasso (EURUSD,H1)       Testing ONNX float: Lasso (lasso_float.onnx)
Lasso (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962381720269486
Lasso (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3463950569113612
Lasso (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7793606866821037
Lasso (EURUSD,H1)       
Lasso (EURUSD,H1)       Testing ONNX double: Lasso (lasso_double.onnx)
Lasso (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962381735682287
Lasso (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3463937919229840
Lasso (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7793402912937850

Comparison with the original double-precision model in Python:

Testing ONNX float: Lasso (lasso_float.onnx)
Python  Mean Absolute Error: 6.346393791922984
MQL5:   Mean Absolute Error: 6.3463950569113612

Testing ONNX double: Lasso (lasso_double.onnx)
Python  Mean Absolute Error: 6.346393791922984
MQL5:   Mean Absolute Error: 6.3463937919229840

ONNX float MAE precision: 5 decimal places; ONNX double MAE precision: 15 decimal places.
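To get a feel for why the float model agrees with the reference to only a handful of decimals, the following minimal sketch rounds the double-precision MAE reported above to 32 bits and counts the decimals that survive. The helper names `to_float32` and `matching_decimals` are hypothetical and not part of the article's scripts. The sketch only illustrates the resolution of a 32-bit float at this magnitude; it does not reproduce how rounding errors actually propagate through the ONNX model.

```python
import struct

def to_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", value))[0]

def matching_decimals(a: float, b: float) -> int:
    """Count the leading decimal digits shared by a and b (17-digit fixed format)."""
    int_a, frac_a = f"{a:.17f}".split(".")
    int_b, frac_b = f"{b:.17f}".split(".")
    if int_a != int_b:
        return 0
    count = 0
    for digit_a, digit_b in zip(frac_a, frac_b):
        if digit_a != digit_b:
            break
        count += 1
    return count

# MAE of the double-precision model, taken from the output above
mae_double = 6.346393791922984
# the same number stored in 32 bits (illustrative only)
mae_as_float32 = to_float32(mae_double)

print("double :", mae_double)
print("float32:", mae_as_float32)
print("matching decimals:", matching_decimals(mae_double, mae_as_float32))
```

A 32-bit float carries about 7 significant decimal digits, so values near 6.3 can be trusted to roughly 6 decimals at best, which is in line with the 5 matching decimals observed for the float model.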

2.1.8.3. ONNX representation of lasso_float.onnx and lasso_double.onnx

Fig.31. ONNX representation of lasso_float.onnx in Netron

Fig.32. ONNX representation of lasso_double.onnx in Netron

2.1.9. sklearn.linear_model.LassoCV

LassoCV is a variant of the Lasso method (Least Absolute Shrinkage and Selection Operator) that automatically selects the optimal value of the regularization hyperparameter (alpha) using cross-validation.

This method strikes a balance between reducing model dimensionality (selecting the important features) and preventing overfitting, which makes it useful for regression tasks.

How LassoCV works:

Input data: It starts with the original dataset, including the features (independent variables) and the corresponding target variable values.
Initialization: LassoCV initializes several different values of the regularization hyperparameter (alpha) covering a range from low to high.
Cross-validation: For each alpha value, LassoCV performs cross-validation to evaluate model performance. Metrics such as MSE (Mean Squared Error) or the coefficient of determination (R²) are typically used.
Selecting the optimal alpha: LassoCV selects the alpha value at which the model achieves the best cross-validated performance.
Training the model: The Lasso model is trained using the chosen alpha value, excluding the less important features and applying L1 regularization.
Making predictions: After training, the model can be used to predict target variable values for new data.
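The steps above can be sketched in plain Python for the single-feature case. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: the alpha grid is picked by hand, the folds are interleaved, and the helper names `fit_lasso_1d` and `cv_mse` are hypothetical. It relies on the fact that for one feature the Lasso objective (1/(2n))·Σ(y − b − w·x)² + alpha·|w| has a closed-form solution based on soft-thresholding.

```python
import math

def fit_lasso_1d(xs, ys, alpha):
    """Closed-form Lasso for one feature with an unpenalized intercept.

    Minimizes (1/(2n)) * sum((y - b - w*x)^2) + alpha * |w|.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    var = sum((x - x_mean) ** 2 for x in xs) / n
    # soft-thresholding: the L1 penalty shrinks the covariance term toward zero
    shrunk = math.copysign(max(abs(cov) - alpha, 0.0), cov)
    w = shrunk / var if var > 0 else 0.0
    b = y_mean - w * x_mean
    return w, b

def cv_mse(xs, ys, alpha, n_folds=5):
    """Mean squared error of the 1-D Lasso, averaged over interleaved folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        w, b = fit_lasso_1d(train_x, train_y, alpha)
        errors = [(ys[i] - (w * xs[i] + b)) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds

# the same synthetic data as in the article's scripts
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]

# hand-picked grid of candidate alphas (an assumption of this sketch)
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {alpha: cv_mse(xs, ys, alpha) for alpha in alphas}
best_alpha = min(scores, key=scores.get)

# refit on the full dataset with the selected alpha
w, b = fit_lasso_1d(xs, ys, best_alpha)
print("best alpha:", best_alpha, "slope:", w, "intercept:", b)
```

scikit-learn builds the alpha grid automatically and solves the multi-feature problem with coordinate descent, but the overall flow (score each alpha by cross-validation, pick the best, refit on all data) is the same.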

Advantages of LassoCV:

Automatic alpha selection: LassoCV automatically selects the optimal alpha value using cross-validation, which simplifies model tuning.
Feature selection: LassoCV automatically chooses the most important features, reducing model dimensionality and simplifying interpretation.
Regularization: The method prevents model overfitting through L1 regularization.

Limitations of LassoCV:

Linear model: LassoCV builds a linear model, which may be insufficient for modeling complex nonlinear relationships.
Sensitivity to noise: The method can be sensitive to outliers in the data.
Inability to handle multicollinearity: When features are highly correlated, LassoCV can run into multicollinearity problems.

LassoCV is beneficial in regression tasks where it is important to select the most relevant features and reduce model dimensionality while maintaining interpretability and avoiding overfitting.

2.1.9.1. Code for creating the LassoCV model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.LassoCV model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# LassoCV.py
# The code demonstrates the process of training LassoCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "LassoCV"
onnx_model_filename = data_path + "lasso_cv"

# create a LassoCV Regressor model
lassocv_regressor_model = LassoCV()

# fit the model to the data
lassocv_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = lassocv_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(lassocv_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 to match the model's input type
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(lassocv_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 to match the model's input type
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  LassoCV Original model (double)
Python  R-squared (Coefficient of determination): 0.9962241428413416
Python  Mean Absolute Error: 6.33567334453819
Python  Mean Squared Error: 49.96500551028169
Python  
Python  LassoCV ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_cv_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.996224142876629
Python  Mean Absolute Error: 6.335673221332177
Python  Mean Squared Error: 49.96500504333324
Python  R^2 matching decimal places:  10
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  6
Python  float ONNX model precision:  6
Python  
Python  LassoCV ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_cv_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962241428413416
Python  Mean Absolute Error: 6.33567334453819
Python  Mean Squared Error: 49.96500551028169
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  14
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  14

Fig.33. Results of LassoCV.py (float ONNX)

2.1.9.2. MQL5 code for executing the ONNX models

This code executes the saved lasso_cv_float.onnx and lasso_cv_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                      LassoCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LassoCV"
#define   ONNXFilenameFloat  "lasso_cv_float.onnx"
#define   ONNXFilenameDouble "lasso_cv_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

2023.10.26 22:14:00.736 LassoCV (EURUSD,H1)     Testing ONNX float: LassoCV (lasso_cv_float.onnx)
2023.10.26 22:14:00.739 LassoCV (EURUSD,H1)     MQL5:   R-Squared (Coefficient of determination): 0.9962241428766290
2023.10.26 22:14:00.739 LassoCV (EURUSD,H1)     MQL5:   Mean Absolute Error: 6.3356732213321800
2023.10.26 22:14:00.739 LassoCV (EURUSD,H1)     MQL5:   Mean Squared Error: 49.9650050433332211
2023.10.26 22:14:00.748 LassoCV (EURUSD,H1)     
2023.10.26 22:14:00.748 LassoCV (EURUSD,H1)     Testing ONNX double: LassoCV (lasso_cv_double.onnx)
2023.10.26 22:14:00.753 LassoCV (EURUSD,H1)     MQL5:   R-Squared (Coefficient of determination): 0.9962241428413416
2023.10.26 22:14:00.753 LassoCV (EURUSD,H1)     MQL5:   Mean Absolute Error: 6.3356733445381899
2023.10.26 22:14:00.753 LassoCV (EURUSD,H1)     MQL5:   Mean Squared Error: 49.9650055102816992

Comparison with the original double-precision model in Python:

Testing ONNX float: LassoCV (lasso_cv_float.onnx)
Python  Mean Absolute Error: 6.33567334453819
MQL5:   Mean Absolute Error: 6.3356732213321800
        
Testing ONNX double: LassoCV (lasso_cv_double.onnx)
Python  Mean Absolute Error: 6.33567334453819
MQL5:   Mean Absolute Error: 6.3356733445381899

ONNX float MAE precision: 6 decimal places; ONNX double MAE precision: 13 decimal places.

2.1.9.3. ONNX representation of lasso_cv_float.onnx and lasso_cv_double.onnx

Fig.34. ONNX representation of lasso_cv_float.onnx in Netron

Fig.35. ONNX representation of lasso_cv_double.onnx in Netron

2.1.10. sklearn.linear_model.LassoLars

LassoLars is a combination of two methods: Lasso (Least Absolute Shrinkage and Selection Operator) and LARS (Least Angle Regression).

This method is used for regression tasks and combines the advantages of both algorithms, allowing feature selection and reduction of model dimensionality at the same time.

How LassoLars works:

Input data: It starts with the original dataset, including the features (independent variables) and the corresponding target variable values.
Initialization: LassoLars starts with a null model, meaning there are no active features. All coefficients are set to zero.
Stepwise feature selection: As in the LARS method, at each step LassoLars selects the feature most correlated with the model residuals and adds it to the model. The coefficient of this feature is then adjusted using the least squares method.
Applying L1 regularization: Alongside the stepwise feature selection, LassoLars applies L1 regularization, adding a penalty on the sum of the absolute values of the coefficients. This shrinks coefficients toward zero and retains only the most important features.
Making predictions: After training, the model can be used to predict target variable values for new data.
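The first selection step described above can be sketched in plain Python. This is only an illustration of "pick the feature most correlated with the residuals", not the full LARS path algorithm; the two-feature dataset and the names `trend`, `wave`, `correlation` and `most_correlated_feature` are hypothetical and introduced here for the example.

```python
import math

def center(values):
    """Subtract the mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def correlation(a, b):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    ac, bc = center(a), center(b)
    num = sum(p * q for p, q in zip(ac, bc))
    den = math.sqrt(sum(p * p for p in ac) * sum(q * q for q in bc))
    return num / den if den > 0 else 0.0

def most_correlated_feature(features, residuals):
    """Return the name of the feature with the largest |correlation| to the residuals."""
    return max(features, key=lambda name: abs(correlation(features[name], residuals)))

n = 100
# hypothetical two-feature dataset built from the article's synthetic target
features = {
    "trend": [float(i) for i in range(n)],
    "wave": [math.sin(0.5 * i) for i in range(n)],
}
target = [4 * t + 10 * s for t, s in zip(features["trend"], features["wave"])]

# null model: all coefficients are zero, so the residuals are the centered target
residuals = center(target)
first = most_correlated_feature(features, residuals)
print("first feature to enter the model:", first)
```

In the full algorithm this selection is repeated: after the chosen coefficient has moved, the residuals are recomputed and the next most correlated feature joins the active set, while the L1 penalty can drive a coefficient back to zero and remove its feature.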

Advantages of LassoLars:

Feature selection: LassoLars automatically selects the most important features and reduces model dimensionality, helping to avoid overfitting and simplifying interpretation.
Interpretability: The method keeps the model interpretable, making it easy to determine which features are included and how they influence the target variable.
Regularization: LassoLars applies L1 regularization, preventing overfitting and improving the model's generalization.

Limitations of LassoLars:

Linear model: LassoLars builds a linear model, which may be insufficient for modeling complex nonlinear relationships.
Sensitivity to noise: The method can be sensitive to outliers in the data.
Computational complexity: Selecting features at each step and applying regularization can require more computational resources than simple linear regression.

LassoLars is useful in regression tasks where it is important to choose the most relevant features, reduce model dimensionality, and maintain interpretability.

2.1.10.1. Code for creating the LassoLars model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.LassoLars model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# LassoLars.py
# The code demonstrates the process of training LassoLars model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoLars
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "LassoLars"
onnx_model_filename = data_path + "lasso_lars"

# create a LassoLars Regressor model
lassolars_regressor_model = LassoLars(alpha=0.1)

# fit the model to the data
lassolars_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = lassolars_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(lassolars_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 to match the model's input type
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(lassolars_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 to match the model's input type
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  LassoLars Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382633544077
Python  Mean Absolute Error: 6.3476035128950805
Python  Mean Squared Error: 49.778152172481896
Python  
Python  LassoLars ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382635045889
Python  Mean Absolute Error: 6.3476034814795375
Python  Mean Squared Error: 49.77815018516975
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  LassoLars ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382633544077
Python  Mean Absolute Error: 6.3476035128950805
Python  Mean Squared Error: 49.778152172481896
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  15
Python  double ONNX model precision:  16

Fig.36. Results of LassoLars.py (float ONNX)
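
The three metrics printed in the output above can be written out by hand. The following is a minimal pure-Python sketch of their usual definitions; the function names and the toy values are hypothetical and chosen only to keep the arithmetic easy to follow. It is not the scikit-learn implementation.

```python
def mae_of(y_true, y_pred):
    """Mean Absolute Error: average of |target - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse_of(y_true, y_pred):
    """Mean Squared Error: average of (target - prediction)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_of(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# hypothetical toy values (not taken from the article's dataset)
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

mae = mae_of(y_true, y_pred)
mse = mse_of(y_true, y_pred)
r2 = r2_of(y_true, y_pred)

print("R-squared:", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
```

Because MAE and MSE are averages over all samples, small per-sample rounding differences between the float and double ONNX models show up directly in their trailing digits.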

2.1.10.2. MQL5 code for executing ONNX models

This code executes the saved lasso_lars_float.onnx and lasso_lars_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                    LassoLars.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LassoLars"
#define   ONNXFilenameFloat  "lasso_lars_float.onnx"
#define   ONNXFilenameDouble "lasso_lars_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

LassoLars (EURUSD,H1)   Testing ONNX float: LassoLars (lasso_lars_float.onnx)
LassoLars (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9962382635045889
LassoLars (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3476034814795375
LassoLars (EURUSD,H1)   MQL5:   Mean Squared Error: 49.7781501851697357
LassoLars (EURUSD,H1)   
LassoLars (EURUSD,H1)   Testing ONNX double: LassoLars (lasso_lars_double.onnx)
LassoLars (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9962382633544077
LassoLars (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3476035128950858
LassoLars (EURUSD,H1)   MQL5:   Mean Squared Error: 49.7781521724819029

Comparison with the original double precision model in Python:

Testing ONNX float: LassoLars (lasso_lars_float.onnx)
Python  Mean Absolute Error: 6.3476035128950805
MQL5:   Mean Absolute Error: 6.3476034814795375

Testing ONNX double: LassoLars (lasso_lars_double.onnx)
Python  Mean Absolute Error: 6.3476035128950805
MQL5:   Mean Absolute Error: 6.3476035128950858

Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
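
The digit counts above can be checked mechanically. The sketch below compares the printed values as strings, so that no digits are lost to float parsing; `matching_decimals` is a hypothetical helper written for this illustration, and it compares only the fractional parts (it assumes the integer parts already agree).

```python
def matching_decimals(a: str, b: str) -> int:
    """Count the leading decimal digits shared by two printed numbers."""
    frac_a = a.split(".")[1]
    frac_b = b.split(".")[1]
    count = 0
    for digit_a, digit_b in zip(frac_a, frac_b):
        if digit_a != digit_b:
            break
        count += 1
    return count


# values copied from the Python and MQL5 outputs listed above
python_mae = "6.3476035128950805"
mql5_float_mae = "6.3476034814795375"
mql5_double_mae = "6.3476035128950858"

float_digits = matching_decimals(python_mae, mql5_float_mae)
double_digits = matching_decimals(python_mae, mql5_double_mae)

print("float ONNX MAE matching decimals:", float_digits)
print("double ONNX MAE matching decimals:", double_digits)
```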

2.1.10.3. ONNX representation of lasso_lars_float.onnx and lasso_lars_double.onnx

Fig.37. ONNX representation of lasso_lars_float.onnx in Netron

Fig.38. ONNX representation of lasso_lars_double.onnx in Netron

2.1.11. sklearn.linear_model.LassoLarsCV

LassoLarsCV is a method that combines Lasso (Least Absolute Shrinkage and Selection Operator) and LARS (Least Angle Regression) with automatic selection of the optimal regularization hyperparameter (alpha) using cross-validation.

This method combines the advantages of both algorithms and makes it possible to determine the optimal alpha value for the model, taking into account feature selection and regularization.

Principle of operation of LassoLarsCV:

Input data: It starts with the original dataset, including features (independent variables) and the corresponding target variable values.
Initialization: LassoLarsCV begins with a null model, in which all coefficients are set to zero.
Defining the alpha range: A range of values for the alpha hyperparameter is determined for consideration during the selection process. A logarithmic scale is a common choice for such ranges in general; scikit-learn's LassoLarsCV itself takes its candidate values from the LARS regularization path computed on the cross-validation folds.
Cross-validation: For each candidate alpha value, LassoLarsCV performs cross-validation to evaluate the model's performance at that value. Metrics such as MSE (Mean Squared Error) or the coefficient of determination (R²) are commonly used.
Optimal alpha selection: LassoLarsCV chooses the alpha value at which the model achieves the best performance according to the cross-validation results.
Model training: The LassoLars model is trained using the selected alpha value, excluding less important features and applying L1 regularization.
Making predictions: After training, the model can be used to predict target variable values for new data.
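
The steps above can be sketched in a few lines of pure Python. This is a deliberately simplified illustration under stated assumptions, not scikit-learn's LARS-based implementation: it handles a single feature, solves the one-dimensional lasso in closed form by soft-thresholding, uses a fixed log-spaced alpha grid, and splits the data into contiguous folds. All helper names are hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_1d(xs, ys, alpha):
    """Minimize (1/(2n)) * sum((y - b - w*x)^2) + alpha*|w| for one feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    var = sum((x - x_mean) ** 2 for x in xs) / n
    coef = soft_threshold(cov, alpha) / var
    return coef, y_mean - coef * x_mean


def k_fold_indices(n, k):
    """Yield k contiguous, non-overlapping index ranges covering 0..n-1."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        yield range(start, start + size)
        start += size


def cross_validated_mse(xs, ys, alpha, k=5):
    """Average held-out MSE over k folds for a given alpha."""
    n = len(xs)
    fold_errors = []
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        coef, intercept = fit_lasso_1d(train_x, train_y, alpha)
        errors = [(ys[i] - (coef * xs[i] + intercept)) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


# same synthetic data as in the article's scripts
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]

# assumed log-spaced candidate grid (an illustration choice)
alphas = [10.0 ** p for p in range(-4, 3)]

cv_scores = {alpha: cross_validated_mse(xs, ys, alpha) for alpha in alphas}
best_alpha = min(cv_scores, key=cv_scores.get)

# refit on the full dataset with the selected alpha
final_coef, final_intercept = fit_lasso_1d(xs, ys, best_alpha)
print("best alpha:", best_alpha)
print("coefficient:", final_coef, "intercept:", final_intercept)
```

The structure mirrors the list: candidate alphas, a held-out score for each, selection of the best one, and a final refit on all of the data.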

Advantages of LassoLarsCV:

Automatic alpha selection: LassoLarsCV automatically selects the optimal alpha hyperparameter using cross-validation, which simplifies model tuning.
Feature selection: LassoLarsCV automatically chooses the most important features and reduces the model's dimensionality.
Regularization: The method applies L1 regularization, which prevents overfitting and improves the model's generalization.

Limitations of LassoLarsCV:

Linear model: LassoLarsCV builds a linear model, which may be insufficient for modeling complex nonlinear relationships.
Sensitivity to noise: The method can be sensitive to outliers in the data.
Computational complexity: Cross-validation refits the model on every fold, and feature selection at each step together with regularization may require more computational resources than simple linear regression.
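
The linear-model limitation is visible in the article's own synthetic data, y = 4x + 10·sin(0.5x). A straight line can capture the 4x trend but not the sine term, whose average power is about 10²/2 = 50, which is close to the Mean Squared Error of roughly 49.78 reported in the outputs. The sketch below checks this with an ordinary least-squares fit in pure Python; it stands in for the regularized models and is not their implementation.

```python
import math

# same synthetic data as in the article's scripts
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]
n = len(xs)

# ordinary least-squares line through the data
x_mean = sum(xs) / n
y_mean = sum(ys) / n
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

# what the line cannot explain is essentially the sine component
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
mse = sum(r * r for r in residuals) / n

# average power of a sine wave with amplitude 10
sine_power = 10 ** 2 / 2

print("slope:", slope, "intercept:", intercept)
print("residual MSE:", mse, "sine power:", sine_power)
```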

LassoLarsCV is useful in regression tasks where it is essential to choose the most important features, reduce the model's dimensionality, prevent overfitting, and tune the model's hyperparameters automatically.

2.1.11.1. Code for creating the LassoLarsCV model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.LassoLarsCV model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# LassoLarsCV.py
# The code demonstrates the process of training LassoLarsCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "LassoLarsCV"
onnx_model_filename = data_path + "lasso_lars_cv"

# create a LassoLarsCV Regressor model
lassolars_cv_regressor_model = LassoLarsCV(cv=5)

# fit the model to the data
lassolars_cv_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = lassolars_cv_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(lassolars_cv_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the model's float input
X_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(lassolars_cv_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the model's double input
X_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  LassoLarsCV Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642612767
Python  Mean Absolute Error: 6.3477379221400145
Python  Mean Squared Error: 49.77814017210321
Python  
Python  LassoLarsCV ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_cv_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382640824089
Python  Mean Absolute Error: 6.347737845846069
Python  Mean Squared Error: 49.778142539016564
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  LassoLarsCV ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_cv_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642612767
Python  Mean Absolute Error: 6.3477379221400145
Python  Mean Squared Error: 49.77814017210321
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  16

Fig.39. Results of LassoLarsCV.py (float ONNX)

2.1.11.2. MQL5 code for executing ONNX models

This code executes the saved lasso_lars_cv_float.onnx and lasso_lars_cv_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                  LassoLarsCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LassoLarsCV"
#define   ONNXFilenameFloat  "lasso_lars_cv_float.onnx"
#define   ONNXFilenameDouble "lasso_lars_cv_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

LassoLarsCV (EURUSD,H1) Testing ONNX float: LassoLarsCV (lasso_lars_cv_float.onnx)
LassoLarsCV (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9962382640824089
LassoLarsCV (EURUSD,H1) MQL5:   Mean Absolute Error: 6.3477378458460691
LassoLarsCV (EURUSD,H1) MQL5:   Mean Squared Error: 49.7781425390165566
LassoLarsCV (EURUSD,H1) 
LassoLarsCV (EURUSD,H1) Testing ONNX double: LassoLarsCV (lasso_lars_cv_double.onnx)
LassoLarsCV (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9962382642612767
LassoLarsCV (EURUSD,H1) MQL5:   Mean Absolute Error: 6.3477379221400145
LassoLarsCV (EURUSD,H1) MQL5:   Mean Squared Error: 49.7781401721031642

Comparison with the original double precision model in Python:

Testing ONNX float: LassoLarsCV (lasso_lars_cv_float.onnx)
Python  Mean Absolute Error: 6.3477379221400145
MQL5:   Mean Absolute Error: 6.3477378458460691
        
Testing ONNX double: LassoLarsCV (lasso_lars_cv_double.onnx)
Python  Mean Absolute Error: 6.3477379221400145
MQL5:   Mean Absolute Error: 6.3477379221400145

Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 16 decimal places.
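
The gap between 6 and 16 matching decimal places is in line with the storage formats themselves: a 32-bit float keeps a 24-bit significand (about 7 significant decimal digits), while a 64-bit double keeps 53 bits (about 16). The sketch below rounds one of the values above to float32 using only the standard library. It illustrates the scale of single-precision rounding and is not a reproduction of how the ONNX runtime accumulates error; `to_float32` is a hypothetical helper.

```python
import struct


def to_float32(value):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# MAE of the original double precision model, from the output above
mae_double = 6.3477379221400145
mae_as_float32 = to_float32(mae_double)

# round-to-nearest keeps the relative error within 2**-24 (about 6e-8)
relative_error = abs(mae_as_float32 - mae_double) / mae_double

print("double value:    ", repr(mae_double))
print("float32 rounding:", repr(mae_as_float32))
print("relative error:  ", relative_error)
```

A relative error of this size on a value near 6.35 affects the seventh decimal place, which matches the 6 matching decimals observed for the float model.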

2.1.11.3. ONNX representation of lasso_lars_cv_float.onnx and lasso_lars_cv_double.onnx

Fig.40. ONNX representation of lasso_lars_cv_float.onnx in Netron

Fig.41. ONNX representation of lasso_lars_cv_double.onnx in Netron

2.1.12. sklearn.linear_model.LassoLarsIC

LassoLarsIC is a regression method that combines Lasso (Least Absolute Shrinkage and Selection Operator) with an Information Criterion (IC) to automatically select the optimal set of features.

It uses information criteria such as AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) to determine which features to include in the model, and applies L1 regularization to estimate the model coefficients.

How LassoLarsIC works:

Input data: The method starts from the original dataset, which contains the features (independent variables) and the corresponding target values.
Initialization: LassoLarsIC starts with a null model, that is, one with no active features. All coefficients are set to zero.
Feature selection using the information criterion: The method evaluates the information criterion (for example, AIC or BIC) for different feature sets, starting from the empty model and adding features gradually. The criterion scores the model by balancing goodness of fit against model complexity.
Selecting the optimal feature set: LassoLarsIC chooses the feature set for which the information criterion reaches its best (lowest) value. These features are included in the model.
Applying L1 regularization: L1 regularization is applied to the selected features when the model coefficients are estimated.
Making predictions: After training, the model can be used to predict target values for new data.
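The feature-selection step above can be illustrated with a minimal sketch in plain Python. It assumes the textbook Gaussian form of the criteria, n*ln(RSS/n) plus a penalty of 2k for AIC or k*ln(n) for BIC; scikit-learn's own formula may differ in its details, and the helper names below are hypothetical. The sketch compares the empty model (intercept only) with the one-feature model on the same synthetic data used in this article and keeps the candidate with the lowest score.

```python
import math

def information_criterion(rss, n, k, kind="aic"):
    """Textbook Gaussian AIC/BIC (assumed form, not scikit-learn's exact formula)."""
    penalty = 2 * k if kind == "aic" else k * math.log(n)
    return n * math.log(rss / n) + penalty

# same synthetic data as in the article's scripts
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# candidate 0: empty model, the prediction is the mean of y
rss_empty = sum((y - mean_y) ** 2 for y in ys)

# candidate 1: one active feature, fitted by least squares
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
rss_one = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# number of active features -> residual sum of squares
candidates = {0: rss_empty, 1: rss_one}

# k counts the fitted parameters: active features plus the intercept
aic_scores = {m: information_criterion(rss, n, m + 1, "aic") for m, rss in candidates.items()}
bic_scores = {m: information_criterion(rss, n, m + 1, "bic") for m, rss in candidates.items()}

best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
print("AIC scores:", aic_scores, "-> features kept:", best_by_aic)
print("BIC scores:", bic_scores, "-> features kept:", best_by_bic)
```

On this data the fit improves so much when the single feature is added that both criteria prefer the one-feature model despite the extra complexity penalty.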

Advantages of LassoLarsIC:

Automatic feature selection: LassoLarsIC chooses the optimal feature set automatically, which reduces the dimensionality of the model and helps prevent overfitting.
Information criteria: Using information criteria makes it possible to balance model quality against model complexity.
Regularization: The method applies L1 regularization, which helps prevent overfitting and improves the generalization of the model.
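To make the effect of L1 regularization concrete, here is a small sketch for the single-feature case. It assumes the Lasso objective in the scaling used by scikit-learn, (1/(2n))·||y − Xw||² + α·|w|, for which the one-feature solution reduces to soft-thresholding the covariance between feature and target. The function names are hypothetical and only serve this illustration.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_1d(xs, ys, alpha):
    """Closed-form Lasso fit for one feature with an unpenalized intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    coef = soft_threshold(cov_xy, alpha) / var_x
    return coef, mean_y - coef * mean_x

# same synthetic data as in the article's scripts
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]

coef_ols, _ = lasso_1d(xs, ys, 0.0)       # alpha = 0: ordinary least squares
coef_shrunk, _ = lasso_1d(xs, ys, 100.0)  # moderate alpha: coefficient shrinks
coef_zero, _ = lasso_1d(xs, ys, 1.0e6)    # very large alpha: feature is dropped

print(coef_ols, coef_shrunk, coef_zero)
```

As alpha grows, the coefficient first shrinks and is then set exactly to zero, which is how the L1 penalty removes features from the model.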

Limitations of LassoLarsIC:

Linear model: LassoLarsIC builds a linear model, which may be insufficient for modeling complex nonlinear relationships.
Sensitivity to noise: The method can be sensitive to outliers in the data.
Computational complexity: Evaluating the information criteria for multiple feature sets may require additional computing resources.

LassoLarsIC is valuable in regression tasks where it is crucial to select the best feature set automatically and to reduce the dimensionality of the model based on information criteria.

2.1.12.1. Code for creating the LassoLarsIC model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.LassoLarsIC model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# LassoLarsIC.py
# The code demonstrates the process of training LassoLarsIC model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoLarsIC
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name="LassoLarsIC"
onnx_model_filename = data_path + "lasso_lars_ic"

# create a LassoLarsIC Regressor model
lasso_lars_ic_regressor_model = LassoLarsIC(criterion='aic')

# fit the model to the data
lasso_lars_ic_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = lasso_lars_ic_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(lasso_lars_ic_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(lasso_lars_ic_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  LassoLarsIC Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336425
Python  Mean Squared Error: 49.778140171281784
Python  
Python  LassoLarsIC ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_ic_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382641628886
Python  Mean Absolute Error: 6.3477377671679385
Python  Mean Squared Error: 49.77814147404787
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  LassoLarsIC ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_ic_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336425
Python  Mean Squared Error: 49.778140171281784
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  15
Python  double ONNX model precision:  15

Fig.42. Results of LassoLarsIC.py (float ONNX)

2.1.12.2. MQL5 code for running ONNX models

This code runs the saved lasso_lars_ic_float.onnx and lasso_lars_ic_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                  LassoLarsIC.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LassoLarsIC"
#define   ONNXFilenameFloat  "lasso_lars_ic_float.onnx"
#define   ONNXFilenameDouble "lasso_lars_ic_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

LassoLarsIC (EURUSD,H1) Testing ONNX float: LassoLarsIC (lasso_lars_ic_float.onnx)
LassoLarsIC (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9962382641628886
LassoLarsIC (EURUSD,H1) MQL5:   Mean Absolute Error: 6.3477377671679385
LassoLarsIC (EURUSD,H1) MQL5:   Mean Squared Error: 49.7781414740478638
LassoLarsIC (EURUSD,H1) 
LassoLarsIC (EURUSD,H1) Testing ONNX double: LassoLarsIC (lasso_lars_ic_double.onnx)
LassoLarsIC (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9962382642613388
LassoLarsIC (EURUSD,H1) MQL5:   Mean Absolute Error: 6.3477379263364302
LassoLarsIC (EURUSD,H1) MQL5:   Mean Squared Error: 49.7781401712817768

Comparison with the original double-precision model in Python:

Testing ONNX float: LassoLarsIC (lasso_lars_ic_float.onnx)
Python  Mean Absolute Error: 6.347737926336425
MQL5:   Mean Absolute Error: 6.3477377671679385
 
Testing ONNX double: LassoLarsIC (lasso_lars_ic_double.onnx)
Python  Mean Absolute Error: 6.347737926336425
MQL5:   Mean Absolute Error: 6.3477379263364302

ONNX float MAE precision: 6 decimal places; ONNX double MAE precision: 13 decimal places.
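The gap between the float and double results is consistent with the resolution of single-precision numbers. The following sketch, which uses only the Python standard library, rounds the double-precision MAE reported above to float32 and measures the relative error. It is an illustration of float32 resolution only, not a reproduction of what the ONNX runtime does internally; the helper name to_float32 is hypothetical.

```python
import struct

def to_float32(value):
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# MAE of the original double-precision model, taken from the Python output above
mae_double = 6.347737926336425

mae_as_float32 = to_float32(mae_double)
relative_error = abs(mae_as_float32 - mae_double) / mae_double

print("double value :", repr(mae_double))
print("as float32   :", repr(mae_as_float32))
print("relative err :", relative_error)
print("float32 bound:", 2.0 ** -24)
```

A float32 value carries a 24-bit significand, so a single rounding introduces a relative error of at most 2^-24 (about 6e-8). That corresponds to roughly seven significant decimal digits, which is in line with the six matching decimal places observed for the float model.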

2.1.12.3. ONNX representation of lasso_lars_ic_float.onnx and lasso_lars_ic_double.onnx

Fig.43. ONNX representation of lasso_lars_ic_float.onnx in Netron

Fig.44. ONNX representation of lasso_lars_ic_double.onnx in Netron

2.1.13. sklearn.linear_model.LinearRegression

LinearRegression is one of the simplest and most widely used machine learning methods for regression tasks.

It builds linear models that predict numerical (continuous) values of the target variable from a linear combination of the input features.

How linear regression works:

Linear model: The linear regression model assumes a linear relationship between the independent variables (features) and the target variable. This relationship can be expressed by the linear regression equation: y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ, where y is the target variable, β₀ is the intercept, β₁, β₂, ..., βₚ are the feature coefficients, and x₁, x₂, ..., xₚ are the feature values.
Parameter estimation: The goal of LinearRegression is to estimate the coefficients β₀, β₁, β₂, ..., βₚ that best fit the data. This is usually done with the OLS (Ordinary Least Squares) method, which minimizes the sum of squared differences between the actual and predicted values.
Model evaluation: Various metrics are used to assess the quality of a linear regression model, such as MSE (Mean Squared Error) and the coefficient of determination (R²), among others.
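For a single feature, the OLS estimate has a simple closed form, which the following plain-Python sketch applies to the same synthetic data used in this article. It is not how scikit-learn computes the fit internally; it only illustrates the equation and the metrics described above. The variable names are chosen for this example.

```python
import math

# same synthetic data as in the article's scripts
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# closed-form OLS for one feature: slope = cov(x, y) / var(x)
beta1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
beta0 = mean_y - beta1 * mean_x

predictions = [beta0 + beta1 * x for x in xs]

# regression metrics
mse = sum((y - p) ** 2 for y, p in zip(ys, predictions)) / n
mae = sum(abs(y - p) for y, p in zip(ys, predictions)) / n
ss_total = sum((y - mean_y) ** 2 for y in ys)
r2 = 1.0 - (mse * n) / ss_total

print("intercept:", beta0, "slope:", beta1)
print("R-squared:", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
```

Because OLS has a unique solution on this data, these metrics should agree with the values the article reports for the original LinearRegression model up to floating-point rounding.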

Advantages of linear regression:

Simplicity and interpretability: LinearRegression is a simple, easily interpreted method that makes it possible to analyze the influence of each feature on the target variable.
Fast training and prediction: The linear regression model trains and predicts quickly, which makes it a good choice for large datasets.
Applicability: LinearRegression can be applied successfully to a wide range of regression tasks.

Limitations of linear regression:

Linearity: The method assumes a linear relationship between the features and the target variable, which may be insufficient for modeling complex nonlinear dependencies.
Sensitivity to outliers: LinearRegression is sensitive to outliers in the data, which can degrade the quality of the model.

LinearRegression is a simple and widely used regression method that builds a linear model to predict numerical values of the target variable from a linear combination of the input features. It is well suited to problems where the relationship is linear and where model interpretability matters.

2.1.13.1. Code for creating the LinearRegression model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.LinearRegression model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# LinearRegression.py
# The code demonstrates the process of training LinearRegression model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "LinearRegression"
onnx_model_filename = data_path + "linear_regression"

# create a Linear Regression model
linear_model = LinearRegression()

# fit the model to the data
linear_model.fit(X, y)

# predict values for the entire dataset
y_pred = linear_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(linear_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(linear_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  LinearRegression Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336427
Python  Mean Squared Error: 49.77814017128179
Python  
Python  LinearRegression ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\linear_regression_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382641628886
Python  Mean Absolute Error: 6.3477377671679385
Python  Mean Squared Error: 49.77814147404787
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  ONNX: MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  LinearRegression ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\linear_regression_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336427
Python  Mean Squared Error: 49.77814017128179
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Fig.45. Results of LinearRegression.py (float ONNX)

2.1.13.2. MQL5 code for running the ONNX models

This code runs the saved linear_regression_float.onnx and linear_regression_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                             LinearRegression.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LinearRegression"
#define   ONNXFilenameFloat  "linear_regression_float.onnx"
#define   ONNXFilenameDouble "linear_regression_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

LinearRegression (EURUSD,H1)    Testing ONNX float: LinearRegression (linear_regression_float.onnx)
LinearRegression (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9962382641628886
LinearRegression (EURUSD,H1)    MQL5:   Mean Absolute Error: 6.3477377671679385
LinearRegression (EURUSD,H1)    MQL5:   Mean Squared Error: 49.7781414740478638
LinearRegression (EURUSD,H1)    
LinearRegression (EURUSD,H1)    Testing ONNX double: LinearRegression (linear_regression_double.onnx)
LinearRegression (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9962382642613388
LinearRegression (EURUSD,H1)    MQL5:   Mean Absolute Error: 6.3477379263364266
LinearRegression (EURUSD,H1)    MQL5:   Mean Squared Error: 49.7781401712817768

Comparison with the original double-precision model in Python:

Testing ONNX float: LinearRegression (linear_regression_float.onnx)
Python  Mean Absolute Error: 6.347737926336427
MQL5:   Mean Absolute Error: 6.3477377671679385

Testing ONNX double: LinearRegression (linear_regression_double.onnx)
Python  Mean Absolute Error: 6.347737926336427
MQL5:   Mean Absolute Error: 6.3477379263364266

ONNX float MAE precision: 6 decimal places. ONNX double MAE precision: 14 decimal places.
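
The decimal counts quoted above can be reproduced with a short string comparison. The sketch below is a simplified variant of the compare_decimal_places helper used in the Python scripts. The function name matching_decimals is hypothetical, and it takes the printed values as strings (an assumption made here so that the digits are compared exactly as they appear in the logs, without re-parsing them as floating-point numbers).

```python
def matching_decimals(a: str, b: str) -> int:
    """Count identical leading digits after the decimal point of two printed numbers."""
    # if either value has no decimal point there is nothing to compare
    if "." not in a or "." not in b:
        return 0
    frac_a = a.split(".", 1)[1]
    frac_b = b.split(".", 1)[1]
    count = 0
    # walk both fractional parts in parallel and stop at the first mismatch
    for digit_a, digit_b in zip(frac_a, frac_b):
        if digit_a != digit_b:
            break
        count += 1
    return count

# MAE values copied from the Python and MQL5 logs above
python_mae = "6.347737926336427"
mql5_float_mae = "6.3477377671679385"
mql5_double_mae = "6.3477379263364266"

float_digits = matching_decimals(python_mae, mql5_float_mae)
double_digits = matching_decimals(python_mae, mql5_double_mae)
print("float ONNX MAE matching decimals: ", float_digits)
print("double ONNX MAE matching decimals:", double_digits)
```

Like the original helper, this only compares the fractional digits and assumes the integer parts already agree.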

2.1.13.3. ONNX representation of linear_regression_float.onnx and linear_regression_double.onnx

Fig.46. ONNX representation of linear_regression_float.onnx in Netron

Fig.47. ONNX representation of linear_regression_double.onnx in Netron

Note on the Ridge and RidgeCV methods

Ridge and RidgeCV are two closely related methods for Ridge regression, that is, linear regression with L2 regularization. They share similar functionality but differ in how they are used and how their parameters are tuned.

How Ridge works (Ridge regression):

Ridge is a regression method with L2 regularization: it adds the sum of the squared coefficients (the squared L2 norm), scaled by alpha, to the loss function minimized by the model. This additional term shrinks the model coefficients and thereby helps prevent overfitting.
Use of the alpha parameter: In Ridge, the alpha parameter (also known as the regularization strength) is set in advance and is not adjusted automatically. Users must choose a suitable value based on their knowledge of the data and on experiments.

How RidgeCV works (Ridge Cross-Validation):

RidgeCV is an extension of Ridge that selects the value of alpha automatically by cross-validation. Instead of alpha being set by hand, RidgeCV iterates over a set of candidate alpha values and keeps the one with the best cross-validation performance.
Advantage of automatic tuning: The main advantage of RidgeCV is that it determines the alpha value without manual tuning. This makes fitting more convenient and reduces the risk of a poorly chosen alpha.

The key difference between Ridge and RidgeCV is that Ridge requires the user to specify the alpha value explicitly, while RidgeCV finds it automatically by cross-validation. RidgeCV is usually the preferred choice when working with a large amount of data and when manual parameter tuning is to be avoided.
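
The difference can be seen in a few lines of code. The following is a minimal sketch, assuming the same synthetic data as the scripts in this article; the candidate_alphas grid is an illustrative assumption, not a recommendation.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV

# same synthetic data as in the scripts of this article
X = np.arange(0, 100, 1).reshape(-1, 1)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

# Ridge: alpha is fixed by the user (1.0 is the scikit-learn default)
ridge = Ridge(alpha=1.0).fit(X, y)

# RidgeCV: alpha is chosen from a candidate grid by cross-validation
candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]  # illustrative grid (assumption)
ridge_cv = RidgeCV(alphas=candidate_alphas).fit(X, y)

print("Ridge   alpha (set manually):", ridge.alpha)
print("RidgeCV alpha (selected):    ", ridge_cv.alpha_)
```

After fitting, RidgeCV exposes the selected value in its alpha_ attribute, whereas Ridge simply keeps the alpha it was given.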

2.1.14. sklearn.linear_model.Ridge

Ridge is a regression method used in machine learning to solve regression problems. It belongs to the family of linear models and is a regularized linear regression.

The main feature of Ridge regression is that it adds L2 regularization to the standard OLS (Ordinary Least Squares) method.

How Ridge regression works:

Linear regression: Like ordinary linear regression, Ridge regression seeks a linear relationship between the independent variables (features) and the target variable.
L2 regularization: The main distinction of Ridge regression is the L2 regularization term in the loss function. A penalty on large regression coefficients is added to the sum of squared differences between the actual and predicted values.
Coefficient penalty: L2 regularization penalizes the magnitude of the regression coefficients. As a result, the coefficients are pulled toward zero, which reduces overfitting and improves model stability.
Hyperparameter α: One of the essential parameters of Ridge regression is the hyperparameter α (alpha), which determines the degree of regularization. Higher α values lead to stronger regularization, resulting in simpler models with smaller coefficients.
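
The effect of α on the coefficients can be observed directly. In scikit-learn, Ridge minimizes ||y - Xw||² + α·||w||², so a larger α should give a smaller slope. The sketch below assumes the same synthetic data as the scripts in this article; the list of α values is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge

# same synthetic data as in the scripts of this article
X = np.arange(0, 100, 1).reshape(-1, 1)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

alphas = [0.1, 1.0, 1000.0, 1000000.0]  # illustrative values (assumption)
slopes = []
for alpha in alphas:
    # fit one model per regularization strength and record its single coefficient
    model = Ridge(alpha=alpha).fit(X, y)
    slopes.append(model.coef_[0])
    print(f"alpha={alpha:>10}: slope={model.coef_[0]:.6f}")
```

With a single feature the slope decreases monotonically as α grows, which is the "simpler model" behaviour described above.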

Advantages of Ridge regression:

Reduced overfitting: L2 regularization in Ridge helps reduce overfitting, making the model more robust to noise in the data.
Handling multicollinearity: Ridge regression copes well with multicollinearity, especially when the features are highly correlated.
Addressing the curse of dimensionality: Ridge helps in scenarios with many features, where OLS can be unstable.
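
The multicollinearity point can be illustrated with two almost identical features. This is a minimal sketch on made-up data (the data-generating process, the noise levels and the seed are all assumptions chosen for illustration); it compares the coefficients of plain OLS with those of Ridge.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)   # almost an exact copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)

# OLS has no penalty, so the split between x1 and x2 is poorly determined
ols = LinearRegression().fit(X, y)
# Ridge penalizes large coefficients and spreads the weight over both features
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

Because the two columns carry nearly the same information, OLS can assign them large coefficients of opposite sign, while Ridge keeps the coefficient vector small and the combined effect close to the true value of 3.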

Limitations of Ridge regression:

No feature elimination: Ridge regression does not set coefficients exactly to zero; it only shrinks them, so all features remain in the model.
Choosing the optimal α: Selecting the right value of the hyperparameter α may require cross-validation.

Ridge regression is a regression method that introduces L2 regularization into standard linear regression to reduce overfitting, improve stability, and address multicollinearity. It is useful when a balance between model accuracy and stability is needed.

2.1.14.1. Code for creating the Ridge model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.Ridge model, trains it on synthetic data, saves it in ONNX format, and makes predictions with both float and double input data. It also evaluates the accuracy of the original model and of the models exported to ONNX.

# Ridge.py
# The code demonstrates the process of training Ridge model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "Ridge"
onnx_model_filename = data_path + "ridge"

# create a Ridge model
regression_model = Ridge()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 (float) for the ONNX session
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 (double) for the ONNX session
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  Ridge Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382641178552
Python  Mean Absolute Error: 6.347684462929819
Python  Mean Squared Error: 49.77814206996523
Python  
Python  Ridge ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ridge_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382634837793
Python  Mean Absolute Error: 6.347684915729416
Python  Mean Squared Error: 49.77815046053819
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  6
Python  
Python  Ridge ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ridge_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382641178552
Python  Mean Absolute Error: 6.347684462929819
Python  Mean Squared Error: 49.77814206996523
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Fig.49. Results of Ridge.py (float ONNX)

2.1.14.2. MQL5 code for running the ONNX models

This code runs the saved ridge_float.onnx and ridge_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                        Ridge.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "Ridge"
#define   ONNXFilenameFloat  "ridge_float.onnx"
#define   ONNXFilenameDouble "ridge_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

Ridge (EURUSD,H1)       Testing ONNX float: Ridge (ridge_float.onnx)
Ridge (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382634837793
Ridge (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3476849157294160
Ridge (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781504605381784
Ridge (EURUSD,H1)       
Ridge (EURUSD,H1)       Testing ONNX double: Ridge (ridge_double.onnx)
Ridge (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382641178552
Ridge (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3476844629298235
Ridge (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781420699652131

Comparison with the original double-precision model in Python:

Testing ONNX float: Ridge (ridge_float.onnx)
Python  Mean Absolute Error: 6.347684462929819
MQL5:   Mean Absolute Error: 6.3476849157294160
       
Testing ONNX double: Ridge (ridge_double.onnx)
Python  Mean Absolute Error: 6.347684462929819
MQL5:   Mean Absolute Error: 6.3476844629298235

ONNX float MAE precision: 6 decimal places; ONNX double MAE precision: 13 decimal places.

2.1.14.3. ONNX representation of ridge_float.onnx and ridge_double.onnx

Fig.50. ONNX representation of ridge_float.onnx in Netron

Fig.51. ONNX representation of ridge_double.onnx in Netron

2.1.15. sklearn.linear_model.RidgeCV

RidgeCV is an extension of Ridge regression that automatically selects the best value of the hyperparameter α (alpha), which sets the strength of regularization. α controls the trade-off between minimizing the sum of squared errors (as in ordinary linear regression) and minimizing the magnitude of the regression coefficients (regularization). RidgeCV picks the best α from a list of candidate values by cross-validation.

How RidgeCV works:

Input data: RidgeCV takes input data consisting of features (independent variables) and a continuous target variable.
Choosing α: Ridge regression requires a value for the hyperparameter α, which sets the strength of regularization. RidgeCV selects the best α automatically from the candidate values passed in the alphas parameter (in scikit-learn the default is (0.1, 1.0, 10.0)).
Cross-validation: RidgeCV uses cross-validation to estimate which α generalizes best to unseen data. By default (cv=None) scikit-learn uses an efficient form of leave-one-out cross-validation; k-fold cross-validation is used when cv is set to an integer or a splitter.
Optimal α: Once cross-validation is complete, RidgeCV picks the α with the best cross-validation score and uses it to fit the final Ridge regression model.

Advantages of RidgeCV:

Automatic selection of α: RidgeCV chooses the best value of the hyperparameter α automatically, which simplifies model tuning.
Balance between regularization and performance: The method helps find a good trade-off between regularization (which reduces overfitting) and model performance.

Limitations of RidgeCV:

Computational complexity: Cross-validation can require significant computational resources, especially when many candidate values of α are evaluated.

RidgeCV is Ridge regression with automatic selection of the hyperparameter α by cross-validation. It streamlines hyperparameter selection and helps find the best balance between regularization and model performance.
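The selection procedure described above can be sketched in a few lines. This is a minimal illustration, not the script used in the article: the candidate grid in alphas is an arbitrary choice made for this example, while the data are the same synthetic series used throughout the article.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# same synthetic data as in the article's scripts
X = np.arange(0, 100, 1).reshape(-1, 1).astype(np.float64)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

# candidate regularization strengths (illustrative grid, an assumption of this sketch)
alphas = np.logspace(-3, 3, 13)

# cv=None (the default) selects alpha with efficient leave-one-out cross-validation
model = RidgeCV(alphas=alphas)
model.fit(X, y)

# alpha_ holds the selected value; coef_ and intercept_ describe the final Ridge fit
print("selected alpha:", model.alpha_)
print("coefficient:", model.coef_, "intercept:", model.intercept_)
```

After fitting, the alpha_ attribute reports which candidate was chosen. Passing cv=5, for instance, would switch the selection to 5-fold cross-validation.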

2.1.15.1. Code for creating the RidgeCV model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.RidgeCV model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# RidgeCV.py
# The code demonstrates the process of training RidgeCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "RidgeCV"
onnx_model_filename = data_path + "ridge_cv"

# create a RidgeCV model
regression_model = RidgeCV()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 to match the model's input tensor type
X_float32 = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float32})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 to match the model's input tensor type
X_float64 = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_float64})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  RidgeCV Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382499160807
Python  Mean Absolute Error: 6.34720334999352
Python  Mean Squared Error: 49.77832999861571
Python  
Python  RidgeCV ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ridge_cv_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382499108485
Python  Mean Absolute Error: 6.3472036427935485
Python  Mean Squared Error: 49.77833006785168
Python  R^2 matching decimal places:  11
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  6
Python  
Python  RidgeCV ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ridge_cv_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382499160807
Python  Mean Absolute Error: 6.34720334999352
Python  Mean Squared Error: 49.77832999861571
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  14
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  14

Fig.52. Results of RidgeCV.py (float ONNX)

2.1.15.2. MQL5 code for running the ONNX models

This code runs the saved models ridge_cv_float.onnx and ridge_cv_double.onnx and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                      RidgeCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "RidgeCV"
#define   ONNXFilenameFloat  "ridge_cv_float.onnx"
#define   ONNXFilenameDouble "ridge_cv_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

RidgeCV (EURUSD,H1)     Testing ONNX float: RidgeCV (ridge_cv_float.onnx)
RidgeCV (EURUSD,H1)     MQL5:   R-Squared (Coefficient of determination): 0.9962382499108485
RidgeCV (EURUSD,H1)     MQL5:   Mean Absolute Error: 6.3472036427935485
RidgeCV (EURUSD,H1)     MQL5:   Mean Squared Error: 49.7783300678516909
RidgeCV (EURUSD,H1)     
RidgeCV (EURUSD,H1)     Testing ONNX double: RidgeCV (ridge_cv_double.onnx)
RidgeCV (EURUSD,H1)     MQL5:   R-Squared (Coefficient of determination): 0.9962382499160807
RidgeCV (EURUSD,H1)     MQL5:   Mean Absolute Error: 6.3472033499935216
RidgeCV (EURUSD,H1)     MQL5:   Mean Squared Error: 49.7783299986157246

Comparison with the original double-precision model in Python:

Testing ONNX float: RidgeCV (ridge_cv_float.onnx)
Python  Mean Absolute Error: 6.34720334999352
MQL5:   Mean Absolute Error: 6.3472036427935485

Testing ONNX double: RidgeCV (ridge_cv_double.onnx)
Python  Mean Absolute Error: 6.34720334999352
MQL5:   Mean Absolute Error: 6.3472033499935216

ONNX float MAE precision: 6 decimal places; ONNX double MAE precision: 14 decimal places.
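Counting matching decimal places is a convenient summary, but the same comparison can also be read as a numerical error. The sketch below uses only the MAE values printed above (the variable names are chosen for this illustration) and compares the relative difference with the machine epsilon of float32 and float64 as reported by NumPy. It is meant as a rough sanity check under these assumptions, not as a formal error analysis.

```python
import numpy as np

# MAE values copied from the outputs above
mae_python = 6.34720334999352            # original scikit-learn model (double)
mae_onnx_float = 6.3472036427935485      # MQL5, ridge_cv_float.onnx
mae_onnx_double = 6.3472033499935216     # MQL5, ridge_cv_double.onnx

# absolute and relative deviation of the float model from the original
abs_diff_float = abs(mae_onnx_float - mae_python)
rel_diff_float = abs_diff_float / abs(mae_python)

# absolute deviation of the double model from the original
abs_diff_double = abs(mae_onnx_double - mae_python)

# machine epsilon: spacing between 1.0 and the next representable number
eps32 = float(np.finfo(np.float32).eps)
eps64 = float(np.finfo(np.float64).eps)

print(f"float model:  abs diff {abs_diff_float:.3e}, rel diff {rel_diff_float:.3e}")
print(f"double model: abs diff {abs_diff_double:.3e}")
print(f"float32 eps {eps32:.3e}, float64 eps {eps64:.3e}")
```

The relative difference of the float model is of the same order as the float32 epsilon, which is consistent with single-precision rounding being the main source of the discrepancy.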

2.1.15.3. ONNX representation of ridge_cv_float.onnx and ridge_cv_double.onnx

Fig.53. ONNX representation of ridge_cv_float.onnx in Netron

Fig.54. ONNX representation of ridge_cv_double.onnx in Netron

2.1.16. sklearn.linear_model.OrthogonalMatchingPursuit

OrthogonalMatchingPursuit (OMP) is an algorithm used for feature selection and linear regression.

It is one of the methods for selecting the most significant features, which can help reduce the dimensionality of the data and improve the model's ability to generalize.

How OrthogonalMatchingPursuit works:

Input data: The starting point is a dataset containing features (independent variables) and values of a continuous target variable.
Selecting the number of features: One of the first steps when using OrthogonalMatchingPursuit is deciding how many features to include in the model. This number can be fixed in advance or chosen using criteria such as AIC (Akaike Information Criterion) or a minimum-error criterion. In scikit-learn it is set with the n_nonzero_coefs parameter, or indirectly with the residual threshold tol.
Iterative feature addition: The algorithm starts with an empty model and iteratively adds the features that best explain the model residuals. At each iteration it selects the feature most correlated with the current residual, then refits the coefficients of all selected features by least squares, so that the new residual is orthogonal to every feature selected so far.
Model training: After the specified number of features has been added, the model is fitted to the data using only those selected features.
Making predictions: After training, the model can predict the target variable for new data.

Advantages of OrthogonalMatchingPursuit:

Dimensionality reduction: OMP can reduce the dimensionality of the data by selecting only the most informative features.
Interpretability: Because OMP selects only a small number of features, the resulting models can be easier to interpret.

Limitations of OrthogonalMatchingPursuit:

Sensitivity to the number of selected features: The number of selected features must be tuned carefully, since a poor choice can lead to overfitting or underfitting.
Multicollinearity is not handled: OMP may not account for multicollinearity between features, which can affect which features are selected.
Computational complexity: OMP can be computationally expensive, especially for large datasets.

OrthogonalMatchingPursuit is a feature selection and linear regression algorithm that selects the most informative features for the model. It can be valuable for reducing the dimensionality of the data and improving model interpretability.
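The greedy loop described in the steps above can be sketched directly in NumPy. This is a simplified illustration and not scikit-learn's implementation: the function name omp_sketch, the random test data, and the omission of an intercept are all assumptions made for this example.

```python
import numpy as np

def omp_sketch(X, y, n_nonzero_coefs):
    """Greedy OMP sketch: returns (selected column indices, coefficient vector)."""
    residual = y.copy()
    selected = []
    sol = np.zeros(0)
    for _ in range(n_nonzero_coefs):
        # correlation of every column with the current residual
        corr = np.abs(X.T @ residual)
        if selected:
            corr[selected] = 0.0      # never pick the same column twice
        selected.append(int(np.argmax(corr)))
        # least-squares refit on all selected columns (the "orthogonal" step)
        sol, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ sol
    coef = np.zeros(X.shape[1])
    coef[selected] = sol
    return selected, coef

# illustrative data: 10 features, only columns 1 and 4 influence the target
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
X /= np.linalg.norm(X, axis=0)        # unit-norm columns make correlations comparable
y = 3.0 * X[:, 1] - 2.0 * X[:, 4]

selected, coef = omp_sketch(X, y, n_nonzero_coefs=2)
print("selected columns:", sorted(selected))
print("coefficients:", np.round(coef, 6))
```

The least-squares refit after each selection is what distinguishes OMP from plain matching pursuit: the residual stays orthogonal to every column chosen so far, so a column that is already in the model has zero correlation with the residual. Unlike this sketch, scikit-learn's OrthogonalMatchingPursuit fits an intercept by default.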

2.1.16.1. Code for creating the OrthogonalMatchingPursuit model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.OrthogonalMatchingPursuit model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX. Because the synthetic data has a single feature, OMP selects that feature and the fit reduces to an ordinary linear regression.

# OrthogonalMatchingPursuit.py
# The code demonstrates the process of training OrthogonalMatchingPursuit model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "OrthogonalMatchingPursuit"
onnx_model_filename = data_path + "orthogonal_matching_pursuit"

# create an OrthogonalMatchingPursuit model
regression_model = OrthogonalMatchingPursuit()

# fit the model to the data
regression_model.fit(X, y)

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 to match the model's input tensor type
X_float32 = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float32})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 to match the model's input tensor type
X_float64 = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_float64})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  OrthogonalMatchingPursuit Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642613388
Python  Mean Absolute Error: 6.3477379263364275
Python  Mean Squared Error: 49.778140171281784
Python  
Python  OrthogonalMatchingPursuit ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\orthogonal_matching_pursuit_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382641628886
Python  Mean Absolute Error: 6.3477377671679385
Python  Mean Squared Error: 49.77814147404787
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  OrthogonalMatchingPursuit ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\orthogonal_matching_pursuit_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642613388
Python  Mean Absolute Error: 6.3477379263364275
Python  Mean Squared Error: 49.778140171281784
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  15
Python  double ONNX model precision:  16

Fig.55. Results of OrthogonalMatchingPursuit.py (float ONNX)

2.1.16.2. MQL5 code for executing ONNX models

This code executes the saved orthogonal_matching_pursuit_float.onnx and orthogonal_matching_pursuit_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                    OrthogonalMatchingPursuit.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "OrthogonalMatchingPursuit"
#define   ONNXFilenameFloat  "orthogonal_matching_pursuit_float.onnx"
#define   ONNXFilenameDouble "orthogonal_matching_pursuit_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

OrthogonalMatchingPursuit (EURUSD,H1)   Testing ONNX float: OrthogonalMatchingPursuit (orthogonal_matching_pursuit_float.onnx)
OrthogonalMatchingPursuit (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9962382641628886
OrthogonalMatchingPursuit (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3477377671679385
OrthogonalMatchingPursuit (EURUSD,H1)   MQL5:   Mean Squared Error: 49.7781414740478638
OrthogonalMatchingPursuit (EURUSD,H1)   
OrthogonalMatchingPursuit (EURUSD,H1)   Testing ONNX double: OrthogonalMatchingPursuit (orthogonal_matching_pursuit_double.onnx)
OrthogonalMatchingPursuit (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9962382642613388
OrthogonalMatchingPursuit (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3477379263364275
OrthogonalMatchingPursuit (EURUSD,H1)   MQL5:   Mean Squared Error: 49.7781401712817768

Comparison with the original double-precision model in Python:

Testing ONNX float: OrthogonalMatchingPursuit (orthogonal_matching_pursuit_float.onnx)
Python  Mean Absolute Error: 6.3477379263364275
MQL5:   Mean Absolute Error: 6.3477377671679385
        
Testing ONNX double: OrthogonalMatchingPursuit (orthogonal_matching_pursuit_double.onnx)
Python  Mean Absolute Error: 6.3477379263364275
MQL5:   Mean Absolute Error: 6.3477379263364275

ONNX float MAE precision: 6 decimal places; ONNX double MAE precision: 16 decimal places.
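The gap between the float and double results is in line with the precision of the underlying types: float32 carries roughly 7 significant decimal digits, float64 roughly 15 to 16. The following sketch does not use ONNX at all. It evaluates a linear model on the same synthetic data once in float64 and once in float32. The coefficient and intercept are hypothetical values chosen for illustration, not the parameters of the fitted model.

```python
import numpy as np

# same synthetic data as in the scripts of this article
x = np.arange(0, 100, 1, dtype=np.float64)
y = 4 * x + 10 * np.sin(x * 0.5)

# hypothetical linear model parameters (illustration only, not the fitted ones)
coef = 3.987654321012345
intercept = 1.234567890123456

# evaluate the model in double precision
pred64 = coef * x + intercept

# evaluate the same model with parameters, inputs and arithmetic in float32
pred32 = np.float32(coef) * x.astype(np.float32) + np.float32(intercept)

# compare the resulting error metric
mae64 = np.mean(np.abs(y - pred64))
mae32 = np.mean(np.abs(y - pred32.astype(np.float64)))

print("float32 decimal digits:", np.finfo(np.float32).precision)
print("float64 decimal digits:", np.finfo(np.float64).precision)
print("MAE (float64):", mae64)
print("MAE (float32):", mae32)
print("abs difference:", abs(mae64 - mae32))
```

The difference between the two MAE values is small but nonzero, which is the same kind of discrepancy seen between the float ONNX model and the original model.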

2.1.16.3. ONNX representation of orthogonal_matching_pursuit_float.onnx and orthogonal_matching_pursuit_double.onnx

Fig.56. ONNX representation of orthogonal_matching_pursuit_float.onnx in Netron

Fig.57. ONNX representation of orthogonal_matching_pursuit_double.onnx in Netron

2.1.17. sklearn.linear_model.PassiveAggressiveRegressor

PassiveAggressiveRegressor is a machine learning method used for regression tasks.

This method is a variant of the Passive-Aggressive (PA) algorithm that can be used to train a model capable of predicting continuous values of the target variable.

How PassiveAggressiveRegressor works:

Input data: It starts with a dataset consisting of features (independent variables) and values of the (continuous) target variable.
Supervised learning: PassiveAggressiveRegressor is a supervised learning method trained on pairs (X, y), where X represents the features and y the corresponding target values.
Adaptive learning: The main idea of the Passive-Aggressive method is adaptive learning. The model learns by minimizing the prediction error on each training example, and it is updated by correcting the weights to reduce that error.
Parameter C: PassiveAggressiveRegressor has a hyperparameter C that controls how strongly the model adapts to errors. A higher C means more aggressive weight updates, while a lower C makes the model less aggressive.
Prediction: Once trained, the model can predict target values for new data.
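The update described above can be sketched in a few lines. This is a minimal illustration of the PA-I update rule for regression with an epsilon-insensitive loss, written from the published algorithm and not taken from the scikit-learn source. The function name pa_regression_update is hypothetical, and the bias term is omitted for brevity.

```python
import numpy as np

def pa_regression_update(w, x, y, C=1.0, epsilon=0.1):
    """One PA-I step for regression; returns the updated weight vector."""
    error = y - np.dot(w, x)
    # epsilon-insensitive loss: zero while the error stays inside the tolerance
    loss = max(0.0, abs(error) - epsilon)
    norm_sq = np.dot(x, x)
    if loss == 0.0 or norm_sq == 0.0:
        return w  # passive: the prediction is already close enough
    # aggressive: move towards the target, with the step size capped by C
    tau = min(C, loss / norm_sq)
    return w + np.sign(error) * tau * x

w0 = np.zeros(2)
x = np.array([1.0, 2.0])
y = 5.0

# a large C lets the model correct the whole error in one step
w_large_c = pa_regression_update(w0, x, y, C=10.0)
# a small C caps the step, so the correction is only partial
w_small_c = pa_regression_update(w0, x, y, C=0.1)

print("C=10.0:", w_large_c, "prediction:", np.dot(w_large_c, x))
print("C=0.1: ", w_small_c, "prediction:", np.dot(w_small_c, x))
```

With a large C the updated prediction lands exactly on the edge of the tolerance band around the target. With a small C the step is capped, which is what makes the model "less aggressive".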

Advantages of PassiveAggressiveRegressor:

Adaptability: The method can adapt to changes in the data and update the model to minimize prediction errors.
Efficiency on large datasets: PassiveAggressiveRegressor can be an effective regression method, especially when trained on large volumes of data.

Limitations of PassiveAggressiveRegressor:

Sensitivity to the choice of C: Selecting a suitable value of C may require tuning and experimentation.
Additional features may be needed: In some cases, additional feature engineering may be required for the model to train successfully.

PassiveAggressiveRegressor is a machine learning method for regression tasks that learns adaptively by minimizing prediction errors on the training data. It can be valuable for handling large datasets and requires tuning of the C parameter for optimal performance.

2.1.17.1. Code for creating the PassiveAggressiveRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.PassiveAggressiveRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# PassiveAggressiveRegressor.py
# The code demonstrates the process of training PassiveAggressiveRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import PassiveAggressiveRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "PassiveAggressiveRegressor"
onnx_model_filename = data_path + "passive_aggressive_regressor"

# create a PassiveAggressiveRegressor model
regression_model = PassiveAggressiveRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the float ONNX model input
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the double ONNX model input
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  PassiveAggressiveRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9894376841493092
Python  Mean Absolute Error: 9.64524669506544
Python  Mean Squared Error: 139.76857373191007
Python  
Python  PassiveAggressiveRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\passive_aggressive_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9894376801868329
Python  Mean Absolute Error: 9.645248834431873
Python  Mean Squared Error: 139.76862616640122
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  3
Python  float ONNX model precision:  5
Python  
Python  PassiveAggressiveRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\passive_aggressive_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9894376841493092
Python  Mean Absolute Error: 9.64524669506544
Python  Mean Squared Error: 139.76857373191007
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  14
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  14

Fig.58. Results of PassiveAggressiveRegressor.py (double ONNX)

2.1.17.2. MQL5 code for executing ONNX models

This code executes the saved passive_aggressive_regressor_float.onnx and passive_aggressive_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                   PassiveAggressiveRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "PassiveAggressiveRegressor"
#define   ONNXFilenameFloat  "passive_aggressive_regressor_float.onnx"
#define   ONNXFilenameDouble "passive_aggressive_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

PassiveAggressiveRegressor (EURUSD,H1)  Testing ONNX float: PassiveAggressiveRegressor (passive_aggressive_regressor_float.onnx)
PassiveAggressiveRegressor (EURUSD,H1)  MQL5:   R-Squared (Coefficient of determination): 0.9894376801868329
PassiveAggressiveRegressor (EURUSD,H1)  MQL5:   Mean Absolute Error: 9.6452488344318716
PassiveAggressiveRegressor (EURUSD,H1)  MQL5:   Mean Squared Error: 139.7686261664012761
PassiveAggressiveRegressor (EURUSD,H1)  
PassiveAggressiveRegressor (EURUSD,H1)  Testing ONNX double: PassiveAggressiveRegressor (passive_aggressive_regressor_double.onnx)
PassiveAggressiveRegressor (EURUSD,H1)  MQL5:   R-Squared (Coefficient of determination): 0.9894376841493092
PassiveAggressiveRegressor (EURUSD,H1)  MQL5:   Mean Absolute Error: 9.6452466950654419
PassiveAggressiveRegressor (EURUSD,H1)  MQL5:   Mean Squared Error: 139.7685737319100667

Comparison with the original double-precision model in Python:

Testing ONNX float: PassiveAggressiveRegressor (passive_aggressive_regressor_float.onnx)
Python  Mean Absolute Error: 9.64524669506544
MQL5:   Mean Absolute Error: 9.6452488344318716
        
Testing ONNX double: PassiveAggressiveRegressor (passive_aggressive_regressor_double.onnx)
Python  Mean Absolute Error: 9.64524669506544
MQL5:   Mean Absolute Error: 9.6452466950654419

ONNX float MAE precision: 5 decimal places; ONNX double MAE precision: 14 decimal places.
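The decimal counts reported in this article come from a character-by-character comparison of the printed values. As a cross-check, the number of matching decimals can also be estimated numerically from the absolute difference. The helper below is a hypothetical alternative, not part of the original scripts. It can differ from the string-based count, for instance when the values are identical, in which case it returns the cap max_places (an assumed parameter).

```python
import math

def matching_decimals(value1, value2, max_places=16):
    """Estimate how many decimal places agree from the absolute difference."""
    diff = abs(value1 - value2)
    if diff == 0.0:
        return max_places  # identical values: report the cap
    # floor(-log10(diff)) is the number of leading zero decimals in diff
    return max(0, min(max_places, math.floor(-math.log10(diff))))

# MAE values taken from the output listings in this article
mae_python = 9.64524669506544
mae_onnx_float = 9.645248834431873

print("float ONNX model precision (estimated):",
      matching_decimals(mae_python, mae_onnx_float))
```

For the float models of PassiveAggressiveRegressor and OrthogonalMatchingPursuit this estimate agrees with the string-based counts reported above.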

2.1.17.3. ONNX representation of passive_aggressive_regressor_float.onnx and passive_aggressive_regressor_double.onnx

Fig.59. ONNX representation of passive_aggressive_regressor_float.onnx in Netron

Fig.60. ONNX representation of passive_aggressive_regressor_double.onnx in Netron

2.1.18. sklearn.linear_model.QuantileRegressor

QuantileRegressor es un método de aprendizaje automático utilizado para estimar cuantiles (percentiles específicos) de la variable objetivo en tareas de regresión.

En lugar de predecir el valor medio de la variable objetivo, como suele hacerse en las tareas de regresión, QuantileRegressor predice valores correspondientes a cuantiles especificados, como la mediana (percentil 50) o los percentiles 25 y 75.

How QuantileRegressor works:

Input data: It starts with a dataset containing features (independent variables) and a continuous target variable.
Quantile focus: Instead of predicting the expected value of the target, QuantileRegressor models the conditional distribution of the target and predicts the value of a chosen quantile of that distribution.
Training for different quantiles: Each fitted QuantileRegressor estimates a single quantile, so covering several quantiles means training a separate model for each one. Each of these models predicts the value corresponding to its own quantile.
Quantile parameter: The main parameter of the method is the quantile to predict (the quantile argument, a value between 0 and 1; the default, 0.5, is the median). For example, to get predictions for the median, train the model with quantile=0.5.
Quantile prediction: After training, the model predicts the value of the chosen quantile for new data.
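For reference, the objective minimized during training can be written as follows. This follows the formulation in the scikit-learn documentation, up to notation; the intercept term is omitted for brevity, q is the quantile parameter and alpha is the strength of the L1 penalty.

```latex
\min_{w}\; \frac{1}{n}\sum_{i=1}^{n} PB_q\!\left(y_i - x_i^{\top} w\right) + \alpha \lVert w \rVert_1,
\qquad
PB_q(t) = q \max(t, 0) + (1 - q) \max(-t, 0)
```

The pinball loss PB_q weights under-predictions by q and over-predictions by 1 - q. For q = 0.5 it is half the absolute error, which is why the default model estimates the conditional median.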

Advantages of QuantileRegressor:

Flexibility: QuantileRegressor can predict different quantiles, which is useful in tasks where several percentiles of the distribution matter.
Robustness to outliers: A quantile-based approach is robust to outliers because it does not rely on the mean, which extreme values can shift considerably.
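The robustness argument can be checked on a tiny sample. The sketch below is a pure-Python illustration, not scikit-learn code; the function names pinball_loss and best_constant are hypothetical. It searches the sample for the constant prediction with the lowest pinball loss and compares it with the mean.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Mean pinball (quantile) loss of the predictions for the given quantile."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        # under-prediction is weighted by quantile, over-prediction by (1 - quantile)
        total += quantile * diff if diff >= 0 else (quantile - 1.0) * diff
    return total / len(y_true)


def best_constant(y_true, quantile):
    """Sample value that minimizes the pinball loss when used as a constant prediction."""
    return min(y_true, key=lambda c: pinball_loss(y_true, [c] * len(y_true), quantile))


sample = [1.0, 2.0, 3.0, 4.0, 100.0]      # 100.0 is an outlier
mean_value = sum(sample) / len(sample)    # pulled up to 22.0 by the outlier
median_like = best_constant(sample, 0.5)  # stays at 3.0
upper = best_constant(sample, 0.9)        # a high quantile follows the upper tail

print(mean_value, median_like, upper)     # prints 22.0 3.0 100.0
```

The outlier moves the mean from the bulk of the data to 22.0, while the minimizer for quantile 0.5 stays at the sample median. A high quantile such as 0.9 is, by design, sensitive to the upper tail.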

Limitations of QuantileRegressor:

Need to select quantiles: Choosing the most useful quantiles may require some knowledge of the task.
Higher computational cost: Training a separate model for each quantile increases the overall computational cost.

In summary, QuantileRegressor predicts the values that correspond to specified quantiles of the target variable. It is useful in tasks where several percentiles of the distribution are of interest and in cases where the data may contain outliers.

2.1.18.1. Code for creating the QuantileRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.QuantileRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# QuantileRegressor.py
# The code demonstrates the process of training QuantileRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import QuantileRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "QuantileRegressor"
onnx_model_filename = data_path + "quantile_regressor"

# create a QuantileRegressor model
regression_model = QuantileRegressor(solver='highs')

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the input tensor of the float model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the input tensor of the double model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  QuantileRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9959915738839231
Python  Mean Absolute Error: 6.3693091850025185
Python  Mean Squared Error: 53.0425343337143
Python  
Python  QuantileRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\quantile_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9959915739158818
Python  Mean Absolute Error: 6.3693091422201125
Python  Mean Squared Error: 53.042533910812814
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  7
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  7
Python  
Python  QuantileRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\quantile_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9959915738839231
Python  Mean Absolute Error: 6.3693091850025185
Python  Mean Squared Error: 53.0425343337143
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  13
Python  double ONNX model precision:  16

Fig.61. Results of QuantileRegressor.py (float ONNX)
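The "matching decimal places" figures in this output can be reproduced by hand. The sketch below is a compact equivalent of the comparison done by compare_decimal_places (the name matching_decimals is hypothetical), applied to the MAE values printed above. Like the original helper, it compares only the digits after the decimal point in the default string form of each number, so it assumes equal integer parts and does not handle values printed in scientific notation.

```python
def matching_decimals(a, b):
    """Count the leading digits after the decimal point shared by str(a) and str(b)."""
    frac_a = str(a).partition(".")[2]   # empty string if there is no decimal point
    frac_b = str(b).partition(".")[2]
    count = 0
    for digit_a, digit_b in zip(frac_a, frac_b):
        if digit_a != digit_b:
            break
        count += 1
    return count


# MAE values copied from the output above
mae_original = 6.3693091850025185
mae_onnx_float = 6.3693091422201125
mae_onnx_double = 6.3693091850025185

float_places = matching_decimals(mae_original, mae_onnx_float)
double_places = matching_decimals(mae_original, mae_onnx_double)
print(float_places, double_places)      # prints 7 16
```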

2.1.18.2. MQL5 code for executing ONNX models

This code runs the saved quantile_regressor_float.onnx and quantile_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                            QuantileRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "QuantileRegressor"
#define   ONNXFilenameFloat  "quantile_regressor_float.onnx"
#define   ONNXFilenameDouble "quantile_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

QuantileRegressor (EURUSD,H1)   Testing ONNX float: QuantileRegressor (quantile_regressor_float.onnx)
QuantileRegressor (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9959915739158818
QuantileRegressor (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3693091422201169
QuantileRegressor (EURUSD,H1)   MQL5:   Mean Squared Error: 53.0425339108128071
QuantileRegressor (EURUSD,H1)   
QuantileRegressor (EURUSD,H1)   Testing ONNX double: QuantileRegressor (quantile_regressor_double.onnx)
QuantileRegressor (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9959915738839231
QuantileRegressor (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3693091850025185
QuantileRegressor (EURUSD,H1)   MQL5:   Mean Squared Error: 53.0425343337142721

Comparison with the original double-precision model in Python:

Testing ONNX float: QuantileRegressor (quantile_regressor_float.onnx)
Python  Mean Absolute Error: 6.3693091850025185
MQL5:   Mean Absolute Error: 6.3693091422201169

Testing ONNX double: QuantileRegressor (quantile_regressor_double.onnx)
Python  Mean Absolute Error: 6.3693091850025185
MQL5:   Mean Absolute Error: 6.3693091850025185

ONNX float MAE precision: 7 decimal places; ONNX double MAE precision: 16 decimal places.

2.1.18.3. ONNX representation of quantile_regressor_float.onnx and quantile_regressor_double.onnx

Fig.62. ONNX representation of quantile_regressor_float.onnx in Netron

Fig.63. ONNX representation of quantile_regressor_double.onnx in Netron

2.1.19. sklearn.linear_model.RANSACRegressor

RANSACRegressor is a machine learning method that solves regression problems using the RANSAC (Random Sample Consensus) algorithm.

RANSAC is designed for data that contains outliers or other imperfections. By excluding the influence of outliers, it produces a more robust regression model.

How RANSACRegressor works:

Input data: It starts with a dataset containing features (independent variables) and a continuous target variable.
Random subset selection: RANSAC begins by drawing random subsets of the data, which are used to train the regression model. These subsets are called "hypotheses".
Fitting the model to the hypotheses: A regression model is trained on each selected hypothesis. RANSACRegressor uses linear regression by default, fitted to the subset of data.
Outlier evaluation: After training, the model's fit is evaluated on all the data. For every data point, the error between the predicted and actual values is computed.
Outlier identification: Data points whose error exceeds a specified threshold are treated as outliers. If left in, such points could influence training and distort the results.
Model update: All data points not considered outliers (the inliers) are used to update the regression model. This process is repeated several times with different random hypotheses.
Final model: After the iterations, RANSACRegressor selects the model supported by the largest set of inliers, refits it on those inliers, and returns it as the final regression model.
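The steps above can be sketched in a few lines of pure Python. This is a toy illustration for fitting a line to one-dimensional data, not scikit-learn's implementation: the names fit_line and ransac_line, the fixed threshold, and the trial count are assumptions made for the example.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit y = a*x + b for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def ransac_line(points, threshold, trials=50, seed=0):
    """Toy RANSAC: keep the candidate line with the largest set of inliers."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(trials):
        # 1. draw a minimal random sample (two points define a line)
        sample = rng.sample(points, 2)
        if sample[0][0] == sample[1][0]:
            continue  # same x twice: cannot fit y = a*x + b
        a, b = fit_line(sample)
        # 2. points whose absolute residual is within the threshold are inliers
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= threshold]
        # 3. remember the largest consensus set seen so far
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # 4. refit on all inliers of the best candidate
    return fit_line(best_inliers), best_inliers


# ten points on y = 2x + 1 plus two gross outliers
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
data += [(3.5, 40.0), (7.5, -30.0)]

naive_slope, naive_intercept = fit_line(data)               # distorted by the outliers
(slope, intercept), inliers = ransac_line(data, threshold=0.5)
print("least squares on all data:", naive_slope, naive_intercept)
print("RANSAC:", slope, intercept, "inliers:", len(inliers))
```

On this data the RANSAC fit recovers the underlying line because the two outliers never enter the final refit, while the plain least-squares fit over all twelve points is pulled away from it. The sampling here is seeded to keep the sketch reproducible.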

Advantages of RANSACRegressor:

Robustness to outliers: RANSACRegressor is robust to outliers because it excludes them from training.
Robust regression: The method produces a more reliable regression model when the data contains outliers or other imperfections.

Limitations of RANSACRegressor:

Sensitivity to the error threshold: Choosing the error threshold that decides which points count as outliers may require experimentation.
Difficulty of hypothesis selection: Drawing good hypotheses at the initial stage is not always easy.

In summary, RANSACRegressor solves regression problems using the RANSAC algorithm. It produces a more robust regression model when the data contains outliers or other imperfections, because it excludes their influence on the model.

2.1.19.1. Code for creating the RANSACRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.RANSACRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# RANSACRegressor.py
# The code demonstrates the process of training RANSACRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import RANSACRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "RANSACRegressor"
onnx_model_filename = data_path + "ransac_regressor"

# create a RANSACRegressor model
regression_model = RANSACRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the input tensor of the float model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("ONNX: MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the input tensor of the double model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  RANSACRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336427
Python  Mean Squared Error: 49.77814017128179
Python  
Python  RANSACRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ransac_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382641628886
Python  Mean Absolute Error: 6.3477377671679385
Python  Mean Squared Error: 49.77814147404787
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  RANSACRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ransac_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642613388
Python  Mean Absolute Error: 6.347737926336427
Python  Mean Squared Error: 49.77814017128179
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Fig.64. Results of RANSACRegressor.py (float ONNX)

2.1.19.2. MQL5 code for executing ONNX models

This code executes the saved ransac_regressor_float.onnx and ransac_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                              RANSACRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "RANSACRegressor"
#define   ONNXFilenameFloat  "ransac_regressor_float.onnx"
#define   ONNXFilenameDouble "ransac_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

RANSACRegressor (EURUSD,H1)     Testing ONNX float: RANSACRegressor (ransac_regressor_float.onnx)
RANSACRegressor (EURUSD,H1)     MQL5:   R-Squared (Coefficient of determination): 0.9962382641628886
RANSACRegressor (EURUSD,H1)     MQL5:   Mean Absolute Error: 6.3477377671679385
RANSACRegressor (EURUSD,H1)     MQL5:   Mean Squared Error: 49.7781414740478638
RANSACRegressor (EURUSD,H1)     
RANSACRegressor (EURUSD,H1)     Testing ONNX double: RANSACRegressor (ransac_regressor_double.onnx)
RANSACRegressor (EURUSD,H1)     MQL5:   R-Squared (Coefficient of determination): 0.9962382642613388
RANSACRegressor (EURUSD,H1)     MQL5:   Mean Absolute Error: 6.3477379263364266
RANSACRegressor (EURUSD,H1)     MQL5:   Mean Squared Error: 49.7781401712817768

Comparison with the original double-precision model in Python:

Testing ONNX float: RANSACRegressor (ransac_regressor_float.onnx)
Python  Mean Absolute Error: 6.347737926336427
MQL5:   Mean Absolute Error: 6.3477377671679385
     
Testing ONNX double: RANSACRegressor (ransac_regressor_double.onnx)
Python  Mean Absolute Error: 6.347737926336427
MQL5:   Mean Absolute Error: 6.3477379263364266

ONNX float MAE accuracy: 6 decimal places; ONNX double MAE accuracy: 14 decimal places.
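The decimal counts quoted here can be reproduced by comparing the printed values digit by digit. The helper below is a minimal sketch and not part of the original scripts: `matching_decimals` is a hypothetical name, and it works directly on the strings as they appear in the logs so that no additional rounding is introduced.

```python
def matching_decimals(a: str, b: str) -> int:
    """Count identical leading digits after the decimal point."""
    frac_a = a.partition(".")[2]
    frac_b = b.partition(".")[2]
    count = 0
    for digit_a, digit_b in zip(frac_a, frac_b):
        if digit_a != digit_b:
            break
        count += 1
    return count


# values copied from the Python and MQL5 logs of the RANSACRegressor example
python_mae = "6.347737926336427"        # original model (Python)
mql5_float_mae = "6.3477377671679385"   # float ONNX model (MQL5)
mql5_double_mae = "6.3477379263364266"  # double ONNX model (MQL5)

float_digits = matching_decimals(python_mae, mql5_float_mae)
double_digits = matching_decimals(python_mae, mql5_double_mae)
print(float_digits, double_digits)  # 6 14
```

The double result is 14 rather than the 15 reported by the Python script because MQL5 prints with `%.16f`, one digit more than Python's default output. The last printed digit therefore differs even though both strings are consistent with the same double value.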

2.1.19.3. ONNX representation of ransac_regressor_float.onnx and ransac_regressor_double.onnx

Fig.65. ONNX representation of ransac_regressor_float.onnx in Netron

Fig.66. ONNX representation of ransac_regressor_double.onnx in Netron

2.1.20. sklearn.linear_model.TheilSenRegressor

Theil-Sen regression (the Theil-Sen estimator) is a regression estimation method used to approximate linear relationships between independent variables and the target variable.

It offers a more robust estimate than ordinary linear regression in the presence of outliers and noise in the data.

How Theil-Sen regression works:

Point selection: Theil-Sen starts from pairs of data points in the training set. The classical estimator considers every pair; for large datasets a random subset of pairs can be used instead.
Slope calculation: For each pair of data points, the method calculates the slope of the line passing through them, producing a set of slopes.
Median slope: The method then finds the median of this set of slopes. The median slope is used as the estimate of the slope of the linear regression.
Median deviations: For each data point, the method calculates the deviation (the difference between the actual value and the value predicted from the median slope) and finds the median of these deviations. This gives the estimate of the intercept of the linear regression.
Final estimate: The final estimates of the slope and intercept are used to build the linear regression model.
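The steps above can be sketched for the single-feature case as follows. This is a minimal illustration of the classical estimator, assuming one feature and using every pair of points. The function name `theil_sen_1d` and the toy data are hypothetical, and scikit-learn's TheilSenRegressor uses a more general multivariate procedure, so its coefficients need not match this sketch exactly.

```python
import itertools

import numpy as np


def theil_sen_1d(x, y):
    """Classical single-feature Theil-Sen fit (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # slope of the line through every pair of points with distinct x values
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in itertools.combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    # the median slope is the slope estimate
    slope = float(np.median(slopes))
    # the median deviation from the sloped line is the intercept estimate
    intercept = float(np.median(y - slope * x))
    return slope, intercept


# toy data: y = 2x + 1 with one gross outlier at index 7
x = np.arange(10)
y = 2.0 * x + 1.0
y[7] = 100.0

slope, intercept = theil_sen_1d(x, y)
print(slope, intercept)  # 2.0 1.0
```

Only 9 of the 45 pairwise slopes involve the outlier, so the median slope stays at the value of the clean data, which is the robustness property described in this section.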

Advantages of Theil-Sen regression:

Resistance to outliers: Theil-Sen regression is more resistant to outliers and noise in the data than ordinary linear regression.
Less strict assumptions: The method does not require strict assumptions about the data distribution or the form of the dependence, which makes it more versatile.
Suitable for multicollinear data: Theil-Sen regression works well with data in which the independent variables are highly correlated (the multicollinearity problem).

Limitations of Theil-Sen regression:

Computational complexity: Calculating the slopes for all pairs of data points and taking their median can be time-consuming, especially for large datasets.
Intercept estimation: The median of the deviations is used to estimate the intercept, which can lead to bias in the presence of outliers.
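To give a sense of the computational cost, the number of point pairs grows quadratically with the sample size. The short sketch below only counts pairs. It is an illustration, and the sample sizes other than 100 (the size of the synthetic dataset used in this article) are arbitrary.

```python
from math import comb

# number of distinct point pairs (and therefore pairwise slopes) for n samples
pair_counts = {n: comb(n, 2) for n in (100, 1_000, 10_000)}

for n, pairs in pair_counts.items():
    print(f"{n} samples -> {pairs} pairs")
```

For the 100-point dataset this is only 4950 slopes, but the count reaches tens of millions at 10,000 samples. scikit-learn's TheilSenRegressor exposes a `max_subpopulation` parameter that caps the number of subsets considered for this reason.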

Theil-Sen regression is a regression estimation method that provides a stable estimate of the linear relationship between the independent variables and the target variable, especially in the presence of outliers and noise in the data. It is useful when a stable estimate is needed on real-world data.

2.1.20.1. Code for creating the TheilSenRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.TheilSenRegressor model, trains it on synthetic data, saves it in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# TheilSenRegressor.py
# The code demonstrates the process of training TheilSenRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import TheilSenRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "TheilSenRegressor"
onnx_model_filename = data_path + "theil_sen_regressor"

# create a TheilSen Regressor model
regression_model = TheilSenRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 to match the ONNX input tensor type
X_float32 = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float32})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 to match the ONNX input tensor type
X_float64 = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_float64})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  TheilSenRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9962329196940459
Python  Mean Absolute Error: 6.338686004537594
Python  Mean Squared Error: 49.84886353898735
Python  
Python  TheilSenRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\theil_sen_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.996232919516505
Python  Mean Absolute Error: 6.338686370832071
Python  Mean Squared Error: 49.84886588834327
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  6
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  6
Python  
Python  TheilSenRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\theil_sen_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962329196940459
Python  Mean Absolute Error: 6.338686004537594
Python  Mean Squared Error: 49.84886353898735
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Fig.67. Results of TheilSenRegressor.py (float ONNX)

2.1.20.2. MQL5 code for executing ONNX models

This code executes the saved theil_sen_regressor_float.onnx and theil_sen_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                            TheilSenRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "TheilSenRegressor"
#define   ONNXFilenameFloat  "theil_sen_regressor_float.onnx"
#define   ONNXFilenameDouble "theil_sen_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

TheilSenRegressor (EURUSD,H1)   Testing ONNX float: TheilSenRegressor (theil_sen_regressor_float.onnx)
TheilSenRegressor (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9962329195165051
TheilSenRegressor (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3386863708320735
TheilSenRegressor (EURUSD,H1)   MQL5:   Mean Squared Error: 49.8488658883432691
TheilSenRegressor (EURUSD,H1)   
TheilSenRegressor (EURUSD,H1)   Testing ONNX double: TheilSenRegressor (theil_sen_regressor_double.onnx)
TheilSenRegressor (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9962329196940459
TheilSenRegressor (EURUSD,H1)   MQL5:   Mean Absolute Error: 6.3386860045375943
TheilSenRegressor (EURUSD,H1)   MQL5:   Mean Squared Error: 49.8488635389873735

Comparison with the original double-precision model in Python:

Testing ONNX float: TheilSenRegressor (theil_sen_regressor_float.onnx)
Python  Mean Absolute Error: 6.338686004537594
MQL5:   Mean Absolute Error: 6.3386863708320735
        
Testing ONNX double: TheilSenRegressor (theil_sen_regressor_double.onnx)
Python  Mean Absolute Error: 6.338686004537594
MQL5:   Mean Absolute Error: 6.3386860045375943

ONNX float MAE accuracy: 6 decimal places; ONNX double MAE accuracy: 15 decimal places.
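The gap between 6 and 15 matching decimals is consistent with the resolution of the two floating-point types. The sketch below is an illustration, not the ONNX runtime itself: it evaluates the same linear prediction once in float64 and once in float32 on the article's synthetic data. The coefficients are hypothetical values chosen only for the demonstration, not those of the trained model.

```python
import numpy as np

# same synthetic data as in the article's scripts
x = np.arange(0, 100, 1, dtype=np.float64)
y = 4 * x + 10 * np.sin(x * 0.5)

# hypothetical coefficients, chosen only for illustration
slope, intercept = 3.987654321012345, 0.123456789012345

# prediction with float64 parameters and float64 arithmetic
pred64 = slope * x + intercept

# prediction with float32 parameters and float32 arithmetic
pred32 = np.float32(slope) * x.astype(np.float32) + np.float32(intercept)

mae64 = np.mean(np.abs(y - pred64))
mae32 = np.mean(np.abs(y - pred32.astype(np.float64)))
rel_diff = abs(mae64 - mae32) / mae64

print("MAE float64 :", mae64)
print("MAE float32 :", mae32)
print("relative gap:", rel_diff)
```

A float32 value carries roughly 7 significant decimal digits, so a metric computed from float32 predictions can be expected to agree with the double-precision result only to about that many digits, as observed for the float ONNX models.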

2.1.20.3. ONNX representation of theil_sen_regressor_float.onnx and theil_sen_regressor_double.onnx

Fig.68. ONNX representation of theil_sen_regressor_float.onnx in Netron

Fig.69. ONNX representation of theil_sen_regressor_double.onnx in Netron

2.1.21. sklearn.svm.LinearSVR

LinearSVR (Linear Support Vector Regression) is a machine learning model for regression tasks based on the Support Vector Machines (SVM) method.

This method is used to find linear relationships between the features and the target variable using a linear kernel.

How LinearSVR works:

Input data: LinearSVR starts with a dataset that includes features (independent variables) and their corresponding target variable values.
Choice of a linear model: The model assumes that there is a linear relationship between the features and the target variable, described by a linear regression equation.
Model training: LinearSVR finds optimal values for the model coefficients by minimizing a loss function that takes into account the prediction error and an acceptable error (epsilon).
Generating predictions: After training, the model can predict target variable values for new data based on the coefficients found.
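The loss mentioned in the training step can be sketched as follows. This is a simplified illustration of the epsilon-insensitive loss and of an L2-regularized objective built on it. The function names and the toy numbers are hypothetical, and the actual solver used by scikit-learn handles details such as the intercept differently.

```python
import numpy as np


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors inside the epsilon tube cost nothing; outside it the cost grows linearly."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)


def linear_svr_objective(w, b, X, y, C=1.0, epsilon=0.0):
    """Simplified objective: 0.5*||w||^2 + C * sum of epsilon-insensitive losses."""
    y_pred = X @ w + b
    penalty = 0.5 * float(w @ w)
    data_term = C * float(np.sum(epsilon_insensitive_loss(y, y_pred, epsilon)))
    return penalty + data_term


# toy predictions: the first error (0.05) falls inside the tube of width 0.1
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 1.0])
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)

# a perfect linear fit leaves only the regularization term
X = np.array([[1.0], [2.0]])
objective = linear_svr_objective(np.array([2.0]), 0.0, X, np.array([2.0, 4.0]))
print(objective)
```

A larger epsilon ignores more small errors, while a larger C puts more weight on the data term relative to the regularization term.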

Ventajas de LinearSVR:

Regresión de vectores de apoyo: LinearSVR emplea el método SVM (Support Vector Machines / Máquinas de vectores de soporte), que permite encontrar la separación óptima entre los datos considerando un error aceptable.
Soporte para múltiples características: El modelo puede manejar múltiples características y procesar datos en altas dimensiones.
Regularización: LinearSVR incluye regularización, lo que ayuda a combatir el sobreajuste y garantiza predicciones más estables.

Limitations of LinearSVR:

Linearity: LinearSVR is restricted to linear relationships between the features and the target variable. For complex, nonlinear relationships the model may not be flexible enough.
Sensitivity to outliers: The model can be sensitive to outliers in the data and to the choice of the acceptable error (epsilon).
Inability to capture complex relationships: Like other linear models, LinearSVR cannot capture complex nonlinear relationships between the features and the target variable.
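The linearity limitation can be seen on the synthetic data used throughout this article. The sketch below fits an ordinary least-squares line in pure Python as a stand-in for any linear model (it is an assumption made for illustration and is not LinearSVR itself): the line recovers the trend 4x, while the sine component stays in the residuals.

```python
import math

# Same synthetic data as the article's scripts: y = 4x + 10*sin(0.5x), x = 0..99
xs = [float(i) for i in range(100)]
ys = [4.0 * x + 10.0 * math.sin(0.5 * x) for x in xs]

# Ordinary least-squares line (stand-in for a linear model, not LinearSVR)
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residuals contain the part of the signal the line cannot represent
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
mae = sum(abs(r) for r in residuals) / n
ss_res = sum(r * r for r in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r2 = 1.0 - ss_res / ss_tot

print("slope:", slope, "intercept:", intercept)
print("MAE:", mae, "R^2:", r2)
```

R^2 stays high because the linear trend dominates the variance of y, yet the mean absolute error remains of the same order as the amplitude of the sine term.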

LinearSVR is a regression model that uses the Support Vector Machines (SVM) method to find linear relationships between the features and the target variable. It supports regularization and suits tasks where controlling the acceptable error is essential. However, the model is limited to linear dependencies and can be sensitive to outliers.

2.1.21.1. Code for creating the LinearSVR model and exporting it to ONNX for float and double

This code creates the sklearn.svm.LinearSVR model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# LinearSVR.py
# The code demonstrates the process of training LinearSVR model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import LinearSVR
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "LinearSVR"
onnx_model_filename = data_path + "linear_svr"

# create a Linear SVR model
linear_svr_model = LinearSVR()

# fit the model to the data
linear_svr_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = linear_svr_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(linear_svr_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the model's tensor(float) input
X_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(linear_svr_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the model's tensor(double) input
X_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  LinearSVR Original model (double)
Python  R-squared (Coefficient of determination): 0.9944935515149387
Python  Mean Absolute Error: 7.026852359381935
Python  Mean Squared Error: 72.86550241109444
Python  
Python  LinearSVR ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\linear_svr_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9944935580726729
Python  Mean Absolute Error: 7.026849848037511
Python  Mean Squared Error: 72.86541563418206
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  4
Python  MSE matching decimal places:  3
Python  float ONNX model precision:  4
Python  
Python  LinearSVR ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\linear_svr_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9944935515149387
Python  Mean Absolute Error: 7.026852359381935
Python  Mean Squared Error: 72.86550241109444
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  14
Python  double ONNX model precision:  15

Fig.70. Results of LinearSVR.py (float ONNX)
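The float model agrees with the original only to a few decimal places because a float32 value has a 24-bit significand, which corresponds to roughly 7 significant decimal digits. The sketch below rounds the original model's MAE from the output above to float32 using only the standard library; the helper name to_float32 is an assumption made for illustration. It shows the size of a single rounding step, while the error accumulated inside a model can be larger.

```python
import struct

def to_float32(value):
    """Round a Python float (double precision) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", value))[0]

mae_double = 7.026852359381935          # MAE of the original model (from the output above)
mae_as_float32 = to_float32(mae_double)

print("double  :", repr(mae_double))
print("float32 :", repr(mae_as_float32))
print("abs diff:", abs(mae_double - mae_as_float32))
```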

2.1.21.2. MQL5 code for executing ONNX models

This code runs the saved linear_svr_float.onnx and linear_svr_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                    LinearSVR.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LinearSVR"
#define   ONNXFilenameFloat  "linear_svr_float.onnx"
#define   ONNXFilenameDouble "linear_svr_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

LinearSVR (EURUSD,H1)   Testing ONNX float: LinearSVR (linear_svr_float.onnx)
LinearSVR (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9944935580726729
LinearSVR (EURUSD,H1)   MQL5:   Mean Absolute Error: 7.0268498480375108
LinearSVR (EURUSD,H1)   MQL5:   Mean Squared Error: 72.8654156341820567
LinearSVR (EURUSD,H1)   
LinearSVR (EURUSD,H1)   Testing ONNX double: LinearSVR (linear_svr_double.onnx)
LinearSVR (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9944935515149387
LinearSVR (EURUSD,H1)   MQL5:   Mean Absolute Error: 7.0268523593819374
LinearSVR (EURUSD,H1)   MQL5:   Mean Squared Error: 72.8655024110944680

Comparison with the original double precision model in Python:

Testing ONNX float: LinearSVR (linear_svr_float.onnx)
Python  Mean Absolute Error: 7.026852359381935
MQL5:   Mean Absolute Error: 7.0268498480375108
   
Testing ONNX double: LinearSVR (linear_svr_double.onnx)
Python  Mean Absolute Error: 7.026852359381935
MQL5:   Mean Absolute Error: 7.0268523593819374

Accuracy of ONNX float MAE: 4 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
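The decimal-place counts can be reproduced from the printed values. This is a minimal sketch that compares the numbers as strings, in the same spirit as compare_decimal_places in the Python script; the function name matching_decimals is an assumption, and the approach is only meaningful when the integer parts are equal.

```python
def matching_decimals(a: str, b: str) -> int:
    """Count the leading decimal digits shared by two numbers given as strings."""
    frac_a = a.split(".")[1]
    frac_b = b.split(".")[1]
    count = 0
    for digit_a, digit_b in zip(frac_a, frac_b):
        if digit_a != digit_b:
            break
        count += 1
    return count

python_mae = "7.026852359381935"        # original scikit-learn model
mql5_mae_float = "7.0268498480375108"   # linear_svr_float.onnx run in MQL5
mql5_mae_double = "7.0268523593819374"  # linear_svr_double.onnx run in MQL5

print(matching_decimals(python_mae, mql5_mae_float))   # 4
print(matching_decimals(python_mae, mql5_mae_double))  # 14
```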

2.1.21.3. ONNX representation of linear_svr_float.onnx and linear_svr_double.onnx

Fig.71. ONNX representation of linear_svr_float.onnx in Netron

Fig.72. ONNX representation of linear_svr_double.onnx in Netron

2.1.22. sklearn.neural_network.MLPRegressor

MLPRegressor (Multi-Layer Perceptron Regressor) is a machine learning model that uses artificial neural networks for regression tasks.

It is a multilayer neural network made up of several layers of neurons (input, hidden, and output layers) that are trained to predict continuous values of the target variable.

How MLPRegressor works:

Input data: It starts with a dataset that contains features (independent variables) and their corresponding target variable values.
Creating a multilayer neural network: MLPRegressor uses a neural network with one or more hidden layers of neurons. The neurons are linked by weighted connections and apply activation functions.
Training the model: MLPRegressor trains the network by adjusting the weights and biases to minimize a loss function that measures the discrepancy between the network's predictions and the actual target values. The gradients needed for these updates are computed by backpropagation.
Generating predictions: After training, the model can predict target variable values for new data.
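The prediction step of such a network is a chain of weighted sums and activations. The sketch below is a toy forward pass in pure Python with ReLU on the hidden layer and an identity output, as is usual for regression; the function names and the hand-picked weights are assumptions made for illustration and do not come from a trained MLPRegressor.

```python
def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer; weights[j][i] links input i to neuron j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """layers is a list of (weights, biases); ReLU on hidden layers, identity on the last."""
    activations = x
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:
            activations = relu(activations)
    return activations

# Toy network: 1 input -> 2 hidden neurons -> 1 output (weights chosen by hand)
layers = [
    ([[1.0], [-1.0]], [0.0, 0.0]),  # hidden layer: h1 = relu(x), h2 = relu(-x)
    ([[1.0, 1.0]], [0.0]),          # output layer: y = h1 + h2, which equals |x|
]

print(mlp_forward([3.0], layers))   # [3.0]
print(mlp_forward([-2.0], layers))  # [2.0]
```

Even this two-neuron network represents a nonlinear function (the absolute value), which a single linear model cannot do.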

Advantages of MLPRegressor:

Flexibility: Multilayer neural networks can model complex nonlinear relationships between the features and the target variable.
Versatility: MLPRegressor can be used for a variety of regression tasks, including time series problems, function approximation, and more.
Generalization ability: Neural networks learn from data and can generalize the dependencies found in the training data to new data.

Limitations of MLPRegressor:

Model complexity: Large neural networks can be computationally expensive and require a lot of data for training.
Hyperparameter tuning: Choosing optimal hyperparameters (number of layers, number of neurons per layer, learning rate, etc.) may require experimentation.
Susceptibility to overfitting: Large neural networks can be prone to overfitting when data is scarce or regularization is insufficient.

MLPRegressor is a powerful machine learning model based on multilayer neural networks and can be used for a wide range of regression tasks. The model is flexible, but it requires careful tuning and training on large volumes of data to achieve optimal results.

2.1.22.1. Code for creating the MLPRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.neural_network.MLPRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# MLPRegressor.py
# The code demonstrates the process of training MLPRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "MLPRegressor"
onnx_model_filename = data_path + "mlp_regressor"

# create an MLP Regressor model
mlp_regressor_model = MLPRegressor(hidden_layer_sizes=(100, 50), activation='relu', max_iter=1000)

# fit the model to the data
mlp_regressor_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = mlp_regressor_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(mlp_regressor_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the model's tensor(float) input
X_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(mlp_regressor_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the model's tensor(double) input
X_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  MLPRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9874070836467945
Python  Mean Absolute Error: 10.62249788982753
Python  Mean Squared Error: 166.63901957615224
Python  
Python  MLPRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\mlp_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9874070821340352
Python  Mean Absolute Error: 10.62249972216809
Python  Mean Squared Error: 166.63903959413219
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  5
Python  
Python  MLPRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\mlp_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9874070836467945
Python  Mean Absolute Error: 10.622497889827532
Python  Mean Squared Error: 166.63901957615244
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  14
Python  MSE matching decimal places:  12
Python  double ONNX model precision:  14

Fig.73. Results of MLPRegressor.py (float ONNX)

2.1.22.2. MQL5 code for executing ONNX models

This code runs the saved mlp_regressor_float.onnx and mlp_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                 MLPRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "MLPRegressor"
#define   ONNXFilenameFloat  "mlp_regressor_float.onnx"
#define   ONNXFilenameDouble "mlp_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

MLPRegressor (EURUSD,H1)        Testing ONNX float: MLPRegressor (mlp_regressor_float.onnx)
MLPRegressor (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9875198695654352
MLPRegressor (EURUSD,H1)        MQL5:   Mean Absolute Error: 10.5596681685341309
MLPRegressor (EURUSD,H1)        MQL5:   Mean Squared Error: 165.1465507645494597
MLPRegressor (EURUSD,H1)        
MLPRegressor (EURUSD,H1)        Testing ONNX double: MLPRegressor (mlp_regressor_double.onnx)
MLPRegressor (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9875198617341387
MLPRegressor (EURUSD,H1)        MQL5:   Mean Absolute Error: 10.5596715833884609
MLPRegressor (EURUSD,H1)        MQL5:   Mean Squared Error: 165.1466543942046599

Comparison with the original double-precision model in Python:

Testing ONNX float: MLPRegressor (mlp_regressor_float.onnx)
Python  Mean Absolute Error: 10.62249788982753
MQL5:   Mean Absolute Error: 10.6224997221680901

Testing ONNX double: MLPRegressor (mlp_regressor_double.onnx)
Python  Mean Absolute Error: 10.62249788982753
MQL5:   Mean Absolute Error: 10.6224978898275282

ONNX float MAE precision: 5 decimal places. ONNX double MAE precision: 13 decimal places.
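
The decimal counts quoted above can be checked by comparing the printed MAE values digit by digit after the decimal point. The following is a minimal Python sketch of that check. The helper name matching_decimals is hypothetical, the values are copied as strings from the comparison above, and the integer parts are assumed to be equal.

```python
def matching_decimals(a: str, b: str) -> int:
    """Count how many leading digits after the decimal point agree."""
    frac_a = a.split(".")[1]
    frac_b = b.split(".")[1]
    count = 0
    for da, db in zip(frac_a, frac_b):
        if da != db:
            break
        count += 1
    return count

# values copied from the comparison above (kept as strings to avoid re-rounding)
python_mae = "10.62249788982753"          # original scikit-learn model
mql5_float_mae = "10.6224997221680901"    # float ONNX model executed in MQL5
mql5_double_mae = "10.6224978898275282"   # double ONNX model executed in MQL5

float_digits = matching_decimals(python_mae, mql5_float_mae)
double_digits = matching_decimals(python_mae, mql5_double_mae)
print("float ONNX matching decimals: ", float_digits)
print("double ONNX matching decimals:", double_digits)
```

Working on the printed strings rather than on parsed numbers keeps the comparison independent of how each environment formats its floating-point output.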

2.1.22.3. ONNX representation of mlp_regressor_float.onnx and mlp_regressor_double.onnx

Fig.74. ONNX representation of mlp_regressor_float.onnx in Netron

Fig.75. ONNX representation of mlp_regressor_double.onnx in Netron

2.1.23. sklearn.cross_decomposition.PLSRegression

PLSRegression (Partial Least Squares Regression) is a machine learning method used to solve regression problems.

It belongs to the PLS family of methods and is used to analyze and model relationships between two sets of variables, where one set serves as predictors and the other as target variables.

How PLSRegression works:

Input data: It starts with two data sets, labeled X and Y. X contains the independent variables (predictors), and Y contains the target (dependent) variables.
Selection of linear combinations: PLSRegression identifies linear combinations (components) of X and Y that maximize the covariance between them. These are called PLS components.
Covariance maximization: The main goal of PLSRegression is to find PLS components that maximize the covariance between X and Y, which extracts the most informative relationships between the predictors and the target variables.
Model training: Once the PLS components are found, they are used to build a model that predicts Y from X.
Generating predictions: After training, the model predicts Y values for new data from the corresponding X values.
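
The steps above can be sketched for the simplest case of one predictor, one target and one component. The code below is a plain-Python illustration under those assumptions, not scikit-learn's implementation. The helper name standardize and all variable names are hypothetical. It uses the same synthetic data as the article's scripts.

```python
import math

# synthetic data used in the article: y = 4x + 10*sin(0.5x)
x = [float(i) for i in range(100)]
y = [4.0 * xi + 10.0 * math.sin(0.5 * xi) for xi in x]

def standardize(values):
    """Center to zero mean and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values], mean, std

xs, x_mean, x_std = standardize(x)
ys, y_mean, y_std = standardize(y)

# weight: unit-length direction in X that maximizes covariance with y
# (with a single predictor it reduces to the sign of the covariance)
cov = sum(a * b for a, b in zip(xs, ys))
w = cov / abs(cov)

# scores: projection of X onto the weight (the single PLS component)
t = [w * a for a in xs]

# y-loading: least-squares regression of y on the scores
q = sum(a * b for a, b in zip(ys, t)) / sum(a * a for a in t)

# predictions mapped back to the original scale of y
y_pred = [y_mean + y_std * q * ti for ti in t]

# regression metrics
ss_res = sum((a - b) ** 2 for a, b in zip(y, y_pred))
ss_tot = sum((a - y_mean) ** 2 for a in y)
r2 = 1.0 - ss_res / ss_tot
mae = sum(abs(a - b) for a, b in zip(y, y_pred)) / len(y)
print("R-squared:", r2)
print("Mean Absolute Error:", mae)
```

With a single predictor the one-component PLS fit coincides with ordinary least squares, so the metrics should land close to those that PLSRegression.py reports for the original model (R-squared of about 0.9962).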

Advantages of PLSRegression:

Correlation analysis: PLSRegression makes it possible to analyze and model the correlations between two sets of variables, which helps in understanding the relationships between the predictors and the target variables.
Dimensionality reduction: The method can also be used to reduce the dimensionality of the data by identifying the most important PLS components.

Limitations of PLSRegression:

Sensitivity to the number of components: Selecting the optimal number of PLS components may require some experimentation.
Dependence on data structure: The results of PLSRegression can depend heavily on the structure of the data and the correlations within it.

PLSRegression is a machine learning method used to analyze and model correlations between two sets of variables, where one set acts as predictors and the other as target variables. It makes it possible to study relationships within the data and can be useful for reducing dimensionality and predicting target values from the predictors.

2.1.23.1. Code for creating the PLSRegression model and exporting it to ONNX for float and double

This code creates the sklearn.cross_decomposition.PLSRegression model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# PLSRegression.py
# The code demonstrates the process of training PLSRegression model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "PLSRegression"
onnx_model_filename = data_path + "pls_regression"

# create a PLSRegression model
pls_model = PLSRegression(n_components=1)

# fit the model to the data
pls_model.fit(X, y)

# predict values for the entire dataset
y_pred = pls_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(pls_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 for the ONNX runtime
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(pls_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 for the ONNX runtime
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  PLSRegression Original model (double)
Python  R-squared (Coefficient of determination): 0.9962382642613388
Python  Mean Absolute Error: 6.3477379263364275
Python  Mean Squared Error: 49.778140171281805
Python  
Python  PLSRegression ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\pls_regression_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382638567003
Python  Mean Absolute Error: 6.3477379221400145
Python  Mean Squared Error: 49.778145525764096
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  8
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  8
Python  
Python  PLSRegression ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\pls_regression_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9962382642613388
Python  Mean Absolute Error: 6.3477379263364275
Python  Mean Squared Error: 49.778140171281805
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  15
Python  double ONNX model precision:  16

Fig.76. Results of PLSRegression.py (float ONNX)

2.1.23.2. MQL5 code for executing ONNX models

This code runs the saved pls_regression_float.onnx and pls_regression_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                PLSRegression.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "PLSRegression"
#define   ONNXFilenameFloat  "pls_regression_float.onnx"
#define   ONNXFilenameDouble "pls_regression_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

PLSRegression (EURUSD,H1)       Testing ONNX float: PLSRegression (pls_regression_float.onnx)
PLSRegression (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382638567003
PLSRegression (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3477379221400145
PLSRegression (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781455257640815
PLSRegression (EURUSD,H1)       
PLSRegression (EURUSD,H1)       Testing ONNX double: PLSRegression (pls_regression_double.onnx)
PLSRegression (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9962382642613388
PLSRegression (EURUSD,H1)       MQL5:   Mean Absolute Error: 6.3477379263364275
PLSRegression (EURUSD,H1)       MQL5:   Mean Squared Error: 49.7781401712817839

Comparison with the original double-precision model in Python:

Testing ONNX float: PLSRegression (pls_regression_float.onnx)
Python  Mean Absolute Error: 6.3477379263364275
MQL5:   Mean Absolute Error: 6.3477379221400145
       
Testing ONNX double: PLSRegression (pls_regression_double.onnx)
Python  Mean Absolute Error: 6.3477379263364275
MQL5:   Mean Absolute Error: 6.3477379263364275

ONNX float MAE precision: 8 decimal places. ONNX double MAE precision: 16 decimal places.
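
The gap between the float and double results is consistent with the storage formats themselves: a 32-bit float has a 24-bit significand (roughly 7 significant decimal digits), while a 64-bit double has 53 bits (roughly 15 to 16 digits). The sketch below illustrates this with the standard library only. The helper name to_float32 is hypothetical, and the MAE value is taken from the output above purely as a sample number.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]

mae_double = 6.3477379263364275          # sample value from the output above
mae_as_float32 = to_float32(mae_double)  # same value after a float32 round trip

abs_error = abs(mae_as_float32 - mae_double)
rel_error = abs_error / mae_double

print(f"double value  : {mae_double!r}")
print(f"float32 value : {mae_as_float32!r}")
print(f"relative error: {rel_error:.3e}")
print(f"float32 bound : {2.0 ** -24:.3e}")
```

The relative rounding error of a single float32 value stays below 2^-24, about 6e-8. An aggregated metric such as MAE can agree with the double-precision result to slightly more or fewer places than that bound suggests, because the individual rounding errors partly cancel or accumulate.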

2.1.23.3. ONNX representation of pls_regression_float.onnx and pls_regression_double.onnx

Fig.77. ONNX representation of pls_regression_float.onnx in Netron

Fig.78. ONNX representation of pls_regression_double.onnx in Netron

2.1.24. sklearn.linear_model.TweedieRegressor

TweedieRegressor is a generalized linear model for regression based on the Tweedie distribution. The Tweedie distribution is a family of probability distributions that can describe a wide range of data, including data whose variance changes with the mean. TweedieRegressor is applied in regression tasks where the target variable has characteristics consistent with a Tweedie distribution.

How TweedieRegressor works:

Target variable and Tweedie distribution: TweedieRegressor assumes that the target variable follows a Tweedie distribution. The distribution depends on the parameter 'p', which determines its shape and how the variance grows with the mean.
Model training: TweedieRegressor trains a regression model to predict the target variable from the independent variables (features). The model is fitted by maximizing the likelihood of the data under the Tweedie distribution.
Choosing the parameter 'p': Selecting 'p' (the power argument in scikit-learn) is a crucial step when using TweedieRegressor. Different values of 'p' correspond to different types of data: p=0 corresponds to the normal distribution, p=1 to the Poisson distribution, and p=2 to the Gamma distribution.
Response transformation: The relationship between the predictors and the target may involve a transformation tied to 'p', such as a logarithm. In scikit-learn this is expressed through the link function (identity or log) rather than by transforming the target manually.
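
To make the role of 'p' concrete, the following sketch evaluates the Tweedie unit deviance, the quantity a Tweedie model penalizes for a single observation y with predicted mean mu. It uses the standard textbook formulas and is not scikit-learn's code. The function name tweedie_unit_deviance is hypothetical.

```python
import math

def tweedie_unit_deviance(y: float, mu: float, p: float) -> float:
    """Unit deviance of the Tweedie family with variance function V(mu) = mu**p."""
    if p == 0:                      # Normal: plain squared error
        return (y - mu) ** 2
    if p == 1:                      # Poisson (requires y >= 0, mu > 0)
        term = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (term - y + mu)
    if p == 2:                      # Gamma (requires y > 0, mu > 0)
        return 2.0 * (math.log(mu / y) + y / mu - 1.0)
    # general case; the Tweedie family is not defined for 0 < p < 1
    return 2.0 * (
        max(y, 0.0) ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
        - y * mu ** (1.0 - p) / (1.0 - p)
        + mu ** (2.0 - p) / (2.0 - p)
    )

# named special cases of the family
families = {0: "Normal", 1: "Poisson", 2: "Gamma", 3: "Inverse Gaussian"}
for p, name in families.items():
    d = tweedie_unit_deviance(3.0, 2.0, p)
    print(f"p={p} ({name}): deviance(y=3, mu=2) = {d:.6f}")
```

The same prediction error is penalized differently depending on 'p': larger values of 'p' assume that the variance grows faster with the mean, so a given absolute error at a given mean counts for less.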

Advantages of TweedieRegressor:

Ability to model data with non-constant variance: The Tweedie distribution can adapt to data with different variance structures, which is valuable for real-world data where the variance is not constant.
Range of 'p' values: The ability to choose different values of 'p' makes it possible to model various types of data.

Limitations of TweedieRegressor:

Complexity of choosing 'p': Selecting the right value of 'p' may require knowledge of the data and experimentation.
Conformity with the Tweedie distribution: For TweedieRegressor to work well, the target variable must be consistent with a Tweedie distribution. If it is not, model performance may be poor.

TweedieRegressor is a regression method that uses the Tweedie distribution to model data with varying variance structures. It is useful in regression tasks where the target variable fits a Tweedie distribution, and it can be tuned with different values of 'p' to match the data better.

2.1.24.1. Code for creating the TweedieRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.TweedieRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# TweedieRegressor.py
# The code demonstrates the process of training TweedieRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import TweedieRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "TweedieRegressor"
onnx_model_filename = data_path + "tweedie_regressor"

# create a Tweedie Regressor model
regression_model = TweedieRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

2023.10.31 11:39:36.223 Python  TweedieRegressor Original model (double)
2023.10.31 11:39:36.223 Python  R-squared (Coefficient of determination): 0.9962368328117072
2023.10.31 11:39:36.223 Python  Mean Absolute Error: 6.342397897667562
2023.10.31 11:39:36.223 Python  Mean Squared Error: 49.797082198408745
2023.10.31 11:39:36.223 Python  
2023.10.31 11:39:36.223 Python  TweedieRegressor ONNX model (float)
2023.10.31 11:39:36.223 Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\tweedie_regressor_float.onnx
2023.10.31 11:39:36.253 Python  Information about input tensors in ONNX:
2023.10.31 11:39:36.253 Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
2023.10.31 11:39:36.253 Python  Information about output tensors in ONNX:
2023.10.31 11:39:36.253 Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
2023.10.31 11:39:36.253 Python  R-squared (Coefficient of determination) 0.9962368338709323
2023.10.31 11:39:36.253 Python  Mean Absolute Error: 6.342397072978867
2023.10.31 11:39:36.253 Python  Mean Squared Error: 49.797068181938165
2023.10.31 11:39:36.253 Python  R^2 matching decimal places:  8
2023.10.31 11:39:36.253 Python  MAE matching decimal places:  6
2023.10.31 11:39:36.253 Python  MSE matching decimal places:  4
2023.10.31 11:39:36.253 Python  float ONNX model precision:  6
2023.10.31 11:39:36.613 Python  
2023.10.31 11:39:36.613 Python  TweedieRegressor ONNX model (double)
2023.10.31 11:39:36.613 Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\tweedie_regressor_double.onnx
2023.10.31 11:39:36.613 Python  Information about input tensors in ONNX:
2023.10.31 11:39:36.613 Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
2023.10.31 11:39:36.613 Python  Information about output tensors in ONNX:
2023.10.31 11:39:36.628 Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
2023.10.31 11:39:36.628 Python  R-squared (Coefficient of determination) 0.9962368328117072
2023.10.31 11:39:36.628 Python  Mean Absolute Error: 6.342397897667562
2023.10.31 11:39:36.628 Python  Mean Squared Error: 49.797082198408745
2023.10.31 11:39:36.628 Python  R^2 matching decimal places:  16
2023.10.31 11:39:36.628 Python  MAE matching decimal places:  15
2023.10.31 11:39:36.628 Python  MSE matching decimal places:  15
2023.10.31 11:39:36.628 Python  double ONNX model precision:  15

Fig.79. Results of TweedieRegressor.py (float ONNX)
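The float results above agree with the original model only to about 6 to 8 decimal places. One way to see where that limit comes from is to round a double to float32 and back. The sketch below does this with the standard struct module; to_float32 is a hypothetical helper introduced here, and the example shows only the storage precision of a single value, not the full arithmetic inside the ONNX runtime.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

# MAE of the original double-precision model, copied from the output above
mae_double = 6.342397897667562
mae_as_f32 = to_float32(mae_double)

# relative rounding error of a single float32 store (at most 2**-24)
rel_err = abs(mae_as_f32 - mae_double) / mae_double
print("double :", mae_double)
print("float32:", mae_as_f32)
print("relative error:", rel_err)
```

A relative error bounded by 2^-24 (about 6e-8) corresponds to roughly seven significant decimal digits, which is consistent with the number of matching decimals reported for the float model.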

2.1.24.2. MQL5 code for executing ONNX models

This code runs the saved tweedie_regressor_float.onnx and tweedie_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                             TweedieRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "TweedieRegressor"
#define   ONNXFilenameFloat  "tweedie_regressor_float.onnx"
#define   ONNXFilenameDouble "tweedie_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

2023.10.31 11:42:20.113 TweedieRegressor (EURUSD,H1)    Testing ONNX float: TweedieRegressor (tweedie_regressor_float.onnx)
2023.10.31 11:42:20.119 TweedieRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9962368338709323
2023.10.31 11:42:20.119 TweedieRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 6.3423970729788666
2023.10.31 11:42:20.119 TweedieRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 49.7970681819381653
2023.10.31 11:42:20.125 TweedieRegressor (EURUSD,H1)    
2023.10.31 11:42:20.125 TweedieRegressor (EURUSD,H1)    Testing ONNX double: TweedieRegressor (tweedie_regressor_double.onnx)
2023.10.31 11:42:20.130 TweedieRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9962368328117072
2023.10.31 11:42:20.130 TweedieRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 6.3423978976675608
2023.10.31 11:42:20.130 TweedieRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 49.7970821984087593
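The three metrics reported above (R-squared, MAE, MSE) follow the standard definitions. As a minimal sketch, they can be written in plain Python as shown below. The functions mae, mse and r2 and the baseline prediction y_baseline are hypothetical names used only for illustration; the data formula is the same one used by the scripts in this section.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# same synthetic data as the scripts: y = 4x + 10*sin(0.5x), x = 0..99
x_values = [float(i) for i in range(100)]
y_true = [4 * x + 10 * math.sin(0.5 * x) for x in x_values]
# hypothetical baseline that ignores the sine term (not a trained model)
y_baseline = [4 * x for x in x_values]

print("R2 :", r2(y_true, y_baseline))
print("MAE:", mae(y_true, y_baseline))
print("MSE:", mse(y_true, y_baseline))
```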

Comparison with the original double-precision model in Python:

Testing ONNX float: TweedieRegressor (tweedie_regressor_float.onnx)
Python  Mean Absolute Error: 6.342397897667562
MQL5:   Mean Absolute Error: 6.3423970729788666

Testing ONNX double: TweedieRegressor (tweedie_regressor_double.onnx)
Python  Mean Absolute Error: 6.342397897667562
MQL5:   Mean Absolute Error: 6.3423978976675608

Accuracy of ONNX float MAE: 6 decimal places. Accuracy of ONNX double MAE: 14 decimal places.
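These counts can be checked directly from the printed values. The sketch below compares the digits after the decimal point of two numbers given as strings; matching_decimals is a hypothetical helper written for this check, and it assumes both values are printed in plain decimal notation (no exponent).

```python
def matching_decimals(a: str, b: str) -> int:
    """Count the leading decimal digits that two printed numbers share."""
    int_a, _, frac_a = a.partition(".")
    int_b, _, frac_b = b.partition(".")
    # different integer parts mean no decimals can be said to match
    if int_a != int_b:
        return 0
    count = 0
    for digit_a, digit_b in zip(frac_a, frac_b):
        if digit_a != digit_b:
            break
        count += 1
    return count

# values copied from the comparison above
python_mae = "6.342397897667562"
mql5_float_mae = "6.3423970729788666"
mql5_double_mae = "6.3423978976675608"

print("float :", matching_decimals(python_mae, mql5_float_mae))
print("double:", matching_decimals(python_mae, mql5_double_mae))
```

Working on the printed strings avoids any extra rounding that converting the values back to floating point could introduce.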

2.1.24.3. ONNX representation of tweedie_regressor_float.onnx and tweedie_regressor_double.onnx

Fig.80. ONNX representation of tweedie_regressor_float.onnx in Netron

Fig.81. ONNX representation of tweedie_regressor_double.onnx in Netron

2.1.25. sklearn.linear_model.PoissonRegressor

PoissonRegressor is a machine learning method for solving regression tasks based on the Poisson distribution.

This method is suitable when the dependent (target) variable is count data, that is, the number of events that occurred in a fixed period of time or a fixed spatial interval. PoissonRegressor models the relationship between the predictors (independent variables) and the target variable under the assumption that the target follows a Poisson distribution.

How PoissonRegressor works:

Input data: It starts with a dataset that includes features (independent variables) and the target variable, which represents the event count.
Poisson distribution: PoissonRegressor models the target variable assuming that it follows the Poisson distribution. The Poisson distribution is suitable for modeling events that occur at a fixed average rate within a given time interval or spatial range.
Model training: PoissonRegressor trains a model that estimates the parameters of the Poisson distribution given the predictors. The model looks for the best fit to the observed data using the likelihood function of the Poisson distribution.
Predicting count values: After training, the model can be used to predict count values (the number of events) on new data. Each prediction is the expected count (the mean of the Poisson distribution) for the given predictors, so it is a non-negative real number rather than an integer.
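The steps above can be sketched in plain Python. The example assumes the usual log link for Poisson regression, so the predicted mean is exp(intercept + coef * x), and it uses the Poisson deviance as the quantity that training minimizes. The functions poisson_mean and poisson_deviance and the sample counts are hypothetical and for illustration only; scikit-learn additionally applies an L2 penalty controlled by alpha, which is omitted here.

```python
import math

def poisson_mean(x, coef, intercept):
    """Predicted mean count under a log link: always positive."""
    return math.exp(intercept + coef * x)

def poisson_deviance(y_true, mu):
    """Poisson deviance: 2 * sum(y*log(y/mu) - y + mu), with 0*log(0) = 0."""
    total = 0.0
    for y, m in zip(y_true, mu):
        term = y * math.log(y / m) if y > 0 else 0.0
        total += 2.0 * (term - y + m)
    return total

# hypothetical event counts
counts = [0, 1, 3, 2, 4]
mean_count = sum(counts) / len(counts)

# For an intercept-only model the maximum-likelihood mean is the sample mean,
# i.e. intercept = log(mean_count) and coef = 0.
fitted = [poisson_mean(0.0, 0.0, math.log(mean_count))] * len(counts)
best = poisson_deviance(counts, fitted)
worse = poisson_deviance(counts, [1.5 * mean_count] * len(counts))
print("deviance at the sample mean:", best)
print("deviance at 1.5x the mean  :", worse)
```

Because the mean passes through exp, the model never predicts a negative count, whatever the values of the coefficients.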

Advantages of PoissonRegressor:

Suitable for count data: PoissonRegressor fits tasks where the target variable represents count data, such as the number of orders, calls, and so on.
Distribution specificity: Because the model assumes the Poisson distribution, it can be more accurate for data that this distribution describes well.

Limitations of PoissonRegressor:

Only suitable for count data: PoissonRegressor is not suited to regression where the target variable is continuous rather than a count.
Dependence on feature selection: Model quality can depend heavily on feature selection and feature engineering.

PoissonRegressor is a machine learning method used to solve regression tasks where the target variable represents count data and is modeled with the Poisson distribution. It is useful for tasks involving events that occur at a fixed rate within specific time or spatial intervals.

2.1.25.1. Code for creating the PoissonRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.PoissonRegressor model, trains it on synthetic data, saves it in ONNX format, and makes predictions with float and double input data. It also evaluates the accuracy of the original model and of the models exported to ONNX.

# PoissonRegressor.py
# The code demonstrates the process of training PoissonRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "PoissonRegressor"
onnx_model_filename = data_path + "poisson_regressor"

# create a PoissonRegressor model
regression_model = PoissonRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  PoissonRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9204304782362495
Python  Mean Absolute Error: 27.59790466048524
Python  Mean Squared Error: 1052.9242570153044
Python  
Python  PoissonRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\poisson_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9204305082536851
Python  Mean Absolute Error: 27.59790825165078
Python  Mean Squared Error: 1052.9238598018305
Python  R^2 matching decimal places:  6
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  2
Python  float ONNX model precision:  5
Python  
Python  PoissonRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\poisson_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9204304782362495
Python  Mean Absolute Error: 27.59790466048524
Python  Mean Squared Error: 1052.9242570153044
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  14
Python  MSE matching decimal places:  13
Python  double ONNX model precision:  14

Fig.82. Results of PoissonRegressor.py (float ONNX)

2.1.25.2. MQL5 code for executing ONNX models

This code runs the saved poisson_regressor_float.onnx and poisson_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                             PoissonRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "PoissonRegressor"
#define   ONNXFilenameFloat  "poisson_regressor_float.onnx"
#define   ONNXFilenameDouble "poisson_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

PoissonRegressor (EURUSD,H1)    Testing ONNX float: PoissonRegressor (poisson_regressor_float.onnx)
PoissonRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9204305082536851
PoissonRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 27.5979082516507788
PoissonRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 1052.9238598018305311
PoissonRegressor (EURUSD,H1)    
PoissonRegressor (EURUSD,H1)    Testing ONNX double: PoissonRegressor (poisson_regressor_double.onnx)
PoissonRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9204304782362493
PoissonRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 27.5979046604852343
PoissonRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 1052.9242570153051020

Comparison with the original double-precision model in Python:

Testing ONNX float: PoissonRegressor (poisson_regressor_float.onnx)
Python  Mean Absolute Error: 27.59790466048524
MQL5:   Mean Absolute Error: 27.5979082516507788
    
Testing ONNX double: PoissonRegressor (poisson_regressor_double.onnx)
Python  Mean Absolute Error: 27.59790466048524
MQL5:   Mean Absolute Error: 27.5979046604852343

ONNX float MAE precision: 5 decimal places. ONNX double MAE precision: 13 decimal places.
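The decimal-place counts quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name matching_decimal_places is hypothetical, it compares only the digits after the decimal point (the same idea as the compare_decimal_places function in the Python scripts), and the MAE strings are copied from the logs above.

```python
def matching_decimal_places(a: str, b: str) -> int:
    """Count the leading decimal digits shared by two numbers given as strings."""
    # keep only the part after the decimal point of each number
    frac_a = a.partition(".")[2]
    frac_b = b.partition(".")[2]
    count = 0
    for ch_a, ch_b in zip(frac_a, frac_b):
        if ch_a != ch_b:
            break
        count += 1
    return count

# MAE values copied from the Python and MQL5 logs
python_mae = "27.59790466048524"
mql5_float_mae = "27.5979082516507788"
mql5_double_mae = "27.5979046604852343"

float_places = matching_decimal_places(python_mae, mql5_float_mae)
double_places = matching_decimal_places(python_mae, mql5_double_mae)
print("float:", float_places, "double:", double_places)
```

Comparing strings rather than rounded numbers keeps the check independent of how each environment formats its output, at the cost of ignoring the integer part.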

2.1.25.3. ONNX representation of poisson_regressor_float.onnx and poisson_regressor_double.onnx

Fig.83. ONNX representation of poisson_regressor_float.onnx in Netron

Fig.84. ONNX representation of poisson_regressor_double.onnx in Netron

2.1.26. sklearn.neighbors.RadiusNeighborsRegressor

RadiusNeighborsRegressor is a machine learning method for regression tasks. It is a variant of the k-Nearest Neighbors (k-NN) method that predicts the target variable from the nearest neighbors in feature space. However, instead of a fixed number of neighbors (as in k-NN), RadiusNeighborsRegressor uses a fixed radius to determine the neighbors of each sample.

How RadiusNeighborsRegressor works:

Input data: Start with a dataset containing features (independent variables) and a continuous target variable.
Setting the radius: RadiusNeighborsRegressor requires a fixed radius that determines which points count as neighbors of each sample in feature space.
Neighbor definition: For each sample, all data points within the specified radius are found; they become that sample's neighbors.
Weighted averaging: To predict the target value for a sample, the target values of its neighbors are averaged. With scikit-learn's default weights='uniform' all neighbors count equally; with weights='distance' closer neighbors receive larger weights.
Prediction: After training, the model predicts target values for new data from the neighbors found within the radius in feature space.
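The steps above can be sketched in plain Python for a single feature and uniform weights. This is a minimal illustration, not scikit-learn's implementation: the function name radius_neighbors_predict is hypothetical, the boundary is treated as inclusive (distance <= radius), and returning None for a query without neighbors is a simplification chosen for this sketch.

```python
from math import sin

def radius_neighbors_predict(x_train, y_train, x_query, radius=1.0):
    """Average the targets of all training points within `radius` of x_query.

    Uniform weights, one feature; returns None when no neighbor is found
    (a simplification made for this sketch).
    """
    neighbors = [y for x, y in zip(x_train, y_train) if abs(x - x_query) <= radius]
    if not neighbors:
        return None
    return sum(neighbors) / len(neighbors)

# same synthetic data as in the article: y = 4x + 10*sin(0.5x), x = 0..99
x_train = [float(i) for i in range(100)]
y_train = [4 * x + 10 * sin(0.5 * x) for x in x_train]

pred_r1 = radius_neighbors_predict(x_train, y_train, 5.0, radius=1.0)    # neighbors: x = 4, 5, 6
pred_r05 = radius_neighbors_predict(x_train, y_train, 5.0, radius=0.5)   # neighbor: x = 5 only
pred_far = radius_neighbors_predict(x_train, y_train, 200.0, radius=1.0) # no neighbors
print(pred_r1, pred_r05, pred_far)
```

On an integer grid the radius directly controls how many points are averaged, which is why the choice of radius matters so much for this model.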

Advantages of RadiusNeighborsRegressor:

Versatility: RadiusNeighborsRegressor suits regression tasks in which the number of neighbors can vary significantly depending on the radius.
Robustness to outliers: A neighbor-based approach can be robust to outliers because the model only takes nearby data points into account.

Limitations of RadiusNeighborsRegressor:

Dependence on radius selection: Choosing a suitable radius can require tuning and experimentation.
Computational complexity: Handling large datasets can require significant computational resources.

RadiusNeighborsRegressor is a regression method based on k-Nearest Neighbors that uses a fixed radius instead of a fixed neighbor count. It can be valuable when the number of neighbors varies with the radius and when the data contain outliers.

2.1.26.1. Code for creating the RadiusNeighborsRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.neighbors.RadiusNeighborsRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of the original model and of the models exported to ONNX.

# RadiusNeighborsRegressor.py
# The code demonstrates the process of training RadiusNeighborsRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import RadiusNeighborsRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "RadiusNeighborsRegressor"
onnx_model_filename = data_path + "radius_neighbors_regressor"

# create a RadiusNeighborsRegressor model
regression_model = RadiusNeighborsRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the model's tensor(float) input
X_float32 = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float32})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the model's tensor(double) input
X_float64 = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_float64})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  RadiusNeighborsRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9999521132921395
Python  Mean Absolute Error: 0.591458244376554
Python  Mean Squared Error: 0.6336732353950723
Python  
Python  RadiusNeighborsRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\radius_neighbors_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9999999999999971
Python  Mean Absolute Error: 4.393654615473253e-06
Python  Mean Squared Error: 3.829042036424747e-11
Python  R^2 matching decimal places:  4
Python  MAE matching decimal places:  0
Python  MSE matching decimal places:  0
Python  float ONNX model precision:  0
Python  
Python  RadiusNeighborsRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\radius_neighbors_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 1.0
Python  Mean Absolute Error: 0.0
Python  Mean Squared Error: 0.0
Python  R^2 matching decimal places:  0
Python  MAE matching decimal places:  0
Python  MSE matching decimal places:  0
Python  double ONNX model precision:  0

Fig.85. Results of RadiusNeighborsRegressor.py (float ONNX)
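In the log above the ONNX models score almost zero error (MAE of about 4.4e-06 for float and 0.0 for double), while the original scikit-learn model scores an MAE of about 0.59, so on this dataset the exported graphs do not reproduce the original predictions. The stdlib-only sketch below explores one possible reading of these numbers. It is an assumption, not something the source confirms: with the default radius of 1.0 on an integer grid, an inclusive boundary averages three points (x-1, x, x+1), whereas a neighborhood that contains only the point itself returns the training target unchanged. All helper names are hypothetical.

```python
import struct
from math import sin

def to_float32(value):
    """Round a Python float (double) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", value))[0]

def mean_abs_error(y_true, y_pred):
    """Mean absolute error of two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# same synthetic data as in the article
x = [float(i) for i in range(100)]
y = [4 * v + 10 * sin(0.5 * v) for v in x]

def radius_average(i, radius):
    """Uniform average of the targets within `radius` of x[i] (boundary included)."""
    vals = [y[j] for j in range(len(x)) if abs(x[j] - x[i]) <= radius]
    return sum(vals) / len(vals)

# radius 1.0: up to three neighbors per point (two at the edges)
pred_three = [radius_average(i, 1.0) for i in range(len(x))]
# radius 0.5: only the point itself is inside the radius
pred_self = [radius_average(i, 0.5) for i in range(len(x))]
# the same self-only predictions after a round trip through float32
pred_self_f32 = [to_float32(p) for p in pred_self]

mae_three = mean_abs_error(y, pred_three)
mae_self = mean_abs_error(y, pred_self)
mae_self_f32 = mean_abs_error(y, pred_self_f32)
print(mae_three, mae_self, mae_self_f32)
```

If this reading is right, the float model's small residual error is just float32 rounding of the stored targets. In any case, the matching-decimal-places figures reported for this model are best treated with caution, because the metrics being compared come from predictions that differ from the original model's.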

2.1.26.2. MQL5 code for running the ONNX models

This code runs the saved radius_neighbors_regressor_float.onnx and radius_neighbors_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                     RadiusNeighborsRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "RadiusNeighborsRegressor"
#define   ONNXFilenameFloat  "radius_neighbors_regressor_float.onnx"
#define   ONNXFilenameDouble "radius_neighbors_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

RadiusNeighborsRegressor (EURUSD,H1)    Testing ONNX float: RadiusNeighborsRegressor (radius_neighbors_regressor_float.onnx)
RadiusNeighborsRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9999999999999971
RadiusNeighborsRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 0.0000043936546155
RadiusNeighborsRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 0.0000000000382904
RadiusNeighborsRegressor (EURUSD,H1)    
RadiusNeighborsRegressor (EURUSD,H1)    Testing ONNX double: RadiusNeighborsRegressor (radius_neighbors_regressor_double.onnx)
RadiusNeighborsRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 1.0000000000000000
RadiusNeighborsRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 0.0000000000000000
RadiusNeighborsRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 0.0000000000000000

2.1.26.3. ONNX representation of radius_neighbors_regressor_float.onnx and radius_neighbors_regressor_double.onnx

Fig.86. ONNX representation of radius_neighbors_regressor_float.onnx in Netron

Fig.87. ONNX representation of radius_neighbors_regressor_double.onnx in Netron

2.1.27. sklearn.neighbors.KNeighborsRegressor

KNeighborsRegressor is a machine learning method for regression tasks.

It belongs to the family of k-Nearest Neighbors (k-NN) algorithms and predicts numeric target values from the proximity (similarity) between objects in the training dataset.

How KNeighborsRegressor works:

Input data: Start with the initial dataset, which includes features (independent variables) and the corresponding target values.
Choosing the number of neighbors (k): Select the number of nearest neighbors (k) to consider during prediction. This number is one of the model's hyperparameters.
Computing proximity: For new data (points that need predictions), the distance or similarity between these data and every object in the training dataset is computed.
Choosing the k nearest neighbors: The k training objects closest to the new data are selected.
Prediction: In regression tasks, the predicted target value for new data is the mean of the target values of the k nearest neighbors.
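The procedure above can be sketched in plain Python for one feature and uniform weights. This is an illustration only, not scikit-learn's implementation: the function name knn_predict is hypothetical, distances are absolute differences, and ties are broken by training index, which may differ from how a real library resolves them.

```python
from math import sin

def knn_predict(x_train, y_train, x_query, k=5):
    """Predict by averaging the targets of the k nearest training points."""
    # sort training indices by distance to the query (stable: ties keep index order)
    order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_query))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / k

# same synthetic data as in the article: y = 4x + 10*sin(0.5x), x = 0..99
x_train = [float(i) for i in range(100)]
y_train = [4 * x + 10 * sin(0.5 * x) for x in x_train]

# for x = 50 and k = 5 the nearest training points are x = 48..52
pred = knn_predict(x_train, y_train, 50.0, k=5)
expected = sum(y_train[48:53]) / 5
print(pred, expected)
```

Sorting all distances costs O(n log n) per query, which mirrors the computational limitation of k-NN on large training sets.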

Advantages of KNeighborsRegressor:

Ease of use: KNeighborsRegressor is a simple algorithm that does not require complex data preprocessing.
Non-parametric nature: The method does not assume a specific functional form for the dependence between the features and the target variable, which lets it model a variety of relationships.
Reproducibility: The results of KNeighborsRegressor are reproducible because predictions depend only on the proximity of the data.

Limitations of KNeighborsRegressor:

Computational complexity: Computing distances to every point in the training dataset can be computationally expensive for large volumes of data.
Sensitivity to the number of neighbors: Selecting the optimal value of k requires tuning and can significantly affect the model's performance.
Sensitivity to noise: The method can be sensitive to noise in the data and to outliers.

KNeighborsRegressor is useful in regression tasks where the neighborhood of objects is essential for predicting the target variable. It can be especially useful when the relationship between the features and the target variable is nonlinear and complex.

2.1.27.1. Code for creating the KNeighborsRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.neighbors.KNeighborsRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of the original model and of the models exported to ONNX.

# KNeighborsRegressor.py
# The code demonstrates the process of training KNeighborsRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "KNeighborsRegressor"
onnx_model_filename = data_path + "kneighbors_regressor"

# create a KNeighbors Regressor model
kneighbors_model = KNeighborsRegressor(n_neighbors=5)

# fit the model to the data
kneighbors_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = kneighbors_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(kneighbors_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 (the type expected by the float ONNX model)
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(kneighbors_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 (the type expected by the double ONNX model)
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  KNeighborsRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9995599863346534
Python  Mean Absolute Error: 1.7414210057117578
Python  Mean Squared Error: 5.822594523532273
Python  
Python  KNeighborsRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\kneighbors_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9995599867417418
Python  Mean Absolute Error: 1.7414195457976402
Python  Mean Squared Error: 5.8225891366283875
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  4
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  4
Python  
Python  KNeighborsRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\kneighbors_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9995599863346534
Python  Mean Absolute Error: 1.7414210057117583
Python  Mean Squared Error: 5.822594523532269
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  14
Python  MSE matching decimal places:  13
Python  double ONNX model precision:  14

Fig.88. Results of KNeighborsRegressor.py (float ONNX)
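The gap between the float and double results comes from the limited resolution of 32-bit floating-point numbers. The following sketch rounds a double-precision value to float32 and back using only the standard library. The helper name to_float32 is hypothetical, and the value is the MAE of the original model taken from the output above.

```python
import struct

def to_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

mae_double = 1.7414210057117578          # MAE of the original model (from the output above)
mae_as_float32 = to_float32(mae_double)  # the same value after a float32 round trip
rounding_error = abs(mae_as_float32 - mae_double)

print(f"double value  : {mae_double:.16f}")
print(f"float32 value : {mae_as_float32:.16f}")
print(f"rounding error: {rounding_error:.3e}")
```

For values between 1 and 2, a single float32 rounding step introduces an error of at most 2^-24 (about 6e-8). The difference seen in the metrics of the float ONNX model is somewhat larger, presumably because such rounding occurs in the inputs, in the stored model data and in the outputs.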

2.1.27.2. MQL5 code for executing ONNX models

This code executes the saved kneighbors_regressor_float.onnx and kneighbors_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                          KNeighborsRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "KNeighborsRegressor"
#define   ONNXFilenameFloat  "kneighbors_regressor_float.onnx"
#define   ONNXFilenameDouble "kneighbors_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

KNeighborsRegressor (EURUSD,H1) Testing ONNX float: KNeighborsRegressor (kneighbors_regressor_float.onnx)
KNeighborsRegressor (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9995599860116634
KNeighborsRegressor (EURUSD,H1) MQL5:   Mean Absolute Error: 1.7414200607817711
KNeighborsRegressor (EURUSD,H1) MQL5:   Mean Squared Error: 5.8225987975798184
KNeighborsRegressor (EURUSD,H1) 
KNeighborsRegressor (EURUSD,H1) Testing ONNX double: KNeighborsRegressor (kneighbors_regressor_double.onnx)
KNeighborsRegressor (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9995599863346534
KNeighborsRegressor (EURUSD,H1) MQL5:   Mean Absolute Error: 1.7414210057117601
KNeighborsRegressor (EURUSD,H1) MQL5:   Mean Squared Error: 5.8225945235322705

Comparison with the original double-precision model in Python:

Testing ONNX float: KNeighborsRegressor (kneighbors_regressor_float.onnx)
Python  Mean Absolute Error: 1.7414210057117578
MQL5:   Mean Absolute Error: 1.7414200607817711
 
Testing ONNX double: KNeighborsRegressor (kneighbors_regressor_double.onnx)
Python  Mean Absolute Error: 1.7414210057117578
MQL5:   Mean Absolute Error: 1.7414210057117601

Accuracy of ONNX float MAE: 5 decimal places, Accuracy of ONNX double MAE: 13 decimal places.
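These counts can be reproduced by comparing the printed values digit by digit. The sketch below works on the strings exactly as they appear in the logs, so no additional rounding is introduced. The function name matching_decimals is hypothetical, and it assumes both values are written in fixed (non-scientific) notation.

```python
def matching_decimals(text1, text2):
    """Count the leading digits after the decimal point that are identical in both strings."""
    # keep only the fractional part of each printed value
    frac1 = text1.partition(".")[2]
    frac2 = text2.partition(".")[2]
    count = 0
    for digit1, digit2 in zip(frac1, frac2):
        if digit1 != digit2:
            break
        count += 1
    return count

python_mae = "1.7414210057117578"       # original model in Python
mql5_float_mae = "1.7414200607817711"   # float ONNX model in MQL5
mql5_double_mae = "1.7414210057117601"  # double ONNX model in MQL5

print("float :", matching_decimals(python_mae, mql5_float_mae))   # 5
print("double:", matching_decimals(python_mae, mql5_double_mae))  # 13
```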

2.1.27.3. ONNX representation of kneighbors_regressor_float.onnx and kneighbors_regressor_double.onnx

Fig.89. ONNX representation of kneighbors_regressor_float.onnx in Netron

Fig.90. ONNX representation of kneighbors_regressor_double.onnx in Netron

2.1.28. sklearn.gaussian_process.GaussianProcessRegressor

GaussianProcessRegressor is a machine learning method for regression tasks that models the uncertainty in its predictions.

The Gaussian process (GP) is a powerful tool in Bayesian machine learning, used to model complex functions and to predict target variable values while accounting for uncertainty.

How GaussianProcessRegressor works:

Input data: It starts with the initial dataset, which includes features (independent variables) and the corresponding values of the target variable.
Gaussian process modeling: GaussianProcessRegressor uses a Gaussian process, which is a collection of random variables described by a Gaussian (normal) distribution. The GP models not only the mean value at each data point but also the covariance (or similarity) between these points.
Choice of covariance function: A crucial aspect of a GP is the selection of the covariance function (or kernel), which determines how strongly the data points are related to one another. Different covariance functions can be used depending on the nature of the data and the task.
Model training: GaussianProcessRegressor trains the GP on the training data. During training, the model fits the parameters of the covariance function and estimates the uncertainty in the predictions.
Prediction: After training, the model can be used to predict target variable values for new data. An important feature of a GP is that it predicts not only the mean value but also a confidence interval that estimates the level of confidence in the predictions.
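The steps above can be illustrated with a small, dependency-free sketch of GP regression with an RBF kernel. It is a simplification for intuition only: the kernel hyperparameters are fixed (scikit-learn optimizes them during fit), the inputs are scalars, and the helper names rbf_kernel, solve and gp_predict are hypothetical rather than part of any library.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs (fixed hyperparameters)."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def gp_predict(x_train, y_train, x_new, noise=1e-8):
    """Return the posterior mean and standard deviation of the GP at x_new."""
    n = len(x_train)
    # covariance between all training points, with a small noise term on the diagonal
    K = [[rbf_kernel(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # covariance between the new point and each training point
    k_star = [rbf_kernel(x_new, a) for a in x_train]
    alpha = solve(K, y_train)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf_kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))

x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [0.0, 0.8, 0.9, 0.1]

for x_new in (1.0, 1.5, 10.0):
    mean, std = gp_predict(x_train, y_train, x_new)
    print(f"x={x_new:5.1f}  mean={mean: .4f}  std={std:.4f}")
```

At a training point the predicted mean reproduces the observed target and the standard deviation is close to zero. Far from the data the mean falls back to the prior (zero) and the standard deviation approaches the prior value, which is how the model expresses low confidence.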

Advantages of GaussianProcessRegressor:

Uncertainty modeling: A GP accounts for uncertainty in the predictions, which is beneficial in tasks where knowing the confidence in the predicted values is crucial.
Flexibility: A GP can model a variety of functions, and its covariance functions can be adapted to different types of data.
Few hyperparameters: A GP has a relatively small number of hyperparameters, which simplifies model tuning.

Limitations of GaussianProcessRegressor:

Computational complexity: A GP can be computationally expensive, especially with large volumes of data.
Inefficiency in high-dimensional spaces: A GP can lose efficiency in tasks with many features because of the curse of dimensionality.

GaussianProcessRegressor is useful in regression tasks where modeling uncertainty and providing reliable predictions are crucial. This method is frequently used in Bayesian machine learning and meta-analysis.

2.1.28.1. Code for creating the GaussianProcessRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.gaussian_process.GaussianProcessRegressor model, trains it on synthetic data, saves the model in the ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# GaussianProcessRegressor.py
# The code demonstrates the process of training GaussianProcessRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "GaussianProcessRegressor"
onnx_model_filename = data_path + "gaussian_process_regressor"

# create a GaussianProcessRegressor model
kernel = 1.0 * RBF()
gp_model = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)

# fit the model to the data
gp_model.fit(X, y)

# predict values for the entire dataset
y_pred = gp_model.predict(X, return_std=False)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(gp_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 (the type expected by the float ONNX model)
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(gp_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 (the type expected by the double ONNX model)
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  GaussianProcessRegressor Original model (double)
Python  R-squared (Coefficient of determination): 1.0
Python  Mean Absolute Error: 3.504041501400934e-13
Python  Mean Squared Error: 1.6396606443650807e-25
Python  
Python  GaussianProcessRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gaussian_process_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: GPmean, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9999999999999936
Python  Mean Absolute Error: 6.454076974495848e-06
Python  Mean Squared Error: 8.493606782250733e-11
Python  R^2 matching decimal places:  0
Python  MAE matching decimal places:  0
Python  MSE matching decimal places:  0
Python  float ONNX model precision:  0
Python  
Python  GaussianProcessRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gaussian_process_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: GPmean, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 1.0
Python  Mean Absolute Error: 3.504041501400934e-13
Python  Mean Squared Error: 1.6396606443650807e-25
Python  R^2 matching decimal places:  1
Python  MAE matching decimal places:  19
Python  MSE matching decimal places:  20
Python  double ONNX model precision:  19

Fig.91. Results of GaussianProcessRegressor.py (float ONNX)
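The matching-decimal counts in this output should be read with care. The errors are printed in scientific notation (for example 3.504041501400934e-13), so a character-by-character comparison of the strings compares mantissa digits and even exponent characters rather than true decimal places. One possible alternative, sketched below, estimates the agreement from the absolute difference between the two values. The function name agreeing_decimals and the cap of 17 are assumptions made for this illustration. The estimate is not identical to digit-by-digit counting (0.19 and 0.21 differ by 0.02 but share no decimal digit).

```python
import math

def agreeing_decimals(a, b, cap=17):
    """Estimate how many decimal places of a and b agree from their absolute difference."""
    diff = abs(a - b)
    if diff == 0.0:
        # identical doubles: report the cap instead of an infinite count
        return cap
    return max(0, min(cap, math.floor(-math.log10(diff))))

mae_original = 3.504041501400934e-13    # original model
mae_onnx_float = 6.454076974495848e-06  # float ONNX model
mae_onnx_double = 3.504041501400934e-13 # double ONNX model

print("float :", agreeing_decimals(mae_original, mae_onnx_float))   # 5
print("double:", agreeing_decimals(mae_original, mae_onnx_double))  # 17 (identical values)
```

Under this measure the float model agrees with the original MAE to about 5 decimal places, whereas the string comparison reports 0 because the two printed mantissas start with different digits.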

2.1.28.2. MQL5 code for executing ONNX models

This code executes the saved gaussian_process_regressor_float.onnx and gaussian_process_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                     GaussianProcessRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "GaussianProcessRegressor"
#define   ONNXFilenameFloat  "gaussian_process_regressor_float.onnx"
#define   ONNXFilenameDouble "gaussian_process_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

GaussianProcessRegressor (EURUSD,H1)    Testing ONNX float: GaussianProcessRegressor (gaussian_process_regressor_float.onnx)
GaussianProcessRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9999999999999936
GaussianProcessRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 0.0000064540769745
GaussianProcessRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 0.0000000000849361
GaussianProcessRegressor (EURUSD,H1)    
GaussianProcessRegressor (EURUSD,H1)    Testing ONNX double: GaussianProcessRegressor (gaussian_process_regressor_double.onnx)
GaussianProcessRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 1.0000000000000000
GaussianProcessRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 0.0000000000003504
GaussianProcessRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 0.0000000000000000
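The zero shown for the double model's Mean Squared Error is most likely a formatting effect rather than an exact zero: the script prints with "%.16f", and an MSE on the order of 1e-25 has no non-zero digit within the first 16 decimal places. The short Python sketch below uses the values reported by the Python script and assumes that MQL5's PrintFormat follows the same printf-style rounding.

```python
mae = 3.504041501400934e-13   # MAE reported by the Python script (double model)
mse = 1.6396606443650807e-25  # MSE reported by the Python script (double model)

print("%.16f" % mae)  # 0.0000000000003504
print("%.16f" % mse)  # 0.0000000000000000
print("%.16e" % mse)  # exponent notation keeps the significant digits visible
```

Printing such values with an exponent format (for example "%.16e") would keep them distinguishable from zero.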

2.1.28.3. ONNX representation of gaussian_process_regressor_float.onnx and gaussian_process_regressor_double.onnx

Fig.92. ONNX representation of gaussian_process_regressor_float.onnx in Netron

Fig.93. ONNX representation of gaussian_process_regressor_double.onnx in Netron

2.1.29. sklearn.linear_model.GammaRegressor

GammaRegressor is a machine learning method designed for regression tasks in which the target variable follows a gamma distribution.

The gamma distribution is a probability distribution used to model positive, continuous random variables. This method makes it possible to model and predict positive numerical values such as cost, time, or proportions.

How GammaRegressor works:

Input data: It starts from the initial dataset, which contains features (independent variables) and the corresponding values of the target variable, which follows a gamma distribution.
Loss function selection: GammaRegressor uses a loss function that corresponds to the gamma distribution and accounts for its particular properties. This allows the data to be modeled while respecting the positivity and right skewness of the gamma distribution.
Model training: The model is trained on the data using the chosen loss function. During training, it adjusts the model parameters to minimize the loss function.
Prediction: After training, the model can be used to predict the values of the target variable for new data.
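The loss described above can be sketched in a few lines. This is a minimal illustration, not scikit-learn's implementation: it assumes the log link that the scikit-learn documentation describes for GammaRegressor and evaluates the mean gamma deviance for two hand-picked parameter sets. The toy data and the names predict, good and bad are illustrative only.

```python
import math


def mean_gamma_deviance(y_true, y_pred):
    """Mean unit deviance of the gamma distribution: 2*(log(mu/y) + y/mu - 1)."""
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        if y <= 0 or mu <= 0:
            raise ValueError("gamma deviance requires strictly positive values")
        total += 2.0 * (math.log(mu / y) + y / mu - 1.0)
    return total / len(y_true)


def predict(x, coef, intercept):
    """Log link: the linear predictor is exponentiated, so predictions stay positive."""
    return [math.exp(coef * xi + intercept) for xi in x]


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 4.0, 8.0]             # toy target, equal to exp(ln(2) * x)

good = predict(x, math.log(2.0), 0.0)  # parameters that reproduce the target
bad = predict(x, 0.0, 0.0)             # constant prediction of 1.0

print(mean_gamma_deviance(y, good))    # close to zero
print(mean_gamma_deviance(y, bad))     # clearly larger
```

Training amounts to searching for the coefficient and intercept that make this deviance (plus a regularization term) as small as possible. Because of the exponential in predict, the fitted curve can never produce a non-positive value.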

Advantages of GammaRegressor:

Modeling of positive values: This method is designed specifically for modeling positive numerical values, which can be useful in tasks where the target variable is bounded below.
Accounting for the shape of the gamma distribution: GammaRegressor takes the characteristics of the gamma distribution into account, which allows data that follow this distribution to be modeled more accurately.
Usefulness in econometrics and medical research: The gamma distribution is frequently used to model costs, waiting times, and other positive random variables in econometrics and medical research.

Limitations of GammaRegressor:

Restriction on the type of data: This method is only suitable for regression tasks in which the target variable follows a gamma distribution or a similar one. For data that do not fit such a distribution, it may not be effective.
Requires choosing a loss function: Choosing an appropriate loss function may require knowledge of the distribution of the target variable and its characteristics.

GammaRegressor is useful in tasks that require modeling and predicting positive numerical values consistent with a gamma distribution.

2.1.29.1. Code for creating the GammaRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.GammaRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# GammaRegressor.py
# The code demonstrates the process of training GammaRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import GammaRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 10+4*X + 10*np.sin(X*0.5)

model_name = "GammaRegressor"
onnx_model_filename = data_path + "gamma_regressor"

# create a Gamma Regressor model
regression_model = GammaRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  GammaRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.7963797339354436
Python  Mean Absolute Error: 37.266200319422815
Python  Mean Squared Error: 2694.457784927322
Python  
Python  GammaRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gamma_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.7963795030042045
Python  Mean Absolute Error: 37.266211754095956
Python  Mean Squared Error: 2694.4608407846144
Python  R^2 matching decimal places:  6
Python  MAE matching decimal places:  4
Python  MSE matching decimal places:  1
Python  float ONNX model precision:  4
Python  
Python  GammaRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gamma_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.7963797339354436
Python  Mean Absolute Error: 37.266200319422815
Python  Mean Squared Error: 2694.457784927322
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  15
Python  MSE matching decimal places:  12
Python  double ONNX model precision:  15

Fig.94. Results of GammaRegressor.py (float ONNX)
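The gap between the float and double results above is consistent with the resolution of single precision, which carries roughly seven significant decimal digits. The sketch below does not reproduce the ONNX computation. It only rounds one of the reported values to float32 with the standard struct module to show the size of the rounding error involved. The helper name to_float32 is illustrative.

```python
import struct


def to_float32(value):
    """Round a Python float (double) to the nearest float32 and convert it back."""
    return struct.unpack("f", struct.pack("f", value))[0]


x = 37.266200319422815        # MAE of the original model, from the output above
x32 = to_float32(x)

print(x32)                    # value after a round trip through float32
print(abs(x32 - x) / x)       # relative rounding error of this value
print(2.0 ** -24)             # worst-case relative error of float32 rounding
```

In the float model, inputs, coefficients and outputs all pass through this kind of rounding, so agreement to only a few decimal places for values of this magnitude is to be expected.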

2.1.29.2. MQL5 code for executing ONNX models

This code runs the saved gamma_regressor_float.onnx and gamma_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                               GammaRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "GammaRegressor"
#define   ONNXFilenameFloat  "gamma_regressor_float.onnx"
#define   ONNXFilenameDouble "gamma_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(10+4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

GammaRegressor (EURUSD,H1)      Testing ONNX float: GammaRegressor (gamma_regressor_float.onnx)
GammaRegressor (EURUSD,H1)      MQL5:   R-Squared (Coefficient of determination): 0.7963795030042045
GammaRegressor (EURUSD,H1)      MQL5:   Mean Absolute Error: 37.2662117540959628
GammaRegressor (EURUSD,H1)      MQL5:   Mean Squared Error: 2694.4608407846144473
GammaRegressor (EURUSD,H1)      
GammaRegressor (EURUSD,H1)      Testing ONNX double: GammaRegressor (gamma_regressor_double.onnx)
GammaRegressor (EURUSD,H1)      MQL5:   R-Squared (Coefficient of determination): 0.7963797339354435
GammaRegressor (EURUSD,H1)      MQL5:   Mean Absolute Error: 37.2662003194228220
GammaRegressor (EURUSD,H1)      MQL5:   Mean Squared Error: 2694.4577849273218817

Comparison with the original double-precision model in Python:

Testing ONNX float: GammaRegressor (gamma_regressor_float.onnx)
Python  Mean Absolute Error: 37.266200319422815
MQL5:   Mean Absolute Error: 37.2662117540959628
      
Testing ONNX double: GammaRegressor (gamma_regressor_double.onnx)
Python  Mean Absolute Error: 37.266200319422815
MQL5:   Mean Absolute Error: 37.2662003194228220

ONNX float MAE accuracy: 4 decimal places; ONNX double MAE accuracy: 13 decimal places.
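These two figures can be checked by comparing the printed values digit by digit. The sketch below works on the strings exactly as they appear in the logs, because converting the 16-decimal MQL5 output back to a Python float would shorten its representation. The function name matching_decimals is illustrative and is not part of the original scripts.

```python
def matching_decimals(text1, text2):
    """Count identical leading digits after the decimal point of two numeric strings."""
    frac1 = text1.split(".")[1]
    frac2 = text2.split(".")[1]
    count = 0
    for a, b in zip(frac1, frac2):
        if a != b:
            break
        count += 1
    return count


python_mae = "37.266200319422815"        # original model in Python
mql5_float_mae = "37.2662117540959628"   # float ONNX model in MQL5
mql5_double_mae = "37.2662003194228220"  # double ONNX model in MQL5

print(matching_decimals(python_mae, mql5_float_mae))   # 4
print(matching_decimals(python_mae, mql5_double_mae))  # 13
```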

2.1.29.3. ONNX representation of gamma_regressor_float.onnx and gamma_regressor_double.onnx

Fig.95. ONNX representation of gamma_regressor_float.onnx in Netron

Fig.96. ONNX representation of gamma_regressor_double.onnx in Netron

2.1.30. sklearn.linear_model.SGDRegressor

SGDRegressor is a regression method that uses stochastic gradient descent (SGD) to train a regression model. It belongs to the family of linear models and can be used for regression tasks. Its main strengths are its efficiency and its ability to handle large volumes of data.

How SGDRegressor works:

Linear regression: Like Ridge and Lasso, SGDRegressor seeks a linear relationship between the independent variables (features) and the target variable in a regression problem.
Stochastic gradient descent: The basis of SGDRegressor is stochastic gradient descent. Instead of computing gradients over the entire training set, it updates the model from randomly ordered training samples (the scikit-learn implementation processes one sample at a time). This makes training efficient and allows it to work with large datasets.
Regularization: SGDRegressor supports L1 and L2 regularization (as in Lasso and Ridge). This helps control overfitting and improves the stability of the model.
Hyperparameters: As with Ridge and Lasso, SGDRegressor allows hyperparameters such as the regularization parameter (α, alpha) and the type of regularization to be tuned.
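The update rule behind these steps can be sketched in plain Python. This is a simplified illustration under stated assumptions, not scikit-learn's implementation: it uses one feature, squared loss, an L2 penalty and a constant learning rate, whereas SGDRegressor applies a decaying learning-rate schedule by default. The function name sgd_linear_regression and the toy data are hypothetical.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.01, alpha=0.0001, epochs=200, seed=0):
    """Per-sample SGD for y ~ w*x + b with squared loss and an L2 penalty on w."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                 # visit the samples in random order
        for i in indices:
            error = (w * xs[i] + b) - ys[i]  # prediction error on a single sample
            # gradient step: loss gradient plus the L2 penalty gradient (alpha * w)
            w -= learning_rate * (error * xs[i] + alpha * w)
            b -= learning_rate * error
    return w, b


xs = [i * 0.1 for i in range(100)]   # 0.0 .. 9.9
ys = [4.0 * x + 1.0 for x in xs]     # noiseless toy line with slope 4, intercept 1

w, b = sgd_linear_regression(xs, ys)
print(w, b)                          # close to 4 and 1
```

Each update touches a single sample, so the cost per step does not grow with the size of the dataset. The learning rate has to be small enough for the largest feature values, which is one reason feature scaling is usually recommended before fitting models with SGD.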

Advantages of SGDRegressor:

Efficiency: SGDRegressor works well with large datasets and trains models efficiently on extensive data.
Regularization capability: The option to apply L1 and L2 regularization makes this method suitable for managing overfitting.
Adaptive gradient descent: Stochastic gradient descent can adapt to changing data and train models on the fly.

Limitations of SGDRegressor:

Sensitivity to the choice of hyperparameters: Tuning hyperparameters such as the learning rate and the regularization coefficient may require experimentation.
Does not always converge to the global minimum: Because of the stochastic nature of gradient descent, SGDRegressor does not always converge to the global minimum of the loss function.

SGDRegressor is a regression method that uses stochastic gradient descent to train a regression model. It is efficient, capable of handling large datasets, and supports regularization to manage overfitting.

2.1.30.1. Code for creating the SGDRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.linear_model.SGDRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# SGDRegressor2.py
# The code demonstrates the process of training SGDRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,10,0.1).reshape(-1,1)
y = 4*X + np.sin(X*10)

model_name = "SGDRegressor"
onnx_model_filename = data_path + "sgd_regressor"

# create an SGDRegressor model
regression_model = SGDRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
# save the figure before showing it; the figure may be empty once plt.show() returns
plt.savefig(data_path + model_name+'_plot_double.png')
plt.show()

Output:

Python  SGDRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9961197872743282
Python  Mean Absolute Error: 0.6405924406136998
Python  Mean Squared Error: 0.5169867345998348
Python  
Python  SGDRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\sgd_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9961197876338647
Python  Mean Absolute Error: 0.6405924014799271
Python  Mean Squared Error: 0.5169866866963753
Python  R^2 matching decimal places:  9
Python  MAE matching decimal places:  7
Python  MSE matching decimal places:  6
Python  float ONNX model precision:  7
Python  
Python  SGDRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\sgd_regressor_double.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9961197872743282
Python  Mean Absolute Error: 0.6405924406136998
Python  Mean Squared Error: 0.5169867345998348
Python  R^2 matching decimal places:  16
Python  MAE matching decimal places:  16
Python  MSE matching decimal places:  16
Python  double ONNX model precision:  16

Fig.97. Results of SGDRegressor.py (float ONNX)
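
To make the "matching decimal places" figures in the output above concrete, here is a compact sketch of the same string comparison applied to the metrics printed by the script. The helper name matching_decimals is hypothetical; it mirrors the behavior of compare_decimal_places from the listing and is not meant to replace it.

```python
def matching_decimals(value1, value2):
    """Count the leading decimal digits shared by str(value1) and str(value2)."""
    s1, s2 = str(value1), str(value2)
    # like the original helper, give up if either string has no decimal point
    if "." not in s1 or "." not in s2:
        return 0
    frac1 = s1.split(".", 1)[1]
    frac2 = s2.split(".", 1)[1]
    count = 0
    # walk both fractional parts until the first differing character
    for a, b in zip(frac1, frac2):
        if a != b:
            break
        count += 1
    return count


# values copied from the SGDRegressor output above
mae_original = 0.6405924406136998
mae_onnx_float = 0.6405924014799271
r2_original = 0.9961197872743282
r2_onnx_float = 0.9961197876338647

print(matching_decimals(mae_original, mae_onnx_float))  # 7
print(matching_decimals(r2_original, r2_onnx_float))    # 9
```

Because the comparison works on the str() output, it looks only at the digits after the decimal point and returns 0 for values that Python prints without one, such as 1e-07.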

2.1.30.2. MQL5 code for executing ONNX models

This code runs the saved sgd_regressor_float.onnx and sgd_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                 SGDRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "SGDRegressor"
#define   ONNXFilenameFloat  "sgd_regressor_float.onnx"
#define   ONNXFilenameDouble "sgd_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i*0.1;
      y[i]=(double)(4*x[i] + sin(x[i]*10));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

SGDRegressor (EURUSD,H1)        Testing ONNX float: SGDRegressor (sgd_regressor_float.onnx)
SGDRegressor (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9961197876338647
SGDRegressor (EURUSD,H1)        MQL5:   Mean Absolute Error: 0.6405924014799272
SGDRegressor (EURUSD,H1)        MQL5:   Mean Squared Error: 0.5169866866963754
SGDRegressor (EURUSD,H1)        
SGDRegressor (EURUSD,H1)        Testing ONNX double: SGDRegressor (sgd_regressor_double.onnx)
SGDRegressor (EURUSD,H1)        MQL5:   R-Squared (Coefficient of determination): 0.9961197872743282
SGDRegressor (EURUSD,H1)        MQL5:   Mean Absolute Error: 0.6405924406136998
SGDRegressor (EURUSD,H1)        MQL5:   Mean Squared Error: 0.5169867345998348
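
For reference, the three metrics printed above follow the usual textbook definitions. The sketch below computes them in plain Python; the function name regression_metrics is hypothetical, and the code illustrates the formulas only. It does not claim to describe how RegressionMetric is implemented internally in MQL5.

```python
def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, MSE) for two equally long sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    # mean absolute error and mean squared error
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    # R^2 = 1 - SS_res / SS_tot (assumes y_true is not constant)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(r * r for r in residuals)
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, mse


r2, mae, mse = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(r2, mae, mse)
```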

Comparison with the original double-precision model in Python:

Testing ONNX float: SGDRegressor (sgd_regressor_float.onnx)
Python  Mean Absolute Error: 0.6405924406136998
MQL5:   Mean Absolute Error: 0.6405924014799272
        
Testing ONNX double: SGDRegressor (sgd_regressor_double.onnx)
Python  Mean Absolute Error: 0.6405924406136998
MQL5:   Mean Absolute Error: 0.6405924406136998

Accuracy of ONNX float MAE: 7 decimal places. Accuracy of ONNX double MAE: 16 decimal places.
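
The gap of roughly seven versus sixteen digits is what one would expect from single versus double precision. As a rough illustration, the sketch below rounds a double to the nearest float32 using only the standard library. The helper name to_float32 is hypothetical, and the example shows only the storage effect of single precision; in the float ONNX model the rounding happens inside the model's inputs, parameters and outputs, not on the final metric.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest float32, returned as a double."""
    return struct.unpack("f", struct.pack("f", x))[0]


mae_double = 0.6405924406136998            # MAE of the original model, from the output above
mae_as_float32 = to_float32(mae_double)
relative_error = abs(mae_as_float32 - mae_double) / mae_double

print(mae_as_float32)
# rounding to float32 keeps a 24-bit significand, so the relative error is at most 2**-24
print(relative_error <= 2.0 ** -24)        # True
```

A relative error bound of 2**-24 (about 6e-8) corresponds to roughly seven significant decimal digits, which is in line with the float results reported above.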

2.1.30.3. ONNX representation of sgd_regressor_float.onnx and sgd_regressor_double.onnx

Fig.98. ONNX representation of sgd_regressor_float.onnx in Netron

Fig.99. ONNX representation of sgd_regressor_double.onnx in Netron

2.2. Scikit-learn regression models that convert only to float-precision ONNX models

This section covers models that work only in float precision. Converting them to ONNX with double precision leads to errors related to the limitations of the ai.onnx.ml subset of ONNX operators.

2.2.1. sklearn.ensemble.AdaBoostRegressor

AdaBoostRegressor is a machine learning method used for regression, that is, for predicting numerical values (for example, real estate prices or sales volumes).

The method is a variation of the AdaBoost (Adaptive Boosting) algorithm, which was originally developed for classification tasks.

How AdaBoostRegressor works:

Original dataset: The process starts with the original dataset, which contains features (independent variables) and the corresponding target values (the dependent variable to be predicted).
Weight initialization: Initially every data point (observation) receives the same weight, and the model is built on this weighted dataset.
Training weak learners: AdaBoostRegressor builds several weak regression models (for example, decision trees) that try to predict the target variable. These models are called "weak learners". Each weak learner is trained on the data taking the weight of every observation into account.
Weighting the weak learners: AdaBoostRegressor computes a weight for each weak learner based on how well it predicts. Learners that make more accurate predictions receive higher weights, and vice versa.
Updating observation weights: The observation weights are updated so that observations that were previously predicted poorly receive larger weights, increasing their importance for the next model.
Final prediction: AdaBoostRegressor combines the predictions of all weak learners, weighting each one according to its performance. The result is the model's final prediction.
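
The weighting steps above can be sketched for a single boosting round. This is a minimal illustration that assumes the AdaBoost.R2 formulation with a linear loss, which is the variant the scikit-learn documentation cites for AdaBoostRegressor. The function name adaboost_r2_round is hypothetical, and details that scikit-learn adds, such as the learning rate and the weighted-median combination of learners, are left out.

```python
import math


def adaboost_r2_round(sample_weights, abs_errors):
    """One AdaBoost.R2-style round: return (new sample weights, learner weight).

    sample_weights: current observation weights (any positive scale)
    abs_errors:     absolute prediction errors of the weak learner per observation
    """
    total = sum(sample_weights)
    weights = [w / total for w in sample_weights]

    max_error = max(abs_errors)
    if max_error == 0.0:
        raise ValueError("weak learner is already perfect; nothing to update")

    # linear loss, scaled into [0, 1]
    losses = [e / max_error for e in abs_errors]
    avg_loss = sum(w * l for w, l in zip(weights, losses))
    if avg_loss <= 0.0 or avg_loss >= 0.5:
        raise ValueError("average loss must lie in (0, 0.5) to continue boosting")

    # beta < 1; a smaller beta means a more confident learner
    beta = avg_loss / (1.0 - avg_loss)
    learner_weight = math.log(1.0 / beta)

    # well-predicted observations (small loss) are shrunk the most
    updated = [w * beta ** (1.0 - l) for w, l in zip(weights, losses)]
    norm = sum(updated)
    return [w / norm for w in updated], learner_weight


new_weights, alpha = adaboost_r2_round([0.25, 0.25, 0.25, 0.25], [0.0, 1.0, 2.0, 4.0])
print(new_weights, alpha)
```

In this example the observation with the largest error ends up with the largest weight, so the next weak learner concentrates on it.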

Advantages of AdaBoostRegressor:

Adaptability: AdaBoostRegressor adapts to complex functions and copes well with nonlinear relationships.
Reduced overfitting: Updating the observation weights acts as a form of regularization in AdaBoostRegressor, which helps to prevent overfitting.
Powerful ensemble: By combining multiple weak models, AdaBoostRegressor can build a strong model that predicts the target variable quite accurately.

Limitations of AdaBoostRegressor:

Sensitivity to outliers: AdaBoostRegressor is sensitive to outliers in the data, which affects prediction quality.
High computational cost: Building multiple weak learners can require more computational resources and time.
Not always the best choice: AdaBoostRegressor is not always the optimal option, and in some cases other regression methods may perform better.

AdaBoostRegressor is a useful machine learning method applicable to a variety of regression tasks, especially when the data contains complex dependencies.

2.2.1.1. Code for creating the AdaBoostRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.ensemble.AdaBoostRegressor model, trains it on synthetic data, saves it in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of the original model and of the models exported to ONNX.

# AdaBoostRegressor.py
# The code demonstrates the process of training AdaBoostRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "AdaBoostRegressor"
onnx_model_filename = data_path + "adaboost_regressor"

# create an AdaBoostRegressor model
regression_model = AdaBoostRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  AdaBoostRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9991257208809748
Python  Mean Absolute Error: 2.3678022748065457
Python  Mean Squared Error: 11.569124350863143
Python  
Python  AdaBoostRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\adaboost_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9991257199849699
Python  Mean Absolute Error: 2.36780399225718
Python  Mean Squared Error: 11.569136207480646
Python  R^2 matching decimal places:  7
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  5
Python  
Python  AdaBoostRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\adaboost_regressor_double.onnx

Here the model was exported to ONNX for both float and double. The float ONNX model ran successfully, whereas loading the double model raised a runtime error (shown in the Errors tab):

AdaBoostRegressor.py started    AdaBoostRegressor.py    1       1
Traceback (most recent call last):      AdaBoostRegressor.py    1       1
    onnx_session = ort.InferenceSession(onnx_filename)  AdaBoostRegressor.py    159     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     424     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\adaboost_regressor_double.onnx failed:Type Error:       onnxruntime_inference_collection.py     424     1
AdaBoostRegressor.py finished in 3207 ms                5       1
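
Since the script stops at the failing InferenceSession call, one option is to guard the session creation so that the float results are still reported and the double model is skipped. The sketch below is a suggestion and not part of the original script: the helper name load_session_or_none is hypothetical, and it takes the session constructor as an argument so that it can be tried without onnxruntime installed.

```python
def load_session_or_none(session_factory, model_path):
    """Try to create an inference session; return None instead of raising on failure."""
    try:
        return session_factory(model_path)
    except Exception as exc:
        # onnxruntime reports load failures as exceptions; log and continue
        print(f"Could not load {model_path}: {exc}")
        return None


# stand-in for a constructor that rejects the model, used only for this demonstration
def failing_factory(model_path):
    raise RuntimeError("Load model failed: Type Error")


session = load_session_or_none(failing_factory, "adaboost_regressor_double.onnx")
if session is None:
    print("Skipping the double-precision checks")
```

In the script itself the call would presumably look like load_session_or_none(ort.InferenceSession, onnx_filename), with the double-precision evaluation and plot executed only when a session is returned.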

Fig.100. Results of AdaBoostRegressor.py (float ONNX)

2.2.1.2. MQL5 code for executing ONNX models

This code runs the saved adaboost_regressor_float.onnx and adaboost_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                            AdaBoostRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "AdaBoostRegressor"
#define   ONNXFilenameFloat  "adaboost_regressor_float.onnx"
#define   ONNXFilenameDouble "adaboost_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

AdaBoostRegressor (EURUSD,H1)   
AdaBoostRegressor (EURUSD,H1)   Testing ONNX float: AdaBoostRegressor (adaboost_regressor_float.onnx)
AdaBoostRegressor (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9991257199849699
AdaBoostRegressor (EURUSD,H1)   MQL5:   Mean Absolute Error: 2.3678039922571803
AdaBoostRegressor (EURUSD,H1)   MQL5:   Mean Squared Error: 11.5691362074806463
AdaBoostRegressor (EURUSD,H1)   
AdaBoostRegressor (EURUSD,H1)   Testing ONNX double: AdaBoostRegressor (adaboost_regressor_double.onnx)
AdaBoostRegressor (EURUSD,H1)   ONNX: cannot create session (OrtStatus: 1 'Type Error: Type parameter (T) of Optype (Mul) bound to different types (tensor(float) and tensor(double) in node (Mul).'), inspect code 'Scripts\Regression\AdaBoostRegressor.mq5' (133:16)
AdaBoostRegressor (EURUSD,H1)   model_name=AdaBoostRegressor OnnxCreate error 5800

The float ONNX model ran successfully, while the double model failed at session creation (error 5800): the runtime reports that the Mul node is bound to both tensor(float) and tensor(double).
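The three values that the MQL5 script prints through RegressionMetric (R-squared, MAE, MSE) can be cross-checked by hand. The following is a minimal plain-Python sketch that assumes the standard textbook definitions of these metrics; the function name regression_metrics and the constant 0.5 offset used as "predictions" are illustrative assumptions, not part of the original scripts.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (r2, mae, mse) for two equal-length sequences (standard definitions)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, mse

# same synthetic data as GenerateData(): y = 4*x + 10*sin(0.5*x), x = 0..99
x = [float(i) for i in range(100)]
y_true = [4 * xi + 10 * math.sin(xi * 0.5) for xi in x]

# hypothetical predictions: the true values shifted by a constant 0.5
y_pred = [v + 0.5 for v in y_true]

r2, mae, mse = regression_metrics(y_true, y_pred)
print("R2 :", r2)
print("MAE:", mae)
print("MSE:", mse)
```

With a constant offset of 0.5 the MAE is 0.5 and the MSE is 0.25 (up to floating-point rounding), which makes the sketch easy to sanity-check before feeding it real model output.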

2.2.1.3. ONNX representation of adaboost_regressor_float.onnx and adaboost_regressor_double.onnx

Fig.101. ONNX representation of adaboost_regressor_float.onnx in Netron

Fig.102. ONNX representation of adaboost_regressor_double.onnx in Netron

2.2.2. sklearn.ensemble.BaggingRegressor

BaggingRegressor is a machine learning method used for regression tasks.

It is an ensemble method based on the idea of "bagging" (Bootstrap Aggregating), which consists of building multiple base regression models and combining their predictions to obtain a more stable and accurate result.

How BaggingRegressor works:

Original dataset: It starts with the original dataset containing features (independent variables) and their corresponding target values (the dependent variable we want to predict).
Subset generation: BaggingRegressor randomly creates several subsets (samples drawn with replacement) from the original data. Each subset contains a random selection of observations from the original data.
Training base regression models: For each subset, BaggingRegressor builds an independent base regression model (for example, a decision tree, a random forest, or a linear regression model).
Predictions from base models: Each base model, trained on its own subset, produces a prediction of the target variable.
Averaging or combining: BaggingRegressor averages or otherwise combines the predictions of all base models to obtain the final regression prediction.
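The steps above can be sketched in a few lines of plain Python. This is only an illustration of the bagging idea under simplifying assumptions: the base model is an ordinary least-squares line (chosen for brevity; scikit-learn's BaggingRegressor uses a decision tree by default), and the helper names fit_line, bagging_fit and bagging_predict are hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Base model of this sketch: least-squares fit of y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:                      # degenerate sample: all x identical
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def bagging_fit(xs, ys, n_estimators=10, seed=0):
    """Train one base model per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Final prediction: the average of all base-model predictions."""
    return sum(a * x + b for a, b in models) / len(models)

# same synthetic data as in the article: y = 4*x + 10*sin(0.5*x)
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]

models = bagging_fit(xs, ys)
pred_50 = bagging_predict(models, 50.0)
print("bagged prediction at x=50:", pred_50)
```

Each bootstrap sample yields a slightly different line; averaging them damps the sample-to-sample fluctuation, which is the variance reduction that bagging aims for.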

Advantages of BaggingRegressor:

Variance reduction: BaggingRegressor reduces the variance of the model, making it more robust to fluctuations in the data.
Reduced overfitting: Because the base models are trained on different subsets of the data, BaggingRegressor usually lowers the risk of overfitting.
Improved generalization: By combining predictions from several models, BaggingRegressor usually provides more accurate and stable forecasts.
Wide range of base models: BaggingRegressor can use different types of base regression models, which makes it a flexible method.

Limitations of BaggingRegressor:

It cannot always improve performance when the base model already fits the data well.
BaggingRegressor may require more computational resources and time than training a single model.

BaggingRegressor is a powerful machine learning method that can be beneficial in regression tasks, especially with noisy data and when prediction stability needs to be improved.

2.2.2.1. Code for creating the BaggingRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.ensemble.BaggingRegressor model, trains it on synthetic data, saves the model in the ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# BaggingRegressor.py
# The code demonstrates the process of training BaggingRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "BaggingRegressor"
onnx_model_filename = data_path + "bagging_regressor"

# create a Bagging Regressor model
regression_model = BaggingRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  
Python  BaggingRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9998128324923137
Python  Mean Absolute Error: 1.0257279210387649
Python  Mean Squared Error: 2.4767424083953005
Python  
Python  BaggingRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bagging_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9998128317934672
Python  Mean Absolute Error: 1.0257282792130034
Python  Mean Squared Error: 2.4767516560614187
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  4
Python  float ONNX model precision:  5
Python  
Python  BaggingRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bagging_regressor_double.onnx

Errors tab:

BaggingRegressor.py started     BaggingRegressor.py     1       1
Traceback (most recent call last):      BaggingRegressor.py     1       1
    onnx_session = ort.InferenceSession(onnx_filename)  BaggingRegressor.py     161     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     424     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bagging_regressor_double.onnx failed:Type Error: T      onnxruntime_inference_collection.py     424     1
BaggingRegressor.py finished in 3173 ms         5       1

Fig.103. Results of BaggingRegressor.py (float ONNX)
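The "matching decimal places" figures in the output can be reproduced by hand. The sketch below is a condensed but behaviorally equivalent version of the compare_decimal_places helper from the listing, applied to the metric values printed in the log. Note that the helper only compares the digits after the decimal point and ignores the integer part, so it is a rough indicator rather than a rigorous error measure.

```python
def compare_decimal_places(value1, value2):
    """Count how many leading digits after the decimal point coincide."""
    s1, s2 = str(value1), str(value2)
    d1, d2 = s1.find("."), s2.find(".")
    if d1 == -1 or d2 == -1:
        return 0
    count = 0
    # walk both fractional parts in parallel and stop at the first mismatch
    for c1, c2 in zip(s1[d1 + 1:], s2[d2 + 1:]):
        if c1 != c2:
            break
        count += 1
    return count

# values taken from the BaggingRegressor log: original model vs float ONNX model
r2_match = compare_decimal_places(0.9998128324923137, 0.9998128317934672)
mae_match = compare_decimal_places(1.0257279210387649, 1.0257282792130034)
mse_match = compare_decimal_places(2.4767424083953005, 2.4767516560614187)
print(r2_match, mae_match, mse_match)  # prints: 8 5 4

# caveat: the integer part is ignored, so 1.5 and 2.5 "match" in one decimal place
print(compare_decimal_places(1.5, 2.5))  # prints: 1
```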

2.2.2.2. MQL5 code for executing ONNX models

This code runs the saved bagging_regressor_float.onnx and bagging_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                             BaggingRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "BaggingRegressor"
#define   ONNXFilenameFloat  "bagging_regressor_float.onnx"
#define   ONNXFilenameDouble "bagging_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

BaggingRegressor (EURUSD,H1)    Testing ONNX float: BaggingRegressor (bagging_regressor_float.onnx)
BaggingRegressor (EURUSD,H1)    MQL5:   R-Squared (Coefficient of determination): 0.9998128317934672
BaggingRegressor (EURUSD,H1)    MQL5:   Mean Absolute Error: 1.0257282792130034
BaggingRegressor (EURUSD,H1)    MQL5:   Mean Squared Error: 2.4767516560614196
BaggingRegressor (EURUSD,H1)    
BaggingRegressor (EURUSD,H1)    Testing ONNX double: BaggingRegressor (bagging_regressor_double.onnx)
BaggingRegressor (EURUSD,H1)    ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (ReduceMean) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\BaggingRegressor.mq5' (133:16)
BaggingRegressor (EURUSD,H1)    model_name=BaggingRegressor OnnxCreate error 5800

The float ONNX model ran normally, but the double model failed at session creation (error 5800): the output of the ReduceMean node is typed tensor(double) where tensor(float) is expected.

2.2.2.3. ONNX representation of bagging_regressor_float.onnx and bagging_regressor_double.onnx

Fig.104. ONNX representation of bagging_regressor_float.onnx in Netron

Fig.105. ONNX representation of bagging_regressor_double.onnx in Netron

2.2.3. sklearn.tree.DecisionTreeRegressor

DecisionTreeRegressor is a machine learning method used for regression tasks: it predicts numerical values of the target variable from a set of features (independent variables).

The method builds a decision tree that divides the feature space into intervals and predicts a value of the target variable for each interval.

How DecisionTreeRegressor works:

Start of construction: The process starts from the initial dataset containing the features (independent variables) and the corresponding values of the target variable.
Feature selection and splitting: The decision tree selects a feature and a threshold value that divides the data into two or more subgroups. The split is chosen to minimize the mean squared error (the mean squared deviation between the predicted and actual values of the target variable) within each subgroup.
Recursive construction: The feature selection and splitting process is repeated for each subgroup, creating subtrees. This continues recursively until certain stopping criteria are met, such as the maximum tree depth or the minimum number of samples in a node.
Leaf nodes: When the stopping criteria are met, leaf nodes are created. Each leaf predicts a numerical value of the target variable for the samples that fall into it.
Prediction: For new data, each observation travels down the tree until it reaches a leaf node, which supplies the predicted numerical value of the target variable.
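The split-selection step described above can be illustrated with a short plain-Python sketch. It searches a single numeric feature for the threshold that minimizes the mean squared error of the two resulting groups, with each group predicting its own mean. The helper names sum_sq_err and best_split, the use of midpoints between neighbouring values as candidate thresholds, and the tiny dataset are assumptions made for illustration; this is not the scikit-learn implementation.

```python
def sum_sq_err(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, mse) of the best binary split on a 1-D feature."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best_threshold, best_mse = None, float("inf")
    for i in range(1, n):
        # a threshold can only lie between two distinct feature values
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        mse = (sum_sq_err(left) + sum_sq_err(right)) / n
        if mse < best_mse:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_mse = mse
    return best_threshold, best_mse

# illustrative data: two clearly separated plateaus
xs = [0, 1, 2, 3, 10, 11, 12, 13]
ys = [1, 1, 1, 1, 5, 5, 5, 5]

threshold, mse = best_split(xs, ys)
left_leaf = sum(y for x, y in zip(xs, ys) if x <= threshold) / 4
right_leaf = sum(y for x, y in zip(xs, ys) if x > threshold) / 4
print(threshold, mse, left_leaf, right_leaf)  # prints: 6.5 0.0 1.0 5.0
```

A full regression tree applies the same search recursively to each resulting group until a stopping criterion such as maximum depth is reached.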

Advantages of DecisionTreeRegressor:

Interpretability: Decision trees are easy to understand and visualize, which makes them useful for explaining how a model reaches its decisions.
Robustness to outliers: Decision trees can be robust to outliers in the data.
Handling numerical and categorical data: Decision trees as an algorithm can process both numerical and categorical features, although the scikit-learn implementation expects numeric input, so categorical features have to be encoded first.
Automatic feature selection: Trees can automatically select important features and ignore less relevant ones.

Limitations of DecisionTreeRegressor:

Vulnerability to overfitting: Decision trees can be prone to overfitting, especially if they are too deep.
Generalization problems: Decision trees may not generalize well to data outside the training set.
Not always the optimal choice: In some cases other regression methods, such as linear regression or k-nearest neighbors, may give better results.

DecisionTreeRegressor is a valuable method for regression tasks, especially when it is crucial to understand the decision logic of the model and to visualize the process.

2.2.3.1. Code for creating the DecisionTreeRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.tree.DecisionTreeRegressor model, trains it on synthetic data, saves the model in the ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# DecisionTreeRegressor.py
# The code demonstrates the process of training DecisionTreeRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "DecisionTreeRegressor"
onnx_model_filename = data_path + "decision_tree_regressor"

# create a Decision Tree Regressor model
regression_model = DecisionTreeRegressor()

# fit the model to the data
regression_model.fit(X, y)

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  DecisionTreeRegressor Original model (double)
Python  R-squared (Coefficient of determination): 1.0
Python  Mean Absolute Error: 0.0
Python  Mean Squared Error: 0.0
Python  
Python  DecisionTreeRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\decision_tree_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9999999999999971
Python  Mean Absolute Error: 4.393654615473253e-06
Python  Mean Squared Error: 3.829042036424747e-11
Python  R^2 matching decimal places:  0
Python  MAE matching decimal places:  0
Python  MSE matching decimal places:  0
Python  float ONNX model precision:  0
Python  
Python  DecisionTreeRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\decision_tree_regressor_double.onnx

Errors tab:

DecisionTreeRegressor.py started        DecisionTreeRegressor.py        1       1
Traceback (most recent call last):      DecisionTreeRegressor.py        1       1
    onnx_session = ort.InferenceSession(onnx_filename)  DecisionTreeRegressor.py        160     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     424     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\decision_tree_regressor_double.onnx failed:Type Er      onnxruntime_inference_collection.py     424     1
DecisionTreeRegressor.py finished in 2957 ms            5       1

Fig.106. Results of DecisionTreeRegressor.py (float ONNX)
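
The "matching decimal places: 0" lines in the output above come from comparing the digits of the printed strings. Pairs such as 1.0 and 0.9999999999999971, or 0.0 and a number printed in scientific notation, are therefore reported as having nothing in common even though they are numerically very close. A minimal sketch of an alternative measure follows. It is not part of the original script: the helper name agreeing_decimal_places and the cap max_places are assumptions introduced here for illustration. It returns the largest k for which the two values differ by no more than 10^-k.

```python
import math

def agreeing_decimal_places(value1, value2, max_places=16):
    """Largest k (capped at max_places) such that |value1 - value2| <= 10**-k."""
    diff = abs(value1 - value2)
    # identical values agree to the maximum number of places we report
    if diff == 0.0:
        return max_places
    places = math.floor(-math.log10(diff))
    # clamp to the range [0, max_places]
    return max(0, min(max_places, places))

# metric pairs copied from the float ONNX output above
r2_places = agreeing_decimal_places(1.0, 0.9999999999999971)
mae_places = agreeing_decimal_places(0.0, 4.393654615473253e-06)
mse_places = agreeing_decimal_places(0.0, 3.829042036424747e-11)

print("R^2 agreeing decimal places:", r2_places)
print("MAE agreeing decimal places:", mae_places)
print("MSE agreeing decimal places:", mse_places)
```

With this tolerance-based definition the three pairs agree to 14, 5 and 10 decimal places respectively, which describes the float model's accuracy better than the string comparison does.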

2.2.3.2. MQL5 code for executing ONNX models

This code runs the saved decision_tree_regressor_float.onnx and decision_tree_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                        DecisionTreeRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "DecisionTreeRegressor"
#define   ONNXFilenameFloat  "decision_tree_regressor_float.onnx"
#define   ONNXFilenameDouble "decision_tree_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

DecisionTreeRegressor (EURUSD,H1)       Testing ONNX float: DecisionTreeRegressor (decision_tree_regressor_float.onnx)
DecisionTreeRegressor (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9999999999999971
DecisionTreeRegressor (EURUSD,H1)       MQL5:   Mean Absolute Error: 0.0000043936546155
DecisionTreeRegressor (EURUSD,H1)       MQL5:   Mean Squared Error: 0.0000000000382904
DecisionTreeRegressor (EURUSD,H1)       
DecisionTreeRegressor (EURUSD,H1)       Testing ONNX double: DecisionTreeRegressor (decision_tree_regressor_double.onnx)
DecisionTreeRegressor (EURUSD,H1)       ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\DecisionTreeRegressor.mq5' (133:16)
DecisionTreeRegressor (EURUSD,H1)       model_name=DecisionTreeRegressor OnnxCreate error 5800

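The float ONNX model executed normally, but an error occurred when executing the double model. As the log shows, the TreeEnsembleRegressor node is expected to produce tensor(float) output, so the session for the model exported with a tensor(double) output cannot be created.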
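
The small nonzero errors of the float model are of the size one would expect if the values learned by the tree were simply stored and returned in single precision. The sketch below illustrates this under that assumption and is not a description of onnxruntime internals: it rounds the synthetic targets to float32 with the standard struct module and measures the resulting error. The helper to_float32 is a name introduced here for illustration.

```python
import math
import struct

def to_float32(value):
    """Round a Python float (double) to the nearest float32 and convert it back."""
    return struct.unpack("f", struct.pack("f", value))[0]

# same synthetic data as in the scripts: y = 4*x + 10*sin(0.5*x), x = 0..99
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]

# assumption: the exported tree returns the training targets rounded to float32
ys_float32 = [to_float32(y) for y in ys]

abs_errors = [abs(a - b) for a, b in zip(ys, ys_float32)]
mae = sum(abs_errors) / len(abs_errors)
mse = sum(e * e for e in abs_errors) / len(abs_errors)

print("MAE caused by float32 rounding:", mae)
print("MSE caused by float32 rounding:", mse)
```

All targets here are below 512, so a single rounding to float32 can move a value by at most 2^-16 (about 1.5e-05). An average error of a few millionths is therefore the expected magnitude for a float output, regardless of how well the tree fits the data.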

2.2.3.3. ONNX representation of decision_tree_regressor_float.onnx and decision_tree_regressor_double.onnx

Fig.107. ONNX representation of decision_tree_regressor_float.onnx in Netron

Fig.108. ONNX representation of decision_tree_regressor_double.onnx in Netron

2.2.4. sklearn.tree.ExtraTreeRegressor

ExtraTreeRegressor, or "Extremely Randomized Tree Regressor", is a regression method based on a decision tree with randomized splits. Note that sklearn.tree.ExtraTreeRegressor builds a single tree; the ensemble of such trees is provided by sklearn.ensemble.ExtraTreesRegressor.

The approach is related to random forests but differs in that, instead of searching for the best split at each node, it uses random splits. This makes tree construction more random and faster, which can be advantageous in certain situations.

How ExtraTreeRegressor works:

Start of construction: The process starts from the initial dataset containing the features (independent variables) and the corresponding values of the target variable.
Randomness in splits: Unlike regular decision trees, where the best split is chosen, ExtraTreeRegressor uses random threshold values to split tree nodes. This makes the splitting process more random and can reduce overfitting when such trees are combined.
Tree construction: The tree is built by splitting nodes based on random features and threshold values. The process continues until a stopping criterion is met, such as the maximum tree depth or the minimum number of samples in a node.
Ensemble of trees: Multiple random trees of this kind can be combined into an ensemble. In scikit-learn this is done by ExtraTreesRegressor, where the number of trees is controlled by the "n_estimators" hyperparameter; ExtraTreeRegressor itself has no such parameter.
Prediction: To predict the target variable for new data, a single tree returns the value of the leaf the sample falls into, and an ensemble simply averages the predictions of all its trees.
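
The steps above can be sketched in plain Python for a single feature. This is a simplified illustration, not scikit-learn's implementation: the function names (build_tree, predict_tree, predict_ensemble) are hypothetical, thresholds are drawn uniformly between the minimum and maximum of the feature in each node, and trees are grown until every leaf holds one sample.

```python
import math
import random

def build_tree(xs, ys, rng, min_samples_split=2):
    """Grow one randomized regression tree on a single feature."""
    lo, hi = min(xs), max(xs)
    # stop when the node is too small or cannot be split any further
    if len(xs) < min_samples_split or lo == hi:
        return {"value": sum(ys) / len(ys)}
    # draw the threshold at random instead of searching for the best split
    threshold = rng.uniform(lo, hi)
    if threshold >= hi:  # guard: keep at least one sample on the right side
        threshold = lo
    left = [i for i, x in enumerate(xs) if x <= threshold]
    right = [i for i, x in enumerate(xs) if x > threshold]
    return {
        "threshold": threshold,
        "left": build_tree([xs[i] for i in left], [ys[i] for i in left],
                           rng, min_samples_split),
        "right": build_tree([xs[i] for i in right], [ys[i] for i in right],
                            rng, min_samples_split),
    }

def predict_tree(node, x):
    """Walk down the tree and return the mean target stored in the leaf."""
    while "value" not in node:
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node["value"]

def predict_ensemble(trees, x):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, x) for tree in trees) / len(trees)

# same synthetic data as in the scripts: y = 4*x + 10*sin(0.5*x), x = 0..99
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]

rng = random.Random(42)
trees = [build_tree(xs, ys, rng) for _ in range(10)]

max_error = max(abs(predict_ensemble(trees, x) - y) for x, y in zip(xs, ys))
print("max training error of the ensemble:", max_error)
```

Because every tree is grown until its leaves hold a single sample, each one reproduces the training targets regardless of where the random thresholds fall. The randomness only affects predictions between training points, which is where averaging several trees smooths the result.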

Advantages of ExtraTreeRegressor:

Reduced overfitting: Using random node splits can make the method less prone to overfitting than regular decision trees, particularly when the trees are averaged in an ensemble.
High parallelizability: Since the trees of an ensemble are built independently, training can easily be parallelized across multiple processors.
Fast training: Compared with other methods such as gradient boosting, ExtraTreeRegressor can be trained faster.

Limitations of ExtraTreeRegressor:

May be less accurate: In some cases, especially with small datasets, ExtraTreeRegressor can be less accurate than more complex methods.
Less interpretable: Compared with linear models, decision trees and other simpler methods, ExtraTreeRegressor is usually less interpretable.

ExtraTreeRegressor can be a useful regression method in situations where reduced overfitting and fast training are needed.

2.2.4.1. Code for creating the ExtraTreeRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.tree.ExtraTreeRegressor model, trains it on synthetic data, saves the model in the ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# ExtraTreeRegressor.py
# The code demonstrates the process of training ExtraTreeRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import ExtraTreeRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "ExtraTreeRegressor"
onnx_model_filename = data_path + "extra_tree_regressor"

# create an ExtraTreeRegressor model
regression_model = ExtraTreeRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

2023.10.30 14:40:57.665 Python  ExtraTreeRegressor Original model (double)
2023.10.30 14:40:57.665 Python  R-squared (Coefficient of determination): 1.0
2023.10.30 14:40:57.665 Python  Mean Absolute Error: 0.0
2023.10.30 14:40:57.665 Python  Mean Squared Error: 0.0
2023.10.30 14:40:57.681 Python  
2023.10.30 14:40:57.681 Python  ExtraTreeRegressor ONNX model (float)
2023.10.30 14:40:57.681 Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_tree_regressor_float.onnx
2023.10.30 14:40:57.681 Python  Information about input tensors in ONNX:
2023.10.30 14:40:57.681 Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
2023.10.30 14:40:57.681 Python  Information about output tensors in ONNX:
2023.10.30 14:40:57.681 Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
2023.10.30 14:40:57.681 Python  R-squared (Coefficient of determination) 0.9999999999999971
2023.10.30 14:40:57.681 Python  Mean Absolute Error: 4.393654615473253e-06
2023.10.30 14:40:57.681 Python  Mean Squared Error: 3.829042036424747e-11
2023.10.30 14:40:57.681 Python  R^2 matching decimal places:  0
2023.10.30 14:40:57.681 Python  MAE matching decimal places:  0
2023.10.30 14:40:57.681 Python  MSE matching decimal places:  0
2023.10.30 14:40:57.681 Python  float ONNX model precision:  0
2023.10.30 14:40:58.011 Python  
2023.10.30 14:40:58.011 Python  ExtraTreeRegressor ONNX model (double)
2023.10.30 14:40:58.011 Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_tree_regressor_double.onnx

Errors tab:

ExtraTreeRegressor.py started   ExtraTreeRegressor.py   1       1
Traceback (most recent call last):      ExtraTreeRegressor.py   1       1
    onnx_session = ort.InferenceSession(onnx_filename)  ExtraTreeRegressor.py   159     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     424     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_tree_regressor_double.onnx failed:Type Error      onnxruntime_inference_collection.py     424     1
ExtraTreeRegressor.py finished in 2980 ms               5       1

Fig.109. Results of ExtraTreeRegressor.py (float ONNX)

2.2.4.2. MQL5 code for executing ONNX models

This code runs the saved extra_tree_regressor_float.onnx and extra_tree_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                           ExtraTreeRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ExtraTreeRegressor"
#define   ONNXFilenameFloat  "extra_tree_regressor_float.onnx"
#define   ONNXFilenameDouble "extra_tree_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

ExtraTreeRegressor (EURUSD,H1)  Testing ONNX float: ExtraTreeRegressor (extra_tree_regressor_float.onnx)
ExtraTreeRegressor (EURUSD,H1)  MQL5:   R-Squared (Coefficient of determination): 0.9999999999999971
ExtraTreeRegressor (EURUSD,H1)  MQL5:   Mean Absolute Error: 0.0000043936546155
ExtraTreeRegressor (EURUSD,H1)  MQL5:   Mean Squared Error: 0.0000000000382904
ExtraTreeRegressor (EURUSD,H1)  
ExtraTreeRegressor (EURUSD,H1)  Testing ONNX double: ExtraTreeRegressor (extra_tree_regressor_double.onnx)
ExtraTreeRegressor (EURUSD,H1)  ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\ExtraTreeRegressor.mq5' (133:16)
ExtraTreeRegressor (EURUSD,H1)  model_name=ExtraTreeRegressor OnnxCreate error 5800

The float ONNX model executed normally, but the double ONNX model failed to load: the session could not be created because the TreeEnsembleRegressor node is expected to output tensor(float), not tensor(double).
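The size of the float model's error can be reproduced without ONNX at all. The following is a minimal sketch, not part of the original scripts: it rounds the synthetic targets to single precision with the standard struct module (the helper name to_float32 is hypothetical) and measures the mean absolute error caused by that rounding alone. Since TreeEnsembleRegressor returns tensor(float), an error of roughly this order is to be expected from a float model on this data.

```python
import math
import struct

def to_float32(value):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

# same synthetic data as in the scripts: y = 4x + 10*sin(0.5x), x = 0..99
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]

# simulate an output tensor that can only hold float32 values
ys_float32 = [to_float32(y) for y in ys]

# mean absolute error introduced by the rounding alone
mae = sum(abs(a - b) for a, b in zip(ys, ys_float32)) / len(ys)
print(f"MAE caused by float32 rounding alone: {mae:.3e}")
```

The printed value can be compared with the Mean Absolute Error reported in the MQL5 output above.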

2.2.4.3. ONNX representation of extra_tree_regressor_float.onnx and extra_tree_regressor_double.onnx

Fig.110. ONNX representation of extra_tree_regressor_float.onnx in Netron

Fig.111. ONNX representation of extra_tree_regressor_double.onnx in Netron

2.2.5. sklearn.ensemble.ExtraTreesRegressor

ExtraTreesRegressor (Extremely Randomized Trees Regressor) is a machine learning method that is a variation of Random Forests for regression tasks.

This method uses an ensemble of decision trees to predict numerical values of the target variable based on a set of features.

How ExtraTreesRegressor works:

Start of construction: It begins with the original dataset, including features (independent variables) and their corresponding target variable values.
Randomness in splits: Unlike regular decision trees, where the best possible split is searched for at each node, ExtraTreesRegressor draws random threshold values for the candidate features and keeps the best of these random splits. This randomness makes the splitting process more arbitrary and less prone to overfitting.
Tree building: ExtraTreesRegressor builds multiple decision trees in the ensemble. The number of trees is controlled by the "n_estimators" hyperparameter. In scikit-learn, each tree is trained on the whole dataset by default (bootstrap=False); a random subsample drawn with replacement is used only when bootstrap=True. Random subsets of features can be considered at each split through "max_features".
Prediction: To predict the target variable for new data, ExtraTreesRegressor aggregates the predictions of all trees in the ensemble by averaging them.
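The steps above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not scikit-learn's implementation: each "tree" is a single-split stump, the threshold is drawn uniformly between the minimum and maximum of the feature, and all function names are hypothetical.

```python
import math
import random

def fit_random_stump(xs, ys, rng):
    """One-level tree: random threshold, leaf value = mean target on each side."""
    threshold = rng.uniform(min(xs), max(xs))
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    overall = sum(ys) / len(ys)
    # fall back to the overall mean if one side happens to be empty
    left_mean = sum(left) / len(left) if left else overall
    right_mean = sum(right) / len(right) if right else overall
    return threshold, left_mean, right_mean

def predict_ensemble(stumps, x):
    """Average the predictions of all stumps in the ensemble."""
    preds = [(l if x <= t else r) for t, l, r in stumps]
    return sum(preds) / len(preds)

# same synthetic data as in the article
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]

rng = random.Random(0)          # fixed seed for reproducibility
stumps = [fit_random_stump(xs, ys, rng) for _ in range(50)]

pred_low = predict_ensemble(stumps, 5.0)
pred_high = predict_ensemble(stumps, 95.0)
print(pred_low, pred_high)
```

Even with such crude trees, averaging many random splits produces a prediction that grows with x; real extremely randomized trees repeat the random split recursively to obtain a much finer fit.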

Advantages of ExtraTreesRegressor:

Reduced overfitting: Random node splits, averaging over many trees and, optionally, data subsampling make the method less prone to overfitting than a single decision tree.
High parallelization: Since the trees are built independently, ExtraTreesRegressor can easily be parallelized for training on multiple processors.
Robustness to outliers: The method is usually robust to outliers in the data.
Handling of numerical and categorical data: ExtraTreesRegressor needs no feature scaling. Note that in scikit-learn categorical features must be encoded numerically before training.

Limitations of ExtraTreesRegressor:

Hyperparameter tuning may be required: Although ExtraTreesRegressor usually works well with the default parameters, tuning the hyperparameters may be necessary to achieve maximum performance.
Less interpretable: Like other ensemble methods, ExtraTreesRegressor is less interpretable than simpler models such as linear regression.

ExtraTreesRegressor can be a useful regression method in a variety of tasks, especially when it is necessary to reduce overfitting and improve the generalization of the model.

2.2.5.1. Code for creating the ExtraTreesRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.ensemble.ExtraTreesRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# ExtraTreesRegressor.py
# The code demonstrates the process of training ExtraTreesRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "ExtraTreesRegressor"
onnx_model_filename = data_path + "extra_trees_regressor"

# create an Extra Trees Regressor model
regression_model = ExtraTreesRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# convert the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  ExtraTreesRegressor Original model (double)
Python  R-squared (Coefficient of determination): 1.0
Python  Mean Absolute Error: 2.2302160118670144e-13
Python  Mean Squared Error: 8.41048471722451e-26
Python  
Python  ExtraTreesRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_trees_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9999999999998015
Python  Mean Absolute Error: 3.795239380975701e-05
Python  Mean Squared Error: 2.627067474763585e-09
Python  R^2 matching decimal places:  0
Python  MAE matching decimal places:  0
Python  MSE matching decimal places:  0
Python  float ONNX model precision:  0
Python  
Python  ExtraTreesRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_trees_regressor_double.onnx

Errors tab:

ExtraTreesRegressor.py started  ExtraTreesRegressor.py  1       1
Traceback (most recent call last):      ExtraTreesRegressor.py  1       1
    onnx_session = ort.InferenceSession(onnx_filename)  ExtraTreesRegressor.py  160     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     424     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_trees_regressor_double.onnx failed:Type Erro      onnxruntime_inference_collection.py     424     1
ExtraTreesRegressor.py finished in 4654 ms              5       1

Fig.112. Results of ExtraTreesRegressor.py (float ONNX)
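The "matching decimal places: 0" lines in the output above deserve a comment. The compare_decimal_places helper compares the string forms of the two numbers, and values such as 2.2302160118670144e-13 and 3.795239380975701e-05 are printed in scientific notation, so the function ends up comparing mantissa digits rather than decimal places. A possible alternative, shown here only as a sketch with a hypothetical name and an assumed limit of 16 places, formats both values in fixed-point notation first:

```python
def matching_decimal_places(value1, value2, max_places=16):
    """Count equal leading decimal digits of two numbers in fixed-point form."""
    # fixed-point formatting avoids scientific notation such as 2.23e-13
    int1, frac1 = f"{value1:.{max_places}f}".split(".")
    int2, frac2 = f"{value2:.{max_places}f}".split(".")

    # different integer parts: no decimal place can be said to match
    if int1 != int2:
        return 0

    count = 0
    for digit1, digit2 in zip(frac1, frac2):
        if digit1 != digit2:
            break
        count += 1
    return count

# MAE of the original model vs. MAE of the float ONNX model (from the output above)
mae_match = matching_decimal_places(2.2302160118670144e-13, 3.795239380975701e-05)
print("MAE matching decimal places:", mae_match)
```

With this variant the two MAE values agree in their leading zeros after the decimal point, which reflects the magnitude of the float model's error more directly.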

2.2.5.2. MQL5 code for executing ONNX models

This code executes the saved models extra_trees_regressor_float.onnx and extra_trees_regressor_double.onnx and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                          ExtraTreesRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ExtraTreesRegressor"
#define   ONNXFilenameFloat  "extra_trees_regressor_float.onnx"
#define   ONNXFilenameDouble "extra_trees_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

ExtraTreesRegressor (EURUSD,H1) Testing ONNX float: ExtraTreesRegressor (extra_trees_regressor_float.onnx)
ExtraTreesRegressor (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.9999999999998015
ExtraTreesRegressor (EURUSD,H1) MQL5:   Mean Absolute Error: 0.0000379523938098
ExtraTreesRegressor (EURUSD,H1) MQL5:   Mean Squared Error: 0.0000000026270675
ExtraTreesRegressor (EURUSD,H1) 
ExtraTreesRegressor (EURUSD,H1) Testing ONNX double: ExtraTreesRegressor (extra_trees_regressor_double.onnx)
ExtraTreesRegressor (EURUSD,H1) ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\ExtraTreesRegressor.mq5' (133:16)
ExtraTreesRegressor (EURUSD,H1) model_name=ExtraTreesRegressor OnnxCreate error 5800

The float ONNX model executed normally, but the double ONNX model failed to load: the session could not be created because the TreeEnsembleRegressor node is expected to output tensor(float), not tensor(double).
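For reference, the three metrics printed by the script can be written out in plain Python. This sketch assumes the standard textbook definitions (the ones used by scikit-learn's r2_score, mean_absolute_error and mean_squared_error); that REGRESSION_R2, REGRESSION_MAE and REGRESSION_MSE follow the same definitions is an assumption, although it agrees with the identical values reported by Python and MQL5 above. The function name and sample data are hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Return (R2, MAE, MSE) for two equally sized sequences of numbers."""
    n = len(y_true)
    mean_true = sum(y_true) / n

    # residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = ss_res / n
    return r2, mae, mse

# tiny illustrative sample, not taken from the article's data
r2, mae, mse = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(r2, mae, mse)
```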

2.2.5.3. ONNX representation of extra_trees_regressor_float.onnx and extra_trees_regressor_double.onnx

Fig.113. ONNX representation of extra_trees_regressor_float.onnx in Netron

Fig.114. ONNX representation of extra_trees_regressor_double.onnx in Netron

2.2.6. sklearn.svm.NuSVR

NuSVR is a machine learning method used for regression tasks. It is based on the Support Vector Machine (SVM) but is applied to regression rather than classification.

NuSVR is a variation of SVM designed to solve regression tasks by predicting continuous values of the target variable.

How NuSVR works:

Input data: It starts with a dataset that includes features (independent variables) and values of the (continuous) target variable.
Kernel selection: NuSVR uses kernels such as linear, polynomial, or radial basis function (RBF) to map the data into a higher-dimensional space in which a linear function can be fitted.
Definition of the Nu parameter: The Nu parameter controls the complexity of the model. It must lie in the interval (0, 1] and acts as an upper bound on the fraction of training errors (examples treated as outliers) and a lower bound on the fraction of support vectors.
Building support vectors: NuSVR looks for a function that is as flat as possible while keeping most training points inside a tube around it; the points lying on or outside the tube become the support vectors.
Model training: The model is trained to minimize the regression error while satisfying the constraints associated with the Nu parameter.
Making predictions: After training, the model can be used to predict target variable values for new data.
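The prediction step can be illustrated with the general form of a kernel SVR decision function: a weighted sum of kernel values between the new point and the support vectors, plus an intercept. The sketch below uses an RBF kernel; the support vectors, coefficients, intercept and gamma are invented for illustration and do not come from a trained NuSVR model.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)

def svr_predict(x, support_vectors, dual_coefs, intercept, gamma):
    """Weighted sum of kernel values against the support vectors, plus intercept."""
    weighted = sum(coef * rbf_kernel(sv, x, gamma)
                   for sv, coef in zip(support_vectors, dual_coefs))
    return weighted + intercept

# hypothetical model parameters, for illustration only
support_vectors = [[0.0], [1.0], [2.0]]
dual_coefs = [0.5, -1.0, 0.5]
intercept = 3.0
gamma = 0.5

value = svr_predict([1.0], support_vectors, dual_coefs, intercept, gamma)
print(value)
```

Only the support vectors enter the sum, which is why the Nu parameter, through its bound on the fraction of support vectors, also affects the cost of each prediction.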

Advantages of NuSVR:

Outlier handling: NuSVR allows outliers to be controlled through the Nu parameter, which regulates the fraction of training examples treated as outliers.
Multiple kernels: The method supports several kernel types, which makes it possible to model complex nonlinear relationships.

Limitations of NuSVR:

Selection of the Nu parameter: Choosing the right value of the Nu parameter may require some experimentation.
Sensitivity to data scale: SVM, including NuSVR, can be sensitive to the scale of the data, so standardization or normalization of the features may be necessary.
Computational complexity: For large datasets and complex kernels, NuSVR can be computationally expensive.

In short, NuSVR predicts continuous values of the target variable and makes it possible to manage outliers through the Nu parameter.
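The scale sensitivity mentioned above is usually addressed by standardizing each feature to zero mean and unit variance before training. A minimal plain-Python sketch follows (the function name is hypothetical); in scikit-learn the same transformation is provided by StandardScaler. Whichever way it is done, the same mean and standard deviation would have to be applied to the inputs at inference time as well.

```python
import math

def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    scaled = [(v - mean) / std for v in values]
    return scaled, mean, std

# the feature used throughout the article: x = 0..99
xs = [float(i) for i in range(100)]
xs_scaled, x_mean, x_std = standardize(xs)
print(x_mean, x_std)
```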

2.2.6.1. Code for creating the NuSVR model and exporting it to ONNX for float and double

This code creates the sklearn.svm.NuSVR model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# NuSVR.py
# The code demonstrates the process of training NuSVR model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import NuSVR
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "NuSVR"
onnx_model_filename = data_path + "nu_svr"

# create a NuSVR model
nusvr_model = NuSVR()

# fit the model to the data
nusvr_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = nusvr_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(nusvr_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the float ONNX model input
X_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(nusvr_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the double ONNX model input
X_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  NuSVR Original model (double)
Python  R-squared (Coefficient of determination): 0.2771437770527445
Python  Mean Absolute Error: 83.76666411704255
Python  Mean Squared Error: 9565.381751764757
Python  
Python  NuSVR ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\nu_svr_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.27714379657935495
Python  Mean Absolute Error: 83.766663385322
Python  Mean Squared Error: 9565.381493373838
Python  R^2 matching decimal places:  7
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  3
Python  float ONNX model precision:  5
Python  
Python  NuSVR ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\nu_svr_double.onnx

Errors tab:

NuSVR.py started        NuSVR.py        1       1
Traceback (most recent call last):      NuSVR.py        1       1
    onnx_session = ort.InferenceSession(onnx_filename)  NuSVR.py        159     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess.initialize_session(providers, provider_options, disabled_optimizers)   onnxruntime_inference_collection.py     435     1
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for SVMRegressor(1) node with name 'SVM'        onnxruntime_inference_collection.py     435     1
NuSVR.py finished in 2925 ms            5       1

Fig.115. Results of NuSVR.py (float ONNX)
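
Because the double model fails when the session is created rather than during inference, a calling script can detect the problem early and fall back to the float model. The following is a minimal sketch of that pattern and is not part of the original script: first_usable_session is a hypothetical helper, and the session factory is passed in as an argument so that the same logic can be used with onnxruntime.InferenceSession or, as here, with a stand-in that imitates the failure shown in the Errors tab.

```python
def first_usable_session(paths, open_session):
    """Return (path, session) for the first model that loads, else (None, None)."""
    for path in paths:
        try:
            return path, open_session(path)
        except Exception as exc:  # runtime-specific errors derive from Exception
            print(f"skipping {path}: {exc}")
    return None, None


# stand-in factory (hypothetical) that imitates the failure of the double model
def fake_open(path):
    if path.endswith("_double.onnx"):
        raise NotImplementedError(
            "Could not find an implementation for SVMRegressor(1)"
        )
    return f"session({path})"


# try the double model first, then fall back to the float model
chosen_path, session = first_usable_session(
    ["nu_svr_double.onnx", "nu_svr_float.onnx"], fake_open
)
print("using:", chosen_path)
```

With ONNX Runtime installed, the same helper could be called with ort.InferenceSession as the second argument in place of the stand-in.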

2.2.6.2. MQL5 code for running ONNX models

This code runs the saved nu_svr_float.onnx and nu_svr_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                        NuSVR.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "NuSVR"
#define   ONNXFilenameFloat  "nu_svr_float.onnx"
#define   ONNXFilenameDouble "nu_svr_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

NuSVR (EURUSD,H1)       Testing ONNX float: NuSVR (nu_svr_float.onnx)
NuSVR (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.2771437965793548
NuSVR (EURUSD,H1)       MQL5:   Mean Absolute Error: 83.7666633853219906
NuSVR (EURUSD,H1)       MQL5:   Mean Squared Error: 9565.3814933738358377
NuSVR (EURUSD,H1)       
NuSVR (EURUSD,H1)       Testing ONNX double: NuSVR (nu_svr_double.onnx)
NuSVR (EURUSD,H1)       ONNX: cannot create session (OrtStatus: 9 'Could not find an implementation for SVMRegressor(1) node with name 'SVM''), inspect code 'Scripts\Regression\NuSVR.mq5' (133:16)
NuSVR (EURUSD,H1)       model_name=NuSVR OnnxCreate error 5800

The float ONNX model ran normally, but the double ONNX model failed at session creation: ONNX Runtime reports that it cannot find an implementation for the SVMRegressor node.

Comparison with the original double-precision model in Python:

Testing ONNX float: NuSVR (nu_svr_float.onnx)
Python  Mean Absolute Error: 83.76666411704255
MQL5:   Mean Absolute Error: 83.7666633853219906
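
To put the gap between the two MAE values above into numbers, the sketch below computes their absolute and relative difference and counts the matching decimal places numerically instead of by comparing strings. It also passes the Python result through IEEE 754 single precision to show how far one float32 rounding can move a value of this size. The helper names matching_decimals and to_float32 are illustrative and do not appear in the article's scripts.

```python
import struct
from decimal import Decimal, ROUND_DOWN


def to_float32(value):
    """Round a Python float (a double) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


def matching_decimals(a, b, max_places=15):
    """Count the leading decimal places on which a and b agree (truncating)."""
    da, db = Decimal(repr(a)), Decimal(repr(b))
    count = 0
    for places in range(1, max_places + 1):
        step = Decimal(1).scaleb(-places)  # 10 ** -places
        if da.quantize(step, rounding=ROUND_DOWN) != db.quantize(step, rounding=ROUND_DOWN):
            break
        count = places
    return count


mae_python = 83.76666411704255      # original double model, Python log
mae_mql5 = 83.7666633853219906      # float ONNX model, MQL5 log

abs_diff = abs(mae_python - mae_mql5)
print("absolute difference:", abs_diff)
print("relative difference:", abs_diff / mae_python)
print("matching decimal places:", matching_decimals(mae_python, mae_mql5))

# float32 values near 83.77 are spaced 2**-17 apart, so a single
# float32 rounding can move a value by up to 2**-18
rounded = to_float32(mae_python)
print("float32 round trip error:", abs(rounded - mae_python))
```

The count of matching places agrees with the "MAE matching decimal places: 5" line in the Python log. Unlike the string-based compare_decimal_places, this version also returns 0 when the integer parts differ.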

2.2.6.3. ONNX representation of nu_svr_float.onnx and nu_svr_double.onnx

Fig.116. ONNX representation of nu_svr_float.onnx in Netron

Fig.117. ONNX representation of nu_svr_double.onnx in Netron

2.2.7. sklearn.ensemble.RandomForestRegressor

RandomForestRegressor is a machine learning method used to solve regression tasks.

It is one of the most popular ensemble learning methods and uses the Random Forest algorithm to build powerful and robust regression models.

How RandomForestRegressor works:

Input data: It starts with a dataset that includes features (independent variables) and a continuous target variable.
Random Forest: RandomForestRegressor uses an ensemble of decision trees to solve the regression task. Each tree in the forest predicts the value of the target variable.
Bootstrap sampling: Each tree is trained on a bootstrap sample, that is, a random sample drawn with replacement from the training dataset. This diversifies the data each tree learns from.
Random feature selection: When each tree is built, a random subset of features is also selected, which makes the model more robust and reduces the correlation between trees.
Averaging predictions: Once all the trees are built, RandomForestRegressor averages their predictions to obtain the final regression prediction.
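
The averaging step can be checked directly in scikit-learn: a fitted RandomForestRegressor exposes its trees through the estimators_ attribute, and its prediction should match the mean of the individual tree predictions. This is a minimal sketch on the same synthetic data as the article's scripts; n_estimators=10 and random_state=0 are arbitrary choices made here for a short, reproducible run.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# same synthetic data as in the article's scripts
X = np.arange(0, 100, 1).reshape(-1, 1)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

forest = RandomForestRegressor(n_estimators=10, random_state=0)
forest.fit(X, y)

# predictions of every individual tree, shape (n_trees, n_samples);
# trees are fitted on different bootstrap samples, so they may disagree
tree_preds = np.array([tree.predict(X) for tree in forest.estimators_])

# average the trees by hand and compare with the forest's own prediction
manual_mean = tree_preds.mean(axis=0)
forest_pred = forest.predict(X)

print("trees:", len(forest.estimators_))
print("max |forest - mean of trees|:", np.max(np.abs(forest_pred - manual_mean)))
print("spread between trees at X=50:", tree_preds[:, 50].std())
```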

Advantages of RandomForestRegressor:

Power and robustness: RandomForestRegressor is a powerful regression method that usually delivers good performance.
Handling large data: It copes well with large datasets and with a large number of features.
Resistance to overfitting: Thanks to bootstrap sampling and random feature selection, a random forest is usually resistant to overfitting.
Feature importance estimation: Random Forest can provide information about the importance of each feature in the regression task.
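
As an illustration of the last point, a fitted RandomForestRegressor reports impurity-based importances through its feature_importances_ attribute. The sketch below is an assumption-laden toy setup rather than part of the article's scripts: it adds a second, purely random feature (x_noise) to the article's synthetic data, so the informative feature should receive almost all of the importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# x_signal drives the target; x_noise is an unrelated random column
x_signal = np.arange(0, 100, 1, dtype=float)
x_noise = rng.normal(size=x_signal.shape)
X = np.column_stack([x_signal, x_noise])
y = 4 * x_signal + 10 * np.sin(x_signal * 0.5)

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

# impurity-based importances, normalized so that they sum to 1
importances = forest.feature_importances_
for name, value in zip(["x_signal", "x_noise"], importances):
    print(f"{name}: {value:.4f}")
```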

Limitations of RandomForestRegressor:

Lack of interpretability: The model can be less interpretable than linear models.
Not always the most accurate model: In some tasks a complex ensemble may be unnecessary, and a linear model may be more suitable.

RandomForestRegressor is a powerful machine learning method for regression tasks that uses an ensemble of randomized decision trees to build a stable, high-performing regression model. It is especially useful for tasks with large datasets and for evaluating feature importance.

2.2.7.1. Code for creating the RandomForestRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.ensemble.RandomForestRegressor model, trains it on synthetic data, saves it in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of the original model and of the models exported to ONNX.

# RandomForestRegressor.py
# The code demonstrates the process of training RandomForestRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "RandomForestRegressor"
onnx_model_filename = data_path + "random_forest_regressor"

# create a RandomForestRegressor model
regression_model = RandomForestRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the float ONNX model input
X_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the double ONNX model input
X_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  RandomForestRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9998854509605539
Python  Mean Absolute Error: 0.9186485980852603
Python  Mean Squared Error: 1.5157997632401086
Python  
Python  RandomForestRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\random_forest_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9998854516013125
Python  Mean Absolute Error: 0.9186420704511761
Python  Mean Squared Error: 1.515791284236419
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  5
Python  
Python  RandomForestRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\random_forest_regressor_double.onnx

Errors tab:

RandomForestRegressor.py started        RandomForestRegressor.py        1       1
Traceback (most recent call last):      RandomForestRegressor.py        1       1
    onnx_session = ort.InferenceSession(onnx_filename)  RandomForestRegressor.py        159     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     383     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     424     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\random_forest_regressor_double.onnx failed:Type Er      onnxruntime_inference_collection.py     424     1
RandomForestRegressor.py finished in 4392 ms            5       1

Fig.118. Results of RandomForestRegressor.py (float ONNX)

2.2.7.2. MQL5 code for running ONNX models

This code runs the saved random_forest_regressor_float.onnx and random_forest_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                        RandomForestRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "RandomForestRegressor"
#define   ONNXFilenameFloat  "random_forest_regressor_float.onnx"
#define   ONNXFilenameDouble "random_forest_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

RandomForestRegressor (EURUSD,H1)       
RandomForestRegressor (EURUSD,H1)       Testing ONNX float: RandomForestRegressor (random_forest_regressor_float.onnx)
RandomForestRegressor (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9998854516013125
RandomForestRegressor (EURUSD,H1)       MQL5:   Mean Absolute Error: 0.9186420704511761
RandomForestRegressor (EURUSD,H1)       MQL5:   Mean Squared Error: 1.5157912842364190
RandomForestRegressor (EURUSD,H1)       
RandomForestRegressor (EURUSD,H1)       Testing ONNX double: RandomForestRegressor (random_forest_regressor_double.onnx)
RandomForestRegressor (EURUSD,H1)       ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\RandomForestRegressor.mq5' (133:16)
RandomForestRegressor (EURUSD,H1)       model_name=RandomForestRegressor OnnxCreate error 5800

The float ONNX model ran normally, but the double ONNX model failed to load: the TreeEnsembleRegressor operator only produces tensor(float) output, so the session could not be created.
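As a rough illustration of what single precision costs, the sketch below round-trips a double-precision value through float32 using only Python's standard struct module. The helper name to_float32 is hypothetical and is not part of the article's scripts; the sample value is the MAE printed in the log above.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# a double-precision value taken from the MQL5 log above
mae_double = 0.9186420704511761
mae_single = to_float32(mae_double)

# relative error introduced by storing the value in single precision
relative_error = abs(mae_single - mae_double) / abs(mae_double)

print("double :", repr(mae_double))
print("float32:", repr(mae_single))
print("relative error:", relative_error)
```

For normal-range values the relative rounding error is bounded by 2^-24, which is roughly 7 significant decimal digits. That is consistent with the handful of matching decimal places reported for the float models in this article.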

2.2.7.3. ONNX representation of random_forest_regressor_float.onnx and random_forest_regressor_double.onnx

Fig.119. ONNX representation of random_forest_regressor_float.onnx in Netron

Fig.120. ONNX representation of random_forest_regressor_double.onnx in Netron

2.2.8. sklearn.ensemble.GradientBoostingRegressor

GradientBoostingRegressor is a machine learning method used for regression tasks. It belongs to the family of ensemble methods and builds a strong model by combining many weak models through gradient boosting.

Gradient boosting improves a model iteratively: each new weak model is added to correct the errors of the models built before it.

How GradientBoostingRegressor works:

Initialization: It starts with the original dataset containing features (independent variables) and their corresponding target values.
First model: It starts from an initial model fitted to the original data. This is usually a very simple predictor; in scikit-learn the default is a constant (the mean of the target for squared-error loss).
Residuals and negative gradient: The residuals are computed as the difference between the actual target values and the current predictions. The negative gradient of the loss function with respect to the predictions is then calculated; it indicates the direction in which the model should be improved. For squared-error loss the negative gradient equals the residuals.
Building the next model: The next model, typically a shallow decision tree, is trained to predict the negative gradient (the errors of the current ensemble) and is then added to the ensemble.
Iterations: The process of building new models and correcting the residuals is repeated many times. Each new model takes into account the residuals left by the previous models and aims to improve the predictions.
Combining models: The final prediction is the initial prediction plus the sum of the contributions of all trees, each scaled by the learning rate.
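The steps above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: the weak learner is a one-split decision stump, the loss is squared error (so the negative gradient equals the residuals), and all function names are hypothetical. The data follow the same formula as the article's synthetic dataset.

```python
import math

def fit_stump(xs, residuals):
    """Fit a one-split regression stump by minimizing squared error.
    Assumes xs contains at least two distinct values."""
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_boosting(xs, ys, n_estimators=50, learning_rate=0.1):
    """Return (initial constant, learning rate, list of stumps)."""
    init = sum(ys) / len(ys)              # initial model: mean of the target
    preds = [init] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # residuals = negative gradient of the squared-error loss
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return init, learning_rate, stumps

def predict(model, x):
    init, learning_rate, stumps = model
    return init + learning_rate * sum(stump_predict(s, x) for s in stumps)

def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# same formula as the article's synthetic data
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]

baseline = fit_boosting(xs, ys, n_estimators=0)   # constant model only
model = fit_boosting(xs, ys, n_estimators=50)

print("MSE, constant model:", mean_squared_error(baseline, xs, ys))
print("MSE, 50 stumps     :", mean_squared_error(model, xs, ys))
```

Each added stump lowers the training error a little; the learning rate controls how much of each correction is applied, which is why it interacts with the number of estimators.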

Advantages of GradientBoostingRegressor:

High performance: Gradient boosting is a powerful method capable of achieving high accuracy in regression tasks.
Robustness to outliers: With a robust loss function such as Huber, the method can limit the influence of outliers in the data.
Automatic feature selection: The tree splits favor the features that are most useful for predicting the target variable.
Support for several loss functions: The method allows different loss functions to be used depending on the task.

Limitations of GradientBoostingRegressor:

Hyperparameter tuning is required: Achieving maximum performance requires tuning hyperparameters such as the learning rate, the tree depth, and the number of models.
Computationally expensive: Gradient boosting can be computationally expensive, especially with large volumes of data and a large number of trees.

GradientBoostingRegressor is a powerful regression method that is often used in practical tasks and achieves high performance when its hyperparameters are tuned correctly.

2.2.8.1. Code for creating the GradientBoostingRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.ensemble.GradientBoostingRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# GradientBoostingRegressor.py
# The code demonstrates the process of training GradientBoostingRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "GradientBoostingRegressor"
onnx_model_filename = data_path + "gradient_boosting_regressor"

# create a Gradient Boosting Regressor model
regression_model = GradientBoostingRegressor()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the model's tensor(float) input
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the model's tensor(double) input
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  GradientBoostingRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9999959514652565
Python  Mean Absolute Error: 0.15069342754017417
Python  Mean Squared Error: 0.053573282108575676
Python  
Python  GradientBoostingRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gradient_boosting_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9999959514739537
Python  Mean Absolute Error: 0.15069457426101718
Python  Mean Squared Error: 0.05357316702127665
Python  R^2 matching decimal places:  10
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  6
Python  float ONNX model precision:  5
Python  
Python  GradientBoostingRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gradient_boosting_regressor_double.onnx

Errors tab:

GradientBoostingRegressor.py started    GradientBoostingRegressor.py    1       1
Traceback (most recent call last):      GradientBoostingRegressor.py    1       1
    onnx_session = ort.InferenceSession(onnx_filename)  GradientBoostingRegressor.py    161     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     419     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     452     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gradient_boosting_regressor_double.onnx failed:Typ      onnxruntime_inference_collection.py     452     1
GradientBoostingRegressor.py finished in 3073 ms                5       1

Fig.121. Results of GradientBoostingRegressor.py (float ONNX)
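The three metrics reported in the output above (R-squared, MAE, and MSE) follow standard definitions. As a minimal sketch, they can be reproduced in pure Python as shown below; the function names and the small sample arrays are hypothetical and chosen only for illustration.

```python
def mean_abs_error(y_true, y_pred):
    """Mean Absolute Error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_sq_error(y_true, y_pred):
    """Mean Squared Error: average of (true - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# illustrative values only, not taken from the article
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print("R-squared:", r_squared(y_true, y_pred))
print("MAE      :", mean_abs_error(y_true, y_pred))
print("MSE      :", mean_sq_error(y_true, y_pred))
```

Because every metric is a sum over all samples, small per-sample rounding differences between a float32 model and the double-precision original show up in the trailing digits of the result.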

2.2.8.2. MQL5 code for executing ONNX models

This code runs the gradient_boosting_regressor_float.onnx and gradient_boosting_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                    GradientBoostingRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "GradientBoostingRegressor"
#define   ONNXFilenameFloat  "gradient_boosting_regressor_float.onnx"
#define   ONNXFilenameDouble "gradient_boosting_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

GradientBoostingRegressor (EURUSD,H1)   Testing ONNX float: GradientBoostingRegressor (gradient_boosting_regressor_float.onnx)
GradientBoostingRegressor (EURUSD,H1)   MQL5:   R-Squared (Coefficient of determination): 0.9999959514739537
GradientBoostingRegressor (EURUSD,H1)   MQL5:   Mean Absolute Error: 0.1506945742610172
GradientBoostingRegressor (EURUSD,H1)   MQL5:   Mean Squared Error: 0.0535731670212767
GradientBoostingRegressor (EURUSD,H1)   
GradientBoostingRegressor (EURUSD,H1)   Testing ONNX double: GradientBoostingRegressor (gradient_boosting_regressor_double.onnx)
GradientBoostingRegressor (EURUSD,H1)   ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\GradientBoostingRegressor.mq5' (133:16)
GradientBoostingRegressor (EURUSD,H1)   model_name=GradientBoostingRegressor OnnxCreate error 5800

The float ONNX model ran normally, but the double ONNX model failed to load: the TreeEnsembleRegressor operator only produces tensor(float) output, so the session could not be created.

Comparison with the original double-precision model in Python:

Testing ONNX float: GradientBoostingRegressor (gradient_boosting_regressor_float.onnx)
Python  Mean Absolute Error: 0.15069342754017417
MQL5:   Mean Absolute Error: 0.1506945742610172

Accuracy of the float ONNX MAE: 5 decimal places.
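The count of matching decimal places can be checked with a small helper. The sketch below is a hypothetical variant of the article's compare_decimal_places function: it formats both values in fixed-point notation, which avoids the case where str() switches to scientific notation for very small numbers, and it returns 0 when the integer parts differ. The name matching_decimals and the max_places parameter are assumptions introduced here.

```python
def matching_decimals(value1: float, value2: float, max_places: int = 16) -> int:
    """Count how many leading decimal digits of two numbers coincide."""
    # fixed-point formatting never produces scientific notation
    text1 = f"{value1:.{max_places}f}"
    text2 = f"{value2:.{max_places}f}"
    int_part1, frac_part1 = text1.split(".")
    int_part2, frac_part2 = text2.split(".")
    # different integer parts (or signs) mean no decimals can be said to match
    if int_part1 != int_part2:
        return 0
    count = 0
    for digit1, digit2 in zip(frac_part1, frac_part2):
        if digit1 != digit2:
            break
        count += 1
    return count

# MAE of the original Python model and of the float ONNX model (from the logs above)
mae_python = 0.15069342754017417
mae_onnx_float = 0.1506945742610172

print("MAE matching decimal places:", matching_decimals(mae_python, mae_onnx_float))
```

For the two MAE values above this gives 5, in agreement with the figure reported by the Python script.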

2.2.8.3. ONNX representation of gradient_boosting_regressor_float.onnx and gradient_boosting_regressor_double.onnx

Fig.122. ONNX representation of gradient_boosting_regressor_float.onnx in Netron

Fig.123. ONNX representation of gradient_boosting_regressor_double.onnx in Netron

2.2.9. sklearn.ensemble.HistGradientBoostingRegressor

HistGradientBoostingRegressor is a machine learning method that is a variant of gradient boosting optimized for working with large datasets.

This method is used for regression tasks, and the "Hist" in its name indicates that it uses histogram-based methods to speed up training.

How HistGradientBoostingRegressor works:

Initialization: It starts with the original dataset containing features (independent variables) and their corresponding target values.
Histogram-based methods: Instead of searching for exact split points at the tree nodes, HistGradientBoostingRegressor first groups the feature values into a limited number of discrete bins (scikit-learn uses up to 255 by default) and works with histograms over those bins. This significantly speeds up training, especially on large datasets.
Building base trees: The method builds an ensemble of base decision trees, called "histogram decision trees", from the histogram representation of the data. These trees are built by gradient boosting and are fitted to the residuals of the previous model.
Gradual training: HistGradientBoostingRegressor gradually adds new trees to the ensemble, and each tree corrects the residuals of the previous trees.
Combining models: After the base trees are built, the predictions of all trees are combined to obtain the final prediction.
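The histogram idea described above can be sketched as follows. This is a simplified illustration, not scikit-learn's implementation: it assumes equal-width bins (scikit-learn bins by quantiles), a single feature, and squared-error loss, and all function names are hypothetical. The split search only scans the bin boundaries, using per-bin sums and counts instead of the raw samples.

```python
import math
from bisect import bisect_right

def make_bin_edges(values, n_bins):
    """Interior edges of n_bins equal-width bins (an assumption of this sketch)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * k for k in range(1, n_bins)]

def to_bins(values, edges):
    """Map each value to a bin index in the range 0..len(edges)."""
    return [bisect_right(edges, v) for v in values]

def best_histogram_split(bin_ids, targets, n_bins):
    """Return (bin index, gain) of the best split 'bin <= index goes left'."""
    # build the histogram: per-bin sum of targets and sample count
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for b, t in zip(bin_ids, targets):
        sums[b] += t
        counts[b] += 1
    total_sum, total_count = sum(sums), sum(counts)
    best_bin, best_gain = None, 0.0
    left_sum, left_count = 0.0, 0
    for b in range(n_bins - 1):
        left_sum += sums[b]
        left_count += counts[b]
        right_count = total_count - left_count
        if left_count == 0 or right_count == 0:
            continue
        right_sum = total_sum - left_sum
        # reduction of the sum of squared errors obtained by the split
        gain = (left_sum ** 2 / left_count + right_sum ** 2 / right_count
                - total_sum ** 2 / total_count)
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain

# same formula as the article's synthetic data
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]

n_bins = 10
edges = make_bin_edges(xs, n_bins)
split_bin, gain = best_histogram_split(to_bins(xs, edges), ys, n_bins)
print("best split threshold:", edges[split_bin], "gain:", gain)
```

With 100 samples and 10 bins the search evaluates 9 candidate thresholds instead of 99, which is the source of the speed-up on large datasets; the price is that the threshold can only fall on a bin edge.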

Advantages of HistGradientBoostingRegressor:

High performance: This method is optimized for handling large volumes of data and can achieve high performance.
Robustness to noise: HistGradientBoostingRegressor usually works well even when the data contain noise.
Efficiency with high-dimensional data: The method can handle tasks with a large number of features (high-dimensional data).
Excellent parallelization: It can efficiently parallelize training across multiple processor cores.

Limitations of HistGradientBoostingRegressor:

Hyperparameter tuning is required: Achieving maximum performance requires tuning hyperparameters such as the tree depth and the number of models.
Less interpretable than linear models: Like other ensemble methods, HistGradientBoostingRegressor is less interpretable than simpler models such as linear regression.

HistGradientBoostingRegressor can be a useful regression method for tasks involving large datasets where high performance and efficient handling of high-dimensional data are essential.

2.2.9.1. Code for creating the HistGradientBoostingRegressor model and exporting it to ONNX for float and double

This code creates the sklearn.ensemble.HistGradientBoostingRegressor model, trains it on synthetic data, saves the model in ONNX format, and makes predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.

# HistGradientBoostingRegressor.py
# The code demonstrates the process of training HistGradientBoostingRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "HistGradientBoostingRegressor"
onnx_model_filename = data_path + "hist_gradient_boosting_regressor"

# create a Histogram-Based Gradient Boosting Regressor model
hist_gradient_boosting_model = HistGradientBoostingRegressor()

# fit the model to the data
hist_gradient_boosting_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = hist_gradient_boosting_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(hist_gradient_boosting_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the model's tensor(float) input
X_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(hist_gradient_boosting_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the model's tensor(double) input
X_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  HistGradientBoostingRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.9833421349506157
Python  Mean Absolute Error: 9.070567104488434
Python  Mean Squared Error: 220.4295035561544
Python  
Python  HistGradientBoostingRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\hist_gradient_boosting_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9833421351962779
Python  Mean Absolute Error: 9.07056497799043
Python  Mean Squared Error: 220.42950030536645
Python  R^2 matching decimal places:  8
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  5
Python  float ONNX model precision:  5
Python  
Python  HistGradientBoostingRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\hist_gradient_boosting_regressor_double.onnx

Errors tab:

HistGradientBoostingRegressor.py started        HistGradientBoostingRegressor.py        1       1
Traceback (most recent call last):      HistGradientBoostingRegressor.py        1       1
    onnx_session = ort.InferenceSession(onnx_filename)  HistGradientBoostingRegressor.py        161     1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py     419     1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py     452     1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\hist_gradient_boosting_regressor_double.onnx faile      onnxruntime_inference_collection.py     452     1
HistGradientBoostingRegressor.py finished in 3100 ms            5       1

Fig.124. Results of HistGradientBoostingRegressor.py (float ONNX)
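
The three metrics printed in the output above (R-squared, mean absolute error, mean squared error) follow the usual textbook definitions. As a reference only, here is a minimal pure-Python sketch of those formulas. The helper `regression_metrics` is hypothetical (it is not part of scikit-learn or MQL5), and the four sample values are made up for illustration.

```python
def regression_metrics(y_true, y_pred):
    """Return (R-squared, MAE, MSE) for two equally long sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # mean absolute error and mean squared error
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n

    # R-squared: 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return r2, mae, mse


# illustrative data: three exact predictions and one that is off by 2
r2, mae, mse = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 2.0])
print("R-squared:", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
```

Because MSE squares each error, a single large deviation moves it much more than it moves MAE, which is one reason the number of matching decimal places can differ between the two metrics.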

2.2.9.2. MQL5 code for executing ONNX models

This code runs the saved hist_gradient_boosting_regressor_float.onnx and hist_gradient_boosting_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                HistGradientBoostingRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "HistGradientBoostingRegressor"
#define   ONNXFilenameFloat  "hist_gradient_boosting_regressor_float.onnx"
#define   ONNXFilenameDouble "hist_gradient_boosting_regressor_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

HistGradientBoostingRegressor (EURUSD,H1)       Testing ONNX float: HistGradientBoostingRegressor (hist_gradient_boosting_regressor_float.onnx)
HistGradientBoostingRegressor (EURUSD,H1)       MQL5:   R-Squared (Coefficient of determination): 0.9833421351962779
HistGradientBoostingRegressor (EURUSD,H1)       MQL5:   Mean Absolute Error: 9.0705649779904292
HistGradientBoostingRegressor (EURUSD,H1)       MQL5:   Mean Squared Error: 220.4295003053665312
HistGradientBoostingRegressor (EURUSD,H1)       
HistGradientBoostingRegressor (EURUSD,H1)       Testing ONNX double: HistGradientBoostingRegressor (hist_gradient_boosting_regressor_double.onnx)
HistGradientBoostingRegressor (EURUSD,H1)       ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\HistGradientBoostingRegressor.mq5' (133:16)
HistGradientBoostingRegressor (EURUSD,H1)       model_name=HistGradientBoostingRegressor OnnxCreate error 5800

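The float ONNX model ran normally, but the double ONNX model could not be loaded. As the log shows, the TreeEnsembleRegressor node in the exported model declares a tensor(double) output, while this operator is only allowed to return tensor(float), so the session cannot be created.

Comparison with the original double-precision model in Python: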

Testing ONNX float: HistGradientBoostingRegressor (hist_gradient_boosting_regressor_float.onnx)
Python  Mean Absolute Error: 9.070567104488434
MQL5:   Mean Absolute Error: 9.0705649779904292

Accuracy of ONNX float MAE: 5 decimal places
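
An agreement of about five decimal places is in line with what single precision can hold: a float32 value carries a 24-bit significand, roughly seven significant decimal digits, and rounding errors accumulate over the model's computations. The sketch below is an illustration only and is not part of the original scripts. It uses the standard library to recount the matching decimal places for the two MAE values printed above and to measure the rounding error of a single double-to-float32 conversion. The helper names `to_float32` and `matching_decimals` are hypothetical.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (double) to the nearest float32 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def matching_decimals(a: float, b: float) -> int:
    """Count the leading decimal digits shared by the string forms of a and b."""
    frac_a = str(a).partition(".")[2]
    frac_b = str(b).partition(".")[2]
    count = 0
    for digit_a, digit_b in zip(frac_a, frac_b):
        if digit_a != digit_b:
            break
        count += 1
    return count


# MAE values copied from the Python and MQL5 outputs above
mae_original = 9.070567104488434     # scikit-learn model (double)
mae_onnx_float = 9.0705649779904292  # float ONNX model

matches = matching_decimals(mae_original, mae_onnx_float)
print("MAE matching decimal places:", matches)

# storing a double in float32 changes it by a relative amount of at most 2**-24
rounded = to_float32(mae_original)
relative_error = abs(rounded - mae_original) / mae_original
print("relative error of one float32 rounding:", relative_error)
```

A single conversion loses at most about 6e-8 in relative terms; the float model performs many such roundings on its parameters and intermediate sums, so a difference in the sixth decimal place of the final metric is plausible.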

2.2.9.3. ONNX representation of hist_gradient_boosting_regressor_float.onnx and hist_gradient_boosting_regressor_double.onnx

Fig.125. ONNX representation of hist_gradient_boosting_regressor_float.onnx in Netron

Fig.126. ONNX representation of hist_gradient_boosting_regressor_double.onnx in Netron

2.2.10. sklearn.svm.SVR

SVR (Support Vector Regression) is a machine learning method for regression tasks. It is based on the same concept as SVM (Support Vector Machine) for classification, but adapted to regression. The main goal of SVR is to predict continuous values of the target variable by fitting a function that keeps as many data points as possible within a fixed distance of the regression line while staying as simple as possible.

How SVR works:

Defining boundaries: An SVM classifier builds boundaries that separate classes of data points. SVR instead builds a "tube" around the regression line, and the width of the tube is controlled by a hyperparameter (epsilon in scikit-learn).
Target variable and loss function: Instead of classes, SVR works with continuous values of the target variable. It minimizes the prediction error measured by an epsilon-insensitive loss function, which ignores deviations that stay inside the tube and penalizes larger ones in proportion to how far they fall outside it.
Regularization: SVR also supports regularization (the C parameter in scikit-learn), which helps control model complexity and avoid overfitting.
Kernel functions: SVR usually employs kernel functions that let it handle nonlinear dependencies between the features and the target variable. Popular kernels are the radial basis function (RBF), polynomial, and linear.
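
To make the "tube" idea concrete, here is a minimal sketch of the epsilon-insensitive loss in plain Python. It is an illustration under stated assumptions, not scikit-learn's implementation: the function name `epsilon_insensitive_loss` and the sample values are hypothetical, and the default of 0.1 simply mirrors the default `epsilon` of sklearn.svm.SVR.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss: errors inside the tube cost nothing."""
    total = 0.0
    for target, prediction in zip(y_true, y_pred):
        # only the part of the error that sticks out of the tube is penalized
        total += max(0.0, abs(target - prediction) - epsilon)
    return total / len(y_true)


# illustrative values; the absolute errors are 0.05, 0.5, 1.0 and 0.0
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.05, 2.5, 2.0, 4.0]

narrow_tube = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
wide_tube = epsilon_insensitive_loss(y_true, y_pred, epsilon=1.0)
print("loss with epsilon=0.1:", narrow_tube)
print("loss with epsilon=1.0:", wide_tube)
```

With the wider tube every point lies inside it, so the loss drops to zero even though the predictions are not exact. This is the trade-off the epsilon hyperparameter controls.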

Advantages of SVR:

Robustness to outliers: SVR tolerates outliers in the data reasonably well, because its loss grows linearly rather than quadratically with the size of the error.
Support for nonlinear dependencies: Kernel functions allow SVR to model complex, nonlinear dependencies between the features and the target variable.
High prediction quality: In regression tasks that require precise predictions, SVR can deliver high-quality results.
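
The three kernel functions named earlier can be written in a few lines each. The following is a plain-Python sketch for illustration only; the function names are hypothetical, and the parameter names `gamma`, `coef0` and `degree` are chosen to resemble those of sklearn.svm.SVR without claiming to reproduce its internals.

```python
import math


def linear_kernel(x, z):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=0.0):
    """(gamma * <x, z> + coef0) raised to the given degree."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """exp(-gamma * squared Euclidean distance between x and z)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


x = [1.0, 2.0]
z = [3.0, 4.0]
print("linear:", linear_kernel(x, z))
print("polynomial (degree 2, coef0 1):", polynomial_kernel(x, z, degree=2, coef0=1.0))
print("rbf:", rbf_kernel(x, z, gamma=0.5))
```

The RBF kernel equals 1 for identical points and decays toward 0 as they move apart, which is what lets the model respond to local structure instead of fitting one global line.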

Limitations of SVR:

Sensitivity to hyperparameters: The choice of kernel function and of model parameters such as the tube width may require careful tuning and optimization.
Computational complexity: Training an SVR model, especially with complex kernel functions and large datasets, can be computationally intensive.
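
As a rough illustration of why the tube width needs tuning, the sketch below counts how many points of this article's synthetic dataset would fall outside the tube for several widths. It makes a strong simplifying assumption, labeled here and in the code: the fitted function is taken to be the linear trend 4x, which is not what SVR would actually learn. The helper `points_outside_tube` is hypothetical.

```python
import math

# ASSUMPTION: the regression function is the trend line 4x, so the deviation
# of y = 4x + 10*sin(0.5x) from it is simply 10*sin(0.5x)
residuals = [10.0 * math.sin(0.5 * x) for x in range(100)]


def points_outside_tube(residuals, epsilon):
    """Count the points whose absolute deviation exceeds the tube half-width."""
    return sum(1 for r in residuals if abs(r) > epsilon)


counts = {}
for epsilon in (0.1, 1.0, 5.0, 10.0):
    counts[epsilon] = points_outside_tube(residuals, epsilon)
    print(f"epsilon={epsilon}: {counts[epsilon]} of {len(residuals)} points outside the tube")
```

A very narrow tube leaves almost every point outside, so nearly all of them contribute to the loss. A tube as wide as the amplitude of the sine term swallows the whole oscillation, and the model has no incentive to follow it.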

SVR is a machine learning method for regression tasks based on the idea of building a "tube" around the data points to minimize prediction errors. It is robust to outliers and able to handle nonlinear dependencies, which makes it useful in a variety of regression tasks.

2.2.10.1. Code for creating the SVR model and exporting it to ONNX for float and double

This code creates the sklearn.svm.SVR model, trains it on synthetic data, saves it in ONNX format, and runs predictions with both float and double input data. It also compares the accuracy of the original model with that of the models exported to ONNX.

# SVR.py
# The code demonstrates the process of training SVR model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "SVR"
onnx_model_filename = data_path + "svr"

# create an SVR model
regression_model = SVR()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the model's tensor(float) input
X_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the model's tensor(double) input
X_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  SVR Original model (double)
Python  R-squared (Coefficient of determination): 0.398243655775797
Python  Mean Absolute Error: 73.63683696034649
Python  Mean Squared Error: 7962.89631509593
Python  
Python  SVR ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\svr_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.3982436352100983
Python  Mean Absolute Error: 73.63683840363255
Python  Mean Squared Error: 7962.896587236852
Python  R^2 matching decimal places:  7
Python  MAE matching decimal places:  5
Python  MSE matching decimal places:  3
Python  float ONNX model precision:  5
Python  
Python  SVR ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\svr_double.onnx

Fig.127. Results of SVR.py (float ONNX)

2.2.10.2. MQL5 code for executing ONNX models

This code runs the saved svr_float.onnx and svr_double.onnx models and demonstrates the use of regression metrics in MQL5.

//+------------------------------------------------------------------+
//|                                                          SVR.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "SVR"
#define   ONNXFilenameFloat  "svr_float.onnx"
#define   ONNXFilenameDouble "svr_double.onnx"

#resource ONNXFilenameFloat  as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }

//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }

//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5:   R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5:   Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5:   Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(run_result);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+

Output:

SVR (EURUSD,H1) Testing ONNX float: SVR (svr_float.onnx)
SVR (EURUSD,H1) MQL5:   R-Squared (Coefficient of determination): 0.3982436352100981
SVR (EURUSD,H1) MQL5:   Mean Absolute Error: 73.6368384036325523
SVR (EURUSD,H1) MQL5:   Mean Squared Error: 7962.8965872368517012
SVR (EURUSD,H1) 
SVR (EURUSD,H1) Testing ONNX double: SVR (svr_double.onnx)
SVR (EURUSD,H1) ONNX: cannot create session (OrtStatus: 9 'Could not find an implementation for SVMRegressor(1) node with name 'SVM''), inspect code 'Scripts\R\SVR.mq5' (133:16)
SVR (EURUSD,H1) model_name=SVR OnnxCreate error 5800

The float ONNX model ran normally, but the double ONNX model failed at session creation: the runtime reports that it could not find an implementation for the SVMRegressor node.

Comparison with the original double-precision model in Python:

Testing ONNX float: SVR (svr_float.onnx)
Python  Mean Absolute Error: 73.63683696034649
MQL5:   Mean Absolute Error: 73.6368384036325523

ONNX float MAE precision: 5 decimal places
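
Agreement to about five decimal places is roughly what single precision can deliver for a value of this size. As a rough illustration (a sketch that is not part of the original scripts; the helper name to_float32 is hypothetical), the snippet below rounds the Python MAE to float32 with the standard library only and compares the rounding error with the float32 step at this magnitude:

```python
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest float32 and return it widened back to double."""
    return struct.unpack("f", struct.pack("f", x))[0]

# MAE of the original double-precision model, taken from the output above
mae_python = 73.63683696034649
mae_as_float32 = to_float32(mae_python)

# For values in [64, 128) adjacent float32 numbers are 2**-17 apart (about 7.6e-06),
# so a single rounding can move the value by up to half of that step.
float32_step = 2.0 ** (6 - 23)

print("double value  :", mae_python)
print("float32 value :", mae_as_float32)
print("rounding error:", abs(mae_as_float32 - mae_python))
print("float32 step  :", float32_step)
```

A float model accumulates several such roundings (inputs, parameters, outputs), so differences of a few units in the sixth decimal place, as seen above, are consistent with float32 arithmetic.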

2.2.10.3. ONNX representation of svr_float.onnx and svr_double.onnx

Fig.128. ONNX representation of svr_float.onnx in Netron

Fig.129. ONNX representation of svr_double.onnx in Netron

2.3. Regression models that encountered problems when converting to ONNX

Some regression models could not be converted to the ONNX format by the sklearn-onnx converter.

2.3.1. sklearn.dummy.DummyRegressor

DummyRegressor is a machine learning method used in regression tasks to build a baseline model that predicts the target variable using simple rules. It is valuable as a point of comparison for more complex models and is often used when assessing the quality of other regression models.

DummyRegressor offers several prediction strategies:

"mean" (default): DummyRegressor predicts the mean of the target variable from the training dataset. This strategy is useful for determining how much better another model is than simply predicting the mean.
"median": DummyRegressor predicts the median of the target variable from the training dataset.
"quantile": DummyRegressor predicts a quantile of the target variable from the training dataset (specified by the quantile parameter).
"constant": DummyRegressor predicts a constant value set by the user (via the constant parameter).

Advantages of DummyRegressor:

Performance evaluation: DummyRegressor is useful for evaluating the performance of other, more complex models. If your model cannot outperform the predictions made by DummyRegressor, this may indicate problems in the model.
Comparison with baselines: DummyRegressor lets you compare the performance of more complex models against a reference value (for example, the mean or the median).
Ease of use: DummyRegressor is easy to implement and use for comparative analysis.

Limitations of DummyRegressor:

Not suitable for accurate predictions: DummyRegressor only provides basic baseline predictions and is not intended for accurate forecasting.
Ignores complex dependencies: DummyRegressor ignores complex data structures and feature dependencies; its predictions do not depend on the input features at all.
Insufficient for real-world tasks: In real-world prediction tasks, using DummyRegressor to predict the target variable is not enough.

DummyRegressor is a valuable tool for quickly evaluating and comparing the performance of other regression models, but it is not a serious standalone regression model.
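
To make the four strategies concrete, here is a minimal plain-Python sketch of what each one computes. It is an illustration under stated assumptions, not scikit-learn's code: the function name dummy_predict is hypothetical, and the quantile is computed by linear interpolation between order statistics, which may differ in detail from the library.

```python
import statistics

def dummy_predict(y_train, n_samples, strategy="mean", constant=None, quantile=None):
    """Return the baseline prediction for n_samples rows; the features are never used."""
    if strategy == "mean":
        value = statistics.fmean(y_train)
    elif strategy == "median":
        value = statistics.median(y_train)
    elif strategy == "quantile":
        # assumption of this sketch: linear interpolation between sorted targets
        ordered = sorted(y_train)
        pos = quantile * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        value = ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])
    elif strategy == "constant":
        value = constant
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # every sample receives the same value
    return [value] * n_samples

y_train = [1.0, 2.0, 3.0, 10.0]
cases = [("mean", {}), ("median", {}),
         ("quantile", {"quantile": 0.25}), ("constant", {"constant": 5.0})]
for strategy, kwargs in cases:
    print(strategy, dummy_predict(y_train, 2, strategy, **kwargs))
```

Whatever the strategy, the prediction is a single number repeated for every row, which is why such a model serves only as a baseline.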

2.3.1.1. Code for creating the DummyRegressor model

# DummyRegressor.py
# The code demonstrates the process of training DummyRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "DummyRegressor"
onnx_model_filename = data_path + "dummy_regressor"

# create a DummyRegressor model
regression_model = DummyRegressor(strategy="mean")

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the input type of the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the input type of the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  DummyRegressor Original model (double)
Python  R-squared (Coefficient of determination): 0.0
Python  Mean Absolute Error: 100.00329851715793
Python  Mean Squared Error: 13232.758393867645
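
The R-squared value of exactly 0.0 is expected: predicting the mean makes the residual sum of squares equal to the total sum of squares. The following standard-library sketch (not part of the original script; it only reproduces the synthetic data and the metric definitions) shows this, and its MAE and MSE can be compared with the values printed above:

```python
import math

# same synthetic data as in the script: x = 0..99, y = 4x + 10*sin(0.5x)
x = [float(i) for i in range(100)]
y = [4 * xi + 10 * math.sin(0.5 * xi) for xi in x]

# what strategy="mean" predicts for every sample
y_mean = sum(y) / len(y)
y_pred = [y_mean] * len(y)

# residual and total sums of squares coincide for the mean predictor
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
ss_tot = sum((yi - y_mean) ** 2 for yi in y)

r2 = 1.0 - ss_res / ss_tot
mse = ss_res / len(y)
mae = sum(abs(yi - pi) for yi, pi in zip(y, y_pred)) / len(y)

print("R-squared:", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
```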

Errors tab:

DummyRegressor.py started       DummyRegressor.py       1       1
Traceback (most recent call last):      DummyRegressor.py       1       1
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)     DummyRegressor.py       87      1
    onnx_model = convert_topology(      convert.py      208     1
    topology.convert_operators(container=container, verbose=verbose)    _topology.py    1532    1
    self.call_shape_calculator(operator)        _topology.py    1348    1
    operator.infer_types()      _topology.py    1163    1
    raise MissingShapeCalculator(       _topology.py    629     1
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.dummy.DummyRegressor'>'. _topology.py    629     1
It usually means the pipeline being converted contains a        _topology.py    629     1
transformer or a predictor with no corresponding converter      _topology.py    629     1
implemented in sklearn-onnx. If the converted is implemented    _topology.py    629     1
in another library, you need to register        _topology.py    629     1
the converted so that it can be used by sklearn-onnx (function  _topology.py    629     1
update_registered_converter). If the model is not yet covered   _topology.py    629     1
by sklearn-onnx, you may raise an issue to      _topology.py    629     1
https://github.com/onnx/sklearn-onnx/issues     _topology.py    629     1
to get the converter implemented or even contribute to the      _topology.py    629     1
project. If the model is a custom model, a new converter must   _topology.py    629     1
be implemented. Examples can be found in the gallery.   _topology.py    629     1
DummyRegressor.py finished in 2565 ms           19      1

2.3.2. sklearn.kernel_ridge.KernelRidge

KernelRidge is a machine learning method used for regression tasks. It combines ridge regression (least squares with L2 regularization) with the kernel trick, the same technique used by kernel support vector machines. KernelRidge makes it possible to model complex, non-linear relationships between the features and the target variable using kernel functions.

How KernelRidge works:

Input data: It starts with the original dataset containing features (independent variables) and their corresponding target variable values.
Kernel functions: KernelRidge uses kernel functions (such as polynomial, RBF (Radial Basis Function), and others) that implicitly map the data into a high-dimensional space, allowing more complex non-linear relationships to be modeled.
Model training: The model is trained by minimizing the squared error between the predicted and actual values of the target variable, plus an L2 regularization penalty. Kernel functions are used to account for complex dependencies.
Prediction: After training, the model can be used to predict target variable values for new data, using the same kernel functions.
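
The training and prediction steps above can be sketched in a few lines for the linear kernel. This is a minimal plain-Python illustration under stated assumptions, not scikit-learn's implementation: the function names (linear_kernel, solve, fit_kernel_ridge, predict_kernel_ridge) are hypothetical, and the textbook closed form is used, in which the dual coefficients a solve (K + alpha*I) a = y and a prediction is the kernel-weighted sum of those coefficients.

```python
def linear_kernel(a, b):
    """Dot product of two feature vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                    # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_kernel_ridge(X, y, alpha=1.0, kernel=linear_kernel):
    """Return dual coefficients a such that (K + alpha*I) a = y."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += alpha                              # L2 regularization term
    return solve(K, y)

def predict_kernel_ridge(X_train, dual_coef, X_new, kernel=linear_kernel):
    """Prediction is the sum of dual coefficients weighted by kernel values."""
    return [sum(a * kernel(xt, x) for a, xt in zip(dual_coef, X_train))
            for x in X_new]

# toy data (hypothetical): y = 2x
X_train = [[1.0], [2.0], [3.0]]
y_train = [2.0, 4.0, 6.0]
dual_coef = fit_kernel_ridge(X_train, y_train, alpha=1.0)
print(predict_kernel_ridge(X_train, dual_coef, [[4.0]]))
```

With a one-dimensional linear kernel this is equivalent to ridge regression without an intercept, w = sum(x*y) / (sum(x*x) + alpha), which gives w = 28/15 for the toy data; regularization shrinks the slope slightly below the true value of 2.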

Advantages of KernelRidge:

Modeling complex non-linear relationships: KernelRidge can model complex, non-linear dependencies between the features and the target variable.
Choice of kernels: You can choose different kernels depending on the nature of the data and the task.
Regularization: The method includes regularization, which helps prevent overfitting.

Limitations of KernelRidge:

Lack of interpretability: Like many non-linear methods, KernelRidge is less interpretable than linear models.
Computational complexity: Using kernel functions can be computationally expensive with large volumes of data and/or high dimensionality.
Need for parameter tuning: Choosing a suitable kernel and model parameters requires tuning and experience.

KernelRidge is useful in regression tasks where the data exhibits complex, non-linear dependencies and a model capable of capturing these relationships is required. It is also useful in tasks where kernel functions can be used to transform the data into a more informative representation.

2.3.2.1. Code for creating the KernelRidge model

# KernelRidge.py
# The code demonstrates the process of training KernelRidge model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "KernelRidge"
onnx_model_filename = data_path + "kernel_ridge"

# create a KernelRidge model
regression_model = KernelRidge(alpha=1.0, kernel='linear')

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the input type of the float ONNX model
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the input type of the double ONNX model
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  KernelRidge Original model (double)
Python  R-squared (Coefficient of determination): 0.9962137909675411
Python  Mean Absolute Error: 6.36977985227399
Python  Mean Squared Error: 50.10198935520715

Errors tab:

KernelRidge.py started  KernelRidge.py  1       1
Traceback (most recent call last):      KernelRidge.py  1       1
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)     KernelRidge.py  87      1
    onnx_model = convert_topology(      convert.py      208     1
    topology.convert_operators(container=container, verbose=verbose)    _topology.py    1532    1
    self.call_shape_calculator(operator)        _topology.py    1348    1
    operator.infer_types()      _topology.py    1163    1
    raise MissingShapeCalculator(       _topology.py    629     1
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.kernel_ridge.KernelRidge'>'.     _topology.py    629     1
It usually means the pipeline being converted contains a        _topology.py    629     1
transformer or a predictor with no corresponding converter      _topology.py    629     1
implemented in sklearn-onnx. If the converted is implemented    _topology.py    629     1
in another library, you need to register        _topology.py    629     1
the converted so that it can be used by sklearn-onnx (function  _topology.py    629     1
update_registered_converter). If the model is not yet covered   _topology.py    629     1
by sklearn-onnx, you may raise an issue to      _topology.py    629     1
https://github.com/onnx/sklearn-onnx/issues     _topology.py    629     1
to get the converter implemented or even contribute to the      _topology.py    629     1
project. If the model is a custom model, a new converter must   _topology.py    629     1
be implemented. Examples can be found in the gallery.   _topology.py    629     1
KernelRidge.py finished in 2516 ms              19      1

2.3.3. sklearn.isotonic.IsotonicRegression

IsotonicRegression is a machine learning method used for regression tasks that models a monotonic relationship between the features and the target variable. In this context, "monotonicity" means that an increase in the value of a feature leads consistently to an increase (or consistently to a decrease) in the value of the target variable, so the direction of change is preserved.

How IsotonicRegression works:

Input data: It starts with the original dataset containing features (independent variables) and their corresponding target variable values.
Monotonic regression: IsotonicRegression aims to find the best monotonic function describing the relationship between the features and the target variable. This function can be linear or non-linear, but it must remain monotonic.
Model training: The model is trained on the data to determine the parameters of the monotonic function. During training, the model minimizes the sum of squared errors between the predictions and the actual values of the target variable.
Prediction: After training, the model can be used to predict target variable values for new data while preserving the monotonic relationship.
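
The training step above, least squares under a monotonicity constraint, is commonly solved with the pool adjacent violators algorithm. The sketch below is a minimal plain-Python version under stated assumptions: the samples are already sorted by feature value, all weights are equal, and an increasing fit is wanted. The function name isotonic_fit is hypothetical, and this is not claimed to be scikit-learn's exact code.

```python
def isotonic_fit(y):
    """Return the non-decreasing sequence closest to y in the least-squares sense."""
    blocks = []  # each block stores [sum of targets, number of samples]
    for value in y:
        blocks.append([value, 1])
        # merge backwards while the previous block's mean exceeds the last block's mean
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # every sample in a block is assigned the block mean
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

# toy targets (hypothetical): the pair 3.0, 2.0 violates monotonicity and is pooled
print(isotonic_fit([1.0, 3.0, 2.0, 4.0]))
```

Each violating pair is replaced by the mean of the pooled samples, so the fitted values form a non-decreasing step function; predictions for new inputs are then typically obtained by interpolating between these fitted values.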

Advantages of IsotonicRegression:

Modeling monotonic relationships: The method is a good choice when the data show a monotonic dependence and it is important to preserve that property in the model.
Interpretability: Monotonic models can be easier to interpret, because the direction in which the feature influences the target is defined unambiguously.

Limitations of isotonic regression:

Not suitable for complex non-monotonic relationships: The method is limited to monotonic relationships, so it cannot capture complex nonlinear dependencies that change direction.
Parameter tuning: Some implementations of IsotonicRegression have parameters that may need tuning. In scikit-learn the options are few (for example increasing, y_min, y_max and out_of_bounds), but they still have to be chosen to suit the data.

IsotonicRegression is useful in tasks where the monotonicity of the relationship between the feature and the target variable is an important factor and the model has to preserve it.

2.3.3.1. Code for creating the IsotonicRegression model

# IsotonicRegression.py
# The code demonstrates the process of training IsotonicRegression model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "IsotonicRegression"
onnx_model_filename = data_path + "isotonic_regression"

# create an IsotonicRegression model
regression_model = IsotonicRegression()

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the ONNX model input type
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the ONNX model input type
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  IsotonicRegression Original model (double)
Python  R-squared (Coefficient of determination): 0.9999898125037958
Python  Mean Absolute Error: 0.20093409873424467
Python  Mean Squared Error: 0.13480867590911208

Errors tab:

IsotonicRegression.py started   IsotonicRegression.py   1       1
Traceback (most recent call last):      IsotonicRegression.py   1       1
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)     IsotonicRegression.py   87      1
    onnx_model = convert_topology(      convert.py      208     1
    topology.convert_operators(container=container, verbose=verbose)    _topology.py    1532    1
    self.call_shape_calculator(operator)        _topology.py    1348    1
    operator.infer_types()      _topology.py    1163    1
    raise MissingShapeCalculator(       _topology.py    629     1
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.isotonic.IsotonicRegression'>'.  _topology.py    629     1
It usually means the pipeline being converted contains a        _topology.py    629     1
transformer or a predictor with no corresponding converter      _topology.py    629     1
implemented in sklearn-onnx. If the converted is implemented    _topology.py    629     1
in another library, you need to register        _topology.py    629     1
the converted so that it can be used by sklearn-onnx (function  _topology.py    629     1
update_registered_converter). If the model is not yet covered   _topology.py    629     1
by sklearn-onnx, you may raise an issue to      _topology.py    629     1
https://github.com/onnx/sklearn-onnx/issues     _topology.py    629     1
to get the converter implemented or even contribute to the      _topology.py    629     1
project. If the model is a custom model, a new converter must   _topology.py    629     1
be implemented. Examples can be found in the gallery.   _topology.py    629     1
IsotonicRegression.py finished in 2499 ms               19      1
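
Because the conversion fails, one possible workaround (an assumption, not something this article does) is to carry the fitted breakpoints over to the target environment and interpolate between them there. In scikit-learn 0.24 and later a fitted IsotonicRegression exposes its breakpoints as the X_thresholds_ and y_thresholds_ attributes, and predictions inside the training range are obtained by linear interpolation between them. The sketch below uses hypothetical threshold values and a hypothetical helper name, and it clips inputs that fall outside the range, which corresponds to out_of_bounds="clip" rather than the scikit-learn default of "nan".

```python
from bisect import bisect_right


def interpolate_monotone(x, x_thresholds, y_thresholds):
    """Piecewise-linear prediction from ascending breakpoints."""
    # Clip inputs outside the fitted range (a design choice of this sketch).
    if x <= x_thresholds[0]:
        return y_thresholds[0]
    if x >= x_thresholds[-1]:
        return y_thresholds[-1]
    # Index of the breakpoint at or immediately to the left of x.
    i = bisect_right(x_thresholds, x) - 1
    x0, x1 = x_thresholds[i], x_thresholds[i + 1]
    y0, y1 = y_thresholds[i], y_thresholds[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical breakpoints, standing in for X_thresholds_ / y_thresholds_.
x_thr = [0.0, 1.0, 3.0, 4.0]
y_thr = [0.0, 2.0, 2.0, 6.0]
predictions = [interpolate_monotone(x, x_thr, y_thr)
               for x in (-1.0, 0.5, 2.0, 3.5, 9.0)]
print(predictions)
```

The same few lines are easy to port to another language, since only two arrays and a linear interpolation are needed.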

2.3.4. sklearn.cross_decomposition.PLSCanonical

PLSCanonical (Partial Least Squares Canonical) is a machine learning method used to solve canonical correlation problems. It is a variant of the partial least squares (PLS) method and is applied to analyze and model relationships between two sets of variables.

How PLSCanonical works:

Input data: Start with two datasets (X and Y), each representing a collection of variables (features). X and Y usually contain correlated data, and the task is to find linear combinations of features that are most strongly related to each other.
Selecting linear combinations: PLSCanonical finds linear combinations (components) in both X and Y such that the covariance between the projections of the two datasets is as large as possible. These components are called canonical variables.
Search for the strongest relationship: The main goal of PLSCanonical is to find the canonical variables along which X and Y vary together most strongly, highlighting the most informative relationships between the two datasets.
Model training: Once the canonical variables are found, they can be used to build a model that predicts Y from X.
Prediction: After training, the model can predict Y values for new data from the corresponding X values.
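
The search for the first pair of components can be sketched as follows. In the PLS family, the first weight vectors are usually described as the unit vectors u and v that maximize the covariance between the projections Xu and Yv, which are the leading singular vectors of the cross-product matrix X^T Y. The pure-Python sketch below finds them by power iteration. It is only an illustration under stated assumptions: the function name and the toy data are hypothetical, the inputs are assumed to be already centered, and scikit-learn's own implementation additionally scales the data and deflates the matrices to obtain further components.

```python
import math


def first_pls_weights(X, Y, iterations=100):
    """Leading singular vectors (u, v) of C = X^T Y via power iteration.

    X and Y are lists of rows and are assumed to be already centered.
    """
    n, p, q = len(X), len(X[0]), len(Y[0])
    # Cross-product matrix: C[j][k] = sum over samples of X[i][j] * Y[i][k].
    C = [[sum(X[i][j] * Y[i][k] for i in range(n)) for k in range(q)]
         for j in range(p)]

    def normalize(vec):
        norm = math.sqrt(sum(c * c for c in vec))
        return [c / norm for c in vec]

    v = normalize([1.0] * q)
    for _ in range(iterations):
        # u is proportional to C v, then v is proportional to C^T u.
        u = normalize([sum(C[j][k] * v[k] for k in range(q)) for j in range(p)])
        v = normalize([sum(C[j][k] * u[j] for j in range(p)) for k in range(q)])
    return u, v


# Hypothetical centered toy data: the first columns of X and Y move together
# more strongly than the second columns.
X_toy = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
Y_toy = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
u, v = first_pls_weights(X_toy, Y_toy)
print(u, v)
```

For this toy data both weight vectors converge to the first coordinate axis, because that is the direction in which the two datasets share the most variation.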

Advantages of PLSCanonical:

Correlation analysis: PLSCanonical makes it possible to analyze and model correlations between two datasets, which can help in understanding the relationships between variables.
Dimensionality reduction: The method can also be used to reduce the dimensionality of the data by keeping only the most important components.

Limitations of PLSCanonical:

Sensitivity to the number of components: Choosing the optimal number of canonical variables may require some experimentation.
Dependence on data structure: The results of PLSCanonical can depend heavily on the structure of the data and on the correlations within it.

In short, PLSCanonical is used to analyze and model correlations between two sets of variables. It makes it possible to study the relationships in the data and can be useful for reducing dimensionality and for predicting values based on correlated components.

2.3.4.1. Code for creating the PLSCanonical model

# PLSCanonical.py
# The code demonstrates the process of training PLSCanonical model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_decomposition import PLSCanonical
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name = "PLSCanonical"
onnx_model_filename = data_path + "pls_canonical"

# create a PLSCanonical model
regression_model = PLSCanonical(n_components=1)

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the ONNX model input type
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the ONNX model input type
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  
Python  PLSCanonical Original model (double)
Python  R-squared (Coefficient of determination): 0.9962347199278333
Python  Mean Absolute Error: 6.3561407034365995
Python  Mean Squared Error: 49.82504148022689

Errors tab:

PLSCanonical.py started PLSCanonical.py 1       1
Traceback (most recent call last):      PLSCanonical.py 1       1
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)     PLSCanonical.py 87      1
    onnx_model = convert_topology(      convert.py      208     1
    topology.convert_operators(container=container, verbose=verbose)    _topology.py    1532    1
    self.call_shape_calculator(operator)        _topology.py    1348    1
    operator.infer_types()      _topology.py    1163    1
    raise MissingShapeCalculator(       _topology.py    629     1
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.cross_decomposition._pls.PLSCanonical'>'.        _topology.py    629     1
It usually means the pipeline being converted contains a        _topology.py    629     1
transformer or a predictor with no corresponding converter      _topology.py    629     1
implemented in sklearn-onnx. If the converted is implemented    _topology.py    629     1
in another library, you need to register        _topology.py    629     1
the converted so that it can be used by sklearn-onnx (function  _topology.py    629     1
update_registered_converter). If the model is not yet covered   _topology.py    629     1
by sklearn-onnx, you may raise an issue to      _topology.py    629     1
https://github.com/onnx/sklearn-onnx/issues     _topology.py    629     1
to get the converter implemented or even contribute to the      _topology.py    629     1
project. If the model is a custom model, a new converter must   _topology.py    629     1
be implemented. Examples can be found in the gallery.   _topology.py    629     1
PLSCanonical.py finished in 2513 ms             19      1

2.3.5. sklearn.cross_decomposition.CCA

Canonical Correlation Analysis (CCA) is a multivariate statistical analysis method used to study the relationships between two sets of variables (set X and set Y). The main goal of CCA is to find linear combinations of the X and Y variables that maximize the correlation between them. These linear combinations are called canonical variables.

How CCA works:

Input data: Start with two sets of variables, X and Y. Each set can contain any number of variables, and CCA tries to find linear combinations that maximize the correlation between them.
Building canonical variables: CCA identifies canonical variables in X and Y that maximize their correlation. These canonical variables are linear combinations of the original variables, one for each canonical component.
Evaluating correlation: CCA evaluates the correlation between pairs of canonical variables. The canonical variables are usually ordered by decreasing correlation, so the first pair has the highest correlation, the second pair the next highest, and so on.
Interpretation: Canonical variables can be interpreted by looking at their correlation and at the weights of the original variables. This shows which variables in the X and Y sets are most strongly related.
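
A small worked case may help here. When each set contains a single variable, as in the synthetic data used throughout this article, there is only one possible linear combination per set (up to scale), so the first canonical correlation reduces to the absolute value of the ordinary Pearson correlation. The sketch below computes it in pure Python; the helper name pearson_r is an assumption for illustration, and the data follow the same formula as the article's synthetic example.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Same synthetic relationship as in the article: y = 4x + 10*sin(0.5x).
x = [float(i) for i in range(100)]
y = [4 * xi + 10 * math.sin(0.5 * xi) for xi in x]

r = pearson_r(x, y)
# With one variable per set, the canonical correlation is |r|.
print("canonical correlation for single-variable sets:", abs(r))
```

With several variables per set the weights are no longer trivial, and a dedicated solver such as the one behind sklearn.cross_decomposition.CCA is needed to find them.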

Advantages of CCA:

Reveals hidden connections: CCA can help uncover correlations between two sets of variables that are not obvious in an initial analysis.
Noise handling: By concentrating on the most strongly correlated directions, CCA can reduce the influence of noise in the data.
Wide applicability: CCA can be used in many fields, such as statistics, bioinformatics and finance, to study the relationships between sets of variables.

Limitations of CCA:

Requires more data: CCA may need more data than other analysis methods to estimate correlations reliably.
Linear relationships: CCA assumes linear relationships between variables, which may be insufficient in some cases.
Complex interpretation: Interpreting canonical variables can be difficult, especially when the X and Y sets contain many variables.

CCA is useful in tasks that require studying the relationship between two sets of variables and uncovering hidden correlations.

2.3.5.1. Code for creating the CCA model

# CCA.py
# The code demonstrates the process of training CCA model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com

# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)

    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")

    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0

    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1

    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)

    # initialize a count for matching decimal places
    matching_count = 0

    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break

    return matching_count

# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_decomposition import CCA
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv

# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]

# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)

model_name="CCA"
onnx_model_filename = data_path + "cca"

# create a CCA model
regression_model = CCA(n_components=1)

# fit the model to the data
regression_model.fit(X, y.ravel())

# predict values for the entire dataset
y_pred = regression_model.predict(X)

# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)

print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)

print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float32 to match the ONNX model input type
initial_type_float = X.astype(np.float32)

# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')

# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]

# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)

# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)

print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")

# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name

# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")

# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")

# cast the input data to float64 to match the model's DoubleTensorType input
initial_type_double = X.astype(np.float64)

# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]

# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination):", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))

# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')

Output:

Python  CCA Original model (double)
Python  R-squared (Coefficient of determination): 0.9962347199278333
Python  Mean Absolute Error: 6.3561407034365995
Python  Mean Squared Error: 49.82504148022689

Errors tab:

CCA.py started  CCA.py  1       1
Traceback (most recent call last):      CCA.py  1       1
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)     CCA.py  87      1
    onnx_model = convert_topology(      convert.py      208     1
    topology.convert_operators(container=container, verbose=verbose)    _topology.py    1532    1
    self.call_shape_calculator(operator)        _topology.py    1348    1
    operator.infer_types()      _topology.py    1163    1
    raise MissingShapeCalculator(       _topology.py    629     1
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.cross_decomposition._pls.CCA'>'. _topology.py    629     1
It usually means the pipeline being converted contains a        _topology.py    629     1
transformer or a predictor with no corresponding converter      _topology.py    629     1
implemented in sklearn-onnx. If the converted is implemented    _topology.py    629     1
in another library, you need to register        _topology.py    629     1
the converted so that it can be used by sklearn-onnx (function  _topology.py    629     1
update_registered_converter). If the model is not yet covered   _topology.py    629     1
by sklearn-onnx, you may raise an issue to      _topology.py    629     1
https://github.com/onnx/sklearn-onnx/issues     _topology.py    629     1
to get the converter implemented or even contribute to the      _topology.py    629     1
project. If the model is a custom model, a new converter must   _topology.py    629     1
be implemented. Examples can be found in the gallery.   _topology.py    629     1
CCA.py finished in 2543 ms              19      1
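
Because a failed conversion aborts the whole script, as in the CCA traceback above, it can help to wrap the converter call so that a run over many models keeps going and records the reason for each failure. The sketch below is only an illustration: try_convert is a hypothetical helper, and fake_convert with its MissingShapeCalculator class is a local stand-in so the example runs without skl2onnx. In the article's scripts, the callable passed in would presumably be convert_sklearn.

```python
from typing import Any, Callable, Optional, Tuple


def try_convert(convert: Callable[..., Any], model: Any, **kwargs: Any) -> Tuple[Optional[Any], Optional[str]]:
    """Run a converter; return (onnx_model, None) on success or (None, reason) on failure."""
    try:
        return convert(model, **kwargs), None
    except Exception as exc:  # broad on purpose: this sketch only reports the failure
        return None, f"{type(exc).__name__}: {exc}"


class MissingShapeCalculator(RuntimeError):
    """Local stand-in for the skl2onnx exception named in the traceback above."""


def fake_convert(model: Any, **kwargs: Any) -> str:
    """Hypothetical converter: rejects models listed as unsupported, accepts the rest."""
    unsupported = {"CCA", "PLSCanonical"}
    if model in unsupported:
        raise MissingShapeCalculator(f"Unable to find a shape calculator for type '{model}'")
    return f"onnx({model})"


# Try two model names; the second one is expected to fail without stopping the loop.
results = {name: try_convert(fake_convert, name, target_opset=12) for name in ("Ridge", "CCA")}
for name, (onnx_model, reason) in results.items():
    print(name, "converted" if reason is None else f"failed: {reason}")
```

Returning the reason as a string keeps the calling loop simple; a real script might prefer to catch only the specific skl2onnx exception types.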

Conclusion

The article reviewed 45 regression models available in Scikit-learn version 1.3.2.

1. Of these, 5 models ran into problems during conversion to the ONNX format:

DummyRegressor (Dummy regressor);
KernelRidge (Kernel Ridge Regression);
IsotonicRegression (Isotonic Regression);
PLSCanonical (Partial Least Squares Canonical Analysis);
CCA (Canonical Correlation Analysis).

These models may be too complex in structure or logic, or may rely on specific data structures or algorithms that are not fully compatible with the ONNX format. The CCA traceback above shows the immediate cause in that case: sklearn-onnx has no shape calculator registered for the model class.

2. The remaining 40 models were successfully converted to ONNX with float-precision calculations:

ARDRegression: Automatic Relevance Determination (ARD) regression;
BayesianRidge: Bayesian ridge regression with regularization;
ElasticNet: a combination of L1 and L2 regularization to mitigate overfitting;
ElasticNetCV: Elastic Net with automatic selection of the regularization parameters;
HuberRegressor: regression with reduced sensitivity to outliers;
Lars: Least Angle Regression;
LarsCV: Least Angle Regression with cross-validation;
Lasso: L1-regularized regression for feature selection;
LassoCV: Lasso regression with cross-validation;
LassoLars: a combination of Lasso and LARS for regression;
LassoLarsCV: LassoLars regression with cross-validation;
LassoLarsIC: LassoLars with parameter selection by information criteria;
LinearRegression: ordinary linear regression;
Ridge: linear regression with L2 regularization;
RidgeCV: Ridge regression with cross-validation;
OrthogonalMatchingPursuit: regression with orthogonal feature selection;
PassiveAggressiveRegressor: regression with a passive-aggressive learning approach;
QuantileRegressor: quantile regression;
RANSACRegressor: regression with the RANdom SAmple Consensus method;
TheilSenRegressor: robust regression based on the Theil-Sen method;
LinearSVR: linear Support Vector Regression;
MLPRegressor: regression using a multilayer perceptron;
PLSRegression: Partial Least Squares regression;
TweedieRegressor: regression based on the Tweedie distribution;
PoissonRegressor: regression for modeling Poisson-distributed data;
RadiusNeighborsRegressor: regression based on neighbors within a radius;
KNeighborsRegressor: regression based on k-nearest neighbors (k-NN);
GaussianProcessRegressor: regression based on Gaussian processes;
GammaRegressor: regression for modeling gamma-distributed data;
SGDRegressor: regression based on stochastic gradient descent;
AdaBoostRegressor: regression using the AdaBoost algorithm;
BaggingRegressor: regression using the Bagging method;
DecisionTreeRegressor: regression based on a decision tree;
ExtraTreeRegressor: regression based on a single extremely randomized tree;
ExtraTreesRegressor: regression with an ensemble of extremely randomized trees;
NuSVR: Nu Support Vector Regression;
RandomForestRegressor: regression with an ensemble of decision trees (Random Forest);
GradientBoostingRegressor: regression with gradient boosting;
HistGradientBoostingRegressor: regression with histogram-based gradient boosting;
SVR: Support Vector Regression.

3. The possibility of converting regression models to ONNX with double-precision calculations was also examined.

A serious problem encountered when converting models to double precision in ONNX is a limitation of the ML operators ai.onnx.ml.LinearRegressor, ai.onnx.ml.SVMRegressor and ai.onnx.ml.TreeEnsembleRegressor: their parameters and output values are of type float. In effect, these components reduce precision, and running them in double-precision calculations is questionable. For this reason, ONNX Runtime did not implement some operators for double-precision ONNX models, and NOT_IMPLEMENTED errors could occur, such as "Could not find an implementation for the node LinearRegressor:LinearRegressor(1)" or "Could not find an implementation for SVMRegressor(1) node with name 'SVM'". Therefore, within the current ONNX specification, full double-precision operation is impossible for these ML operators.
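
The effect of a float-typed output can be illustrated without ONNX at all. The sketch below rounds the double-precision MAE value from the CCA output above to the nearest float32, using only the standard library. The helper to_float32 is hypothetical, and the example only mimics a final cast to tensor(float), not the operators themselves.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


mae_double = 6.3561407034365995  # MAE printed by the original model above
mae_float = to_float32(mae_double)  # the same value after a cast to float32

abs_error = abs(mae_double - mae_float)
rel_error = abs_error / abs(mae_double)

print(f"double: {mae_double!r}")
print(f"float : {mae_float!r}")
print(f"relative error: {rel_error:.3e} (float32 rounding bound: {2.0 ** -24:.3e})")
```

A relative rounding error of at most about 6e-8 means that roughly seven significant decimal digits survive the cast, which is why metrics computed from float outputs can only match the double-precision originals up to that point.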

For linear regression models, the sklearn-onnx converter managed to bypass the LinearRegressor limitation: it uses the ONNX operators MatMul() and Add() instead. Thanks to this approach, the first 30 models in the list above were successfully converted to ONNX models with double-precision calculations, and these models retained the accuracy of the original double-precision models.
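
As a rough illustration of why the MatMul() and Add() decomposition can keep precision, a linear model is just y = X·w + b, and both operators accept double tensors. The pure-Python sketch below evaluates the same model twice: once with double parameters, and once with parameters and outputs rounded to float32 to imitate a float-typed operator. The coefficients, inputs and helper names are invented for the example, and intermediate arithmetic stays in double in both paths, which is a simplification.

```python
import struct
from typing import List, Sequence


def to_float32(value: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def matmul_add(rows: Sequence[Sequence[float]], weights: Sequence[float], bias: float) -> List[float]:
    """Compute y = X @ w + b for a single-output linear model."""
    return [sum(x * w for x, w in zip(row, weights)) + bias for row in rows]


# Hypothetical fitted parameters and inputs, chosen only for illustration.
X = [[0.0], [1.5], [3.0]]
weights = [4.123456789012345]
bias = 1.000000000123

# Double path: parameters are used exactly as fitted.
y_double = matmul_add(X, weights, bias)

# Float-like path: parameters and outputs are rounded to float32.
float_weights = [to_float32(w) for w in weights]
y_float = [to_float32(y) for y in matmul_add(X, float_weights, to_float32(bias))]

max_diff = max(abs(d - f) for d, f in zip(y_double, y_float))
print("double outputs:", y_double)
print("float outputs :", y_float)
print("largest difference:", max_diff)
```

In the double path the small offset in the bias is preserved, while the float-like path loses it as soon as the bias is stored as float32.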

However, for more complex ML operators such as SVMRegressor and TreeEnsembleRegressor, this was not achieved. As a result, models such as AdaBoostRegressor, BaggingRegressor, DecisionTreeRegressor, ExtraTreeRegressor, ExtraTreesRegressor, NuSVR, RandomForestRegressor, GradientBoostingRegressor, HistGradientBoostingRegressor and SVR are currently available only as ONNX models with float calculations.

Summary

The article covered 45 regression models from Scikit-learn version 1.3.2 and the results of converting them to the ONNX format for both float and double-precision calculations.

Of all the models reviewed, 5 proved problematic for conversion to ONNX: DummyRegressor, KernelRidge, IsotonicRegression, PLSCanonical and CCA. Their structure or logic may require additional adaptation for a successful conversion to ONNX.

The other 40 regression models were successfully converted to the ONNX format for float. Of these, 30 were also successfully converted to the ONNX format for double precision, retaining their accuracy.

Because of the limitation of the ML operators SVMRegressor and TreeEnsembleRegressor, the models AdaBoostRegressor, BaggingRegressor, DecisionTreeRegressor, ExtraTreeRegressor, ExtraTreesRegressor, NuSVR, RandomForestRegressor, GradientBoostingRegressor, HistGradientBoostingRegressor and SVR are currently available only as ONNX models with float calculations.

All scripts from the article are also available in the public project MQL5\Shared Projects\Scikit.Regression.ONNX.

Translated from Russian by MetaQuotes Ltd.
Original article: https://www.mql5.com/ru/articles/13538
