Regression models of the Scikit-learn Library and their export to ONNX
ONNX (Open Neural Network Exchange) is a format for describing and exchanging machine learning models, which makes it possible to transfer models between different machine learning frameworks. In deep learning and neural networks, the float32 data type is used most often, because it usually provides acceptable accuracy and efficiency for training deep learning models.
Some classical machine learning models are difficult to represent with standard ONNX operators. Therefore, an additional set of ML operators (ai.onnx.ml) was introduced to implement them in ONNX. It's worth noting that, according to the ONNX specification, the key operators in this set (LinearRegressor, SVMRegressor, TreeEnsembleRegressor) can accept various types of input data (tensor(float), tensor(double), tensor(int64), tensor(int32)), but they always return tensor(float) as output. The parameters of these operators are also stored as float (single-precision) numbers, which may limit the accuracy of calculations, especially if double-precision numbers were used to define the parameters of the original model.
This can lead to a loss of accuracy when converting models, or when different data types are used while converting and processing data in ONNX. Much depends on the converter: as we will see later, for some models the converter bypasses these limitations and produces fully portable ONNX models that work in double precision without losing accuracy. It's important to consider these characteristics when working with models and their ONNX representation, especially when the accuracy of data representation matters.
Scikit-learn is one of the most popular and widely used libraries for machine learning in the Python community. It offers a wide range of algorithms, a user-friendly interface, and good documentation. The previous article, "Classification Models of the Scikit-learn Library and Their Export to ONNX", covered classification models.
In this article, we will explore the regression models of the Scikit-learn package, fit them with double precision on a test dataset, attempt to convert them to the ONNX format for float and double precision, and use the obtained models in MQL5 programs. Additionally, we will compare the accuracy of the original models and their ONNX versions for float and double precision. Furthermore, we will examine the ONNX representation of the regression models, which will provide a better understanding of their internal structure and operation.
Contents
- If it bothers you, welcome to contribute
- 1. Test Dataset
The script for displaying the test dataset
- 2. Regression Models
2.0. List of Scikit-learn Regression Models
- 2.1. Scikit-learn Regression Models that convert to ONNX models float and double
- 2.1.1. sklearn.linear_model.ARDRegression
2.1.1.1. Code for creating the ARDRegression model and exporting it to ONNX for float and double
2.1.1.2. MQL5 code for executing ONNX Models
2.1.1.3. ONNX representation of the ard_regression_float.onnx and ard_regression_double.onnx
- 2.1.2. sklearn.linear_model.BayesianRidge
2.1.2.1. Code for creating the BayesianRidge model and exporting it to ONNX for float and double
2.1.2.2. MQL5 code for executing ONNX Models
2.1.2.3. ONNX representation of the bayesian_ridge_float.onnx and bayesian_ridge_double.onnx
- 2.1.3. sklearn.linear_model.ElasticNet
2.1.3.1. Code for creating the ElasticNet model and exporting it to ONNX for float and double
2.1.3.2. MQL5 code for executing ONNX Models
2.1.3.3. ONNX representation of the elastic_net_float.onnx and elastic_net_double.onnx
- 2.1.4. sklearn.linear_model.ElasticNetCV
2.1.4.1. Code for creating the ElasticNetCV model and exporting it to ONNX for float and double
2.1.4.2. MQL5 code for executing ONNX Models
2.1.4.3. ONNX representation of the elastic_net_cv_float.onnx and elastic_net_cv_double.onnx
- 2.1.5. sklearn.linear_model.HuberRegressor
2.1.5.1. Code for creating the HuberRegressor model and exporting it to ONNX for float and double
2.1.5.2. MQL5 code for executing ONNX Models
2.1.5.3. ONNX representation of the huber_regressor_float.onnx and huber_regressor_double.onnx
- 2.1.6. sklearn.linear_model.Lars
2.1.6.1. Code for creating the Lars model and exporting it to ONNX for float and double
2.1.6.2. MQL5 code for executing ONNX Models
2.1.6.3. ONNX representation of the lars_float.onnx and lars_double.onnx
- 2.1.7. sklearn.linear_model.LarsCV
2.1.7.1. Code for creating the LarsCV model and exporting it to ONNX for float and double
2.1.7.2. MQL5 code for executing ONNX Models
2.1.7.3. ONNX representation of the lars_cv_float.onnx and lars_cv_double.onnx
- 2.1.8. sklearn.linear_model.Lasso
2.1.8.1. Code for creating the Lasso model and exporting it to ONNX for float and double
2.1.8.2. MQL5 code for executing ONNX Models
2.1.8.3. ONNX representation of the lasso_float.onnx and lasso_double.onnx
- 2.1.9. sklearn.linear_model.LassoCV
2.1.9.1. Code for creating the LassoCV model and exporting it to ONNX for float and double
2.1.9.2. MQL5 code for executing ONNX Models
2.1.9.3. ONNX representation of the lasso_cv_float.onnx and lasso_cv_double.onnx
- 2.1.10. sklearn.linear_model.LassoLars
2.1.10.1. Code for creating the LassoLars model and exporting it to ONNX for float and double
2.1.10.2. MQL5 code for executing ONNX Models
2.1.10.3. ONNX representation of the lasso_lars_float.onnx and lasso_lars_double.onnx
- 2.1.11. sklearn.linear_model.LassoLarsCV
2.1.11.1. Code for creating the LassoLarsCV model and exporting it to ONNX for float and double
2.1.11.2. MQL5 code for executing ONNX Models
2.1.11.3. ONNX representation of the lasso_lars_cv_float.onnx and lasso_lars_cv_double.onnx
- 2.1.12. sklearn.linear_model.LassoLarsIC
2.1.12.1. Code for creating the LassoLarsIC model and exporting it to ONNX for float and double
2.1.12.2. MQL5 code for executing ONNX Models
2.1.12.3. ONNX representation of the lasso_lars_ic_float.onnx and lasso_lars_ic_double.onnx
- 2.1.13. sklearn.linear_model.LinearRegression
2.1.13.1. Code for creating the LinearRegression model and exporting it to ONNX for float and double
2.1.13.2. MQL5 code for executing ONNX Models
2.1.13.3. ONNX representation of the linear_regression_float.onnx and linear_regression_double.onnx
- 2.1.14. sklearn.linear_model.Ridge
2.1.14.1. Code for creating the Ridge model and exporting it to ONNX for float and double
2.1.14.2. MQL5 code for executing ONNX Models
2.1.14.3. ONNX representation of the ridge_float.onnx and ridge_double.onnx
- 2.1.15. sklearn.linear_model.RidgeCV
2.1.15.1. Code for creating the RidgeCV model and exporting it to ONNX for float and double
2.1.15.2. MQL5 code for executing ONNX Models
2.1.15.3. ONNX representation of the ridge_cv_float.onnx and ridge_cv_double.onnx
- 2.1.16. sklearn.linear_model.OrthogonalMatchingPursuit
2.1.16.1. Code for creating the OrthogonalMatchingPursuit model and exporting it to ONNX for float and double
2.1.16.2. MQL5 code for executing ONNX Models
2.1.16.3. ONNX representation of the orthogonal_matching_pursuit_float.onnx and orthogonal_matching_pursuit_double.onnx
- 2.1.17. sklearn.linear_model.PassiveAggressiveRegressor
2.1.17.1. Code for creating the PassiveAggressiveRegressor model and exporting it to ONNX for float and double
2.1.17.2. MQL5 code for executing ONNX Models
2.1.17.3. ONNX representation of the passive_aggressive_regressor_float.onnx and passive_aggressive_regressor_double.onnx
- 2.1.18. sklearn.linear_model.QuantileRegressor
2.1.18.1. Code for creating the QuantileRegressor model and exporting it to ONNX for float and double
2.1.18.2. MQL5 code for executing ONNX Models
2.1.18.3. ONNX representation of the quantile_regressor_float.onnx and quantile_regressor_double.onnx
- 2.1.19. sklearn.linear_model.RANSACRegressor
2.1.19.1. Code for creating the RANSACRegressor model and exporting it to ONNX for float and double
2.1.19.2. MQL5 code for executing ONNX Models
2.1.19.3. ONNX representation of the ransac_regressor_float.onnx and ransac_regressor_double.onnx
- 2.1.20. sklearn.linear_model.TheilSenRegressor
2.1.20.1. Code for creating the TheilSenRegressor model and exporting it to ONNX for float and double
2.1.20.2. MQL5 code for executing ONNX Models
2.1.20.3. ONNX representation of the theil_sen_regressor_float.onnx and theil_sen_regressor_double.onnx
- 2.1.21. sklearn.svm.LinearSVR
2.1.21.1. Code for creating the LinearSVR model and exporting it to ONNX for float and double
2.1.21.2. MQL5 code for executing ONNX Models
2.1.21.3. ONNX representation of the linear_svr_float.onnx and linear_svr_double.onnx
- 2.1.22. sklearn.neural_network.MLPRegressor
2.1.22.1. Code for creating the MLPRegressor model and exporting it to ONNX for float and double
2.1.22.2. MQL5 code for executing ONNX Models
2.1.22.3. ONNX representation of the mlp_regressor_float.onnx and mlp_regressor_double.onnx
- 2.1.23. sklearn.cross_decomposition.PLSRegression
2.1.23.1. Code for creating the PLSRegression model and exporting it to ONNX for float and double
2.1.23.2. MQL5 code for executing ONNX Models
2.1.23.3. ONNX representation of the pls_regression_float.onnx and pls_regression_double.onnx
- 2.1.24. sklearn.linear_model.TweedieRegressor
2.1.24.1. Code for creating the TweedieRegressor model and exporting it to ONNX for float and double
2.1.24.2. MQL5 code for executing ONNX Models
2.1.24.3. ONNX representation of the tweedie_regressor_float.onnx and tweedie_regressor_double.onnx
- 2.1.25. sklearn.linear_model.PoissonRegressor
2.1.25.1. Code for creating the PoissonRegressor model and exporting it to ONNX for float and double
2.1.25.2. MQL5 code for executing ONNX Models
2.1.25.3. ONNX representation of the poisson_regressor_float.onnx and poisson_regressor_double.onnx
- 2.1.26. sklearn.neighbors.RadiusNeighborsRegressor
2.1.26.1. Code for creating the RadiusNeighborsRegressor model and exporting it to ONNX for float and double
2.1.26.2. MQL5 code for executing ONNX Models
2.1.26.3. ONNX representation of the radius_neighbors_regressor_float.onnx and radius_neighbors_regressor_double.onnx
- 2.1.27. sklearn.neighbors.KNeighborsRegressor
2.1.27.1. Code for creating the KNeighborsRegressor model and exporting it to ONNX for float and double
2.1.27.2. MQL5 code for executing ONNX Models
2.1.27.3. ONNX representation of the kneighbors_regressor_float.onnx and kneighbors_regressor_double.onnx
- 2.1.28. sklearn.gaussian_process.GaussianProcessRegressor
2.1.28.1. Code for creating the GaussianProcessRegressor model and exporting it to ONNX for float and double
2.1.28.2. MQL5 code for executing ONNX Models
2.1.28.3. ONNX representation of the gaussian_process_regressor_float.onnx and gaussian_process_regressor_double.onnx
- 2.1.29. sklearn.linear_model.GammaRegressor
2.1.29.1. Code for creating the GammaRegressor model and exporting it to ONNX for float and double
2.1.29.2. MQL5 code for executing ONNX Models
2.1.29.3. ONNX representation of the gamma_regressor_float.onnx and gamma_regressor_double.onnx
- 2.1.30. sklearn.linear_model.SGDRegressor
2.1.30.1. Code for creating the SGDRegressor model and exporting it to ONNX for float and double
2.1.30.2. MQL5 code for executing ONNX Models
2.1.30.3. ONNX representation of the sgd_regressor_float.onnx and sgd_regressor_double.onnx
- 2.2. Regression models from the Scikit-learn library that are converted only into float precision ONNX models
- 2.2.1. sklearn.ensemble.AdaBoostRegressor
2.2.1.1. Code for creating the AdaBoostRegressor model and exporting it to ONNX for float and double
2.2.1.2. MQL5 code for executing ONNX Models
2.2.1.3. ONNX representation of the adaboost_regressor_float.onnx and adaboost_regressor_double.onnx
- 2.2.2. sklearn.ensemble.BaggingRegressor
2.2.2.1. Code for creating the BaggingRegressor model and exporting it to ONNX for float and double
2.2.2.2. MQL5 code for executing ONNX Models
2.2.2.3. ONNX representation of the bagging_regressor_float.onnx and bagging_regressor_double.onnx
- 2.2.3. sklearn.tree.DecisionTreeRegressor
2.2.3.1. Code for creating the DecisionTreeRegressor model and exporting it to ONNX for float and double
2.2.3.2. MQL5 code for executing ONNX Models
2.2.3.3. ONNX representation of the decision_tree_regressor_float.onnx and decision_tree_regressor_double.onnx
- 2.2.4. sklearn.tree.ExtraTreeRegressor
2.2.4.1. Code for creating the ExtraTreeRegressor model and exporting it to ONNX for float and double
2.2.4.2. MQL5 code for executing ONNX Models
2.2.4.3. ONNX representation of the extra_tree_regressor_float.onnx and extra_tree_regressor_double.onnx
- 2.2.5. sklearn.ensemble.ExtraTreesRegressor
2.2.5.1. Code for creating the ExtraTreesRegressor model and exporting it to ONNX for float and double
2.2.5.2. MQL5 code for executing ONNX Models
2.2.5.3. ONNX representation of the extra_trees_regressor_float.onnx and extra_trees_regressor_double.onnx
- 2.2.6. sklearn.svm.NuSVR
2.2.6.1. Code for creating the NuSVR model and exporting it to ONNX for float and double
2.2.6.2. MQL5 code for executing ONNX Models
2.2.6.3. ONNX representation of the nu_svr_float.onnx and nu_svr_double.onnx
- 2.2.7. sklearn.ensemble.RandomForestRegressor
2.2.7.1. Code for creating the RandomForestRegressor model and exporting it to ONNX for float and double
2.2.7.2. MQL5 code for executing ONNX Models
2.2.7.3. ONNX representation of the random_forest_regressor_float.onnx and random_forest_regressor_double.onnx
- 2.2.8. sklearn.ensemble.GradientBoostingRegressor
2.2.8.1. Code for creating the GradientBoostingRegressor model and exporting it to ONNX for float and double
2.2.8.2. MQL5 code for executing ONNX Models
2.2.8.3. ONNX representation of the gradient_boosting_regressor_float.onnx and gradient_boosting_regressor_double.onnx
- 2.2.9. sklearn.ensemble.HistGradientBoostingRegressor
2.2.9.1. Code for creating the HistGradientBoostingRegressor model and exporting it to ONNX for float and double
2.2.9.2. MQL5 code for executing ONNX Models
2.2.9.3. ONNX representation of the hist_gradient_boosting_regressor_float.onnx and hist_gradient_boosting_regressor_double.onnx
- 2.2.10. sklearn.svm.SVR
2.2.10.1. Code for creating the SVR model and exporting it to ONNX for float and double
2.2.10.2. MQL5 code for executing ONNX Models
2.2.10.3. ONNX representation of the svr_float.onnx and svr_double.onnx
- 2.3. Regression Models that encountered problems when converting to ONNX
- 2.3.1. sklearn.dummy.DummyRegressor
Code for creating the DummyRegressor
- 2.3.2. sklearn.kernel_ridge.KernelRidge
Code for creating the KernelRidge
- 2.3.3. sklearn.isotonic.IsotonicRegression
Code for creating the IsotonicRegression
- 2.3.4. sklearn.cross_decomposition.PLSCanonical
Code for creating the PLSCanonical
- 2.3.5. sklearn.cross_decomposition.CCA
Code for creating the CCA
- Conclusion
- Summary
If it bothers you, welcome to contribute
On the ONNX Runtime developer forum, one of the users reported an error "[ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for the node LinearRegressor:LinearRegressor(1)" when executing a model through ONNX Runtime.
Hi all, getting this error when trying to inferance a linear regression model. PLease help me resolve this.
"NOT_IMPLEMENTED : Could not find an implementation for the node LinearRegressor:LinearRegressor(1)" error from ONNX Runtime developer forum
Developer's response:
It is because we only implemented it for float32, not float64. But your model needs float64.
See:
https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/core/providers/cpu/ml/linearregressor.cc#L16
If it bothers you, welcome to contribute.
In the user's ONNX model, the ai.onnx.ml.LinearRegressor operator is called with the double (float64) data type, and the error message arises because ONNX Runtime does not support the LinearRegressor() operator with double precision.
According to the specification of the ai.onnx.ml.LinearRegressor operator, the double input data type is possible (T: tensor(float), tensor(double), tensor(int64), tensor(int32)); however, the developers intentionally chose not to implement it.
The reason is that the output value is always of type Y: tensor(float), and the computational parameters are float numbers as well (coefficients: list of floats, intercepts: list of floats).
Consequently, even when the input is double, this operator reduces the precision to float, so implementing it for double-precision calculations has questionable value.
ai.onnx.ml.LinearRegressor operator description
Thus, the reduction of precision to float in the parameters and the output value makes it impossible for ai.onnx.ml.LinearRegressor to fully operate with double (float64) numbers. Presumably, this is why the ONNX Runtime developers decided not to implement it for the double type.
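The effect of storing a double value in a float attribute or a float output can be reproduced without ONNX at all. The following minimal sketch (standard-library Python; the helper to_float32 and the coefficient value are illustrative assumptions, not part of ONNX Runtime) rounds a double to the nearest float32, as a float-typed parameter or output tensor would hold it, and checks the size of the error:

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest float32 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# an illustrative double-precision coefficient of a trained model (assumed value)
coef_double = 4.012345678901234
coef_float = to_float32(coef_double)

print(f"double : {coef_double!r}")
print(f"float32: {coef_float!r}")
print(f"absolute error: {abs(coef_float - coef_double):.3e}")

# float32 has a 24-bit significand, so round-to-nearest bounds the relative
# error by 2**-24 (about 6e-8): only about 7 significant digits survive
assert abs(coef_float - coef_double) <= abs(coef_double) * 2**-24
```

Whatever the precision of the input tensor, a value that passes through a float parameter or a float output keeps only about 7 significant decimal digits, while a double keeps about 15 to 17.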
The developers indicated how double support could be added in the code comments: the commented-out type constraint that also lists double, and the TODO note in the DOUBLE branch of Compute().
In ONNX Runtime, its computation is performed using the LinearRegressor class (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cpu/ml/linearregressor.h).
The operator's parameters, coefficients_ and intercepts_, are stored as std::vector<float>:
#pragma once
#include "core/common/common.h"
#include "core/framework/op_kernel.h"
#include "core/util/math_cpuonly.h"
#include "ml_common.h"
namespace onnxruntime {
namespace ml {
class LinearRegressor final : public OpKernel {
 public:
  LinearRegressor(const OpKernelInfo& info);
  Status Compute(OpKernelContext* context) const override;
 private:
  int64_t num_targets_;
  std::vector<float> coefficients_;
  std::vector<float> intercepts_;
  bool use_intercepts_;
  POST_EVAL_TRANSFORM post_transform_;
};
}  // namespace ml
}  // namespace onnxruntime
The implementation of LinearRegressor operator (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cpu/ml/linearregressor.cc)
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
#include "core/providers/cpu/ml/linearregressor.h"
#include "core/common/narrow.h"
#include "core/providers/cpu/math/gemm.h"
namespace onnxruntime {
namespace ml {
ONNX_CPU_OPERATOR_ML_KERNEL(
    LinearRegressor,
    1,
    // KernelDefBuilder().TypeConstraint("T", std::vector<MLDataType>{
    //                                            DataTypeImpl::GetTensorType<float>(),
    //                                            DataTypeImpl::GetTensorType<double>()}),
    KernelDefBuilder().TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    LinearRegressor);
LinearRegressor::LinearRegressor(const OpKernelInfo& info)
    : OpKernel(info),
      intercepts_(info.GetAttrsOrDefault<float>("intercepts")),
      post_transform_(MakeTransform(info.GetAttrOrDefault<std::string>("post_transform", "NONE"))) {
  ORT_ENFORCE(info.GetAttr<int64_t>("targets", &num_targets_).IsOK());
  ORT_ENFORCE(info.GetAttrs<float>("coefficients", coefficients_).IsOK());
  // use the intercepts_ if they're valid
  use_intercepts_ = intercepts_.size() == static_cast<size_t>(num_targets_);
}
// Use GEMM for the calculations, with broadcasting of intercepts
// https://github.com/onnx/onnx/blob/main/docs/Operators.md#Gemm
//
// X: [num_batches, num_features]
// coefficients_: [num_targets, num_features]
// intercepts_: optional [num_targets].
// Output: X * coefficients_^T + intercepts_: [num_batches, num_targets]
template <typename T>
static Status ComputeImpl(const Tensor& input, ptrdiff_t num_batches, ptrdiff_t num_features, ptrdiff_t num_targets,
                          const std::vector<float>& coefficients,
                          const std::vector<float>* intercepts, Tensor& output,
                          POST_EVAL_TRANSFORM post_transform,
                          concurrency::ThreadPool* threadpool) {
  const T* input_data = input.Data<T>();
  T* output_data = output.MutableData<T>();
  if (intercepts != nullptr) {
    TensorShape intercepts_shape({num_targets});
    onnxruntime::Gemm<T>::ComputeGemm(CBLAS_TRANSPOSE::CblasNoTrans, CBLAS_TRANSPOSE::CblasTrans,
                                      num_batches, num_targets, num_features,
                                      1.f, input_data, coefficients.data(), 1.f,
                                      intercepts->data(), &intercepts_shape,
                                      output_data,
                                      threadpool);
  } else {
    onnxruntime::Gemm<T>::ComputeGemm(CBLAS_TRANSPOSE::CblasNoTrans, CBLAS_TRANSPOSE::CblasTrans,
                                      num_batches, num_targets, num_features,
                                      1.f, input_data, coefficients.data(), 1.f,
                                      nullptr, nullptr,
                                      output_data,
                                      threadpool);
  }
  if (post_transform != POST_EVAL_TRANSFORM::NONE) {
    ml::batched_update_scores_inplace(gsl::make_span(output_data, SafeInt<size_t>(num_batches) * num_targets),
                                      num_batches, num_targets, post_transform, -1, false, threadpool);
  }
  return Status::OK();
}
Status LinearRegressor::Compute(OpKernelContext* ctx) const {
  Status status = Status::OK();
  const auto& X = *ctx->Input<Tensor>(0);
  const auto& input_shape = X.Shape();
  if (input_shape.NumDimensions() > 2) {
    return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Input shape had more than 2 dimension. Dims=",
                           input_shape.NumDimensions());
  }
  ptrdiff_t num_batches = input_shape.NumDimensions() <= 1 ? 1 : narrow<ptrdiff_t>(input_shape[0]);
  ptrdiff_t num_features = input_shape.NumDimensions() <= 1 ? narrow<ptrdiff_t>(input_shape.Size())
                                                            : narrow<ptrdiff_t>(input_shape[1]);
  Tensor& Y = *ctx->Output(0, {num_batches, num_targets_});
  concurrency::ThreadPool* tp = ctx->GetOperatorThreadPool();
  auto element_type = X.GetElementType();
  switch (element_type) {
    case ONNX_NAMESPACE::TensorProto_DataType_FLOAT: {
      status = ComputeImpl<float>(X, num_batches, num_features, narrow<ptrdiff_t>(num_targets_), coefficients_,
                                  use_intercepts_ ? &intercepts_ : nullptr,
                                  Y, post_transform_, tp);
      break;
    }
    case ONNX_NAMESPACE::TensorProto_DataType_DOUBLE: {
      // TODO: Add support for 'double' to the scoring functions in ml_common.h
      // once that is done we can just call ComputeImpl<double>...
      // Alternatively we could cast the input to float.
    }
    default:
      status = ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Unsupported data type of ", element_type);
  }
  return status;
}
}  // namespace ml
}  // namespace onnxruntime
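The arithmetic that ComputeImpl delegates to GEMM is simple. Below is a hypothetical, standard-library-only Python model of it; the function name linear_regressor, its arguments and the emulate_float switch are our own illustration, not an ONNX Runtime API. It follows the layout stated in the kernel comments: X is [num_batches, num_features], the coefficients are [num_targets, num_features] stored row by row, and the intercepts are used only when their count equals num_targets. post_transform is assumed to be NONE.

```python
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def linear_regressor(X, coefficients, intercepts=None, num_targets=1, emulate_float=False):
    """Hypothetical model of Y = X * coefficients^T + intercepts (post_transform NONE).

    X             -- list of rows, shape [num_batches, num_features]
    coefficients  -- flat list, shape [num_targets, num_features], row-major
    intercepts    -- used only if len(intercepts) == num_targets (as in the kernel)
    emulate_float -- round the parameters and each output value to float32
    """
    num_features = len(X[0])
    if len(coefficients) != num_targets * num_features:
        raise ValueError("coefficients must contain num_targets * num_features values")
    use_intercepts = intercepts is not None and len(intercepts) == num_targets
    cast = to_float32 if emulate_float else float
    coefs = [cast(c) for c in coefficients]
    bias = [cast(b) for b in intercepts] if use_intercepts else [0.0] * num_targets
    Y = []
    for row in X:
        out = []
        for t in range(num_targets):
            # dot product of the input row with row t of the coefficient matrix
            acc = sum(row[f] * coefs[t * num_features + f] for f in range(num_features))
            out.append(cast(acc + bias[t]))
        Y.append(out)
    return Y

X = [[1.0, 2.0], [3.0, 4.0]]
coef = [0.5, -1.0,   # target 0
        2.0, 0.25]   # target 1
print(linear_regressor(X, coef, intercepts=[10.0, -1.0], num_targets=2))
# [[8.5, 1.5], [7.5, 6.0]]
```

With emulate_float=True, the sketch rounds only the parameters and the outputs; the real float kernel also accumulates the products in float32, so it may lose slightly more precision than this emulation shows.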
The code comments suggest two options: accept double input values and perform the operator's computation with the float parameters (the TODO in the DOUBLE branch), or cast the input data to float. However, neither option can be considered a proper solution, since the float parameters and the float output still limit the precision.
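To see why neither workaround restores double accuracy, the sketch below (plain Python; the one-feature model y = w*x + b and all its values are assumptions chosen for illustration) compares a full double computation with the two options: (a) double input with float parameters and a float output, as the specification prescribes, and (b) the input cast to float as well. Python evaluates the intermediate products in double here, so the float cases are, if anything, slightly optimistic.

```python
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# illustrative double-precision parameters of a one-feature linear model (assumed)
w, b = 3.9876543210987654, 0.123456789012345
inputs = [12345.678901234, 0.000123456789, 98.7654321]

def reference(x):
    # everything in double precision
    return w * x + b

def option_a(x):
    # double input, float parameters, float output
    return to_float32(to_float32(w) * x + to_float32(b))

def option_b(x):
    # input cast to float as well
    return to_float32(to_float32(w) * to_float32(x) + to_float32(b))

for x in inputs:
    ref = reference(x)
    for name, fn in (("a", option_a), ("b", option_b)):
        rel = abs(fn(x) - ref) / abs(ref)
        print(f"x={x!r} option {name}: relative error {rel:.2e}")
```

In both options the relative error stays at the float32 level (at most a few times 1e-7), which is exactly the limitation described above.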
The specification of the ai.onnx.ml.LinearRegressor operator restricts the capability for full operation with double numbers since the parameters and output value are limited to the float type.
A similar situation occurs with other ONNX ML operators, such as ai.onnx.ml.SVMRegressor and ai.onnx.ml.TreeEnsembleRegressor.
As a result, all developers utilizing ONNX model execution in double precision face this limitation of the specification. A solution might involve extending the ONNX specification (or adding similar operators like LinearRegressor64, SVMRegressor64, and TreeEnsembleRegressor64 with parameters and output values in double). However, at present, this issue remains unresolved.
Much depends on the ONNX converter. For models calculated in double, it might be preferable to avoid using these operators (though this may not always be possible). In this particular case, the converter to ONNX did not work optimally with the user's model.
As we will see later, the sklearn-onnx converter manages to bypass the limitation of LinearRegressor: for ONNX double models, it uses ONNX operators MatMul() and Add() instead. Thanks to this method, numerous regression models of the Scikit-learn library are successfully converted into ONNX models calculated in double, preserving the accuracy of the original double models.
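The idea behind the MatMul() + Add() replacement can be illustrated in plain Python. This is a sketch of the arithmetic only, not of the graph that skl2onnx actually emits, and the helper names matmul and add_bias are hypothetical. MatMul multiplies [num_batches, num_features] by [num_features, num_targets], so a LinearRegressor-style coefficient matrix has to be transposed, and Add broadcasts the intercept vector over the rows. Because the ONNX MatMul operator requires both of its inputs to have the same element type, a graph with a double input must also store its coefficients as double, so nothing is rounded to float:

```python
def matmul(A, B):
    """MatMul for 2-D lists: [n, k] x [k, m] -> [n, m], computed in double."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add_bias(Y, bias):
    """Add with broadcasting of a length-m vector over the rows of an [n, m] matrix."""
    return [[y + c for y, c in zip(row, bias)] for row in Y]

# coefficients in LinearRegressor layout: [num_targets, num_features]
coef = [[0.5, -1.0],
        [2.0, 0.25]]
intercept = [10.0, -1.0]
X = [[1.0, 2.0], [3.0, 4.0]]

# MatMul needs the weights as [num_features, num_targets], i.e. transposed
W = [list(col) for col in zip(*coef)]
print(add_bias(matmul(X, W), intercept))  # [[8.5, 1.5], [7.5, 6.0]]

# a double coefficient passes through unchanged (there is no float32 rounding step)
w = 4.012345678901234
print(add_bias(matmul([[1.0]], [[w]]), [0.0])[0][0] == w)  # True
```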
1. Test Dataset
To run the examples, you will need to install Python (we used version 3.10.8), additional libraries (pip install -U scikit-learn numpy matplotlib onnx onnxruntime skl2onnx), and specify the path to Python in the MetaEditor (in the menu Tools->Options->Compilers->Python).
As a test dataset, we will use values of the function y = 4*X + 10*sin(X*0.5), generated for integer X from 0 to 99.
To display a graph of such a function, open MetaEditor, create a file named RegressionData.py, copy the script text, and run it by clicking the "Compile" button.
The script for displaying the test dataset
# RegressionData.py
# The code plots the synthetic data, used for all regression models
# Copyright 2023, MetaQuotes Ltd.
# https://mql5.com
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
# set the figure size
plt.figure(figsize=(8,5))
# plot the initial data for regression
plt.scatter(X, y, label='Regression Data', marker='o')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Regression data')
plt.show()
As a result, a graph of the function will be displayed, which we will use to test regression methods.
Fig.1. Function for testing regression models
2. Regression Models
The goal of a regression task is to find a mathematical function or model that best describes the relationship between features and the target variable to predict numerical values for new data. This allows making forecasts, optimizing solutions, and making informed decisions based on data.
Let's consider the main regression models in the scikit-learn package.
2.0. List of Scikit-learn Regression Models
To display a list of available scikit-learn regression models, you can use the script:
# ScikitLearnRegressors.py
# The script lists all the regression algorithms available in scikit-learn
# Copyright 2023, MetaQuotes Ltd.
# https://mql5.com
# print Python version
from platform import python_version
print("The Python version is ", python_version())
# print scikit-learn version
import sklearn
print('The scikit-learn version is {}.'.format(sklearn.__version__))
# print scikit-learn regression models
from sklearn.utils import all_estimators
regressors = all_estimators(type_filter='regressor')
for index, (name, RegressorClass) in enumerate(regressors, start=1):
    print(f"Regressor {index}: {name}")
Output:
The scikit-learn version is 1.3.2.
Regressor 1: ARDRegression
Regressor 2: AdaBoostRegressor
Regressor 3: BaggingRegressor
Regressor 4: BayesianRidge
Regressor 5: CCA
Regressor 6: DecisionTreeRegressor
Regressor 7: DummyRegressor
Regressor 8: ElasticNet
Regressor 9: ElasticNetCV
Regressor 10: ExtraTreeRegressor
Regressor 11: ExtraTreesRegressor
Regressor 12: GammaRegressor
Regressor 13: GaussianProcessRegressor
Regressor 14: GradientBoostingRegressor
Regressor 15: HistGradientBoostingRegressor
Regressor 16: HuberRegressor
Regressor 17: IsotonicRegression
Regressor 18: KNeighborsRegressor
Regressor 19: KernelRidge
Regressor 20: Lars
Regressor 21: LarsCV
Regressor 22: Lasso
Regressor 23: LassoCV
Regressor 24: LassoLars
Regressor 25: LassoLarsCV
Regressor 26: LassoLarsIC
Regressor 27: LinearRegression
Regressor 28: LinearSVR
Regressor 29: MLPRegressor
Regressor 30: MultiOutputRegressor
Regressor 31: MultiTaskElasticNet
Regressor 32: MultiTaskElasticNetCV
Regressor 33: MultiTaskLasso
Regressor 34: MultiTaskLassoCV
Regressor 35: NuSVR
Regressor 36: OrthogonalMatchingPursuit
Regressor 37: OrthogonalMatchingPursuitCV
Regressor 38: PLSCanonical
Regressor 39: PLSRegression
Regressor 40: PassiveAggressiveRegressor
Regressor 41: PoissonRegressor
Regressor 42: QuantileRegressor
Regressor 43: RANSACRegressor
Regressor 44: RadiusNeighborsRegressor
Regressor 45: RandomForestRegressor
Regressor 46: RegressorChain
Regressor 47: Ridge
Regressor 48: RidgeCV
Regressor 49: SGDRegressor
Regressor 50: SVR
Regressor 51: StackingRegressor
Regressor 52: TheilSenRegressor
Regressor 53: TransformedTargetRegressor
Regressor 54: TweedieRegressor
Regressor 55: VotingRegressor
For convenience, the regressors in this list are highlighted in different colors. Models that require a base regression model are highlighted in gray, while the other models can be used independently. Models successfully exported to the ONNX format are marked in green, and models that encounter errors during conversion in the current version of scikit-learn 1.2.2 are marked in red. Methods unsuitable for the considered test task are highlighted in blue.
Regression quality is evaluated with regression metrics, which are functions of the true and predicted values. The MQL5 language provides several such metrics, described in detail in the article "Evaluating ONNX models using regression metrics".
In this article, three metrics will be used to compare the quality of different models:
- Coefficient of determination R-squared (R2);
- Mean Absolute Error (MAE);
- Mean Squared Error (MSE).
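For reference, the three metrics follow the textbook definitions: R2 = 1 - sum((y_i - p_i)^2) / sum((y_i - mean(y))^2), MAE = sum(|y_i - p_i|) / n and MSE = sum((y_i - p_i)^2) / n, where y_i are the true values and p_i the predictions. The standard-library sketch below implements exactly these formulas; the function names r2, mae and mse are our own, and this is not the code of sklearn.metrics or of the MQL5 RegressionMetric() function.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mae(y_true, y_pred))  # 0.5
print(mse(y_true, y_pred))  # 0.375
print(r2(y_true, y_pred))   # about 0.9486
```

R2 equals 1 for a perfect fit and can become negative when a model predicts worse than the constant mean, while MAE and MSE are expressed in the units of y and squared units of y, respectively.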
2.1. Scikit-learn Regression Models that convert to ONNX models float and double
This section presents regression models that are successfully converted into ONNX formats in both float and double precisions.
All the regression models discussed further are presented in the following format:
- Model description, working principle, advantages, and limitations
- Python script for creating the model, exporting it to ONNX files in float and double formats, and executing the obtained models using ONNX Runtime in Python. Metrics like R^2, MAE, MSE, calculated using sklearn.metrics, are used to evaluate the quality of the original and ONNX models.
- MQL5 script for executing ONNX models (float and double) via ONNX Runtime, with metrics calculated using RegressionMetric().
- ONNX model representation in Netron for float and double precision.
2.1.1. sklearn.linear_model.ARDRegression
ARDRegression (Automatic Relevance Determination Regression) is a regression method designed to address regression problems while automatically determining the importance (relevance) of features and establishing their weights during the model training process.
ARDRegression enables the detection and use of only the most important features to build a regression model, which can be beneficial when dealing with a large number of features.
Working Principle of ARDRegression:
- Linear Regression: ARDRegression is based on linear regression, assuming a linear relationship between the independent variables (features) and the target variable.
- Automatic Feature Importance Determination: The main distinction of ARDRegression is that it automatically determines which features are most important for predicting the target variable. This is achieved by placing a separate zero-mean Gaussian prior (regularization) on each weight, allowing the model to drive the weights of less significant features to zero.
- Estimation of Feature Relevance: ARDRegression estimates the precision (inverse variance) of each weight's prior from the data, together with the noise precision. Features whose weight precision stays moderate are considered relevant and keep non-zero weights, while features whose precision grows very large (above threshold_lambda in scikit-learn) are pruned, and their weights are set to zero.
- Dimensionality Reduction: Thus, ARDRegression can lead to data dimensionality reduction by removing insignificant features.
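The working principle can be made concrete with a small, standard-library-only sketch. It is a simplified re-implementation written for illustration, not the scikit-learn code: the names ard_fit and invert, the synthetic data, and the hyperparameter values (chosen to resemble scikit-learn's defaults as we understand them) are assumptions. The sketch centers the data, alternates between the Gaussian posterior of the weights and evidence-maximization updates of the noise precision alpha and of one precision lambda per weight, and prunes a weight once its lambda exceeds threshold_lambda. The second feature has no influence on y, so its weight should end up at or near zero:

```python
import math

def invert(M):
    """Gauss-Jordan inverse of a small, well-conditioned square matrix."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]

def ard_fit(X, y, max_iter=300, tol=1e-3, threshold_lambda=1e4,
            alpha_1=1e-6, alpha_2=1e-6, lambda_1=1e-6, lambda_2=1e-6):
    """Simplified ARD regression; returns (coef, intercept, per-weight precisions)."""
    n, p = len(X), len(X[0])
    # center features and target; the intercept is recovered at the end
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [t - y_mean for t in y]
    alpha = 1.0 / (sum(t * t for t in yc) / n + 1e-12)  # noise precision
    lam = [1.0] * p          # one prior precision per weight
    keep = list(range(p))    # indices of weights that are not pruned
    coef = [0.0] * p
    for it in range(max_iter):
        coef_old = coef[:]
        # posterior covariance over kept weights: (diag(lam) + alpha * Xc^T Xc)^-1
        H = [[alpha * sum(r[i] * r[j] for r in Xc) + (lam[i] if i == j else 0.0)
              for j in keep] for i in keep]
        sigma = invert(H)
        # posterior mean: coef = alpha * sigma * Xc^T yc
        xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in keep]
        coef = [0.0] * p
        for a, i in enumerate(keep):
            coef[i] = alpha * sum(sigma[a][b] * xty[b] for b in range(len(keep)))
        # evidence maximization: re-estimate each lambda and the noise precision
        rss = sum((t - sum(c * v for c, v in zip(coef, r))) ** 2 for r, t in zip(Xc, yc))
        gamma = [1.0 - lam[i] * sigma[a][a] for a, i in enumerate(keep)]
        for a, i in enumerate(keep):
            lam[i] = (gamma[a] + 2.0 * lambda_1) / (coef[i] ** 2 + 2.0 * lambda_2)
        alpha = (n - sum(gamma) + 2.0 * alpha_1) / (rss + 2.0 * alpha_2)
        # prune weights whose precision exceeds the threshold (they become zero)
        keep = [i for i in keep if lam[i] < threshold_lambda]
        coef = [c if i in keep else 0.0 for i, c in enumerate(coef)]
        if not keep or (it > 0 and sum(abs(c - o) for c, o in zip(coef, coef_old)) < tol):
            break
    intercept = y_mean - sum(m * c for m, c in zip(x_mean, coef))
    return coef, intercept, lam

# synthetic data (assumed): y depends on x1 only; x2 is an irrelevant feature
X = [[i / 10, math.sin(1.7 * i)] for i in range(100)]
y = [3.0 * x1 + 0.5 + 0.1 * math.sin(2.9 * i) for i, (x1, _) in enumerate(X)]
coef, intercept, lam = ard_fit(X, y)
print("coef:", coef)
print("intercept:", intercept)
print("lambda:", lam)
```

A large lambda means the prior pins the corresponding weight to zero; this is how ARD removes irrelevant features without a separate feature-selection step.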
Advantages of ARDRegression:
- Automatic Determination of Important Features: The method automatically identifies and uses only the most important features, potentially enhancing model performance and reducing the risk of overfitting.
- Resilience to Multicollinearity: ARDRegression handles multicollinearity well, even when features are highly correlated.
Limitations of ARDRegression:
- Requires Selection of Prior Distributions: Choosing suitable prior distributions might require experimentation.
- Computational Complexity: Training ARDRegression can be computationally expensive, particularly for large datasets.
ARDRegression is a regression method that automatically determines feature importance and establishes the feature weights through Bayesian inference on the weight priors. It is useful when a regression model should be built only on the significant features, or when the data dimensionality needs to be reduced.
2.1.1.1. Code for creating the ARDRegression model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.ARDRegression model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training an ARDRegression model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import ARDRegression
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
import os
from sys import argv
# define the path for saving the model (the folder that contains this script)
data_path = os.path.dirname(os.path.abspath(argv[0])) + os.sep
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name="ARDRegression"
onnx_model_filename = data_path + "ard_regression"
# create an ARDRegression model
regression_model = ARDRegression()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the tensor(float) input of the model
X_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the tensor(double) input of the model
X_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
The script creates and trains the sklearn.linear_model.ARDRegression model (the original model works in double precision), then exports it to ONNX for float and double (ard_regression_float.onnx and ard_regression_double.onnx) and compares the accuracy of the exported models with that of the original.
It also generates the files ARDRegression_plot_float.png and ARDRegression_plot_double.png, which allow a visual assessment of the results of the float and double ONNX models (Fig. 2-3).
Fig.2. Results of the ARDRegression.py (float)
Fig.3. Results of the ARDRegression.py (double)
Visually, the results of the float and double ONNX models look the same (Fig. 2-3). Detailed information can be found in the Journal tab:
Python ARDRegression Original model (double)
Python R-squared (Coefficient of determination): 0.9962382628120845
Python Mean Absolute Error: 6.347568012853758
Python Mean Squared Error: 49.77815934891289
Python
Python ARDRegression ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ard_regression_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382627587808
Python Mean Absolute Error: 6.347568283744705
Python Mean Squared Error: 49.778160054267204
Python R^2 matching decimal places: 9
Python MAE matching decimal places: 6
Python MSE matching decimal places: 4
Python float ONNX model precision: 6
Python
Python ARDRegression ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ard_regression_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382628120845
Python Mean Absolute Error: 6.347568012853758
Python Mean Squared Error: 49.77815934891289
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 15
Python MSE matching decimal places: 14
Python double ONNX model precision: 15
In this example, the original model works in double precision; it was then exported to the ONNX models ard_regression_float.onnx and ard_regression_double.onnx for float and double, respectively.
If the accuracy of the model is evaluated by Mean Absolute Error (MAE), the float ONNX model matches the original to 6 decimal places, while the double ONNX model matches it to 15 decimal places, preserving the precision of the original model.
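The 6 versus 15 decimal places are roughly what the two storage formats allow: float32 has a 24-bit significand (about 7 significant decimal digits), while float64 has a 53-bit significand (about 16). The sketch below queries these limits with NumPy and, as an alternative to the string-based compare_decimal_places(), estimates the number of agreeing decimal places from the absolute difference. The helper matching_decimals is hypothetical, and the two measures are not identical: a carry, as in the MSE values ...15934 versus ...16005, makes the string comparison stop earlier.

```python
import math
import numpy as np

# approximate number of reliable decimal digits of each floating-point type
print("float32:", np.finfo(np.float32).precision, "digits, eps =", np.finfo(np.float32).eps)
print("float64:", np.finfo(np.float64).precision, "digits, eps =", np.finfo(np.float64).eps)

def matching_decimals(a, b, max_digits=17):
    """Hypothetical helper: decimal places to which a and b agree,
    estimated as floor(-log10(|a - b|)) instead of comparing strings."""
    diff = abs(a - b)
    if diff == 0.0:
        return max_digits
    return max(0, math.floor(-math.log10(diff)))

# MAE values taken from the ARDRegression log above
mae = 6.347568012853758             # original model (double)
mae_onnx_float = 6.347568283744705  # float ONNX model
mae_onnx_double = 6.347568012853758 # double ONNX model
print(matching_decimals(mae, mae_onnx_float))
print(matching_decimals(mae, mae_onnx_double))
```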
Properties of the ONNX models can be viewed in MetaEditor (Fig. 4-5).
Fig.4. ard_regression_float.onnx ONNX-model in MetaEditor
Fig.5. ard_regression_double.onnx ONNX model in MetaEditor
A comparison of the float and double ONNX models shows that, in this case, ARDRegression is computed differently in each: the float model uses the LinearRegressor() operator from ONNX-ML, whereas the double model uses the standard ONNX operators MatMul(), Add(), and Reshape().
The ONNX implementation of a model depends on the converter; in the examples in this article, models are exported to ONNX with the skl2onnx.convert_sklearn() function from the skl2onnx library.
2.1.1.2. MQL5 code for executing ONNX Models
This code executes the saved ard_regression_float.onnx and ard_regression_double.onnx ONNX models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                                ARDRegression.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ARDRegression"
#define   ONNXFilenameFloat  "ard_regression_float.onnx"
#define   ONNXFilenameDouble "ard_regression_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel     1
#define   TestDoubleModel    2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
ARDRegression (EURUSD,H1) Testing ONNX float: ARDRegression (ard_regression_float.onnx)
ARDRegression (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382627587808
ARDRegression (EURUSD,H1) MQL5: Mean Absolute Error: 6.3475682837447049
ARDRegression (EURUSD,H1) MQL5: Mean Squared Error: 49.7781600542671896
ARDRegression (EURUSD,H1)
ARDRegression (EURUSD,H1) Testing ONNX double: ARDRegression (ard_regression_double.onnx)
ARDRegression (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382628120845
ARDRegression (EURUSD,H1) MQL5: Mean Absolute Error: 6.3475680128537597
ARDRegression (EURUSD,H1) MQL5: Mean Squared Error: 49.7781593489128795
Comparison with the original double model in Python:
Testing ONNX float: ARDRegression (ard_regression_float.onnx)
Python Mean Absolute Error: 6.347568012853758
MQL5: Mean Absolute Error: 6.3475682837447049

Testing ONNX double: ARDRegression (ard_regression_double.onnx)
Python Mean Absolute Error: 6.347568012853758
MQL5: Mean Absolute Error: 6.3475680128537597
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
2.1.1.3. The ONNX representations of models ard_regression_float.onnx and ard_regression_double.onnx
Netron (web version) is a tool for visualizing models and analyzing computation graphs, which can be used for models in the ONNX (Open Neural Network Exchange) format.
Netron presents model graphs and their architecture in a clear and interactive form, allowing the exploration of the structure and parameters of deep learning models, including those created using ONNX.
Key features of Netron include:
- Graph Visualization: Netron displays the model's architecture as a graph, enabling you to see the layers, operations, and connections between them. You can easily comprehend the structure and data flow within the model.
- Interactive Exploration: You can select nodes in the graph to obtain additional information about each operator and its parameters.
- Support for Various Formats: Netron supports a variety of deep learning model formats, including ONNX, TensorFlow, PyTorch, CoreML, and others.
- Parameter Analysis Capability: You can view the model's parameters and weights, which is useful for understanding the values used in different parts of the model.
Netron is convenient for developers and researchers in machine learning and deep learning: it allows a quick inspection of a model's structure and parameters, which simplifies the analysis and debugging of complex neural networks.
For more details about Netron, refer to the articles: Visualizing your Neural Network with Netron and Visualize Keras Neural Networks with Netron.
The ard_regression_float.onnx model is shown in Fig. 6:
Fig.6. ONNX representation of the ard_regression_float.onnx model in Netron
The LinearRegressor() operator from the ai.onnx.ml domain belongs to the ONNX-ML operator set and describes a linear model for regression tasks, that is, for predicting numerical (continuous) values based on input features.
Its parameters, the coefficients (weights) and the intercepts (bias), are stored as operator attributes, and its input is the tensor of features. The operator does not estimate the weights itself: they come from the trained model, and the operator combines the input features linearly with these weights to produce a prediction.
This operator performs the following steps:
- Reads the model's coefficients and intercepts from its attributes and receives the input features.
- For each example of input data, performs a linear combination of weights with the corresponding features.
- Adds the bias to the resulting value.
The result is the prediction of the target variable in the regression task.
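These steps amount to Y = X·Wᵀ + b. A minimal NumPy sketch of this computation follows; it assumes post_transform is NONE and that the flat coefficients attribute holds one row of n_features weights per target, and it casts everything to float32 to mimic the operator's tensor(float) output. The function name and the example numbers are hypothetical.

```python
import numpy as np

def linear_regressor(X, coefficients, intercepts, targets=1):
    """Sketch of ai.onnx.ml LinearRegressor with post_transform='NONE'."""
    X = np.asarray(X, dtype=np.float32)
    # the attribute is a flat list of weights: reshape it to (targets, n_features)
    W = np.asarray(coefficients, dtype=np.float32).reshape(targets, -1)
    b = np.asarray(intercepts, dtype=np.float32)
    # linear combination of the features with the weights, then add the bias
    return X @ W.T + b  # shape (n_samples, targets), dtype float32

X = np.array([[0.0], [1.0], [2.0]])
print(linear_regressor(X, coefficients=[4.0], intercepts=[10.0]))
```

Because the weights, the bias and the result are all float32 here, feeding float64 data into this path cannot preserve more than about 7 significant digits.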
The LinearRegressor() parameters are shown in Fig.7.
Fig.7. The LinearRegressor() operator properties of the ard_regression_float.onnx model in Netron
Fig.8. ONNX representation of the ard_regression_double.onnx model in Netron
The parameters of the MatMul(), Add() and Reshape() ONNX operators are shown in Fig. 9-11.
Fig.9. Properties of the MatMul operator in the ard_regression_double.onnx model in Netron
The MatMul (matrix multiplication) ONNX operator performs the multiplication of two matrices.
It takes two inputs (two matrices) and returns their matrix product.
If you have two matrices, A and B, then the result of MatMul(A, B) is a matrix C, where each element C[i][j] is calculated as the sum of the products of the elements from row i of matrix A by the elements from column j of matrix B.
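The definition of C[i][j] can be written directly as a triple loop. The following pure-Python sketch (the function name matmul is illustrative) computes the same product that MatMul, or NumPy's @ operator, returns for 2-D inputs.

```python
def matmul(A, B):
    """C[i][j] = sum over k of A[i][k] * B[k][j] (2-D case only)."""
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):          # row of A
        for j in range(p):      # column of B
            for k in range(m):  # sum of products along the shared dimension
                C[i][j] += A[i][k] * B[k][j]
    return C

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```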
Fig.10. Properties of the Add operator in the ard_regression_double.onnx model in Netron
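The Add() ONNX operator performs element-wise addition of two tensors. Since opset 7 it supports NumPy-style (multidirectional) broadcasting, so the inputs do not have to be of the same shape; for example, a single intercept value can be added to every element of a tensor.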
It takes two inputs and returns the result, where each element of the resulting tensor equals the sum of the corresponding elements of the input tensors.
Fig.11. Properties of the Reshape operator in the ard_regression_double.onnx model in Netron
The Reshape(-1,1) ONNX operator changes the shape of its input tensor without changing its data. The value -1 for a dimension means that its size is computed automatically from the other dimensions so that the total number of elements is preserved.
The value 1 for the second dimension means that the result is a column of shape (N, 1), one prediction per row, which matches the [None, 1] output shape of the model.
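Put together, the three operators of the double model compute the same linear function as LinearRegressor, but without leaving float64. The NumPy sketch below mirrors that chain; the shapes of the stored weight and intercept tensors (a vector of n_features weights and a one-element intercept) are an assumption here, so check Fig. 9-11 for the actual values. The function name is hypothetical.

```python
import numpy as np

def linear_chain_double(X, coef, intercept):
    """Sketch of MatMul -> Add -> Reshape(-1, 1), computed in float64."""
    X = np.asarray(X, dtype=np.float64)
    coef = np.asarray(coef, dtype=np.float64)            # shape (n_features,)
    intercept = np.asarray(intercept, dtype=np.float64)  # shape (1,)
    z = np.matmul(X, coef)          # MatMul: (N, n_features) x (n_features,) -> (N,)
    z = np.add(z, intercept)        # Add: the single intercept is broadcast to all N values
    return np.reshape(z, (-1, 1))   # Reshape(-1, 1): -1 is inferred as N -> (N, 1)

X = np.arange(0, 5, dtype=np.float64).reshape(-1, 1)
y = linear_chain_double(X, coef=[4.0], intercept=[10.0])
print(y.shape, y.dtype)
```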
2.1.2. sklearn.linear_model.BayesianRidge
BayesianRidge is a Bayesian regression method designed to predict a dependent variable based on one or several independent variables.
It places a prior distribution on the model parameters and updates it with the observed data to obtain the posterior distribution of the parameters.
Working Principle of BayesianRidge:
- Prior distribution of parameters: It begins with defining the prior distribution of model parameters. This distribution represents prior knowledge or assumptions about model parameters before considering the data. In the case of BayesianRidge, Gaussian-shaped prior distributions are used.
- Updating the parameter distribution: Once the prior distribution is set, it is updated based on the data using Bayes' theorem, which yields the posterior distribution of the parameters. An essential aspect is the estimation of the hyperparameters (the precisions of the noise and of the weights), which influence the form of the posterior distribution.
- Prediction: After estimating the posterior distribution of parameters, predictions can be made for new observations. This results in a distribution of forecasts rather than a single point value, allowing for uncertainty in predictions to be considered.
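The three steps can be illustrated with the closed-form posterior of Bayesian linear regression with a Gaussian prior on the weights. In this NumPy sketch the noise precision alpha and the weight precision lam are fixed by hand, whereas scikit-learn's BayesianRidge estimates them from the data; the function names are hypothetical. In scikit-learn itself, the predictive standard deviation is returned by predict(X, return_std=True).

```python
import numpy as np

def posterior(X, y, alpha, lam):
    """Posterior N(mean, cov) of w for y = X w + noise,
    with prior w ~ N(0, I / lam) and noise precision alpha."""
    n_features = X.shape[1]
    cov = np.linalg.inv(lam * np.eye(n_features) + alpha * X.T @ X)
    mean = alpha * cov @ X.T @ y
    return mean, cov

def predict(X_new, mean, cov, alpha):
    """Predictive mean and standard deviation for new inputs."""
    y_mean = X_new @ mean
    # noise variance + uncertainty of the weights: 1/alpha + x^T cov x
    y_var = 1.0 / alpha + np.einsum("ij,jk,ik->i", X_new, cov, X_new)
    return y_mean, np.sqrt(y_var)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
X = np.column_stack([x, np.ones_like(x)])  # feature + constant column for the intercept
y = 4.0 * x + 10.0 + rng.normal(scale=1.0, size=x.size)
mean, cov = posterior(X, y, alpha=1.0, lam=1e-6)
X_new = np.array([[5.0, 1.0], [50.0, 1.0]])  # inside vs. far outside the training range
y_mean, y_std = predict(X_new, mean, cov, alpha=1.0)
print(mean, y_std)
```

The predictive standard deviation never falls below the noise level 1/sqrt(alpha) and grows as the input moves away from the training data.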
Advantages of BayesianRidge:
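- Uncertainty consideration: BayesianRidge accounts for uncertainty in model parameters and predictions. In addition to point predictions, it can return the standard deviation of the predictive distribution (predict(X, return_std=True)), from which confidence intervals can be built.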
- Regularization: The Bayesian regression method can be useful for model regularization, aiding in preventing overfitting.
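- Shrinkage of weights: The Gaussian prior shrinks the weights toward zero and reduces the influence of insignificant features. Unlike ARDRegression, however, all weights share a single precision, so BayesianRidge does not set individual weights exactly to zero.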
Limitations of BayesianRidge:
- Computational complexity: The method requires computational resources to estimate parameters and compute the posterior distribution.
- High abstraction level: A deeper understanding of Bayesian statistics may be required to comprehend and use BayesianRidge.
- Not always the best choice: BayesianRidge may not be the most suitable method in certain regression tasks, particularly when dealing with limited data.
BayesianRidge is useful in regression tasks where the uncertainty of parameters and predictions is important and in cases where model regularization is needed.
2.1.2.1. Code for creating the BayesianRidge model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.BayesianRidge model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training a BayesianRidge model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
import os
from sys import argv
# define the path for saving the model (the folder that contains this script)
data_path = os.path.dirname(os.path.abspath(argv[0])) + os.sep
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "BayesianRidge"
onnx_model_filename = data_path + "bayesian_ridge"
# create a Bayesian Ridge regression model
regression_model = BayesianRidge()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the tensor(float) input of the model
X_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ", compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the tensor(double) input of the model
X_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python BayesianRidge Original model (double)
Python R-squared (Coefficient of determination): 0.9962382628120845
Python Mean Absolute Error: 6.347568012853758
Python Mean Squared Error: 49.77815934891288
Python
Python BayesianRidge ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bayesian_ridge_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382627587808
Python Mean Absolute Error: 6.347568283744705
Python Mean Squared Error: 49.778160054267204
Python R^2 matching decimal places: 9
Python MAE matching decimal places: 6
Python MSE matching decimal places: 4
Python float ONNX model precision: 6
Python
Python BayesianRidge ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bayesian_ridge_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382628120845
Python Mean Absolute Error: 6.347568012853758
Python Mean Squared Error: 49.77815934891288
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 15
Python MSE matching decimal places: 14
Python double ONNX model precision: 15
Fig.12. Results of the BayesianRidge.py (float ONNX)
2.1.2.2. MQL5 code for executing ONNX Models
This code executes the saved bayesian_ridge_float.onnx and bayesian_ridge_double.onnx ONNX models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| BayesianRidge.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define ModelName          "BayesianRidge"
#define ONNXFilenameFloat  "bayesian_ridge_float.onnx"
#define ONNXFilenameDouble "bayesian_ridge_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel  1
#define TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
BayesianRidge (EURUSD,H1) Testing ONNX float: BayesianRidge (bayesian_ridge_float.onnx)
BayesianRidge (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382627587808
BayesianRidge (EURUSD,H1) MQL5: Mean Absolute Error: 6.3475682837447049
BayesianRidge (EURUSD,H1) MQL5: Mean Squared Error: 49.7781600542671896
BayesianRidge (EURUSD,H1)
BayesianRidge (EURUSD,H1) Testing ONNX double: BayesianRidge (bayesian_ridge_double.onnx)
BayesianRidge (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382628120845
BayesianRidge (EURUSD,H1) MQL5: Mean Absolute Error: 6.3475680128537624
BayesianRidge (EURUSD,H1) MQL5: Mean Squared Error: 49.7781593489128866
Comparison with the original double model in Python:
Testing ONNX float: BayesianRidge (bayesian_ridge_float.onnx)
Python Mean Absolute Error: 6.347568012853758
MQL5: Mean Absolute Error: 6.3475682837447049

Testing ONNX double: BayesianRidge (bayesian_ridge_double.onnx)
Python Mean Absolute Error: 6.347568012853758
MQL5: Mean Absolute Error: 6.3475680128537624
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 13 decimal places.
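The gap between the two accuracies follows from the number formats themselves: float32 has a machine epsilon of about 1.19e-7 (roughly 7 significant decimal digits), float64 about 2.22e-16. For an MAE near 6.3 that leaves room for roughly six matching decimal places in float, which is in line with the results above. The sketch below illustrates this with NumPy only and is a stand-in under stated assumptions, not the article's pipeline: an ordinary least-squares line from np.polyfit replaces the exported model, and float32 NumPy arithmetic imitates a float ONNX model. It also checks that the integer inputs 0..99 are exact in float32, so the loss comes from the parameters and the arithmetic rather than from the (float) cast of the inputs in RunModelFloat.

```python
import numpy as np

# the article's synthetic data
x = np.arange(0, 100, 1, dtype=np.float64)
y = 4 * x + 10 * np.sin(x * 0.5)

# assumption: a least-squares line stands in for the exported linear model
slope, intercept = np.polyfit(x, y, 1)

# double path: parameters, inputs and arithmetic all in float64
pred64 = slope * x + intercept
mae64 = float(np.mean(np.abs(y - pred64)))

# float path: parameters, inputs and arithmetic all in float32
pred32 = np.float32(slope) * x.astype(np.float32) + np.float32(intercept)
mae32 = float(np.mean(np.abs(y - pred32.astype(np.float64))))

print("inputs exact in float32:", np.array_equal(x.astype(np.float32).astype(np.float64), x))
print("float32 eps:", np.finfo(np.float32).eps, "float64 eps:", np.finfo(np.float64).eps)
print("MAE double:", mae64)
print("MAE float: ", mae32)
print("absolute difference:", abs(mae64 - mae32))
```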
2.1.2.3. ONNX representation of bayesian_ridge_float.onnx and bayesian_ridge_double.onnx
Fig.13. ONNX representation of the bayesian_ridge_float.onnx in Netron
Fig.14. ONNX representation of the bayesian_ridge_double.onnx in Netron
Note on ElasticNet and ElasticNetCV Methods
ElasticNet and ElasticNetCV are two related machine learning methods used for regularizing regression models, especially linear regression. They share common functionality but differ in their manner of use and application.
ElasticNet (Elastic Net Regression):
- Working Principle: ElasticNet is a regression method that combines Lasso (L1 regularization) and Ridge (L2 regularization). It adds two regularization components to the loss function: one penalizes the model for large absolute values of coefficients (like Lasso), and the other penalizes the model for large squares of coefficients (like Ridge).
- Application: ElasticNet is commonly used when the data exhibit multicollinearity (highly correlated features), when the number of features needs to be reduced, and when the magnitude of the coefficients needs to be kept under control.
ElasticNetCV (Elastic Net Cross-Validation):
- Working Principle: ElasticNetCV is an extension of ElasticNet that automatically selects the optimal hyperparameters using cross-validation: the regularization strength (alpha in scikit-learn, often denoted lambda in other libraries) and the mixing ratio between L1 and L2 regularization (l1_ratio in scikit-learn, often denoted alpha in other libraries). It iterates through candidate values of both and chooses the combination that performs best in cross-validation.
- Advantages: ElasticNetCV automatically tunes model parameters based on cross-validation, allowing for the selection of optimal hyperparameter values without the need for manual tuning. This makes it more convenient to use and helps prevent model overfitting.
Thus, the main difference between ElasticNet and ElasticNetCV is that ElasticNet is the regression method applied to data, while ElasticNetCV is a tool that automatically finds optimal hyperparameter values for the ElasticNet model using cross-validation. ElasticNetCV is helpful when you need to find the best model parameters and make the tuning process more automated.
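In scikit-learn terms the distinction looks roughly like this. This is a minimal sketch on the article's synthetic data; the candidate l1_ratio list and cv=5 are illustrative choices, not values from the article. ElasticNet receives alpha and l1_ratio directly, while ElasticNetCV chooses them by cross-validation and exposes the selected values as alpha_ and l1_ratio_.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, ElasticNetCV

# the article's synthetic data (target flattened to 1-D)
X = np.arange(0, 100, 1).reshape(-1, 1)
y = 4 * X.ravel() + 10 * np.sin(X.ravel() * 0.5)

# ElasticNet: strength (alpha) and L1/L2 mix (l1_ratio) are fixed by hand
manual = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)

# ElasticNetCV: for each candidate l1_ratio an alpha grid is generated automatically,
# and the best (alpha, l1_ratio) pair is chosen by 5-fold cross-validation
auto = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, y)

print("ElasticNet   alpha=1.0, l1_ratio=0.5, coef:", manual.coef_)
print("ElasticNetCV alpha_:", auto.alpha_, "l1_ratio_:", auto.l1_ratio_, "coef:", auto.coef_)
```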
2.1.3. sklearn.linear_model.ElasticNet
ElasticNet is a regression method that represents a combination of L1 (Lasso) and L2 (Ridge) regularization.
This method is used for regression, which means predicting numerical values of a target variable based on a set of features. ElasticNet helps control overfitting and considers both L1 and L2 penalties on model coefficients.
Operation Principle of ElasticNet:
- Input Data: It starts with the original dataset where we have features (independent variables) and corresponding values of the target variable.
- Objective Function: ElasticNet minimizes an objective function made up of the mean squared error (MSE) and two regularization penalties, L1 (Lasso) and L2 (Ridge). This means the objective function looks like this:
Objective Function = MSE + α * L1 + β * L2
Where α and β are hyperparameters that control the weights of L1 and L2 regularization, respectively.
- Finding Optimal α and β: Cross-validation is usually used to find the best values of α and β. This allows selecting values that strike a balance between reducing overfitting and preserving essential features.
- Model Training: ElasticNet trains the model considering the optimal α and β by minimizing the objective function.
- Prediction: After the model is trained, ElasticNet can be used to predict target variable values for new data.
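The α/β form above is a generic way of writing the objective. scikit-learn's ElasticNet documents it differently, as (1/(2·n_samples))·||y − Xw||²₂ + alpha·l1_ratio·||w||₁ + 0.5·alpha·(1 − l1_ratio)·||w||²₂. Assuming L1 = ||w||₁ and L2 = ||w||²₂ (the squared norm), doubling the scikit-learn objective gives MSE + 2·alpha·l1_ratio·L1 + alpha·(1 − l1_ratio)·L2, so α = 2·alpha·l1_ratio and β = alpha·(1 − l1_ratio), or inversely alpha = α/2 + β and l1_ratio = α/(α + 2β). The sketch below checks this mapping numerically; to_sklearn_params and the two objective functions are hypothetical helpers written for this illustration, and w, b are arbitrary values rather than fitted coefficients.

```python
import numpy as np

def doc_objective(w, b, X, y, a, beta):
    """Objective in the article's notation: MSE + a*L1 + beta*L2 (L2 = squared norm, assumed)."""
    residual = y - X @ w - b
    return np.mean(residual ** 2) + a * np.sum(np.abs(w)) + beta * np.sum(w ** 2)

def sklearn_objective(w, b, X, y, alpha, l1_ratio):
    """Objective as documented for sklearn.linear_model.ElasticNet (intercept not penalized)."""
    n = X.shape[0]
    residual = y - X @ w - b
    return ((residual @ residual) / (2 * n)
            + alpha * l1_ratio * np.sum(np.abs(w))
            + 0.5 * alpha * (1 - l1_ratio) * np.sum(w ** 2))

def to_sklearn_params(a, beta):
    """Hypothetical helper: map the article's (alpha, beta) to scikit-learn's (alpha, l1_ratio)."""
    alpha = a / 2 + beta
    l1_ratio = a / (a + 2 * beta)
    return alpha, l1_ratio

# the article's synthetic data and arbitrary coefficients to evaluate both objectives
X = np.arange(0, 100, 1, dtype=float).reshape(-1, 1)
y = 4 * X.ravel() + 10 * np.sin(X.ravel() * 0.5)
w, b = np.array([3.9]), 0.5

alpha, l1_ratio = to_sklearn_params(0.6, 0.2)
print("alpha =", alpha, "l1_ratio =", l1_ratio)
print("article form:", doc_objective(w, b, X, y, 0.6, 0.2))
print("2 x sklearn :", 2 * sklearn_objective(w, b, X, y, alpha, l1_ratio))
```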
Advantages of ElasticNet:
- Feature Selection Capability: ElasticNet can automatically select the most important features by setting weights to zero for insignificant features (similar to Lasso).
- Overfitting Control: The combined L1 and L2 penalties shrink the coefficients, which helps control overfitting.
- Dealing with Multicollinearity: This method is useful when multicollinearity exists (high correlation between features) as L2 regularization can reduce the influence of multicollinear features.
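The feature-selection effect is easiest to see on data where only some features matter. A minimal sketch with made-up data (the seed, alpha=0.5 and l1_ratio=0.9 are arbitrary illustrative choices, not values from the article): of five random features only the first two drive the target, and the L1 part of the penalty is expected to set the coefficients of the other three to exactly zero.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
# five features, only the first two are informative
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# a strongly L1-weighted mix so that irrelevant coefficients are driven to zero
model = ElasticNet(alpha=0.5, l1_ratio=0.9).fit(X, y)

print("coefficients:", np.round(model.coef_, 4))
print("features kept:", np.flatnonzero(model.coef_))
```

Note that the two informative coefficients come out smaller than 3 and 2: the same penalty that removes the irrelevant features also shrinks the relevant ones.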
Limitations of ElasticNet:
- Requires tuning of hyperparameters α and β, which can be a non-trivial task.
- Depending on parameter choices, ElasticNet may retain too few or too many features, affecting the model's quality.
ElasticNet is a powerful regression method that can be beneficial in tasks where feature selection and overfitting control are crucial.
2.1.3.1. Code for creating the ElasticNet model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.ElasticNet model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training ElasticNet model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "ElasticNet"
onnx_model_filename = data_path + "elastic_net"
# create an ElasticNet model
regression_model = ElasticNet()
# fit the model to the data
regression_model.fit(X,y)
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 (the model expects tensor(float))
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 (the model expects tensor(double))
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python ElasticNet Original model (double)
Python R-squared (Coefficient of determination): 0.9962377031744798
Python Mean Absolute Error: 6.344394662876524
Python Mean Squared Error: 49.78556489812415
Python
Python ElasticNet ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\elastic_net_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962377032416807
Python Mean Absolute Error: 6.344395027824294
Python Mean Squared Error: 49.78556400887057
Python R^2 matching decimal places: 9
Python MAE matching decimal places: 5
Python MSE matching decimal places: 6
Python float ONNX model precision: 5
Python
Python ElasticNet ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\elastic_net_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962377031744798
Python Mean Absolute Error: 6.344394662876524
Python Mean Squared Error: 49.78556489812415
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 15
Python MSE matching decimal places: 14
Python double ONNX model precision: 15
Fig.15. Results of the ElasticNet.py (float ONNX)
2.1.3.2. MQL5 code for executing ONNX Models
This code executes the saved elastic_net_double.onnx and elastic_net_float.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| ElasticNet.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define ModelName          "ElasticNet"
#define ONNXFilenameFloat  "elastic_net_float.onnx"
#define ONNXFilenameDouble "elastic_net_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel  1
#define TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
ElasticNet (EURUSD,H1) Testing ONNX float: ElasticNet (elastic_net_float.onnx)
ElasticNet (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962377032416807
ElasticNet (EURUSD,H1) MQL5: Mean Absolute Error: 6.3443950278242944
ElasticNet (EURUSD,H1) MQL5: Mean Squared Error: 49.7855640088705869
ElasticNet (EURUSD,H1)
ElasticNet (EURUSD,H1) Testing ONNX double: ElasticNet (elastic_net_double.onnx)
ElasticNet (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962377031744798
ElasticNet (EURUSD,H1) MQL5: Mean Absolute Error: 6.3443946628765220
ElasticNet (EURUSD,H1) MQL5: Mean Squared Error: 49.7855648981241217
Comparison with the original double model in Python:
Testing ONNX float: ElasticNet (elastic_net_float.onnx)
Python Mean Absolute Error: 6.344394662876524
MQL5: Mean Absolute Error: 6.3443950278242944

Testing ONNX double: ElasticNet (elastic_net_double.onnx)
Python Mean Absolute Error: 6.344394662876524
MQL5: Mean Absolute Error: 6.3443946628765220
Accuracy of ONNX float MAE: 5 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
2.1.3.3. ONNX representation of elastic_net_float.onnx and elastic_net_double.onnx
Fig.16. ONNX representation of the elastic_net_float.onnx in Netron
Fig.17. ONNX representation of the elastic_net_double.onnx in Netron
2.1.4. sklearn.linear_model.ElasticNetCV
ElasticNetCV is an extension of the ElasticNet method designed for automatically selecting optimal values of the hyperparameters α and β (the weights of L1 and L2 regularization) using cross-validation; in scikit-learn they are expressed through the alpha and l1_ratio parameters.
This allows finding the best combination of regularizations for the ElasticNet model without the need for manual parameter tuning.
Operation Principle of ElasticNetCV:
- Input Data: It begins with the original dataset containing features (independent variables) and their corresponding target variable values.
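- Defining the α and β Range: The user specifies the candidate values to be considered during optimization. These values are typically chosen on a logarithmic scale; in scikit-learn, the alpha grid is generated automatically on a logarithmic scale unless it is passed explicitly via alphas, and the candidate mixing ratios are passed as a list via l1_ratio.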
- Data Splitting: The dataset is divided into multiple folds for cross-validation. Each fold is used as a test dataset while the others are used for training.
- Cross-Validation: For each combination of α and β within the specified range, cross-validation is performed. The ElasticNet model is trained on the training data and then evaluated on the test data.
- Performance Evaluation: For each α and β combination, the error on the test folds (the mean squared error in scikit-learn) is averaged across all folds.
- Selection of Optimal Parameters: Values of α and β corresponding to the minimum average error obtained during cross-validation are determined.
- Model Training with Optimal Parameters: The final model is refit on the entire dataset using the optimal α and β values found.
- Prediction: After training, the model can be used to predict target variable values for new data.
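The steps above can be sketched by hand with scikit-learn's ElasticNet and KFold. This is a minimal illustration, not how ElasticNetCV is implemented internally; the library version uses warm starts along a regularization path. The alpha grid and l1_ratio values are hypothetical choices for the example. The data are the same synthetic series used in the article's scripts.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold

# same synthetic data as in the article's scripts
X = np.arange(0, 100, 1).reshape(-1, 1).astype(np.float64)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

# candidate grid (hypothetical values chosen for illustration)
alphas = np.logspace(-3, 1, 9)   # log-spaced regularization strengths
l1_ratios = [0.1, 0.5, 0.9]      # L1/L2 mixing values

kfold = KFold(n_splits=5)        # consecutive folds, no shuffling


def cv_mse(alpha, l1_ratio):
    """Mean test-fold MSE of an ElasticNet over the K folds."""
    errors = []
    for train_idx, test_idx in kfold.split(X):
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
        model.fit(X[train_idx], y[train_idx])
        residuals = y[test_idx] - model.predict(X[test_idx])
        errors.append(np.mean(residuals ** 2))
    return float(np.mean(errors))


# evaluate every combination and pick the one with the lowest average error
scores = {(a, r): cv_mse(a, r) for a in alphas for r in l1_ratios}
best_alpha, best_l1_ratio = min(scores, key=scores.get)

# refit on the whole dataset with the selected parameters
final_model = ElasticNet(alpha=best_alpha, l1_ratio=best_l1_ratio).fit(X, y)
print("best alpha:", best_alpha, "best l1_ratio:", best_l1_ratio)
print("coefficient:", final_model.coef_[0], "intercept:", final_model.intercept_)
```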
Advantages of ElasticNetCV:
- Automatic Hyperparameter Selection: ElasticNetCV automatically finds optimal values of α and β, simplifying model tuning.
- Overfitting Prevention: Cross-validation aids in selecting a model with good generalization ability.
- Noise Robustness: Because the combination of penalties is chosen by its performance on held-out data, the regularization helps keep the model from fitting noise in the training data.
Limitations of ElasticNetCV:
- Computational Complexity: Performing cross-validation over a large parameter range can be time-consuming.
- Optimal Parameters Depend on the Range Choice: Results might depend on the choice of the α and β range, so it's important to carefully adjust this range.
ElasticNetCV is a powerful tool for automatically tuning regularization in the ElasticNet model and enhancing its performance.
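The article's script uses ElasticNetCV() with its defaults. The following minimal sketch instead passes an explicit l1_ratio list, so both hyperparameters are searched. It then reads back the selected values. The l1_ratio values are illustrative assumptions. alpha_, l1_ratio_ and mse_path_ are the fitted attributes documented by scikit-learn.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# same synthetic data as in the article's scripts
X = np.arange(0, 100, 1).reshape(-1, 1).astype(np.float64)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

# search over three mixing ratios; the alpha grid is generated automatically
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5)
model.fit(X, y)

print("selected alpha:", model.alpha_)
print("selected l1_ratio:", model.l1_ratio_)

# mse_path_ holds the test-fold MSE for each (l1_ratio, alpha, fold)
mean_mse = model.mse_path_.mean(axis=2)
print("grid of averaged errors:", mean_mse.shape)
```

Widening or narrowing the l1_ratio list (or passing an explicit alphas array) is the practical way to control the range dependence mentioned above.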
2.1.4.1. Code for creating the ElasticNetCV model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.ElasticNetCV model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# ElasticNetCV.py
# The code demonstrates the process of training ElasticNetCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "ElasticNetCV"
onnx_model_filename = data_path + "elastic_net_cv"
# create an ElasticNetCV model
regression_model = ElasticNetCV()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python ElasticNetCV Original model (double)
Python R-squared (Coefficient of determination): 0.9962137763338385
Python Mean Absolute Error: 6.334487104423225
Python Mean Squared Error: 50.10218299945999
Python
Python ElasticNetCV ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\elastic_net_cv_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962137770260989
Python Mean Absolute Error: 6.334486542922601
Python Mean Squared Error: 50.10217383894468
Python R^2 matching decimal places: 8
Python MAE matching decimal places: 5
Python MSE matching decimal places: 4
Python float ONNX model precision: 5
Python
Python ElasticNetCV ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\elastic_net_cv_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962137763338385
Python Mean Absolute Error: 6.334487104423225
Python Mean Squared Error: 50.10218299945999
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 15
Python MSE matching decimal places: 14
Python double ONNX model precision: 15
Fig.18. Results of the ElasticNetCV.py (float ONNX)
2.1.4.2. MQL5 code for executing ONNX Models
This code executes the saved elastic_net_cv_float.onnx and elastic_net_cv_double.onnx ONNX models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| ElasticNetCV.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link "https://www.mql5.com"
#property version "1.00"

#define ModelName "ElasticNetCV"
#define ONNXFilenameFloat "elastic_net_cv_float.onnx"
#define ONNXFilenameDouble "elastic_net_cv_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel 1
#define TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
ElasticNetCV (EURUSD,H1) Testing ONNX float: ElasticNetCV (elastic_net_cv_float.onnx)
ElasticNetCV (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962137770260989
ElasticNetCV (EURUSD,H1) MQL5: Mean Absolute Error: 6.3344865429226038
ElasticNetCV (EURUSD,H1) MQL5: Mean Squared Error: 50.1021738389446938
ElasticNetCV (EURUSD,H1)
ElasticNetCV (EURUSD,H1) Testing ONNX double: ElasticNetCV (elastic_net_cv_double.onnx)
ElasticNetCV (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962137763338385
ElasticNetCV (EURUSD,H1) MQL5: Mean Absolute Error: 6.3344871044232205
ElasticNetCV (EURUSD,H1) MQL5: Mean Squared Error: 50.1021829994599983
Comparison with the original double model in Python:
Testing ONNX float: ElasticNetCV (elastic_net_cv_float.onnx)
Python Mean Absolute Error: 6.334487104423225
MQL5: Mean Absolute Error: 6.3344865429226038

Testing ONNX double: ElasticNetCV (elastic_net_cv_double.onnx)
Python Mean Absolute Error: 6.334487104423225
MQL5: Mean Absolute Error: 6.3344871044232205
Accuracy of ONNX float MAE: 5 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
2.1.4.3. ONNX representation of the elastic_net_cv_float.onnx and elastic_net_cv_double.onnx
Fig.19. ONNX representation of the elastic_net_cv_float.onnx in Netron
Fig.20. ONNX representation of the elastic_net_cv_double.onnx in Netron
2.1.5. sklearn.linear_model.HuberRegressor
HuberRegressor is a machine learning method for regression tasks. It modifies the Ordinary Least Squares (OLS) method to make it robust to outliers in the data.
Unlike OLS, which minimizes the sum of squared errors, HuberRegressor minimizes a loss that is quadratic for small errors and linear (absolute) for large ones. This makes the method more robust to outliers in the data.
Working Principle of HuberRegressor:
- Input Data: It starts with the original dataset, where there are features (independent variables) and their corresponding target variable values.
- Huber Loss Function: HuberRegressor utilizes the Huber loss function, which combines a quadratic loss function for small errors and a linear loss function for large errors. This makes the method more resilient to outliers.
- Model Training: The model is trained on data using the Huber loss function. During training, it adjusts the weights (coefficients) for each feature and the bias.
- Prediction: After training, the model can be used to predict target variable values for new data.
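The Huber loss described above can be written in a few lines of NumPy. This sketch uses the textbook form: r²/2 for |r| ≤ δ, and δ(|r| − δ/2) otherwise. The two branches meet at |r| = δ, so the loss is continuous there. The default δ = 1.35 mirrors scikit-learn's epsilon default. scikit-learn's HuberRegressor uses a variant scaled by 2 and applies it to residuals divided by an estimated scale σ, so the numbers are not directly comparable.

```python
import numpy as np


def huber_loss(residuals, delta=1.35):
    """Textbook Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.abs(np.asarray(residuals, dtype=np.float64))
    quadratic = 0.5 * r ** 2              # used for small errors
    linear = delta * (r - 0.5 * delta)    # used for large errors
    return np.where(r <= delta, quadratic, linear)


# compare with the plain squared loss 0.5 * r**2 for the same residuals
residuals = np.array([0.5, 1.0, 3.0, 10.0])
print("Huber loss:  ", huber_loss(residuals, delta=1.0))
print("squared loss:", 0.5 * residuals ** 2)
```

For a residual of 10 the squared loss is 50, while the Huber loss with δ = 1 is only 9.5. That is why a single outlier cannot dominate the fit.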
Advantages of HuberRegressor:
- Robustness to Outliers: HuberRegressor is more robust to outliers in the data compared to OLS, making it useful in tasks where data might contain anomalous values.
- Error Estimation: Along with the coefficients, the model estimates the scale of the residuals (the scale_ attribute in scikit-learn) and flags the samples it treats as outliers (outliers_). This can be useful for analyzing model results.
- Regularization Level: HuberRegressor also supports L2 regularization (the alpha parameter in scikit-learn), which can reduce overfitting.
Limitations of HuberRegressor:
- Not as Accurate as OLS in the Absence of Outliers: In cases where there are no outliers in the data, OLS might provide more accurate results.
- Parameter Tuning: HuberRegressor has a parameter (epsilon in scikit-learn, default 1.35) that sets the threshold beyond which errors are considered "large" and handled by the linear part of the loss. This parameter may require tuning.
HuberRegressor is valuable in regression tasks where data may contain outliers, and a model that is robust to such anomalies is required.
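To illustrate the robustness claim, the following sketch compares HuberRegressor with ordinary least squares (LinearRegression). It uses the article's synthetic data after injecting three large outliers. The outlier positions and the +500 offset are arbitrary assumptions for the demonstration. max_iter is raised from the default of 100 so that the L-BFGS optimizer has room to converge on the unscaled input.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# same synthetic data as in the article's scripts (true slope is 4)
X = np.arange(0, 100, 1).reshape(-1, 1).astype(np.float64)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

# inject a few large outliers at the right end (arbitrary illustrative values)
outlier_idx = [90, 95, 99]
y_noisy = y.copy()
y_noisy[outlier_idx] += 500.0

# OLS treats every residual quadratically, so the outliers pull the slope
ols = LinearRegression().fit(X, y_noisy)
# Huber caps the influence of large residuals via the linear part of its loss
huber = HuberRegressor(epsilon=1.35, max_iter=1000).fit(X, y_noisy)

print("OLS slope:  ", ols.coef_[0])
print("Huber slope:", huber.coef_[0])
print("samples flagged as outliers:", np.flatnonzero(huber.outliers_))
```

The OLS slope is pulled upward by the three corrupted points. The Huber slope stays close to 4, and the outliers_ mask identifies the injected samples.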
2.1.5.1. Code for creating the HuberRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.HuberRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# HuberRegressor.py
# The code demonstrates the process of training HuberRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import HuberRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "HuberRegressor"
onnx_model_filename = data_path + "huber_regressor"
# create a Huber Regressor model
huber_regressor_model = HuberRegressor()
# fit the model to the data
huber_regressor_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = huber_regressor_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(huber_regressor_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(huber_regressor_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python HuberRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9962363935647066
Python Mean Absolute Error: 6.341633708569641
Python Mean Squared Error: 49.80289464784336
Python
Python HuberRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\huber_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962363944236795
Python Mean Absolute Error: 6.341633300252807
Python Mean Squared Error: 49.80288328126165
Python R^2 matching decimal places: 8
Python MAE matching decimal places: 6
Python ONNX: MSE matching decimal places: 4
Python float ONNX model precision: 6
Python
Python HuberRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\huber_regressor_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962363935647066
Python Mean Absolute Error: 6.341633708569641
Python Mean Squared Error: 49.80289464784336
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 15
Python MSE matching decimal places: 14
Python double ONNX model precision: 15
Fig.21. Results of the HuberRegressor.py (float ONNX)
2.1.5.2. MQL5 code for executing ONNX Models
This code executes the saved huber_regressor_float.onnx and huber_regressor_double.onnx ONNX models and demonstrating the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| HuberRegressor.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"
#define   ModelName          "HuberRegressor"
#define   ONNXFilenameFloat  "huber_regressor_float.onnx"
#define   ONNXFilenameDouble "huber_regressor_double.onnx"
#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];
#define   TestFloatModel     1
#define   TestDoubleModel    2
//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;
   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
HuberRegressor (EURUSD,H1) Testing ONNX float: HuberRegressor (huber_regressor_float.onnx)
HuberRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962363944236795
HuberRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 6.3416333002528074
HuberRegressor (EURUSD,H1) MQL5: Mean Squared Error: 49.8028832812616571
HuberRegressor (EURUSD,H1)
HuberRegressor (EURUSD,H1) Testing ONNX double: HuberRegressor (huber_regressor_double.onnx)
HuberRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962363935647066
HuberRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 6.3416337085696410
HuberRegressor (EURUSD,H1) MQL5: Mean Squared Error: 49.8028946478433525
Comparison with the original double model in Python:
Testing ONNX float: HuberRegressor (huber_regressor_float.onnx)
Python Mean Absolute Error: 6.341633708569641
MQL5: Mean Absolute Error: 6.3416333002528074
Testing ONNX double: HuberRegressor (huber_regressor_double.onnx)
Python Mean Absolute Error: 6.341633708569641
MQL5: Mean Absolute Error: 6.3416337085696410
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
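A back-of-the-envelope check of why the float model tops out at about six decimal places: float32 carries a 24-bit significand, so near 6.34 adjacent representable values are 2**-21 (about 4.8e-7) apart, while for float64 the spacing is 2**-50. This sketch only bounds how precisely a single number can be stored. The actual MAE difference also accumulates rounding across the whole float prediction pipeline.

```python
import numpy as np

# MAE of the original double model, taken from the comparison above
mae = 6.341633708569641

# Distance to the next representable number near this value
gap32 = float(np.spacing(np.float32(mae)))  # float32: 2**-21, about 4.8e-7
gap64 = float(np.spacing(np.float64(mae)))  # float64: 2**-50, about 8.9e-16

# Storing the value in float32 moves it by at most half a gap
err32 = abs(float(np.float32(mae)) - mae)

print(f"float32 spacing: {gap32:.3e}, round-trip error: {err32:.3e}")
print(f"float64 spacing: {gap64:.3e}")
```

Because the float32 spacing sits between 1e-7 and 1e-6, agreement beyond the sixth decimal place cannot be expected from a value that has passed through float32.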
2.1.5.3. ONNX representation of the huber_regressor_float.onnx and huber_regressor_double.onnx
Fig.22. ONNX representation of the huber_regressor_float.onnx in Netron
Fig.23. ONNX representation of the huber_regressor_double.onnx in Netron
2.1.6. sklearn.linear_model.Lars
LARS (Least Angle Regression) is a machine learning method used for regression tasks. It constructs a linear regression model by adding active features (variables) one at a time during the learning process.
LARS attempts to find the fewest features that provide the best approximation to the target variable.
Working Principle of LARS:
- Input Data: It starts with the original dataset, comprising features (independent variables) and their corresponding target variable values.
- Initialization: It begins with a null model, meaning no active features. All coefficients are set to zero.
- Feature Selection: At each step, LARS selects the feature most correlated with the model's current residuals and adds it to the set of active features.
- Regression Along Active Features: LARS then moves the coefficients of all active features together, along the direction that keeps their correlations with the residual equal (the "least angle" direction). It stops when another feature becomes as correlated with the residual as the active ones, and that feature joins the active set.
- Repetitive Steps: This process continues until all features are selected or a specified stopping criterion is met.
- Prediction: After model training, it can be used to predict target variable values for new data.
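The steps above can be sketched in NumPy. This is a minimal, unoptimized version of the plain "lar" variant with no Lasso modification and no early stopping. It assumes more samples than features and linearly independent columns. It is not scikit-learn's implementation, and the name `lars_sketch` is hypothetical. Features are centered and scaled to unit norm, so the returned coefficients are on that standardized scale.

```python
import numpy as np

def lars_sketch(X, y):
    """Plain LARS on centered, unit-norm features (illustrative sketch).

    Assumes n_samples > n_features and linearly independent columns.
    Returns the order in which features entered and the coefficients
    (standardized scale) after every step, starting from all zeros.
    """
    X = X - X.mean(axis=0)
    X = X / np.linalg.norm(X, axis=0)          # unit-norm columns
    y = y - y.mean()
    n, p = X.shape
    beta = np.zeros(p)                         # null model: all coefficients zero
    mu = np.zeros(n)                           # current fitted values
    active = [int(np.argmax(np.abs(X.T @ y)))] # most correlated feature first
    path = [beta.copy()]
    while True:
        c = X.T @ (y - mu)                     # correlations with the residual
        C = np.max(np.abs(c[active]))          # equal for all active features
        s = np.sign(c[active])
        XA = X[:, active] * s                  # sign-adjusted active columns
        G_inv_1 = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        A = 1.0 / np.sqrt(G_inv_1.sum())
        w = A * G_inv_1
        u = XA @ w                             # equiangular direction
        a = X.T @ u
        inactive = [j for j in range(p) if j not in active]
        if inactive:
            # shortest step after which an inactive feature ties with the active ones
            steps = []
            with np.errstate(divide="ignore", invalid="ignore"):
                for j in inactive:
                    for g in ((C - c[j]) / (A - a[j]), (C + c[j]) / (A + a[j])):
                        if g > 1e-12:
                            steps.append((g, j))
            gamma, j_new = min(steps)
        else:
            gamma = C / A                      # last step: reach the least-squares fit
        mu = mu + gamma * u
        beta[active] += gamma * s * w
        path.append(beta.copy())
        if not inactive:
            return active, np.array(path)
        active.append(j_new)
```

When every feature is active, the final step drives all correlations with the residual to zero. The last row of the path is therefore the ordinary least-squares solution on the standardized data, which is a convenient way to check the sketch.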
Advantages of LARS:
- Efficiency: LARS can be an efficient method, especially when there are many features, but only a few significantly affect the target variable.
- Interpretability: Since LARS aims to select only the most informative features, the model remains relatively interpretable.
Limitations of LARS:
- Linear Model: LARS builds a linear model, which might be insufficient for modeling complex nonlinear relationships.
- Noise Sensitivity: The method can be sensitive to outliers in the data.
- Multicollinearity: If features are highly correlated, the order in which they enter the model, and therefore the selected subset, can become unstable.
LARS is valuable in regression tasks where selecting the most informative features and constructing a linear model with a minimal number of features is essential.
2.1.6.1. Code for creating the Lars model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.Lars model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training Lars model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lars
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "Lars"
onnx_model_filename = data_path + "lars"
# create a Lars Regressor model
lars_regressor_model = Lars()
# fit the model to the data
lars_regressor_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = lars_regressor_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(lars_regressor_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(lars_regressor_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python Lars Original model (double)
Python R-squared (Coefficient of determination): 0.9962382642613388
Python Mean Absolute Error: 6.347737926336425
Python Mean Squared Error: 49.778140171281784
Python
Python Lars ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lars_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382641628886
Python Mean Absolute Error: 6.3477377671679385
Python Mean Squared Error: 49.77814147404787
Python R^2 matching decimal places: 9
Python MAE matching decimal places: 6
Python MSE matching decimal places: 5
Python float ONNX model precision: 6
Python
Python Lars ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lars_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382642613388
Python Mean Absolute Error: 6.347737926336425
Python Mean Squared Error: 49.778140171281784
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 15
Python MSE matching decimal places: 15
Python double ONNX model precision: 15
Fig.24. Results of the Lars.py (float ONNX)
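The compare_decimal_places helper in the script above compares digit strings. It therefore reports 0 matching places for 1.9999999 and 2.0000001, even though they differ by only 2e-7, and its result also depends on how str() formats each value. The sketch below is a numeric alternative that derives the count from the absolute difference. It is offered as an assumption and is not the metric used for the figures in this article. The name `decimals_agree` and the cap `max_places` are hypothetical.

```python
import math

def decimals_agree(a, b, max_places=16):
    """Largest d (capped at max_places) such that |a - b| <= 10**-d.

    Unlike a character-by-character comparison, this is not thrown off
    when the two values straddle a carry (e.g. 1.9999999 vs 2.0000001).
    """
    diff = abs(a - b)
    if diff == 0:
        return max_places                  # identical values: report the cap
    return max(0, min(max_places, math.floor(-math.log10(diff))))

# Python MAE vs MQL5 float MAE for the Lars model
print(decimals_agree(6.347737926336425, 6.3477377671679385))
```

For the float Lars model, both approaches give 6. The two metrics diverge mainly near carries and in the last few digits of double-precision results.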
2.1.6.2. MQL5 code for executing ONNX Models
This code executes the saved lars_float.onnx and lars_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| Lars.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"
#define   ModelName          "Lars"
#define   ONNXFilenameFloat  "lars_float.onnx"
#define   ONNXFilenameDouble "lars_double.onnx"
#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];
#define   TestFloatModel     1
#define   TestDoubleModel    2
//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;
   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
Lars (EURUSD,H1) Testing ONNX float: Lars (lars_float.onnx)
Lars (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382641628886
Lars (EURUSD,H1) MQL5: Mean Absolute Error: 6.3477377671679385
Lars (EURUSD,H1) MQL5: Mean Squared Error: 49.7781414740478638
Lars (EURUSD,H1)
Lars (EURUSD,H1) Testing ONNX double: Lars (lars_double.onnx)
Lars (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382642613388
Lars (EURUSD,H1) MQL5: Mean Absolute Error: 6.3477379263364302
Lars (EURUSD,H1) MQL5: Mean Squared Error: 49.7781401712817768
Comparison with the original double model in Python:
Testing ONNX float: Lars (lars_float.onnx)
Python Mean Absolute Error: 6.347737926336425
MQL5: Mean Absolute Error: 6.3477377671679385
Testing ONNX double: Lars (lars_double.onnx)
Python Mean Absolute Error: 6.347737926336425
MQL5: Mean Absolute Error: 6.3477379263364302
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 13 decimal places.
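To see roughly where the float model's precision ceiling comes from, the sketch below emulates a float-only linear model in NumPy. It rests on two assumptions. First, that Lars on this single-feature dataset ends at the ordinary least-squares line, computed here with np.polyfit. Second, that the float ONNX graph evaluates slope*x + intercept entirely in float32. Neither is taken from the converter's code, so the printed digits are not expected to reproduce the numbers above exactly.

```python
import numpy as np

# Same synthetic data as in the article: x = 0..99, y = 4x + 10 sin(x/2)
x = np.arange(0, 100, 1, dtype=np.float64)
y = 4 * x + 10 * np.sin(x * 0.5)

# Least-squares line (assumed to match Lars with a single feature)
slope, intercept = np.polyfit(x, y, 1)

# Double-precision evaluation
pred64 = slope * x + intercept
mae64 = np.mean(np.abs(y - pred64))

# Emulated float graph: parameters, inputs and arithmetic all in float32
pred32 = np.float32(slope) * x.astype(np.float32) + np.float32(intercept)
mae32 = np.mean(np.abs(y - pred32.astype(np.float64)))

print(f"MAE float64: {mae64:.16f}")
print(f"MAE float32: {mae32:.16f}")
print(f"difference:  {abs(mae64 - mae32):.3e}")
```

Predictions here reach about 400, where float32 values are roughly 3e-5 apart. Per-sample rounding of that size, averaged over the dataset, explains why the float MAE agrees with the double one only to about six decimal places.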
2.1.6.3. ONNX representation of the lars_float.onnx and lars_double.onnx
Fig.25. ONNX representation of the lars_float.onnx in Netron
Fig.26. ONNX representation of lars_double.onnx in Netron
2.1.7. sklearn.linear_model.LarsCV
LarsCV is a variation of the LARS (Least Angle Regression) method that automatically selects the optimal number of features to include in the model using cross-validation.
This method helps strike a balance between a model that generalizes data effectively and one that uses a minimal number of features.
Working Principle of LarsCV:
- Input Data: It begins with the original dataset, comprising features (independent variables) and their corresponding target variable values.
- Initialization: It starts with a null model, which means no active features. All coefficients are set to zero.
- Cross-Validation: LarsCV computes the LARS path on each training fold and measures the held-out error at different points along that path, that is, for models with different numbers of included features.
- Selecting the Optimal Number of Features: LarsCV chooses the number of features that yields the best model performance, as determined through cross-validation.
- Model Training: The final model is refit on the full dataset using the selected setting, which determines the active features and their coefficients.
- Prediction: After training, the model can be used to predict target variable values for new data.
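The cross-validation idea can be illustrated with NumPy alone. To keep the sketch short, it ranks features with greedy forward selection plus OLS refits rather than walking the actual LARS path. It also selects the number of features k directly, instead of LarsCV's regularization parameter alpha. It only shows the logic of trying several model sizes and keeping the one with the lowest held-out error. It is not LarsCV. The helper names are hypothetical, and the contiguous folds assume the rows are already in random order.

```python
import numpy as np

def forward_order(X, y):
    """Rank features by greedy forward selection (stand-in for the LARS path)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    chosen, resid = [], y - y.mean()
    for _ in range(p):
        rest = [j for j in range(p) if j not in chosen]
        # correlation of each remaining feature with the current residual
        corr = np.abs(Xc[:, rest].T @ resid) / np.linalg.norm(Xc[:, rest], axis=0)
        chosen.append(rest[int(np.argmax(corr))])
        A = np.column_stack([np.ones(n), X[:, chosen]])
        resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return chosen

def ols_predict(X_train, y_train, X_test):
    """Fit OLS with an intercept on the training rows and predict the test rows."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef = np.linalg.lstsq(A, y_train, rcond=None)[0]
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def cv_choose_k(X, y, n_folds=5):
    """Return the number of features with the lowest K-fold CV error, and all errors."""
    n, p = X.shape
    mse = np.zeros(p)
    for test in np.array_split(np.arange(n), n_folds):
        train = np.setdiff1d(np.arange(n), test)
        order = forward_order(X[train], y[train])  # ranked on the training fold only
        for k in range(1, p + 1):
            cols = order[:k]
            pred = ols_predict(X[train][:, cols], y[train], X[test][:, cols])
            mse[k - 1] += np.mean((y[test] - pred) ** 2) / n_folds
    return int(np.argmin(mse)) + 1, mse
```

Ranking the features inside each training fold matters. Ranking them once on the full dataset would leak information from the held-out rows into the selection step.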
Advantages of LarsCV:
- Automatic Feature Selection: LarsCV automatically chooses the optimal number of features, simplifying the model setup process.
- Interpretability: Similar to the regular LARS, LarsCV maintains relatively high model interpretability.
- Efficiency: The method can be efficient, especially when datasets have many features, but only a few are significant.
Limitations of LarsCV:
- Linear Model: LarsCV constructs a linear model, which might be insufficient for modeling complex nonlinear relationships.
- Noise Sensitivity: The method can be sensitive to outliers in the data.
- Multicollinearity: If features are highly correlated, the order in which they enter the model, and therefore the selected subset, can become unstable.
LarsCV is useful in regression tasks where automatically choosing the best set of features used in the model and maintaining model interpretability are important.
2.1.7.1. Code for creating the LarsCV model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.LarsCV model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training LarsCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LarsCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "LarsCV"
onnx_model_filename = data_path + "lars_cv"
# create a LarsCV Regressor model
larscv_regressor_model = LarsCV()
# fit the model to the data
larscv_regressor_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = larscv_regressor_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(larscv_regressor_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# define the input data type as FloatTensorType
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(larscv_regressor_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to double precision (float64)
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python LarsCV Original model (double)
Python R-squared (Coefficient of determination): 0.9962382642612767
Python Mean Absolute Error: 6.3477379221400145
Python Mean Squared Error: 49.77814017210321
Python
Python LarsCV ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lars_cv_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382640824089
Python Mean Absolute Error: 6.347737845846069
Python Mean Squared Error: 49.778142539016564
Python R^2 matching decimal places: 9
Python MAE matching decimal places: 6
Python ONNX: MSE matching decimal places: 5
Python float ONNX model precision: 6
Python
Python LarsCV ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lars_cv_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382642612767
Python Mean Absolute Error: 6.3477379221400145
Python Mean Squared Error: 49.77814017210321
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 16
Python MSE matching decimal places: 14
Python double ONNX model precision: 16
Fig.27. Results of the LarsCV.py (float ONNX)
2.1.7.2. MQL5 code for executing ONNX Models
This code executes the saved lars_cv_float.onnx and lars_cv_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| LarsCV.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link "https://www.mql5.com"
#property version "1.00"

#define ModelName "LarsCV"
#define ONNXFilenameFloat "lars_cv_float.onnx"
#define ONNXFilenameDouble "lars_cv_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel 1
#define TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
LarsCV (EURUSD,H1) Testing ONNX float: LarsCV (lars_cv_float.onnx)
LarsCV (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382640824089
LarsCV (EURUSD,H1) MQL5: Mean Absolute Error: 6.3477378458460691
LarsCV (EURUSD,H1) MQL5: Mean Squared Error: 49.7781425390165566
LarsCV (EURUSD,H1)
LarsCV (EURUSD,H1) Testing ONNX double: LarsCV (lars_cv_double.onnx)
LarsCV (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382642612767
LarsCV (EURUSD,H1) MQL5: Mean Absolute Error: 6.3477379221400145
LarsCV (EURUSD,H1) MQL5: Mean Squared Error: 49.7781401721031642
Comparison with the original double precision model in Python:
Testing ONNX float: LarsCV (lars_cv_float.onnx)
Python Mean Absolute Error: 6.3477379221400145
MQL5: Mean Absolute Error: 6.3477378458460691
Testing ONNX double: LarsCV (lars_cv_double.onnx)
Python Mean Absolute Error: 6.3477379221400145
MQL5: Mean Absolute Error: 6.3477379221400145
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 16 decimal places.
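The six matching decimal places of the float model are consistent with what single precision can hold. An IEEE 754 float32 value has a 24-bit significand, so for numbers between 4 and 8, such as the MAE of about 6.35 here, adjacent float32 values are 2^-21 ≈ 4.8·10^-7 apart, which leaves room for roughly six correct decimal places. The pure-Python sketch below illustrates this by rounding the double-precision MAE of the original model to float32 with the standard `struct` module and counting matching decimals with the same string-based logic as `compare_decimal_places` in the scripts. The helper names `to_float32` and `float32_spacing` are hypothetical, and the sketch rounds only the final value once; in the float ONNX model the inputs, coefficients and intermediate results are all rounded, so the actual ONNX MAE differs from this rounded number.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest float32 and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between adjacent float32 values near x (assumes a normal, non-zero x)."""
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent with 0.5 <= m < 1
    return 2.0 ** (exponent - 24)     # float32 keeps a 24-bit significand


def compare_decimal_places(value1, value2):
    """Count leading matching digits after the decimal point (same idea as the scripts)."""
    s1, s2 = str(value1), str(value2)
    d1, d2 = s1.find("."), s2.find(".")
    if d1 == -1 or d2 == -1:
        return 0
    count = 0
    for c1, c2 in zip(s1[d1 + 1:], s2[d2 + 1:]):
        if c1 != c2:
            break
        count += 1
    return count


mae_double = 6.3477379221400145          # MAE of the original LarsCV model (output above)
mae_rounded = to_float32(mae_double)     # the same number stored in single precision

print("float32 value:    ", mae_rounded)
print("float32 spacing:  ", float32_spacing(mae_double))
print("absolute error:   ", abs(mae_rounded - mae_double))
print("matching decimals:", compare_decimal_places(mae_double, mae_rounded))
```

The rounding error of a single conversion is at most half of the printed spacing, so even a perfect float pipeline cannot reproduce the double-precision metric beyond that level.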
2.1.7.3. ONNX representation of the lars_cv_float.onnx and lars_cv_double.onnx
Fig.28. ONNX representation of the lars_cv_float.onnx in Netron
Fig.29. ONNX representation of the lars_cv_double.onnx in Netron
2.1.8. sklearn.linear_model.Lasso
Lasso (Least Absolute Shrinkage and Selection Operator) is a regression method used to select the most important features and reduce model dimensionality.
It achieves this by adding a penalty for the sum of the absolute values of the coefficients (L1 regularization) in the linear regression optimization problem.
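For reference, the scikit-learn documentation gives the objective minimized by `Lasso` in the following form, where $n$ is the number of samples, $w$ the coefficient vector and $\alpha$ the `alpha` parameter (1.0 by default); the intercept is fitted separately and is not penalized:

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_1,
\qquad \lVert w \rVert_1 = \sum_{j} \lvert w_j \rvert
```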
Working Principle of Lasso:
- Input Data: It begins with the original dataset, including features (independent variables) and their corresponding target variable values.
- Objective Function: The objective function in Lasso includes the sum of squared regression errors and a penalty on the sum of the absolute values of coefficients associated with features.
- Optimization: The Lasso model is trained by minimizing the objective function, resulting in some coefficients becoming zero, effectively excluding the corresponding features from the model.
- Selecting the Optimal Penalty Value: Lasso has a hyperparameter (alpha in scikit-learn) that controls the strength of regularization. Choosing a good value for it may require cross-validation.
- Generating Predictions: After training, the model can be used to predict target variable values for new data.
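With a single feature, as in the synthetic data used throughout this article, the Lasso objective (using scikit-learn's scaling of 1/(2n) for the squared-error term) can be minimized in closed form. After centering x and y, which removes the unpenalized intercept, the optimal coefficient is the least-squares slope pulled toward zero by soft-thresholding. The pure-Python sketch below is an illustration of that math, not scikit-learn's implementation, and the function names are hypothetical. It shows how a large enough alpha makes the coefficient exactly zero, which is the feature-exclusion effect described in the steps above.

```python
import math


def soft_threshold(z, alpha):
    """Shrink z toward zero by alpha; the result is exactly 0.0 when |z| <= alpha."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0


def lasso_1d(x, y, alpha):
    """Minimize (1/(2n)) * sum((y_i - w*x_i - b)**2) + alpha*|w| for one feature.

    The intercept b is not penalized, so it is removed by centering x and y.
    Assumes x is not constant. Returns the pair (w, b).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    rho = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / n  # x^T y / n (centered)
    z = sum((xi - x_mean) ** 2 for xi in x) / n                          # x^T x / n (centered)
    w = soft_threshold(rho, alpha) / z  # alpha = 0 gives the ordinary least-squares slope
    b = y_mean - w * x_mean
    return w, b


# synthetic data of the same form as in this article: y = 4x + 10*sin(0.5x)
xs = [float(i) for i in range(100)]
ys = [4 * xi + 10 * math.sin(0.5 * xi) for xi in xs]

for alpha in (0.0, 1.0, 100.0, 10000.0):
    w, b = lasso_1d(xs, ys, alpha)
    print(f"alpha={alpha:>8}: w={w:.6f}, b={b:.6f}")
```

As alpha grows the slope shrinks, and once alpha exceeds |rho| the coefficient is exactly zero and the model predicts the mean of y.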
Advantages of Lasso:
- Feature Selection: Lasso automatically selects the most important features, excluding less significant ones from the model. This reduces data dimensionality and simplifies the model.
- Regularization: The penalty on the sum of the absolute values of coefficients helps prevent model overfitting and enhances its generalization.
- Interpretability: As Lasso excludes some features, the model remains relatively interpretable.
Limitations of Lasso:
- Linear Model: Lasso constructs a linear model, which might be insufficient for modeling complex nonlinear relationships.
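- Noise Sensitivity: Because it minimizes squared errors, the method can be sensitive to outliers in the data.
- Instability under Multicollinearity: If features are highly correlated, Lasso tends to keep one of them more or less arbitrarily and shrink the others to zero, so the set of selected features can be unstable.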
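The multicollinearity limitation can be made concrete with a deliberately extreme case: two identical feature columns. Because |w1| + |w2| = |w1 + w2| whenever both weights have the same sign, every such split of the total weight gives the same Lasso objective, so the solution is not unique and the solver decides which column "wins". The pure-Python sketch below is a minimal cyclic coordinate-descent solver (scikit-learn documents coordinate descent as the algorithm behind `Lasso`, but this is not its code); the data and function names are hypothetical. Starting from zero, it puts all of the weight on the first column and essentially none (zero up to rounding) on the duplicate.

```python
import math


def soft_threshold(z, alpha):
    """Shrink z toward zero by alpha (exactly 0.0 when |z| <= alpha)."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0


def lasso_cd(columns, y, alpha, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))*||y - Xw - b||^2 + alpha*||w||_1.

    columns: list of feature columns (lists of floats, none of them constant).
    The intercept b is not penalized. Returns (w, b).
    """
    n = len(y)
    means = [sum(col) / n for col in columns]
    xc = [[v - m for v in col] for col, m in zip(columns, means)]  # centered features
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]                            # centered y minus Xw, with w = 0
    w = [0.0] * len(columns)
    sq_norms = [sum(v * v for v in col) / n for col in xc]
    for _ in range(n_iter):
        for j, col in enumerate(xc):
            # correlation of feature j with the residual, with feature j's own contribution added back
            rho = sum(c * r for c, r in zip(col, residual)) / n + sq_norms[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / sq_norms[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_wj
    b = y_mean - sum(wj * m for wj, m in zip(w, means))
    return w, b


x = [float(i) for i in range(100)]
y = [4 * xi + 10 * math.sin(0.5 * xi) for xi in x]

w, b = lasso_cd([x, list(x)], y, alpha=1.0)  # the second column duplicates the first
print("weights:", w, "intercept:", b)
```

With a different feature order the roles would swap, which is exactly why the selected feature set is unreliable when features are strongly correlated.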
Lasso is useful in regression tasks where selecting the most important features and reducing the model's dimensionality while maintaining interpretability are essential.
2.1.8.1. Code for creating the Lasso model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.Lasso model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training Lasso model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "Lasso"
onnx_model_filename = data_path + "lasso"
# create a Lasso model
lasso_model = Lasso()
# fit the model to the data
lasso_model.fit(X, y)
# predict values for the entire dataset
y_pred = lasso_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(lasso_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to single precision (float32)
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(lasso_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to double precision (float64)
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python Lasso Original model (double)
Python R-squared (Coefficient of determination): 0.9962381735682287
Python Mean Absolute Error: 6.346393791922984
Python Mean Squared Error: 49.77934029129379
Python
Python Lasso ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962381720269486
Python Mean Absolute Error: 6.346395056911361
Python Mean Squared Error: 49.77936068668213
Python R^2 matching decimal places: 8
Python MAE matching decimal places: 5
Python MSE matching decimal places: 4
Python float ONNX model precision: 5
Python
Python Lasso ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962381735682287
Python Mean Absolute Error: 6.346393791922984
Python Mean Squared Error: 49.77934029129379
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 15
Python MSE matching decimal places: 14
Python double ONNX model precision: 15
Fig.30. Results of the Lasso.py (float ONNX)
2.1.8.2. MQL5 code for executing ONNX Models
This code executes the saved lasso_float.onnx and lasso_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| Lasso.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link "https://www.mql5.com"
#property version "1.00"

#define ModelName "Lasso"
#define ONNXFilenameFloat "lasso_float.onnx"
#define ONNXFilenameDouble "lasso_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel 1
#define TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
Lasso (EURUSD,H1) Testing ONNX float: Lasso (lasso_float.onnx)
Lasso (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962381720269486
Lasso (EURUSD,H1) MQL5: Mean Absolute Error: 6.3463950569113612
Lasso (EURUSD,H1) MQL5: Mean Squared Error: 49.7793606866821037
Lasso (EURUSD,H1)
Lasso (EURUSD,H1) Testing ONNX double: Lasso (lasso_double.onnx)
Lasso (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962381735682287
Lasso (EURUSD,H1) MQL5: Mean Absolute Error: 6.3463937919229840
Lasso (EURUSD,H1) MQL5: Mean Squared Error: 49.7793402912937850
Comparison with the original double model in Python:
Testing ONNX float: Lasso (lasso_float.onnx)
Python Mean Absolute Error: 6.346393791922984
MQL5: Mean Absolute Error: 6.3463950569113612
Testing ONNX double: Lasso (lasso_double.onnx)
Python Mean Absolute Error: 6.346393791922984
MQL5: Mean Absolute Error: 6.3463937919229840
Accuracy of ONNX float MAE: 5 decimal places, Accuracy of ONNX double MAE: 15 decimal places.
2.1.8.3. ONNX representation of the lasso_float.onnx and lasso_double.onnx
Fig.31. ONNX representation of the lasso_float.onnx in Netron
Fig.32. ONNX representation of the lasso_double.onnx in Netron
2.1.9. sklearn.linear_model.LassoCV
LassoCV is a variant of the Lasso method (Least Absolute Shrinkage and Selection Operator) that automatically selects the optimal value for the regularization hyperparameter (alpha) using cross-validation.
This method enables finding a balance between reducing the model's dimensionality (selecting important features) and preventing overfitting, making it useful for regression tasks.
Working Principle of LassoCV:
- Input Data: It starts with the original dataset, including features (independent variables) and their corresponding target variable values.
- Initialization: LassoCV builds a grid of candidate values for the regularization hyperparameter (alpha), ranging from strong to weak regularization (by default 100 values on a logarithmic scale).
- Cross-Validation: For each alpha value, LassoCV estimates the model's performance with K-fold cross-validation, using the mean squared error (MSE) on the validation folds.
- Selecting the Optimal Alpha: LassoCV selects the alpha value where the model achieves the best performance as determined by cross-validation.
- Model Training: The final Lasso model is refit on the entire dataset with the chosen alpha; the L1 penalty drives the coefficients of less important features to zero.
- Generating Predictions: After training, the model can be used to predict target variable values for new data.
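The selection loop in the steps above can be sketched in pure Python for the single-feature case used in this article. This is a simplified illustration, not LassoCV's code, and all names are hypothetical: it builds a log-spaced alpha grid from alpha_max (the smallest alpha at which the coefficient is exactly zero) down to 1e-3 × alpha_max, scores each alpha by the validation MSE averaged over K contiguous folds, and refits on all data with the winning alpha. scikit-learn's LassoCV follows the same overall scheme, but its grid details, fold handling and coordinate-descent solver differ; here each fit uses the closed-form single-feature solution.

```python
import math


def soft_threshold(z, alpha):
    """Shrink z toward zero by alpha (exactly 0.0 when |z| <= alpha)."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0


def centered_stats(x, y):
    """Return (x_mean, y_mean, rho, z) with rho = x^T y / n and z = x^T x / n on centered data."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    rho = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / n
    z = sum((xi - x_mean) ** 2 for xi in x) / n
    return x_mean, y_mean, rho, z


def lasso_1d(x, y, alpha):
    """Closed-form single-feature Lasso with an unpenalized intercept; returns (w, b)."""
    x_mean, y_mean, rho, z = centered_stats(x, y)
    w = soft_threshold(rho, alpha) / z
    return w, y_mean - w * x_mean


def alpha_grid(x, y, n_alphas=100, eps=1e-3):
    """Log-spaced alphas from alpha_max (smallest alpha giving w = 0) down to eps * alpha_max."""
    alpha_max = abs(centered_stats(x, y)[2])
    step = math.log(eps) / (n_alphas - 1)
    return [alpha_max * math.exp(step * k) for k in range(n_alphas)]


def lasso_cv_1d(x, y, n_folds=5):
    """Choose alpha by K-fold CV (contiguous folds, mean validation MSE), then refit on all data."""
    n = len(x)
    bounds = [round(k * n / n_folds) for k in range(n_folds + 1)]
    best_alpha, best_mse = None, math.inf
    for alpha in alpha_grid(x, y):
        fold_mse = []
        for k in range(n_folds):
            lo, hi = bounds[k], bounds[k + 1]
            w, b = lasso_1d(x[:lo] + x[hi:], y[:lo] + y[hi:], alpha)  # train on the other folds
            errors = [(yi - (w * xi + b)) ** 2 for xi, yi in zip(x[lo:hi], y[lo:hi])]
            fold_mse.append(sum(errors) / len(errors))                # validate on fold k
        mean_mse = sum(fold_mse) / n_folds
        if mean_mse < best_mse:
            best_alpha, best_mse = alpha, mean_mse
    w, b = lasso_1d(x, y, best_alpha)  # final refit on the full dataset
    return best_alpha, w, b


xs = [float(i) for i in range(100)]
ys = [4 * xi + 10 * math.sin(0.5 * xi) for xi in xs]
alpha, w, b = lasso_cv_1d(xs, ys)
print(f"chosen alpha={alpha:.6g}, w={w:.6f}, b={b:.6f}")
```

Refitting on the full dataset after choosing alpha mirrors the last two steps above: cross-validation is used only to pick the hyperparameter, not to produce the final coefficients.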
Advantages of LassoCV:
- Automatic Alpha Selection: LassoCV automatically selects the optimal alpha value using cross-validation, simplifying model tuning.
- Feature Selection: LassoCV automatically chooses the most important features, reducing the model's dimensionality and simplifying its interpretation.
- Regularization: The method prevents model overfitting through L1 regularization.
Limitations of LassoCV:
- Linear Model: LassoCV builds a linear model, which might be insufficient for modeling complex nonlinear relationships.
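- Noise Sensitivity: Because it minimizes squared errors, the method can be sensitive to outliers in the data.
- Instability under Multicollinearity: When features are highly correlated, LassoCV (like Lasso) tends to keep one of them more or less arbitrarily and shrink the others to zero, so the set of selected features can be unstable.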
LassoCV is beneficial in regression tasks where selecting the most important features and reducing the model's dimensionality while maintaining interpretability and preventing overfitting are important.
2.1.9.1. Code for creating the LassoCV model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.LassoCV model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training LassoCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "LassoCV"
onnx_model_filename = data_path + "lasso_cv"
# create a LassoCV Regressor model
lassocv_regressor_model = LassoCV()
# fit the model to the data
lassocv_regressor_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = lassocv_regressor_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(lassocv_regressor_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's float input
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate the ONNX model's metrics and compare them with the original model
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(lassocv_regressor_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's double input
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate the ONNX model's metrics and compare them with the original model
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    LassoCV Original model (double)
Python    R-squared (Coefficient of determination): 0.9962241428413416
Python    Mean Absolute Error: 6.33567334453819
Python    Mean Squared Error: 49.96500551028169
Python
Python    LassoCV ONNX model (float)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_cv_float.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.996224142876629
Python    Mean Absolute Error: 6.335673221332177
Python    Mean Squared Error: 49.96500504333324
Python    R^2 matching decimal places: 10
Python    MAE matching decimal places: 6
Python    MSE matching decimal places: 6
Python    float ONNX model precision: 6
Python
Python    LassoCV ONNX model (double)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_cv_double.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9962241428413416
Python    Mean Absolute Error: 6.33567334453819
Python    Mean Squared Error: 49.96500551028169
Python    R^2 matching decimal places: 16
Python    MAE matching decimal places: 14
Python    MSE matching decimal places: 14
Python    double ONNX model precision: 14
Fig.33. Results of the LassoCV.py (float ONNX)
2.1.9.2. MQL5 code for executing ONNX Models
This code executes the saved lasso_cv_float.onnx and lasso_cv_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                                      LassoCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LassoCV"
#define   ONNXFilenameFloat  "lasso_cv_float.onnx"
#define   ONNXFilenameDouble "lasso_cv_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
2023.10.26 22:14:00.736    LassoCV (EURUSD,H1)    Testing ONNX float: LassoCV (lasso_cv_float.onnx)
2023.10.26 22:14:00.739    LassoCV (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 0.9962241428766290
2023.10.26 22:14:00.739    LassoCV (EURUSD,H1)    MQL5: Mean Absolute Error: 6.3356732213321800
2023.10.26 22:14:00.739    LassoCV (EURUSD,H1)    MQL5: Mean Squared Error: 49.9650050433332211
2023.10.26 22:14:00.748    LassoCV (EURUSD,H1)
2023.10.26 22:14:00.748    LassoCV (EURUSD,H1)    Testing ONNX double: LassoCV (lasso_cv_double.onnx)
2023.10.26 22:14:00.753    LassoCV (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 0.9962241428413416
2023.10.26 22:14:00.753    LassoCV (EURUSD,H1)    MQL5: Mean Absolute Error: 6.3356733445381899
2023.10.26 22:14:00.753    LassoCV (EURUSD,H1)    MQL5: Mean Squared Error: 49.9650055102816992
Comparison with the original double model in Python:
Testing ONNX float: LassoCV (lasso_cv_float.onnx)
Python Mean Absolute Error: 6.33567334453819
MQL5:  Mean Absolute Error: 6.3356732213321800

Testing ONNX double: LassoCV (lasso_cv_double.onnx)
Python Mean Absolute Error: 6.33567334453819
MQL5:  Mean Absolute Error: 6.3356733445381899
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 13 decimal places.
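The gap between 6 and 13 matching decimal places is consistent with float32 resolution. A float32 value carries only about 7 significant decimal digits, so for numbers around 6 its rounding error is at most about 2.4e-7. Below is a minimal stdlib sketch under that assumption. `to_float32` rounds a Python float (an IEEE 754 double) to float32 through the `struct` module. `matching_decimals` is a hypothetical helper that counts agreeing decimal places from the absolute difference, rather than by comparing strings as `compare_decimal_places` does.

```python
import math
import struct


def to_float32(value):
    """Round a Python float (IEEE 754 double) to the nearest float32 and widen it back."""
    return struct.unpack("f", struct.pack("f", value))[0]


def matching_decimals(a, b, max_digits=17):
    """Decimal places to which a and b agree, judged by their absolute difference."""
    diff = abs(a - b)
    if diff == 0.0:
        return max_digits
    return max(0, min(max_digits, math.floor(-math.log10(diff))))


mae_double = 6.33567334453819  # MAE of the original double model (Python output above)
mae_float32 = to_float32(mae_double)
print(mae_float32, abs(mae_float32 - mae_double), matching_decimals(mae_double, mae_float32))
```

The float ONNX MAE (6.33567322...) is not simply the float32 rounding of the double MAE. It is computed in double precision from float32 predictions, so this sketch only shows the order of magnitude of the error that float32 introduces.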
2.1.9.3. ONNX representation of the lasso_cv_float.onnx and lasso_cv_double.onnx
Fig.34. ONNX representation of the lasso_cv_float.onnx in Netron
Fig.35. ONNX representation of the lasso_cv_double.onnx in Netron
2.1.10. sklearn.linear_model.LassoLars
LassoLars is a Lasso (Least Absolute Shrinkage and Selection Operator) model fitted with the LARS (Least Angle Regression) algorithm.
It is used for regression tasks and combines the strengths of both. The Lasso L1 penalty performs feature selection, and LARS computes the solution efficiently by adding features to the model one step at a time.
Working Principle of LassoLars:
- Input Data: It starts with the original dataset, including features (independent variables) and their corresponding target variable values.
- Initialization: LassoLars begins with a null model, meaning no active features. All coefficients are set to zero.
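- Stepwise Feature Selection: Like LARS, LassoLars selects at each step the feature most correlated with the model residuals and adds it to the active set. Then the coefficients of all active features are moved together in the least squares direction for the active set, which is equiangular to these features. The move continues until another feature becomes as correlated with the residuals as the active ones.
- Application of L1 Regularization: LassoLars solves the Lasso problem, least squares with a penalty on the sum of the absolute values of the coefficients. In the Lasso modification of LARS, a feature whose coefficient crosses zero is dropped from the active set, and the path stops at the requested alpha. As a result, the coefficients are shrunk and some are set exactly to zero, leaving only the most important features.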
- Making Predictions: After training, the model can be used to predict target variable values for new data.
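LARS computes the piecewise-linear Lasso path exactly, adding or dropping one feature at each breakpoint. As a simpler, hedged illustration of the same behaviour, the sketch below does not implement the LARS update. Instead, it approximates the path by solving the Lasso with warm-started coordinate descent on a decreasing list of alpha values, and it records the order in which features become active. The design (three orthogonal ±1 columns) and the names `lasso_path_cd` and `entry_order` are hypothetical. With orthogonal features, each coefficient is its correlation with y shrunk by alpha, so features enter in the order of their strengths 3, 2 and 0.5.

```python
import math


def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha; values inside [-alpha, alpha] become 0."""
    return math.copysign(max(abs(rho) - alpha, 0.0), rho)


def lasso_path_cd(columns, y, alphas, n_iter=50):
    """Lasso solutions for a decreasing list of alphas, warm-starting each fit."""
    n = len(y)
    means = [sum(c) / n for c in columns]
    xc = [[v - m for v in c] for c, m in zip(columns, means)]  # centered features
    y_mean = sum(y) / n
    sq = [sum(v * v for v in c) / n for c in xc]
    w = [0.0] * len(columns)
    r = [v - y_mean for v in y]  # residual for w = 0
    path = []
    for alpha in alphas:  # warm start: w and r carry over from the previous alpha
        for _ in range(n_iter):
            for j, col in enumerate(xc):
                rho = sum(a * ri for a, ri in zip(col, r)) / n + w[j] * sq[j]
                new_w = soft_threshold(rho, alpha) / sq[j]
                delta = new_w - w[j]
                if delta != 0.0:
                    r = [ri - delta * a for ri, a in zip(r, col)]
                w[j] = new_w
        path.append(list(w))
    return path


def entry_order(path):
    """Indices of features in the order their coefficients first become nonzero."""
    order = []
    for coefs in path:
        for j, c in enumerate(coefs):
            if c != 0.0 and j not in order:
                order.append(j)
    return order


h1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
h2 = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
h3 = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
y = [3 * a + 2 * b + 0.5 * c for a, b, c in zip(h1, h2, h3)]
alphas = [3.5, 2.5, 1.5, 1.0, 0.4, 0.1]
path = lasso_path_cd([h1, h2, h3], y, alphas)
for alpha, coefs in zip(alphas, path):
    print(alpha, [round(c, 6) for c in coefs])
print("entry order:", entry_order(path))
```

For the single-feature data used in this article, the path is trivial. The only coefficient grows linearly from zero as alpha decreases below its entry value.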
Advantages of LassoLars:
- Feature Selection: LassoLars automatically selects the most important features and reduces the model's dimensionality, which helps avoid overfitting and simplifies interpretation.
- Interpretability: The method maintains the model's interpretability, making it easy to determine which features are included and how they influence the target variable.
- Regularization: LassoLars applies L1 regularization, preventing overfitting and enhancing the model's generalization.
Limitations of LassoLars:
- Linear Model: LassoLars builds a linear model, which might be insufficient for modeling complex nonlinear relationships.
- Sensitivity to Noise: The method might be sensitive to outliers in the data.
- Computational Complexity: Feature selection at each step and applying regularization might require more computational resources than simple linear regression.
LassoLars is useful in regression tasks where selecting the most relevant features, reducing the model's dimensionality, and keeping the model interpretable are priorities.
2.1.10.1. Code for creating the LassoLars model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.LassoLars model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training LassoLars model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoLars
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "LassoLars"
onnx_model_filename = data_path + "lasso_lars"
# create a LassoLars Regressor model
lassolars_regressor_model = LassoLars(alpha=0.1)
# fit the model to the data
lassolars_regressor_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = lassolars_regressor_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(lassolars_regressor_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's float input
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate the ONNX model's metrics and compare them with the original model
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(lassolars_regressor_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's double input
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate the ONNX model's metrics and compare them with the original model
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    LassoLars Original model (double)
Python    R-squared (Coefficient of determination): 0.9962382633544077
Python    Mean Absolute Error: 6.3476035128950805
Python    Mean Squared Error: 49.778152172481896
Python
Python    LassoLars ONNX model (float)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_float.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9962382635045889
Python    Mean Absolute Error: 6.3476034814795375
Python    Mean Squared Error: 49.77815018516975
Python    R^2 matching decimal places: 9
Python    MAE matching decimal places: 6
Python    MSE matching decimal places: 5
Python    float ONNX model precision: 6
Python
Python    LassoLars ONNX model (double)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_double.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9962382633544077
Python    Mean Absolute Error: 6.3476035128950805
Python    Mean Squared Error: 49.778152172481896
Python    R^2 matching decimal places: 16
Python    MAE matching decimal places: 16
Python    MSE matching decimal places: 15
Python    double ONNX model precision: 16
Fig.36. Result of the LassoLars.py (float)
2.1.10.2. MQL5 code for executing ONNX Models
This code executes the saved lasso_lars_float.onnx and lasso_lars_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                                    LassoLars.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LassoLars"
#define   ONNXFilenameFloat  "lasso_lars_float.onnx"
#define   ONNXFilenameDouble "lasso_lars_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
LassoLars (EURUSD,H1) Testing ONNX float: LassoLars (lasso_lars_float.onnx)
LassoLars (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382635045889
LassoLars (EURUSD,H1) MQL5: Mean Absolute Error: 6.3476034814795375
LassoLars (EURUSD,H1) MQL5: Mean Squared Error: 49.7781501851697357
LassoLars (EURUSD,H1)
LassoLars (EURUSD,H1) Testing ONNX double: LassoLars (lasso_lars_double.onnx)
LassoLars (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382633544077
LassoLars (EURUSD,H1) MQL5: Mean Absolute Error: 6.3476035128950858
LassoLars (EURUSD,H1) MQL5: Mean Squared Error: 49.7781521724819029
Comparison with the original double model in Python:
Testing ONNX float: LassoLars (lasso_lars_float.onnx)
Python Mean Absolute Error: 6.3476035128950805
MQL5: Mean Absolute Error: 6.3476034814795375
Testing ONNX double: LassoLars (lasso_lars_double.onnx)
Python Mean Absolute Error: 6.3476035128950805
MQL5: Mean Absolute Error: 6.3476035128950858
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
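The six-decimal agreement of the float model is about what single precision allows. The sketch below uses a hypothetical linear model: the coefficients 3.9 and 1.7 are chosen only for illustration and are not taken from the fitted LassoLars model. It evaluates the same prediction once in float64 and once with parameters and inputs rounded to float32, roughly as a tensor(float) model does. It then reports the largest relative error, which stays on the order of the float32 machine epsilon (about 1.19e-7). Results from a float model can therefore be expected to match the double-precision ones to only about 7 significant digits. For values around 6.35, that means roughly 6 decimal places.

```python
import numpy as np

# Hypothetical linear model y = coef * x + intercept (illustrative values only)
coef, intercept = 3.9, 1.7
x = np.arange(100, dtype=np.float64)

# Reference prediction in double precision
y64 = coef * x + intercept

# Same prediction with parameters and inputs rounded to float32
y32 = np.float32(coef) * x.astype(np.float32) + np.float32(intercept)

# Relative deviation of the float32 result from the float64 result
rel_err = np.abs(y32.astype(np.float64) - y64) / np.abs(y64)
print("float32 machine epsilon:", np.finfo(np.float32).eps)
print("max relative error:", rel_err.max())
```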
2.1.10.3. ONNX representation of the lasso_lars_float.onnx and lasso_lars_double.onnx
Fig.37. ONNX representation of the lasso_lars_float.onnx in Netron
Fig.38. ONNX representation of the lasso_lars_double.onnx in Netron
2.1.11. sklearn.linear_model.LassoLarsCV
LassoLarsCV is a method that combines Lasso (Least Absolute Shrinkage and Selection Operator) and LARS (Least Angle Regression) with automatic selection of the optimal regularization hyperparameter (alpha) using cross-validation.
This method combines the advantages of both algorithms and allows determining the optimal alpha value for the model, considering feature selection and regularization.
Working Principle of LassoLarsCV:
- Input Data: It starts with the original dataset, including features (independent variables) and their corresponding target variable values.
- Initialization: LassoLarsCV begins with a null model, where all coefficients are set to zero.
- Definition of Alpha Range: The candidate alpha values do not come from a fixed grid. They are taken from the LARS regularization path: in scikit-learn they are the breakpoints of the paths computed on the training part of each fold, i.e. the values at which the set of active features changes.
- Cross-Validation: For each candidate alpha, LassoLarsCV evaluates the model on the held-out part of every fold. scikit-learn measures this performance with the mean squared error (MSE).
- Selection of Optimal Alpha: LassoLarsCV chooses the alpha with the lowest MSE averaged over the folds.
- Model Training: The final LassoLars model is refitted on the full dataset with the selected alpha. The L1 penalty sets the coefficients of less important features to zero.
- Making Predictions: After training, the model can be used to predict target variable values for new data.
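The cross-validated selection of alpha can be observed on a fitted scikit-learn model. The sketch below reuses the synthetic data from the listings and reads three fitted attributes:

- `cv_alphas_`: the candidate alphas.
- `mse_path_`: the held-out MSE for every candidate alpha and fold.
- `alpha_`: the selected value.

It checks that `alpha_` is the candidate with the lowest fold-averaged MSE. This reflects scikit-learn's implementation as we understand it and is a sketch rather than a specification.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV

# Same synthetic data as in the listings
X = np.arange(0, 100, 1).reshape(-1, 1)
y = 4 * X + 10 * np.sin(X * 0.5)

model = LassoLarsCV(cv=5).fit(X, y.ravel())

# mse_path_: one row per candidate alpha, one column per fold
mean_mse = model.mse_path_.mean(axis=1)
best_index = np.argmin(mean_mse)

print("candidate alphas:", model.cv_alphas_)
print("mean CV MSE per alpha:", mean_mse)
print("selected alpha_:", model.alpha_)
```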
Advantages of LassoLarsCV:
- Automatic Alpha Selection: LassoLarsCV automatically selects the optimal hyperparameter alpha using cross-validation, simplifying model tuning.
- Feature Selection: LassoLarsCV automatically chooses the most important features and reduces the model's dimensionality.
- Regularization: The method applies L1 regularization, which helps prevent overfitting and can improve the model's generalization.
Limitations of LassoLarsCV:
- Linear Model: LassoLarsCV builds a linear model, which might be insufficient for modeling complex nonlinear relationships.
- Sensitivity to Noise: The method might be sensitive to outliers in the data.
- Computational Complexity: The LARS path has to be computed once for every cross-validation fold and once more on the full dataset. This requires more computational resources than fitting a simple linear regression.
LassoLarsCV is useful in regression tasks where it's essential to choose the most important features, reduce the model's dimensionality, prevent overfitting, and automatically tune the model's hyperparameters.
2.1.11.1. Code for creating the LassoLarsCV model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.LassoLarsCV model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training a LassoLarsCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "LassoLarsCV"
onnx_model_filename = data_path + "lasso_lars_cv"
# create a LassoLarsCV Regressor model
lassolars_cv_regressor_model = LassoLarsCV(cv=5)
# fit the model to the data
lassolars_cv_regressor_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = lassolars_cv_regressor_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(lassolars_cv_regressor_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's tensor(float) input
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the metrics for the ONNX model
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(lassolars_cv_regressor_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's tensor(double) input
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the metrics for the ONNX model
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python LassoLarsCV Original model (double)
Python R-squared (Coefficient of determination): 0.9962382642612767
Python Mean Absolute Error: 6.3477379221400145
Python Mean Squared Error: 49.77814017210321
Python
Python LassoLarsCV ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_cv_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382640824089
Python Mean Absolute Error: 6.347737845846069
Python Mean Squared Error: 49.778142539016564
Python R^2 matching decimal places: 9
Python MAE matching decimal places: 6
Python MSE matching decimal places: 5
Python float ONNX model precision: 6
Python
Python LassoLarsCV ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_cv_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382642612767
Python Mean Absolute Error: 6.3477379221400145
Python Mean Squared Error: 49.77814017210321
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 16
Python MSE matching decimal places: 14
Python double ONNX model precision: 16
Fig.39. Results of the LassoLarsCV.py (float ONNX)
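The compare_decimal_places() function used in the listings compares the decimal digits of the two numbers as strings and stops at the first mismatch. This is simple, but it reacts to digit boundaries rather than to the size of the difference. For 0.30000001 and 0.29999999 it reports 0 matching places, although the values differ by only 2e-8. As an alternative, the agreement could be estimated numerically from the absolute difference, as in the sketch below. The agreeing_decimals() helper is hypothetical, introduced here for illustration, and is not part of the original scripts.

```python
import math


def agreeing_decimals(a: float, b: float) -> float:
    """Hypothetical helper: decimal places to which a and b agree,
    estimated as floor(-log10(|a - b|)) and clamped at 0."""
    diff = abs(a - b)
    if diff == 0.0:
        # identical values agree in every representable decimal place
        return math.inf
    return max(0, math.floor(-math.log10(diff)))


# The string comparison returns 0 here, although the difference is only about 2e-8
print(agreeing_decimals(0.30000001, 0.29999999))

# Float ONNX vs original MAE of LassoLarsCV, taken from the output above
print(agreeing_decimals(6.3477379221400145, 6.3477378458460691))
```

For the float LassoLarsCV MAE values, this measure gives 7, one place more than the string comparison, because a difference of about 7.6e-8 happens to change the seventh decimal digit.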
2.1.11.2. MQL5 code for executing ONNX Models
This code executes the saved lasso_lars_cv_float.onnx and lasso_lars_cv_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                                  LassoLarsCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"
#define   ModelName          "LassoLarsCV"
#define   ONNXFilenameFloat  "lasso_lars_cv_float.onnx"
#define   ONNXFilenameDouble "lasso_lars_cv_double.onnx"
#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];
#define   TestFloatModel  1
#define   TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;
   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
LassoLarsCV (EURUSD,H1) Testing ONNX float: LassoLarsCV (lasso_lars_cv_float.onnx)
LassoLarsCV (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382640824089
LassoLarsCV (EURUSD,H1) MQL5: Mean Absolute Error: 6.3477378458460691
LassoLarsCV (EURUSD,H1) MQL5: Mean Squared Error: 49.7781425390165566
LassoLarsCV (EURUSD,H1)
LassoLarsCV (EURUSD,H1) Testing ONNX double: LassoLarsCV (lasso_lars_cv_double.onnx)
LassoLarsCV (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382642612767
LassoLarsCV (EURUSD,H1) MQL5: Mean Absolute Error: 6.3477379221400145
LassoLarsCV (EURUSD,H1) MQL5: Mean Squared Error: 49.7781401721031642
Comparison with the original double model in Python:
Testing ONNX float: LassoLarsCV (lasso_lars_cv_float.onnx)
Python Mean Absolute Error: 6.3477379221400145
MQL5: Mean Absolute Error: 6.3477378458460691
Testing ONNX double: LassoLarsCV (lasso_lars_cv_double.onnx)
Python Mean Absolute Error: 6.3477379221400145
MQL5: Mean Absolute Error: 6.3477379221400145
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 16 decimal places.
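The Python scripts (r2_score, mean_absolute_error, mean_squared_error) and the MQL5 scripts (RegressionMetric with REGRESSION_R2, REGRESSION_MAE, REGRESSION_MSE) report the same three metrics. The sketch below spells out their standard definitions in NumPy. The MQL5 side is assumed, not confirmed, to follow the same definitions, which is consistent with the identical double-precision MAE values above. Both arrays are flattened first. Otherwise, subtracting a prediction of shape (n,) from a target of shape (n, 1) would silently broadcast to an (n, n) matrix.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, MSE) using the standard textbook definitions."""
    # flatten to 1-D so (n, 1) and (n,) inputs do not broadcast to (n, n)
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    mse = np.mean(residuals ** 2)
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, mse


r2, mae, mse = regression_metrics([[1.0], [2.0], [3.0]], [1.0, 2.0, 4.0])
print("R^2:", r2, "MAE:", mae, "MSE:", mse)
```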
2.1.11.3. ONNX representation of the lasso_lars_cv_float.onnx and lasso_lars_cv_double.onnx
Fig.40. ONNX representation of the lasso_lars_cv_float.onnx in Netron
Fig.41. ONNX representation of the lasso_lars_cv_double.onnx in Netron
2.1.12. sklearn.linear_model.LassoLarsIC
LassoLarsIC is a regression method that fits a Lasso (Least Absolute Shrinkage and Selection Operator) model with the LARS algorithm. It uses an information criterion (IC) to choose the regularization parameter alpha automatically, and thereby the set of features.
It utilizes information criteria such as AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) to determine which features to include in the model and applies L1 regularization to estimate the model coefficients.
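For reference, assuming Gaussian noise with variance σ², the two criteria can be written as follows. This is the form given in the scikit-learn user guide for LassoLarsIC. Here n is the number of samples, ŷ_i are the model predictions, and d is the number of degrees of freedom, i.e. the number of non-zero coefficients. By default, scikit-learn estimates σ² from an unregularized least-squares fit. Because σ² is fixed along the path, the first term does not influence which alpha is selected.

```latex
\begin{aligned}
\mathrm{AIC} &= n\log\left(2\pi\sigma^2\right) + \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2 + 2d,\\
\mathrm{BIC} &= n\log\left(2\pi\sigma^2\right) + \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2 + d\log n.
\end{aligned}
```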
Working Principle of LassoLarsIC:
- Input Data: It starts with the original dataset, including features (independent variables) and their corresponding target variable values.
- Initialization: LassoLarsIC begins with a null model, meaning no active features. All coefficients are set to zero.
- Feature Selection using Information Criterion: The method assesses the information criterion (e.g., AIC or BIC) for different feature sets, starting from an empty model and gradually incorporating features into the model. The information criterion evaluates the model's quality, considering the trade-off between fitting the data and model complexity.
- Selection of Optimal Feature Set: LassoLarsIC chooses the feature set for which the information criterion achieves the best value. This feature set will be included in the model.
- Application of L1 Regularization: The coefficients of the selected model are the Lasso (L1-regularized) estimates at the chosen alpha. They are therefore shrunk toward zero instead of being ordinary least-squares estimates.
- Making Predictions: After training, the model can be used to predict target variable values for new data.
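You can check the selection step on a fitted model. As we understand scikit-learn's implementation, it stores the criterion value for every alpha along the LARS path in `criterion_`, the corresponding alphas in `alphas_`, and the chosen value in `alpha_`. The sketch below, again on the synthetic data from the listings, verifies that `alpha_` is the alpha with the smallest AIC.

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

# Same synthetic data as in the listings
X = np.arange(0, 100, 1).reshape(-1, 1)
y = 4 * X + 10 * np.sin(X * 0.5)

model = LassoLarsIC(criterion="aic").fit(X, y.ravel())

# criterion_ holds the AIC value for each alpha along the LARS path
best_index = np.argmin(model.criterion_)
print("alphas along the path:", model.alphas_)
print("AIC values:", model.criterion_)
print("selected alpha_:", model.alpha_, "coef_:", model.coef_)
```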
Advantages of LassoLarsIC:
- Automatic Feature Selection: LassoLarsIC automatically chooses the feature set that is optimal according to the chosen criterion, reducing the model's dimensionality and helping to prevent overfitting.
- Information Criteria: The use of information criteria allows for balancing model quality and complexity.
- Regularization: The method applies L1 regularization, which helps prevent overfitting and can improve the model's generalization.
Limitations of LassoLarsIC:
- Linear Model: LassoLarsIC builds a linear model, which may be insufficient for modeling complex nonlinear relationships.
- Sensitivity to Noise: The method might be sensitive to outliers in the data.
- Computational Complexity: Evaluating information criteria for various feature sets might require additional computational resources.
LassoLarsIC is valuable in regression tasks where automatically selecting the best feature set and reducing the model's dimensionality based on information criteria is crucial.
2.1.12.1. Code for creating the LassoLarsIC model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.LassoLarsIC model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training LassoLarsIC model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoLarsIC
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name="LassoLarsIC"
onnx_model_filename = data_path + "lasso_lars_ic"
# create a LassoLarsIC Regressor model
lasso_lars_ic_regressor_model = LassoLarsIC(criterion='aic')
# fit the model to the data
lasso_lars_ic_regressor_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = lasso_lars_ic_regressor_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(lasso_lars_ic_regressor_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's tensor(float) input
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the metrics for the ONNX model
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(lasso_lars_ic_regressor_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's tensor(double) input
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the metrics for the ONNX model
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    LassoLarsIC Original model (double)
Python    R-squared (Coefficient of determination): 0.9962382642613388
Python    Mean Absolute Error: 6.347737926336425
Python    Mean Squared Error: 49.778140171281784
Python
Python    LassoLarsIC ONNX model (float)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_ic_float.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9962382641628886
Python    Mean Absolute Error: 6.3477377671679385
Python    Mean Squared Error: 49.77814147404787
Python    R^2 matching decimal places: 9
Python    MAE matching decimal places: 6
Python    MSE matching decimal places: 5
Python    float ONNX model precision: 6
Python
Python    LassoLarsIC ONNX model (double)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\lasso_lars_ic_double.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9962382642613388
Python    Mean Absolute Error: 6.347737926336425
Python    Mean Squared Error: 49.778140171281784
Python    R^2 matching decimal places: 16
Python    MAE matching decimal places: 15
Python    MSE matching decimal places: 15
Python    double ONNX model precision: 15
Fig.42. Results of the LassoLarsIC.py (float ONNX)
2.1.12.2. MQL5 code for executing ONNX Models
This code executes the saved lasso_lars_ic_float.onnx and lasso_lars_ic_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| LassoLarsIC.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link "https://www.mql5.com"
#property version "1.00"
#define ModelName "LassoLarsIC"
#define ONNXFilenameFloat "lasso_lars_ic_float.onnx"
#define ONNXFilenameDouble "lasso_lars_ic_double.onnx"
#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];
#define TestFloatModel 1
#define TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;
   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
LassoLarsIC (EURUSD,H1)    Testing ONNX float: LassoLarsIC (lasso_lars_ic_float.onnx)
LassoLarsIC (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 0.9962382641628886
LassoLarsIC (EURUSD,H1)    MQL5: Mean Absolute Error: 6.3477377671679385
LassoLarsIC (EURUSD,H1)    MQL5: Mean Squared Error: 49.7781414740478638
LassoLarsIC (EURUSD,H1)
LassoLarsIC (EURUSD,H1)    Testing ONNX double: LassoLarsIC (lasso_lars_ic_double.onnx)
LassoLarsIC (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 0.9962382642613388
LassoLarsIC (EURUSD,H1)    MQL5: Mean Absolute Error: 6.3477379263364302
LassoLarsIC (EURUSD,H1)    MQL5: Mean Squared Error: 49.7781401712817768
Comparison with the original double precision model in Python:
Testing ONNX float: LassoLarsIC (lasso_lars_ic_float.onnx)
Python Mean Absolute Error: 6.347737926336425
MQL5: Mean Absolute Error: 6.3477377671679385
Testing ONNX double: LassoLarsIC (lasso_lars_ic_double.onnx)
Python Mean Absolute Error: 6.347737926336425
MQL5: Mean Absolute Error: 6.3477379263364302
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 13 decimal places.
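Even the double model's MAE differs from the Python value in the last few digits (13 of 15 decimal places match). The log does not show why; one plausible contributor, which is an assumption not verified here, is that MQL5's RegressionMetric and scikit-learn accumulate the per-sample errors in a different order or with a different algorithm. Floating-point addition is not associative, so the same numbers summed differently can disagree in the last bits, as this standard-library example shows:

```python
import math

# ten copies of 0.1, which is not exactly representable in binary floating point
values = [0.1] * 10

# left-to-right accumulation: each addition rounds, and the errors carry over
naive = 0.0
for v in values:
    naive += v

# math.fsum tracks the lost low-order bits and returns a correctly rounded sum
exact = math.fsum(values)

print(repr(naive))  # 0.9999999999999999
print(repr(exact))  # 1.0
```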
2.1.12.3. ONNX representation of the lasso_lars_ic_float.onnx and lasso_lars_ic_double.onnx
Fig.43. ONNX representation of the lasso_lars_ic_float.onnx in Netron
Fig.44. ONNX representation of the lasso_lars_ic_double.onnx in Netron
2.1.13. sklearn.linear_model.LinearRegression
LinearRegression is one of the simplest and most widely used methods in machine learning for regression tasks.
It is used to build linear models that predict continuous numerical values of the target variable from a linear combination of input features.
Working Principle of LinearRegression:
- Linear Model: The LinearRegression model assumes a linear relationship between the independent variables (features) and the target variable. This relationship is expressed by the linear regression equation: y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ, where y is the target variable, β₀ is the intercept, β₁, β₂, ..., βₚ are the feature coefficients, and x₁, x₂, ..., xₚ are the feature values.
- Parameter Estimation: The goal of LinearRegression is to estimate the coefficients β₀, β₁, β₂, ..., βₚ that best fit the data. This is done with the Ordinary Least Squares (OLS) method, which minimizes the sum of squared differences between actual and predicted values.
- Model Evaluation: Metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE) and the Coefficient of Determination (R²) are used to assess the quality of the LinearRegression model.
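The OLS estimate can be reproduced directly with NumPy. The following is a minimal sketch, assuming only NumPy (the helper name ols_fit is hypothetical): it prepends a column of ones for the intercept β₀, solves the least-squares problem with np.linalg.lstsq, and cross-checks the result against the closed-form single-feature formula β₁ = cov(x, y) / var(x). It uses the same synthetic data as the scripts in this article.

```python
import numpy as np

def ols_fit(X, y):
    """Return (intercept, coefficients) minimizing the sum of squared residuals."""
    # prepend a column of ones so that beta[0] plays the role of the intercept
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

# same synthetic data as in the Python scripts of this article
X = np.arange(0, 100, 1).reshape(-1, 1).astype(np.float64)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

b0, b = ols_fit(X, y)

# closed-form check for a single feature: slope = cov(x, y) / var(x)
x = X.ravel()
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print("lstsq:       b0 =", b0, " b1 =", b[0])
print("closed form: b0 =", intercept, " b1 =", slope)
```

Both routes solve the same minimization problem, so they should agree up to rounding; the slope comes out close to the 4 used to generate the data, because the sine term averages out.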
Advantages of LinearRegression:
- Simplicity and Interpretability: LinearRegression is a simple method with easy interpretability, allowing the analysis of the influence of each feature on the target variable.
- High Training and Prediction Speed: The linear regression model has high training and prediction speeds, making it a good choice for large datasets.
- Applicability: LinearRegression can be successfully applied to diverse regression tasks.
Limitations of LinearRegression:
- Linearity: This method assumes linearity in the relationship between features and the target variable, which might be insufficient for modeling complex nonlinear dependencies.
- Sensitivity to Outliers: LinearRegression is sensitive to outliers in the data, which can affect the model's quality.
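The sensitivity to outliers follows from squaring the residuals: a single large error dominates the sum that OLS minimizes. A minimal NumPy sketch with made-up data: ten points lie exactly on y = 2x + 1, and corrupting only the last one moves the fitted slope from 2 to about 6.42.

```python
import numpy as np

# clean data lying exactly on the line y = 2x + 1
x = np.arange(10, dtype=np.float64)
y_clean = 2 * x + 1

# the same data with a single corrupted observation at the last point
y_outlier = y_clean.copy()
y_outlier[-1] = 100.0  # the true value would be 19

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line
slope_clean, intercept_clean = np.polyfit(x, y_clean, 1)
slope_out, intercept_out = np.polyfit(x, y_outlier, 1)

print(f"clean data:   slope={slope_clean:.3f}, intercept={intercept_clean:.3f}")
print(f"with outlier: slope={slope_out:.3f}, intercept={intercept_out:.3f}")
```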
LinearRegression is a simple and widely used regression method that constructs a linear model to predict numerical values of the target variable based on a linear combination of input features. It is well-suited for problems with a linear relationship and when model interpretability is important.
2.1.13.1. Code for creating the LinearRegression model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.LinearRegression model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training LinearRegression model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
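compare_decimal_places compares the decimal strings digit by digit, so it measures textual agreement rather than numerical closeness: for 0.19999 and 0.20001 it returns 0, although the values differ by only 2e-5. If a closeness-based count is preferred, a small alternative can derive it from the absolute difference. The helper agreeing_decimals below is hypothetical and not part of the original scripts.

```python
import math

def agreeing_decimals(value1, value2, max_places=16):
    """Number of decimal places to which two floats agree, based on |value1 - value2|."""
    diff = abs(value1 - value2)
    if diff == 0:
        return max_places
    # a difference of 2e-5 means the values agree to 4 decimal places
    places = math.floor(-math.log10(diff))
    # clamp: differences above 1 give 0, tiny differences are capped at max_places
    return max(0, min(max_places, places))

print(agreeing_decimals(0.19999, 0.20001))                       # 4
print(agreeing_decimals(6.347737926336425, 6.3477377671679385))  # 6
```

For the float LassoLarsIC MAE values in the log above, both functions give 6; they diverge only when a difference straddles a digit boundary, as in the first example.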
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "LinearRegression"
onnx_model_filename = data_path + "linear_regression"
# create a Linear Regression model
linear_model = LinearRegression()
# fit the model to the data
linear_model.fit(X, y)
# predict values for the entire dataset
y_pred = linear_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(linear_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the float_input tensor of the ONNX model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(linear_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the double_input tensor of the ONNX model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    LinearRegression Original model (double)
Python    R-squared (Coefficient of determination): 0.9962382642613388
Python    Mean Absolute Error: 6.347737926336427
Python    Mean Squared Error: 49.77814017128179
Python
Python    LinearRegression ONNX model (float)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\linear_regression_float.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9962382641628886
Python    Mean Absolute Error: 6.3477377671679385
Python    Mean Squared Error: 49.77814147404787
Python    R^2 matching decimal places: 9
Python    MAE matching decimal places: 6
Python    ONNX: MSE matching decimal places: 5
Python    float ONNX model precision: 6
Python
Python    LinearRegression ONNX model (double)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\linear_regression_double.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9962382642613388
Python    Mean Absolute Error: 6.347737926336427
Python    Mean Squared Error: 49.77814017128179
Python    R^2 matching decimal places: 16
Python    MAE matching decimal places: 15
Python    MSE matching decimal places: 14
Python    double ONNX model precision: 15
Fig.45. Results of the LinearRegression.py (float ONNX)
2.1.13.2. MQL5 code for executing ONNX Models
This code executes the saved linear_regression_float.onnx and linear_regression_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| LinearRegression.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link "https://www.mql5.com"
#property version "1.00"
#define ModelName "LinearRegression"
#define ONNXFilenameFloat "linear_regression_float.onnx"
#define ONNXFilenameDouble "linear_regression_double.onnx"
#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];
#define TestFloatModel 1
#define TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;
   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
LinearRegression (EURUSD,H1)    Testing ONNX float: LinearRegression (linear_regression_float.onnx)
LinearRegression (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 0.9962382641628886
LinearRegression (EURUSD,H1)    MQL5: Mean Absolute Error: 6.3477377671679385
LinearRegression (EURUSD,H1)    MQL5: Mean Squared Error: 49.7781414740478638
LinearRegression (EURUSD,H1)
LinearRegression (EURUSD,H1)    Testing ONNX double: LinearRegression (linear_regression_double.onnx)
LinearRegression (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 0.9962382642613388
LinearRegression (EURUSD,H1)    MQL5: Mean Absolute Error: 6.3477379263364266
LinearRegression (EURUSD,H1)    MQL5: Mean Squared Error: 49.7781401712817768
Comparison with the original double precision model in Python:
Testing ONNX float: LinearRegression (linear_regression_float.onnx)
Python Mean Absolute Error: 6.347737926336427
MQL5: Mean Absolute Error: 6.3477377671679385
Testing ONNX double: LinearRegression (linear_regression_double.onnx)
Python Mean Absolute Error: 6.347737926336427
MQL5: Mean Absolute Error: 6.3477379263364266
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
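The roughly 6 matching decimal places of the float model are about what single precision allows: float32 has a 24-bit significand, roughly 7 significant decimal digits (machine epsilon is about 1.19e-7), and with an MAE of about 6.3 one of those digits sits before the decimal point. The sketch below, assuming only NumPy, mimics this by evaluating the fitted line once in float64 and once with float32 inputs, parameters and arithmetic. It is a simplified model of a float ONNX graph, not its exact computation, so its numbers will not match the log above.

```python
import numpy as np

# synthetic data from this article
X = np.arange(0, 100, 1, dtype=np.float64)
y = 4 * X + 10 * np.sin(X * 0.5)

# OLS line fitted in double precision (np.polyfit returns [slope, intercept])
slope, intercept = np.polyfit(X, y, 1)

# double-precision prediction
y_pred64 = slope * X + intercept

# single-precision prediction: inputs, parameters and arithmetic in float32
y_pred32 = np.float32(slope) * X.astype(np.float32) + np.float32(intercept)

# metrics are computed in float64 in both cases, as scikit-learn does
mae64 = np.mean(np.abs(y - y_pred64))
mae32 = np.mean(np.abs(y - y_pred32.astype(np.float64)))

print("MAE (float64 path):", mae64)
print("MAE (float32 path):", mae32)
print("float32 machine epsilon:", np.finfo(np.float32).eps)
```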
2.1.13.3. ONNX representation of the linear_regression_float.onnx and linear_regression_double.onnx
Fig.46. ONNX representation of the linear_regression_float.onnx in Netron
Fig.47. ONNX representation of the linear_regression_double.onnx in Netron
Note on Ridge and RidgeCV Methods
Ridge and RidgeCV are two related scikit-learn estimators for ridge regression, that is, linear regression with L2 regularization. They fit the same model but differ in how the regularization strength is chosen.
Working Principle of Ridge (Ridge Regression):
- Ridge is a regression method with L2 regularization: it adds the sum of squared coefficients (the squared L2 norm of the coefficient vector), scaled by alpha, to the loss function the model minimizes. This penalty term keeps the coefficients small and thus helps prevent overfitting.
- Use of the alpha parameter: In Ridge, the alpha parameter (the regularization strength) is set by the user and is not tuned automatically. A suitable value has to be chosen from knowledge of the data and from experiments.
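Written out with the notation of the LinearRegression section, the Ridge objective is shown below, where x_ij is the value of feature j for sample i and α ≥ 0 is the regularization strength. With an intercept, scikit-learn does not penalize β₀; for centered X (the n × p feature matrix) and y, the minimizer has the closed form on the second line, with I the p × p identity matrix. Setting α = 0 recovers OLS.

```latex
\begin{aligned}
&\min_{\beta_0,\,\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \alpha \sum_{j=1}^{p}\beta_j^2 \\
&\hat{\beta} = \left(X^\top X + \alpha I\right)^{-1} X^\top y, \qquad \hat{\beta}_0 = \bar{y} - \bar{x}^\top \hat{\beta}
\end{aligned}
```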
Working Principle of RidgeCV (Ridge Cross-Validation):
- RidgeCV extends Ridge by selecting the alpha parameter automatically through cross-validation. Instead of a single, manually set alpha, RidgeCV evaluates each value from a grid of candidates (the alphas parameter) and keeps the one with the best cross-validation score.
- Advantage of automatic tuning: The primary advantage of RidgeCV is that it determines a good alpha without manual adjustment. This makes tuning more convenient and reduces the risk of a poorly chosen alpha, although the result can only be as good as the candidate grid.
The key difference between Ridge and RidgeCV is that Ridge requires the user to specify alpha explicitly, whereas RidgeCV selects alpha from a set of candidates using cross-validation. RidgeCV is usually the more convenient choice when you want to avoid manual parameter tuning.
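The alpha search that RidgeCV automates can be sketched with NumPy alone. This is a simplified illustration, not scikit-learn's implementation: it runs plain k-fold cross-validation over a small candidate grid, whereas RidgeCV by default (cv=None) uses an efficient leave-one-out scheme, so the chosen alpha may differ. The helpers ridge_fit, cv_mse and select_alpha and the random data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge on centered data; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    return y_mean - x_mean @ w, w

def cv_mse(X, y, alpha, k=5):
    """Mean squared validation error over k contiguous folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errors = []
    for val_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[val_idx] = False
        b0, w = ridge_fit(X[train_mask], y[train_mask], alpha)
        residuals = y[val_idx] - (b0 + X[val_idx] @ w)
        errors.append(np.mean(residuals ** 2))
    return np.mean(errors)

def select_alpha(X, y, alphas, k=5):
    """Return the alpha with the lowest cross-validated MSE, plus all scores."""
    scores = [cv_mse(X, y, a, k) for a in alphas]
    return alphas[int(np.argmin(scores))], scores

# hypothetical data: 60 samples, 5 features, 3 of them informative
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.0, 0.5]) + rng.normal(scale=1.0, size=60)

alphas = [0.1, 1.0, 10.0]
best, scores = select_alpha(X, y, alphas)
print("CV MSE per alpha:", dict(zip(alphas, np.round(scores, 4))))
print("selected alpha:", best)
```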
2.1.14. sklearn.linear_model.Ridge
Ridge is a regression method used in machine learning to solve regression problems. It's part of the family of linear models and represents a regularized linear regression.
The main feature of Ridge regression is adding L2 regularization to the standard ordinary least squares (OLS) method.
How Ridge regression works:
- Linear regression: Similar to regular linear regression, Ridge regression aims to find a linear relationship between independent variables (features) and the target variable.
- L2 regularization: The primary distinction of Ridge regression is adding L2 regularization to the loss function. This means a penalty for large values of regression coefficients is added to the sum of squared differences between actual and predicted values.
- Penalizing coefficients: L2 regularization imposes a penalty on the values of the regression coefficients. As a result, all coefficients are shrunk toward zero (without becoming exactly zero), which reduces overfitting and makes the model more stable.
- Hyperparameter α: One of the essential parameters in Ridge regression is the hyperparameter α (alpha), determining the degree of regularization. Higher α values lead to stronger regularization, resulting in simpler models with lower coefficient values.
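Written out, ridge minimizes ||y - Xw||² + α||w||². The NumPy sketch below solves this in closed form on the same synthetic data as the article's Ridge example. It assumes, as scikit-learn does with fit_intercept=True, that the intercept is handled by centering X and y and is not penalized; `ridge_with_intercept` is a hypothetical helper, and with alpha=1.0 its result should agree closely with Ridge(alpha=1.0) even though this is not the library's code path.

```python
import numpy as np

def ridge_with_intercept(X, y, alpha):
    # Center features and target so that the intercept is not penalized
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    # Minimize ||yc - Xc w||^2 + alpha*||w||^2  =>  (Xc^T Xc + alpha*I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    # Recover the intercept from the means
    b = y_mean - x_mean @ w
    return w, b

# Same synthetic data as in the Ridge example
X = np.arange(0, 100, 1).reshape(-1, 1).astype(float)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

w_ols, b_ols = ridge_with_intercept(X, y, alpha=0.0)   # alpha = 0 reduces to OLS
w_r, b_r = ridge_with_intercept(X, y, alpha=1.0)       # alpha = 1.0 is Ridge's default
print("OLS:   coef", w_ols, "intercept", b_ols)
print("Ridge: coef", w_r, "intercept", b_r)
```

With a single, widely spread feature the shrinkage is tiny, which is consistent with the nearly identical Ridge and RidgeCV metrics reported in this article.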
Advantages of Ridge regression:
- Reduction of overfitting: L2 regularization in Ridge helps reduce overfitting, making the model more robust against noise in the data.
- Handling multicollinearity: Ridge regression copes well with multicollinearity issues, particularly when features are highly correlated.
- Many features: Ridge remains stable when there are many features, including cases with more features than samples, where OLS is unstable or has no unique solution.
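The multicollinearity point can be checked numerically. In this sketch (synthetic data, no intercept, alpha chosen arbitrarily as 1.0) the second feature is an almost exact copy of the first: adding αI to XᵀX lowers its condition number, and the ridge coefficient vector is never longer than the OLS one.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-3, size=n)   # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=n)

alpha = 1.0
gram = X.T @ X
gram_ridge = gram + alpha * np.eye(2)

# A large condition number means the normal equations are numerically fragile
print("cond(X^T X)           =", np.linalg.cond(gram))
print("cond(X^T X + alpha I) =", np.linalg.cond(gram_ridge))

w_ols = np.linalg.solve(gram, X.T @ y)
w_ridge = np.linalg.solve(gram_ridge, X.T @ y)
print("OLS coefficients:  ", w_ols)     # typically large values of opposite sign
print("Ridge coefficients:", w_ridge)   # typically close to an even split of the effect
```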
Limitations of Ridge regression:
- Doesn't eliminate features: Ridge shrinks coefficients but does not set any of them exactly to zero, so every feature stays in the model and Ridge does not perform feature selection.
- Choosing optimal α: Selecting the correct value for the hyperparameter α may require cross-validation.
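Both limitations show up in a small coefficient path. The sketch below (synthetic centered data, no intercept, an arbitrary α grid) solves the ridge system for increasing α: the norm of the coefficient vector decreases, yet no coefficient becomes exactly zero, and nothing in the path itself says which α is best.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X -= X.mean(axis=0)                      # centered features, so no intercept is needed
y = X @ np.array([2.0, -1.0, 0.5, 0.1]) + rng.normal(scale=0.3, size=50)
y -= y.mean()

alphas = [0.0, 1.0, 10.0, 100.0, 1000.0]
coefs = [np.linalg.solve(X.T @ X + a * np.eye(4), X.T @ y) for a in alphas]
norms = [np.linalg.norm(w) for w in coefs]

for a, w, nrm in zip(alphas, coefs, norms):
    print(f"alpha={a:>7}: coef={np.round(w, 4)}, ||w||={nrm:.4f}")
```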
Ridge regression is a regression method that introduces L2 regularization to standard linear regression to reduce overfitting, enhance stability, and address multicollinearity issues. This method is useful when balancing accuracy and model stability is needed.
2.1.14.1. Code for creating the Ridge model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.Ridge model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training Ridge model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "Ridge"
onnx_model_filename = data_path + "ridge"
# create a Ridge model
regression_model = Ridge()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the tensor(float) input of the model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the tensor(double) input of the model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    Ridge Original model (double)
Python    R-squared (Coefficient of determination): 0.9962382641178552
Python    Mean Absolute Error: 6.347684462929819
Python    Mean Squared Error: 49.77814206996523
Python
Python    Ridge ONNX model (float)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ridge_float.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9962382634837793
Python    Mean Absolute Error: 6.347684915729416
Python    Mean Squared Error: 49.77815046053819
Python    R^2 matching decimal places: 8
Python    MAE matching decimal places: 6
Python    MSE matching decimal places: 4
Python    float ONNX model precision: 6
Python
Python    Ridge ONNX model (double)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ridge_double.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9962382641178552
Python    Mean Absolute Error: 6.347684462929819
Python    Mean Squared Error: 49.77814206996523
Python    R^2 matching decimal places: 16
Python    MAE matching decimal places: 15
Python    MSE matching decimal places: 14
Python    double ONNX model precision: 15
Fig.49. Results of the Ridge.py (float ONNX)
2.1.14.2. MQL5 code for executing ONNX Models
This code executes the saved ridge_float.onnx and ridge_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                                        Ridge.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "Ridge"
#define   ONNXFilenameFloat  "ridge_float.onnx"
#define   ONNXFilenameDouble "ridge_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
      run_result=RunModelFloat(model,x_values,y_predicted);
   else if(model_type==TestDoubleModel)
      run_result=RunModelDouble(model,x_values,y_predicted);
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
Ridge (EURUSD,H1)    Testing ONNX float: Ridge (ridge_float.onnx)
Ridge (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 0.9962382634837793
Ridge (EURUSD,H1)    MQL5: Mean Absolute Error: 6.3476849157294160
Ridge (EURUSD,H1)    MQL5: Mean Squared Error: 49.7781504605381784
Ridge (EURUSD,H1)
Ridge (EURUSD,H1)    Testing ONNX double: Ridge (ridge_double.onnx)
Ridge (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 0.9962382641178552
Ridge (EURUSD,H1)    MQL5: Mean Absolute Error: 6.3476844629298235
Ridge (EURUSD,H1)    MQL5: Mean Squared Error: 49.7781420699652131
Comparison with the original double precision model in Python:
Testing ONNX float: Ridge (ridge_float.onnx)
Python Mean Absolute Error: 6.347684462929819
MQL5:  Mean Absolute Error: 6.3476849157294160

Testing ONNX double: Ridge (ridge_double.onnx)
Python Mean Absolute Error: 6.347684462929819
MQL5:  Mean Absolute Error: 6.3476844629298235
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 13 decimal places.
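A rough way to see where the six-digit limit of the float model comes from: float32 has a 24-bit significand, roughly 7 significant decimal digits, and with an MAE of about 6.35 one of those digits is spent before the decimal point. The sketch below evaluates a one-feature linear model y = w·x + b in float64 and in float32 on the same inputs and measures agreement through the relative error, rather than by comparing characters of printed strings (a character comparison reports zero matching digits for close values such as 0.0999999 and 0.1). The coefficients are placeholders, not the fitted Ridge values, and `agreeing_digits` is a hypothetical helper.

```python
import numpy as np

def agreeing_digits(reference, value):
    # Approximate number of matching significant digits, derived from the relative error
    rel_err = abs(reference - value) / abs(reference)
    return np.inf if rel_err == 0 else float(-np.log10(rel_err))

# Placeholder parameters (hypothetical, not the fitted Ridge coefficients)
w, b = 3.9812345678901234, 1.2345678901234567
x = np.arange(0, 100, 1, dtype=np.float64)
y_true = 4 * x + 10 * np.sin(x * 0.5)

# Evaluate the same linear model in double and in single precision
pred64 = w * x + b
pred32 = np.float32(w) * x.astype(np.float32) + np.float32(b)

mae64 = float(np.mean(np.abs(y_true - pred64)))
mae32 = float(np.mean(np.abs(y_true - pred32.astype(np.float64))))
print("MAE float64:", mae64)
print("MAE float32:", mae32)
print("agreeing significant digits:", agreeing_digits(mae64, mae32))
print("float32 machine epsilon:", np.finfo(np.float32).eps)
```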
2.1.14.3. ONNX representation of the ridge_float.onnx and ridge_double.onnx
Fig.50. ONNX representation of the ridge_float.onnx in Netron
Fig.51. ONNX representation of the ridge_double.onnx in Netron
2.1.15. sklearn.linear_model.RidgeCV
RidgeCV is an extension of Ridge regression that automatically selects the hyperparameter α (alpha), which sets the strength of regularization. α controls the balance between minimizing the sum of squared errors (as in ordinary linear regression) and keeping the regression coefficients small (regularization). RidgeCV picks α from a set of candidate values (the alphas parameter) according to a cross-validation score.
How RidgeCV works:
- Input data: RidgeCV takes input data consisting of features (independent variables) and the target variable (continuous).
- Choosing α: Ridge regression requires a value of the hyperparameter α, which determines the degree of regularization. RidgeCV selects it automatically from the candidate values passed in alphas (by default 0.1, 1.0 and 10.0).
- Cross-validation: RidgeCV uses cross-validation to assess which α value generalizes best to data not used for fitting. By default (cv=None) it uses an efficient form of leave-one-out cross-validation; passing an integer cv switches to k-fold cross-validation.
- Optimal α: Upon completing the training process, RidgeCV chooses the α value that delivers the best performance in cross-validation and uses this value to train the final Ridge regression model.
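The leave-one-out idea can be computed without refitting the model n times. For ridge with hat matrix H = X(XᵀX + αI)⁻¹Xᵀ, the leave-one-out residual of sample i equals (yᵢ - ŷᵢ)/(1 - Hᵢᵢ). The NumPy sketch below checks this shortcut against brute-force refitting and then uses it to pick α. It assumes a model without an intercept and only illustrates the principle behind RidgeCV's default mode; it is not scikit-learn's code, and `loo_mse_fast` and `loo_mse_brute` are hypothetical names.

```python
import numpy as np

def loo_mse_fast(X, y, alpha):
    # Hat matrix of ridge regression without intercept
    A = X.T @ X + alpha * np.eye(X.shape[1])
    H = X @ np.linalg.solve(A, X.T)
    residuals = y - H @ y
    # Leave-one-out residuals obtained from a single fit
    loo_res = residuals / (1.0 - np.diag(H))
    return float(np.mean(loo_res ** 2))

def loo_mse_brute(X, y, alpha):
    # Refit n times, each time leaving one sample out
    errors = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        Xi, yi = X[mask], y[mask]
        w = np.linalg.solve(Xi.T @ Xi + alpha * np.eye(X.shape[1]), Xi.T @ yi)
        errors.append((X[i] @ w - y[i]) ** 2)
    return float(np.mean(errors))

rng = np.random.default_rng(7)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.5, -0.7, 2.0]) + rng.normal(scale=0.4, size=40)

alphas = [0.1, 1.0, 10.0]
scores = {a: loo_mse_fast(X, y, a) for a in alphas}
best_alpha = min(scores, key=scores.get)
print("LOO MSE per alpha:", scores)
print("selected alpha:", best_alpha)
```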
Advantages of RidgeCV:
- Automatic selection of α: RidgeCV allows for automatic selection of the optimal value of the hyperparameter α, simplifying the model tuning process.
- Balance between regularization and performance: This method helps find the optimal balance between regularization (reducing overfitting) and model performance.
Limitations of RidgeCV:
- Computational complexity: Cross-validation may require significant computational resources, especially when using a large range of α values.
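One common way to keep the cost of a long α grid down, sketched here under the assumption of a model without an intercept, is to compute a single singular value decomposition X = U·S·Vᵀ and reuse it: the ridge coefficients for any α are then V·diag(s/(s² + α))·Uᵀy, which is cheap for each additional α. This illustrates the idea only and makes no claim about scikit-learn internals; `ridge_path_svd` is a hypothetical name.

```python
import numpy as np

def ridge_path_svd(X, y, alphas):
    # One thin SVD of X, reused for every alpha
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    # w(alpha) = V diag(s / (s^2 + alpha)) U^T y
    return np.array([Vt.T @ (s / (s ** 2 + a) * Uty) for a in alphas])

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.5, size=500)

alphas = np.logspace(-3, 3, 50)
path = ridge_path_svd(X, y, alphas)

# Check one alpha against a direct solve of (X^T X + alpha*I) w = X^T y
direct = np.linalg.solve(X.T @ X + alphas[10] * np.eye(20), X.T @ y)
print("path shape:", path.shape)
print("matches direct solve:", np.allclose(path[10], direct))
```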
RidgeCV is a Ridge regression method with automatic selection of the optimal hyperparameter α using cross-validation. This method streamlines the hyperparameter selection process and enables finding the best balance between regularization and model performance.
2.1.15.1. Code for creating the RidgeCV model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.RidgeCV model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training RidgeCV model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "RidgeCV"
onnx_model_filename = data_path + "ridge_cv"
# create a RidgeCV model
regression_model = RidgeCV()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the tensor(float) input of the model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the tensor(double) input of the model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    RidgeCV Original model (double)
Python    R-squared (Coefficient of determination): 0.9962382499160807
Python    Mean Absolute Error: 6.34720334999352
Python    Mean Squared Error: 49.77832999861571
Python
Python    RidgeCV ONNX model (float)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ridge_cv_float.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9962382499108485
Python    Mean Absolute Error: 6.3472036427935485
Python    Mean Squared Error: 49.77833006785168
Python    R^2 matching decimal places: 11
Python    MAE matching decimal places: 6
Python    MSE matching decimal places: 4
Python    float ONNX model precision: 6
Python
Python    RidgeCV ONNX model (double)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ridge_cv_double.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9962382499160807
Python    Mean Absolute Error: 6.34720334999352
Python    Mean Squared Error: 49.77832999861571
Python    R^2 matching decimal places: 16
Python    MAE matching decimal places: 14
Python    MSE matching decimal places: 14
Python    double ONNX model precision: 14
Fig.52. Results of the RidgeCV.py (float ONNX)
2.1.15.2. MQL5 code for executing ONNX Models
This code executes the saved ridge_cv_float.onnx and ridge_cv_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                                      RidgeCV.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define ModelName          "RidgeCV"
#define ONNXFilenameFloat  "ridge_cv_float.onnx"
#define ONNXFilenameDouble "ridge_cv_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel  1
#define TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
RidgeCV (EURUSD,H1) Testing ONNX float: RidgeCV (ridge_cv_float.onnx)
RidgeCV (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382499108485
RidgeCV (EURUSD,H1) MQL5: Mean Absolute Error: 6.3472036427935485
RidgeCV (EURUSD,H1) MQL5: Mean Squared Error: 49.7783300678516909
RidgeCV (EURUSD,H1)
RidgeCV (EURUSD,H1) Testing ONNX double: RidgeCV (ridge_cv_double.onnx)
RidgeCV (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382499160807
RidgeCV (EURUSD,H1) MQL5: Mean Absolute Error: 6.3472033499935216
RidgeCV (EURUSD,H1) MQL5: Mean Squared Error: 49.7783299986157246
Comparison with the original double precision model in Python:
Testing ONNX float: RidgeCV (ridge_cv_float.onnx)
Python Mean Absolute Error: 6.34720334999352
MQL5: Mean Absolute Error: 6.3472036427935485

Testing ONNX double: RidgeCV (ridge_cv_double.onnx)
Python Mean Absolute Error: 6.34720334999352
MQL5: Mean Absolute Error: 6.3472033499935216
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
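The limit of roughly 6 matching decimal places for the float model follows from the precision of float32 itself. The short NumPy check below prints the precision of the two IEEE 754 types and rounds the Python MAE reported above to float32 and back. It is only an illustration of single-value rounding: the float ONNX model accumulates such errors across its parameters and arithmetic, so the digit counts above are not computed from it directly.

```python
import numpy as np

# precision of the two types used by the float and double ONNX models
info32 = np.finfo(np.float32)
info64 = np.finfo(np.float64)
print(info32.precision, info32.eps)   # 6 decimal digits, eps about 1.19e-07
print(info64.precision, info64.eps)   # 15 decimal digits, eps about 2.22e-16

# round the Python MAE of the RidgeCV model to float32 and back
mae = 6.34720334999352
mae32 = float(np.float32(mae))
# float32 values in [4, 8) are spaced 2**-21 apart, so rounding to the
# nearest one changes the value by at most 2**-22 (about 2.4e-07)
print(abs(mae32 - mae))
```

An error of up to about 2.4e-07 on a value near 6.3 leaves only the first 6 decimal places reliably identical, which matches the float result reported above.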
2.1.15.3. ONNX representation of the ridge_cv_float.onnx and ridge_cv_double.onnx
Fig.53. ONNX representation of the ridge_cv_float.onnx in Netron
Fig.54. ONNX representation of the ridge_cv_double.onnx in Netron
2.1.16. sklearn.linear_model.OrthogonalMatchingPursuit
OrthogonalMatchingPursuit (OMP) is an algorithm used to solve feature selection and linear regression problems.
It is one of the methods for selecting the most significant features, which can be helpful in reducing data dimensionality and improving the model's generalization ability.
How OrthogonalMatchingPursuit works:
- Input data: It begins with a dataset containing features (independent variables) and values of the target variable (continuous).
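- Selecting the number of features: One of the first decisions when using OrthogonalMatchingPursuit is how many features to include in the model. This number can be fixed in advance or chosen with a criterion such as cross-validation or the Akaike Information Criterion (AIC). In scikit-learn it is set with the n_nonzero_coefs parameter (by default 10% of the features, but at least one); alternatively, tol stops the algorithm once the squared norm of the residual falls below a threshold, and OrthogonalMatchingPursuitCV selects the number of features by cross-validation.
- Iterative feature addition: The algorithm starts with an empty model. At each iteration it selects the feature most correlated with the current residual, then refits the coefficients of all selected features by least squares. After this refit the residual is orthogonal to every feature selected so far, which is what gives the method its name.
- Final model: Once the specified number of features has been selected (or the residual tolerance has been reached), the coefficients from the last least-squares fit form the model; the coefficients of all other features are zero.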
- Making predictions: After training, the model can predict the values of the target variable on new data.
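The greedy loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions, not scikit-learn's implementation: the function name omp is hypothetical, there is no intercept, and the correlation scores are divided by the column norms because OMP assumes normalized columns.

```python
import numpy as np


def omp(X, y, n_nonzero_coefs):
    """Hypothetical minimal Orthogonal Matching Pursuit (no intercept).

    Returns the full coefficient vector and the indices of the selected
    features in the order they were chosen.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    col_norms = np.linalg.norm(X, axis=0)   # OMP assumes normalized columns
    support = []                            # selected feature indices
    residual = y.copy()                     # the empty model predicts 0
    coef_sub = np.zeros(0)
    for _ in range(n_nonzero_coefs):
        # pick the feature most correlated with the current residual
        scores = np.abs(X.T @ residual) / col_norms
        if support:
            scores[support] = -np.inf       # never pick a feature twice
        support.append(int(np.argmax(scores)))
        # refit all selected coefficients by least squares; the new
        # residual is then orthogonal to every selected feature
        coef_sub, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ coef_sub
    coef = np.zeros(X.shape[1])
    coef[support] = coef_sub
    return coef, support


# toy data: y depends only on features 0 and 2
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
y = X @ np.array([2.0, 0.0, -1.0])

coef, support = omp(X, y, n_nonzero_coefs=2)
print(support)   # [0, 2]
print(coef)      # close to [2, 0, -1]
```

On this toy data the first step picks feature 0 (coefficient 1.5), and the refit residual is orthogonal to it; feature 2 is then the most correlated with that residual, and the second refit recovers the exact coefficients.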
Advantages of OrthogonalMatchingPursuit:
- Dimensionality reduction: OMP can reduce the data dimensionality by selecting only the most informative features.
- Interpretability: Because OMP selects only a small number of features, models created using it can be more interpretable.
Limitations of OrthogonalMatchingPursuit:
- Sensitivity to the number of selected features: The number of selected features needs to be properly tuned, and incorrect choices may lead to overfitting or underfitting.
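- Sensitivity to multicollinearity: When features are strongly correlated, the greedy choice may pick one feature of a correlated group somewhat arbitrarily, so the selected set is not guaranteed to be the optimal one.
- Computational complexity: Each iteration computes the correlation of every feature with the residual and solves a least-squares problem on the selected features, so the cost grows with the number of samples, the number of features, and the number of features to be selected.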
OrthogonalMatchingPursuit is an algorithm for feature selection and linear regression, allowing the selection of the most informative features for the model. This method can be valuable for reducing data dimensionality and improving model interpretability.
2.1.16.1. Code for creating the OrthogonalMatchingPursuit model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.OrthogonalMatchingPursuit model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training OrthogonalMatchingPursuit model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "OrthogonalMatchingPursuit"
onnx_model_filename = data_path + "orthogonal_matching_pursuit"
# create an OrthogonalMatchingPursuit model
regression_model = OrthogonalMatchingPursuit()
# fit the model to the data
regression_model.fit(X, y)
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python OrthogonalMatchingPursuit Original model (double)
Python R-squared (Coefficient of determination): 0.9962382642613388
Python Mean Absolute Error: 6.3477379263364275
Python Mean Squared Error: 49.778140171281784
Python
Python OrthogonalMatchingPursuit ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\orthogonal_matching_pursuit_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382641628886
Python Mean Absolute Error: 6.3477377671679385
Python Mean Squared Error: 49.77814147404787
Python R^2 matching decimal places: 9
Python MAE matching decimal places: 6
Python MSE matching decimal places: 5
Python float ONNX model precision: 6
Python
Python OrthogonalMatchingPursuit ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\orthogonal_matching_pursuit_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382642613388
Python Mean Absolute Error: 6.3477379263364275
Python Mean Squared Error: 49.778140171281784
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 16
Python MSE matching decimal places: 15
Python double ONNX model precision: 16
Fig.55. Results of the OrthogonalMatchingPursuit.py (float ONNX)
2.1.16.2. MQL5 code for executing ONNX Models
This code executes the saved orthogonal_matching_pursuit_float.onnx and orthogonal_matching_pursuit_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                    OrthogonalMatchingPursuit.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define ModelName          "OrthogonalMatchingPursuit"
#define ONNXFilenameFloat  "orthogonal_matching_pursuit_float.onnx"
#define ONNXFilenameDouble "orthogonal_matching_pursuit_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel  1
#define TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
OrthogonalMatchingPursuit (EURUSD,H1) Testing ONNX float: OrthogonalMatchingPursuit (orthogonal_matching_pursuit_float.onnx)
OrthogonalMatchingPursuit (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382641628886
OrthogonalMatchingPursuit (EURUSD,H1) MQL5: Mean Absolute Error: 6.3477377671679385
OrthogonalMatchingPursuit (EURUSD,H1) MQL5: Mean Squared Error: 49.7781414740478638
OrthogonalMatchingPursuit (EURUSD,H1)
OrthogonalMatchingPursuit (EURUSD,H1) Testing ONNX double: OrthogonalMatchingPursuit (orthogonal_matching_pursuit_double.onnx)
OrthogonalMatchingPursuit (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382642613388
OrthogonalMatchingPursuit (EURUSD,H1) MQL5: Mean Absolute Error: 6.3477379263364275
OrthogonalMatchingPursuit (EURUSD,H1) MQL5: Mean Squared Error: 49.7781401712817768
Comparison with the original double precision model in Python:
Testing ONNX float: OrthogonalMatchingPursuit (orthogonal_matching_pursuit_float.onnx)
Python Mean Absolute Error: 6.3477379263364275
MQL5: Mean Absolute Error: 6.3477377671679385

Testing ONNX double: OrthogonalMatchingPursuit (orthogonal_matching_pursuit_double.onnx)
Python Mean Absolute Error: 6.3477379263364275
MQL5: Mean Absolute Error: 6.3477379263364275
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 16 decimal places.
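The digit counts reported for the float model can be reproduced from the values printed by OrthogonalMatchingPursuit.py using the same compare_decimal_places logic as in the script (restated here so the snippet runs on its own). The last call shows a caveat of this helper: it only compares digits after the decimal point, so it assumes the integer parts already agree, which holds for every value reported in this section.

```python
def compare_decimal_places(value1, value2):
    # same logic as the helper in the script: count identical digits
    # after the decimal point, stopping at the first mismatch
    str_value1 = str(value1)
    str_value2 = str(value2)
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    min_decimal_places = min(len(str_value1) - dot_position1 - 1,
                             len(str_value2) - dot_position2 - 1)
    matching_count = 0
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count


# values printed by OrthogonalMatchingPursuit.py (original model vs float ONNX)
mae, mae_float = 6.3477379263364275, 6.3477377671679385
r2, r2_float = 0.9962382642613388, 0.9962382641628886
mse, mse_float = 49.778140171281784, 49.77814147404787

print(compare_decimal_places(mae, mae_float))  # 6
print(compare_decimal_places(r2, r2_float))    # 9
print(compare_decimal_places(mse, mse_float))  # 5
print(compare_decimal_places(mae, mae))        # 16

# caveat: the integer parts are not compared
print(compare_decimal_places(1.5, 2.5))        # 1
```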
2.1.16.3. ONNX representation of the orthogonal_matching_pursuit_float.onnx and orthogonal_matching_pursuit_double.onnx
Fig.56. ONNX representation of the orthogonal_matching_pursuit_float.onnx in Netron
Fig.57. ONNX representation of the orthogonal_matching_pursuit_double.onnx in Netron
2.1.17. sklearn.linear_model.PassiveAggressiveRegressor
PassiveAggressiveRegressor is a machine learning method used for regression tasks.
This method is a variant of the Passive-Aggressive (PA) algorithm that can be employed to train a model capable of predicting continuous values of the target variable.
How PassiveAggressiveRegressor works:
- Input data: It starts with a dataset comprising features (independent variables) and values of the target variable (continuous).
- Supervised learning: PassiveAggressiveRegressor is a supervised learning method trained on pairs (X, y), where X represents the features, and y corresponds to the target variable values.
- Adaptive learning: The primary idea behind the Passive-Aggressive method is online, example-by-example learning. For each training example the model checks the prediction error: if the error is within a tolerance epsilon, the model stays passive and leaves the weights unchanged; otherwise it updates aggressively, changing the weights as little as possible so that the error on that example shrinks toward epsilon.
- Parameter C: PassiveAggressiveRegressor has a hyperparameter C (the aggressiveness parameter) that caps the size of each weight update. A higher C value allows more aggressive weight updates, while a lower C value makes the model less aggressive.
- Prediction: Once trained, the model can predict target variable values for new data.
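The passive/aggressive behavior and the role of C described above can be sketched with the PA-I update rule and the epsilon-insensitive loss (scikit-learn documents its default loss, epsilon_insensitive, as equivalent to PA-I, with defaults C=1.0 and epsilon=0.1). This is a simplified illustration, not scikit-learn's implementation: the function name pa_regression_fit is hypothetical, and the sketch has no intercept, no shuffling and no stopping criterion.

```python
import numpy as np


def pa_regression_fit(X, y, C=1.0, epsilon=0.1, n_epochs=1):
    """Hypothetical minimal online PA-I regressor (epsilon-insensitive loss, no intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x_t, y_t in zip(X, y):           # one example at a time (online)
            error = y_t - w @ x_t
            loss = max(0.0, abs(error) - epsilon)
            sq_norm = x_t @ x_t
            if loss == 0.0 or sq_norm == 0.0:
                continue                     # passive: leave w unchanged
            tau = min(C, loss / sq_norm)     # PA-I step size, capped by C
            w += np.sign(error) * tau * x_t  # aggressive: smallest correcting step
    return w


X = np.array([[2.0]])
y = np.array([5.0])

w_large_c = pa_regression_fit(X, y, C=10.0)
w_small_c = pa_regression_fit(X, y, C=0.1)
print(y - X @ w_large_c)   # residual shrinks to epsilon (about 0.1)
print(y - X @ w_small_c)   # step capped by C: residual stays large (about 4.6)
```

With a large C the uncapped step moves the prediction exactly to the edge of the epsilon tube; with a small C the same example only nudges the weights, which is what "less aggressive" means in practice.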
Advantages of PassiveAggressiveRegressor:
- Adaptability: The method can adapt to changes in data and update the model to minimize prediction errors.
- Efficiency for large datasets: PassiveAggressiveRegressor is an online algorithm that processes one example at a time, and scikit-learn also exposes partial_fit for incremental training, so it scales well to large datasets and data streams.
Limitations of PassiveAggressiveRegressor:
- Sensitivity to the choice of parameter C: Properly selecting the value of C may require tuning and experimentation.
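- Additional features may be needed: PassiveAggressiveRegressor fits a linear function of the input features, so nonlinear relationships (such as the sine component in the synthetic data used here) can only be captured through additional engineered features.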
PassiveAggressiveRegressor is a machine learning method for regression tasks that learns adaptively by minimizing prediction errors on training data. This method can be valuable for handling large datasets and requires tuning the C parameter for optimal performance.
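The need for engineered features can be illustrated on the synthetic data used throughout this article. The sketch below uses ordinary least squares (np.linalg.lstsq) as a stand-in for any linear regressor, including PassiveAggressiveRegressor; the sin(0.5*x) feature is an assumption that works only because the generating formula is known here.

```python
import numpy as np

# the synthetic data used in the article's scripts
x = np.arange(0, 100, 1, dtype=float)
y = 4 * x + 10 * np.sin(x * 0.5)

# linear model on the raw feature plus an intercept column
A_raw = np.column_stack([x, np.ones_like(x)])
coef_raw, *_ = np.linalg.lstsq(A_raw, y, rcond=None)
mae_raw = np.mean(np.abs(y - A_raw @ coef_raw))

# the same linear model with an engineered sin(0.5*x) feature added
A_eng = np.column_stack([x, np.sin(0.5 * x), np.ones_like(x)])
coef_eng, *_ = np.linalg.lstsq(A_eng, y, rcond=None)
mae_eng = np.mean(np.abs(y - A_eng @ coef_eng))

print(mae_raw)   # roughly 6.3: the sine term cannot be captured linearly
print(mae_eng)   # essentially 0: y is exactly linear in the new features
```

The raw-feature error is of the same order as the MAE of about 6.35 reported for the linear models in this section, while the engineered feature removes it completely.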
2.1.17.1. Code for creating the PassiveAggressiveRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.PassiveAggressiveRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training PassiveAggressiveRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import PassiveAggressiveRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "PassiveAggressiveRegressor"
onnx_model_filename = data_path + "passive_aggressive_regressor"
# create a PassiveAggressiveRegressor model
regression_model = PassiveAggressiveRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# cast the input data to float64 to match the double model's input tensor
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python PassiveAggressiveRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9894376841493092
Python Mean Absolute Error: 9.64524669506544
Python Mean Squared Error: 139.76857373191007
Python
Python PassiveAggressiveRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\passive_aggressive_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9894376801868329
Python Mean Absolute Error: 9.645248834431873
Python Mean Squared Error: 139.76862616640122
Python R^2 matching decimal places: 8
Python MAE matching decimal places: 5
Python MSE matching decimal places: 3
Python float ONNX model precision: 5
Python
Python PassiveAggressiveRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\passive_aggressive_regressor_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9894376841493092
Python Mean Absolute Error: 9.64524669506544
Python Mean Squared Error: 139.76857373191007
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 14
Python MSE matching decimal places: 14
Python double ONNX model precision: 14
Fig.58. Results of the PassiveAggressiveRegressor.py (double ONNX)
2.1.17.2. MQL5 code for executing ONNX Models
This code executes the saved passive_aggressive_regressor_float.onnx and passive_aggressive_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                   PassiveAggressiveRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"
#define   ModelName          "PassiveAggressiveRegressor"
#define   ONNXFilenameFloat  "passive_aggressive_regressor_float.onnx"
#define   ONNXFilenameDouble "passive_aggressive_regressor_double.onnx"
#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];
#define   TestFloatModel  1
#define   TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;
   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
PassiveAggressiveRegressor (EURUSD,H1) Testing ONNX float: PassiveAggressiveRegressor (passive_aggressive_regressor_float.onnx)
PassiveAggressiveRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9894376801868329
PassiveAggressiveRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 9.6452488344318716
PassiveAggressiveRegressor (EURUSD,H1) MQL5: Mean Squared Error: 139.7686261664012761
PassiveAggressiveRegressor (EURUSD,H1)
PassiveAggressiveRegressor (EURUSD,H1) Testing ONNX double: PassiveAggressiveRegressor (passive_aggressive_regressor_double.onnx)
PassiveAggressiveRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9894376841493092
PassiveAggressiveRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 9.6452466950654419
PassiveAggressiveRegressor (EURUSD,H1) MQL5: Mean Squared Error: 139.7685737319100667
Comparison with the original double precision model in Python:
Testing ONNX float: PassiveAggressiveRegressor (passive_aggressive_regressor_float.onnx)
Python Mean Absolute Error: 9.64524669506544
MQL5: Mean Absolute Error: 9.6452488344318716
Testing ONNX double: PassiveAggressiveRegressor (passive_aggressive_regressor_double.onnx)
Python Mean Absolute Error: 9.64524669506544
MQL5: Mean Absolute Error: 9.6452466950654419
Accuracy of ONNX float MAE: 5 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
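The compare_decimal_places helper used in these scripts compares the two numbers character by character after the decimal point. It ignores the integer part and stops at the first differing digit, so values that are numerically very close can still score low (0.1999999 and 0.2000001 share no matching decimal digits). As a complementary check, here is a minimal sketch of a tolerance-based measure: the largest integer k with |a - b| <= 10^-k. The function name tolerance_decimals and the cap of 16 digits are our own assumptions, not part of the original scripts; the input values are taken from the PassiveAggressiveRegressor output above.

```python
import math


def tolerance_decimals(a, b, max_digits=16):
    """Largest integer k (between 0 and max_digits) with |a - b| <= 10**-k.

    Hypothetical helper, not part of the article's scripts. It measures the
    numeric difference, so carries (0.1999999 vs 0.2000001) and differences
    in the integer part are handled consistently.
    """
    diff = abs(a - b)
    if diff == 0:
        return max_digits  # identical values: report the cap
    # floor(-log10(diff)) is the largest k with 10**-k >= diff
    return max(0, min(max_digits, math.floor(-math.log10(diff))))


# PassiveAggressiveRegressor: Python double model vs float ONNX model
print(tolerance_decimals(9.64524669506544, 9.645248834431873))     # MAE -> 5
print(tolerance_decimals(0.9894376841493092, 0.9894376801868329))  # R^2 -> 8
print(tolerance_decimals(139.76857373191007, 139.76862616640122))  # MSE -> 4
# the digit-by-digit comparison finds 0 matching decimals here
print(tolerance_decimals(0.1999999, 0.2000001))                    # -> 6
```

For the MAE and R^2 both measures agree with the reported 5 and 8. For the MSE they differ (4 here, 3 in the output above): the values differ by only about 5.2e-5, yet their fourth decimal digits already differ (5 vs 6). Which definition of "matching decimal places" to use is a convention; the tolerance form is easier to turn into an automated assertion.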
2.1.17.3. ONNX representation of the passive_aggressive_regressor_float.onnx and passive_aggressive_regressor_double.onnx
Fig.59. ONNX representation of the passive_aggressive_regressor_float.onnx in Netron
Fig.60. ONNX representation of the passive_aggressive_regressor_double.onnx in Netron
2.1.18. sklearn.linear_model.QuantileRegressor
QuantileRegressor is a machine learning method used to estimate quantiles (specific percentiles) of the target variable in regression tasks.
Instead of predicting the mean value of the target variable, as typically done in regression tasks, QuantileRegressor predicts values corresponding to specified quantiles, such as the median (50th percentile) or the 25th and 75th percentiles.
How QuantileRegressor works:
- Input data: It begins with a dataset containing features (independent variables) and the target variable (continuous).
- Quantile focus: Instead of predicting the conditional mean of the target variable, QuantileRegressor estimates a chosen conditional quantile of the target variable given the features.
- Training for different quantiles: A single QuantileRegressor model estimates exactly one quantile, so obtaining several quantiles requires training a separate model for each. Each of these models predicts the value corresponding to its own quantile.
- Quantile parameter: The main parameter of this method is the desired quantile, set through the quantile argument (0.5 by default). For example, to predict the median, train the model with quantile=0.5.
- Quantile prediction: After training, the model can be used to predict values corresponding to specified quantiles on new data.
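Which quantile a model targets is determined by the loss it minimizes. Scikit-learn's QuantileRegressor optimizes the pinball (quantile) loss, which weights under-predictions by q and over-predictions by 1 - q, combined with L1 regularization. The sketch below is a simplified illustration, not the scikit-learn implementation: it fits only a constant prediction, searched among the observed values, to show that minimizing the pinball loss recovers the empirical q-quantile and that the median is not pulled toward an extreme value. The function names pinball_loss and best_constant and the toy data are hypothetical.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1)."""
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        # under-prediction (diff > 0) costs q per unit, over-prediction costs 1 - q
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y, q):
    """Observed value c that minimizes the pinball loss of predicting c for every sample."""
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), q))


y = [1, 2, 3, 4, 100]  # toy data with one extreme value
for q in (0.25, 0.5, 0.75):
    print(q, best_constant(y, q))
# 0.25 2
# 0.5 3
# 0.75 4
print(sum(y) / len(y))  # the mean, 22.0, is pulled toward the outlier
```

Restricting the search to observed values is sufficient here because the pinball loss of a constant is piecewise linear with breakpoints at the data points, so a minimizer always lies on one of them. A linear quantile model replaces the constant with w*x + b but keeps the same loss.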
Advantages of QuantileRegressor:
- Flexibility: QuantileRegressor provides flexibility in predicting various quantiles, which can be useful in tasks where different percentiles of the distribution are important.
- Robustness to outliers: A quantile-oriented approach can be robust against outliers as it does not consider the mean, which can be heavily influenced by extreme values.
Limitations of QuantileRegressor:
- Need for quantile selection: Choosing optimal quantiles might require some knowledge about the task.
- Increased computational complexity: Training separate models for different quantiles can increase the computational complexity of the task.
QuantileRegressor is a machine learning method designed to predict values corresponding to specified quantiles of the target variable. This method can be useful in tasks where various percentiles of the distribution are of interest and in cases where data may contain outliers.
2.1.18.1. Code for creating the QuantileRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.QuantileRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training QuantileRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import QuantileRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "QuantileRegressor"
onnx_model_filename = data_path + "quantile_regressor"
# create a QuantileRegressor model
regression_model = QuantileRegressor(solver='highs')
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# cast the input data to float32 to match the float model's input tensor
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# cast the input data to float64 to match the double model's input tensor
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python QuantileRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9959915738839231
Python Mean Absolute Error: 6.3693091850025185
Python Mean Squared Error: 53.0425343337143
Python
Python QuantileRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\quantile_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9959915739158818
Python Mean Absolute Error: 6.3693091422201125
Python Mean Squared Error: 53.042533910812814
Python R^2 matching decimal places: 9
Python MAE matching decimal places: 7
Python MSE matching decimal places: 5
Python float ONNX model precision: 7
Python
Python QuantileRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\quantile_regressor_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9959915738839231
Python Mean Absolute Error: 6.3693091850025185
Python Mean Squared Error: 53.0425343337143
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 16
Python MSE matching decimal places: 13
Python double ONNX model precision: 16
Fig.61. Results of the QuantileRegressor.py (float ONNX)
2.1.18.2. MQL5 code for executing ONNX Models
This code executes the saved quantile_regressor_float.onnx and quantile_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                            QuantileRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"
#define   ModelName          "QuantileRegressor"
#define   ONNXFilenameFloat  "quantile_regressor_float.onnx"
#define   ONNXFilenameDouble "quantile_regressor_double.onnx"
#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];
#define   TestFloatModel  1
#define   TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;
   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
QuantileRegressor (EURUSD,H1) Testing ONNX float: QuantileRegressor (quantile_regressor_float.onnx)
QuantileRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9959915739158818
QuantileRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 6.3693091422201169
QuantileRegressor (EURUSD,H1) MQL5: Mean Squared Error: 53.0425339108128071
QuantileRegressor (EURUSD,H1)
QuantileRegressor (EURUSD,H1) Testing ONNX double: QuantileRegressor (quantile_regressor_double.onnx)
QuantileRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9959915738839231
QuantileRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 6.3693091850025185
QuantileRegressor (EURUSD,H1) MQL5: Mean Squared Error: 53.0425343337142721
Comparison with the original double precision model in Python:
Testing ONNX float: QuantileRegressor (quantile_regressor_float.onnx)
Python Mean Absolute Error: 6.3693091850025185
MQL5: Mean Absolute Error: 6.3693091422201169
Testing ONNX double: QuantileRegressor (quantile_regressor_double.onnx)
Python Mean Absolute Error: 6.3693091850025185
MQL5: Mean Absolute Error: 6.3693091850025185
Accuracy of ONNX float MAE: 7 decimal places, Accuracy of ONNX double MAE: 16 decimal places.
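The gap between the float and double results is what one would expect from 32-bit arithmetic: a float32 value has a 24-bit significand, so each rounding introduces a relative error of at most 2^-24 (about 6e-8), i.e. roughly 7 significant decimal digits. The sketch below imitates this with Python's standard struct module. It evaluates a hypothetical linear model y = w*x + b once in double precision and once with the input, the parameters and every intermediate result rounded to float32. The coefficients w and b are made up for illustration, they are not the fitted parameters of the models above, and a real ONNX runtime may order its operations differently.

```python
import struct


def to_float32(value):
    """Round a Python float (double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def predict_double(x, w, b):
    return w * x + b


def predict_float32(x, w, b):
    # round the input and the parameters to float32 ...
    xf, wf, bf = to_float32(x), to_float32(w), to_float32(b)
    # ... and round every intermediate result as well
    return to_float32(to_float32(wf * xf) + bf)


w, b = 3.978123456789, 1.234567890123  # hypothetical coefficients
xs = range(100)                          # same x grid as the synthetic data
max_diff = max(abs(predict_double(x, w, b) - predict_float32(x, w, b)) for x in xs)
print(f"max |double - float32| over the grid: {max_diff:.3e}")
```

Because float32 precision is relative, larger quantities keep fewer correct decimal places. This fits the pattern in the float results above, where R^2 (close to 1) matches more decimal places than the MAE (about 6.4), which in turn matches more than the MSE (about 53).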
2.1.18.3. ONNX representation of the quantile_regressor_float.onnx and quantile_regressor_double.onnx
Fig.62. ONNX representation of the quantile_regressor_float.onnx in Netron
Fig.63. ONNX representation of the quantile_regressor_double.onnx in Netron
2.1.19. sklearn.linear_model.RANSACRegressor
RANSACRegressor is a machine learning method used to solve regression problems using the RANSAC (Random Sample Consensus) method.
The RANSAC method is designed to handle data containing outliers or imperfections, allowing for a more robust regression model by excluding the influence of outliers.
How RANSACRegressor works:
- Input data: It begins with a dataset containing features (independent variables) and the target variable (continuous).
- Selection of random subsets: RANSAC starts by drawing random subsets of the data (in scikit-learn, min_samples points each; for the default linear base estimator this is the number of features plus one). The models fitted to these subsets serve as candidate "hypotheses."
- Fitting model to hypotheses: A regression model is trained on each chosen subset. By default, RANSACRegressor uses LinearRegression as its base estimator, but another regressor can be supplied.
- Outlier evaluation: After training the model, its fit to all the data is evaluated. The error between the predicted and actual values is computed for each data point.
- Outlier identification: Data points whose errors exceed a specified threshold (set by the residual_threshold parameter in scikit-learn) are considered outliers; the remaining points form the set of inliers. Outliers left in the training data could distort the model.
- Iteration: The sampling, fitting and evaluation steps are repeated with new random subsets (up to max_trials times), and the candidate with the largest set of inliers is kept; ties are broken by the model score.
- Final model: RANSACRegressor refits the model on all inliers of the best candidate and returns it as the final regression model.
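The loop described above can be sketched in a few lines of NumPy for the one-feature case. This is an illustrative sketch, not scikit-learn's implementation: the function name ransac_line, the fixed number of trials and the plain "most inliers wins" rule are assumptions made for brevity (scikit-learn additionally supports stop criteria, validity checks and score-based tie-breaking).

```python
import numpy as np

def ransac_line(x, y, residual_threshold, max_trials=100, seed=0):
    """Fit y = a*x + b with a minimal RANSAC loop (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    best_inliers = None
    for _ in range(max_trials):
        # draw a minimal random subset: two distinct points define a line
        i, j = rng.choice(n, size=2, replace=False)
        if x[i] == x[j]:
            continue  # degenerate sample, the slope is undefined
        # fit the candidate model (the "hypothesis") to the subset
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        # evaluate the candidate on all points and collect its inliers
        inliers = np.abs(y - (a * x + b)) <= residual_threshold
        # keep the candidate with the largest set of inliers
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None:
        raise ValueError("no non-degenerate sample was drawn")
    # refit the final model by least squares on the best set of inliers
    a, b = np.polyfit(x[best_inliers], y[best_inliers], deg=1)
    return a, b, best_inliers

# a clean line with three gross outliers
x = np.arange(50, dtype=float)
y = 2.0 * x + 1.0
y[[5, 17, 33]] += 100.0
a, b, mask = ransac_line(x, y, residual_threshold=5.0)
print(f"slope={a:.6f}, intercept={b:.6f}, outliers={np.flatnonzero(~mask).tolist()}")
```

Because any candidate fitted to two clean points reproduces the true line exactly, its inlier set excludes the three shifted points, and the final least-squares refit ignores them.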
Advantages of RANSACRegressor:
- Outlier robustness: RANSACRegressor is a robust method against outliers as it excludes them from training.
- Robust regression: This method enables the creation of a more reliable regression model when data contains outliers or imperfections.
Limitations of RANSACRegressor:
- Sensitivity to error threshold: Choosing an error threshold to determine which points are considered outliers might require experimentation.
- Complexity of hypothesis selection: Choosing good hypotheses at the initial stage might not be a straightforward task.
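The threshold sensitivity is easy to see on the article's own data. scikit-learn documents that, when residual_threshold is not set, it uses the median absolute deviation (MAD) of the target values. The sketch below computes that default for the synthetic data used in RANSACRegressor.py and compares it with the residuals of an ordinary least-squares line (np.polyfit stands in for a fitted candidate; this is an assumption for illustration).

```python
import numpy as np

# the same synthetic data as in RANSACRegressor.py
X = np.arange(0, 100, 1).reshape(-1, 1)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

# documented scikit-learn default for residual_threshold: MAD of the targets
mad = np.median(np.abs(y - np.median(y)))

# residuals of an ordinary least-squares line (stand-in for a candidate model)
slope, intercept = np.polyfit(X.ravel(), y, deg=1)
residuals = np.abs(y - (slope * X.ravel() + intercept))

print(f"default threshold (MAD of y): {mad:.2f}")
print(f"largest least-squares residual: {residuals.max():.2f}")
print("points beyond the threshold:", int((residuals > mad).sum()))
```

Since the default threshold is derived from the spread of y rather than from the residuals, it ends up far above the sine-term residuals here. Presumably every point then lands in the consensus set, so on this dataset RANSACRegressor should behave much like plain least squares; a smaller residual_threshold would be needed before any points are excluded.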
In short, RANSACRegressor applies the RANSAC method to regression problems: by excluding the influence of outliers, it builds a more robust regression model when the data contains outliers or imperfections.
2.1.19.1. Code for creating the RANSACRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.RANSACRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training RANSACRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import RANSACRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "RANSACRegressor"
onnx_model_filename = data_path + "ransac_regressor"
# create a RANSACRegressor model
regression_model = RANSACRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 for the float model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("ONNX: MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 for the double model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python RANSACRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9962382642613388
Python Mean Absolute Error: 6.347737926336427
Python Mean Squared Error: 49.77814017128179
Python
Python RANSACRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ransac_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382641628886
Python Mean Absolute Error: 6.3477377671679385
Python Mean Squared Error: 49.77814147404787
Python R^2 matching decimal places: 9
Python MAE matching decimal places: 6
Python ONNX: MSE matching decimal places: 5
Python float ONNX model precision: 6
Python
Python RANSACRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\ransac_regressor_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382642613388
Python Mean Absolute Error: 6.347737926336427
Python Mean Squared Error: 49.77814017128179
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 15
Python MSE matching decimal places: 14
Python double ONNX model precision: 15
Fig.64. Results of the RANSACRegressor.py (float ONNX)
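compare_decimal_places compares the two numbers as strings, digit by digit, so it can report 0 matching places for values that are numerically very close but straddle a digit boundary (for example 0.3 and 0.29999996). A possible numeric alternative is sketched below under the name matching_decimals, a hypothetical helper that is not part of the article's scripts. It counts the decimal places as floor(-log10|a - b|).

```python
import math

def matching_decimals(a, b, max_places=16):
    """Decimal places to which a and b agree numerically (hypothetical helper)."""
    diff = abs(a - b)
    if diff == 0.0:
        return max_places  # identical values: report the cap
    # floor(-log10|a - b|): a difference of 1.6e-7 counts as 6 places
    return max(0, min(max_places, math.floor(-math.log10(diff))))

# MAE of the original model vs. the float ONNX model (values from the Python output above)
print(matching_decimals(6.347737926336427, 6.3477377671679385))
# close values that straddle a digit boundary; the string-based check reports 0 here
print(matching_decimals(0.3, 0.29999996))
```

For the RANSACRegressor float model, this gives the same 6 places as the string-based check. For the second pair it gives 7 places, where the string comparison stops at the first digit.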
2.1.19.2. MQL5 code for executing ONNX Models
This code executes the saved ransac_regressor_float.onnx and ransac_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| RANSACRegressor.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link "https://www.mql5.com"
#property version "1.00"
#define ModelName "RANSACRegressor"
#define ONNXFilenameFloat "ransac_regressor_float.onnx"
#define ONNXFilenameDouble "ransac_regressor_double.onnx"
#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];
#define TestFloatModel 1
#define TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;
   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
RANSACRegressor (EURUSD,H1) Testing ONNX float: RANSACRegressor (ransac_regressor_float.onnx)
RANSACRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382641628886
RANSACRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 6.3477377671679385
RANSACRegressor (EURUSD,H1) MQL5: Mean Squared Error: 49.7781414740478638
RANSACRegressor (EURUSD,H1)
RANSACRegressor (EURUSD,H1) Testing ONNX double: RANSACRegressor (ransac_regressor_double.onnx)
RANSACRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382642613388
RANSACRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 6.3477379263364266
RANSACRegressor (EURUSD,H1) MQL5: Mean Squared Error: 49.7781401712817768
Comparison with the original double precision model in Python:
Testing ONNX float: RANSACRegressor (ransac_regressor_float.onnx)
Python Mean Absolute Error: 6.347737926336427
MQL5: Mean Absolute Error: 6.3477377671679385
Testing ONNX double: RANSACRegressor (ransac_regressor_double.onnx)
Python Mean Absolute Error: 6.347737926336427
MQL5: Mean Absolute Error: 6.3477379263364266
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
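Single-precision arithmetic helps explain why the float model keeps only about 6 to 7 correct decimal places. float32 has a 24-bit significand, so its machine epsilon is 2^-23 (about 1.19e-7), and values around 400 are rounded in steps of about 3e-5. The sketch below emulates a float-only pipeline in NumPy under two stated assumptions: a least-squares line from np.polyfit stands in for the fitted model, and the input, parameters and arithmetic are simply cast to float32. This is not what onnxruntime executes internally, so its figures will not match the output above.

```python
import numpy as np

# the article's synthetic data in double precision
x = np.arange(0, 100, 1, dtype=np.float64)
y = 4 * x + 10 * np.sin(x * 0.5)

# a least-squares line stands in for the fitted linear model (assumption)
slope, intercept = np.polyfit(x, y, deg=1)
pred64 = slope * x + intercept

# emulated float pipeline: input, parameters and arithmetic all in float32
pred32 = np.float32(slope) * x.astype(np.float32) + np.float32(intercept)

mae64 = np.mean(np.abs(y - pred64))
mae32 = np.mean(np.abs(y - pred32.astype(np.float64)))
print(f"MAE (float64 pipeline): {mae64:.16f}")
print(f"MAE (float32 pipeline): {mae32:.16f}")
print(f"largest prediction difference: {np.max(np.abs(pred64 - pred32)):.3e}")
print(f"float32 machine epsilon: {np.finfo(np.float32).eps:.3e}")
```

Per-prediction differences stay on the order of 1e-5, and they partly cancel in the mean. This is consistent with MAE values that agree to roughly the sixth or seventh decimal place.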
2.1.19.3. ONNX representation of the ransac_regressor_float.onnx and ransac_regressor_double.onnx
Fig.65. ONNX representation of the ransac_regressor_float.onnx in Netron
Fig.66. ONNX representation of the ransac_regressor_double.onnx in Netron
2.1.20. sklearn.linear_model.TheilSenRegressor
Theil-Sen regression (Theil-Sen estimator) is a regression estimation method used to approximate linear relationships between independent variables and the target variable.
It offers a more robust estimate compared to ordinary linear regression in the presence of outliers and noise in the data.
How Theil-Sen regression works:
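- Point selection: Theil-Sen works with pairs of data points from the training dataset. The classic estimator uses all pairs, while scikit-learn's TheilSenRegressor uses all subsets only while their number does not exceed max_subpopulation and draws a random sample of them otherwise.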
- Slope calculation: For each pair of data points, the method computes the slope of the line passing through these points, creating a set of slopes.
- Median slope: Then, the method finds the median slope from the set of slopes. This median slope is used as an estimation of the linear regression slope.
- Median deviations: For each data point, the method computes the deviation (difference between the actual value and the value predicted based on the median slope) and finds the median of these deviations. This creates an estimate for the coefficient of the linear regression intercept.
- Final estimation: The final estimations of the slope and intercept coefficients are used to build the linear regression model.
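The classic estimator described in these steps can be written directly with NumPy and itertools. This is an illustrative sketch for a single feature, and the function name theil_sen_line is hypothetical. scikit-learn's TheilSenRegressor is a multivariate generalization that fits least squares on subsets and takes their spatial median, so its results need not match this sketch exactly. The example injects one gross outlier and compares the result with ordinary least squares.

```python
from itertools import combinations
import numpy as np

def theil_sen_line(x, y):
    """Classic Theil-Sen fit of y = a*x + b for one feature (illustrative sketch)."""
    # slope of the line through every pair of points with distinct x values
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    a = np.median(slopes)        # median slope
    b = np.median(y - a * x)     # intercept: median of y_i - a*x_i
    return a, b

x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[7] = 500.0                     # one gross outlier

a_ts, b_ts = theil_sen_line(x, y)
a_ls, b_ls = np.polyfit(x, y, deg=1)  # ordinary least squares for comparison
print(f"Theil-Sen:     slope={a_ts:.4f}, intercept={b_ts:.4f}")
print(f"least squares: slope={a_ls:.4f}, intercept={b_ls:.4f}")
```

Only 19 of the 190 pairwise slopes involve the outlier. The median slope and median intercept therefore recover the true line, while the least-squares slope is pulled far away from 2.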
Advantages of Theil-Sen regression:
- Outlier resilience: Theil-Sen regression is more robust against outliers and data noise compared to regular linear regression.
- Less strict assumptions: The method does not require strict assumptions about data distribution or dependency form, making it more versatile.
- Suitable for multicollinear data: Theil-Sen regression performs well with data where independent variables are highly correlated (multicollinearity issue).
Limitations of Theil-Sen regression:
- Computational complexity: The number of point pairs grows quadratically with the number of data points, so computing the slopes for all pairs can be time-consuming for large datasets.
- Intercept coefficient estimation: Median deviations are used for estimating the intercept coefficient, which can lead to bias in the presence of outliers.
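The quadratic growth behind the computational-complexity limitation is simple arithmetic: n points form n(n-1)/2 pairs. For the article's 100 points that is 4950 pairs. This is below the default max_subpopulation of 1e4 in scikit-learn's TheilSenRegressor, so presumably all pairs are used there. For larger datasets, scikit-learn switches to a random subset of that size.

```python
from math import comb

# number of point pairs the classic Theil-Sen estimator examines
for n in (100, 1_000, 10_000, 100_000):
    pairs = comb(n, 2)  # n * (n - 1) / 2
    print(f"n = {n:>7,}: {pairs:>15,} pairs")
```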
Theil-Sen regression is an estimation method for regression that provides a stable assessment of the linear relationship between independent variables and the target variable, particularly in the presence of outliers and data noise. This method is useful when a stable estimate is needed under real-world data conditions.
2.1.20.1. Code for creating the TheilSenRegressor and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.TheilSenRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training TheilSenRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import TheilSenRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "TheilSenRegressor"
onnx_model_filename = data_path + "theil_sen_regressor"
# create a TheilSen Regressor model
regression_model = TheilSenRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 for the float model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 for the double model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python TheilSenRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9962329196940459
Python Mean Absolute Error: 6.338686004537594
Python Mean Squared Error: 49.84886353898735
Python
Python TheilSenRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\theil_sen_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.996232919516505
Python Mean Absolute Error: 6.338686370832071
Python Mean Squared Error: 49.84886588834327
Python R^2 matching decimal places: 9
Python MAE matching decimal places: 6
Python MSE matching decimal places: 5
Python float ONNX model precision: 6
Python
Python TheilSenRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\theil_sen_regressor_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962329196940459
Python Mean Absolute Error: 6.338686004537594
Python Mean Squared Error: 49.84886353898735
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 15
Python MSE matching decimal places: 14
Python double ONNX model precision: 15
Fig.67. Results of the TheilSenRegressor.py (float ONNX)
2.1.20.2. MQL5 code for executing ONNX Models
This code executes the saved theil_sen_regressor_float.onnx and theil_sen_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                            TheilSenRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "TheilSenRegressor"
#define   ONNXFilenameFloat  "theil_sen_regressor_float.onnx"
#define   ONNXFilenameDouble "theil_sen_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel   1
#define   TestDoubleModel  2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
TheilSenRegressor (EURUSD,H1) Testing ONNX float: TheilSenRegressor (theil_sen_regressor_float.onnx)
TheilSenRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962329195165051
TheilSenRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 6.3386863708320735
TheilSenRegressor (EURUSD,H1) MQL5: Mean Squared Error: 49.8488658883432691
TheilSenRegressor (EURUSD,H1)
TheilSenRegressor (EURUSD,H1) Testing ONNX double: TheilSenRegressor (theil_sen_regressor_double.onnx)
TheilSenRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962329196940459
TheilSenRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 6.3386860045375943
TheilSenRegressor (EURUSD,H1) MQL5: Mean Squared Error: 49.8488635389873735
Comparison with the original double precision model in Python:
Testing ONNX float: TheilSenRegressor (theil_sen_regressor_float.onnx)
Python Mean Absolute Error: 6.338686004537594
MQL5: Mean Absolute Error: 6.3386863708320735
Testing ONNX double: TheilSenRegressor (theil_sen_regressor_double.onnx)
Python Mean Absolute Error: 6.338686004537594
MQL5: Mean Absolute Error: 6.3386860045375943
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 15 decimal places.
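The gap between the float and double results is consistent with the resolution of 32-bit floats: float32 has a 24-bit significand, so rounding a value to float32 can change it by up to about 2^-24 (roughly 6e-8) of its magnitude, i.e. about seven significant decimal digits. For an MAE near 6.34, that leaves room for roughly six matching decimal places at best, before any error accumulated inside the model is counted. The sketch below uses only the standard struct module; the helper name to_float32 is illustrative, and rounding just the final MAE is a simplification of what the float model actually does (it computes every intermediate value in float32).

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest float32 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# MAE of the original double-precision TheilSenRegressor model (Python output above)
mae_double = 6.338686004537594
mae_as_float32 = to_float32(mae_double)

# relative rounding error introduced by storing the value as float32
rel_error = abs(mae_as_float32 - mae_double) / abs(mae_double)

print(f"double : {mae_double!r}")
print(f"float32: {mae_as_float32!r}")
print(f"relative error: {rel_error:.2e} (bound 2**-24 = {2**-24:.2e})")
```

The bound of 2^-24 holds for round-to-nearest conversion of any value in the normal float32 range, which is why the float models in this article tend to agree with the double results only to about six or seven significant digits.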
2.1.20.3. ONNX representation of the theil_sen_regressor_float.onnx and theil_sen_regressor_double.onnx
Fig.68. ONNX representation of the theil_sen_regressor_float.onnx in Netron
Fig.69. ONNX representation of the theil_sen_regressor_double.onnx in Netron
2.1.21. sklearn.svm.LinearSVR
LinearSVR (Linear Support Vector Regression) is a machine learning model for regression tasks based on the Support Vector Machines (SVM) method.
This method is used to find linear relationships between features and the target variable using a linear kernel.
How LinearSVR works:
- Input data: LinearSVR begins with a dataset that includes features (independent variables) and their corresponding target variable values.
- Selecting a linear model: The model assumes there's a linear relationship between the features and the target variable, described by a linear regression equation.
- Model training: LinearSVR finds the model's coefficients by minimizing a regularized epsilon-insensitive loss: prediction errors smaller than the acceptable error (epsilon) are ignored, while larger errors are penalized (linearly, with the default loss).
- Generating predictions: After training, the model can predict the target variable values for new data based on the discovered coefficients.
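The training objective can be illustrated with a small sketch of what LinearSVR minimizes with its default loss, following the primal form given in the scikit-learn user guide: an L2 penalty on the coefficients plus C times the sum of epsilon-insensitive errors. This is plain Python for a single feature that only evaluates the objective for given coefficients; it is not the liblinear solver scikit-learn actually uses, it ignores how the intercept is handled internally, and the function names are illustrative.

```python
def epsilon_insensitive(residual: float, epsilon: float) -> float:
    """Zero inside the epsilon-tube, grows linearly outside it."""
    return max(0.0, abs(residual) - epsilon)


def linear_svr_objective(w: float, b: float, xs, ys, C: float = 1.0, epsilon: float = 0.0) -> float:
    """0.5 * w**2 + C * sum(max(0, |y - (w*x + b)| - epsilon)) for one feature.

    The defaults C=1.0 and epsilon=0.0 follow scikit-learn's LinearSVR defaults.
    """
    penalty = 0.5 * w * w
    data_term = sum(epsilon_insensitive(y - (w * x + b), epsilon) for x, y in zip(xs, ys))
    return penalty + C * data_term


# toy data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(linear_svr_objective(2.0, 1.0, xs, ys))               # exact fit: only the penalty 0.5*2**2 remains
print(linear_svr_objective(0.0, 0.0, xs, ys))               # zero coefficients: no penalty, large data term
print(linear_svr_objective(2.0, 0.0, xs, ys, epsilon=1.0))  # residuals of 1.0 fall inside the tube
```

The third call shows the role of epsilon: a line that is off by exactly 1.0 everywhere costs nothing beyond the penalty once epsilon is 1.0.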
Advantages of LinearSVR:
- Support Vector Regression: LinearSVR applies the Support Vector Machines approach to regression: it looks for a function that keeps as many points as possible within the acceptable error (epsilon) while keeping the coefficients small.
- Support for multiple features: The model can handle multiple features and process data in high dimensions.
- Regularization: LinearSVR uses L2 regularization (its strength is controlled by the parameter C; smaller values mean stronger regularization), which helps combat overfitting and makes predictions more stable.
Limitations of LinearSVR:
- Linearity: LinearSVR assumes a linear relationship between the features and the target variable. For complex, nonlinear relationships, the model may not be flexible enough.
- Sensitivity to outliers: The model can be sensitive to outliers in the data and to the choice of the acceptable error (epsilon).
- Inability to capture complex relationships: LinearSVR, like other linear models, is unable to capture complex nonlinear relationships between features and the target variable.
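This limitation is easy to see on the synthetic data used throughout this article, y = 4x + 10*sin(0.5x): any straight line can follow the 4x trend, but the sine term stays in the residuals. The sketch below uses an ordinary least-squares line as a simple stand-in for a linear model (it is not LinearSVR) and reports the resulting MAE. Because the sine amplitude is 10, the residuals remain on the order of the mean of |10*sin| (about 6.4), which is in the same range as the MAE values reported for the linear models in this article.

```python
import math

# the same synthetic data as in the scripts: y = 4x + 10*sin(0.5x), x = 0..99
xs = [float(i) for i in range(100)]
ys = [4 * x + 10 * math.sin(0.5 * x) for x in xs]


def least_squares_line(xs, ys):
    """Closed-form ordinary least squares fit y ~ slope*x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares_line(xs, ys)
mae = sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)
print(f"slope={slope:.4f} intercept={intercept:.4f} MAE={mae:.4f}")
```

The slope comes out close to 4, so the linear trend is recovered, but no choice of slope and intercept can remove the oscillating part of the error.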
LinearSVR is a regression machine learning model that utilizes the Support Vector Machines method to find linear relationships between features and the target variable. It supports regularization and can be used in tasks where controlling acceptable error is essential. However, the model is limited by its linear dependence and might be sensitive to outliers.
2.1.21.1. Code for creating the LinearSVR model and exporting it to ONNX for float and double
This code creates the sklearn.svm.LinearSVR model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# LinearSVR.py
# The code demonstrates the process of training LinearSVR model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import LinearSVR
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "LinearSVR"
onnx_model_filename = data_path + "linear_svr"
# create a Linear SVR model
linear_svr_model = LinearSVR()
# fit the model to the data
linear_svr_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = linear_svr_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(linear_svr_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's FloatTensorType input
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(linear_svr_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's DoubleTensorType input
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python LinearSVR Original model (double)
Python R-squared (Coefficient of determination): 0.9944935515149387
Python Mean Absolute Error: 7.026852359381935
Python Mean Squared Error: 72.86550241109444
Python
Python LinearSVR ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\linear_svr_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9944935580726729
Python Mean Absolute Error: 7.026849848037511
Python Mean Squared Error: 72.86541563418206
Python R^2 matching decimal places: 8
Python MAE matching decimal places: 4
Python MSE matching decimal places: 3
Python float ONNX model precision: 4
Python
Python LinearSVR ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\linear_svr_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9944935515149387
Python Mean Absolute Error: 7.026852359381935
Python Mean Squared Error: 72.86550241109444
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 15
Python MSE matching decimal places: 14
Python double ONNX model precision: 15
Fig.70. Results of the LinearSVR.py (float ONNX)
2.1.21.2. MQL5 code for executing ONNX Models
This code executes the saved linear_svr_float.onnx and linear_svr_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                                    LinearSVR.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "LinearSVR"
#define   ONNXFilenameFloat  "linear_svr_float.onnx"
#define   ONNXFilenameDouble "linear_svr_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel   1
#define   TestDoubleModel  2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
LinearSVR (EURUSD,H1) Testing ONNX float: LinearSVR (linear_svr_float.onnx)
LinearSVR (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9944935580726729
LinearSVR (EURUSD,H1) MQL5: Mean Absolute Error: 7.0268498480375108
LinearSVR (EURUSD,H1) MQL5: Mean Squared Error: 72.8654156341820567
LinearSVR (EURUSD,H1)
LinearSVR (EURUSD,H1) Testing ONNX double: LinearSVR (linear_svr_double.onnx)
LinearSVR (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9944935515149387
LinearSVR (EURUSD,H1) MQL5: Mean Absolute Error: 7.0268523593819374
LinearSVR (EURUSD,H1) MQL5: Mean Squared Error: 72.8655024110944680
Comparison with the original double precision model in Python:
Testing ONNX float: LinearSVR (linear_svr_float.onnx)
Python Mean Absolute Error: 7.026852359381935
MQL5: Mean Absolute Error: 7.0268498480375108
Testing ONNX double: LinearSVR (linear_svr_double.onnx)
Python Mean Absolute Error: 7.026852359381935
MQL5: Mean Absolute Error: 7.0268523593819374
Accuracy of ONNX float MAE: 4 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
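The accuracy figures above count how many digits after the decimal point agree, the same idea as compare_decimal_places in the Python scripts. A minimal sketch that applies it directly to the printed strings, so that the 16-digit MQL5 output is compared exactly as printed, could look like this; matching_decimals is a hypothetical helper, and like the original function it does not compare the integer parts.

```python
def matching_decimals(a: str, b: str) -> int:
    """Count leading digits after the decimal point that agree in two printed numbers."""
    frac_a = a.split(".", 1)[1] if "." in a else ""
    frac_b = b.split(".", 1)[1] if "." in b else ""
    count = 0
    for digit_a, digit_b in zip(frac_a, frac_b):
        if digit_a != digit_b:
            break
        count += 1
    return count


python_mae = "7.026852359381935"  # original double model (Python)
print(matching_decimals(python_mae, "7.0268498480375108"))  # MQL5, float ONNX model
print(matching_decimals(python_mae, "7.0268523593819374"))  # MQL5, double ONNX model
```

Comparing strings rather than numbers keeps the result tied to the printed output, but it also means the count depends on how many digits each environment prints.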
2.1.21.3. ONNX representation of the linear_svr_float.onnx and linear_svr_double.onnx
Fig.71. ONNX representation of the linear_svr_float.onnx in Netron
Fig.72. ONNX representation of the linear_svr_double.onnx in Netron
2.1.22. sklearn.neural_network.MLPRegressor
MLPRegressor (Multi-Layer Perceptron Regressor) is a machine learning model that utilizes artificial neural networks for regression tasks.
It's a multi-layer neural network comprising several layers of neurons (including input, hidden, and output layers) that are trained to predict continuous values of the target variable.
How MLPRegressor works:
- Input data: It starts with a dataset containing features (independent variables) and their corresponding target variable values.
- Creating a multi-layer neural network: MLPRegressor builds a network with one or more hidden layers of neurons. The neurons are linked by weighted connections, and each hidden layer applies an activation function.
- Model training: MLPRegressor trains the network by adjusting the weights and biases to minimize a loss function (squared error) that measures the disparity between the network's predictions and the actual target variable values. The gradients are computed with backpropagation, and the weights are updated by an optimizer (Adam by default).
- Generating predictions: After training, the model can predict target variable values for new data.
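The prediction step is a chain of affine transforms: each hidden layer multiplies its inputs by a weight matrix, adds a bias, and applies the activation, while for regression the output layer stays linear (identity). A plain-Python sketch follows. The weight layout mirrors scikit-learn's coefs_ (one matrix per layer, shaped inputs x outputs) and intercepts_ attributes, but the tiny hand-written weights are made up for illustration, the helper names are hypothetical, and ReLU is assumed as the hidden activation (the one used in the script below).

```python
def dense(x, weights, bias):
    """Affine layer: out[j] = sum_i x[i] * weights[i][j] + bias[j].

    weights is laid out inputs x outputs, like each matrix in scikit-learn's coefs_.
    """
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]


def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, t) for t in v]


def mlp_forward(x, coefs, intercepts):
    """ReLU on every hidden layer, identity (no activation) on the output layer."""
    h = list(x)
    last = len(coefs) - 1
    for layer, (weights, bias) in enumerate(zip(coefs, intercepts)):
        h = dense(h, weights, bias)
        if layer < last:
            h = relu(h)
    return h


# hypothetical 1-2-1 network with hand-picked weights
coefs = [
    [[1.0, -1.0]],   # input (1) -> hidden (2)
    [[3.0], [5.0]],  # hidden (2) -> output (1)
]
intercepts = [
    [0.0, 0.0],
    [1.0],
]

print(mlp_forward([2.0], coefs, intercepts))
print(mlp_forward([-1.0], coefs, intercepts))
```

With a fitted model, the same function could in principle be fed [W.tolist() for W in model.coefs_] and [b.tolist() for b in model.intercepts_]; the ONNX export of MLPRegressor encodes essentially this sequence of matrix multiplications, additions, and activations.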
Advantages of MLPRegressor:
- Flexibility: Multi-layer neural networks can model complex nonlinear relationships between features and the target variable.
- Versatility: MLPRegressor can be used for various regression tasks, including time series problems, function approximation, and more.
- Generalization ability: Neural networks learn from data and can generalize the dependencies found in the training data to new data.
Limitations of MLPRegressor:
- Computational cost: Large neural networks can be computationally expensive and require large amounts of training data.
- Hyperparameter tuning: Choosing optimal hyperparameters (number of layers, number of neurons in each layer, learning rate, etc.) might require experimentation.
- Susceptibility to overfitting: Large neural networks can be prone to overfitting if there's insufficient data or insufficient regularization.
MLPRegressor represents a powerful machine learning model based on multi-layer neural networks and can be used for a wide range of regression tasks. This model is flexible but requires meticulous tuning and training on large volumes of data to achieve optimal results.
2.1.22.1. Code for creating the MLPRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.neural_network.MLPRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training MLPRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "MLPRegressor"
onnx_model_filename = data_path + "mlp_regressor"
# create an MLP Regressor model
mlp_regressor_model = MLPRegressor(hidden_layer_sizes=(100, 50), activation='relu', max_iter=1000)
# fit the model to the data
mlp_regressor_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = mlp_regressor_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(mlp_regressor_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's tensor(float) input
X_float32 = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float32})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(mlp_regressor_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's tensor(double) input
X_float64 = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_float64})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python MLPRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9874070836467945
Python Mean Absolute Error: 10.62249788982753
Python Mean Squared Error: 166.63901957615224
Python
Python MLPRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\mlp_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9874070821340352
Python Mean Absolute Error: 10.62249972216809
Python Mean Squared Error: 166.63903959413219
Python R^2 matching decimal places: 8
Python MAE matching decimal places: 5
Python MSE matching decimal places: 4
Python float ONNX model precision: 5
Python
Python MLPRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\mlp_regressor_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9874070836467945
Python Mean Absolute Error: 10.622497889827532
Python Mean Squared Error: 166.63901957615244
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 14
Python MSE matching decimal places: 12
Python double ONNX model precision: 14
Fig.73. Results of the MLPRegressor.py (float ONNX)
2.1.22.2. MQL5 code for executing ONNX Models
This code executes the saved mlp_regressor_float.onnx and mlp_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| MLPRegressor.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "MLPRegressor"
#define   ONNXFilenameFloat  "mlp_regressor_float.onnx"
#define   ONNXFilenameDouble "mlp_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
MLPRegressor (EURUSD,H1) Testing ONNX float: MLPRegressor (mlp_regressor_float.onnx)
MLPRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9875198695654352
MLPRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 10.5596681685341309
MLPRegressor (EURUSD,H1) MQL5: Mean Squared Error: 165.1465507645494597
MLPRegressor (EURUSD,H1)
MLPRegressor (EURUSD,H1) Testing ONNX double: MLPRegressor (mlp_regressor_double.onnx)
MLPRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9875198617341387
MLPRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 10.5596715833884609
MLPRegressor (EURUSD,H1) MQL5: Mean Squared Error: 165.1466543942046599
Comparison with the original double precision model in Python:
Testing ONNX float: MLPRegressor (mlp_regressor_float.onnx)
Python Mean Absolute Error: 10.62249788982753
MQL5: Mean Absolute Error: 10.6224997221680901

Testing ONNX double: MLPRegressor (mlp_regressor_double.onnx)
Python Mean Absolute Error: 10.62249788982753
MQL5: Mean Absolute Error: 10.6224978898275282
Accuracy of ONNX float MAE: 5 decimal places, Accuracy of ONNX double MAE: 13 decimal places.
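The gap between the float and double results is roughly what the number formats alone would predict: a float32 value carries a 24-bit significand (about 7 significant decimal digits), so for an MAE near 10.6 only about 5 decimal places can be expected to agree, while float64 (53-bit significand, about 15 to 16 digits) leaves room for 13 or more. The following sketch illustrates this under stated assumptions. It uses a hypothetical network with the same architecture as the model above (one input, ReLU hidden layers of 100 and 50 units, identity output, as MLPRegressor uses) but random weights instead of the trained coefficients, evaluates the same forward pass once in float64 and once in float32, and compares the results. It does not call ONNX Runtime; it only imitates the storage precision of the two exported models, and the helper name `mlp_forward` is illustrative.

```python
import numpy as np

# Hypothetical stand-in for a trained MLPRegressor: random weights, same layer sizes
# (1 input -> 100 ReLU -> 50 ReLU -> 1 identity output). Not the article's trained model.
rng = np.random.default_rng(0)
sizes = [1, 100, 50, 1]
weights = [rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(0.0, 0.1, size=n) for n in sizes[1:]]


def mlp_forward(x, dtype):
    """Evaluate the network with all parameters and intermediate results stored as `dtype`."""
    h = x.astype(dtype)
    for i, (w, b) in enumerate(zip(weights, biases)):
        h = h @ w.astype(dtype) + b.astype(dtype)
        if i < len(weights) - 1:  # ReLU on hidden layers only, identity on the output layer
            h = np.maximum(h, 0)
    return h.astype(np.float64)  # widen to float64 only for the comparison


# same synthetic data as in the Python script above
X = np.arange(0, 100, 1, dtype=np.float64).reshape(-1, 1)
y = 4 * X + 10 * np.sin(X * 0.5)

pred64 = mlp_forward(X, np.float64)
pred32 = mlp_forward(X, np.float32)

mae64 = np.mean(np.abs(y - pred64))
mae32 = np.mean(np.abs(y - pred32))
# largest prediction discrepancy, scaled by the largest prediction magnitude
scaled_err = np.max(np.abs(pred32 - pred64)) / np.max(np.abs(pred64))

print("float32 machine epsilon:", np.finfo(np.float32).eps)  # 2**-23, roughly 1.2e-07
print("float64 machine epsilon:", np.finfo(np.float64).eps)  # 2**-52, roughly 2.2e-16
print("MAE (float64 pass):", mae64)
print("MAE (float32 pass):", mae32)
print("scaled prediction error:", scaled_err)
```

Because the weights are random, the MAE values themselves are meaningless as model quality; only the size of the float32/float64 discrepancy is of interest, and it stays close to the float32 machine epsilon rather than to the float64 one.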
2.1.22.3. ONNX representation of the mlp_regressor_float.onnx and mlp_regressor_double.onnx
Fig.74. ONNX representation of the mlp_regressor_float.onnx in Netron
Fig.75. ONNX representation of the mlp_regressor_double.onnx in Netron
2.1.23. sklearn.cross_decomposition.PLSRegression
PLSRegression (Partial Least Squares Regression) is a machine learning method used for solving regression problems.
It belongs to the PLS family of methods and is used to analyze and model relationships between two sets of variables, where one set serves as predictors and the other contains the target variables.
How PLSRegression works:
- Input data: The method works with two data sets, X and Y. The X set contains the independent variables (predictors), and the Y set contains the dependent (target) variables.
- Selection of linear combinations: PLSRegression identifies linear combinations (components) in sets X and Y that maximize the covariance between them. These components are referred to as PLS components.
- Maximizing covariance: The primary objective of PLSRegression is to find PLS components that maximize the covariance between X and Y. This allows for the extraction of the most informative relationships between predictors and target variables.
- Model training: Once the PLS components are found, the target variables are regressed on them, which yields a linear model that predicts Y from X.
- Generating predictions: After training, the model can be used to predict Y values for new data using corresponding X values.
Advantages of PLSRegression:
- Correlation analysis: PLSRegression enables the analysis and modeling of correlations between two sets of variables, which can be useful for understanding the relationships between predictors and target variables.
- Dimensionality reduction: The method can also be used to reduce the dimensionality of data by identifying the most important PLS components.
Limitations of PLSRegression:
- Sensitivity to the choice of the number of components: The optimal number of PLS components is not known in advance and usually has to be chosen experimentally, for example by cross-validation.
- Dependency on the data structure: PLSRegression results can depend heavily on the structure of the data and on the correlations between variables.
In summary, PLSRegression models the correlations between a set of predictors and a set of target variables through a small number of latent components. This makes it useful for studying relationships within the data, for reducing data dimensionality, and for predicting target values from the predictors.
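For a single target (often called PLS1) and one component, the procedure described above reduces to a few lines of linear algebra. The NumPy sketch below is an illustration, not scikit-learn's implementation: the function name `pls1_one_component` is hypothetical, and it skips the per-column scaling that PLSRegression applies by default. It centers the data, takes the weight vector proportional to Xᵀy (the unit direction w that maximizes the covariance between the scores Xw and y), and then regresses y on those scores. With a single predictor, as in the article's synthetic data, the result should coincide with an ordinary least-squares line, which the last lines check against `np.polyfit`.

```python
import numpy as np


def pls1_one_component(X, y):
    """Fit one-component PLS for a single target on centered, unscaled data.

    Returns a predict function. Sketch only: no scaling, no deflation, one component.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean

    w = Xc.T @ yc               # direction maximizing cov(Xc @ w, yc) among unit vectors
    w /= np.linalg.norm(w)
    t = Xc @ w                  # PLS scores: the single latent component
    q = (t @ yc) / (t @ t)      # least-squares regression of the centered target on the scores

    def predict(X_new):
        X_new = np.asarray(X_new, dtype=np.float64)
        return (X_new - x_mean) @ w * q + y_mean

    return predict


# same synthetic data as in the PLSRegression script
X = np.arange(0, 100, 1).reshape(-1, 1)
y = 4 * X + 10 * np.sin(X * 0.5)
predict = pls1_one_component(X, y)
y_pls = predict(X)

# with a single predictor, one-component PLS coincides with the least-squares line
slope, intercept = np.polyfit(X.ravel(), y.ravel(), 1)
print("max deviation from least squares:", np.max(np.abs(y_pls - (slope * X.ravel() + intercept))))
```

To compare the sketch with scikit-learn directly, `PLSRegression(n_components=1, scale=False)` should be the closest counterpart; with the default `scale=True` each column is first divided by its standard deviation, which can change the weight direction when there are several predictors.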
2.1.23.1. Code for creating the PLSRegression model and exporting it to ONNX for float and double
This code creates the sklearn.cross_decomposition.PLSRegression model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training a PLSRegression model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "PLSRegression"
onnx_model_filename = data_path + "pls_regression"
# create a PLSRegression model
pls_model = PLSRegression(n_components=1)
# fit the model to the data
pls_model.fit(X, y)
# predict values for the entire dataset
y_pred = pls_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(pls_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's tensor(float) input
X_float32 = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float32})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(pls_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's tensor(double) input
X_float64 = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_float64})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python PLSRegression Original model (double)
Python R-squared (Coefficient of determination): 0.9962382642613388
Python Mean Absolute Error: 6.3477379263364275
Python Mean Squared Error: 49.778140171281805
Python
Python PLSRegression ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\pls_regression_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382638567003
Python Mean Absolute Error: 6.3477379221400145
Python Mean Squared Error: 49.778145525764096
Python R^2 matching decimal places: 8
Python MAE matching decimal places: 8
Python MSE matching decimal places: 5
Python float ONNX model precision: 8
Python
Python PLSRegression ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\pls_regression_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9962382642613388
Python Mean Absolute Error: 6.3477379263364275
Python Mean Squared Error: 49.778140171281805
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 16
Python MSE matching decimal places: 15
Python double ONNX model precision: 16
Fig.76. Results of the PLSRegression.py (float ONNX)
2.1.23.2. MQL5 code for executing ONNX Models
This code executes the saved pls_regression_float.onnx and pls_regression_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| PLSRegression.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "PLSRegression"
#define   ONNXFilenameFloat  "pls_regression_float.onnx"
#define   ONNXFilenameDouble "pls_regression_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
PLSRegression (EURUSD,H1) Testing ONNX float: PLSRegression (pls_regression_float.onnx)
PLSRegression (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382638567003
PLSRegression (EURUSD,H1) MQL5: Mean Absolute Error: 6.3477379221400145
PLSRegression (EURUSD,H1) MQL5: Mean Squared Error: 49.7781455257640815
PLSRegression (EURUSD,H1)
PLSRegression (EURUSD,H1) Testing ONNX double: PLSRegression (pls_regression_double.onnx)
PLSRegression (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962382642613388
PLSRegression (EURUSD,H1) MQL5: Mean Absolute Error: 6.3477379263364275
PLSRegression (EURUSD,H1) MQL5: Mean Squared Error: 49.7781401712817839
Comparison with the original double precision model in Python:
Testing ONNX float: PLSRegression (pls_regression_float.onnx)
Python Mean Absolute Error: 6.3477379263364275
MQL5: Mean Absolute Error: 6.3477379221400145

Testing ONNX double: PLSRegression (pls_regression_double.onnx)
Python Mean Absolute Error: 6.3477379263364275
MQL5: Mean Absolute Error: 6.3477379263364275
Accuracy of ONNX float MAE: 8 decimal places, Accuracy of ONNX double MAE: 16 decimal places.
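The decimal-place figures above come from a string comparison like the compare_decimal_places() helper in the Python scripts. The sketch below re-indents that helper. For contrast, it adds a hypothetical numeric variant, agreement_digits(), which is not part of the original scripts.

The string version has two blind spots:
- It only looks at digits after the decimal point.
- It can report 0 matching places for values that straddle a digit boundary, such as 0.19999999 and 0.2.

```python
import math


def compare_decimal_places(value1, value2):
    """Count leading matching digits after the decimal point (string-based, as in the scripts)."""
    str_value1, str_value2 = str(value1), str(value2)
    dot1, dot2 = str_value1.find("."), str_value2.find(".")
    if dot1 == -1 or dot2 == -1:
        return 0
    min_places = min(len(str_value1) - dot1 - 1, len(str_value2) - dot2 - 1)
    matching = 0
    for i in range(1, min_places + 1):
        if str_value1[dot1 + i] != str_value2[dot2 + i]:
            break
        matching += 1
    return matching


def agreement_digits(value1, value2):
    """Hypothetical alternative: decimal digits of agreement derived from the absolute difference."""
    diff = abs(value1 - value2)
    if diff == 0:
        return math.inf
    return math.floor(-math.log10(diff))


python_mae = 6.3477379263364275
mql5_float_mae = 6.3477379221400145
print(compare_decimal_places(python_mae, mql5_float_mae))  # 8
print(agreement_digits(python_mae, mql5_float_mae))        # 8
# the string comparison ignores the integer part ...
print(compare_decimal_places(1.5, 2.5))                    # 1
# ... and reports 0 places for values that straddle a digit boundary
print(compare_decimal_places(0.19999999, 0.2))             # 0
```

The numeric variant is only one possible design choice. It measures absolute agreement, so for large quantities such as MSE a relative measure may be more informative.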
2.1.23.3. ONNX representation of the pls_regression_float.onnx and pls_regression_double.onnx
Fig.77. ONNX representation of the pls_regression_float.onnx in Netron
Fig.78. ONNX representation of the pls_regression_double.onnx in Netron
2.1.24. sklearn.linear_model.TweedieRegressor
TweedieRegressor is a regression method designed to solve regression problems using the Tweedie distribution. The Tweedie distribution is a probability distribution that can describe a wide range of data, including data with varying variance structure. TweedieRegressor is applied in regression tasks where the target variable possesses characteristics that align with the Tweedie distribution.
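The Tweedie family is usually characterized by how its variance depends on its mean. The notation below is standard but not defined elsewhere in this article: μ is the expected value of the target, φ > 0 is a dispersion parameter, and p is the power parameter discussed below.

```latex
\mathbb{E}[Y] = \mu, \qquad \operatorname{Var}[Y] = \phi\,\mu^{p}
```

Two special cases show how p shapes the variance:
- With p = 0 the variance is the constant φ (normal distribution).
- With p = 1 and φ = 1 the variance equals the mean, which is the Poisson case.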
How TweedieRegressor works:
- Target variable and Tweedie distribution: TweedieRegressor assumes that the target variable follows a Tweedie distribution. The Tweedie distribution depends on the power parameter 'p' (called power in scikit-learn). This parameter determines the shape of the distribution and how its variance grows with the mean.
- Model training: TweedieRegressor trains a generalized linear model that predicts the target variable from the independent variables (features). The coefficients are found by maximizing the Tweedie likelihood. scikit-learn implements this as minimizing a penalized mean deviance, with an L2 penalty controlled by the alpha parameter.
- Choosing the 'p' parameter: Selecting 'p' is a crucial aspect of using TweedieRegressor, because this parameter defines the distribution's shape and variance. Different 'p' values correspond to different types of data: p=0 gives the normal distribution, p=1 the Poisson distribution, 1<p<2 a compound Poisson-gamma distribution, p=2 the gamma distribution and p=3 the inverse Gaussian distribution. The scikit-learn default is power=0.0, i.e. the normal distribution.
- Link function: Instead of transforming the target variable itself, the model connects the linear combination of features to the expected value of the target through a link function. In scikit-learn, link='auto' selects the identity link for power <= 0 and the log link for power > 0.
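The deviance is the per-observation loss that scikit-learn averages (with an L2 penalty) when fitting TweedieRegressor. The sketch below illustrates how the power value changes both the implied distribution and that loss. It is written in pure Python. The function names are illustrative, and the deviance formulas follow the Tweedie deviance documented for scikit-learn's mean_tweedie_deviance.

```python
import math


def tweedie_family(power):
    """Distribution implied by the Tweedie power (naming follows the scikit-learn docs)."""
    if power < 0:
        return "extreme stable"
    if power == 0:
        return "normal"
    if 0 < power < 1:
        return "no distribution exists"
    if power == 1:
        return "Poisson"
    if 1 < power < 2:
        return "compound Poisson-gamma"
    if power == 2:
        return "gamma"
    if power == 3:
        return "inverse Gaussian"
    return "positive stable"


def tweedie_unit_deviance(y, mu, power):
    """Unit deviance d(y, mu) of one observation.

    mu must be > 0 for power > 0; y must be >= 0 for 1 <= power < 2 and > 0 for power >= 2.
    """
    if power == 0:
        return (y - mu) ** 2                     # squared error
    if power == 1:
        term = y * math.log(y / mu) if y > 0 else 0.0   # y*log(y/mu) -> 0 as y -> 0
        return 2 * (term - y + mu)               # Poisson deviance
    if power == 2:
        return 2 * (math.log(mu / y) + y / mu - 1)   # gamma deviance
    return 2 * (max(y, 0) ** (2 - power) / ((1 - power) * (2 - power))
                - y * mu ** (1 - power) / (1 - power)
                + mu ** (2 - power) / (2 - power))


for p in (0, 1, 1.5, 2, 3):
    print(p, tweedie_family(p), tweedie_unit_deviance(5.0, 2.0, p))
```

The deviance is zero when the prediction equals the target, for every power. The general formula also tends to the Poisson deviance as power approaches 1.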
Advantages of TweedieRegressor:
- Ability to model data with varying variance: The Tweedie distribution can adapt to data with different variance structures, which is valuable for real-world data where variance can vary.
- Variety of 'p' parameters: The ability to choose different 'p' values allows modeling various data types.
Limitations of TweedieRegressor:
- Complexity in choosing the 'p' parameter: Selecting the correct 'p' value may require knowledge about the data and experimentation.
- Conformance to the Tweedie distribution: For successful application of TweedieRegressor, the target variable must correspond to the Tweedie distribution. Non-compliance may lead to poor model performance.
TweedieRegressor is a regression method that uses the Tweedie distribution to model data with varying variance structures. This method is useful in regression tasks where the target variable conforms to the Tweedie distribution and can be tuned with different 'p' parameter values for better data adaptation.
2.1.24.1. Code for creating the TweedieRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.TweedieRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training TweedieRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import TweedieRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "TweedieRegressor"
onnx_model_filename = data_path + "tweedie_regressor"
# create a Tweedie Regressor model
regression_model = TweedieRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's input tensor type
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors of the ONNX model
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's input tensor type
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors of the ONNX model
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
2023.10.31 11:39:36.223 Python TweedieRegressor Original model (double)
2023.10.31 11:39:36.223 Python R-squared (Coefficient of determination): 0.9962368328117072
2023.10.31 11:39:36.223 Python Mean Absolute Error: 6.342397897667562
2023.10.31 11:39:36.223 Python Mean Squared Error: 49.797082198408745
2023.10.31 11:39:36.223 Python
2023.10.31 11:39:36.223 Python TweedieRegressor ONNX model (float)
2023.10.31 11:39:36.223 Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\tweedie_regressor_float.onnx
2023.10.31 11:39:36.253 Python Information about input tensors in ONNX:
2023.10.31 11:39:36.253 Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
2023.10.31 11:39:36.253 Python Information about output tensors in ONNX:
2023.10.31 11:39:36.253 Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
2023.10.31 11:39:36.253 Python R-squared (Coefficient of determination) 0.9962368338709323
2023.10.31 11:39:36.253 Python Mean Absolute Error: 6.342397072978867
2023.10.31 11:39:36.253 Python Mean Squared Error: 49.797068181938165
2023.10.31 11:39:36.253 Python R^2 matching decimal places: 8
2023.10.31 11:39:36.253 Python MAE matching decimal places: 6
2023.10.31 11:39:36.253 Python MSE matching decimal places: 4
2023.10.31 11:39:36.253 Python float ONNX model precision: 6
2023.10.31 11:39:36.613 Python
2023.10.31 11:39:36.613 Python TweedieRegressor ONNX model (double)
2023.10.31 11:39:36.613 Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\tweedie_regressor_double.onnx
2023.10.31 11:39:36.613 Python Information about input tensors in ONNX:
2023.10.31 11:39:36.613 Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
2023.10.31 11:39:36.613 Python Information about output tensors in ONNX:
2023.10.31 11:39:36.628 Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
2023.10.31 11:39:36.628 Python R-squared (Coefficient of determination) 0.9962368328117072
2023.10.31 11:39:36.628 Python Mean Absolute Error: 6.342397897667562
2023.10.31 11:39:36.628 Python Mean Squared Error: 49.797082198408745
2023.10.31 11:39:36.628 Python R^2 matching decimal places: 16
2023.10.31 11:39:36.628 Python MAE matching decimal places: 15
2023.10.31 11:39:36.628 Python MSE matching decimal places: 15
2023.10.31 11:39:36.628 Python double ONNX model precision: 15
Fig.79. Results of the TweedieRegressor.py (float ONNX)
2.1.24.2. MQL5 code for executing ONNX Models
This code executes the saved tweedie_regressor_float.onnx and tweedie_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                             TweedieRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define ModelName          "TweedieRegressor"
#define ONNXFilenameFloat  "tweedie_regressor_float.onnx"
#define ONNXFilenameDouble "tweedie_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel  1
#define TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
2023.10.31 11:42:20.113 TweedieRegressor (EURUSD,H1) Testing ONNX float: TweedieRegressor (tweedie_regressor_float.onnx)
2023.10.31 11:42:20.119 TweedieRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962368338709323
2023.10.31 11:42:20.119 TweedieRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 6.3423970729788666
2023.10.31 11:42:20.119 TweedieRegressor (EURUSD,H1) MQL5: Mean Squared Error: 49.7970681819381653
2023.10.31 11:42:20.125 TweedieRegressor (EURUSD,H1)
2023.10.31 11:42:20.125 TweedieRegressor (EURUSD,H1) Testing ONNX double: TweedieRegressor (tweedie_regressor_double.onnx)
2023.10.31 11:42:20.130 TweedieRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9962368328117072
2023.10.31 11:42:20.130 TweedieRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 6.3423978976675608
2023.10.31 11:42:20.130 TweedieRegressor (EURUSD,H1) MQL5: Mean Squared Error: 49.7970821984087593
Comparison with the original double precision model in Python:
Testing ONNX float: TweedieRegressor (tweedie_regressor_float.onnx)
Python Mean Absolute Error: 6.342397897667562
MQL5: Mean Absolute Error: 6.3423970729788666

Testing ONNX double: TweedieRegressor (tweedie_regressor_double.onnx)
Python Mean Absolute Error: 6.342397897667562
MQL5: Mean Absolute Error: 6.3423978976675608
Accuracy of ONNX float MAE: 6 decimal places, Accuracy of ONNX double MAE: 14 decimal places.
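The float models take and return tensor(float), i.e. IEEE 754 binary32 values with a 24-bit significand. This limits how many decimal places their results can share with the double model.

The sketch below uses only the standard struct module to round a double to float32 and to measure the gap between adjacent float32 numbers. The helper names are illustrative. Near the MAE value the gap is 2^-21 (about 4.8e-7), and near the largest targets of this dataset (around 400) it is 2^-15 (about 3.1e-5). Gaps of this size are consistent with the roughly 6 matching decimal places observed for the float model.

```python
import struct


def to_float32(value):
    """Round a Python float (binary64) to the nearest binary32 value and return it as a float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def float32_spacing(value):
    """Gap between the float32 nearest to |value| and the next larger float32.

    Illustrative helper: valid for finite, normal values below the largest float32.
    """
    rounded = to_float32(abs(value))
    bits = struct.unpack("<I", struct.pack("<f", rounded))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - rounded


python_mae = 6.342397897667562
print(to_float32(python_mae) - python_mae)  # rounding error of a single float32 cast
print(float32_spacing(python_mae))          # 4.76837158203125e-07 (2**-21)
print(float32_spacing(400.0))               # 3.0517578125e-05 (2**-15)
```

In the MQL5 script the float outputs are copied into a double vector before RegressionMetric is called. The lost digits therefore stem from the float32 predictions, not from the metric computation.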
2.1.24.3. ONNX representation of the tweedie_regressor_float.onnx and tweedie_regressor_double.onnx
Fig.80. ONNX representation of the tweedie_regressor_float.onnx in Netron
Fig.81. ONNX representation of the tweedie_regressor_double.onnx in Netron
2.1.25. sklearn.linear_model.PoissonRegressor
PoissonRegressor is a machine learning method for solving regression tasks based on the Poisson distribution.
This method is suitable when the dependent variable (target variable) is count data, representing the number of events that occurred within a fixed period of time or in a fixed spatial interval. PoissonRegressor models the relationship between predictors (independent variables) and the target variable by assuming that this relationship conforms to the Poisson distribution.
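The Poisson distribution has a single parameter λ (the mean rate). Its defining property is that the mean and the variance are both equal to λ. The pure-Python sketch below illustrates that property by evaluating the probability mass function P(K = k) = λ^k e^(-λ) / k! and computing its mean and variance numerically. The function name and the choice λ = 3 are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing k events when the expected count is lam > 0."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 3.0
# truncate the infinite support; the tail beyond k = 59 is negligible for lam = 3
probs = [poisson_pmf(k, lam) for k in range(60)]
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(total, mean, variance)  # all three are approximately 1, 3 and 3
```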
How PoissonRegressor works:
- Input data: The starting point is a dataset that includes features (independent variables) and a target variable representing the count of events.
- Poisson distribution: The PoissonRegressor method models the target variable by assuming it follows the Poisson distribution. The Poisson distribution is suitable for modeling events that occur at a fixed mean intensity within a given time interval or spatial range.
- Model training: PoissonRegressor trains a model that estimates the parameters of the Poisson distribution, considering the predictors. The model attempts to find the best fit for the observed data using the likelihood function that corresponds to the Poisson distribution.
- Predicting count values: After training, the model can be used to predict counts (the number of events) on new data. Each prediction is the expected count, i.e. the mean of the fitted Poisson distribution. It is therefore a positive real number rather than an integer.
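The training step described above can be sketched for a single feature in pure Python. The model is mu = exp(b + w*x), which uses the same log link as PoissonRegressor, and it is fitted with Newton's method on the Poisson negative log-likelihood.

This is not the library's implementation. scikit-learn's PoissonRegressor uses the lbfgs solver by default and adds an L2 penalty (alpha=1.0 by default), whereas this hypothetical fit_poisson_newton() has no penalty. Its coefficients are therefore not expected to match the library's exactly.

```python
import math


def fit_poisson_newton(xs, ys, max_iter=50, tol=1e-12):
    """Fit mu = exp(b + w*x) by Newton's method on sum(mu - y*eta), without regularization."""
    b = math.log(sum(ys) / len(ys))  # start from the intercept-only model
    w = 0.0
    for _ in range(max_iter):
        mus = [math.exp(b + w * x) for x in xs]
        # gradient of the negative log-likelihood with respect to (b, w)
        g_b = sum(mu - y for mu, y in zip(mus, ys))
        g_w = sum((mu - y) * x for mu, y, x in zip(mus, ys, xs))
        # Hessian entries (equal to the Fisher information for the log link)
        h_bb = sum(mus)
        h_bw = sum(mu * x for mu, x in zip(mus, xs))
        h_ww = sum(mu * x * x for mu, x in zip(mus, xs))
        det = h_bb * h_ww - h_bw * h_bw
        # solve the 2x2 Newton system H * step = gradient
        step_b = (h_ww * g_b - h_bw * g_w) / det
        step_w = (h_bb * g_w - h_bw * g_b) / det
        b -= step_b
        w -= step_w
        if abs(step_b) < tol and abs(step_w) < tol:
            break
    return b, w


def predict_poisson(b, w, x):
    """Expected count for feature value x (inverse of the log link)."""
    return math.exp(b + w * x)


xs = list(range(10))
ys = [math.exp(1.0 + 0.1 * x) for x in xs]  # noise-free targets with b = 1, w = 0.1
b, w = fit_poisson_newton(xs, ys)
print(b, w)  # close to 1.0 and 0.1
```

With noise-free targets the likelihood is maximized exactly at the generating parameters, which makes the result easy to check.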
Advantages of PoissonRegressor:
- Suitable for count data: PoissonRegressor is suitable for tasks where the target variable represents count data, such as the number of orders, calls, etc.
- Specificity of the distribution: Since the model adheres to the Poisson distribution, it can be more accurate for data that are well described by this distribution.
Limitations of PoissonRegressor:
- Designed for count data: PoissonRegressor requires non-negative target values and is a natural choice only when the target behaves like a count or rate. It is not suitable for continuous targets that can take negative values.
- Dependence on feature selection: The quality of the model can heavily depend on the selection and engineering of features.
PoissonRegressor is a machine learning method used for solving regression tasks when the target variable represents count data and is modeled using the Poisson distribution. This method is beneficial for tasks related to events occurring at a fixed intensity within specific time or spatial intervals.
2.1.25.1. Code for creating the PoissonRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.PoissonRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training PoissonRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "PoissonRegressor"
onnx_model_filename = data_path + "poisson_regressor"
# create a PoissonRegressor model
regression_model = PoissonRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's input tensor type
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors of the ONNX model
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's input tensor type
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors of the ONNX model
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    PoissonRegressor Original model (double)
Python    R-squared (Coefficient of determination): 0.9204304782362495
Python    Mean Absolute Error: 27.59790466048524
Python    Mean Squared Error: 1052.9242570153044
Python    
Python    PoissonRegressor ONNX model (float)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\poisson_regressor_float.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9204305082536851
Python    Mean Absolute Error: 27.59790825165078
Python    Mean Squared Error: 1052.9238598018305
Python    R^2 matching decimal places:  6
Python    MAE matching decimal places:  5
Python    MSE matching decimal places:  2
Python    float ONNX model precision:  5
Python    
Python    PoissonRegressor ONNX model (double)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\poisson_regressor_double.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9204304782362495
Python    Mean Absolute Error: 27.59790466048524
Python    Mean Squared Error: 1052.9242570153044
Python    R^2 matching decimal places:  16
Python    MAE matching decimal places:  14
Python    MSE matching decimal places:  13
Python    double ONNX model precision:  14
Fig.82. Results of the PoissonRegressor.py (float ONNX)
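Both the Python and the MQL5 scripts report the same three metrics: R², MAE and MSE. For reference, here is a minimal plain-Python sketch of their standard definitions. The helper names mae_py, mse_py and r2_py are hypothetical; scikit-learn's r2_score, mean_absolute_error and mean_squared_error compute the same quantities for 1-D targets. The matching MQL5 output suggests that vector::RegressionMetric uses the same definitions, so the tiny differences in the last printed digits most likely come from summation order and print formatting.

```python
# Plain-Python sketch of the three regression metrics (standard definitions).
# mae_py / mse_py / r2_py are hypothetical helper names used only for illustration.

def mae_py(y_true, y_pred):
    # mean of |y - y_hat|
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse_py(y_true, y_pred):
    # mean of (y - y_hat)^2
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_py(y_true, y_pred):
    # 1 - SS_res / SS_tot; assumes y_true is not constant (SS_tot > 0)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true_demo = [3.0, -0.5, 2.0, 7.0]
y_pred_demo = [2.5, 0.0, 2.0, 8.0]
print("MAE:", mae_py(y_true_demo, y_pred_demo))  # 0.5
print("MSE:", mse_py(y_true_demo, y_pred_demo))  # 0.375
print("R^2:", r2_py(y_true_demo, y_pred_demo))
```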
2.1.25.2. MQL5 code for executing ONNX Models
This code executes the saved poisson_regressor_float.onnx and poisson_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                             PoissonRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "PoissonRegressor"
#define   ONNXFilenameFloat  "poisson_regressor_float.onnx"
#define   ONNXFilenameDouble "poisson_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
PoissonRegressor (EURUSD,H1)    Testing ONNX float: PoissonRegressor (poisson_regressor_float.onnx)
PoissonRegressor (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 0.9204305082536851
PoissonRegressor (EURUSD,H1)    MQL5: Mean Absolute Error: 27.5979082516507788
PoissonRegressor (EURUSD,H1)    MQL5: Mean Squared Error: 1052.9238598018305311
PoissonRegressor (EURUSD,H1)    
PoissonRegressor (EURUSD,H1)    Testing ONNX double: PoissonRegressor (poisson_regressor_double.onnx)
PoissonRegressor (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 0.9204304782362493
PoissonRegressor (EURUSD,H1)    MQL5: Mean Absolute Error: 27.5979046604852343
PoissonRegressor (EURUSD,H1)    MQL5: Mean Squared Error: 1052.9242570153051020
Comparison with the original double precision model in Python:
Testing ONNX float: PoissonRegressor (poisson_regressor_float.onnx)
Python Mean Absolute Error: 27.59790466048524
MQL5:  Mean Absolute Error: 27.5979082516507788

Testing ONNX double: PoissonRegressor (poisson_regressor_double.onnx)
Python Mean Absolute Error: 27.59790466048524
MQL5:  Mean Absolute Error: 27.5979046604852343
Accuracy of ONNX float MAE: 5 decimal places, Accuracy of ONNX double MAE: 13 decimal places.
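The 5-place agreement of the float model is roughly what the float32 format allows. Its 24-bit significand gives about 7 significant decimal digits, and for a value near 27.6 two of them sit before the decimal point, leaving only 5 to 6 after it. The standard-library sketch below rounds the Python MAE to float32 and prints the spacing between adjacent float32 values at that magnitude. The helper names are hypothetical, and the ONNX model's real deviation also includes rounding accumulated throughout inference, so this only illustrates the order of magnitude.

```python
import math
import struct

def to_float32(x):
    # round a Python float (IEEE 754 double) to the nearest float32 and widen it back
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float32_spacing(x):
    # gap between adjacent float32 values around a nonzero, normal x (24-bit significand)
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent with 0.5 <= m < 1
    return 2.0 ** (exponent - 24)

mae_double = 27.59790466048524  # MAE of the original scikit-learn model (from the output above)
mae_as_float32 = to_float32(mae_double)

print("float32 value:  ", mae_as_float32)
print("float32 spacing:", float32_spacing(mae_double))  # 2**-19, about 1.9e-06
print("rounding error: ", abs(mae_as_float32 - mae_double))
```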
2.1.25.3. ONNX representation of the poisson_regressor_float.onnx and poisson_regressor_double.onnx
Fig.83. ONNX representation of the poisson_regressor_float.onnx in Netron
Fig.84. ONNX representation of the poisson_regressor_double.onnx in Netron
2.1.26. sklearn.neighbors.RadiusNeighborsRegressor
RadiusNeighborsRegressor is a machine learning method used for regression tasks. It is a variant of the k-Nearest Neighbors (k-NN) method that predicts values of the target variable from the nearest neighbors in the feature space. However, instead of a fixed number of neighbors (as in the k-NN method), RadiusNeighborsRegressor uses a fixed radius to determine the neighbors of each sample.
How RadiusNeighborsRegressor works:
- Input data: Starting with a dataset that includes features (independent variables) and the target variable (continuous).
- Setting the radius: RadiusNeighborsRegressor uses a fixed radius (the radius parameter, 1.0 by default in scikit-learn) to determine the neighbors of each sample in the feature space.
- Neighbor definition: For each sample, all data points within the specified radius are determined, becoming neighbors of that sample.
- Weighted averaging: To predict the target value for a sample, the target values of its neighbors are averaged. In scikit-learn, the default weights='uniform' gives a plain mean, while weights='distance' weights each neighbor by the inverse of its distance, so closer neighbors have more influence.
- Prediction: After training, the model can be used to predict the values of the target variable on new data based on the nearest neighbors in the feature space.
- Versatility: Because neighbors are defined by distance rather than by count, the number of neighbors adapts to the local density of the data, which is useful when that density varies significantly across the feature space.
- Resilience to outliers: A neighbor-based approach can be resilient to outliers because the model only considers nearby data points.
- Dependency on radius selection: Choosing the right radius may require tuning and experimentation.
- Computational complexity: Handling large datasets may require substantial computational resources.
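The steps above can be sketched in plain Python for one-dimensional inputs. This is an illustrative brute-force implementation, not scikit-learn's code, and the helper name is hypothetical. Points lying exactly on the radius are counted as neighbors (scikit-learn's radius queries include boundary points). The zero-distance rule for weights='distance' is a common convention that we believe scikit-learn also follows, and returning NaN for an empty neighborhood is a choice of this sketch.

```python
import math

def radius_neighbors_predict(X_train, y_train, x_query, radius=1.0, weights="uniform"):
    # Hypothetical helper: 1-D inputs only, brute-force search over all training points.
    if weights not in ("uniform", "distance"):
        raise ValueError("weights must be 'uniform' or 'distance'")
    # 1) collect every training point within the radius (boundary included)
    neighbors = [(abs(x - x_query), t) for x, t in zip(X_train, y_train)
                 if abs(x - x_query) <= radius]
    # 2) no neighbors inside the ball: this sketch returns NaN
    if not neighbors:
        return math.nan
    # 3a) uniform weights: plain mean of the neighbors' targets
    if weights == "uniform":
        return sum(t for _, t in neighbors) / len(neighbors)
    # 3b) distance weights: neighbors at distance 0 take all the weight
    exact = [t for d, t in neighbors if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    # otherwise weight each target by 1 / distance
    w = [1.0 / d for d, _ in neighbors]
    return sum(wi * t for wi, (_, t) in zip(w, neighbors)) / sum(w)

X_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = [0.0, 10.0, 20.0, 30.0]
print(radius_neighbors_predict(X_demo, y_demo, 1.5))                   # 15.0
print(radius_neighbors_predict(X_demo, y_demo, 1.2, weights="distance"))
print(radius_neighbors_predict(X_demo, y_demo, 10.0))                  # nan
```

With distance weighting, the query 1.2 is pulled toward the closer neighbor at 1.0, so the result lies below the uniform average of 15.0.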
2.1.26.1. Code for creating the RadiusNeighborsRegressor and exporting it to ONNX for float and double
This code creates the sklearn.neighbors.RadiusNeighborsRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training RadiusNeighborsRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import RadiusNeighborsRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "RadiusNeighborsRegressor"
onnx_model_filename = data_path + "radius_neighbors_regressor"
# create a RadiusNeighborsRegressor model
regression_model = RadiusNeighborsRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's float_input tensor
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's double_input tensor
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    RadiusNeighborsRegressor Original model (double)
Python    R-squared (Coefficient of determination): 0.9999521132921395
Python    Mean Absolute Error: 0.591458244376554
Python    Mean Squared Error: 0.6336732353950723
Python    
Python    RadiusNeighborsRegressor ONNX model (float)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\radius_neighbors_regressor_float.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.9999999999999971
Python    Mean Absolute Error: 4.393654615473253e-06
Python    Mean Squared Error: 3.829042036424747e-11
Python    R^2 matching decimal places:  4
Python    MAE matching decimal places:  0
Python    MSE matching decimal places:  0
Python    float ONNX model precision:  0
Python    
Python    RadiusNeighborsRegressor ONNX model (double)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\radius_neighbors_regressor_double.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 1.0
Python    Mean Absolute Error: 0.0
Python    Mean Squared Error: 0.0
Python    R^2 matching decimal places:  0
Python    MAE matching decimal places:  0
Python    MSE matching decimal places:  0
Python    double ONNX model precision:  0
Fig.85. Results of the RadiusNeighborsRegressor.py (float ONNX)
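The compare_decimal_places helper compares the characters of the printed values after the decimal point, so it undercounts when two close values straddle a digit boundary. For example, the output above reports 0 matching decimal places for the R² of the double ONNX model (1.0 versus 0.9999521132921395), although the two values differ by less than 5e-5. It also compares digits of numbers printed in scientific notation (such as 4.393654615473253e-06) as if they were written in fixed notation. A hypothetical numeric alternative, agreeing_decimal_places, returns the largest k for which |a - b| <= 10^-k. The sketch shows both; the string version is a condensed equivalent of the helper in the scripts.

```python
import math

def compare_decimal_places(value1, value2):
    # condensed equivalent of the string-based helper used in the scripts
    s1, s2 = str(value1), str(value2)
    d1, d2 = s1.find("."), s2.find(".")
    if d1 == -1 or d2 == -1:
        return 0
    count = 0
    for c1, c2 in zip(s1[d1 + 1:], s2[d2 + 1:]):
        if c1 != c2:
            break
        count += 1
    return count

def agreeing_decimal_places(value1, value2, max_places=17):
    # hypothetical numeric alternative: largest k with |value1 - value2| <= 10**-k
    diff = abs(value1 - value2)
    if diff == 0.0:
        return max_places
    return max(0, min(max_places, math.floor(-math.log10(diff))))

# R^2 of the original model vs. the double ONNX model (RadiusNeighborsRegressor output above)
print(compare_decimal_places(0.9999521132921395, 1.0))   # 0
print(agreeing_decimal_places(0.9999521132921395, 1.0))  # 4
# MAE of the original PoissonRegressor vs. its float ONNX model
print(compare_decimal_places(27.59790466048524, 27.59790825165078))   # 5
print(agreeing_decimal_places(27.59790466048524, 27.59790825165078))  # 5
```

When the values do not straddle a digit boundary, both approaches agree, as in the PoissonRegressor case.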
2.1.26.2. MQL5 code for executing ONNX Models
This code executes the saved radius_neighbors_regressor_float.onnx and radius_neighbors_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                     RadiusNeighborsRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "RadiusNeighborsRegressor"
#define   ONNXFilenameFloat  "radius_neighbors_regressor_float.onnx"
#define   ONNXFilenameDouble "radius_neighbors_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
RadiusNeighborsRegressor (EURUSD,H1)    Testing ONNX float: RadiusNeighborsRegressor (radius_neighbors_regressor_float.onnx)
RadiusNeighborsRegressor (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 0.9999999999999971
RadiusNeighborsRegressor (EURUSD,H1)    MQL5: Mean Absolute Error: 0.0000043936546155
RadiusNeighborsRegressor (EURUSD,H1)    MQL5: Mean Squared Error: 0.0000000000382904
RadiusNeighborsRegressor (EURUSD,H1)    
RadiusNeighborsRegressor (EURUSD,H1)    Testing ONNX double: RadiusNeighborsRegressor (radius_neighbors_regressor_double.onnx)
RadiusNeighborsRegressor (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 1.0000000000000000
RadiusNeighborsRegressor (EURUSD,H1)    MQL5: Mean Absolute Error: 0.0000000000000000
RadiusNeighborsRegressor (EURUSD,H1)    MQL5: Mean Squared Error: 0.0000000000000000
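Both ONNX models reproduce the training targets almost exactly (an MAE of 0 for the double model), while the original scikit-learn model reports an MAE of about 0.59. With the default radius=1.0 and inputs spaced exactly 1 apart, every neighbor other than the point itself lies exactly on the radius. scikit-learn's radius queries include boundary points, so each prediction averages up to three targets. A strict "distance < radius" comparison would instead leave each point with only itself as a neighbor. The sketch below computes both variants on the same synthetic data, using a hypothetical helper radius_predict. The resulting numbers are consistent with the reported metrics, but we have not checked how skl2onnx actually builds the comparison, so treat this as a plausible explanation rather than a confirmed one.

```python
import math

# synthetic data from the scripts above: X = 0..99, y = 4*X + 10*sin(0.5*X)
X = [float(i) for i in range(100)]
y = [4.0 * x + 10.0 * math.sin(0.5 * x) for x in X]

def radius_predict(x_query, radius, inclusive):
    # average the targets of all training points within radius of x_query;
    # inclusive=True keeps points lying exactly on the boundary (distance == radius)
    if inclusive:
        ys = [t for x, t in zip(X, y) if abs(x - x_query) <= radius]
    else:
        ys = [t for x, t in zip(X, y) if abs(x - x_query) < radius]
    return sum(ys) / len(ys)

def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

radius = 1.0  # scikit-learn's default radius for RadiusNeighborsRegressor
mae_inclusive = mae(y, [radius_predict(x, radius, True) for x in X])
mae_exclusive = mae(y, [radius_predict(x, radius, False) for x in X])
print("MAE, boundary points included:", mae_inclusive)
print("MAE, boundary points excluded:", mae_exclusive)  # 0.0: each point only sees itself
```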
2.1.26.3. ONNX representation of the radius_neighbors_regressor_float.onnx and radius_neighbors_regressor_double.onnx
Fig.86. ONNX representation of the radius_neighbors_regressor_float.onnx in Netron
Fig.87. ONNX representation of the radius_neighbors_regressor_double.onnx in Netron
2.1.27. sklearn.neighbors.KNeighborsRegressor
KNeighborsRegressor is a machine learning method used for regression tasks.
It belongs to the category of k-Nearest Neighbors (k-NN) algorithms and is used to predict numerical values of the target variable based on the proximity (similarity) between objects in the training dataset.
How KNeighborsRegressor works:
- Input data: It begins with the initial dataset, including features (independent variables) and corresponding values of the target variable.
- Selecting the number of neighbors (k): You need to choose the number of nearest neighbors (k) to be considered during prediction. This number is one of the model's hyperparameters.
- Calculating proximity: For new data (points for which predictions are needed), the distance or similarity between this data and all objects in the training dataset is computed.
- Choosing k nearest neighbors: k objects from the training dataset that are closest to the new data are selected.
- Prediction: For regression, the predicted value for new data is the average of the target values of its k nearest neighbors (scikit-learn also supports inverse-distance weighting via weights='distance').
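These steps translate directly into a short brute-force sketch for one-dimensional inputs with uniform weights. The helper name knn_regress is hypothetical, and its tie-breaking rule (the lower training index wins) is a choice of this sketch. scikit-learn may order equidistant neighbors differently and also offers tree-based search (algorithm='ball_tree' or 'kd_tree') for larger datasets.

```python
def knn_regress(X_train, y_train, x_query, k=5):
    # Hypothetical helper: 1-D inputs, brute-force distances, uniform weights.
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training points")
    # 1) distance from the query to every training point; keep the index for lookup
    distances = [(abs(x - x_query), i) for i, x in enumerate(X_train)]
    # 2) the k closest points (equal distances are broken by the lower training index)
    nearest = sorted(distances)[:k]
    # 3) prediction = plain mean of their target values
    return sum(y_train[i] for _, i in nearest) / k

X_demo = [float(i) for i in range(10)]
y_demo = [x * x for x in X_demo]
print(knn_regress(X_demo, y_demo, 4.4, k=3))  # mean of 16, 25 and 9
print(knn_regress(X_demo, y_demo, 7.2, k=1))  # 49.0
```

Because the prediction is a plain average, a larger k smooths the curve more, which is one reason the choice of k affects performance.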
Advantages of KNeighborsRegressor:
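- Ease of use: KNeighborsRegressor is a straightforward algorithm that does not require complex data preprocessing, although features on very different scales should usually be standardized, because predictions depend on distances.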
- Non-parametric nature: The method does not assume a specific functional form of dependency between features and the target variable, enabling modeling of diverse relationships.
- Reproducibility: Predictions are deterministic for a given training set and k, because they depend only on the distances to the stored training points.
Limitations of KNeighborsRegressor:
- Computational complexity: Calculating distances to all points in the training dataset can be computationally expensive for large volumes of data.
- Sensitivity to the choice of the number of neighbors: Selecting the optimal value of k requires tuning and can significantly impact the model's performance.
- Sensitivity to noise: The method can be sensitive to data noise and outliers.
KNeighborsRegressor is useful in regression tasks where considering the neighborhood of objects for predicting the target variable is essential. It can be particularly useful in situations where the relationship between features and the target variable is nonlinear and complex.
2.1.27.1. Code for creating the KNeighborsRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.neighbors.KNeighborsRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training KNeighborsRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "KNeighborsRegressor"
onnx_model_filename = data_path + "kneighbors_regressor"
# create a KNeighbors Regressor model
kneighbors_model = KNeighborsRegressor(n_neighbors=5)
# fit the model to the data
kneighbors_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = kneighbors_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(kneighbors_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's float input
X_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]
# calculate and display the errors for the ONNX model
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(kneighbors_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's double input
X_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]
# calculate and display the errors for the ONNX model
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python KNeighborsRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9995599863346534
Python Mean Absolute Error: 1.7414210057117578
Python Mean Squared Error: 5.822594523532273
Python
Python KNeighborsRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\kneighbors_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9995599867417418
Python Mean Absolute Error: 1.7414195457976402
Python Mean Squared Error: 5.8225891366283875
Python R^2 matching decimal places: 9
Python MAE matching decimal places: 4
Python MSE matching decimal places: 4
Python float ONNX model precision: 4
Python
Python KNeighborsRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\kneighbors_regressor_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9995599863346534
Python Mean Absolute Error: 1.7414210057117583
Python Mean Squared Error: 5.822594523532269
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 14
Python MSE matching decimal places: 13
Python double ONNX model precision: 14
Fig.88. Results of the KNeighborsRegressor.py (float ONNX)
2.1.27.2. MQL5 code for executing ONNX Models
This code executes the saved kneighbors_regressor_float.onnx and kneighbors_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                          KNeighborsRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "KNeighborsRegressor"
#define   ONNXFilenameFloat  "kneighbors_regressor_float.onnx"
#define   ONNXFilenameDouble "kneighbors_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
      run_result=RunModelFloat(model,x_values,y_predicted);
   else if(model_type==TestDoubleModel)
      run_result=RunModelDouble(model,x_values,y_predicted);
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
KNeighborsRegressor (EURUSD,H1) Testing ONNX float: KNeighborsRegressor (kneighbors_regressor_float.onnx)
KNeighborsRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9995599860116634
KNeighborsRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 1.7414200607817711
KNeighborsRegressor (EURUSD,H1) MQL5: Mean Squared Error: 5.8225987975798184
KNeighborsRegressor (EURUSD,H1)
KNeighborsRegressor (EURUSD,H1) Testing ONNX double: KNeighborsRegressor (kneighbors_regressor_double.onnx)
KNeighborsRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9995599863346534
KNeighborsRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 1.7414210057117601
KNeighborsRegressor (EURUSD,H1) MQL5: Mean Squared Error: 5.8225945235322705
Comparison with the original double precision model in Python:
Testing ONNX float: KNeighborsRegressor (kneighbors_regressor_float.onnx)
Python Mean Absolute Error: 1.7414210057117578
MQL5: Mean Absolute Error: 1.7414200607817711

Testing ONNX double: KNeighborsRegressor (kneighbors_regressor_double.onnx)
Python Mean Absolute Error: 1.7414210057117578
MQL5: Mean Absolute Error: 1.7414210057117601
Accuracy of ONNX float MAE: 5 decimal places, Accuracy of ONNX double MAE: 13 decimal places.
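A rough illustration of why the float model agrees to only about 5 decimal places follows. The sketch assumes that the dominant loss is simply storing values in single precision. float32 has a 24-bit significand, so its machine epsilon is 2^-23 (about 1.19e-7), roughly 7 significant decimal digits. For values between 256 and 512, which covers most of the targets here, adjacent float32 numbers are 2^-15 (about 3.05e-5) apart. Rounding each value can therefore move it by up to 2^-16 (about 1.5e-5).

```python
import numpy as np

# same synthetic data as in the Python script above
X = np.arange(0, 100, 1).reshape(-1, 1)
y64 = (4 * X + 10 * np.sin(X * 0.5)).ravel()

# store the targets in single precision, as a float ONNX model does for its output
y32 = y64.astype(np.float32)

eps32 = np.finfo(np.float32).eps          # 2**-23, about 1.19e-07
ulp_at_400 = np.spacing(np.float32(400))  # gap between adjacent float32 values near 400
max_rounding_error = np.max(np.abs(y32.astype(np.float64) - y64))

print(eps32, ulp_at_400, max_rounding_error)
```

Errors of this size (around 1e-5 per value) are consistent with the MAE differences of about 1e-6 reported above. The double model only accumulates rounding at the 1e-16 level.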
2.1.27.3. ONNX representation of the kneighbors_regressor_float.onnx and kneighbors_regressor_double.onnx
Fig.89. ONNX representation of the kneighbors_regressor_float.onnx in Netron
Fig.90. ONNX representation of the kneighbors_regressor_double.onnx in Netron
2.1.28. sklearn.gaussian_process.GaussianProcessRegressor
GaussianProcessRegressor is a machine learning method used for regression tasks that allows modeling uncertainty in predictions.
A Gaussian process (GP) is a powerful tool in Bayesian machine learning, used to model complex functions and to predict target values while accounting for uncertainty.
How GaussianProcessRegressor works:
- Input data: The model starts from a training dataset of features (independent variables) and the corresponding values of the target variable.
- Modeling the Gaussian process: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian (normal) distribution. A GP is specified by a mean function and a covariance function, so it models not only the mean value at each data point but also the covariance (similarity) between points.
- Choosing the covariance function: A crucial aspect of GP is the choice of the covariance function (kernel), which determines how strongly the values at two points are correlated, typically as a function of the distance between them. Different kernels suit different kinds of data and tasks.
- Model training: GaussianProcessRegressor fits the GP to the training data. During training, the kernel hyperparameters are tuned (in scikit-learn, by maximizing the log-marginal likelihood), and these hyperparameters also determine the uncertainty of the predictions.
- Prediction: After training, the model predicts target values for new data. An important feature of GP is that it returns not only the predicted mean but also its standard deviation, from which a confidence interval can be built for each prediction.
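The steps above reduce to two formulas when the prior mean is zero, the kernel is fixed and a small noise term alpha is added to the diagonal:

- predictive mean: k_*^T (K + alpha*I)^-1 y
- predictive variance: k(x, x) - k_*^T (K + alpha*I)^-1 k_*

The NumPy sketch below computes both for an RBF kernel and compares them with GaussianProcessRegressor. The training data (sin(x) on 8 points) and the helpers `rbf_kernel` and `gp_posterior` are hypothetical, chosen only for illustration. `optimizer=None` keeps the kernel hyperparameters fixed, and `noise=1e-10` matches scikit-learn's default `alpha`.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def rbf_kernel(A, B, length_scale=1.0):
    """Hypothetical helper: squared-exponential covariance exp(-||a - b||^2 / (2 l^2))."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(X_train, y_train, X_test, length_scale=1.0, noise=1e-10):
    """Hypothetical helper: posterior mean and std of a zero-mean GP with an RBF kernel."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train, length_scale)   # covariance test vs. train
    weights = np.linalg.solve(K, y_train)              # (K + noise*I)^-1 y
    mean = K_s @ weights
    # prior variance of the RBF kernel is 1; subtract the part explained by the data
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


X_train = np.linspace(0, 10, 8).reshape(-1, 1)
y_train = np.sin(X_train).ravel()
X_test = np.array([[0.5], [3.3], [7.9]])

mean, std = gp_posterior(X_train, y_train, X_test)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), optimizer=None)
gp.fit(X_train, y_train)
mean_sk, std_sk = gp.predict(X_test, return_std=True)
print(np.max(np.abs(mean - mean_sk)), np.max(np.abs(std - std_sk)))
```

scikit-learn uses a Cholesky factorization instead of `np.linalg.solve`, which is cheaper and more stable. For a well-conditioned kernel matrix like this one, both give the same result up to rounding.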
Advantages of GaussianProcessRegressor:
- Modeling uncertainty: GP quantifies the uncertainty of its predictions, which is valuable in tasks where the confidence in predicted values matters.
- Flexibility: GP can model a wide variety of functions, and its covariance function can be adapted to different kinds of data.
- Few hyperparameters: GP has a relatively small number of hyperparameters, simplifying model tuning.
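The uncertainty advantage can be seen directly through `predict(..., return_std=True)`. In the sketch below, the predictive standard deviation is close to zero at the training points. It rises back to the prior value of 1 far from the data. An approximate 95% interval is formed as mean ± 1.96·std. The toy data and the fixed `RBF(length_scale=1.0)` kernel (with `optimizer=None`) are assumptions made to keep the result deterministic.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# small illustrative training set (assumption: sin(x) on 8 points)
X_train = np.linspace(0, 10, 8).reshape(-1, 1)
y_train = np.sin(X_train).ravel()

# optimizer=None keeps the kernel hyperparameters fixed
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), optimizer=None)
gp.fit(X_train, y_train)

X_query = np.array([[5.0], [25.0]])            # inside vs. far outside the training range
mean, std = gp.predict(X_query, return_std=True)
lower, upper = mean - 1.96 * std, mean + 1.96 * std   # approximate 95% interval

_, std_at_train = gp.predict(X_train, return_std=True)
print(std_at_train.max(), std, lower, upper)
```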
Limitations of GaussianProcessRegressor:
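- Computational complexity: GP can be computationally expensive on large datasets. Exact GP training solves a linear system with the n×n kernel matrix, which costs O(n³) time and O(n²) memory for n training samples.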
- Inefficiency in high-dimensional spaces: GP might lose efficiency in tasks with numerous features due to the curse of dimensionality.
GaussianProcessRegressor is useful in regression tasks where modeling uncertainty and providing reliable predictions are crucial. This method is frequently used in Bayesian machine learning and meta-analysis.
2.1.28.1. Code for creating the GaussianProcessRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.gaussian_process.GaussianProcessRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training GaussianProcessRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "GaussianProcessRegressor"
onnx_model_filename = data_path + "gaussian_process_regressor"
# create a GaussianProcessRegressor model
kernel = 1.0 * RBF()
gp_model = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)
# fit the model to the data
gp_model.fit(X, y)
# predict values for the entire dataset
y_pred = gp_model.predict(X, return_std=False)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(gp_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's float input
X_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float})[0]
# calculate and display the errors for the ONNX model
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(gp_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's double input
X_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_double})[0]
# calculate and display the errors for the ONNX model
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python GaussianProcessRegressor Original model (double)
Python R-squared (Coefficient of determination): 1.0
Python Mean Absolute Error: 3.504041501400934e-13
Python Mean Squared Error: 1.6396606443650807e-25
Python
Python GaussianProcessRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gaussian_process_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: GPmean, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9999999999999936
Python Mean Absolute Error: 6.454076974495848e-06
Python Mean Squared Error: 8.493606782250733e-11
Python R^2 matching decimal places: 0
Python MAE matching decimal places: 0
Python MSE matching decimal places: 0
Python float ONNX model precision: 0
Python
Python GaussianProcessRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gaussian_process_regressor_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: GPmean, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 1.0
Python Mean Absolute Error: 3.504041501400934e-13
Python Mean Squared Error: 1.6396606443650807e-25
Python R^2 matching decimal places: 1
Python MAE matching decimal places: 19
Python MSE matching decimal places: 20
Python double ONNX model precision: 19
Fig.91. Results of the GaussianProcessRegressor.py (float ONNX)
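The decimal-place counts in this output should be read with care. `compare_decimal_places` compares the characters of `str(value)`. For very small numbers Python switches to scientific notation (for example `'3.504041501400934e-13'`), so the exponent characters `e-13` are counted as "decimal places". This explains results such as 19 and 20. Conversely, `1.0` and `0.9999999999999936` agree to about 14 significant digits but share no leading decimal characters, so R^2 is reported as 0.

A numeric alternative is sketched below. It estimates the number of agreeing significant digits from the relative error. The helper `matching_significant_digits` is hypothetical and not part of the original script. It counts the integer digit too, so its result can exceed a decimal-place count by one.

```python
import math


def matching_significant_digits(reference: float, value: float, max_digits: int = 17) -> int:
    """Hypothetical helper: significant digits on which value agrees with reference."""
    if reference == value:
        return max_digits
    if reference == 0.0:
        return 0
    # relative error 1e-k means roughly k agreeing significant digits
    rel_err = abs(value - reference) / abs(reference)
    digits = math.floor(-math.log10(rel_err))
    return max(0, min(digits, max_digits))


print(matching_significant_digits(1.7414210057117578, 1.7414200607817711))
print(matching_significant_digits(1.0, 0.9999999999999936))
print(matching_significant_digits(3.504041501400934e-13, 6.454076974495848e-06))
```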
2.1.28.2. MQL5 code for executing ONNX Models
This code executes the saved gaussian_process_regressor_float.onnx and gaussian_process_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                     GaussianProcessRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "GaussianProcessRegressor"
#define   ONNXFilenameFloat  "gaussian_process_regressor_float.onnx"
#define   ONNXFilenameDouble "gaussian_process_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
      run_result=RunModelFloat(model,x_values,y_predicted);
   else if(model_type==TestDoubleModel)
      run_result=RunModelDouble(model,x_values,y_predicted);
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
GaussianProcessRegressor (EURUSD,H1) Testing ONNX float: GaussianProcessRegressor (gaussian_process_regressor_float.onnx)
GaussianProcessRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9999999999999936
GaussianProcessRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 0.0000064540769745
GaussianProcessRegressor (EURUSD,H1) MQL5: Mean Squared Error: 0.0000000000849361
GaussianProcessRegressor (EURUSD,H1)
GaussianProcessRegressor (EURUSD,H1) Testing ONNX double: GaussianProcessRegressor (gaussian_process_regressor_double.onnx)
GaussianProcessRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 1.0000000000000000
GaussianProcessRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 0.0000000000003504
GaussianProcessRegressor (EURUSD,H1) MQL5: Mean Squared Error: 0.0000000000000000
2.1.28.3. ONNX representation of the gaussian_process_regressor_float.onnx and gaussian_process_regressor_double.onnx
Fig.92. ONNX representation of the gaussian_process_regressor_float.onnx in Netron
Fig.93. ONNX representation of the gaussian_process_regressor_double.onnx in Netron
2.1.29. sklearn.linear_model.GammaRegressor
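GammaRegressor is a machine learning method for regression tasks in which the target variable follows a gamma distribution. In scikit-learn it uses a log link function, so its predictions have the form exp(Xw + b) and are always positive.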
The gamma distribution is a probability distribution used to model positive, continuous random variables. This method enables modeling and predicting positive numerical values, such as cost, time, or proportions.
How GammaRegressor works:
- Input data: The initial dataset contains features (independent variables) and corresponding target values that are strictly positive and approximately gamma-distributed.
- Loss function: GammaRegressor is a generalized linear model that minimizes the mean Gamma deviance, the loss derived from the gamma distribution, plus an L2 penalty whose strength is set by alpha (1.0 by default). This accounts for the positivity and the right skew of the gamma distribution.
- Model training: During training, the model's coefficients and intercept are adjusted to minimize this loss (scikit-learn uses the L-BFGS solver by default).
- Prediction: After training, the model can be used to predict the values of the target variable for new data.
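The steps above can be illustrated with a minimal NumPy sketch of a Gamma generalized linear model with a log link. It is a simplified stand-in, not scikit-learn's implementation. It fits an unpenalized model with Fisher scoring (iteratively reweighted least squares), a standard textbook technique swapped in for illustration. scikit-learn's GammaRegressor instead adds an L2 penalty (alpha=1.0 by default) and uses L-BFGS, so its coefficients will differ on the same data. The functions fit_gamma_glm and predict_gamma_glm are hypothetical names.

```python
import numpy as np

def fit_gamma_glm(X, y, n_iter=100, tol=1e-10):
    """Fit an unpenalized Gamma GLM with a log link by Fisher scoring (IRLS).

    X: (n_samples, n_features) array, y: strictly positive targets.
    Returns (intercept, coef).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    # design matrix with a leading column of ones for the intercept
    A = np.column_stack([np.ones(len(y)), X])
    # start from a constant model: log of the mean target
    beta = np.zeros(A.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        mu = np.exp(A @ beta)  # inverse of the log link
        # For Gamma + log link the expected information is proportional to A^T A
        # and the score to A^T (y/mu - 1), so each step is a least-squares solve.
        step, *_ = np.linalg.lstsq(A, y / mu - 1.0, rcond=None)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta[0], beta[1:]

def predict_gamma_glm(intercept, coef, X):
    # predictions are exp(X @ coef + intercept), hence always positive
    return np.exp(np.asarray(X, dtype=float) @ coef + intercept)

# usage: noiseless data generated from the model itself
x = np.linspace(0.0, 5.0, 50).reshape(-1, 1)
y = np.exp(1.0 + 0.2 * x.ravel())
b0, w = fit_gamma_glm(x, y)
print(b0, w)  # approximately 1.0 and [0.2]
```

The log link explains why such a model fits exponential-like growth well, but it can only approximate a straight-line trend. This sketch does not claim to be the cause of the scores reported below.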
Advantages of GammaRegressor:
- Modeling positive values: This method is specifically designed for modeling strictly positive numerical values, which is useful in tasks where the target variable cannot be zero or negative.
- Considering the gamma distribution shape: GammaRegressor accounts for the characteristics of the gamma distribution, enabling more accurate modeling of data following this distribution.
- Usefulness in econometrics and medical research: The gamma distribution is frequently used to model cost, waiting time, and other positive random variables in econometrics and medical research.
Limitations of GammaRegressor:
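- Restriction on target values: The target must be strictly positive, and scikit-learn rejects training data in which y contains zero or negative values. The method is suitable only for regression tasks where the target follows a gamma distribution or a similar right-skewed positive distribution; for data that does not, it might not be effective.
- Requires choosing the right distribution: Deciding whether a gamma model is appropriate (rather than, for example, PoissonRegressor or TweedieRegressor, which use the same generalized linear model framework with a different loss) requires knowledge of the distribution of the target variable and its characteristics.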
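The restriction on target values follows directly from the loss. The mean Gamma deviance is the same quantity scikit-learn exposes as sklearn.metrics.mean_gamma_deviance. Its log(y_pred / y_true) term is undefined for zero or negative values. The function below is a hypothetical stdlib-only illustration of the formula, not scikit-learn's code.

```python
import math

def mean_gamma_deviance(y_true, y_pred):
    """Mean Gamma deviance: mean of 2*(log(mu/y) + y/mu - 1).

    Both y_true and y_pred must be strictly positive; otherwise the log
    term is undefined, which is why Gamma models reject such targets.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v <= 0 for v in y_true) or any(m <= 0 for m in y_pred):
        raise ValueError("Gamma deviance requires strictly positive values")
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        total += 2.0 * (math.log(mu / y) + y / mu - 1.0)
    return total / len(y_true)

print(mean_gamma_deviance([1.0, 2.0], [1.0, 2.0]))  # 0.0 (perfect prediction)
```

The deviance is zero only when every prediction equals its target. It also penalizes relative rather than absolute errors, which matches the right-skewed nature of gamma-distributed data.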
GammaRegressor is useful for modeling and predicting positive numerical values that follow a gamma distribution.
2.1.29.1. Code for creating the GammaRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.GammaRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training GammaRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import GammaRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 10+4*X + 10*np.sin(X*0.5)
model_name = "GammaRegressor"
onnx_model_filename = data_path + "gamma_regressor"
# create a Gamma Regressor model
regression_model = GammaRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's tensor(float) input
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's tensor(double) input
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python GammaRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.7963797339354436
Python Mean Absolute Error: 37.266200319422815
Python Mean Squared Error: 2694.457784927322
Python
Python GammaRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gamma_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.7963795030042045
Python Mean Absolute Error: 37.266211754095956
Python Mean Squared Error: 2694.4608407846144
Python R^2 matching decimal places: 6
Python MAE matching decimal places: 4
Python MSE matching decimal places: 1
Python float ONNX model precision: 4
Python
Python GammaRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gamma_regressor_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.7963797339354436
Python Mean Absolute Error: 37.266200319422815
Python Mean Squared Error: 2694.457784927322
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 15
Python MSE matching decimal places: 12
Python double ONNX model precision: 15
Fig.94. Results of the GammaRegressor.py (float ONNX)
2.1.29.2. MQL5 code for executing ONNX Models
This code executes the saved gamma_regressor_float.onnx and gamma_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                               GammaRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "GammaRegressor"
#define   ONNXFilenameFloat  "gamma_regressor_float.onnx"
#define   ONNXFilenameDouble "gamma_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(10+4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
GammaRegressor (EURUSD,H1) Testing ONNX float: GammaRegressor (gamma_regressor_float.onnx)
GammaRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.7963795030042045
GammaRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 37.2662117540959628
GammaRegressor (EURUSD,H1) MQL5: Mean Squared Error: 2694.4608407846144473
GammaRegressor (EURUSD,H1)
GammaRegressor (EURUSD,H1) Testing ONNX double: GammaRegressor (gamma_regressor_double.onnx)
GammaRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.7963797339354435
GammaRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 37.2662003194228220
GammaRegressor (EURUSD,H1) MQL5: Mean Squared Error: 2694.4577849273218817
Comparison with the original double precision model in Python:
Testing ONNX float: GammaRegressor (gamma_regressor_float.onnx)
Python Mean Absolute Error: 37.266200319422815
MQL5: Mean Absolute Error: 37.2662117540959628

Testing ONNX double: GammaRegressor (gamma_regressor_double.onnx)
Python Mean Absolute Error: 37.266200319422815
MQL5: Mean Absolute Error: 37.2662003194228220
Accuracy of ONNX float MAE: 4 decimal places, Accuracy of ONNX double MAE: 13 decimal places.
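These accuracy figures can be reproduced from the printed strings with the same digit-by-digit idea as compare_decimal_places() in the Python scripts. The sketch also estimates how many decimal places a float32 value near 37.27 can resolve at all. The helper names matching_decimal_places and float32_spacing are hypothetical, and the spacing formula assumes a normal (non-subnormal) float32 value.

```python
import math

def matching_decimal_places(a: str, b: str) -> int:
    """Count identical digits after the decimal point, stopping at the first mismatch."""
    da = a.split(".", 1)[1] if "." in a else ""
    db = b.split(".", 1)[1] if "." in b else ""
    count = 0
    for ca, cb in zip(da, db):
        if ca != cb:
            break
        count += 1
    return count

def float32_spacing(value: float) -> float:
    """Gap between adjacent float32 numbers near `value` (normal range only)."""
    _, exponent = math.frexp(abs(value))   # value = m * 2**exponent, 0.5 <= m < 1
    return math.ldexp(1.0, exponent - 24)  # float32 has a 24-bit significand

python_mae = "37.266200319422815"
print(matching_decimal_places(python_mae, "37.2662117540959628"))  # 4
print(matching_decimal_places(python_mae, "37.2662003194228220"))  # 13
# decimal places that float32 can resolve near the MAE value
print(math.floor(-math.log10(float32_spacing(37.2662))))  # 5
```

Near 37.27 adjacent float32 values are 2^-18 (about 3.8e-6) apart, so at most about 5 decimal places survive. The observed 4 is consistent with rounding errors accumulating across the float operations of the model.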
2.1.29.3. ONNX representation of the gamma_regressor_float.onnx and gamma_regressor_double.onnx
Fig.95. ONNX representation of the gamma_regressor_float.onnx in Netron
Fig.96. ONNX representation of the gamma_regressor_double.onnx in Netron
2.1.30. sklearn.linear_model.SGDRegressor
SGDRegressor is a linear regression model trained with Stochastic Gradient Descent (SGD). It belongs to the family of linear models, and its key strengths are efficiency and the ability to handle large volumes of data.
How SGDRegressor works:
- Linear regression: Similar to Ridge and Lasso, SGDRegressor aims to find a linear relationship between independent variables (features) and the target variable in a regression problem.
- Stochastic Gradient Descent: The basis of SGDRegressor is stochastic gradient descent. Instead of computing gradients on the entire training dataset, it estimates the gradient on one training sample at a time and updates the model after each sample; by default, the training data is shuffled after each epoch. This keeps every update cheap and allows efficient training on large datasets.
- Regularization: SGDRegressor supports L2 (Ridge-like, the default), L1 (Lasso-like) and Elastic Net penalties, selected with the penalty parameter. This helps control overfitting and improves model stability.
- Hyperparameters: Similar to Ridge and Lasso, SGDRegressor allows tuning the regularization strength (α, alpha) and the type of regularization, as well as the learning-rate schedule (learning_rate, eta0).
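The mechanism described above can be illustrated with a minimal NumPy sketch of per-sample SGD for squared-error loss with an L2 penalty on the weights. It is a simplified stand-in, not scikit-learn's implementation. It uses a constant learning rate (scikit-learn defaults to an inverse-scaling schedule) and omits early stopping and the tol-based convergence check. sgd_linear_regression is a hypothetical name.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.05, alpha=1e-4, n_epochs=200, seed=0):
    """Per-sample SGD for squared-error linear regression with an L2 penalty.

    Each update uses one sample and minimizes
    0.5*(x.w + b - y)**2 + 0.5*alpha*||w||**2; the intercept is not penalized.
    Samples are reshuffled before every epoch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):
            error = X[i] @ w + b - y[i]           # residual for one sample
            w -= lr * (error * X[i] + alpha * w)  # gradient step with L2 shrinkage
            b -= lr * error                       # intercept step
    return w, b

# usage: noiseless data y = 4x + 1 with inputs scaled to [0, 1]
X = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
y = 4.0 * X.ravel() + 1.0
w, b = sgd_linear_regression(X, y)
print(w, b)  # close to [4.] and 1. (the L2 penalty shrinks w very slightly)
```

The inputs here are scaled to [0, 1]. For inputs on a larger scale, such as the X values up to 10 used in the script for this model, the step size would typically need to be reduced or the features standardized to keep the updates stable.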
Advantages of SGDRegressor:
- Efficiency: Because each update uses only one sample, SGDRegressor scales well to large datasets.
- Regularization support: The option to apply L1, L2 or Elastic Net regularization makes this method suitable for managing overfitting.
- Online learning: Models can be updated incrementally with partial_fit as new data arrives, which allows adaptation to changing data and training on the fly.
Limitations of SGDRegressor:
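- Sensitivity to hyperparameters and feature scaling: Tuning hyperparameters such as the learning rate and the regularization coefficient may require experimentation, and SGD is sensitive to feature scaling, so standardizing the inputs is usually recommended.
- Approximate convergence: With the default squared-error loss and L2 penalty, the objective is convex. However, because each update is based on a single sample, SGDRegressor may stop near, rather than exactly at, the minimum of the loss function, depending on the learning-rate schedule, max_iter and tol.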
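The sensitivity to the learning rate is easier to see from the schedule itself. By default, scikit-learn's SGDRegressor uses learning_rate='invscaling', where the step size after t updates is eta0 / t**power_t (defaults eta0=0.01, power_t=0.25). The sketch below only evaluates that formula next to a constant schedule. The function names invscaling_lr and constant_lr are hypothetical.

```python
def invscaling_lr(t, eta0=0.01, power_t=0.25):
    """Step size at update t (t >= 1) under an inverse-scaling schedule."""
    if t < 1:
        raise ValueError("t must be >= 1")
    return eta0 / t ** power_t

def constant_lr(t, eta0=0.01):
    """Step size that does not depend on the update count."""
    return eta0

for t in (1, 16, 10_000):
    # inverse scaling gives roughly 0.01, 0.005 and 0.001 for these t
    print(t, invscaling_lr(t), constant_lr(t))
```

Because the step shrinks only with the fourth root of t, it is still a tenth of eta0 after 10,000 updates. Where the iterates settle therefore depends on eta0, power_t and max_iter, which is one reason SGD results often need tuning.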
SGDRegressor is a regression method that uses stochastic gradient descent to train a regression model. It's efficient, capable of handling large datasets, and supports regularization for managing overfitting.
2.1.30.1. Code for creating the SGDRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.linear_model.SGDRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training SGDRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,10,0.1).reshape(-1,1)
y = 4*X + np.sin(X*10)
model_name = "SGDRegressor"
onnx_model_filename = data_path + "sgd_regressor"
# create an SGDRegressor model
regression_model = SGDRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's tensor(float) input
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's tensor(double) input
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
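# save the figure before showing it (after plt.show() closes the window, the saved image would be empty)
plt.savefig(data_path + model_name+'_plot_double.png')
plt.show()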
Output:
Python SGDRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9961197872743282
Python Mean Absolute Error: 0.6405924406136998
Python Mean Squared Error: 0.5169867345998348
Python
Python SGDRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\sgd_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9961197876338647
Python Mean Absolute Error: 0.6405924014799271
Python Mean Squared Error: 0.5169866866963753
Python R^2 matching decimal places: 9
Python MAE matching decimal places: 7
Python MSE matching decimal places: 6
Python float ONNX model precision: 7
Python
Python SGDRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\sgd_regressor_double.onnx
Python Information about input tensors in ONNX:
Python 1. Name: double_input, Data Type: tensor(double), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(double), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9961197872743282
Python Mean Absolute Error: 0.6405924406136998
Python Mean Squared Error: 0.5169867345998348
Python R^2 matching decimal places: 16
Python MAE matching decimal places: 16
Python MSE matching decimal places: 16
Python double ONNX model precision: 16
Fig.97. Results of the SGDRegressor.py (float ONNX)
2.1.30.2. MQL5 code for executing ONNX Models
This code executes the saved sgd_regressor_float.onnx and sgd_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                                 SGDRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "SGDRegressor"
#define   ONNXFilenameFloat  "sgd_regressor_float.onnx"
#define   ONNXFilenameDouble "sgd_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel     1
#define   TestDoubleModel    2
//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i*0.1;
      y[i]=(double)(4*x[i] + sin(x[i]*10));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
SGDRegressor (EURUSD,H1) Testing ONNX float: SGDRegressor (sgd_regressor_float.onnx)
SGDRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9961197876338647
SGDRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 0.6405924014799272
SGDRegressor (EURUSD,H1) MQL5: Mean Squared Error: 0.5169866866963754
SGDRegressor (EURUSD,H1)
SGDRegressor (EURUSD,H1) Testing ONNX double: SGDRegressor (sgd_regressor_double.onnx)
SGDRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9961197872743282
SGDRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 0.6405924406136998
SGDRegressor (EURUSD,H1) MQL5: Mean Squared Error: 0.5169867345998348
Comparison with the original double precision model in Python:
Testing ONNX float: SGDRegressor (sgd_regressor_float.onnx)
Python Mean Absolute Error: 0.6405924406136998
MQL5: Mean Absolute Error: 0.6405924014799272
Testing ONNX double: SGDRegressor (sgd_regressor_double.onnx)
Python Mean Absolute Error: 0.6405924406136998
MQL5: Mean Absolute Error: 0.6405924406136998
Accuracy of ONNX float MAE: 7 decimal places; accuracy of ONNX double MAE: 16 decimal places.
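The 7 versus 16 digit figures are consistent with the precision of the two IEEE 754 formats. float32 stores a 24-bit significand, which is about 7.2 decimal digits. float64 stores a 53-bit significand, which is about 16 decimal digits.

The short standard-library sketch below shows this in two steps. First, it rounds a double to the nearest float32. Second, it measures agreement from the relative error instead of comparing characters after the decimal point, which is what compare_decimal_places does. The helper names `to_float32` and `matching_significant_digits` are hypothetical and are not part of the article's scripts.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (IEEE 754 double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def matching_significant_digits(reference, value, max_digits=16):
    """Approximate count of significant digits on which value agrees with reference,
    derived from the relative error rather than from string characters."""
    if reference == value:
        return max_digits
    scale = max(abs(reference), abs(value))
    rel_err = abs(reference - value) / scale
    return max(0, min(max_digits, math.floor(-math.log10(rel_err))))


if __name__ == "__main__":
    # significand widths: 24 bits for float32, 53 bits for float64
    print("float32 digits ~", 24 * math.log10(2))  # about 7.22
    print("float64 digits ~", 53 * math.log10(2))  # about 15.95

    mae_python = 0.6405924406136998      # original double model (Python)
    mae_float_onnx = 0.6405924014799272  # float ONNX model (MQL5)
    print(matching_significant_digits(mae_python, mae_float_onnx))          # 7
    print(matching_significant_digits(mae_python, to_float32(mae_python)))  # at least 7
```

This measure counts significant digits, not decimal places. The two coincide for values below 1, such as the MAE here, but differ for values like the AdaBoostRegressor MSE of about 11.57. The measure is also not affected by carries: 0.3000 and 0.2999 agree to about three digits here, whereas a character-by-character comparison reports zero.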
2.1.30.3. ONNX representation of the sgd_regressor_float.onnx and sgd_regressor_double.onnx
Fig.98. ONNX representation of the sgd_regressor_float.onnx in Netron
Fig.99. ONNX representation of the sgd_regressor_double.onnx in Netron
2.2. Regression models from the Scikit-learn library that are converted only into float precision ONNX models
This section covers models that only work with float precision. Their double-precision ONNX versions can be exported, but they fail to load or run because of the limitations of the ai.onnx.ml subset of ONNX operators.
2.2.1. sklearn.ensemble.AdaBoostRegressor
AdaBoostRegressor is a machine learning method for regression, that is, for predicting numerical values (e.g., real estate prices or sales volumes).
It is a variation of the AdaBoost (Adaptive Boosting) algorithm, which was originally developed for classification tasks.
How AdaBoostRegressor works:
- Original dataset: It begins with the original dataset containing features (independent variables) and their corresponding target variables (dependent variables we aim to predict).
- Weight initialization: Initially, each data point (observation) has equal weights, and the model is built based on this weighted dataset.
- Training weak learners: AdaBoostRegressor constructs several weak regression models (by default, decision trees with max_depth=3) that attempt to predict the target variable. These models are referred to as "weak learners." Each weak learner is trained on the data with the current observation weights taken into account.
- Selection of weak learner weights: AdaBoostRegressor computes a weight for each weak learner based on how accurately it predicted. Learners making more accurate predictions receive higher weights, and vice versa.
- Update of observation weights: Observation weights are updated so that observations with larger prediction errors receive greater relative weights, increasing their importance for the next weak learner.
- Final prediction: AdaBoostRegressor combines the predictions of all weak learners, weighting each learner by its performance. The scikit-learn implementation takes the weighted median of the learners' predictions as the final prediction of the model.
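The steps above can be illustrated with a compact pure-Python sketch in the spirit of AdaBoost.R2, the algorithm that scikit-learn's AdaBoostRegressor implements. It is an illustration under simplifying assumptions, not the library's code:

- Inputs are one-dimensional.
- The weak learners are single-split decision stumps fitted directly with the observation weights. scikit-learn instead resamples the training set according to the weights and uses deeper trees by default.
- The loss is linear.
- The final prediction is the weighted median of the learners' outputs.

All function names are hypothetical.

```python
import math


def fit_stump(x, y, w):
    """Fit a one-split regression stump that minimises weighted squared error."""
    total = sum(w)
    mean_all = sum(wi * yi for wi, yi in zip(w, y)) / total
    best_sse, best_t, best_l, best_r = float("inf"), None, mean_all, mean_all
    for t in sorted(set(x))[1:]:  # both sides of every candidate split are non-empty
        left = [(wi, yi) for xi, yi, wi in zip(x, y, w) if xi < t]
        right = [(wi, yi) for xi, yi, wi in zip(x, y, w) if xi >= t]
        wl = sum(wi for wi, _ in left)
        wr = sum(wi for wi, _ in right)
        ml = sum(wi * yi for wi, yi in left) / wl
        mr = sum(wi * yi for wi, yi in right) / wr
        sse = (sum(wi * (yi - ml) ** 2 for wi, yi in left)
               + sum(wi * (yi - mr) ** 2 for wi, yi in right))
        if sse < best_sse:
            best_sse, best_t, best_l, best_r = sse, t, ml, mr
    if best_t is None:  # all x equal: fall back to a constant model
        return lambda v: mean_all
    return lambda v: best_l if v < best_t else best_r


def weighted_median(values, weights):
    """Value at which the cumulative weight first reaches half of the total."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    half = 0.5 * sum(weights)
    cumulative = 0.0
    for i in order:
        cumulative += weights[i]
        if cumulative >= half:
            return values[i]
    return values[order[-1]]


def adaboost_r2_fit(x, y, n_estimators=20):
    n = len(x)
    w = [1.0 / n] * n                                  # equal initial weights
    learners, alphas = [], []
    for _ in range(n_estimators):
        h = fit_stump(x, y, w)                         # weak learner on weighted data
        errors = [abs(h(xi) - yi) for xi, yi in zip(x, y)]
        max_err = max(errors)
        if max_err == 0.0:                             # perfect fit: keep it and stop
            learners.append(h)
            alphas.append(1.0)
            break
        loss = [e / max_err for e in errors]           # linear loss in [0, 1]
        avg_loss = sum(wi * li for wi, li in zip(w, loss))  # weights sum to 1
        if avg_loss >= 0.5:                            # learner too weak to help
            if not learners:                           # keep one learner so the model is usable
                learners.append(h)
                alphas.append(1.0)
            break
        beta = avg_loss / (1.0 - avg_loss)
        learners.append(h)
        alphas.append(math.log(1.0 / beta))            # more accurate learner, larger weight
        # shrink the weights of well-predicted points, then renormalise,
        # so poorly predicted points gain relative importance
        w = [wi * beta ** (1.0 - li) for wi, li in zip(w, loss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return learners, alphas


def adaboost_r2_predict(model, v):
    learners, alphas = model
    return weighted_median([h(v) for h in learners], alphas)


if __name__ == "__main__":
    xs = [float(i) for i in range(100)]
    ys = [4 * xi + 10 * math.sin(xi * 0.5) for xi in xs]  # same formula as the Python script
    model = adaboost_r2_fit(xs, ys, n_estimators=30)
    print(len(model[0]), "learners; prediction at x=50:", adaboost_r2_predict(model, 50.0))
```

Using the weighted median rather than a weighted mean means that a few badly wrong learners cannot pull the final prediction far away.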
Advantages of AdaBoostRegressor:
- Adaptability: AdaBoostRegressor can model complex functions and nonlinear relationships.
- Overfitting reduction: AdaBoostRegressor uses regularization through the update of observation weights, helping to prevent overfitting.
- Powerful ensemble: By combining multiple weak models, AdaBoostRegressor can create strong models that can predict the target variable fairly accurately.
Limitations of AdaBoostRegressor:
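- Sensitivity to outliers: AdaBoostRegressor is sensitive to outliers in the data. Because the weight updates emphasize poorly predicted observations, outliers can gain large weights and degrade prediction quality.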
- High computational costs: Constructing multiple weak learners might require more computational resources and time.
- Not always the best choice: AdaBoostRegressor is not always the optimal choice, and in some cases, other regression methods might perform better.
AdaBoostRegressor is a useful machine learning method applicable to various regression tasks, especially in situations where data contains complex dependencies.
2.2.1.1. Code for creating the AdaBoostRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.ensemble.AdaBoostRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training AdaBoostRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "AdaBoostRegressor"
onnx_model_filename = data_path + "adaboost_regressor"
# create an AdaBoostRegressor model
regression_model = AdaBoostRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the float_input tensor type
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the double_input tensor type
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python AdaBoostRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9991257208809748
Python Mean Absolute Error: 2.3678022748065457
Python Mean Squared Error: 11.569124350863143
Python
Python AdaBoostRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\adaboost_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9991257199849699
Python Mean Absolute Error: 2.36780399225718
Python Mean Squared Error: 11.569136207480646
Python R^2 matching decimal places: 7
Python MAE matching decimal places: 5
Python MSE matching decimal places: 4
Python float ONNX model precision: 5
Python
Python AdaBoostRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\adaboost_regressor_double.onnx
Here the model was exported to ONNX in both float and double precision. The float ONNX model executed successfully, while the double model failed to load (see the Errors tab):
AdaBoostRegressor.py started    AdaBoostRegressor.py    1    1
Traceback (most recent call last):    AdaBoostRegressor.py    1    1
    onnx_session = ort.InferenceSession(onnx_filename)    AdaBoostRegressor.py    159    1
    self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py    383    1
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py    424    1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\adaboost_regressor_double.onnx failed:Type Error:    onnxruntime_inference_collection.py    424    1
AdaBoostRegressor.py finished in 3207 ms        5    1
Fig.100. Results of the AdaBoostRegressor.py (float ONNX)
2.2.1.2. MQL5 code for executing ONNX Models
This code executes the saved adaboost_regressor_float.onnx and adaboost_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                            AdaBoostRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "AdaBoostRegressor"
#define   ONNXFilenameFloat  "adaboost_regressor_float.onnx"
#define   ONNXFilenameDouble "adaboost_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel     1
#define   TestDoubleModel    2
//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
AdaBoostRegressor (EURUSD,H1)
AdaBoostRegressor (EURUSD,H1) Testing ONNX float: AdaBoostRegressor (adaboost_regressor_float.onnx)
AdaBoostRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9991257199849699
AdaBoostRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 2.3678039922571803
AdaBoostRegressor (EURUSD,H1) MQL5: Mean Squared Error: 11.5691362074806463
AdaBoostRegressor (EURUSD,H1)
AdaBoostRegressor (EURUSD,H1) Testing ONNX double: AdaBoostRegressor (adaboost_regressor_double.onnx)
AdaBoostRegressor (EURUSD,H1) ONNX: cannot create session (OrtStatus: 1 'Type Error: Type parameter (T) of Optype (Mul) bound to different types (tensor(float) and tensor(double) in node (Mul).'), inspect code 'Scripts\Regression\AdaBoostRegressor.mq5' (133:16)
AdaBoostRegressor (EURUSD,H1) model_name=AdaBoostRegressor OnnxCreate error 5800
The float ONNX model executed successfully, while the double model could not be loaded. The error shows that a Mul node receives both tensor(float) and tensor(double) inputs. This is consistent with the ai.onnx.ml tree operators returning tensor(float) inside an otherwise double graph.
2.2.1.3. ONNX representation of the adaboost_regressor_float.onnx and adaboost_regressor_double.onnx
Fig.101. ONNX representation of the adaboost_regressor_float.onnx in Netron
Fig.102. ONNX representation of the adaboost_regressor_double.onnx in Netron
2.2.2. sklearn.ensemble.BaggingRegressor
BaggingRegressor is a machine learning method used for regression tasks.
It represents an ensemble method based on the idea of "bagging" (Bootstrap Aggregating), which involves constructing multiple base regression models and combining their predictions to obtain a more stable and accurate result.
How BaggingRegressor works:
- Original dataset: It starts with the original dataset containing features (independent variables) and their corresponding target variables (dependent variables we aim to predict).
- Generation of subsets: BaggingRegressor randomly creates several subsets (samples with replacement) from the original data. Each subset contains a random set of observations from the original data.
- Training base regression models: For each subset, BaggingRegressor constructs a separate base regression model (e.g., decision tree, random forest, linear regression model, etc.).
- Predictions from base models: Each base model, trained on its own subset, produces a prediction of the target variable for a given input.
- Averaging: BaggingRegressor averages the predictions of all base models to obtain the final regression prediction.
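The bagging procedure above can be sketched in a few lines of plain Python. The sketch rests on these assumptions:

- Inputs are one-dimensional.
- The base learner is pluggable.
- The sampling uses a seeded standard-library random generator.

The base learner used here, a least-squares line, is chosen only because it is short. scikit-learn's BaggingRegressor uses a DecisionTreeRegressor by default. The function names are hypothetical and this is not the library's implementation.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares line y = a*x + b; falls back to the mean when all xs are equal."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((xi - mx) ** 2 for xi in xs)
    if sxx == 0.0:
        return lambda v: my
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda v: a * v + b


def bagging_fit(x, y, fit_base, n_estimators=10, seed=0):
    """Train n_estimators base models, each on a bootstrap sample of (x, y)."""
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_estimators):
        # bootstrap sample: n indices drawn with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        # fit one base model on that subset
        models.append(fit_base([x[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, v):
    """Every base model predicts v, and the predictions are averaged."""
    return sum(m(v) for m in models) / len(models)


if __name__ == "__main__":
    xs = [float(i) for i in range(100)]
    ys = [4 * xi + 10 * math.sin(xi * 0.5) for xi in xs]  # same formula as the Python script
    models = bagging_fit(xs, ys, fit_line, n_estimators=25, seed=1)
    print("bagged prediction at x=50:", bagging_predict(models, 50.0))
```

Because a least-squares line is a stable, low-variance model, bagging it changes the predictions very little. A higher-variance base learner, such as a deep decision tree, is where averaging over bootstrap samples pays off.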
Advantages of BaggingRegressor:
- Variance reduction: BaggingRegressor reduces the model's variance, making it more robust to fluctuations in the data.
- Overfitting reduction: As the model is trained on different data subsets, BaggingRegressor usually reduces the risk of overfitting.
- Improved generalization: By combining predictions from multiple models, BaggingRegressor typically provides more accurate and stable forecasts.
- Wide range of base models: BaggingRegressor can use different types of base regression models, making it a flexible method.
Limitations of BaggingRegressor:
- Bagging brings little improvement when the base model is already stable (has low variance), for example a linear regression model, because averaging mainly reduces variance.
- BaggingRegressor might require more computational resources and time compared to training a single model.
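A standard variance identity makes the variance-reduction advantage and the first limitation more concrete. Assume, as an idealization, that the B base predictions \hat f_b(x) at a point x are identically distributed with variance σ² and pairwise correlation ρ. Bootstrap models trained on overlapping data are typically positively correlated, so ρ > 0. Under this assumption the variance of their average is:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right)
  = \frac{1}{B^{2}}\Big(B\sigma^{2} + B(B-1)\rho\sigma^{2}\Big)
  = \rho\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

Adding more base models shrinks only the second term, while ρσ² remains. Bagging therefore helps most for high-variance, weakly correlated base models such as deep decision trees. It helps little for stable models such as linear regression, where σ² is already small.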
BaggingRegressor is a powerful machine learning method for regression tasks, especially when the data are noisy and stable predictions are required.
2.2.2.1. Code for creating the BaggingRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.ensemble.BaggingRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training BaggingRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "BaggingRegressor"
onnx_model_filename = data_path + "bagging_regressor"
# create a Bagging Regressor model
regression_model = BaggingRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32, the type expected by the float model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64, the type expected by the double model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python
Python BaggingRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9998128324923137
Python Mean Absolute Error: 1.0257279210387649
Python Mean Squared Error: 2.4767424083953005
Python
Python BaggingRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bagging_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9998128317934672
Python Mean Absolute Error: 1.0257282792130034
Python Mean Squared Error: 2.4767516560614187
Python R^2 matching decimal places: 8
Python MAE matching decimal places: 5
Python MSE matching decimal places: 4
Python float ONNX model precision: 5
Python
Python BaggingRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bagging_regressor_double.onnx
Errors tab:
BaggingRegressor.py started BaggingRegressor.py 1 1
Traceback (most recent call last): BaggingRegressor.py 1 1
onnx_session = ort.InferenceSession(onnx_filename) BaggingRegressor.py 161 1
self._create_inference_session(providers, provider_options, disabled_optimizers) onnxruntime_inference_collection.py 383 1
sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model) onnxruntime_inference_collection.py 424 1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\bagging_regressor_double.onnx failed:Type Error: T onnxruntime_inference_collection.py 424 1
BaggingRegressor.py finished in 3173 ms 5 1
Fig.103. Results of the BaggingRegressor.py (float ONNX)
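The partial agreement of the float model (4 to 8 matching decimal places) is consistent with the precision of 32-bit floats: with round-to-nearest, converting a double to float32 introduces a relative error of at most 2^-24 (about 6e-8), i.e. roughly 7 significant decimal digits. The short NumPy check below applies this rounding to the same synthetic targets. It is only an illustration under that assumption: it models the rounding of stored values (for example, tree thresholds and leaf values kept as float attributes), not the full chain of float computations inside the ONNX operators.

```python
import numpy as np

x = np.arange(0, 100, 1.0)
y = 4 * x + 10 * np.sin(x * 0.5)  # float64 targets, as in the script

# round-trip through float32, similar to values stored in a float ONNX model
y32 = y.astype(np.float32).astype(np.float64)

nonzero = y != 0  # y[0] == 0, skip it to avoid dividing by zero
rel_err = np.abs(y32[nonzero] - y[nonzero]) / np.abs(y[nonzero])
abs_err = np.abs(y32 - y)

print("max relative error:", rel_err.max())  # bounded by 2**-24
print("max absolute error:", abs_err.max())
print("float32 machine epsilon:", np.finfo(np.float32).eps)  # equals 2**-23
```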
2.2.2.2. MQL5 code for executing ONNX Models
This code executes the saved bagging_regressor_float.onnx and bagging_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                             BaggingRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define ModelName          "BaggingRegressor"
#define ONNXFilenameFloat  "bagging_regressor_float.onnx"
#define ONNXFilenameDouble "bagging_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel  1
#define TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
BaggingRegressor (EURUSD,H1) Testing ONNX float: BaggingRegressor (bagging_regressor_float.onnx)
BaggingRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9998128317934672
BaggingRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 1.0257282792130034
BaggingRegressor (EURUSD,H1) MQL5: Mean Squared Error: 2.4767516560614196
BaggingRegressor (EURUSD,H1)
BaggingRegressor (EURUSD,H1) Testing ONNX double: BaggingRegressor (bagging_regressor_double.onnx)
BaggingRegressor (EURUSD,H1) ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (ReduceMean) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\BaggingRegressor.mq5' (133:16)
BaggingRegressor (EURUSD,H1) model_name=BaggingRegressor OnnxCreate error 5800
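The float ONNX model ran normally and reproduced the Python metrics (up to the last digits of the MSE), but the double model could not even be loaded: ONNX Runtime refused to create a session because the graph declares the output of the ReduceMean node as tensor(double), while type inference expects tensor(float). A likely cause is that the base estimators of BaggingRegressor (decision trees by default) are converted to the TreeEnsembleRegressor operator, which, according to the ONNX specification, always returns tensor(float).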
2.2.2.3. ONNX representation of the bagging_regressor_float.onnx and bagging_regressor_double.onnx
Fig.104. ONNX representation of the bagging_regressor_float.onnx in Netron
Fig.105. ONNX representation of the bagging_regressor_double.onnx in Netron
2.2.3. sklearn.tree.DecisionTreeRegressor
DecisionTreeRegressor is a machine learning method used for regression tasks, predicting numerical values of the target variable based on a set of features (independent variables).
This method is based on building decision trees that partition the feature space into regions (intervals, in the case of a single feature) and predict the target variable's value for each region.
Working principle of DecisionTreeRegressor:
- Starting point: Construction begins with the initial dataset containing the features (independent variables) and the corresponding values of the target variable.
- Feature selection and splitting: The decision tree selects a feature and a threshold value that divide the data into two subgroups (scikit-learn builds binary trees). The split is chosen to minimize the mean squared error (the average squared deviation between the predicted and actual values of the target variable) within the resulting subgroups.
- Recursive building: The process of feature selection and splitting is repeated for each subgroup, creating subtrees. This continues recursively until a stopping criterion is met, such as the maximum tree depth or the minimum number of samples in a node.
- Leaf nodes: When a stopping criterion is met, a leaf node is created. It predicts a numerical value of the target variable for the samples that fall into it (with the default squared-error criterion, the mean of the training targets in that leaf).
- Prediction: For new data, the decision tree is applied, and new observations traverse the tree until they reach a leaf node that predicts the numerical value of the target variable.
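The construction steps above can be illustrated with a deliberately small NumPy regression tree for a single feature. It is a sketch under stated assumptions, not scikit-learn's implementation: it tries thresholds halfway between consecutive distinct x values, scores each split by the summed squared error of the two subgroups (which ranks splits the same way as the default squared_error criterion), and stops at max_depth or when a node has fewer than min_samples samples. The names best_split, build_tree and predict_one are hypothetical.

```python
import numpy as np


def best_split(x, y):
    """Return the threshold minimising the summed squared error of both subgroups."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_sse, best_thr = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate equal feature values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_thr = sse, (xs[i - 1] + xs[i]) / 2.0
    return best_thr


def build_tree(x, y, depth=0, max_depth=None, min_samples=2):
    """Recursively split until a stopping criterion is met; leaves store the mean."""
    stop = (max_depth is not None and depth >= max_depth) or len(y) < min_samples
    thr = None if stop else best_split(x, y)
    if thr is None:
        return {"value": float(y.mean())}  # leaf node
    mask = x <= thr
    return {
        "threshold": thr,
        "left": build_tree(x[mask], y[mask], depth + 1, max_depth, min_samples),
        "right": build_tree(x[~mask], y[~mask], depth + 1, max_depth, min_samples),
    }


def predict_one(node, value):
    """Traverse the tree from the root until a leaf is reached."""
    while "value" not in node:
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["value"]


x = np.arange(0, 100, 1.0)
y = 4 * x + 10 * np.sin(x * 0.5)

deep_tree = build_tree(x, y)                  # no depth limit
shallow_tree = build_tree(x, y, max_depth=3)  # at most 2**3 = 8 leaves
pred_deep = np.array([predict_one(deep_tree, v) for v in x])
pred_shallow = np.array([predict_one(shallow_tree, v) for v in x])
print("training MSE, unlimited depth:", np.mean((pred_deep - y) ** 2))
print("training MSE, max_depth=3:", np.mean((pred_shallow - y) ** 2))
```

Without a depth limit every leaf ends up holding a single training point, so the training error is exactly zero. This mirrors the R-squared of 1.0 reported for the scikit-learn model in the script output and illustrates the overfitting risk listed among the limitations.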
Advantages of DecisionTreeRegressor:
- Interpretability: Decision trees are easy to understand and visualize, making them useful for explaining model decision-making.
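- Outlier robustness: Decision trees are relatively robust to outliers in the input features, because splits depend only on the ordering of the feature values.
- Handling both numeric and categorical data: As an algorithm, decision trees can work with both numeric and categorical features and do not require feature scaling. Note, however, that scikit-learn's DecisionTreeRegressor expects numeric input, so categorical features must be encoded first (for example, with OrdinalEncoder).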
- Automated feature selection: Trees can automatically select important features, ignoring less relevant ones.
Limitations of DecisionTreeRegressor:
- Overfitting vulnerability: Decision trees can be prone to overfitting, especially if they are too deep.
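- Generalization issues: Decision trees may not generalize well to data not included in the training set. In particular, they cannot extrapolate: outside the range of the training data, the prediction stays constant at the value of the leaf covering the edge of that range.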
- Not always an optimal choice: In some cases, other regression methods like linear regression or k-nearest neighbors might perform better.
DecisionTreeRegressor is a valuable method for regression tasks, especially when understanding the model's decision-making logic and visualizing the process is crucial.
2.2.3.1. Code for creating the DecisionTreeRegressor model and exporting it to ONNX for float and double
This code creates a sklearn.tree.DecisionTreeRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training a DecisionTreeRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "DecisionTreeRegressor"
onnx_model_filename = data_path + "decision_tree_regressor"
# create a Decision Tree Regressor model
regression_model = DecisionTreeRegressor()
# fit the model to the data
regression_model.fit(X, y)
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32, the type expected by the float model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64, the type expected by the double model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python DecisionTreeRegressor Original model (double) Python R-squared (Coefficient of determination): 1.0 Python Mean Absolute Error: 0.0 Python Mean Squared Error: 0.0 Python Python DecisionTreeRegressor ONNX model (float) Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\decision_tree_regressor_float.onnx Python Information about input tensors in ONNX: Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1] Python Information about output tensors in ONNX: Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1] Python R-squared (Coefficient of determination) 0.9999999999999971 Python Mean Absolute Error: 4.393654615473253e-06 Python Mean Squared Error: 3.829042036424747e-11 Python R^2 matching decimal places: 0 Python MAE matching decimal places: 0 Python MSE matching decimal places: 0 Python float ONNX model precision: 0 Python Python DecisionTreeRegressor ONNX model (double) Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\decision_tree_regressor_double.onnx
Errors tab:
DecisionTreeRegressor.py started DecisionTreeRegressor.py 1 1
Traceback (most recent call last): DecisionTreeRegressor.py 1 1
onnx_session = ort.InferenceSession(onnx_filename) DecisionTreeRegressor.py 160 1
self._create_inference_session(providers, provider_options, disabled_optimizers) onnxruntime_inference_collection.py 383 1
sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model) onnxruntime_inference_collection.py 424 1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\decision_tree_regressor_double.onnx failed:Type Er onnxruntime_inference_collection.py 424 1
DecisionTreeRegressor.py finished in 2957 ms 5 1
Fig.106. Results of the DecisionTreeRegressor.py (float ONNX)
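For this model, compare_decimal_places reports 0 matching places even though the ONNX errors are tiny. The helper compares the string representations digit by digit, so 1.0 and 0.9999999999999971 share no digits after the decimal point, and 0.0 and 4.393654615473253e-06 likewise differ at the first digit. As an alternative that is not part of the original scripts, agreement can be estimated numerically from the absolute difference; the function name matching_decimals and the cap of 16 digits are assumptions made for this sketch.

```python
import math


def matching_decimals(value1, value2, max_digits=16):
    """Approximate number of decimal places to which two numbers agree.

    Uses floor(-log10(|value1 - value2|)) instead of comparing string digits,
    so values on either side of a rounding boundary (1.0 vs 0.99999...) are handled.
    """
    diff = abs(value1 - value2)
    if diff == 0:
        return max_digits  # identical values: report the cap
    return max(0, min(max_digits, math.floor(-math.log10(diff))))


# values printed by the scripts in this section
print(matching_decimals(1.0, 0.9999999999999971))                 # 14 (R^2, DecisionTreeRegressor)
print(matching_decimals(0.0, 4.393654615473253e-06))              # 5 (MAE, DecisionTreeRegressor)
print(matching_decimals(1.0257279210387649, 1.0257282792130034))  # 6 (MAE, BaggingRegressor)
```

Measured this way, the float ONNX DecisionTreeRegressor agrees with the original model to about 14 decimal places in R^2 and 5 in MAE. Note that the two helpers use different definitions, so their counts are not directly comparable (6 versus 5 for the BaggingRegressor MAE).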
2.2.3.2. MQL5 code for executing ONNX Models
This code executes the saved decision_tree_regressor_float.onnx and decision_tree_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                        DecisionTreeRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define ModelName          "DecisionTreeRegressor"
#define ONNXFilenameFloat  "decision_tree_regressor_float.onnx"
#define ONNXFilenameDouble "decision_tree_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel  1
#define TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else
      if(model_type==TestDoubleModel)
        {
         PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
         model=OnnxCreateFromBuffer(ExtModelDouble,flags);
        }
      else
        {
         PrintFormat("Model type is incorrect.");
         return(false);
        }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else
      if(model_type==TestDoubleModel)
        {
         run_result=RunModelDouble(model,x_values,y_predicted);
        }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
DecisionTreeRegressor (EURUSD,H1) Testing ONNX float: DecisionTreeRegressor (decision_tree_regressor_float.onnx)
DecisionTreeRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9999999999999971
DecisionTreeRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 0.0000043936546155
DecisionTreeRegressor (EURUSD,H1) MQL5: Mean Squared Error: 0.0000000000382904
DecisionTreeRegressor (EURUSD,H1)
DecisionTreeRegressor (EURUSD,H1) Testing ONNX double: DecisionTreeRegressor (decision_tree_regressor_double.onnx)
DecisionTreeRegressor (EURUSD,H1) ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\DecisionTreeRegressor.mq5' (133:16)
DecisionTreeRegressor (EURUSD,H1) model_name=DecisionTreeRegressor OnnxCreate error 5800
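The float ONNX model executed normally, but the double model could not be created: the TreeEnsembleRegressor operator always returns tensor(float), so the tensor(double) output declared in the converted model fails the type check in ONNX Runtime.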
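Why does the float model report a small but nonzero error when the scikit-learn tree reproduces the training data exactly? One plausible explanation, consistent with the ONNX specification discussed at the beginning of the article, is that the leaf values are stored and returned as float32. The NumPy sketch below does not use ONNX at all; it assumes that a fully grown tree simply stores each training target as a leaf value and that rounding those values to float32 is the only source of error.

```python
import numpy as np

# synthetic data identical to the article's generator
X = np.arange(0, 100, 1, dtype=np.float64)
y = 4 * X + 10 * np.sin(X * 0.5)

# assumption: a fully grown tree stores each training target as a leaf value,
# so its double-precision predictions on the training set equal y exactly
y_leaf_double = y.copy()

# assumption: the float ONNX model keeps leaf values (and outputs) as float32
y_leaf_float = y_leaf_double.astype(np.float32).astype(np.float64)

mae = np.mean(np.abs(y - y_leaf_float))
mse = np.mean((y - y_leaf_float) ** 2)
max_err = np.max(np.abs(y - y_leaf_float))

print(f"MAE caused by float32 rounding: {mae:.3e}")
print(f"MSE caused by float32 rounding: {mse:.3e}")
print(f"largest single error:           {max_err:.3e}")
```

If the assumption holds, the printed MAE and MSE should be of the same order of magnitude as the values reported for the float model above (about 4.4e-06 and 3.8e-11).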
2.2.3.3. ONNX representation of the decision_tree_regressor_float.onnx and decision_tree_regressor_double.onnx
Fig.107. ONNX representation of the decision_tree_regressor_float.onnx in Netron
Fig.108. ONNX representation of the decision_tree_regressor_double.onnx in Netron
2.2.4. sklearn.tree.ExtraTreeRegressor
ExtraTreeRegressor, or Extremely Randomized Tree Regressor, is a regression model consisting of a single extremely randomized decision tree.
It is a variation of the regular decision tree: instead of searching for the optimal split threshold at each node, it draws random thresholds for the candidate features and keeps the best of these random splits. This makes tree construction more random and faster. In scikit-learn, such trees are primarily intended as building blocks for ensembles such as ExtraTreesRegressor.
Working principle of ExtraTreeRegressor:
- Beginning construction: Starting with the initial dataset containing features (independent variables) and corresponding values of the target variable.
- Randomness in splits: Unlike regular decision trees where the best split is chosen, ExtraTreeRegressor uses random threshold values to split the tree nodes. This makes the splitting process more random, so trees grown on the same data differ from each other.
- Tree construction: The tree is built by splitting nodes based on random features and threshold values. This process continues until certain stopping criteria are met, such as maximum tree depth or minimum number of samples in a node.
- Prediction: To predict the target variable for new data, the sample is passed down the tree to a leaf, and the prediction is the mean target value of the training samples in that leaf.
Advantages of ExtraTreeRegressor:
- Diversity for ensembles: Random node splits make individual trees differ from each other, which reduces overfitting when many such trees are averaged (as in ExtraTreesRegressor). A single fully grown tree, however, can still overfit.
- Fast training: Since thresholds are drawn at random rather than optimized, a tree is usually built faster than a regular decision tree and much faster than methods such as gradient boosting.
Limitations of ExtraTreeRegressor:
- May be less accurate: In some cases, especially with small datasets, ExtraTreeRegressor may be less accurate compared to more complex methods.
- Less interpretable: Compared to linear models and other simpler methods, ExtraTreeRegressor is typically less interpretable.
- Intended for ensembles: The scikit-learn documentation recommends using extra-trees only within ensemble methods.
ExtraTreeRegressor can be useful as a fast, randomized building block for ensembles, or on its own when training speed matters more than accuracy.
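The difference between the two split rules can be illustrated for a single numeric feature. The NumPy sketch below compares an exhaustive search over all midpoints (as in a regular decision tree) with the extremely randomized rule, where one threshold is drawn uniformly between the feature's minimum and maximum in the node. With several features, scikit-learn draws one random threshold per candidate feature and keeps the best of those; the helper names used here are hypothetical and are not part of scikit-learn.

```python
import numpy as np

def split_sse(x, y, threshold):
    """Sum of squared errors of both children after splitting on x <= threshold."""
    sse = 0.0
    for part in (y[x <= threshold], y[x > threshold]):
        if part.size:
            sse += float(np.sum((part - part.mean()) ** 2))
    return sse

def best_split(x, y):
    """Regular decision tree: try every midpoint between sorted unique values."""
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2.0
    scores = [split_sse(x, y, t) for t in candidates]
    i = int(np.argmin(scores))
    return float(candidates[i]), scores[i]

def random_split(x, y, rng):
    """Extremely randomized tree: draw a single threshold uniformly in [min, max)."""
    t = float(rng.uniform(x.min(), x.max()))
    return t, split_sse(x, y, t)

X = np.arange(0, 100, 1, dtype=np.float64)
y = 4 * X + 10 * np.sin(X * 0.5)
rng = np.random.default_rng(0)

t_best, sse_best = best_split(X, y)
t_rand, sse_rand = random_split(X, y, rng)
print(f"best threshold   {t_best:6.2f}  SSE {sse_best:12.2f}")
print(f"random threshold {t_rand:6.2f}  SSE {sse_rand:12.2f}")
```

On the same feature, the random split can never beat the exhaustive one, because every random threshold induces a partition that the exhaustive search also evaluates; the randomized rule trades some split quality for speed and diversity between trees.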
2.2.4.1. Code for creating the ExtraTreeRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.tree.ExtraTreeRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training an ExtraTreeRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import ExtraTreeRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "ExtraTreeRegressor"
onnx_model_filename = data_path + "extra_tree_regressor"
# create an ExtraTreeRegressor model
regression_model = ExtraTreeRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32, the input type of the float model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression data
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64, the input type of the double model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
2023.10.30 14:40:57.665 Python ExtraTreeRegressor Original model (double)
2023.10.30 14:40:57.665 Python R-squared (Coefficient of determination): 1.0
2023.10.30 14:40:57.665 Python Mean Absolute Error: 0.0
2023.10.30 14:40:57.665 Python Mean Squared Error: 0.0
2023.10.30 14:40:57.681 Python
2023.10.30 14:40:57.681 Python ExtraTreeRegressor ONNX model (float)
2023.10.30 14:40:57.681 Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_tree_regressor_float.onnx
2023.10.30 14:40:57.681 Python Information about input tensors in ONNX:
2023.10.30 14:40:57.681 Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
2023.10.30 14:40:57.681 Python Information about output tensors in ONNX:
2023.10.30 14:40:57.681 Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
2023.10.30 14:40:57.681 Python R-squared (Coefficient of determination) 0.9999999999999971
2023.10.30 14:40:57.681 Python Mean Absolute Error: 4.393654615473253e-06
2023.10.30 14:40:57.681 Python Mean Squared Error: 3.829042036424747e-11
2023.10.30 14:40:57.681 Python R^2 matching decimal places: 0
2023.10.30 14:40:57.681 Python MAE matching decimal places: 0
2023.10.30 14:40:57.681 Python MSE matching decimal places: 0
2023.10.30 14:40:57.681 Python float ONNX model precision: 0
2023.10.30 14:40:58.011 Python
2023.10.30 14:40:58.011 Python ExtraTreeRegressor ONNX model (double)
2023.10.30 14:40:58.011 Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_tree_regressor_double.onnx
Errors tab:
ExtraTreeRegressor.py started    ExtraTreeRegressor.py    1    1
Traceback (most recent call last):    ExtraTreeRegressor.py    1    1
  onnx_session = ort.InferenceSession(onnx_filename)    ExtraTreeRegressor.py    159    1
  self._create_inference_session(providers, provider_options, disabled_optimizers)    onnxruntime_inference_collection.py    383    1
  sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)    onnxruntime_inference_collection.py    424    1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_tree_regressor_double.onnx failed:Type Error    onnxruntime_inference_collection.py    424    1
ExtraTreeRegressor.py finished in 2980 ms        5    1
Fig.109. Results of the ExtraTreeRegressor.py (float ONNX)
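The compare_decimal_places function used in these scripts compares the string representations digit by digit after the decimal point. It therefore reports 0 whenever a value is printed in scientific notation (for example 4.393654615473253e-06) or when two very close values straddle a carry, as with 1.0 and 0.9999999999999971. A hypothetical alternative, matching_decimal_places, which is not part of the original scripts, estimates the number of agreeing decimal places from the absolute difference instead:

```python
import math

def matching_decimal_places(value1, value2, max_places=16):
    """Approximate number of decimal places on which two numbers agree.

    Uses -log10(|value1 - value2|) instead of comparing digit strings,
    so it also works for values printed in scientific notation.
    """
    diff = abs(float(value1) - float(value2))
    if diff == 0.0:
        return max_places
    places = math.floor(-math.log10(diff))
    # clamp the result to the range [0, max_places]
    return max(0, min(max_places, places))

# values taken from the float ONNX output of ExtraTreeRegressor.py above
print(matching_decimal_places(1.0, 0.9999999999999971))     # R^2 -> 14
print(matching_decimal_places(0.0, 4.393654615473253e-06))  # MAE -> 5
print(matching_decimal_places(0.0, 3.829042036424747e-11))  # MSE -> 10
```

The result is an approximation: for differences close to a power of ten, the count may be off by one compared to a digit-by-digit reading.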
2.2.4.2. MQL5 code for executing ONNX Models
This code executes the saved extra_tree_regressor_float.onnx and extra_tree_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                           ExtraTreeRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ExtraTreeRegressor"
#define   ONNXFilenameFloat  "extra_tree_regressor_float.onnx"
#define   ONNXFilenameDouble "extra_tree_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
ExtraTreeRegressor (EURUSD,H1) Testing ONNX float: ExtraTreeRegressor (extra_tree_regressor_float.onnx)
ExtraTreeRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9999999999999971
ExtraTreeRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 0.0000043936546155
ExtraTreeRegressor (EURUSD,H1) MQL5: Mean Squared Error: 0.0000000000382904
ExtraTreeRegressor (EURUSD,H1)
ExtraTreeRegressor (EURUSD,H1) Testing ONNX double: ExtraTreeRegressor (extra_tree_regressor_double.onnx)
ExtraTreeRegressor (EURUSD,H1) ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\ExtraTreeRegressor.mq5' (133:16)
ExtraTreeRegressor (EURUSD,H1) model_name=ExtraTreeRegressor OnnxCreate error 5800
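The float ONNX model executed normally, but the double ONNX model could not be created: the TreeEnsembleRegressor operator always outputs tensor(float), while the converted model declares a tensor(double) output.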
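RunModelFloat and RunModelDouble differ only in the element type of their buffers; both pass a [batch_size, 1] tensor with one feature per row. The NumPy sketch below mirrors that preparation for illustration only (it is not the MQL5 API, and the function names are hypothetical). It also checks that the integer inputs 0..99 are represented exactly in float32, which suggests that the input conversion is not the source of the float model's small error.

```python
import numpy as np

def generate_data(n):
    """NumPy counterpart of the MQL5 GenerateData function."""
    x = np.arange(n, dtype=np.float64)
    y = 4 * x + 10 * np.sin(x * 0.5)
    return x, y

def make_input_tensor(x, dtype):
    """Pack a 1-D vector into a [batch_size, 1] tensor of the given dtype."""
    return np.asarray(x, dtype=dtype).reshape(-1, 1)

x, y = generate_data(100)
x_float = make_input_tensor(x, np.float32)   # layout sent by RunModelFloat
x_double = make_input_tensor(x, np.float64)  # layout sent by RunModelDouble

print(x_float.shape, x_float.dtype)    # (100, 1) float32
print(x_double.shape, x_double.dtype)  # (100, 1) float64
# integers 0..99 survive the float32 conversion without rounding
print(np.array_equal(x_float[:, 0].astype(np.float64), x))  # True
```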
2.2.4.3. ONNX representation of extra_tree_regressor_float.onnx and extra_tree_regressor_double.onnx
Fig.110. ONNX representation of the extra_tree_regressor_float.onnx in Netron
Fig.111. ONNX representation of the extra_tree_regressor_double.onnx in Netron
2.2.5. sklearn.ensemble.ExtraTreesRegressor
ExtraTreesRegressor (Extremely Randomized Trees Regressor) is a machine learning method that represents a variation of Random Forests for regression tasks.
This method employs an ensemble of decision trees to predict numerical values of the target variable based on a set of features.
How ExtraTreesRegressor works:
- Beginning Construction: It starts with the original dataset, including features (independent variables) and their corresponding values of the target variable.
- Randomness in Splits: Unlike regular decision trees where the best split is selected to divide nodes, ExtraTreesRegressor uses random threshold values to split tree nodes. This randomness makes the splitting process more arbitrary and less prone to overfitting.
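- Tree Building: ExtraTreesRegressor constructs multiple decision trees in the ensemble. The number of trees is controlled by the "n_estimators" hyperparameter. By default (bootstrap=False), each tree is trained on the whole training set; bootstrap sampling (with replacement) can be enabled with bootstrap=True. The number of features considered at each split is controlled by the "max_features" hyperparameter.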
- Prediction: For predicting the target variable for new data, ExtraTreesRegressor aggregates the predictions of all trees in the ensemble (usually by averaging).
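The procedure above can be condensed into a small NumPy sketch: each tree is grown with randomly drawn thresholds, and the ensemble prediction is the plain average of the tree predictions. It is a simplified illustration for a single feature with hypothetical function names, not scikit-learn's implementation; with one feature, drawing one random threshold per node corresponds to the extremely randomized rule, and bootstrap=False mirrors the ExtraTreesRegressor default of training every tree on the whole dataset.

```python
import numpy as np

def build_tree(x, y, rng, min_samples_split=2):
    """Grow one extremely randomized regression tree on a single feature.

    A leaf is a float (mean target), an internal node is a tuple
    (threshold, left_subtree, right_subtree).
    """
    if len(y) < min_samples_split or x.min() == x.max():
        return float(y.mean())
    # draw a random threshold inside the feature range of this node
    t = rng.uniform(x.min(), x.max())
    mask = x <= t
    if mask.all() or not mask.any():
        return float(y.mean())
    return (t,
            build_tree(x[mask], y[mask], rng, min_samples_split),
            build_tree(x[~mask], y[~mask], rng, min_samples_split))

def predict_tree(node, value):
    """Walk down the tree until a leaf (float) is reached."""
    while isinstance(node, tuple):
        t, left, right = node
        node = left if value <= t else right
    return node

def fit_extra_trees(x, y, n_estimators=10, bootstrap=False, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_estimators):
        if bootstrap:
            idx = rng.integers(0, len(x), size=len(x))  # sample with replacement
            trees.append(build_tree(x[idx], y[idx], rng))
        else:
            trees.append(build_tree(x, y, rng))  # whole training set
    return trees

def predict_ensemble(trees, x):
    """Average the predictions of all trees for every sample."""
    per_tree = np.array([[predict_tree(tree, v) for v in x] for tree in trees])
    return per_tree.mean(axis=0)

X = np.arange(0, 100, 1, dtype=np.float64)
y = 4 * X + 10 * np.sin(X * 0.5)
trees = fit_extra_trees(X, y, n_estimators=10)
y_pred = predict_ensemble(trees, X)
print("training MAE:", np.mean(np.abs(y - y_pred)))
```

Without bootstrapping, every fully grown tree memorizes the training targets, so the training MAE is close to zero; averaging identical double values can still leave a tiny rounding residue, which is one plausible reason why the original ExtraTreesRegressor reports an MAE of about 2.2e-13 rather than exactly 0.0.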
Advantages of ExtraTreesRegressor:
- Reduction in Overfitting: Random node splits (and, if enabled, bootstrap sampling) make the trees differ from each other, so averaging them makes the method less prone to overfitting than a single decision tree.
- High Parallelization: As trees are built independently, ExtraTreesRegressor can be easily parallelized for training on multiple processors.
- Robustness to Outliers: The method typically shows resilience to outliers in the data.
- Handling Numerical Data: ExtraTreesRegressor works with numerical features directly; in scikit-learn, categorical features must first be encoded numerically (for example, with OneHotEncoder or OrdinalEncoder).
Limitations of ExtraTreesRegressor:
- May Require Fine-Tuning of Hyperparameters: Although ExtraTreesRegressor usually works well with default parameters, fine-tuning of hyperparameters might be needed for achieving maximum performance.
- Less Interpretability: Like other ensemble methods, ExtraTreesRegressor is less interpretable compared to simpler models such as linear regression.
ExtraTreesRegressor can be a beneficial method for regression across various tasks, particularly when reducing overfitting and improving the model's generalization is necessary.
2.2.5.1. Code for creating the ExtraTreesRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.ensemble.ExtraTreesRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training an ExtraTreesRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "ExtraTreesRegressor"
onnx_model_filename = data_path + "extra_trees_regressor"
# create an Extra Trees Regressor model
regression_model = ExtraTreesRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32, the input type of the float model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# define the input data type as DoubleTensorType
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python  ExtraTreesRegressor Original model (double)
Python  R-squared (Coefficient of determination): 1.0
Python  Mean Absolute Error: 2.2302160118670144e-13
Python  Mean Squared Error: 8.41048471722451e-26
Python  
Python  ExtraTreesRegressor ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_trees_regressor_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.9999999999998015
Python  Mean Absolute Error: 3.795239380975701e-05
Python  Mean Squared Error: 2.627067474763585e-09
Python  R^2 matching decimal places: 0
Python  MAE matching decimal places: 0
Python  MSE matching decimal places: 0
Python  float ONNX model precision: 0
Python  
Python  ExtraTreesRegressor ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_trees_regressor_double.onnx
Errors tab:
ExtraTreesRegressor.py started  ExtraTreesRegressor.py  1  1
Traceback (most recent call last):  ExtraTreesRegressor.py  1  1
onnx_session = ort.InferenceSession(onnx_filename)  ExtraTreesRegressor.py  160  1
self._create_inference_session(providers, provider_options, disabled_optimizers)  onnxruntime_inference_collection.py  383  1
sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)  onnxruntime_inference_collection.py  424  1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\extra_trees_regressor_double.onnx failed:Type Erro  onnxruntime_inference_collection.py  424  1
ExtraTreesRegressor.py finished in 4654 ms  5  1
Fig.112. Results of the ExtraTreesRegressor.py (float ONNX)
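Why does the float model fall so far behind the original (MAE of about 3.8e-05 instead of about 2.2e-13)? A rough NumPy sketch, assuming only that the float ONNX model must hold its outputs in float32: for the targets of this synthetic dataset (all below 512), adjacent float32 values are at most 2^-15 ≈ 3.05e-05 apart, so rounding a single value can shift it by up to 2^-16 ≈ 1.5e-05. The observed MAE exceeds that bound, so we assume (without having verified it) that the remainder comes from float32 arithmetic and float32 parameters inside the operator.

```python
import numpy as np

# the synthetic data used throughout the article
X = np.arange(0, 100, 1).reshape(-1, 1)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

# distance to the next representable number at the magnitude of the targets
spacing_float32 = float(np.spacing(np.float32(400.0)))  # 2**-15
spacing_float64 = float(np.spacing(np.float64(400.0)))  # 2**-44

# round the double-precision targets to float32 and measure the error
y_float32 = y.astype(np.float32).astype(np.float64)
rounding_error = np.abs(y - y_float32)

print("float32 spacing near 400:", spacing_float32)
print("float64 spacing near 400:", spacing_float64)
print("max rounding error:      ", rounding_error.max())
print("mean rounding error:     ", rounding_error.mean())
```

The float64 spacing at the same magnitude is about 5.7e-14, which is why the original double-precision model can reach errors around 1e-13.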
2.2.5.2. MQL5 code for executing ONNX Models
This code executes the saved extra_trees_regressor_float.onnx and extra_trees_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                          ExtraTreesRegressor.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "ExtraTreesRegressor"
#define   ONNXFilenameFloat  "extra_trees_regressor_float.onnx"
#define   ONNXFilenameDouble "extra_trees_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
      run_result=RunModelFloat(model,x_values,y_predicted);
   else if(model_type==TestDoubleModel)
      run_result=RunModelDouble(model,x_values,y_predicted);
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
ExtraTreesRegressor (EURUSD,H1) Testing ONNX float: ExtraTreesRegressor (extra_trees_regressor_float.onnx)
ExtraTreesRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9999999999998015
ExtraTreesRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 0.0000379523938098
ExtraTreesRegressor (EURUSD,H1) MQL5: Mean Squared Error: 0.0000000026270675
ExtraTreesRegressor (EURUSD,H1) 
ExtraTreesRegressor (EURUSD,H1) Testing ONNX double: ExtraTreesRegressor (extra_trees_regressor_double.onnx)
ExtraTreesRegressor (EURUSD,H1) ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\ExtraTreesRegressor.mq5' (133:16)
ExtraTreesRegressor (EURUSD,H1) model_name=ExtraTreesRegressor OnnxCreate error 5800
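The float ONNX model ran normally, but the double ONNX model could not even be loaded: session creation fails with a type error, because the output of the TreeEnsembleRegressor node is declared as tensor(double), while this operator always returns tensor(float).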
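The "matching decimal places" figures in these results deserve a caution. The compare_decimal_places() helper compares characters of the str() representations, so it reports 0 for R^2 = 1.0 versus 0.9999999999998015 although the two values differ by only about 2e-13, and for numbers printed in scientific notation it ends up comparing mantissa digits and exponent characters. Below is a minimal sketch of an alternative, assuming that "matching decimal places" is read as floor(-log10|a - b|); the function name matching_decimal_places and the cap of 16 places are our own hypothetical choices, not part of the article's scripts.

```python
import math

def compare_decimal_places(value1, value2):
    # string-based helper from the article's scripts (condensed, same logic)
    s1, s2 = str(value1), str(value2)
    d1, d2 = s1.find("."), s2.find(".")
    if d1 == -1 or d2 == -1:
        return 0
    count = 0
    for i in range(1, min(len(s1) - d1 - 1, len(s2) - d2 - 1) + 1):
        if s1[d1 + i] != s2[d2 + i]:
            break
        count += 1
    return count

def matching_decimal_places(value1, value2, max_places=16):
    # hypothetical alternative: decimal places implied by the absolute difference
    diff = abs(value1 - value2)
    if diff == 0:
        return max_places
    return max(0, min(max_places, math.floor(-math.log10(diff))))

# R^2 of ExtraTreesRegressor: original model vs float ONNX model
print(compare_decimal_places(1.0, 0.9999999999998015))   # 0
print(matching_decimal_places(1.0, 0.9999999999998015))  # 12
# values eight orders of magnitude apart still "match" at the string level
print(compare_decimal_places(2.5e-13, 2.5e-05))           # 3
```

Measuring the absolute difference also makes the result independent of how Python chooses to print a float.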
2.2.5.3. ONNX representation of the extra_trees_regressor_float.onnx and extra_trees_regressor_double.onnx
Fig.113. ONNX representation of the extra_trees_regressor_float.onnx in Netron
Fig.114. ONNX representation of the extra_trees_regressor_double.onnx in Netron
2.2.6. sklearn.svm.NuSVR
NuSVR is a machine learning method for regression tasks. It is a variation of the Support Vector Machine (SVM) that, instead of classifying samples, predicts continuous values of the target variable.
How NuSVR works:
- Input Data: It starts with a dataset that includes features (independent variables) and values of the target variable (continuous).
- Kernel Selection: NuSVR uses kernels such as linear, polynomial, or radial basis function (RBF) to implicitly map the data into a higher-dimensional space, where a linear regression function can be fitted.
- Defining the Nu parameter: The Nu parameter controls model complexity. It is an upper bound on the fraction of training examples treated as errors (samples lying outside the tube described below) and a lower bound on the fraction of support vectors. Nu must lie in the interval (0, 1].
- Support Vector Construction: NuSVR looks for a function that is as flat as possible while keeping most training samples inside a tube of width epsilon around it. Unlike epsilon-SVR, the tube width is not fixed in advance but is determined during training, subject to the Nu constraint. The samples on or outside the tube boundary become support vectors.
- Model Training: The model is trained to minimize regression error and meet the constraints associated with the Nu parameter.
- Making Predictions: After training, the model can be used to predict the values of the target variable on new data.
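The role of Nu can be checked empirically. A minimal sketch, assuming scikit-learn and NumPy are installed and using the article's synthetic data (standardized here only to keep the optimization well conditioned): for each Nu, the fraction of training samples that become support vectors should not fall below Nu, up to the solver's numerical tolerance.

```python
import numpy as np
from sklearn.svm import NuSVR

# synthetic data from the article, standardized to zero mean and unit variance
X = np.arange(0, 100, 1).reshape(-1, 1).astype(np.float64)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()
X_std = (X - X.mean()) / X.std()
y_std = (y - y.mean()) / y.std()

fractions = {}
for nu in (0.1, 0.5, 0.9):
    model = NuSVR(nu=nu).fit(X_std, y_std)
    # support_ holds the indices of the support vectors
    fractions[nu] = len(model.support_) / len(X_std)
    print(f"nu={nu}: fraction of support vectors = {fractions[nu]:.2f}")
```

Larger Nu values therefore produce denser models, which also makes prediction more expensive.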
Advantages of NuSVR:
- Outlier Handling: The Nu parameter bounds the fraction of training examples that may fall outside the tube, giving direct control over how many samples are treated as outliers.
- Multiple Kernels: The method supports various types of kernels, enabling the modeling of complex nonlinear relationships.
Limitations of NuSVR:
- Nu Parameter Selection: Choosing the correct value for the Nu parameter may require some experimentation.
- Data Scale Sensitivity: SVM, including NuSVR, can be sensitive to data scale, so feature standardization or normalization might be necessary.
- Computational Complexity: For large datasets and complex kernels, NuSVR can be computationally expensive.
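The scale sensitivity is visible on this article's own data: in the Python results shown later in this section, the default NuSVR fitted to the raw targets (values up to roughly 400) reaches an R^2 of only about 0.28. A minimal sketch, assuming scikit-learn is available, that standardizes both the feature (StandardScaler inside a pipeline) and the target (TransformedTargetRegressor) while keeping NuSVR's default hyperparameters. Exporting such a wrapped model to ONNX is not covered here and may require additional converter support.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVR

# synthetic data from the article
X = np.arange(0, 100, 1).reshape(-1, 1)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

# default NuSVR on raw data
raw_model = NuSVR().fit(X, y)
r2_raw = r2_score(y, raw_model.predict(X))

# the same NuSVR, with both the feature and the target standardized
scaled_model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), NuSVR()),
    transformer=StandardScaler(),
).fit(X, y)
r2_scaled = r2_score(y, scaled_model.predict(X))

print("R^2 raw:   ", r2_raw)
print("R^2 scaled:", r2_scaled)
```

Scaling the target matters here because the default C=1.0 limits the dual coefficients, which makes it hard to reach outputs of several hundred units.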
NuSVR is a machine learning method for regression tasks based on the Support Vector Machine (SVM) method. It allows the prediction of continuous values of the target variable and provides the capability to manage outliers using the Nu parameter.
2.2.6.1. Code for creating the NuSVR model and exporting it to ONNX for float and double
This code creates the sklearn.svm.NuSVR model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training a NuSVR model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import NuSVR
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "NuSVR"
onnx_model_filename = data_path + "nu_svr"
# create a NuSVR model
nusvr_model = NuSVR()
# fit the model to the data
nusvr_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = nusvr_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(nusvr_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float (float32)
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(nusvr_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to double (float64)
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python  NuSVR Original model (double)
Python  R-squared (Coefficient of determination): 0.2771437770527445
Python  Mean Absolute Error: 83.76666411704255
Python  Mean Squared Error: 9565.381751764757
Python  
Python  NuSVR ONNX model (float)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\nu_svr_float.onnx
Python  Information about input tensors in ONNX:
Python  1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python  Information about output tensors in ONNX:
Python  1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python  R-squared (Coefficient of determination) 0.27714379657935495
Python  Mean Absolute Error: 83.766663385322
Python  Mean Squared Error: 9565.381493373838
Python  R^2 matching decimal places: 7
Python  MAE matching decimal places: 5
Python  MSE matching decimal places: 3
Python  float ONNX model precision: 5
Python  
Python  NuSVR ONNX model (double)
Python  ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\nu_svr_double.onnx
Errors tab:
NuSVR.py started  NuSVR.py  1  1
Traceback (most recent call last):  NuSVR.py  1  1
onnx_session = ort.InferenceSession(onnx_filename)  NuSVR.py  159  1
self._create_inference_session(providers, provider_options, disabled_optimizers)  onnxruntime_inference_collection.py  383  1
sess.initialize_session(providers, provider_options, disabled_optimizers)  onnxruntime_inference_collection.py  435  1
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for SVMRegressor(1) node with name 'SVM'  onnxruntime_inference_collection.py  435  1
NuSVR.py finished in 2925 ms  5  1
Fig.115. Results of the NuSVR.py (float ONNX)
2.2.6.2. MQL5 code for executing ONNX Models
This code executes the saved nu_svr_float.onnx and nu_svr_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                                        NuSVR.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "NuSVR"
#define   ONNXFilenameFloat  "nu_svr_float.onnx"
#define   ONNXFilenameDouble "nu_svr_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
      run_result=RunModelFloat(model,x_values,y_predicted);
   else if(model_type==TestDoubleModel)
      run_result=RunModelDouble(model,x_values,y_predicted);
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
NuSVR (EURUSD,H1) Testing ONNX float: NuSVR (nu_svr_float.onnx)
NuSVR (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.2771437965793548
NuSVR (EURUSD,H1) MQL5: Mean Absolute Error: 83.7666633853219906
NuSVR (EURUSD,H1) MQL5: Mean Squared Error: 9565.3814933738358377
NuSVR (EURUSD,H1) 
NuSVR (EURUSD,H1) Testing ONNX double: NuSVR (nu_svr_double.onnx)
NuSVR (EURUSD,H1) ONNX: cannot create session (OrtStatus: 9 'Could not find an implementation for SVMRegressor(1) node with name 'SVM''), inspect code 'Scripts\Regression\NuSVR.mq5' (133:16)
NuSVR (EURUSD,H1) model_name=NuSVR OnnxCreate error 5800
The float ONNX model ran normally, but the double ONNX model could not be loaded: ONNX Runtime reports that it has no implementation of the SVMRegressor node for this double-precision model.
Comparison with the original double precision model in Python:
Testing ONNX float: NuSVR (nu_svr_float.onnx)
Python Mean Absolute Error: 83.76666411704255
MQL5:  Mean Absolute Error: 83.7666633853219906
2.2.6.3. ONNX representation of the nu_svr_float.onnx and nu_svr_double.onnx
Fig.116. ONNX representation of the nu_svr_float.onnx in Netron
Fig.117. ONNX representation of the nu_svr_double.onnx in Netron
2.2.7. sklearn.ensemble.RandomForestRegressor
RandomForestRegressor is a machine learning method used to solve regression tasks.
It's one of the most popular methods based on ensemble learning and employs the Random Forest algorithm to create powerful and robust regression models.
Here's how RandomForestRegressor works:
- Input Data: It begins with a dataset that includes features (independent variables) and a target variable (continuous).
- Random Forest: RandomForestRegressor uses an ensemble of decision trees to solve the regression task. Each tree in the forest predicts the target variable independently.
- Bootstrap Sampling: Each tree is trained using bootstrap samples, which means random sampling with replacement from the training dataset. This allows diversity in the data each tree learns from.
- Random Feature Selection: When building each tree, a random subset of features is also selected, making the model more robust and reducing correlations between trees.
- Averaging Predictions: Once all the trees are constructed, RandomForestRegressor averages their predictions to obtain the final regression prediction.
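Two of the steps above are easy to check numerically. A minimal sketch, assuming NumPy and scikit-learn: a bootstrap sample of size n drawn with replacement contains, on average, about 1 - 1/e ≈ 63.2% of the distinct original samples, and the prediction of a fitted RandomForestRegressor equals the mean of the predictions of its individual trees (available through the estimators_ attribute). The sample size, tree count and random_state values are arbitrary choices.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# bootstrap sampling: draw n indices with replacement
rng = np.random.default_rng(0)
n = 10_000
bootstrap_indices = rng.integers(0, n, size=n)
unique_fraction = np.unique(bootstrap_indices).size / n
print("distinct samples in bootstrap:", unique_fraction, "expected about", 1 - 1 / math.e)

# averaging: the forest prediction is the mean of the tree predictions
X = np.arange(0, 100, 1).reshape(-1, 1)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
tree_mean = np.mean([tree.predict(X) for tree in forest.estimators_], axis=0)
print("max |forest - mean of trees|:", np.abs(forest.predict(X) - tree_mean).max())
```

The samples left out of each bootstrap (about 36.8%) are what scikit-learn uses for the optional out-of-bag score.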
Advantages of RandomForestRegressor:
- Power and Robustness: RandomForestRegressor is a powerful regression method that often delivers good performance.
- Handling Large Data: It scales well to large datasets with many features.
- Resilience to Overfitting: Thanks to bootstrap sampling and random feature selection, a random forest is usually much less prone to overfitting than a single decision tree.
- Feature Importance Estimation: Random Forest can provide information about the importance of each feature in the regression task.
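In scikit-learn, the feature importance estimate is exposed as the feature_importances_ attribute. A minimal sketch, assuming scikit-learn, in which a purely random column (a hypothetical noise feature, not part of the article's dataset) is added next to the informative X: the informative column should receive the larger share, and the importances sum to 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
x = np.arange(0, 100, 1).astype(np.float64)
y = 4 * x + 10 * np.sin(x * 0.5)

# column 0: informative feature, column 1: random noise (hypothetical)
X = np.column_stack([x, rng.uniform(0, 100, size=x.size)])
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

for name, importance in zip(["x", "noise"], forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

These impurity-based importances are computed on the training data, so a noise column can still receive a small nonzero share.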
Limitations of RandomForestRegressor:
- Lack of Interpretability: The model might be less interpretable compared to linear models.
- Not Always the Most Accurate Model: In some tasks, more complex ensembles might be unnecessary, and linear models could be more suitable.
RandomForestRegressor is a powerful machine learning method for regression tasks that uses an ensemble of random decision trees to create a stable and high-performing regression model. This method is particularly useful for tasks with large datasets and for evaluating feature importance.
2.2.7.1. Code for creating the RandomForestRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.ensemble.RandomForestRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training a RandomForestRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "RandomForestRegressor"
onnx_model_filename = data_path + "random_forest_regressor"
# create a RandomForestRegressor model
regression_model = RandomForestRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's tensor(float) input
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's tensor(double) input
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
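The `compare_decimal_places()` function compares the string representations of two numbers digit by digit. This can understate agreement when two very close values fall on opposite sides of a digit boundary. A numeric alternative estimates the number of matching decimal places from the absolute difference. The sketch below is a hypothetical helper (`matching_decimal_digits` is not part of the article's code). It uses the RandomForestRegressor values printed in the output that follows.

```python
import math

def matching_decimal_digits(reference, value, max_digits=16):
    """Hypothetical helper: estimate how many decimal places two numbers agree to
    from their absolute difference rather than from their string representations."""
    diff = abs(reference - value)
    if diff == 0.0:
        return max_digits  # identical values: report the cap
    digits = math.floor(-math.log10(diff))
    # clamp to [0, max_digits]; a difference >= 1 means no matching decimal places
    return max(0, min(max_digits, digits))

# RandomForestRegressor: original model vs float ONNX model (values from the output)
pairs = {
    "R^2": (0.9998854509605539, 0.9998854516013125),
    "MAE": (0.9186485980852603, 0.9186420704511761),
    "MSE": (1.5157997632401086, 1.515791284236419),
}
for metric, (original, onnx_float) in pairs.items():
    print(metric, matching_decimal_digits(original, onnx_float))
```

For MAE and MSE, both approaches give 5. For R^2, the values differ by about 6.4e-10, so the numeric estimate is 9, whereas the string comparison stops at the ninth digit (0 vs 1) and reports 8.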
Output:
Python RandomForestRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9998854509605539
Python Mean Absolute Error: 0.9186485980852603
Python Mean Squared Error: 1.5157997632401086
Python
Python RandomForestRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\random_forest_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9998854516013125
Python Mean Absolute Error: 0.9186420704511761
Python Mean Squared Error: 1.515791284236419
Python R^2 matching decimal places: 8
Python MAE matching decimal places: 5
Python MSE matching decimal places: 5
Python float ONNX model precision: 5
Python
Python RandomForestRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\random_forest_regressor_double.onnx
Errors tab:
RandomForestRegressor.py started RandomForestRegressor.py 1 1
Traceback (most recent call last): RandomForestRegressor.py 1 1
onnx_session = ort.InferenceSession(onnx_filename) RandomForestRegressor.py 159 1
self._create_inference_session(providers, provider_options, disabled_optimizers) onnxruntime_inference_collection.py 383 1
sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model) onnxruntime_inference_collection.py 424 1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\random_forest_regressor_double.onnx failed:Type Er onnxruntime_inference_collection.py 424 1
RandomForestRegressor.py finished in 4392 ms 5 1
Fig.118. Results of the RandomForestRegressor.py (float ONNX)
2.2.7.2. MQL5 code for executing ONNX Models
This code executes the saved random_forest_regressor_float.onnx and random_forest_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| RandomForestRegressor.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link "https://www.mql5.com"
#property version "1.00"

#define ModelName "RandomForestRegressor"
#define ONNXFilenameFloat "random_forest_regressor_float.onnx"
#define ONNXFilenameDouble "random_forest_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel 1
#define TestDoubleModel 2
//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
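For reference, the three metrics the script prints can be written out with NumPy using their standard definitions. This sketch assumes that MQL5's RegressionMetric with REGRESSION_R2, REGRESSION_MAE and REGRESSION_MSE follows these textbook formulas. The MQL5 results for the float model match the Python values digit for digit, which suggests it does. `regression_metrics` is a hypothetical helper, not an MQL5 or scikit-learn API.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R^2, MAE and MSE using the standard textbook definitions."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))                # mean absolute error
    mse = np.mean(residuals ** 2)                   # mean squared error
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return r2, mae, mse

# same synthetic data as GenerateData(): y = 4x + 10 sin(0.5x), x = 0..99
x = np.arange(100, dtype=np.float64)
y_true = 4 * x + 10 * np.sin(x * 0.5)
# hypothetical predictions that are off by exactly 1.0 everywhere
r2, mae, mse = regression_metrics(y_true, y_true + 1.0)
print(f"R2={r2:.16f} MAE={mae:.16f} MSE={mse:.16f}")
```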
Output:
RandomForestRegressor (EURUSD,H1)
RandomForestRegressor (EURUSD,H1) Testing ONNX float: RandomForestRegressor (random_forest_regressor_float.onnx)
RandomForestRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9998854516013125
RandomForestRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 0.9186420704511761
RandomForestRegressor (EURUSD,H1) MQL5: Mean Squared Error: 1.5157912842364190
RandomForestRegressor (EURUSD,H1)
RandomForestRegressor (EURUSD,H1) Testing ONNX double: RandomForestRegressor (random_forest_regressor_double.onnx)
RandomForestRegressor (EURUSD,H1) ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\RandomForestRegressor.mq5' (133:16)
RandomForestRegressor (EURUSD,H1) model_name=RandomForestRegressor OnnxCreate error 5800
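The float ONNX model executed normally, but the double ONNX model could not be loaded, either in Python or in MQL5. As the error message shows, the TreeEnsembleRegressor node is expected to output tensor(float), while the converted model declares its output as tensor(double).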
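According to the ONNX specification, TreeEnsembleRegressor always returns tensor(float). Even a working model therefore rounds its predictions to float32, about 7 significant digits. The NumPy sketch below isolates only this output rounding. The "double predictions" are a hypothetical stand-in (`y + 0.5*cos(x)`), not the real model's output. In the actual ONNX model, tree thresholds and leaf values are also stored as float, which may add further differences. With round-to-nearest, each rounded value differs from the original by at most 2^-24 of its magnitude, which is about 2.4e-5 for predictions up to about 406. That is in line with the 5 matching decimal places observed for MAE.

```python
import numpy as np

x = np.arange(100.0)
y = 4 * x + 10 * np.sin(x * 0.5)
# hypothetical stand-in for double-precision model predictions
y_pred64 = y + 0.5 * np.cos(x)
# what a tensor(float) output keeps: the nearest float32 value, read back as double
y_pred32 = y_pred64.astype(np.float32).astype(np.float64)

mae64 = np.mean(np.abs(y - y_pred64))
mae32 = np.mean(np.abs(y - y_pred32))
print("max rounding error:", np.max(np.abs(y_pred64 - y_pred32)))
print("MAE (double):", mae64, "MAE (float output):", mae32)
```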
2.2.7.3. ONNX representation of the random_forest_regressor_float.onnx and random_forest_regressor_double.onnx
Fig.119. ONNX representation of the random_forest_regressor_float.onnx in Netron
Fig.120. ONNX representation of the random_forest_regressor_double.onnx in Netron
2.2.8. sklearn.ensemble.GradientBoostingRegressor
GradientBoostingRegressor is a machine learning method used for regression tasks. It belongs to the family of ensemble methods and builds a strong model by combining many weak models using gradient boosting.
Gradient boosting is a technique to enhance models by iteratively adding weak models and correcting the errors of previous models.
Here's how GradientBoostingRegressor works:
- Initialization: It starts with the original dataset containing features (independent variables) and their corresponding target values.
- First Model: The algorithm starts from an initial approximation. In scikit-learn this is, by default, a constant prediction (for the squared-error loss, the mean of the target values); in general, any simple regression model can serve as the initial model.
- Residuals and Anti-Gradient: The residuals, i.e. the differences between the actual target values and the current predictions, are computed. More generally, the algorithm computes the anti-gradient (negative gradient) of the loss function with respect to the current predictions, which indicates the direction in which the predictions should change; for the squared-error loss, the anti-gradient equals the residuals.
- Building the Next Model: The next model (a decision tree) is trained to predict the anti-gradient, i.e. the errors of the current ensemble. Its predictions, scaled by the learning rate, are added to the current model.
- Iterations: The process of constructing new models and correcting residuals is repeated multiple times. Each new model corrects the residuals left by the ensemble built so far.
- Model Combination: The final prediction is the sum of the initial prediction and the predictions of all subsequent trees, each multiplied by the learning rate. Unlike in a random forest, the trees are added rather than averaged.
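The procedure above can be sketched in NumPy for the squared-error loss. This is an illustration under simplifying assumptions, not scikit-learn's implementation. The weak learners are hypothetical depth-1 trees (stumps), whereas scikit-learn's default is `max_depth=3`. The input is one-dimensional, as in the article's data. `fit_stump`, `gradient_boosting` and `predict` are illustrative names. Because each stump is a least-squares fit to the residuals, adding it with a learning rate between 0 and 1 can only lower the training MSE.

```python
import numpy as np

def fit_stump(x, r):
    """Hypothetical weak learner: best single split of a 1-D feature (squared error)."""
    best = None
    values = np.unique(x)
    for t in (values[:-1] + values[1:]) / 2.0:
        left = x <= t
        l_mean, r_mean = r[left].mean(), r[~left].mean()
        sse = ((r[left] - l_mean) ** 2).sum() + ((r[~left] - r_mean) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    return best[1:]

def gradient_boosting(x, y, n_estimators=100, learning_rate=0.1):
    f0 = y.mean()                       # initial constant prediction (squared-error loss)
    pred = np.full_like(y, f0)
    stumps, history = [], [np.mean((y - pred) ** 2)]
    for _ in range(n_estimators):
        residuals = y - pred            # anti-gradient of 0.5 * (y - F)^2
        t, l_mean, r_mean = fit_stump(x, residuals)
        pred = pred + learning_rate * np.where(x <= t, l_mean, r_mean)
        stumps.append((t, l_mean, r_mean))
        history.append(np.mean((y - pred) ** 2))
    return f0, stumps, history

def predict(f0, stumps, x, learning_rate=0.1):
    pred = np.full(x.shape, f0, dtype=float)
    for t, l_mean, r_mean in stumps:    # sum of scaled tree outputs, not an average
        pred += learning_rate * np.where(x <= t, l_mean, r_mean)
    return pred

x = np.arange(100.0)
y = 4 * x + 10 * np.sin(x * 0.5)
f0, stumps, history = gradient_boosting(x, y)
pred = predict(f0, stumps, x)
print("training MSE: start", history[0], "end", history[-1])
```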
Advantages of GradientBoostingRegressor:
- High Performance: Gradient boosting is a powerful method capable of achieving high performance in regression tasks.
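- Robustness to Outliers: With robust loss functions such as 'huber' or 'absolute_error', the influence of outliers can be reduced; the default squared-error loss remains sensitive to them.
- Automatic Feature Selection: Trees split on the most informative features, so the model effectively favors the most important features and provides feature importance estimates.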
- Handling Various Loss Functions: The method allows the use of different loss functions depending on the task.
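The choice of loss function changes what each new tree is trained on, namely the anti-gradient. The sketch below computes it for three losses whose scikit-learn names are 'squared_error', 'absolute_error' and 'huber'. `negative_gradient` is a hypothetical helper. The fixed Huber threshold `delta` is a simplification: scikit-learn derives it from the `alpha` parameter (a quantile of the absolute residuals). Note how an outlier's pull is bounded under the absolute and Huber losses.

```python
import numpy as np

def negative_gradient(loss, y_true, y_pred, delta=1.0):
    """Anti-gradient -dL/dF of the loss with respect to the current prediction F."""
    r = y_true - y_pred
    if loss == "squared_error":
        # L = 0.5 * (y - F)^2   ->  -dL/dF = y - F (the residual)
        return r
    if loss == "absolute_error":
        # L = |y - F|           ->  -dL/dF = sign(y - F)
        return np.sign(r)
    if loss == "huber":
        # quadratic for |r| <= delta, linear beyond it -> residual clipped to [-delta, delta]
        return np.clip(r, -delta, delta)
    raise ValueError(f"unsupported loss: {loss}")

y_true = np.array([1.0, 2.0, 10.0])   # the last point acts as an outlier
y_pred = np.array([0.0, 2.0, 0.0])
for loss in ("squared_error", "absolute_error", "huber"):
    print(loss, negative_gradient(loss, y_true, y_pred, delta=2.0))
```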
Limitations of GradientBoostingRegressor:
- Hyperparameter Tuning Required: Achieving maximum performance requires tuning hyperparameters such as the learning rate, tree depth, and number of trees.
- Computationally Expensive: Gradient boosting can be computationally expensive, especially with large volumes of data and a high number of trees.
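One common way to tune the learning rate, tree depth and number of trees in scikit-learn is an exhaustive grid search with cross-validation. The sketch below uses `GridSearchCV` on the article's synthetic data. The grid values are arbitrary illustrative choices, not recommendations. The folds are shuffled because the data is ordered by x: with unshuffled folds, each test fold would lie outside the range seen during training. Note that every extra grid value multiplies the number of fits, which compounds the computational cost mentioned above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

X = np.arange(0, 100, 1).reshape(-1, 1)
y = (4 * X + 10 * np.sin(X * 0.5)).ravel()

# illustrative grid: 2 x 2 x 2 = 8 combinations, each fitted on 3 folds
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "n_estimators": [100, 200],
}
cv = KFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(GradientBoostingRegressor(random_state=0), param_grid,
                      scoring="neg_mean_squared_error", cv=cv)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("cross-validated MSE:", -search.best_score_)
```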
GradientBoostingRegressor is a powerful regression method often used in practical tasks to achieve high performance when its hyperparameters are properly tuned.
2.2.8.1. Code for creating the GradientBoostingRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.ensemble.GradientBoostingRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training a GradientBoostingRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "GradientBoostingRegressor"
onnx_model_filename = data_path + "gradient_boosting_regressor"
# create a Gradient Boosting Regressor model
regression_model = GradientBoostingRegressor()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's tensor(float) input
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's tensor(double) input
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python GradientBoostingRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9999959514652565
Python Mean Absolute Error: 0.15069342754017417
Python Mean Squared Error: 0.053573282108575676
Python
Python GradientBoostingRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gradient_boosting_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9999959514739537
Python Mean Absolute Error: 0.15069457426101718
Python Mean Squared Error: 0.05357316702127665
Python R^2 matching decimal places: 10
Python MAE matching decimal places: 5
Python MSE matching decimal places: 6
Python float ONNX model precision: 5
Python
Python GradientBoostingRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gradient_boosting_regressor_double.onnx
Errors tab:
GradientBoostingRegressor.py started GradientBoostingRegressor.py 1 1
Traceback (most recent call last): GradientBoostingRegressor.py 1 1
onnx_session = ort.InferenceSession(onnx_filename) GradientBoostingRegressor.py 161 1
self._create_inference_session(providers, provider_options, disabled_optimizers) onnxruntime_inference_collection.py 419 1
sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model) onnxruntime_inference_collection.py 452 1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\gradient_boosting_regressor_double.onnx failed:Typ onnxruntime_inference_collection.py 452 1
GradientBoostingRegressor.py finished in 3073 ms 5 1
Fig.121. Results of the GradientBoostingRegressor.py (float ONNX)
2.2.8.2. MQL5 code for executing ONNX Models
This code executes the gradient_boosting_regressor_float.onnx and gradient_boosting_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| GradientBoostingRegressor.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link "https://www.mql5.com"
#property version "1.00"

#define ModelName "GradientBoostingRegressor"
#define ONNXFilenameFloat "gradient_boosting_regressor_float.onnx"
#define ONNXFilenameDouble "gradient_boosting_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel 1
#define TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
GradientBoostingRegressor (EURUSD,H1) Testing ONNX float: GradientBoostingRegressor (gradient_boosting_regressor_float.onnx)
GradientBoostingRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9999959514739537
GradientBoostingRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 0.1506945742610172
GradientBoostingRegressor (EURUSD,H1) MQL5: Mean Squared Error: 0.0535731670212767
GradientBoostingRegressor (EURUSD,H1)
GradientBoostingRegressor (EURUSD,H1) Testing ONNX double: GradientBoostingRegressor (gradient_boosting_regressor_double.onnx)
GradientBoostingRegressor (EURUSD,H1) ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\GradientBoostingRegressor.mq5' (133:16)
GradientBoostingRegressor (EURUSD,H1) model_name=GradientBoostingRegressor OnnxCreate error 5800
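The float ONNX model executed normally. The double model, however, could not be loaded at all: ONNX Runtime refused to create a session because the TreeEnsembleRegressor node declares a tensor(double) output, while this operator only produces tensor(float), as noted in the introduction.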
Comparison with the original double precision model in Python:
Testing ONNX float: GradientBoostingRegressor (gradient_boosting_regressor_float.onnx)
Python Mean Absolute Error: 0.15069342754017417
MQL5: Mean Absolute Error: 0.1506945742610172
Accuracy of ONNX float MAE: 5 decimal places.
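The decimal-place counts in these comparisons come from the scripts' compare_decimal_places function, which compares the digit strings of two numbers character by character. That works for the values above. However, a string comparison can understate how close two values are when they straddle a digit boundary: 0.1999999 and 0.2000001 differ by only 2e-7, yet they share no decimal digit. A minimal numeric alternative is sketched below. It counts agreeing decimals from the absolute difference instead. The function name agreeing_decimals and its max_places cap are hypothetical illustrations, not part of the article's scripts.

```python
import math


def agreeing_decimals(a: float, b: float, max_places: int = 16) -> int:
    """Hypothetical helper: decimal places to which a and b agree.

    Computed as floor(-log10(|a - b|)) and clipped to [0, max_places],
    so it does not depend on how the numbers happen to be printed.
    """
    diff = abs(a - b)
    if diff == 0.0:
        # identical values: report the cap instead of taking log10(0)
        return max_places
    return max(0, min(max_places, math.floor(-math.log10(diff))))


# MAE of the original model (Python) vs. the float ONNX model (MQL5), from the output above
print(agreeing_decimals(0.15069342754017417, 0.1506945742610172))  # 5
# values that straddle a digit boundary are still reported as close
print(agreeing_decimals(0.1999999, 0.2000001))  # 6
```

For the MAE values above, both approaches give 5 decimal places. The numeric version only behaves differently when a carry changes the printed digits.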
2.2.8.3. ONNX representation of the gradient_boosting_regressor_float.onnx and gradient_boosting_regressor_double.onnx
Fig.122. ONNX representation of the gradient_boosting_regressor_float.onnx in Netron
Fig.123. ONNX representation of the gradient_boosting_regressor_double.onnx in Netron
2.2.9. sklearn.ensemble.HistGradientBoostingRegressor
HistGradientBoostingRegressor is a variant of gradient boosting optimized for large datasets.
It is used for regression tasks. The "Hist" in its name refers to the histogram-based methods it uses to speed up training.
How HistGradientBoostingRegressor Works:
- Initialization: Starting from the training data (features and their target values), the model makes an initial constant prediction. For the default squared-error loss, this is the mean of the target.
- Histogram-Based Methods: Instead of evaluating every possible split point at each tree node, HistGradientBoostingRegressor first bins the values of each feature into a small number of integer-valued bins (at most 255 by default, set by the max_bins parameter). Candidate splits are then searched only at bin boundaries, using per-bin histograms of the gradients. This significantly speeds up training, especially on large datasets.
- Building Base Trees: The method builds decision trees on the binned data. Following the gradient boosting scheme, each new tree is fitted to the negative gradients of the loss for the current model, which for squared error are the residuals.
- Gradual Training: HistGradientBoostingRegressor incrementally adds new trees to the ensemble, with each tree correcting the residuals of the previous trees.
- Model Combination: The final prediction is the initial prediction plus the sum of the outputs of all trees, each scaled by the learning rate.
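The binning and split search described above can be illustrated with a small pure-Python sketch for a single feature and the squared-error loss. This is a simplification under stated assumptions, not scikit-learn's implementation:

- bin edges are taken at evenly spaced positions among the sorted distinct values;
- hessians are taken as 1, which is exact for squared error;
- there is no regularization, no missing-value handling and no minimum leaf size.

The names make_bin_edges and best_split are hypothetical.

```python
from bisect import bisect_right


def make_bin_edges(values, max_bins):
    """Return at most max_bins - 1 sorted thresholds that split values into bins."""
    distinct = sorted(set(values))
    if len(distinct) <= max_bins:
        # few distinct values: one bin per value, thresholds at midpoints
        return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    # many distinct values: thresholds at evenly spaced positions among them
    step = len(distinct) / max_bins
    return [distinct[int(i * step)] for i in range(1, max_bins)]


def best_split(bin_index, gradients, n_bins):
    """Scan a gradient histogram and return (best_bin, gain).

    Samples whose bin is <= best_bin go to the left child.
    Gain uses the squared-error form G_L^2/n_L + G_R^2/n_R - G^2/n.
    """
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for b, g in zip(bin_index, gradients):  # one pass over the samples
        grad_sum[b] += g
        count[b] += 1
    total_g, total_n = sum(grad_sum), len(bin_index)
    best_bin, best_gain = None, 0.0
    left_g, left_n = 0.0, 0
    for b in range(n_bins - 1):  # one pass over the bins, not over the samples
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        gain = left_g ** 2 / left_n + right_g ** 2 / right_n - total_g ** 2 / total_n
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain


# toy data: the target jumps from 0 to 10 at x = 5
x = list(range(10))
y = [0.0 if v < 5 else 10.0 for v in x]
edges = make_bin_edges(x, max_bins=4)           # [2, 5, 7]
bins = [bisect_right(edges, v) for v in x]      # bin index of every sample
pred = sum(y) / len(y)                          # initial constant prediction (mean)
grads = [pred - t for t in y]                   # squared-error gradients
b, gain = best_split(bins, grads, n_bins=len(edges) + 1)
print(f"split at x < {edges[b]}, gain {gain}")  # split at x < 5, gain 250.0
```

After the histogram is filled in a single pass over the data, the split search costs one pass over at most max_bins bins, regardless of how many samples there are. This is the source of the speed-up on large datasets.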
Advantages of HistGradientBoostingRegressor:
- High Performance: This method is optimized to handle large volumes of data and can achieve high performance.
- Noise Robustness: HistGradientBoostingRegressor generally performs well even in the presence of noise in data.
- High-Dimensional Efficiency: The method can handle tasks with a high number of features (high-dimensional data).
- Excellent Parallelization: It can efficiently parallelize training across multiple processors.
Limitations of HistGradientBoostingRegressor:
- Requires Hyperparameter Tuning: Achieving maximum performance requires tuning hyperparameters such as the learning rate, the maximum tree depth or number of leaves, and the number of boosting iterations.
- Lower Interpretability Than Linear Models: Like other ensemble methods, HistGradientBoostingRegressor is less interpretable than simpler models like linear regression.
HistGradientBoostingRegressor can be a useful regression method for tasks involving large datasets where high performance and high-dimensional data efficiency are essential.
2.2.9.1. Code for creating the HistGradientBoostingRegressor model and exporting it to ONNX for float and double
This code creates the sklearn.ensemble.HistGradientBoostingRegressor model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training a HistGradientBoostingRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "HistGradientBoostingRegressor"
onnx_model_filename = data_path + "hist_gradient_boosting_regressor"
# create a Histogram-Based Gradient Boosting Regressor model
hist_gradient_boosting_model = HistGradientBoostingRegressor()
# fit the model to the data
hist_gradient_boosting_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = hist_gradient_boosting_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(hist_gradient_boosting_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(hist_gradient_boosting_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 for the double ONNX model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python HistGradientBoostingRegressor Original model (double)
Python R-squared (Coefficient of determination): 0.9833421349506157
Python Mean Absolute Error: 9.070567104488434
Python Mean Squared Error: 220.4295035561544
Python
Python HistGradientBoostingRegressor ONNX model (float)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\hist_gradient_boosting_regressor_float.onnx
Python Information about input tensors in ONNX:
Python 1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python Information about output tensors in ONNX:
Python 1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python R-squared (Coefficient of determination) 0.9833421351962779
Python Mean Absolute Error: 9.07056497799043
Python Mean Squared Error: 220.42950030536645
Python R^2 matching decimal places: 8
Python MAE matching decimal places: 5
Python MSE matching decimal places: 5
Python float ONNX model precision: 5
Python
Python HistGradientBoostingRegressor ONNX model (double)
Python ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\hist_gradient_boosting_regressor_double.onnx
Errors tab:
HistGradientBoostingRegressor.py started HistGradientBoostingRegressor.py 1 1
Traceback (most recent call last): HistGradientBoostingRegressor.py 1 1
onnx_session = ort.InferenceSession(onnx_filename) HistGradientBoostingRegressor.py 161 1
self._create_inference_session(providers, provider_options, disabled_optimizers) onnxruntime_inference_collection.py 419 1
sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model) onnxruntime_inference_collection.py 452 1
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\hist_gradient_boosting_regressor_double.onnx faile onnxruntime_inference_collection.py 452 1
HistGradientBoostingRegressor.py finished in 3100 ms 5 1
Fig.124. Results of the HistGradientBoostingRegressor.py (float ONNX)
2.2.9.2. MQL5 code for executing ONNX Models
This code executes the saved hist_gradient_boosting_regressor_float.onnx and hist_gradient_boosting_regressor_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//| HistGradientBoostingRegressor.mq5 |
//| Copyright 2023, MetaQuotes Ltd. |
//| https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link "https://www.mql5.com"
#property version "1.00"

#define ModelName "HistGradientBoostingRegressor"
#define ONNXFilenameFloat "hist_gradient_boosting_regressor_float.onnx"
#define ONNXFilenameDouble "hist_gradient_boosting_regressor_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define TestFloatModel 1
#define TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
HistGradientBoostingRegressor (EURUSD,H1) Testing ONNX float: HistGradientBoostingRegressor (hist_gradient_boosting_regressor_float.onnx)
HistGradientBoostingRegressor (EURUSD,H1) MQL5: R-Squared (Coefficient of determination): 0.9833421351962779
HistGradientBoostingRegressor (EURUSD,H1) MQL5: Mean Absolute Error: 9.0705649779904292
HistGradientBoostingRegressor (EURUSD,H1) MQL5: Mean Squared Error: 220.4295003053665312
HistGradientBoostingRegressor (EURUSD,H1)
HistGradientBoostingRegressor (EURUSD,H1) Testing ONNX double: HistGradientBoostingRegressor (hist_gradient_boosting_regressor_double.onnx)
HistGradientBoostingRegressor (EURUSD,H1) ONNX: cannot create session (OrtStatus: 1 'Type Error: Type (tensor(double)) of output arg (variable) of node (TreeEnsembleRegressor) does not match expected type (tensor(float)).'), inspect code 'Scripts\Regression\HistGradientBoostingRegressor.mq5' (133:16)
HistGradientBoostingRegressor (EURUSD,H1) model_name=HistGradientBoostingRegressor OnnxCreate error 5800
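The float ONNX model executed normally. The double model failed to load, just as it did in the Python run (see the Errors tab above). ONNX Runtime refused to create a session because the TreeEnsembleRegressor node declares a tensor(double) output, while this operator only produces tensor(float).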
Comparison with the original double precision model in Python:
Testing ONNX float: HistGradientBoostingRegressor (hist_gradient_boosting_regressor_float.onnx)
Python Mean Absolute Error: 9.070567104488434
MQL5: Mean Absolute Error: 9.0705649779904292
Accuracy of ONNX float MAE: 5 decimal places.
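Part of the gap between the float and double results can be attributed to float32 itself. As noted in the introduction, the ai.onnx.ml tree operators return tensor(float), so every prediction ends up rounded to the nearest float32. A float32 has a 24-bit significand. For values between 256 and 512, a range that the synthetic targets reach, adjacent float32 numbers are therefore 2^-15 (about 3.05e-5) apart. Output rounding alone can thus shift each prediction, and so the MAE, by up to about 1.5e-5.

The pure-Python sketch below measures this rounding error for the article's synthetic targets. It uses only the standard struct module, and the helper names are hypothetical. It models output rounding only, not the float32 thresholds and leaf values stored inside the model. Treat it as an illustration of scale rather than a full explanation of the 5-decimal-place result.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between float32(x) and the next larger float32 (valid for x > 0)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_float32(x)


# the synthetic targets used by the article's scripts: y = 4x + 10 sin(0.5x)
ys = [4 * x + 10 * math.sin(0.5 * x) for x in range(100)]
max_err = max(abs(to_float32(v) - v) for v in ys)

print(float32_spacing(1.0))    # 1.1920928955078125e-07 (2**-23)
print(float32_spacing(300.0))  # 3.0517578125e-05 (2**-15)
print(f"largest rounding error over the targets: {max_err:.3e}")
```

Since all targets stay below 512, the largest rounding error cannot exceed half the spacing in that range, 2^-16 (about 1.5e-5).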
2.2.9.3. ONNX representation of the hist_gradient_boosting_regressor_float.onnx and hist_gradient_boosting_regressor_double.onnx
Fig.125. ONNX representation of the hist_gradient_boosting_regressor_float.onnx in Netron
Fig.126. ONNX representation of the hist_gradient_boosting_regressor_double.onnx in Netron
2.2.10. sklearn.svm.SVR
SVR (Support Vector Regression) is a machine learning method used for regression tasks. It is based on the same concept as the Support Vector Machine (SVM) for classification, adapted for regression. SVR predicts continuous values of the target variable with a function that is kept as flat as possible while most training points lie within a margin of ±ε around it. Only points outside this margin contribute to the loss.
How SVR Works:
- Boundary Definition: Whereas an SVM classifier builds a boundary that separates classes, SVR builds a "tube" around the regression function. The tube's half-width is controlled by the hyperparameter epsilon (ε), and points inside the tube do not contribute to the loss.
- Target Variable and Loss Function: Instead of class labels, SVR works with continuous target values. It uses the ε-insensitive loss: deviations smaller than ε are ignored, and larger ones are penalized linearly by the amount by which they exceed ε, rather than by the squared difference.
- Regularization: SVR also includes regularization, which controls model complexity and helps prevent overfitting. In scikit-learn, the trade-off between the flatness of the function and the errors outside the tube is set by the parameter C.
- Kernel Functions: SVR typically employs kernel functions that allow it to handle nonlinear dependencies between features and the target variable. Popular kernel functions include the radial basis function (RBF), polynomial, and linear functions.
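The ε-insensitive loss and the kernel expansion that an RBF-kernel SVR evaluates at prediction time can be written in a few lines. The sketch below is pure Python and illustrative only:

- the function names are hypothetical;
- the coefficients in the usage lines are made up rather than taken from a fitted model.

In a fitted scikit-learn SVR, the corresponding quantities are exposed as support_vectors_, dual_coef_ (the differences α_i − α_i*) and intercept_. The default kernel is RBF and the default epsilon is 0.1.

```python
import math


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float = 0.1) -> float:
    """Zero inside the tube |y_true - y_pred| <= epsilon, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rbf_kernel(a, b, gamma: float) -> float:
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coef, intercept: float, gamma: float) -> float:
    """f(x) = sum_i c_i * K(sv_i, x) + b, where c_i = alpha_i - alpha_i*."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coef)) + intercept


# errors inside the tube cost nothing; outside it, they grow linearly
print(epsilon_insensitive_loss(1.0, 1.05))              # 0.0
print(epsilon_insensitive_loss(1.0, 2.0, epsilon=0.5))  # 0.5

# made-up model with two support vectors on a one-dimensional feature
svs, coefs = [[0.0], [1.0]], [1.0, -1.0]
print(svr_predict([0.0], svs, coefs, intercept=0.5, gamma=1.0))
```

Only the support vectors (training points on or outside the tube) appear in the sum. This is why, to our understanding, the ai.onnx.ml SVMRegressor operator mentioned in the introduction stores the support vectors and their coefficients as model attributes.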
Advantages of SVR:
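- Robustness to Outliers: Because errors outside the tube are penalized linearly rather than quadratically, individual outliers have less influence on the fitted function than in least-squares regression.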
- Support for Nonlinear Dependencies: The use of kernel functions enables SVR to model complex and nonlinear dependencies between features and the target variable.
- High Prediction Quality: In regression tasks that require precise predictions, SVR can provide high-quality results.
Limitations of SVR:
- Sensitivity to Hyperparameters: Choosing the kernel function and the model parameters, such as C, the tube width ε and kernel parameters like gamma, may require careful tuning and optimization.
- Computational Complexity: Training the SVR model, especially with complex kernel functions and large datasets, can be computationally intensive. According to the scikit-learn documentation, fit time grows more than quadratically with the number of samples.
SVR is a machine learning method for regression tasks. It fits a function surrounded by an ε-wide "tube" and penalizes only the errors that fall outside it. It is robust to outliers and can handle nonlinear dependencies, which makes it useful in various regression tasks.
2.2.10.1. Code for creating the SVR model and exporting it to ONNX for float and double
This code creates the sklearn.svm.SVR model, trains it on synthetic data, saves the model in the ONNX format, and performs predictions using both float and double input data. It also evaluates the accuracy of both the original model and the models exported to ONNX.
# The code demonstrates the process of training an SVR model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "SVR"
onnx_model_filename = data_path + "svr"
# create an SVR model
regression_model = SVR()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 for the float ONNX model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the DoubleTensorType input of the model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    SVR Original model (double)
Python    R-squared (Coefficient of determination): 0.398243655775797
Python    Mean Absolute Error: 73.63683696034649
Python    Mean Squared Error: 7962.89631509593
Python
Python    SVR ONNX model (float)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\svr_float.onnx
Python    Information about input tensors in ONNX:
Python    1. Name: float_input, Data Type: tensor(float), Shape: [None, 1]
Python    Information about output tensors in ONNX:
Python    1. Name: variable, Data Type: tensor(float), Shape: [None, 1]
Python    R-squared (Coefficient of determination) 0.3982436352100983
Python    Mean Absolute Error: 73.63683840363255
Python    Mean Squared Error: 7962.896587236852
Python    R^2 matching decimal places:  7
Python    MAE matching decimal places:  5
Python    MSE matching decimal places:  3
Python    float ONNX model precision:  5
Python
Python    SVR ONNX model (double)
Python    ONNX model saved to C:\Users\user\AppData\Roaming\MetaQuotes\Terminal\D0E8209F77C8CF37AD8BF550E51FF075\MQL5\Scripts\Regression\svr_double.onnx
Fig.127. Results of the SVR.py (float ONNX)
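The 7, 5 and 3 matching decimal places for R^2, MAE and MSE appear to follow largely from the magnitudes of the three metrics. float32 keeps a 24-bit significand (about 7 significant decimal digits), so the absolute gap between neighbouring float32 values grows with the number: about 3e-8 near 0.4, 7.6e-6 near 73.6 and 4.9e-4 near 7963. The sketch below computes that gap for each metric of the original model using only the Python standard library. The helper names to_float32() and float32_ulp() are hypothetical. Treat the result as a rule of thumb rather than an exact error model, because the real ONNX error comes from float32 arithmetic throughout the model, not from a single rounding of the final metric.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float32_ulp(x: float) -> float:
    """Gap between adjacent float32 values near x (finite, normal, non-zero x only)."""
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent with 0.5 <= m < 1
    return 2.0 ** (exponent - 24)     # float32 keeps a 24-bit significand

# metrics of the original double SVR model, copied from the Python output
metrics = {"R^2": 0.398243655775797, "MAE": 73.63683696034649, "MSE": 7962.89631509593}

for name, value in metrics.items():
    ulp = float32_ulp(value)
    # number of decimal places that a float32 of this magnitude can still resolve
    resolvable = math.floor(-math.log10(ulp))
    print(f"{name}: float32 step={ulp:.3g}, ~{resolvable} decimal places")
# prints 7, 5 and 3 decimal places for R^2, MAE and MSE respectively
```

The same reasoning explains why a metric with a larger magnitude, such as MSE, shows fewer matching decimals even though its relative error is of the same order.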
2.2.10.2. MQL5 code for executing ONNX Models
This code runs the saved svr_float.onnx and svr_double.onnx models and demonstrates the use of regression metrics in MQL5.
//+------------------------------------------------------------------+
//|                                                          SVR.mq5 |
//|                                  Copyright 2023, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"

#define   ModelName          "SVR"
#define   ONNXFilenameFloat  "svr_float.onnx"
#define   ONNXFilenameDouble "svr_double.onnx"

#resource ONNXFilenameFloat as const uchar ExtModelFloat[];
#resource ONNXFilenameDouble as const uchar ExtModelDouble[];

#define   TestFloatModel  1
#define   TestDoubleModel 2

//+------------------------------------------------------------------+
//| Calculate regression using float values                          |
//+------------------------------------------------------------------+
bool RunModelFloat(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   float input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=(float)input_vector[k];
//--- prepare output tensor
   float output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Calculate regression using double values                         |
//+------------------------------------------------------------------+
bool RunModelDouble(long model,vector &input_vector, vector &output_vector)
  {
//--- check number of input samples
   ulong batch_size=input_vector.Size();
   if(batch_size==0)
      return(false);
//--- prepare output array
   output_vector.Resize((int)batch_size);
//--- prepare input tensor
   double input_data[];
   ArrayResize(input_data,(int)batch_size);
//--- set input shape
   ulong input_shape[]= {batch_size, 1};
   OnnxSetInputShape(model,0,input_shape);
//--- copy data to the input tensor
   for(int k=0; k<(int)batch_size; k++)
      input_data[k]=input_vector[k];
//--- prepare output tensor
   double output_data[];
   ArrayResize(output_data,(int)batch_size);
//--- set output shape
   ulong output_shape[]= {batch_size,1};
   OnnxSetOutputShape(model,0,output_shape);
//--- run the model
   bool res=OnnxRun(model,ONNX_DEBUG_LOGS,input_data,output_data);
//--- copy output to vector
   if(res)
     {
      for(int k=0; k<(int)batch_size; k++)
         output_vector[k]=output_data[k];
     }
//---
   return(res);
  }
//+------------------------------------------------------------------+
//| Generate synthetic data                                          |
//+------------------------------------------------------------------+
bool GenerateData(const int n,vector &x,vector &y)
  {
   if(n<=0)
      return(false);
//--- prepare arrays
   x.Resize(n);
   y.Resize(n);
//---
   for(int i=0; i<n; i++)
     {
      x[i]=(double)1.0*i;
      y[i]=(double)(4*x[i] + 10*sin(x[i]*0.5));
     }
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| TestRegressionModel                                              |
//+------------------------------------------------------------------+
bool TestRegressionModel(const string model_name,const int model_type)
  {
//---
   long  model=INVALID_HANDLE;
   ulong flags=ONNX_DEFAULT;

   if(model_type==TestFloatModel)
     {
      PrintFormat("\nTesting ONNX float: %s (%s)",model_name,ONNXFilenameFloat);
      model=OnnxCreateFromBuffer(ExtModelFloat,flags);
     }
   else if(model_type==TestDoubleModel)
     {
      PrintFormat("\nTesting ONNX double: %s (%s)",model_name,ONNXFilenameDouble);
      model=OnnxCreateFromBuffer(ExtModelDouble,flags);
     }
   else
     {
      PrintFormat("Model type is incorrect.");
      return(false);
     }
//--- check
   if(model==INVALID_HANDLE)
     {
      PrintFormat("model_name=%s OnnxCreate error %d",model_name,GetLastError());
      return(false);
     }
//---
   vector x_values= {};
   vector y_true= {};
   vector y_predicted= {};
//---
   int n=100;
   GenerateData(n,x_values,y_true);
//---
   bool run_result=false;
   if(model_type==TestFloatModel)
     {
      run_result=RunModelFloat(model,x_values,y_predicted);
     }
   else if(model_type==TestDoubleModel)
     {
      run_result=RunModelDouble(model,x_values,y_predicted);
     }
//---
   if(run_result)
     {
      PrintFormat("MQL5: R-Squared (Coefficient of determination): %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_R2));
      PrintFormat("MQL5: Mean Absolute Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MAE));
      PrintFormat("MQL5: Mean Squared Error: %.16f",y_predicted.RegressionMetric(y_true,REGRESSION_MSE));
     }
   else
      PrintFormat("Error %d",GetLastError());
//--- release model
   OnnxRelease(model);
//---
   return(true);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
int OnStart(void)
  {
//--- test ONNX regression model for float
   TestRegressionModel(ModelName,TestFloatModel);
//--- test ONNX regression model for double
   TestRegressionModel(ModelName,TestDoubleModel);
//---
   return(0);
  }
//+------------------------------------------------------------------+
Output:
SVR (EURUSD,H1)    Testing ONNX float: SVR (svr_float.onnx)
SVR (EURUSD,H1)    MQL5: R-Squared (Coefficient of determination): 0.3982436352100981
SVR (EURUSD,H1)    MQL5: Mean Absolute Error: 73.6368384036325523
SVR (EURUSD,H1)    MQL5: Mean Squared Error: 7962.8965872368517012
SVR (EURUSD,H1)
SVR (EURUSD,H1)    Testing ONNX double: SVR (svr_double.onnx)
SVR (EURUSD,H1)    ONNX: cannot create session (OrtStatus: 9 'Could not find an implementation for SVMRegressor(1) node with name 'SVM''), inspect code 'Scripts\R\SVR.mq5' (133:16)
SVR (EURUSD,H1)    model_name=SVR OnnxCreate error 5800
The float ONNX model ran normally. The double ONNX model could not be created, because ONNX Runtime found no implementation of the SVMRegressor node for this model (OnnxCreate error 5800).
Comparison with the original double precision model in Python:
Testing ONNX float: SVR (svr_float.onnx)
Python Mean Absolute Error: 73.63683696034649
MQL5:  Mean Absolute Error: 73.6368384036325523
Accuracy of ONNX float MAE: 5 decimal places
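The compare_decimal_places() function used in the Python scripts compares the digit strings after the decimal point. As a result, it ignores the integer part, so 5.123 and 7.123 score 3. It also stops at the first differing character even when two numbers straddle a rounding boundary, so 1.9999999 and 2.0000001 score 0. One possible alternative derives the count from the absolute difference instead, k = floor(-log10|a - b|). This is sketched below as a hypothetical helper matching_decimals(). Applied to the original Python metrics and the MQL5 float-model metrics listed above, it reproduces the 7, 5 and 3 digits for R^2, MAE and MSE.

```python
import math

def matching_decimals(a: float, b: float, max_digits: int = 16) -> int:
    """Largest k with |a - b| <= 10**-k, clipped to [0, max_digits] (hypothetical helper)."""
    diff = abs(a - b)
    if diff == 0.0:
        return max_digits  # identical values: report the cap
    return max(0, min(max_digits, math.floor(-math.log10(diff))))

# original double model (Python) vs. float ONNX model executed in MQL5
print(matching_decimals(0.398243655775797, 0.3982436352100981))    # R^2 -> 7
print(matching_decimals(73.63683696034649, 73.6368384036325523))   # MAE -> 5
print(matching_decimals(7962.89631509593, 7962.8965872368517012))  # MSE -> 3
# values on either side of a rounding boundary still count as close
print(matching_decimals(1.9999999, 2.0000001))                     # -> 6
```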
2.2.10.3. ONNX representation of svr_float.onnx and svr_double.onnx
Fig.128. ONNX representation of svr_float.onnx in Netron
Fig.129. ONNX representation of svr_double.onnx in Netron
2.3. Regression Models that Encountered Problems When Converting to ONNX
Some regression models couldn't be converted into the ONNX format by the sklearn-onnx converter.
2.3.1. sklearn.dummy.DummyRegressor
The DummyRegressor is a machine learning method used in regression tasks to create a baseline model that predicts the target variable using simple rules. It's valuable for comparison with other more complex models and evaluating their performance. This method is often used in the context of assessing the quality of other regression models.
The DummyRegressor offers several strategies for prediction:
- "mean" (default): DummyRegressor predicts the mean value of the target variable from the training dataset. This strategy is useful to determine how much better another model is compared to simply predicting the mean.
- "median": DummyRegressor predicts the median value of the target variable from the training dataset.
- "quantile": DummyRegressor predicts the quantile value of the target variable (specified by the quantile parameter) from the training dataset.
- "constant": DummyRegressor predicts a constant value supplied by the user (using the constant parameter).
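The four strategies can be sketched with the Python standard library alone. The helper names dummy_prediction() and r2() are hypothetical. The quantile branch assumes linear interpolation between sorted values (NumPy's default percentile method), and scikit-learn's internals may differ in detail. R^2 = 1 - SS_res/SS_tot measures the improvement over predicting the mean. For that reason, the mean strategy scores exactly 0 on its own training data, which is consistent with the R^2 of 0.0 in the DummyRegressor Python output.

```python
import math
import statistics

def dummy_prediction(y_train, strategy="mean", quantile=None, constant=None):
    """Single value a DummyRegressor-like baseline would predict (hypothetical helper)."""
    if strategy == "mean":
        return statistics.fmean(y_train)
    if strategy == "median":
        return statistics.median(y_train)
    if strategy == "quantile":
        # assumed: linear interpolation between neighbouring order statistics
        ys = sorted(y_train)
        pos = quantile * (len(ys) - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return ys[lo] + (ys[hi] - ys[lo]) * (pos - lo)
    if strategy == "constant":
        return constant
    raise ValueError(f"unknown strategy: {strategy}")

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# the same synthetic data as in the scripts: y = 4x + 10 sin(0.5x), x = 0..99
y = [4 * x + 10 * math.sin(0.5 * x) for x in range(100)]
baseline = dummy_prediction(y, "mean")
print(r2(y, [baseline] * len(y)))  # 0.0: the mean baseline explains none of the variance
```

Any model whose R^2 on the same data is not clearly above 0 therefore does no better than this baseline.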
Advantages of DummyRegressor:
- Performance Assessment: DummyRegressor is useful for evaluating the performance of other more complex models. If your model can't outperform predictions made by DummyRegressor, it might indicate issues in the model.
- Comparison with Baseline Models: DummyRegressor allows for comparing the performance of more complex models against a baseline (e.g., mean or median value).
- User-Friendly: DummyRegressor is easy to implement and use for comparative analysis.
Limitations of DummyRegressor:
- Not for Accurate Prediction: DummyRegressor provides only basic baseline predictions and is not intended for accurate forecasting.
- Ignores Complex Dependencies: DummyRegressor disregards complex data structures and feature dependencies.
- Not Suitable for Tasks Requiring Accurate Prediction: In real-world prediction tasks, using DummyRegressor for forecasting the target variable is insufficient.
DummyRegressor is valuable as a tool for a quick assessment and performance comparison of other regression models, but it isn't a standalone serious regression model.
2.3.1.1. Code for creating the DummyRegressor model
# The code demonstrates the process of training a DummyRegressor model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "DummyRegressor"
onnx_model_filename = data_path + "dummy_regressor"
# create a DummyRegressor model
regression_model = DummyRegressor(strategy="mean")
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the FloatTensorType input of the model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the DoubleTensorType input of the model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    DummyRegressor Original model (double)
Python    R-squared (Coefficient of determination): 0.0
Python    Mean Absolute Error: 100.00329851715793
Python    Mean Squared Error: 13232.758393867645
Errors tab:
DummyRegressor.py started    DummyRegressor.py    1    1
Traceback (most recent call last):    DummyRegressor.py    1    1
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)    DummyRegressor.py    87    1
onnx_model = convert_topology(    convert.py    208    1
topology.convert_operators(container=container, verbose=verbose)    _topology.py    1532    1
self.call_shape_calculator(operator)    _topology.py    1348    1
operator.infer_types()    _topology.py    1163    1
raise MissingShapeCalculator(    _topology.py    629    1
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.dummy.DummyRegressor'>'.    _topology.py    629    1
It usually means the pipeline being converted contains a    _topology.py    629    1
transformer or a predictor with no corresponding converter    _topology.py    629    1
implemented in sklearn-onnx. If the converted is implemented    _topology.py    629    1
in another library, you need to register    _topology.py    629    1
the converted so that it can be used by sklearn-onnx (function    _topology.py    629    1
update_registered_converter). If the model is not yet covered    _topology.py    629    1
by sklearn-onnx, you may raise an issue to    _topology.py    629    1
https://github.com/onnx/sklearn-onnx/issues    _topology.py    629    1
to get the converter implemented or even contribute to the    _topology.py    629    1
project. If the model is a custom model, a new converter must    _topology.py    629    1
be implemented. Examples can be found in the gallery.    _topology.py    629    1
DummyRegressor.py finished in 2565 ms        19    1
2.3.2. sklearn.kernel_ridge.KernelRidge
KernelRidge is a machine learning method used for regression tasks. It combines ridge regression (least squares with L2 regularization) with the kernel trick, the same idea that underlies kernel Support Vector Machines (Kernel SVM). KernelRidge enables the modeling of complex, nonlinear relationships between features and the target variable using kernel functions.
Working principle of KernelRidge:
- Input data: It starts with the original dataset containing features (independent variables) and their corresponding target variable values.
- Kernel functions: KernelRidge uses kernel functions (such as polynomial, RBF - radial basis function, and others) that transform data into a high-dimensional space, allowing the modeling of more complex nonlinear relationships.
- Model training: The model is trained by minimizing the squared error between the predicted and actual target values plus an L2 regularization penalty whose strength is set by the alpha parameter. Kernel functions are used to account for complex dependencies.
- Prediction: After training, the model can be used to predict target variable values for new data, using the same kernel functions.
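The training and prediction steps above can be sketched in the dual form of kernel ridge regression. The dual coefficients are c = (K + alpha*I)^-1 y, where K is the kernel (Gram) matrix of the training inputs, and a prediction is f(x) = sum_i c_i k(x_i, x). This is the closed-form problem that scikit-learn's KernelRidge solves. The library uses its own linear-algebra routines rather than the plain Gaussian elimination shown here. All helper names are hypothetical, inputs are restricted to a single feature, and only the linear and RBF kernels are shown.

```python
import math

def linear_kernel(a, b):
    return a * b

def rbf_kernel(a, b, gamma=1.0):
    return math.exp(-gamma * (a - b) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def kernel_ridge_fit(x_train, y_train, alpha=1.0, kernel=linear_kernel):
    """Dual coefficients c = (K + alpha*I)^-1 y for one-dimensional inputs."""
    n = len(x_train)
    K = [[kernel(x_train[i], x_train[j]) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y_train)

def kernel_ridge_predict(x_train, dual_coef, x_new, kernel=linear_kernel):
    """f(x) = sum_i c_i * k(x_i, x)."""
    return [sum(c * kernel(xi, x) for c, xi in zip(dual_coef, x_train)) for x in x_new]

x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [0.0, 2.0, 4.0, 6.0]
coef = kernel_ridge_fit(x_train, y_train, alpha=1.0)
print(kernel_ridge_predict(x_train, coef, [4.0]))  # about 7.4667, i.e. 4 * 28 / 15
```

This formulation has no intercept term, so a linear kernel reduces to ridge regression through the origin, w = sum(x*y) / (sum(x^2) + alpha), which is 28/15 for the four points above. Building and solving the n x n matrix also shows the computational cost noted below: memory grows as n^2 and the solve as n^3.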
Advantages of KernelRidge:
- Modeling complex nonlinear relationships: KernelRidge allows the modeling of complex and nonlinear dependencies between features and the target variable.
- Selection of different kernels: You can choose different kernels depending on the nature of the data and the task.
- Regularization: The method includes regularization, helping prevent model overfitting.
Limitations of KernelRidge:
- Lack of interpretability: Like many nonlinear methods, KernelRidge is less interpretable than linear models.
- Computational complexity: Using kernel functions can be computationally expensive with large volumes of data and/or high dimensionality.
- Parameter tuning requirement: Choosing the appropriate kernel and model parameters requires tuning and expertise.
KernelRidge is useful in regression tasks where data exhibits complex, nonlinear dependencies, and a model capable of considering these relationships is required. It is also helpful in tasks where kernel functions can be utilized to transform data into a more informative representation.
2.3.2.1. Code for creating the KernelRidge model
# The code demonstrates the process of training a KernelRidge model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "KernelRidge"
onnx_model_filename = data_path + "kernel_ridge"
# create a KernelRidge model
regression_model = KernelRidge(alpha=1.0, kernel='linear')
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the FloatTensorType input of the model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# cast the input data to float64 to match the double input of the ONNX model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    KernelRidge Original model (double)
Python    R-squared (Coefficient of determination): 0.9962137909675411
Python    Mean Absolute Error: 6.36977985227399
Python    Mean Squared Error: 50.10198935520715
Errors tab:
KernelRidge.py started
Traceback (most recent call last):
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)    (KernelRidge.py, line 87)
    onnx_model = convert_topology(    (convert.py, line 208)
    topology.convert_operators(container=container, verbose=verbose)    (_topology.py, line 1532)
    self.call_shape_calculator(operator)    (_topology.py, line 1348)
    operator.infer_types()    (_topology.py, line 1163)
    raise MissingShapeCalculator(    (_topology.py, line 629)
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.kernel_ridge.KernelRidge'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.
KernelRidge.py finished in 2516 ms
2.3.3. sklearn.isotonic.IsotonicRegression
IsotonicRegression is a machine learning method for regression tasks that models a monotonic relationship between a feature and the target variable (scikit-learn's implementation accepts a single input feature). In this context, "monotonicity" means that as the feature value increases, the target value either never decreases (increasing model) or never increases (decreasing model), so the direction of change is preserved over the whole range of the feature.
Working principle of IsotonicRegression:
- Input data: It starts with the original dataset containing features (independent variables) and their corresponding target variable values.
- Monotonic regression: IsotonicRegression aims to find the best monotonic function that describes the relationship between the feature and the target variable. This function is not restricted to a particular form and can be linear or nonlinear, but it must remain monotonic.
- Model training: The method is non-parametric. Instead of estimating coefficients, training finds the monotonic sequence of fitted values that minimizes the sum of squared errors between the predictions and the actual target values.
- Prediction: After training, the model can be used to predict target variable values for new data while maintaining the monotonic relationship.
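The fitting step described above is a constrained least-squares problem. A standard way to solve it is the pool adjacent violators algorithm (PAVA): walk through the points in feature order and, whenever a new value breaks monotonicity, merge it with the previous block and replace both by their weighted mean. The sketch below is a minimal illustration under stated assumptions: the targets are already sorted by the single feature, the fit is non-decreasing, and the function name `pava` is hypothetical. It is not scikit-learn's own implementation, which additionally handles ties in X, optional bounds and interpolation at prediction time.

```python
import numpy as np


def pava(y, weights=None):
    """Least-squares non-decreasing fit to y, assumed sorted by the feature (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    means, totals, counts = [], [], []  # one entry per pooled block
    for value, weight in zip(y, w):
        # start a new block for the current point
        means.append(value)
        totals.append(weight)
        counts.append(1)
        # pool adjacent blocks while they violate the non-decreasing constraint
        while len(means) > 1 and means[-2] > means[-1]:
            m2, t2, c2 = means.pop(), totals.pop(), counts.pop()
            m1, t1, c1 = means.pop(), totals.pop(), counts.pop()
            total = t1 + t2
            means.append((m1 * t1 + m2 * t2) / total)
            totals.append(total)
            counts.append(c1 + c2)
    # every point in a block receives the block's weighted mean
    return np.repeat(means, counts)


# same synthetic data as in the listing below, flattened to 1-D
X = np.arange(0, 100, 1)
y = 4 * X + 10 * np.sin(X * 0.5)
fitted = pava(y)
print(pava([1, 3, 2, 4]))  # expected values: 1, 2.5, 2.5, 4
print(bool(np.all(np.diff(fitted) >= 0)))  # the fit never decreases
```

Pooling only ever replaces a violating pair by its weighted mean, so the total of the fitted values equals the total of the targets, and a decreasing fit can be obtained by applying the same function to `-y` and negating the result.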
Advantages of IsotonicRegression:
- Modeling monotonic relationships: This method is a natural choice when the data exhibits a monotonic dependency and it is important for the model to preserve this property.
- Interpretability: Monotonic models can be easier to interpret, because the direction in which the feature influences the target variable is fixed by construction.
Limitations of IsotonicRegression:
- Not suitable for non-monotonic relationships: The method can only model monotonic relationships, so it cannot capture dependencies in which the target variable rises and falls as the feature changes (for example, periodic patterns).
- Parameter tuning: Some IsotonicRegression implementations might have parameters that require tuning to achieve optimal performance.
IsotonicRegression is useful in tasks where the monotonicity of the relationship between features and the target variable is considered an important factor, and there is a need to build a model that preserves this characteristic.
2.3.3.1. Code for creating the IsotonicRegression models
# The code demonstrates the process of training an IsotonicRegression model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "IsotonicRegression"
onnx_model_filename = data_path + "isotonic_regression"
# create an IsotonicRegression model
regression_model = IsotonicRegression()
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# cast the input data to float32 to match the float input of the ONNX model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# cast the input data to float64 to match the double input of the ONNX model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    IsotonicRegression Original model (double)
Python    R-squared (Coefficient of determination): 0.9999898125037958
Python    Mean Absolute Error: 0.20093409873424467
Python    Mean Squared Error: 0.13480867590911208
Errors tab:
IsotonicRegression.py started
Traceback (most recent call last):
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)    (IsotonicRegression.py, line 87)
    onnx_model = convert_topology(    (convert.py, line 208)
    topology.convert_operators(container=container, verbose=verbose)    (_topology.py, line 1532)
    self.call_shape_calculator(operator)    (_topology.py, line 1348)
    operator.infer_types()    (_topology.py, line 1163)
    raise MissingShapeCalculator(    (_topology.py, line 629)
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.isotonic.IsotonicRegression'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.
IsotonicRegression.py finished in 2499 ms
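The compare_decimal_places helper used throughout these listings compares the decimal strings of two numbers digit by digit, not their numeric difference. That has a few consequences worth keeping in mind when reading the "matching decimal places" lines: the integer part is never compared, a value printed without a decimal point (such as a Python int) yields 0, the count is capped by the shorter string, and a carry can make very close numbers share no digits at all. The sketch below copies the function unchanged apart from indentation and probes these cases; the probe values are illustrative only, and the last one simply round-trips a float64 number (the KernelRidge MAE printed earlier) through float32, roughly what happens to values coming out of a float ONNX model.

```python
import numpy as np


# copied from the listing: counts identical digits after the decimal point
def compare_decimal_places(value1, value2):
    str_value1 = str(value1)
    str_value2 = str(value2)
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    min_decimal_places = min(decimal_places1, decimal_places2)
    matching_count = 0
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count


print(compare_decimal_places(1.5, 2.5))        # 1: the integer part is ignored
print(compare_decimal_places(10, 10.5))        # 0: str(10) has no decimal point
print(compare_decimal_places(3.14159, 3.14))   # 2: capped by the shorter string
print(compare_decimal_places(0.1999, 0.2001))  # 0: a carry changes the first digit

# a float64 value round-tripped through float32
value = 6.36977985227399
value_f32 = float(np.float32(value))
print(value_f32, compare_decimal_places(value, value_f32))
```

For a size-aware comparison, the absolute or relative difference between the two values (for example with `np.isclose` and an explicit tolerance) is usually a more robust complement to the digit count.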
2.3.4. sklearn.cross_decomposition.PLSCanonical
PLSCanonical (Partial Least Squares Canonical) is a machine learning method closely related to canonical correlation analysis. It is a variant of the Partial Least Squares (PLS) method and is applied to analyze and model relationships between two sets of variables.
Working principle of PLSCanonical:
- Input data: It starts with two datasets (X and Y), where each set represents a collection of variables (features). Usually, X and Y contain related data, and the task is to find linear combinations of the features in each set whose covariance is as large as possible.
- Selection of linear combinations: PLSCanonical finds linear combinations (components) in both X and Y that maximize the covariance between the components of the two datasets. These components are also called latent variables, or scores.
- Maximum covariance search: The primary goal of PLSCanonical is to find components with maximal covariance between X and Y, highlighting the most informative relationships between the two datasets. CCA, in contrast, maximizes correlation instead of covariance.
- Model training: Once the components are found, they can be used to create a model that predicts Y values based on X.
- Generating predictions: After training, the model can be used to predict Y values in new data using corresponding X values.
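For the first component, the search for maximal covariance described above has a closed form: the X and Y weight vectors are the leading left and right singular vectors of the cross-product matrix XᵀY of the centered data, and the product of the resulting scores equals the largest singular value. The NumPy sketch below illustrates only that first step on hypothetical data. scikit-learn's PLSCanonical additionally scales the data by default (scale=True), computes the weights iteratively by default (algorithm="nipals") and deflates X and Y before extracting further components, so its numbers and signs can differ from this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# hypothetical data: both blocks share one latent signal plus noise
latent = rng.normal(size=n)
X = np.column_stack([latent + 0.3 * rng.normal(size=n),
                     rng.normal(size=n),
                     -latent + 0.3 * rng.normal(size=n)])
Y = np.column_stack([2 * latent + 0.3 * rng.normal(size=n),
                     rng.normal(size=n)])

# center both blocks (no scaling in this sketch)
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# first weight vectors = leading left/right singular vectors of X^T Y
U, s, Vt = np.linalg.svd(Xc.T @ Yc)
u, v = U[:, 0], Vt[0]

# first pair of scores (latent variables)
t = Xc @ u
z = Yc @ v
print("covariance of the first scores:", t @ z / (n - 1))
print("X weights:", u)
print("Y weights:", v)
```

No other pair of unit-length weight vectors gives a larger covariance between the X and Y scores, which is exactly the criterion in the list above; the noise-only columns receive small weights.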
Advantages of PLSCanonical:
- Relationship analysis: PLSCanonical allows the analysis and modeling of the relationships between two datasets, which can be useful for understanding how the variables are connected.
- Dimensionality reduction: The method can also be used to reduce the data dimensionality, highlighting the most important components.
Limitations of PLSCanonical:
- Sensitivity to the choice of the number of components: Selecting the optimal number of components may require some experimentation.
- Dependency on data structure: The results of PLSCanonical can depend heavily on the structure of the data and on the relationships between the variables.
PLSCanonical is a machine learning method used to analyze and model the relationships between two sets of variables. It helps to study how the data in the two sets are related and can be useful for reducing data dimensionality and for predicting values based on the extracted components.
2.3.4.1. Code for creating the PLSCanonical
# The code demonstrates the process of training a PLSCanonical model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_decomposition import PLSCanonical
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name = "PLSCanonical"
onnx_model_filename = data_path + "pls_canonical"
# create a PLSCanonical model
regression_model = PLSCanonical(n_components=1)
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# cast the input data to float32 to match the float input of the ONNX model
initial_type_float = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: initial_type_float})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination)", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8, 5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# cast the input data to float64 to match the double input of the ONNX model
initial_type_double = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: initial_type_double})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination)", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
Output:
Python    PLSCanonical Original model (double)
Python    R-squared (Coefficient of determination): 0.9962347199278333
Python    Mean Absolute Error: 6.3561407034365995
Python    Mean Squared Error: 49.82504148022689
Errors tab:
PLSCanonical.py started
Traceback (most recent call last):
    onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)    (PLSCanonical.py, line 87)
    onnx_model = convert_topology(    (convert.py, line 208)
    topology.convert_operators(container=container, verbose=verbose)    (_topology.py, line 1532)
    self.call_shape_calculator(operator)    (_topology.py, line 1348)
    operator.infer_types()    (_topology.py, line 1163)
    raise MissingShapeCalculator(    (_topology.py, line 629)
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.cross_decomposition._pls.PLSCanonical'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.
PLSCanonical.py finished in 2513 ms
2.3.5. sklearn.cross_decomposition.CCA
Canonical Correlation Analysis (CCA) is a multivariate statistical analysis method used to study the relationships between two sets of variables (set X and set Y). The main goal of CCA is to find linear combinations of variables X and Y that maximize the correlation between them. These linear combinations are called canonical variables.
Working principle of CCA:
- Input data: It starts with two sets of variables X and Y. There can be any number of variables in these sets, and CCA attempts to find linear combinations that maximize the correlation between them.
- Construction of canonical variables: CCA identifies canonical variables in X and Y that maximize their correlation. The canonical variables are linear combinations of the original variables and come in pairs: each pair consists of one combination of the X variables and one combination of the Y variables.
- Correlation assessment: CCA evaluates the correlation between pairs of canonical variables. Canonical variables are usually ordered by decreasing correlation, so the first pair has the highest correlation, the second has the next highest, and so on.
- Interpretation: Canonical variables can be interpreted by examining the canonical correlations together with the weights assigned to the original variables. This shows which variables from sets X and Y are most strongly related.
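The construction described above can be computed in closed form: whiten each set with the inverse square root of its covariance matrix, then take the SVD of the whitened cross-covariance matrix. The singular values are the canonical correlations, already sorted in decreasing order, and the singular vectors give the canonical weights. The sketch below uses hypothetical data and a hypothetical helper named `canonical_correlations`, and it assumes both covariance matrices are invertible (no regularization). scikit-learn's CCA estimator is implemented within its iterative PLS framework and scales the data by default, so its outputs may differ in scaling and sign.

```python
import numpy as np


def inv_sqrt(C):
    # inverse square root of a symmetric positive-definite matrix
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def canonical_correlations(X, Y):
    """Return canonical correlations and canonical variates of X and Y (hypothetical helper)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1)
    Cyy = Yc.T @ Yc / (n - 1)
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # singular values of the whitened cross-covariance = canonical correlations (descending)
    U, rho, Vt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    A = Wx @ U      # canonical weights for X, one column per pair
    B = Wy @ Vt.T   # canonical weights for Y, one column per pair
    return rho, Xc @ A, Yc @ B


rng = np.random.default_rng(1)
n = 500
shared = rng.normal(size=n)  # hypothetical common signal
X = np.column_stack([shared + 0.5 * rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([rng.normal(size=n), -shared + 0.5 * rng.normal(size=n), rng.normal(size=n)])

rho, x_scores, y_scores = canonical_correlations(X, Y)
print("canonical correlations:", rho)
print("correlation of the first pair:", np.corrcoef(x_scores[:, 0], y_scores[:, 0])[0, 1])
```

Each canonical variate has unit variance by construction, so the covariance of a pair equals its correlation; with a single variable in each set, the first canonical correlation reduces to the absolute Pearson correlation.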
Advantages of CCA:
- Reveals hidden connections: CCA can help discover hidden correlations between two sets of variables that may not be obvious during initial analysis.
- Robust to noise: CCA can account for noise in data and focus on the most significant correlations.
- Multiple applications: CCA is used in various fields, such as statistics, bioinformatics and finance, to study relationships between sets of variables.
Limitations of CCA:
- Requires more data: CCA might require a larger amount of data than other analysis methods to reliably estimate correlations.
- Linear relationships: CCA assumes linear relationships between variables, which might be insufficient in some cases.
- Interpretation complexity: Interpreting canonical variables can be complex, especially when there are many variables in sets X and Y.
CCA is beneficial in tasks where studying the relationship between two sets of variables and uncovering hidden correlations is required.
2.3.5.1. Code for creating the CCA model
# The code demonstrates the process of training a CCA model, exporting it to ONNX format (both float and double), and making predictions using the ONNX models.
# Copyright 2023, MetaQuotes Ltd.
# https://www.mql5.com
# function to compare matching decimal places
def compare_decimal_places(value1, value2):
    # convert both values to strings
    str_value1 = str(value1)
    str_value2 = str(value2)
    # find the positions of the decimal points in the strings
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # if one of the values doesn't have a decimal point, return 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    # calculate the number of decimal places
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    # find the minimum of the two decimal places counts
    min_decimal_places = min(decimal_places1, decimal_places2)
    # initialize a count for matching decimal places
    matching_count = 0
    # compare characters after the decimal point
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count
# import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_decomposition import CCA
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
from sys import argv
# define the path for saving the model
data_path = argv[0]
last_index = data_path.rfind("\\") + 1
data_path = data_path[0:last_index]
# generate synthetic data for regression
X = np.arange(0,100,1).reshape(-1,1)
y = 4*X + 10*np.sin(X*0.5)
model_name="CCA"
onnx_model_filename = data_path + "cca"
# create a CCA model
regression_model = CCA(n_components=1)
# fit the model to the data
regression_model.fit(X, y.ravel())
# predict values for the entire dataset
y_pred = regression_model.predict(X)
# evaluate the model's performance
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
print("\n"+model_name+" Original model (double)")
print("R-squared (Coefficient of determination):", r2)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
# convert to ONNX-model (float)
# define the input data type as FloatTensorType
initial_type_float = [('float_input', FloatTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_float.onnx"
onnx.save_model(onnx_model_float, onnx_filename)
print("\n"+model_name+" ONNX model (float)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float32 to match the model's FloatTensorType input
X_float32 = X.astype(np.float32)
# predict values for the entire dataset using ONNX
y_pred_onnx_float = onnx_session.run([output_name], {input_name: X_float32})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_float = r2_score(y, y_pred_onnx_float)
mse_onnx_float = mean_squared_error(y, y_pred_onnx_float)
mae_onnx_float = mean_absolute_error(y, y_pred_onnx_float)
print("R-squared (Coefficient of determination):", r2_onnx_float)
print("Mean Absolute Error:", mae_onnx_float)
print("Mean Squared Error:", mse_onnx_float)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_float))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_float))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_float))
print("float ONNX model precision: ",compare_decimal_places(mae, mae_onnx_float))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_float, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with float ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_float.png')
# convert to ONNX-model (double)
# define the input data type as DoubleTensorType
initial_type_double = [('double_input', DoubleTensorType([None, X.shape[1]]))]
# export the model to ONNX format
onnx_model_double = convert_sklearn(regression_model, initial_types=initial_type_double, target_opset=12)
# save the model to a file
onnx_filename=onnx_model_filename+"_double.onnx"
onnx.save_model(onnx_model_double, onnx_filename)
print("\n"+model_name+" ONNX model (double)")
# print model path
print(f"ONNX model saved to {onnx_filename}")
# load the ONNX model and make predictions
onnx_session = ort.InferenceSession(onnx_filename)
input_name = onnx_session.get_inputs()[0].name
output_name = onnx_session.get_outputs()[0].name
# display information about input tensors in ONNX
print("Information about input tensors in ONNX:")
for i, input_tensor in enumerate(onnx_session.get_inputs()):
    print(f"{i + 1}. Name: {input_tensor.name}, Data Type: {input_tensor.type}, Shape: {input_tensor.shape}")
# display information about output tensors in ONNX
print("Information about output tensors in ONNX:")
for i, output_tensor in enumerate(onnx_session.get_outputs()):
    print(f"{i + 1}. Name: {output_tensor.name}, Data Type: {output_tensor.type}, Shape: {output_tensor.shape}")
# convert the input data to float64 to match the model's DoubleTensorType input
X_float64 = X.astype(np.float64)
# predict values for the entire dataset using ONNX
y_pred_onnx_double = onnx_session.run([output_name], {input_name: X_float64})[0]
# calculate and display the errors for the original and ONNX models
r2_onnx_double = r2_score(y, y_pred_onnx_double)
mse_onnx_double = mean_squared_error(y, y_pred_onnx_double)
mae_onnx_double = mean_absolute_error(y, y_pred_onnx_double)
print("R-squared (Coefficient of determination):", r2_onnx_double)
print("Mean Absolute Error:", mae_onnx_double)
print("Mean Squared Error:", mse_onnx_double)
print("R^2 matching decimal places: ",compare_decimal_places(r2, r2_onnx_double))
print("MAE matching decimal places: ",compare_decimal_places(mae, mae_onnx_double))
print("MSE matching decimal places: ",compare_decimal_places(mse, mse_onnx_double))
print("double ONNX model precision: ",compare_decimal_places(mae, mae_onnx_double))
# set the figure size
plt.figure(figsize=(8,5))
# plot the original data and the regression line
plt.scatter(X, y, label='Original Data', marker='o')
plt.scatter(X, y_pred, color='blue', label='Scikit-Learn '+model_name+' Output', marker='o')
plt.scatter(X, y_pred_onnx_double, color='red', label='ONNX '+model_name+' Output', marker='o', linestyle='--')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title(model_name+' Comparison (with double ONNX)')
#plt.show()
plt.savefig(data_path + model_name+'_plot_double.png')
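The compare_decimal_places helper compares the two numbers as text, digit by digit after the decimal point, and stops at the first mismatch. The calls below are a small sketch of what the count means; apart from the R-squared value taken from the output, the inputs are illustrative values chosen by us. They also show where the count can mislead, so it is best read as a rough indicator of agreement rather than an exact error measure.

```python
# Copy of the compare_decimal_places helper from the script above (same logic)
def compare_decimal_places(value1, value2):
    str_value1 = str(value1)
    str_value2 = str(value2)
    dot_position1 = str_value1.find(".")
    dot_position2 = str_value2.find(".")
    # values printed without a decimal point (e.g. scientific notation) give 0
    if dot_position1 == -1 or dot_position2 == -1:
        return 0
    decimal_places1 = len(str_value1) - dot_position1 - 1
    decimal_places2 = len(str_value2) - dot_position2 - 1
    min_decimal_places = min(decimal_places1, decimal_places2)
    matching_count = 0
    # count matching digits after the decimal point until the first mismatch
    for i in range(1, min_decimal_places + 1):
        if str_value1[dot_position1 + i] == str_value2[dot_position2 + i]:
            matching_count += 1
        else:
            break
    return matching_count


# Typical use: the R^2 from the output vs. the same value cut to 9 decimals
print(compare_decimal_places(0.9962347199278333, 0.996234719))  # 9
# Caveat 1: only fractional digits are compared, the integer part is ignored
print(compare_decimal_places(5.25, 7.25))  # 2
# Caveat 2: values on either side of a digit boundary score 0 although very close
print(compare_decimal_places(1.0, 0.9999999))  # 0
# Caveat 3: str() switches to scientific notation for small values, so even equal values score 0
print(compare_decimal_places(1e-05, 1e-05))  # 0
```

Where these edge cases matter, a relative error such as abs(a - b) / abs(a) is a more robust way to express how closely two results agree.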
Output:
Python    CCA Original model (double)
Python    R-squared (Coefficient of determination): 0.9962347199278333
Python    Mean Absolute Error: 6.3561407034365995
Python    Mean Squared Error: 49.82504148022689
Errors tab:
CCA.py started    CCA.py 1 1
Traceback (most recent call last):    CCA.py 1 1
onnx_model_float = convert_sklearn(regression_model, initial_types=initial_type_float, target_opset=12)    CCA.py 87 1
onnx_model = convert_topology(    convert.py 208 1
topology.convert_operators(container=container, verbose=verbose)    _topology.py 1532 1
self.call_shape_calculator(operator)    _topology.py 1348 1
operator.infer_types()    _topology.py 1163 1
raise MissingShapeCalculator(    _topology.py 629 1
skl2onnx.common.exceptions.MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.cross_decomposition._pls.CCA'>'.    _topology.py 629 1
It usually means the pipeline being converted contains a    _topology.py 629 1
transformer or a predictor with no corresponding converter    _topology.py 629 1
implemented in sklearn-onnx. If the converted is implemented    _topology.py 629 1
in another library, you need to register    _topology.py 629 1
the converted so that it can be used by sklearn-onnx (function    _topology.py 629 1
update_registered_converter). If the model is not yet covered    _topology.py 629 1
by sklearn-onnx, you may raise an issue to    _topology.py 629 1
https://github.com/onnx/sklearn-onnx/issues    _topology.py 629 1
to get the converter implemented or even contribute to the    _topology.py 629 1
project. If the model is a custom model, a new converter must    _topology.py 629 1
be implemented. Examples can be found in the gallery.    _topology.py 629 1
CCA.py finished in 2543 ms    19 1
Conclusion
The article reviewed 45 regression models available in the Scikit-learn library version 1.3.2.
1. Out of this set, 5 models faced difficulties when converting to the ONNX format:
- DummyRegressor (Dummy regressor);
- KernelRidge (Kernel Ridge Regression);
- IsotonicRegression (Isotonic Regression);
- PLSCanonical (Partial Least Squares Canonical Analysis);
- CCA (Canonical Correlation Analysis).
These models may be too complex in their structure or logic, or may rely on specific data structures or algorithms that are not fully compatible with the ONNX format.
2. The remaining 40 models were successfully converted to ONNX with computations in float precision.
- ARDRegression: Automatic Relevance Determination Regression (ARD);
- BayesianRidge: Bayesian Ridge Regression with regularization;
- ElasticNet: Combination of L1 and L2 regularization for mitigating overfitting;
- ElasticNetCV: Elastic Net with automatic regularization parameter selection;
- HuberRegressor: Regression with decreased sensitivity to outliers;
- Lars: Least Angle Regression;
- LarsCV: Cross-validated Least Angle Regression;
- Lasso: L1-regularized regression for feature selection;
- LassoCV: Cross-validated Lasso regression;
- LassoLars: Combination of Lasso and LARS for regression;
- LassoLarsCV: Cross-validated LassoLars regression;
- LassoLarsIC: Information criteria for LassoLars parameter selection;
- LinearRegression: Simple linear regression;
- Ridge: Linear regression with L2 regularization;
- RidgeCV: Cross-validated Ridge regression;
- OrthogonalMatchingPursuit: Regression with orthogonal feature selection;
- PassiveAggressiveRegressor: Regression with a passive-aggressive learning approach;
- QuantileRegressor: Quantile regression;
- RANSACRegressor: Regression with the RANdom SAmple Consensus method;
- TheilSenRegressor: Robust linear regression based on the Theil-Sen method;
- LinearSVR: Linear support vector regression;
- MLPRegressor: Regression using a multi-layer perceptron;
- PLSRegression: Partial Least Squares Regression;
- TweedieRegressor: Tweedie distribution-based regression;
- PoissonRegressor: Regression for modeling Poisson-distributed data;
- RadiusNeighborsRegressor: Regression based on radius neighbors;
- KNeighborsRegressor: Regression based on k-nearest neighbors;
- GaussianProcessRegressor: Gaussian process-based regression;
- GammaRegressor: Regression for modeling gamma-distributed data;
- SGDRegressor: Regression based on stochastic gradient descent;
- AdaBoostRegressor: Regression using the AdaBoost algorithm;
- BaggingRegressor: Regression using the Bagging method;
- DecisionTreeRegressor: Decision tree-based regression;
- ExtraTreeRegressor: Regression with a single extremely randomized tree;
- ExtraTreesRegressor: Regression with an ensemble of extremely randomized trees (Extra-Trees);
- NuSVR: Nu support vector regression (SVR in which the nu parameter controls the number of support vectors);
- RandomForestRegressor: Regression with an ensemble of decision trees (Random Forest);
- GradientBoostingRegressor: Regression with gradient boosting;
- HistGradientBoostingRegressor: Regression with histogram gradient boosting;
- SVR: Support vector regression method.
3. The possibility of converting regression models into ONNX with calculations in double precision was also explored.
A serious issue encountered when converting models to double precision in ONNX is a limitation of the ML operators ai.onnx.ml.LinearRegressor, ai.onnx.ml.SVMRegressor and ai.onnx.ml.TreeEnsembleRegressor: their parameters and output values are of float type. In effect, these operators reduce precision, so using them inside double-precision computations makes little sense. For this reason, the ONNX Runtime library does not implement some of these operators for ONNX models in double precision, and NOT_IMPLEMENTED errors may occur, such as 'Could not find an implementation for the node LinearRegressor:LinearRegressor(1)', 'Could not find an implementation for SVMRegressor(1) node with name 'SVM'', and so on. Thus, within the current ONNX specification, fully double-precision operation is impossible for these ML operators.
For linear regression models, the sklearn-onnx converter managed to bypass the LinearRegressor limitation by using the standard ONNX operators MatMul() and Add() instead. Thanks to this approach, the first 30 models from the previous list were successfully converted into ONNX models with double-precision calculations, and these models retained the accuracy of the original double-precision models.
However, for more complex ML operators like SVMRegressor and TreeEnsembleRegressor, this was not achieved. Therefore, models like AdaBoostRegressor, BaggingRegressor, DecisionTreeRegressor, ExtraTreeRegressor, ExtraTreesRegressor, NuSVR, RandomForestRegressor, GradientBoostingRegressor, HistGradientBoostingRegressor, and SVR are currently available only in ONNX models with calculations in float.
Summary
The article covered 45 regression models from the Scikit-learn library version 1.3.2 and the results of converting them into ONNX format for both float and double precision computations.
Out of all the reviewed models, 5 could not be converted to ONNX: DummyRegressor, KernelRidge, IsotonicRegression, PLSCanonical and CCA. Their complex structure or logic may require additional adaptation for successful ONNX conversion.
The remaining 40 regression models were successfully converted to ONNX format with float computations. Of these, 30 models were also successfully converted to ONNX format with double-precision computations, retaining their accuracy.
Due to the limitations of the SVMRegressor and TreeEnsembleRegressor ML operators, the models AdaBoostRegressor, BaggingRegressor, DecisionTreeRegressor, ExtraTreeRegressor, ExtraTreesRegressor, NuSVR, RandomForestRegressor, GradientBoostingRegressor, HistGradientBoostingRegressor and SVR are currently available only as ONNX models with float computations.
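The cost of float-only computation can be estimated without ONNX at all. The NumPy sketch below uses hypothetical coefficients and random inputs (all assumptions for illustration); it casts the parameters, the arithmetic and the output to float32 to emulate an operator whose parameters and output are float, such as LinearRegressor, and compares the result with a pure float64 computation. It is an emulation of the precision loss, not the ONNX Runtime kernel itself.

```python
import numpy as np

# hypothetical double-precision linear model; positive coefficients keep y well away from 0
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 100.0, size=(1000, 3))
coef = np.array([4.123456789012345, 0.987654321098765, 2.718281828459045])
intercept = 10.141592653589793

# reference prediction computed entirely in float64
y_double = X @ coef + intercept

# emulated float-only operator: inputs, parameters, arithmetic and output in float32
coef32 = coef.astype(np.float32)
intercept32 = np.float32(intercept)
y_float = X.astype(np.float32) @ coef32 + intercept32

# error introduced when the float32 result is promoted back to float64
abs_err = np.abs(y_double - y_float.astype(np.float64))
rel_err = abs_err / np.abs(y_double)
print("dtype of float path:", y_float.dtype)
print("max absolute error:", abs_err.max())
print("max relative error:", rel_err.max())
```

The relative error should stay near the level of float32 rounding (roughly 7 significant decimal digits), which matches the partial digit agreement that the scripts report for float ONNX models.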
All the scripts from the article are also available in the public project MQL5\Shared Projects\Scikit.Regression.ONNX.
Translated from Russian by MetaQuotes Ltd.
Original article: https://www.mql5.com/ru/articles/13538