Machine learning in trading: theory, models, practice and algo-trading - page 199

 
Alexey Burnakov:

Then please explain, to be precise, how for a continuous uniform distribution the density at an endpoint is positive while the integral there is zero: https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)

Let's go back to the original statement in the article about errors in R.

Our position stands: the errors are real and were caused by carelessness in the implementation.
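The point in the quoted question can be checked numerically. The density is a height, while the CDF is the area accumulated up to x. At the lower endpoint that area has zero width. Below is a minimal Python sketch. The names dunif and punif are borrowed from R only for readability; they are hypothetical re-implementations, not R's code. The closed-interval convention for the density follows the Wikipedia definition linked above.

```python
def dunif(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """Density of U(a, b): constant 1/(b - a) on the closed interval [a, b], 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def punif(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """CDF of U(a, b): the integral of the density from -infinity up to x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# At the lower endpoint the density is positive, yet no probability has accumulated,
# because a single point (an interval of zero width) contributes zero area.
print(dunif(0.0), punif(0.0))  # 1.0 0.0
```

The two statements do not conflict: a positive density at a point and zero probability of that point are both true for any continuous distribution.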

 
Renat Fatkhullin:

The point is that @Quantum is doing a clean implementation and a full verification of the MQL5 counterparts of R's math libraries.

This is not a theorist's reasoning. He digs deep when writing the unit tests, which give confidence in the correctness of the library.


Do not assume a priori that everything in R is correct. On the contrary, I would say that even where R has a C++ implementation of its functions, the implementation is quite primitive. As for speed, you can see that the MQL5 library, compiled from source with our compiler, is on average 3 times faster.

We took the trouble to double-check everything and found obvious errors. These errors have been confirmed:

Please look at the publication dates. You will see how the work goes, even with the advice of scientists.

In addition, it would be a mistake not to consider @Quantum a scientist.

Dear Renat!

Regarding your recent posts, I have the following questions, which are fundamental for me:

1. Judging by its publication date, the article you cite is from 2003. Naturally, R, like any other software system, has bugs, and every release comes with a published list of fixes. At the same time, R has always stressed that one of its advantages is a low bug rate, thanks to its extremely large user base. Yet here is a bug in an algorithm, documented in a publication since 2003, that has still not been fixed. I do not understand this.

Have you filed a bug report with R on this issue?

I would like to see the code that was used to compare the performance of R and MQL5.

Thank you in advance.

 
SanSanych Fomenko:

Dear Renat!

I have the following questions about your recent posts, which are fundamental for me:

1. Judging by its publication date, the article is from 2003. Naturally, R, like any other software system, has bugs, and every release comes with a published list of fixes. At the same time, R has always stressed that its virtue is a low bug rate, thanks to its extremely large user base. Yet here is a bug in an algorithm, documented in a publication since 2003, that has still not been fixed. I do not understand this.

This is elementary and perfectly clear.

Everyone makes mistakes; that is the nature of development. We make a ton of mistakes and do not get discouraged.

This error in R comes simply from carelessness and from relying on one basic function that broke the others. It needs to be fixed.

Have you filed a bug report with R on this issue?

We have run tests and examined everything in detail while writing the library. We constantly compared the results of MQL5, Wolfram Alpha and R, published our results, and are ready to answer for them publicly. Of course, we have attached three large scripts with unit tests and a benchmark to our math package, which is distributed in source code.

I'm sure @Quantum will file a bug report with R. The updated article was released just a couple of hours ago.


I'd like to see the code comparing the performance of R and MQL5.

The MQL5 benchmark code is located in \Scripts\UnitTests\Stat\TestStatBenchmark.mq5. The R code is at the end of the article "Statistical Distributions in MQL5 - take the best from R and make it faster", in the section "Appendix. Results of calculating time for statistical functions".

Please be sure to update to MetaTrader 5 build 1467 by connecting to the MetaQuotes-Demo server. The new library and all the test scripts are included in this beta version.
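The actual harness is TestStatBenchmark.mq5. To show what a per-call time in microseconds means, here is a generic Python sketch. The helper mean_call_time_us is hypothetical. statistics.NormalDist.pdf is only a stand-in workload, not one of the benchmarked MQL5 or R functions.

```python
import time
from statistics import NormalDist

def mean_call_time_us(func, arg, repeats: int = 100_000) -> float:
    """Average wall-clock time of one func(arg) call, in microseconds.

    Loop overhead is included, so very cheap functions are slightly overestimated.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        func(arg)
    elapsed_s = time.perf_counter() - start
    return elapsed_s / repeats * 1e6  # seconds per call -> microseconds per call

if __name__ == "__main__":
    pdf = NormalDist().pdf  # stand-in workload only
    print(f"{mean_call_time_us(pdf, 0.5):.3f} us per call")
```

Dividing an R timing measured this way by the matching MQL5 timing gives an R/MQL5 ratio of the kind reported in the article.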

 
Renat Fatkhullin:

This is elementary and perfectly clear.

Everyone makes mistakes; that is the nature of development. We make a ton of mistakes and do not get discouraged.

This error in R comes simply from carelessness and from trusting one basic function that broke the others. It needs to be fixed.

We have run tests and examined everything in detail while writing the library. We constantly compared the results of MQL5, Wolfram Alpha and R, published our results, and are ready to answer for them publicly. Of course, we have attached three large scripts with unit tests and a benchmark to our math package, which is distributed in source code.

I'm sure @Quantum will file a bug report with R. The updated article was released just a couple of hours ago.


The MQL5 benchmark code can be found in \Scripts\UnitTests\Stat\TestStatBenchmark.mq5. The R code is at the end of the article "Statistical Distributions in MQL5 - take the best from R and make it faster", in the section "Appendix. Results of Calculating Time for Statistical Functions".

Please be sure to update to MetaTrader 5 build 1467 by connecting to the MetaQuotes-Demo server. The new library and all the test scripts are included in this beta version.

So far I cannot form my own opinion on the speed comparison, and for me this is a matter of principle.

The thing is that R is an ideal development environment; in a word, it is an interpreter. But the code written during development is very different from production code. Production code has many times fewer lines and is at the same time very dense in meaning. That is why the comparison should be made on package functions that matter when making trading decisions. One example is randomForest, which relies on computationally intensive algorithms and matrix operations and loads all cores...

PS.

You are using an outdated version of R. You should install R version 3.3.1 (2016-06-21) from the MRAN (Microsoft R Open) website, and be sure to install MKL as well. For this R release, Microsoft claimed it had increased the execution speed of some packages and functions by up to 50 (!) times.

Microsoft R Open: The Enhanced R Distribution · MRAN (mran.revolutionanalytics.com)
 
SanSanych Fomenko:

So far I cannot form my own opinion about the performance comparison, and for me this is a matter of principle.

The thing is that R is an ideal development environment; in a word, it is an interpreter. But the code written during development is very different from production code. Production code has many times fewer lines and is at the same time very dense in meaning. Therefore the comparison should be made on package functions that matter when making trading decisions. One example is randomForest, which uses computationally intensive algorithms and matrix operations and loads all cores...

We are methodically porting R functions to MQL5, and we do it so that the function calls end up very similar in essence.

Here is an example of the correspondence from the article:


| # | Distribution | MQL5 functions | R functions |
|---|---|---|---|
| 1 | Normal | MathProbabilityDensityNormal, MathCumulativeDistributionNormal, MathQuantileNormal, MathRandomNormal | dnorm, pnorm, qnorm, rnorm |
| 2 | Beta | MathProbabilityDensityBeta, MathCumulativeDistributionBeta, MathQuantileBeta, MathRandomBeta | dbeta, pbeta, qbeta, rbeta |
| 3 | Binomial | MathProbabilityDensityBinomial, MathCumulativeDistributionBinomial, MathQuantileBinomial, MathRandomBinomial | dbinom, pbinom, qbinom, rbinom |
| 4 | Cauchy | MathProbabilityDensityCauchy, MathCumulativeDistributionCauchy, MathQuantileCauchy, MathRandomCauchy | dcauchy, pcauchy, qcauchy, rcauchy |
| 5 | Chi-square | MathProbabilityDensityChiSquare, MathCumulativeDistributionChiSquare, MathQuantileChiSquare, MathRandomChiSquare | dchisq, pchisq, qchisq, rchisq |
| 6 | Exponential | MathProbabilityDensityExponential, MathCumulativeDistributionExponential, MathQuantileExponential, MathRandomExponential | dexp, pexp, qexp, rexp |
| 7 | Fisher's F | MathProbabilityDensityF, MathCumulativeDistributionF, MathQuantileF, MathRandomF | df, pf, qf, rf |
| 8 | Gamma | MathProbabilityDensityGamma, MathCumulativeDistributionGamma, MathQuantileGamma, MathRandomGamma | dgamma, pgamma, qgamma, rgamma |
| 9 | Geometric | MathProbabilityDensityGeometric, MathCumulativeDistributionGeometric, MathQuantileGeometric, MathRandomGeometric | dgeom, pgeom, qgeom, rgeom |
| 10 | Hypergeometric | MathProbabilityDensityHypergeometric, MathCumulativeDistributionHypergeometric, MathQuantileHypergeometric, MathRandomHypergeometric | dhyper, phyper, qhyper, rhyper |
| 11 | Logistic | MathProbabilityDensityLogistic, MathCumulativeDistributionLogistic, MathQuantileLogistic, MathRandomLogistic | dlogis, plogis, qlogis, rlogis |
| 12 | Lognormal | MathProbabilityDensityLognormal, MathCumulativeDistributionLognormal, MathQuantileLognormal, MathRandomLognormal | dlnorm, plnorm, qlnorm, rlnorm |
| 13 | Negative binomial | MathProbabilityDensityNegativeBinomial, MathCumulativeDistributionNegativeBinomial, MathQuantileNegativeBinomial, MathRandomNegativeBinomial | dnbinom, pnbinom, qnbinom, rnbinom |
| 14 | Noncentral beta | MathProbabilityDensityNoncentralBeta, MathCumulativeDistributionNoncentralBeta, MathQuantileNoncentralBeta, MathRandomNoncentralBeta | dbeta, pbeta, qbeta, rbeta |
| 15 | Noncentral chi-square | MathProbabilityDensityNoncentralChiSquare, MathCumulativeDistributionNoncentralChiSquare, MathQuantileNoncentralChiSquare, MathRandomNoncentralChiSquare | dchisq, pchisq, qchisq, rchisq |
| 16 | Noncentral F | MathProbabilityDensityNoncentralF, MathCumulativeDistributionNoncentralF, MathQuantileNoncentralF, MathRandomNoncentralF | df, pf, qf, rf |
| 17 | Noncentral Student's t | MathProbabilityDensityNoncentralT, MathCumulativeDistributionNoncentralT, MathQuantileNoncentralT, MathRandomNoncentralT | dt, pt, qt, rt |
| 18 | Poisson | MathProbabilityDensityPoisson, MathCumulativeDistributionPoisson, MathQuantilePoisson, MathRandomPoisson | dpois, ppois, qpois, rpois |
| 19 | Student's t | MathProbabilityDensityT, MathCumulativeDistributionT, MathQuantileT, MathRandomT | dt, pt, qt, rt |
| 20 | Uniform | MathProbabilityDensityUniform, MathCumulativeDistributionUniform, MathQuantileUniform, MathRandomUniform | dunif, punif, qunif, runif |
| 21 | Weibull | MathProbabilityDensityWeibull, MathCumulativeDistributionWeibull, MathQuantileWeibull, MathRandomWeibull | dweibull, pweibull, qweibull, rweibull |

For the noncentral distributions (rows 14-17), R uses the same functions as for the central ones, with the ncp (noncentrality) argument.
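The R names in this table follow a fixed prefix scheme: d for density, p for the cumulative distribution, q for the quantile and r for random generation. The MathProbabilityDensity*, MathCumulativeDistribution*, MathQuantile* and MathRandom* names mirror that scheme. Here is a minimal sketch of the same four-way split for the normal distribution. It uses Python's standard statistics.NormalDist (Python 3.8+) rather than either the MQL5 or the R implementation. The wrapper names are hypothetical.

```python
from statistics import NormalDist

def dnorm(x, mean=0.0, sd=1.0):
    """Density (R: dnorm, MQL5: MathProbabilityDensityNormal)."""
    return NormalDist(mean, sd).pdf(x)

def pnorm(q, mean=0.0, sd=1.0):
    """Cumulative distribution (R: pnorm, MQL5: MathCumulativeDistributionNormal)."""
    return NormalDist(mean, sd).cdf(q)

def qnorm(p, mean=0.0, sd=1.0):
    """Quantile, i.e. the inverse of the CDF (R: qnorm, MQL5: MathQuantileNormal)."""
    return NormalDist(mean, sd).inv_cdf(p)

def rnorm(n, mean=0.0, sd=1.0, seed=None):
    """n pseudo-random draws (R: rnorm, MQL5: MathRandomNormal)."""
    return NormalDist(mean, sd).samples(n, seed=seed)

print(dnorm(0.0))              # about 0.3989, i.e. 1/sqrt(2*pi)
print(pnorm(0.0))              # 0.5
print(qnorm(0.975))            # about 1.96
print(len(rnorm(5, seed=42)))  # 5
```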

We are trying to make code ported from R to MQL5 almost identical, both in size and in the time it takes to write.

The first step is to release the graphics library in beta and to demonstrate code of the same size in R and MQL5, together with the resulting images.



You are using an outdated version of R. You should install R version 3.3.1 (2016-06-21) from the MRAN (Microsoft R Open) website, and be sure to install MKL as well. For this R release, Microsoft claimed it had increased the execution speed of some packages and functions by up to 50 (!) times.

I doubt that the regular version of R can suddenly become faster, since its code does not change much. Clearly some functions, especially matrix ones, can be sped up. And your statement confirms my view that R code is written rather sloppily in terms of performance.

If you read the article, you will see that we got speedups of up to 46 times even on basic functions, without any multithreading or MKL:

The calculations were done on an Intel Core i7-4790 CPU (3.6 GHz) with 16 GB RAM, running Windows 10 x64. Measured calculation times, in microseconds; each R/MQL5 column is the R time divided by the MQL5 time:


| # | Distribution | PDF MQL5 (µs) | PDF R (µs) | PDF R/MQL5 | CDF MQL5 (µs) | CDF R (µs) | CDF R/MQL5 | Quantile MQL5 (µs) | Quantile R (µs) | Quantile R/MQL5 | Random MQL5 (µs) | Random R (µs) | Random R/MQL5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Binomial | 4.39 | 11.663 | 2.657 | 13.65 | 25.316 | 1.855 | 50.18 | 66.845 | 1.332 | 318.73 | 1816.463 | 5.699 |
| 2 | Beta | 1.74 | 17.352 | 9.972 | 4.76 | 15.076 | 3.167 | 48.72 | 129.992 | 2.668 | 688.81 | 1723.45 | 2.502 |
| 3 | Gamma | 1.31 | 8.251 | 6.347 | 8.09 | 14.792 | 1.828 | 50.83 | 64.286 | 1.265 | 142.84 | 1281.707 | 8.973 |
| 4 | Cauchy | 0.45 | 1.423 | 3.162 | 1.33 | 15.078 | 11.34 | 1.37 | 2.845 | 2.077 | 224.19 | 588.517 | 2.625 |
| 5 | Exponential | 0.85 | 3.13 | 3.682 | 0.77 | 2.845 | 3.695 | 0.53 | 2.276 | 4.294 | 143.18 | 389.406 | 2.72 |
| 6 | Uniform | 0.42 | 2.561 | 6.098 | 0.45 | 1.423 | 3.162 | 0.18 | 2.846 | 15.81 | 40.3 | 247.467 | 6.141 |
| 7 | Geometric | 2.3 | 5.121 | 2.227 | 2.12 | 4.552 | 2.147 | 0.81 | 5.407 | 6.675 | 278 | 1078.045 | 3.879 |
| 8 | Hypergeometric | 1.85 | 11.095 | 5.997 | 0.9 | 8.819 | 9.799 | 0.75 | 9.957 | 13.28 | 302.55 | 880.356 | 2.91 |
| 9 | Logistic | 1.27 | 4.267 | 3.36 | 1.11 | 4.267 | 3.844 | 0.71 | 3.13 | 4.408 | 178.65 | 626.632 | 3.508 |
| 10 | Weibull | 2.99 | 5.69 | 1.903 | 2.74 | 4.268 | 1.558 | 2.64 | 6.828 | 2.586 | 536.37 | 1558.472 | 2.906 |
| 11 | Poisson | 2.91 | 5.974 | 2.053 | 6.26 | 8.534 | 1.363 | 3.43 | 13.085 | 3.815 | 153.59 | 303.219 | 1.974 |
| 12 | F | 3.86 | 10.241 | 2.653 | 9.94 | 22.472 | 2.261 | 65.47 | 135.396 | 2.068 | 1249.22 | 1801.955 | 1.442 |
| 13 | Chi Square | 2.47 | 5.974 | 2.419 | 7.71 | 13.37 | 1.734 | 44.11 | 61.725 | 1.399 | 210.24 | 1235.059 | 5.875 |
| 14 | Noncentral Chi Square | 8.05 | 14.223 | 1.767 | 45.61 | 209.068 | 4.584 | 220.66 | 10342.96 | 46.873 | 744.45 | 1997.653 | 2.683 |
| 15 | Noncentral F | 19.1 | 28.446 | 1.489 | 14.67 | 46.935 | 3.199 | 212.21 | 2561.991 | 12.073 | 1848.9 | 2912.141 | 1.575 |
| 16 | Noncentral Beta | 16.3 | 26.739 | 1.64 | 10.48 | 43.237 | 4.126 | 153.66 | 2290.915 | 14.909 | 2686.82 | 2839.893 | 1.057 |
| 17 | Negative Binomial | 6.13 | 11.094 | 1.81 | 12.21 | 19.627 | 1.607 | 14.05 | 60.019 | 4.272 | 1130.39 | 1936.498 | 1.713 |
| 18 | Normal | 1.15 | 4.267 | 3.71 | 0.81 | 3.983 | 4.917 | 0.7 | 2.277 | 3.253 | 293.7 | 696.321 | 2.371 |
| 19 | Lognormal | 1.99 | 5.406 | 2.717 | 3.19 | 8.819 | 2.765 | 3.18 | 6.259 | 1.968 | 479.75 | 1269.761 | 2.647 |
| 20 | T | 2.32 | 11.663 | 5.027 | 8.01 | 19.059 | 2.379 | 50.23 | 58.596 | 1.167 | 951.58 | 1425.92 | 1.498 |
| 21 | Noncentral T | 38.47 | 86.757 | 2.255 | 27.75 | 39.823 | 1.435 | 1339.51 | 1930.524 | 1.441 | 1550.27 | 1699.84 | 1.096 |
| | Average R/MQL5 | | | 3.474 | | | 3.465 | | | 7.03 | | | 3.13 |
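Here is a quick check on how the table is built. Each R/MQL5 column is the R time divided by the MQL5 time. The Average row matches the arithmetic mean of each ratio column over the 21 distributions. The Python sketch below reproduces two ratios and the PDF average from the numbers above. The helper name speedup is hypothetical. A few published ratios may not reproduce exactly from the rounded times shown.

```python
from statistics import mean

def speedup(mql5_us: float, r_us: float) -> float:
    """R/MQL5 ratio: how many times longer the R call took than the MQL5 call."""
    return r_us / mql5_us

# "PDF R/MQL5" column, rows 1-21, copied from the table above
PDF_RATIOS = [2.657, 9.972, 6.347, 3.162, 3.682, 6.098, 2.227, 5.997, 3.36, 1.903,
              2.053, 2.653, 2.419, 1.767, 1.489, 1.64, 1.81, 3.71, 2.717, 5.027, 2.255]

print(round(speedup(4.39, 11.663), 3))      # Binomial PDF: 2.657
print(round(speedup(220.66, 10342.96), 3))  # Noncentral chi-square quantile: 46.873
print(round(mean(PDF_RATIOS), 3))           # 3.474, the Average PDF entry
```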



But of course we will test the version you mention, for both speed and performance.

 
SanSanych Fomenko:

You are wrong about the "wrong answer".

...

For example, the MQL documentation gives an arcsine example and states that arcsine(2) = infinity. That is not accurate. To be exact: arcsin(2) = NaN, i.e. not a numerical value; arcsin(1) = Inf; whereas missing quotes during trading are NA, i.e. values that should exist (or might, e.g. on weekends) but are absent.

I wrote that about the wrong answers with a bit of irony; I should have added a smiley... In fact, I added at the end that it is not a bug in either case. Outside a function's domain, the behavior of compilers and interpreters depends entirely on the system architecture and the developers. Returning NaN in that case is better, of course.
I mean: if you call a function with arguments for which it is not defined and then compare the results with another library, you can find hundreds of "mistakes" that way.

By the way, the arcsine example is interesting.
In MQL:
MathArcsin(1) = MathArcsin(2) = -nan(ind)

Wolfram:
Arcsin(1) = Pi/2
Arcsin(2) = a complex number; there is no real-valued result.

R:
asin(1) = Pi/2
asin(2) = NaN (the answer over the real numbers)
asin(2+0i) = a complex number, as in Wolfram

Wikipedia says that asin(1) is defined (https://en.wikipedia.org/wiki/Inverse_trigonometric_functions), so you can file a bug report with the Service Desk.
asin(2), however, is undefined over the reals, so everything there is fine and consistent across all three.
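Python behaves differently from all three. math.asin raises ValueError outside [-1, 1] instead of returning NaN. cmath.asin returns the complex value, as R does for asin(2+0i). Here is a minimal sketch of an R-style wrapper that returns NaN outside the real domain. asin_real is a hypothetical helper, not part of any library.

```python
import cmath
import math

def asin_real(x: float) -> float:
    """Real-valued arcsine returning NaN outside [-1, 1], as R's asin() does,
    instead of raising ValueError the way math.asin() does."""
    if -1.0 <= x <= 1.0:
        return math.asin(x)
    return math.nan

print(asin_real(1.0))   # pi/2: asin(1) is defined
print(asin_real(2.0))   # nan: no real-valued result
print(cmath.asin(2))    # complex result, analogous to R's asin(2+0i)
```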

Back to the previous post: division by 0 is impossible in ordinary arithmetic, so it is logical that an MQL script stops with an error, and there is no bug here. But the library shows meticulous attention to accuracy up to 16 decimal places, and yet, for some reason, division by zero cannot return NaN or Inf. That is very strange. IMHO it should return Inf instead of tormenting developers with sudden script crashes.

 
Renat, was this port of several functions from R to MQL really the surprise you were talking about?
 

To disable the division-by-zero check for real numbers, set the parameter FpNoZeroCheckOnDivision=1 in the [Experts] section of the metaeditor.ini file.

If this parameter is present, the following code will print inf:

void OnStart()
  {
   double x=0;    // a variable, not a constant expression
   Print(1/x);    // prints inf when FpNoZeroCheckOnDivision=1 is set
  }

Of course, this parameter will not save you from a compilation error when dividing by the constant 0.0:
Print(1/0.0);

'0.0' - division by zero in the constant expression    s1.mq5    8    12
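For comparison, Python's float division raises ZeroDivisionError, which is essentially the "sudden crash" behavior complained about above. Here is a minimal sketch of a division helper that returns the IEEE 754 results instead. A nonzero value divided by zero gives infinity, with the usual sign rule, and 0/0 gives nan. This matches what the thread reports for MQL5 with FpNoZeroCheckOnDivision=1. ieee_div is a hypothetical helper.

```python
import math

def ieee_div(a: float, b: float) -> float:
    """Divide with IEEE 754 results instead of raising ZeroDivisionError."""
    if b != 0.0:
        return a / b            # ordinary case (a NaN divisor also lands here)
    if a == 0.0 or math.isnan(a):
        return math.nan         # 0/0 is undefined -> NaN
    # nonzero / (+0 or -0) -> infinity whose sign is sign(a) * sign(b)
    return math.copysign(math.inf, a) * math.copysign(1.0, b)

print(ieee_div(1.0, 0.0))    # inf
print(ieee_div(-1.0, 0.0))   # -inf
print(ieee_div(1.0, -0.0))   # -inf
print(ieee_div(0.0, 0.0))    # nan
```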
 
mytarmailS:
Renat, was this port of some functions from R to MQL really the surprise you were talking about?

No.

Surprise doesn't make sense, we will do everything within MQL5 and MetaTrader 5.

 
Renat Fatkhullin:

If this parameter is present, the following code will produce inf

Thank you, that is a very sensible setting. And dividing zero by zero gives nan rather than inf, which is even more correct; I did not even expect such precision!