Machine learning in trading: theory, models, practice and algo-trading - page 200

 
Renat Fatkhullin:

Let's go back to the original statement about R's errors in the article.

Our opinion still stands: there are errors and they were caused by carelessness in implementation.

Your specialist does not want to answer me on the merits. He just pointed me to Wolfram and that's all.

I will repeat our point. There is no error in the gamma distribution. This is a matter of convention in the mathematical community. What is incorrect is the wording of the error claim on your part.

Returning zero for the density in your version is, first, just one of the possible options (conventions). Second, the choice of this value (NaN, 0 or 1) has no effect on the substance of applied analysis. Distribution functions are used to estimate statistics. At this point the distribution function is zero, and that is sufficient.

Regarding the t distribution: I reproduced this error on the latest version of R, but did not dig into the essence of the algorithm. I admit that the correction really was needed.

I'll try to ask a question to the R support team.
 

Then dig into it yourself, do all the calculations, spend a few weeks rechecking everything, as he did. Instead, you haven't even read our article.

For our part, we did the job. And we did a good job. If Wolfram isn't an authority for you either, then you should not have started this way of communicating.

Do not confuse the creators with users, please.

 
Renat Fatkhullin:

Then dig into it yourself, do all the calculations, spend a few weeks rechecking everything, as he did. Instead, you haven't even read our article.

For our part, we did the work. And we did a good job. If Wolfram isn't an authority for you either, then you should not have started this way of communicating.

Do not confuse the creators with the users, please.

I respect your opinion. But why are you answering for him?

To repeat.

Returning zero for the density in your version is, first, just one of the possible options (conventions). Second, the choice of this value (NaN, 0 or 1) has no effect on the substance of applied analysis. Distribution functions are used to estimate statistics. At this point the distribution function is zero, and that is enough.

I will add that, on the whole, the speed-up of calculations is a plus. That is, objectively, judging by your article, the implementation of the functions is no worse, or is even better.
 
I hope the emotional part is over...

I looked into 0^0. This is exactly the point of contention: it appears in the probability density formula for the gamma distribution when x = 0.


This is the answer that came up in this discussion:
0^0 and 0! are equal to 1 because of the empty product rule. The IEEE standard says that 0^0 should evaluate to 1, so most computer software will evaluate it that way. There are no situations where evaluating to 0 is helpful.

There are many formulas that assume 0^0=1, e.g. notation for polynomials and power series, cardinal number arithmetic, the binomial theorem, etc., these all assume 0^0=1. For example, substituting x=0 into x^0+x^1+x^2+... = 1/(1-x) shows 0^0 = 1.

That is, there is a standard that evaluates this expression to 1, and there is neither benefit nor harm in evaluating it to 0.

I think R has 1 hardcoded... and you have 0.

That's the point and there's no point in arguing about it. A computer can't calculate the value algebraically.
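The 0^0 convention quoted above is easy to check in a general-purpose language. A minimal Python sketch (Python is an assumption here, it is not used anywhere in this thread, and geometric_partial_sum is a hypothetical helper added only for illustration):

```python
import math

# Both the ** operator and math.pow follow the convention 0^0 = 1.
print(0.0 ** 0.0)          # 1.0
print(math.pow(0.0, 0.0))  # 1.0


def geometric_partial_sum(x, terms):
    """Return x^0 + x^1 + ... + x^(terms - 1)."""
    return sum(x ** k for k in range(terms))


# At x = 0 the series x^0 + x^1 + ... must equal 1 / (1 - x) = 1,
# which only works if the first term 0^0 is taken as 1.
print(geometric_partial_sum(0.0, 10), 1.0 / (1.0 - 0.0))
```

This only shows what the floating-point power function returns; it says nothing about which value a density function should return at the boundary of its domain.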
 
Alexey Burnakov:

Returning zero for the density in your version is, first, just one of the possible options (conventions).

We can assume that defining a function means specifying a way to calculate it.

Given expression
Alexey Burnakov:

Moreover, according to:

https://en.wikipedia.org/wiki/Gamma_distribution

when x = 0, alpha = 1, beta = 1, you get an indeterminate value in the numerator, which makes the whole fraction indeterminate.

We state that, strictly speaking, the gamma distribution density at the point zero is undefined. And when the right-hand limit is taken, the density equals one.

In light of this, we believe that the wording "calculation errors in R" is not correct. More precisely, it is a matter of convention: what the expression zero to the power of zero is taken to equal. Setting the gamma distribution density to zero at the point zero does not appear to be an established convention.

does not fit the definition of a function, because it is indeterminate at x=0, as you pointed out.

If you put the condition x>0 instead of x>=0, there is no indeterminacy and you can calculate values using this formula.

As for practice: Mathematica and Matlab can be considered industry standards for engineering calculations, and they return 0, i.e. the expression is valid only for x>0.
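The disagreement can be made concrete with a small sketch. It assumes the shape/rate form of the density from the Wikipedia page linked above, and the at_zero parameter is hypothetical: it exists only to make the choice of convention at x = 0 explicit and does not correspond to any argument in R, Mathematica, Matlab or MQL5.

```python
import math


def gamma_pdf(x, alpha, beta, at_zero=math.nan):
    """Gamma density with shape alpha and rate beta.

    at_zero is a hypothetical parameter: the value returned at x == 0,
    where the formula contains the term 0 ** (alpha - 1).
    """
    if x < 0:
        return 0.0
    if x == 0:
        return at_zero
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)


# For alpha = 1, beta = 1 the density approaches 1 as x -> 0 from the right.
for x in (1e-3, 1e-6, 1e-9):
    print(x, gamma_pdf(x, 1.0, 1.0))

# The two conventions argued about in this thread, plus "undefined".
print(gamma_pdf(0.0, 1.0, 1.0, at_zero=0.0))  # formula treated as valid only for x > 0
print(gamma_pdf(0.0, 1.0, 1.0, at_zero=1.0))  # right-hand limit
print(gamma_pdf(0.0, 1.0, 1.0))               # nan: left undefined
```

For x > 0 all three conventions give identical values, so the choice only matters at the single boundary point.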

 
Quantum:

We can assume that defining a function means specifying a way to calculate it.

The given expression

does not fit the definition of a function, because it is indeterminate at x=0, as you pointed out.

If you put the condition x>0 instead of x>=0, there is no indeterminacy and you can calculate values using this formula.

As for practice: Mathematica and Matlab can be considered industry standards for engineering calculations, and they return 0, i.e. the expression is valid only for x>0.

Correct. The function is defined on the positive region (0,inf).

Why then do you still return 0 at the point 0? And why is 1 considered an error?


By the way, here is something interesting. Wolfram defines the gamma distribution density on the interval [0,inf), and so it gives a definite value at zero... which is odd.


 
Quantum:

Mathematica and Matlab can be considered the industry standard for engineering calculations

You took a situation for which there is no solution, looked up the result that Wolfram returns, and called all other results wrong. This is not an engineering calculation, but a dogma.

You could just as well take R as the benchmark and write about how you found an error in Wolfram. I guess that if you take all the math software and split it into two groups by what it returns in this situation, the split will be about 50/50, and then you could add a dozen more packages that MQL outperforms to the article about MQL.

Thank you for pointing out the errors in AS 243. But you should not extend the blame to the behavior of other functions at parameter values for which there is no unambiguous solution.
That is how the article should have described the advantages of MQL: in R there is a function that has an error of 15 decimal places, and in MQL we have another function that is more accurate. Everything would then be civilized and scientific, not dogmatic as it is now.

 
Alexey Burnakov:

Why then do you still return 0 at the point 0? And why is 1 considered an error?

Let's consider an example with the parameters a=0.5, b=1 at the point x=0

> dgamma(0,0.5,1,log=FALSE)
[1] Inf
> pgamma(0,0.5,1,log=FALSE)
[1] 0

If the point x=0 is not excluded, the density diverges, while the probability is fine.

And then there are no problems either:

> pgamma(0.00001,0.5,1,log=FALSE)
[1] 0.003568236

It turns out that when calculating the CDF, R excludes the point x=0: the infinity has disappeared somewhere.
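These numbers can be cross-checked outside R. The sketch below assumes shape 0.5 and rate 1, as in the example above, and relies on the identity that for this particular shape the gamma CDF equals erf(sqrt(x)); the function names are hypothetical.

```python
import math


def gamma_half_pdf(x):
    """Density for shape 0.5, rate 1: x**(-0.5) * exp(-x) / Gamma(0.5), valid for x > 0."""
    return x ** -0.5 * math.exp(-x) / math.gamma(0.5)


def gamma_half_cdf(x):
    """CDF for shape 0.5, rate 1, which equals erf(sqrt(x)) for x >= 0."""
    return math.erf(math.sqrt(x))


# The density grows without bound as x -> 0+, while the CDF goes smoothly to 0.
for x in (1e-2, 1e-5, 1e-10):
    print(x, gamma_half_pdf(x), gamma_half_cdf(x))

print(gamma_half_cdf(0.0))
```

The singularity of the density at zero is integrable, so a divergent density and a CDF of exactly 0 at x = 0 are consistent with each other.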

 
Quantum

Dear colleague!

For several pages there has been a dispute about the differences between your algorithms and R's at the boundaries of the functions' domains. Boundary points are boundary points, and in practice the differences could be neglected.

But in this case I have a much more substantial question:

Where is the documentation for all your functions?

Previously, I thought that we take your function, then take the R documentation, since your functions are analogues, and study those parts of the R documentation that either describe the algorithms or follow the links that R provides. R has very high-quality documentation and references.

In the course of the argument, I found out that your functions are different from R - they are some other functions whose algorithms rely on other sources. There is nothing in the article itself about this, no documentation. And we learn about it from Renat in a completely different context.

In practice, it follows unambiguously that the porting of the code from R to MQL5 is impossible.

And here's why.

It is clear to me that if it says "analogue of R" and I am not given documentation for the analogue, then it is a 100% analogue and I can move code from the interpreter to the compiler without worrying. If that is not so, then this is enough to put an end to the idea of porting code from R to MQL5. No one wants to hit a complete dead end where, after porting code that works in R, you get MQL code that does not work because of subtleties in the implementation of the algorithms.

 
SanSanych Fomenko:

Where is the documentation for all your functions?

Previously, I thought that we take your function, then take the R documentation, since your functions are analogues, and study those parts of the R documentation that either describe the algorithms or follow the links that R provides. R has very high-quality documentation and references.

In the course of the argument, I found out that your functions are different from R - they are some other functions whose algorithms rely on other sources. There is nothing in the article itself about this, no documentation. And we learn about it from Renat in a completely different context.

In practice, it follows unambiguously that the porting of the code from R to MQL5 is impossible.

And here's why.

It is clear to me that if it says "analogue of R" and I am not given documentation for the analogue, then it is a 100% analogue and I can move code from the interpreter to the compiler without worrying. If that is not so, then this is enough to put an end to the idea of porting code from R to MQL5. No one wants to hit a complete dead end where, after porting code that works in R, you get MQL code that does not work because of subtleties in the implementation of the algorithms.

At the moment, the description of the functions can be found in the article https://www.mql5.com/ru/articles/2742.

Consider calculating a normal distribution with parameters mu=2, sigma=1 as an example:

n <- 10
k <- seq(0,1,by=1/n)
mu=2
sigma=1
normal_pdf<-dnorm(k, mu, sigma, log = FALSE)
normal_cdf<-pnorm(k, mu, sigma, lower.tail=TRUE,log.p = FALSE)
normal_quantile <- qnorm(normal_cdf, mu,sigma, lower.tail=TRUE,log.p = FALSE)
normal_pdf
normal_cdf
normal_quantile


1) The analogue of dnorm():

The function calculates the values of the normal distribution probability density function with parameters mu and sigma for an array of random variable values x[]. In case of an error it returns false. The analogue of dnorm() in R.

bool MathProbabilityDensityNormal(
  const double   &x[],        // [in]  Array with the values of the random variable
  const double   mu,          // [in]  Distribution parameter mean (expected value)
  const double   sigma,       // [in]  Distribution parameter sigma (standard deviation)
  const bool     log_mode,    // [in]  Log flag: if log_mode=true, the natural logarithm of the probability density is calculated
  double         &result[]    // [out] Array for the values of the probability density function
);

2) The analogue of pnorm():

The function calculates the values of the normal cumulative distribution function with parameters mu and sigma for an array of random variable values x[]. In case of an error it returns false. The analogue of pnorm() in R.

bool MathCumulativeDistributionNormal(
  const double   &x[],        // [in]  Array with the values of the random variable
  const double   mu,          // [in]  Expected value
  const double   sigma,       // [in]  Standard deviation
  const bool     tail,        // [in]  Calculation flag: if tail=true, the probability that the random variable does not exceed x is calculated
  const bool     log_mode,    // [in]  Log flag: if log_mode=true, the natural logarithm of the probability is calculated
  double         &result[]    // [out] Array for the values of the probability function
);

3) The analogue of qnorm():

For an array of probability values probability[], the function calculates the values of the inverse normal distribution function with parameters mu and sigma. In case of an error it returns false. The analogue of qnorm() in R.

bool MathQuantileNormal(
  const double   &probability[],// [in]  Array with the probability values of the random variable
  const double   mu,            // [in]  Expected value
  const double   sigma,         // [in]  Standard deviation
  const bool     tail,          // [in]  Calculation flag: if tail=false, the calculation is performed for the probability 1.0-probability
  const bool     log_mode,      // [in]  Calculation flag: if log_mode=true, the calculation is performed for the probability Exp(probability)
  double         &result[]      // [out] Array with the quantile values
);

Example of their use:

#include <Math\Stat\Normal.mqh>
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
  {
//--- arrays for calculated values
   double x_values[];
   double normal_pdf[];
   double normal_cdf[];
   double normal_quantile[];
//--- prepare x values
   const int N=11;
   ArrayResize(x_values,N);
   for(int i=0;i<N;i++)
      x_values[i]=i*1.0/(N-1);
//--- set distribution parameters
   double mu=2.0;
   double sigma=1.0;
//--- calculate pdf, cdf and quantiles
   MathProbabilityDensityNormal(x_values,mu,sigma,false,normal_pdf);
   MathCumulativeDistributionNormal(x_values,mu,sigma,true,false,normal_cdf);
   MathQuantileNormal(normal_cdf,mu,sigma,true,false,normal_quantile);
//--- show calculated values
   for(int i=0;i<N;i++)
      PrintFormat("1 %d, x=%.20e PDF=%.20e, CDF=%.20e, Q=%.20e,",i,x_values[i],normal_pdf[i],normal_cdf[i],normal_quantile[i]);
  }

Result:

2016.11.11 11:56:46.413 Test (EURUSD,H1) 1 0, x=0.00000000000000000000e+00 PDF=5.39909665131880628364e-02, CDF=2.27501319481792120547e-02, Q=0.00000000000000000000e+00,
2016.11.11 11:56:46.413 Test (EURUSD,H1) 1 1, x=1.00000000000000005551e-01 PDF=6.56158147746765951780e-02, CDF=2.87165598160018034624e-02, Q=1.00000000000000088818e-01,
2016.11.11 11:56:46.413 Test (EURUSD,H1) 1 2, x=2.00000000000000011102e-01 PDF=7.89501583008941493214e-02, CDF=3.59303191129258098213e-02, Q=2.00000000000000177636e-01,
2016.11.11 11:56:46.413 Test (EURUSD,H1) 1 3, x=2.99999999999999988898e-01 PDF=9.40490773768869470217e-02, CDF=4.45654627585430410108e-02, Q=3.00000000000000266454e-01,
2016.11.11 11:56:46.413 Test (EURUSD,H1) 1 4, x=4.00000000000000022204e-01 PDF=1.10920834679455543315e-01, CDF=5.47992916995579740225e-02, Q=3.99999999999999911182e-01,
2016.11.11 11:56:46.413 Test (EURUSD,H1) 1 5, x=5.00000000000000000000e-01 PDF=1.29517595665891743772e-01, CDF=6.68072012688580713080e-02, Q=5.00000000000000222045e-01,
2016.11.11 11:56:46.413 Test (EURUSD,H1) 1 6, x=5.99999999999999977796e-01 PDF=1.49727465635744877437e-01, CDF=8.07566592337710387195e-02, Q=6.00000000000000310862e-01,
2016.11.11 11:56:46.413 Test (EURUSD,H1) 1 7, x=6.99999999999999955591e-01 PDF=1.71368592047807355438e-01, CDF=9.68004845856103440793e-02, Q=7.00000000000000177636e-01,
2016.11.11 11:56:46.413 Test (EURUSD,H1) 1 8, x=8.00000000000000044409e-01 PDF=1.94186054983212952330e-01, CDF=1.15069670221708289515e-01, Q=8.00000000000000044409e-01,
2016.11.11 11:56:46.413 Test (EURUSD,H1) 1 9, x=9.00000000000000022204e-01 PDF=2.17852177032550525793e-01, CDF=1.35666060946382671659e-01, Q=9.00000000000000133227e-01,
2016.11.11 11:56:46.413 Test (EURUSD,H1) 1 10, x=1.00000000000000000000e+00 PDF=2.41970724519143365328e-01, CDF=1.58655253931457046468e-01, Q=1.00000000000000000000e+00,

Calculation result in R:

> n <- 10
> k <- seq(0,1,by=1/n)
> mu=2
> sigma=1
> normal_pdf<-dnorm(k, mu, sigma, log = FALSE)
> normal_cdf<-pnorm(k, mu, sigma, lower.tail=TRUE,log.p = FALSE)
> normal_quantile <- qnorm(normal_cdf, mu,sigma, lower.tail=TRUE,log.p = FALSE)
> normal_pdf
 [1] 0.05399097 0.06561581 0.07895016 0.09404908 0.11092083 0.12951760 0.14972747 0.17136859
 [9] 0.19418605 0.21785218 0.24197072
> normal_cdf
 [1] 0.02275013 0.02871656 0.03593032 0.04456546 0.05479929 0.06680720 0.08075666 0.09680048
 [9] 0.11506967 0.13566606 0.15865525
> normal_quantile
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
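For an independent third opinion on the same PDF, CDF and quantile round trip, here is a minimal sketch using Python's standard library. Python and its statistics.NormalDist class are an assumption of this example and are not part of either the R or the MQL5 code above; the variable names mirror the R script.

```python
from statistics import NormalDist

# Same setup as the R and MQL5 examples: mu = 2, sigma = 1, x = 0.0, 0.1, ..., 1.0.
dist = NormalDist(mu=2.0, sigma=1.0)
n = 10
xs = [i / n for i in range(n + 1)]

pdf = [dist.pdf(x) for x in xs]            # counterpart of dnorm()
cdf = [dist.cdf(x) for x in xs]            # counterpart of pnorm()
quantile = [dist.inv_cdf(p) for p in cdf]  # counterpart of qnorm()

for x, d, p, q in zip(xs, pdf, cdf, quantile):
    print(f"x={x:.1f} PDF={d:.8f} CDF={p:.8f} Q={q:.15f}")
```

The values should agree with the R output to the printed eight digits; the last few digits of a 20-digit printout may legitimately differ between implementations.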
Statistical distributions in MQL5: taking the best of R and making it faster
  • 2016.10.06
  • MetaQuotes Software Corp.
  • www.mql5.com
The article covers functions for working with the main statistical distributions implemented in the R language: the Cauchy, Weibull, normal, lognormal, logistic, exponential, uniform and gamma distributions, the central and noncentral Beta, chi-squared, Fisher's F and Student's t distributions, as well as the discrete binomial and negative binomial, geometric, hypergeometric and Poisson distributions. There are also functions for calculating the theoretical moments of the distributions, which make it possible to assess how well a real distribution matches the model one.
Files:
Test.mq5  2 kb