Machine learning in trading: theory, models, practice and algo-trading - page 210

 

Alexey Burnakov:
I started getting answers to my question in R.

Continued correspondence with the same person...


Re: [Rd] dgamma density values in extreme point
Duncan Murdoch <murdoch.duncan@gmail.com>
today at 12:59
To: burnakov@yandex.ru
Alexey Burnakov, 14 Nov at 1:54 AM
 Hi Duncan,

 "As to the "correctness", we all know that the value of a density at any
 particular point is irrelevant. Only the integrals of densities have
 any meaning. "

 Thank you for the clarification. Yes, I agree that what matters practically
 is the cumulative distribution. One more point.

 There is an opinion sometimes expressed that, since the integral from the
 left at the zero point of the support is 0, the density at this particular
 point cannot be anything but zero. Do you think that is sound?

No. The value of a density at any particular point is irrelevant.

Duncan Murdoch

About the respondent: http://www.stats.uwo.ca/faculty/murdoch/other.shtml

In particular:

I am a member of the R core development group; see www.r-project.org for details about the R project. I maintain the Windows version of R and have a web page of tips for people writing DLLs for R.

 
SanSanych Fomenko:

Well, you know better.

Yes, you do, and more honestly.

And, as an outside forum member, I don't have the political means to make a scene.


You didn't even hesitate to dismiss Matlab, Wolfram, and Mathematica with an "I don't know who that is".

Give me a link to rankings of statistical packages that include Matlab (Wolfram). Matlab used to be there, but it has died. I have cited such rankings in my blog on your site and have posted them on the forum many times.

You are silent about the fact that:

  • you compare free R with paid packages
  • you ignore the historical (5-10 years ago) positions of the mathematical packages, declaring on that basis that "they are nobody"
  • you substitute popularity for the accuracy of calculations that is actually being discussed

Matlab hasn't gone anywhere, and neither have the other packages. Yes, it is paid, but it is quality software. Yes, its popularity will decline, but the accuracy and the quality won't go anywhere.

Read about Wolfram Alpha and explore its website - it is a fundamental investment in analytics that few can afford. And Wolfram Alpha works to 30 digits of accuracy, which shows their obsessive attention to the quality of calculations.

Wolfram|Alpha: Computational Knowledge Engine - www.wolframalpha.com
 
Alexey Burnakov:

Re: [Rd] dgamma density values in extreme point
Duncan Murdoch <murdoch.duncan@gmail.com>, today at 12:59

Unfortunately, you phrased the question incompletely and received an ill-considered, brief, polite "it doesn't matter" answer.

You wanted a "that is the agreed convention" answer and built it into the question itself. But Duncan confined himself to "what is right" the first time and repeated it the second time.

You got neither a proof of accuracy in R nor an answer as to why the result differs in other packages. The question "why is the answer different in other packages?" is the more important one and could open up the subject.


Our position:

The expression for the dgamma density

f(x) = (1 / (s^a * Γ(a))) * x^(a-1) * e^(-x/s),   for x ≥ 0, a > 0 and s > 0,

is undefined at the point x = 0.

R considers that this point can be included in the calculation by taking the limiting values, even when they are infinite, as in the case of dgamma(0, 0.5, 1).

However, if we calculate probabilities allowing infinity at the zero point, then all integrals of dgamma formally become infinite, and by this logic pgamma should equal infinity for every value of x.

However, this contradicts the results of pgamma, where all the values turn out to be finite. They are correct, as if the density at the point x = 0 were assumed to be 0.
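
To make the "limiting values" behaviour concrete, here is a minimal sketch of what base R returns for the Gamma density at x = 0 for shape parameters below, equal to and above 1 (the expected results are stated in the comments):

# Limiting values of the Gamma density at x = 0 in base R (rate fixed at 1):
# shape < 1 gives +Inf, shape = 1 gives the rate, shape > 1 gives 0.
dgamma(0, shape = 0.5, rate = 1)   # Inf - the case discussed above
dgamma(0, shape = 1.0, rate = 1)   # 1
dgamma(0, shape = 2.0, rate = 1)   # 0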
 

For @SanSanych Fomenko - MQL4/MQL5 has moved up to 41st place in the global TIOBE programming language ranking: http://www.tiobe.com/tiobe-index/

R is in 19th place, Matlab in 15th place.

This is to dispel your statements about it "not being an authority" and "not being the world's top".

 

In order not to lose the thread of the discussion, here is the error in calculating quantiles of the noncentral t-distribution that was found in the process of testing R.

For example:

> n <- 10
> k <- seq(0,1,by=1/n)
> nt_pdf<-dt(k, 10,8, log = FALSE)
> nt_cdf<-pt(k, 10,8, log = FALSE)
> nt_quantile<-qt(nt_cdf, 10,8, log = FALSE)
> nt_pdf
 [1] 4.927733e-15 1.130226e-14 2.641608e-14 6.281015e-14 1.516342e-13 3.708688e-13 9.166299e-13
 [8] 2.283319e-12 5.716198e-12 1.433893e-11 3.593699e-11
> nt_cdf
 [1] 6.220961e-16 1.388760e-15 3.166372e-15 7.362630e-15 1.742915e-14 4.191776e-14 1.021850e-13
 [8] 2.518433e-13 6.257956e-13 1.563360e-12 3.914610e-12
> k
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
> nt_quantile
 [1]           -Inf -1.340781e+154 -1.340781e+154 -1.340781e+154 -1.340781e+154 -1.340781e+154
 [7] -1.340781e+154   7.000000e-01   8.000000e-01   9.000000e-01   1.000000e+00

In R, the probability function of the noncentral Student's t-distribution is calculated with algorithm AS 243 proposed by Lenth [6]. The advantage of this method is the fast recurrent calculation of the terms of an infinite series with incomplete beta functions. However, the article [7] showed that this algorithm leads to errors because of how the accuracy is estimated when summing the terms of the series (Table 2 in [7]), especially for large values of the noncentrality parameter delta. The authors of [7] proposed a corrected algorithm for the recurrent calculation of the noncentral t-distribution probability.
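
To show what this series actually looks like, here is a minimal sketch that sums Lenth's incomplete-beta series directly over a fixed number of terms. It is neither the AS 243 code nor the corrected algorithm from [7]; the function name nct_cdf_naive and the term count are illustrative, and it assumes ncp != 0:

# Naive direct summation of the Lenth (1989) series for the noncentral t CDF.
# Illustration only: a fixed number of terms instead of the AS 243 stopping rule.
nct_cdf_naive <- function(t, df, ncp, terms = 1000) {
  if (t < 0) return(1 - nct_cdf_naive(-t, df, -ncp, terms))  # reflection for t < 0
  y <- t^2 / (t^2 + df)                     # argument of the incomplete beta functions
  j <- 0:(terms - 1)
  log_w <- -ncp^2 / 2 + j * log(ncp^2 / 2)  # log of exp(-d^2/2) * (d^2/2)^j, assumes ncp != 0
  p_j <- exp(log_w - lgamma(j + 1))         # Poisson(d^2/2) weights
  q_j <- ncp * exp(log_w - lgamma(j + 1.5)) / sqrt(2)
  s <- sum(p_j * pbeta(y, j + 0.5, df / 2) + q_j * pbeta(y, j + 1, df / 2))
  pnorm(-ncp) + s / 2
}

nct_cdf_naive(0.5, 10, 8)   # compare with pt(0.5, 10, 8) from the listing above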

Our MQL5 statistical library uses the corrected algorithm for calculating these probabilities from the article [7], which gives accurate results.


In addition, the way R defines the densities of the Gamma, ChiSquare and noncentral ChiSquare distributions at the point x = 0 leads to infinite values:

> dgamma(0,0.5,1)
[1] Inf
> dchisq(0,df=0.5,ncp=1)
[1] Inf
> dchisq(0,df=0.5,ncp=0)
[1] Inf

Thus, it turns out that in R the point x = 0 is included in the domain of definition of the density expressions, and the limiting values are taken as the answer.

In this case the limiting value at x = 0 is infinity. With this approach, because of the divergence at the point x = 0, integrating from 0 to x > 0 should make the probabilities infinite.

However, the results of the probability calculation (e.g. for x = 0.1) are finite:

> pgamma(0.1,0.5,1)
[1] 0.3452792
> pchisq(0.1,df=0.5,ncp=0)
[1] 0.5165553
> pchisq(0.1,df=0.5,ncp=1)
[1] 0.3194965

Despite the fact that the density at the point x = 0 is considered infinite, the probability results in R are not infinite, and they coincide with the Wolfram Alpha values (Gamma, ChiSquare, NoncentralChiSquare).
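
As an independent cross-check of the first of those values: for shape = 0.5 and rate = 1 the Gamma CDF reduces analytically to erf(sqrt(x)). Base R has no erf(), so the sketch below (the helper name erf is ours) expresses it through pnorm():

# Closed-form check: for Gamma(shape = 0.5, rate = 1), P(X <= x) = erf(sqrt(x)).
erf <- function(z) 2 * pnorm(z * sqrt(2)) - 1   # error function via the normal CDF
erf(sqrt(0.1))                                  # ~0.3452792, cf. pgamma(0.1, 0.5, 1)
erf(sqrt(0.1)) - pgamma(0.1, 0.5, 1)            # expected to be ~0 up to rounding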

To avoid problems with integrating functions that go to infinity at x = 0, Wolfram Alpha (Mathematica) and Matlab define the density at the point x = 0 to be 0:


Fig. 3. Definition of the Gamma distribution probability density function in Wolfram Alpha

Fig. 4. Definition of the ChiSquare distribution probability density function in Wolfram Alpha

Fig. 5. Definition of the noncentral ChiSquare distribution probability density function in Wolfram Alpha


We think this approach is correct. It avoids ambiguity in defining the probability density function and removes the infinite values that can otherwise appear when integrating it.

For this reason, at the point x = 0 the densities of these distributions are taken to be zero by definition, rather than infinite as in R.
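
If one wanted to reproduce that convention on the R side, a thin wrapper around dgamma() would be enough; the sketch below (the name dgamma0 is purely illustrative) forces a zero at x = 0 and leaves the density unchanged elsewhere:

# Mimicking the Wolfram/Matlab convention: density defined as 0 at x = 0.
dgamma0 <- function(x, shape, rate = 1) ifelse(x == 0, 0, dgamma(x, shape, rate))
dgamma0(0, 0.5, 1)     # 0, instead of Inf from dgamma(0, 0.5, 1)
dgamma0(0.1, 0.5, 1)   # identical to dgamma(0.1, 0.5, 1) away from zero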

We have included several unit-test scripts in the delivery, so that the accuracy of the calculations can be verified and third-party developers can check the quality of the library.

Read more at .

  1. The R Project for Statistical Computing, www.r-project.org.
  2. Balakrishnan N., Johnson N.L., Kotz S. "Univariate Continuous Distributions: Part 1." Moscow: BINOM. Knowledge Laboratory, 2014.
  3. Balakrishnan N., Johnson N.L., Kotz S. "Univariate Continuous Distributions: Part 2." Moscow: BINOM. Knowledge Laboratory, 2014.
  4. Johnson N.L., Kotz S., Kemp A. "Univariate Discrete Distributions." Moscow: BINOM. Knowledge Laboratory, 2014.
  5. Forbes C., Evans M., Hastings N., Peacock B. "Statistical Distributions." 4th Edition, John Wiley and Sons, 2011.
  6. Lenth R.V. "Cumulative Distribution Function of the Noncentral t Distribution." Applied Statistics, vol. 38 (1989), pp. 185-189.
  7. Benton D., Krishnamoorthy K. "Computing Discrete Mixtures of Continuous Distributions: Noncentral Chisquare, Noncentral t and the Distribution of the Square of the Sample Multiple Correlation Coefficient." Computational Statistics & Data Analysis, vol. 43 (2003), pp. 249-267.
 
Renat Fatkhullin:

For @SanSanych Fomenko - MQL4/MQL5 is in 41st place in the global TIOBE programming language ranking: http://www.tiobe.com/tiobe-index/

R is in 19th place, Matlab in 15th place.

This is to dispel your statements about it "not being an authority" and "not being the world's top".

I am discussing statistics. And my rankings are of statistical packages.

Furthermore: in the quote you cited, R ranks above MQL4/5 in that very ranking of programming languages. But that is NOT a reason for me to switch from MQL to R, for example. I am not discussing R's capabilities as a programming language at all.

For me, the main strength of R is its packages, the support behind this whole ecosystem, the authority of the people who developed them, the huge R community and, in the end, the huge number of publications tied to R.

But as someone who sat on academic councils for 15 years, I can tell you something else. If you write "an analogue of R", then it must be an analogue without any exceptions. There is NO other way. Otherwise it is NOT an analogue of R; it may well be much more correct, but it is not an analogue.

 
SanSanych Fomenko:

I am discussing statistics. And my rankings are of statistical packages.

Furthermore: in the quote you cited, R ranks above MQL4/5 in that very ranking of programming languages. But that is NOT a reason for me to switch from MQL to R, for example. I am not discussing R's capabilities as a programming language at all.

And we are discussing a specific error in R.

So don't butt in with your ratings, since you know neither the math nor the specific case under discussion.

 

SanSanych Fomenko:

...

For me, the main strength of R is its packages, the support behind this whole ecosystem, the authority of the people who developed them, the huge R community and, in the end, the huge number of publications tied to R.
...

What a rotten thing this R turns out to be - a bicycle with square wheels. What is there to say about its packages, when the foundation, i.e. the core of R, is crooked and needs serious reworking "with a pencil and a file"? What credibility can people have who for so many years have not even bothered to check the correctness of the basic functions in R? And what "strength" can there be in R's weakness - the incorrectness of calculations done with it?

It is good that MetaQuotes has opened some users' eyes to what this same R actually is, and did so with facts and open-source tests, so that everyone could independently double-check and see for themselves rather than take it on faith. Not everyone's eyes were opened, of course, because some religious fanatics from the destructive sect of R will keep blindly believing in the "infallibility" of calculations in their crooked language and packages, instead of turning to the presented tests and re-running them, and will keep fanatically talking nonsense in defence of R's crookedness as a "generally accepted standard".

Now it is quite obvious that it is better to use MQL's functionality for building trading strategies, because the result will be more correct than trying to do the same with the lopsided and crooked R.

The MetaQuotes developers deserve special thanks for the constructive approach, the tests and their source code, as well as for exposing the "naked king" that R turned out to be!

 
Quantum:
We are not interested in an interval of width 0; we need to understand how such an integral behaves, i.e. cdf(x). What kind of function do we get? Will it coincide with pgamma(x)?

> dgamma_05_1 <- function(x) dgamma(x, 0.5, 1) # only one free parameter, for convenience

> pgamma_05_1 <- function(x) pgamma(x, 0.5, 1) # only one free parameter, for convenience

> pgamma_05_1_integralform <- function(x) integrate(dgamma_05_1, 0, x)$value # computing pgamma by integrating dgamma
>
> pgamma_05_1(0.00001)
[1] 0.003568236
> pgamma_05_1_integralform(0.00001)
[1] 0.003568236
> pgamma_05_1(0.00001) - pgamma_05_1_integralform(0.00001)
[1] -6.938894e-18
>
> pgamma_05_1(0.0001)
[1] 0.01128342
> pgamma_05_1_integralform(0.0001)
[1] 0.01128342
> pgamma_05_1(0.0001) - pgamma_05_1_integralform(0.0001)
[1] 3.295975e-17
>
> pgamma_05_1(0.001)
[1] 0.03567059
> pgamma_05_1_integralform(0.001)
[1] 0.03567059
> pgamma_05_1(0.001) - pgamma_05_1_integralform(0.001)
[1] 1.595946e-16
>
> pgamma_05_1(0.01)
[1] 0.1124629
> pgamma_05_1_integralform(0.01)
[1] 0.1124629
> pgamma_05_1(0.01) - pgamma_05_1_integralform(0.01)
[1] 1.096345e-15
>
> pgamma_05_1(0.1)
[1] 0.3452792
> pgamma_05_1_integralform(0.1)
[1] 0.3452792
> pgamma_05_1(0.1) - pgamma_05_1_integralform(0.1)
[1] 1.126876e-13
>
> pgamma_05_1(1)
[1] 0.8427008
> pgamma_05_1_integralform(1)
[1] 0.8427008
> pgamma_05_1(1) - pgamma_05_1_integralform(1)
[1] 3.460265e-11

pgamma() computed in the standard way and integrate(dgamma()) almost coincide; the error is only 3.460265e-11 at x = 1. Such an error is quite expected: the integration here is a sum over small steps, without any preliminary analysis or simplification. The pgamma() function itself is implemented in compiled code and should be more accurate than integrating with integrate(). That is why you should use pgamma(x, 0.5, 1) rather than integrate(dgamma_05_1, 0, x).
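
One way to probe whether that 3.460265e-11 gap is purely quadrature error is to rerun integrate() with a tighter relative tolerance and see how the difference changes; the rel.tol value in this sketch is arbitrary:

# Tighter tolerance than integrate()'s default, compared against pgamma().
tight <- integrate(function(t) dgamma(t, 0.5, 1), 0, 1, rel.tol = 1e-12)
pgamma(1, 0.5, 1) - tight$value   # gap between pgamma() and the refined integral
tight$abs.error                   # error estimate reported by integrate() itself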

 
Yury Reshetov:

What is there to say about its packages, when the foundation, i.e. the core of R, is crooked and needs serious reworking "with a pencil and a file"?

The R language is crooked and slow.

If you restrict the discussion to pure classical statistics, which lives in the base package and in add-on packages, then I think there is no problem here. If you need to run millions of statistical tests, the performance of other languages (including MQL) will be an advantage.

If we talk about how people program in R in general, then I will tell you, Yuri, that here, again, packages are used for fast data processing (dplyr, data.table) and ggplot2 for charts. R itself is, again, a scripting language, an old one, not designed for heavy data.
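
For example, a grouped aggregation of the kind that would take explicit loops in a naive script is one line with data.table; this sketch (the toy table and column names are made up) only illustrates the point:

# A toy grouped aggregation with data.table instead of base-R loops.
library(data.table)
trades <- data.table(symbol = rep(c("EURUSD", "GBPUSD"), each = 5),
                     ret    = rnorm(10))
trades[, .(mean_ret = mean(ret), n_obs = .N), by = symbol]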

But even with all of that, the stream of mud from your side is unjustifiably large. We were discussing statistics here, not even code refactoring or other engineering matters. It was a discussion of mathematical concepts.
