Machine learning in trading: theory, models, practice and algo-trading - page 212
R is an amazing system. It personally opened my eyes to how far we in MetaTrader/MQL were from the real need "to do complex calculations simply and right now".
We (C++ developers) have the approach "you can do everything yourself, and we give you the low-level base and the speed of calculations" in our blood. We are fanatical about performance and we are good at it - MQL5 is great on 64 bits.
When I personally sat down to work with R, I realized that I needed as many powerful functions in one line as possible, and that I should be able to do research in general.
So we made a sharp turn and started upgrading MetaTrader 5 accordingly.
Of course, we are only at the beginning of the road, but the right direction is already clear.
Your motivation is great! If it is exactly as you say, you will quickly gnaw Ninja and MultiCharts down to the bone)))
However, IMHO, something radically new will have to be created here. That is, in addition to what you wrote, Mr. Reshetov, you need a research studio for working with arbitrary datasets, not only those downloaded from the market, because to understand what is going on you often need to try quite trivial, synthetic examples - well, you should understand me, programmer to programmer)). You would need to draw various charts, scatterplots, heatmaps, distributions and so on. In general, it would be really cool if such a set of tools were available directly from MetaEditor, but frankly I do not even hope for that...
But in general, of course, I like the direction of your thinking))
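For illustration only: the kind of quick exploratory look at a synthetic dataset described above, sketched in R (all variable names here are invented for the example):

set.seed(42)
# A small synthetic dataset: two independent variables and one that depends on the first
synthetic <- data.frame(a = rnorm(1000), b = rnorm(1000))
synthetic$c <- 0.7 * synthetic$a + rnorm(1000, sd = 0.3)

plot(synthetic$a, synthetic$c)          # scatterplot
hist(synthetic$c, breaks = 50)          # distribution
heatmap(cor(synthetic), symm = TRUE)    # correlation heatmap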
It was a polite answer with no details or verification. And the answer did not match Wolfram Alpha and Matlab, which is a problem.
There is no need to sidestep - the root question has been stated clearly.
What do you mean, his answer didn't match Wolfram? Didn't match in the sense that the person's answer was not "zero"? The man replied that he did not think the density must necessarily be zero at the point where the integral equals 0 (that is how I put the question to him). He said so explicitly. And he added that the density value at any single point is irrelevant (I read "irrelevant" as irrelevant to the question at hand). That is a perfectly clear mathematical statement.
In the question at hand, the mathematics is important.
We have an integral of a certain function (the gamma distribution probability density function). Everybody is used to the fact that you can give Wolfram an expression with parameters: specify the domain of integration and the function parameters, and it will integrate and give the answer. But have you ever considered that if you sat down and computed this integral yourself over a given domain, you would get 0 at zero, 1 over the whole domain, and some value in [0, 1] over any sub-domain? Simply by evaluating the integral!
The fact that the limit of the gamma probability density at the boundary point goes off somewhere into the positive region is a property of that function. It has nothing to do with what you get by integrating the function. That is what the man was writing about.
I am not dodging the root issues. I will reiterate that our point has been validated by a person beyond our control: the value of the density at zero is irrelevant.
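A minimal check of this argument in R (a sketch; the shape parameter 0.5 is my assumption, chosen small enough that the density actually diverges at zero; pgamma is the closed-form integral of dgamma):

dgamma(0, shape = 0.5, rate = 1)    # Inf: the density itself diverges at the left edge
pgamma(0, shape = 0.5, rate = 1)    # 0: integrating over the degenerate interval [0, 0]
pgamma(1, shape = 0.5, rate = 1)    # ~0.84: a value in [0, 1] over a sub-interval
pgamma(Inf, shape = 0.5, rate = 1)  # 1: integrating over the whole support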
Are you referring to this "shot" by Reshetov?
"Some kind of rottenness this R - a bicycle with square wheels. What to talk about some of his packages, when the basis, i.e. the core in R is crooked and needs serious fine-tuning with a "pencil file"? What credibility can those who haven't even bothered to check the correctness of the basic functions in R for so many years have? What can be the "strength" in the weakness of R - the incorrectness of calculations through it?
Well, that MetaQuotes opened the eyes of some users to the fact that in fact this same R represents, the facts and tests with open source, so that everyone could independently double-check and make sure, but not groundless. Not all of course opened, because some religious fanatics from destructive sect of R will continue to blindly believe in "infallibility" of calculations in their crooked language and packages, instead of turning to the presented tests and double-check them independently, but not fanatically bullshitting, defending crookedness of R as "generally accepted standard".
Now it's quite obvious that it would be better to use MQL functionality for creating trading strategies, because the result will be more correct, than to try to do it with curve and slope of R.
Special thanks to MetaQuotes developers for the constructive approach, tests and their sources, as well as for the identification of the "naked king - R"! "
Is this Reshetov "shot" what you mean?
No, this is the message:
R, like many other languages, is for now more convenient for machine learning than MQL because it has dedicated functionality for processing data in arrays. The point is that a sample for machine learning is most often a two-dimensional array, so it requires certain functionality for working with arrays.
Until MQL implements the array-handling functionality needed for working with samples, most developers of machine learning algorithms will prefer other languages that already have all of this. Or they will use the unpretentious MLP (an algorithm from the 1960s) from AlgLib which, if I remember correctly, represents two-dimensional arrays as one-dimensional ones for convenience.
Of course, density functions of random distributions are also necessary functionality. But such functions are not always needed in machine learning tasks, and in some tasks they are not used at all. Operations on samples as multidimensional arrays, however, are what the implementation of machine learning algorithms needs for any task - unless, of course, it is the task of training a network on obviously normalized data from a trivial COR.
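A small illustration of the 2D-as-1D representation mentioned above (sketched in R, since that is the language used elsewhere in this thread; the matrix and indices are made up for the example):

# A 3 x 4 "sample": 3 observations, 4 features
m <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
# Flattened to one dimension, row by row (row-major layout)
v <- as.vector(t(m))
# Element (i, j) of the matrix is element (i - 1) * ncol + j of the flat vector
i <- 2; j <- 3
m[i, j] == v[(i - 1) * ncol(m) + j]   # TRUE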
Is this Reshetov "shot" what you mean?
"This R is rotten, a bicycle with square wheels. What to speak about some of his packages, when the very basis, i.e. the core in R is crooked and needs serious revision "with a pencil file"? What credibility can those who haven't even bothered to check the correctness of the basic functions in R for so many years have? What can be the "strength" in the weakness of R - the incorrectness of calculations through it?
Well, that MetaQuotes opened the eyes of some users to the fact that in fact this same R represents, the facts and tests with open source, so that everyone could independently double-check and make sure, but not groundless. Not all of course opened, because some religious fanatics from destructive sect of R will continue to blindly believe in "infallibility" of calculations in their crooked language and packages, instead of turning to the presented tests and double-check them independently, instead of fanatically bullshitting, defending crookedness of R as "generally accepted standard".
Now it's quite obvious that it would be better to use MQL functionality to create trading strategies, because the result will be more correct, than to try to do it with curve and slope of R.
Special thanks to MetaQuotes developers for the constructive approach, tests and their sources, as well as for the identification of the "Naked King - R"!
Have you already deleted your post about "minky MQL"? You scrub your posts the same way Radovian figures scrubbed their Facebook pages after Trump was elected.
Here is an example with the gamma distribution in Wolfram Alpha, just for fun.
Wolfram is given a function - a slightly simplified gamma distribution density.
The catch is the x in the denominator. As you can see, Wolfram evaluates the right-hand limit at x -> 0 correctly: inf.
That is, the right-hand limit of the density at the zero point is infinity (which is exactly what dgamma answers).
Now let's integrate this function over a large support:
The integral is 1 (rounded, of course, because the full support is not taken).
Conclusion: even though the function goes to infinity at the boundary point, the integral of this function is computed perfectly well, as it should be.
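The same check can be reproduced in R (a sketch; shape = 0.5 is an assumed parameter value, small enough that the density diverges at zero):

f <- function(x) dgamma(x, shape = 0.5, rate = 1)
f(0)                  # Inf: the density diverges at the left edge of the support
integrate(f, 0, Inf)  # ~1 (with a small reported absolute error): the improper integral converges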
Thanks for the example, you are right. This integral is convergent.
The limiting value at the point x = 0 can also be used as the density, and it does not lead to divergence.
Thank you! Respect.
An example in R using fast data-processing packages.
library(data.table)
library(ggplot2)
start <- Sys.time()
set.seed(1)

# True coefficients used to generate the data
dummy_coef <- 1:9

# 1,000,000 rows x 9 predictor columns of standard normal noise
x <- as.data.table(matrix(rnorm(9000000, 0, 1), ncol = 9))

# Per-row noisy coefficients: coef_k ~ N(k, 1)
x[, (paste0('coef', 1:9)) := lapply(1:9, function(k) rnorm(.N, k, 1))]
print(colMeans(x[, 10:18, with = FALSE]))

# Response = sum over k of predictor_k * coef_k
x[, output := Reduce(`+`, Map(function(a, b) a * b,
                              .SD[, 1:9, with = FALSE],
                              .SD[, 10:18, with = FALSE])),
  .SDcols = 1:18]

# Assign every row to one of 1,000 random subsamples
x[, sampling := sample(1000, nrow(x), replace = TRUE)]

# Fit a no-intercept linear model on each subsample and keep its 9 coefficients
lm_models <- x[,
               {
                 lm(data = .SD[, c(1:9, 19), with = FALSE],
                    formula = output ~ . - 1)$coefficients
               },
               by = sampling]

# Label each coefficient and average it across the 1,000 models
lm_models[, coefs := rep(1:9, times = 1000)]
avg_coefs <- lm_models[, mean(V1), by = coefs]
plot(dummy_coef, avg_coefs$V1)

# Shapiro-Wilk normality test of every coefficient's distribution across models
lm_models[, print(shapiro.test(V1)$p.value), by = coefs]

# Histograms of the estimated coefficients, one facet per coefficient
ggplot(data = lm_models, aes(x = V1)) +
  geom_histogram(binwidth = 0.05) +
  facet_wrap(~ coefs, ncol = 3)

Sys.time() - start
Running time: 5 seconds. 1,000 linear models were constructed, each on about 1,000 observations.
[1] 0.8908975
[1] 0.9146406
[1] 0.3111422
[1] 0.02741917
[1] 0.9824953
[1] 0.3194611
[1] 0.606778
[1] 0.08360257
[1] 0.4843107
All coefficients are normally distributed.
And a ggplot for visualization.
And another example. It also involves simulating statistics on large samples.
########## Simulate the difference between means for different sample sizes
library(data.table)
library(ggplot2)
rm(list = ls()); gc()
start <- Sys.time()

# Two independent standard normal vectors of 10,000,000 values each
x <- rnorm(10000000, 0, 1)
y <- rnorm(10000000, 0, 1)
dat <- as.data.table(cbind(x, y))

# Assign every row to one of 100, 1,000 or 10,000 random subsamples
dat[, (paste0('sampling_', c(100, 1000, 10000))) :=
      lapply(c(100, 1000, 10000), function(n) sample(n, nrow(dat), replace = TRUE))]

# Long format: one row per observation per grouping scheme
dat_melted <- melt(dat, measure.vars = paste0('sampling_', c(100, 1000, 10000)))

# Difference of the two means within each subsample
critical_t <- dat_melted[,
                         {
                           mean(x) - mean(y)
                         },
                         by = .(variable, value)]

# Overlaid densities of the mean differences for the three sample sizes
ggplot(critical_t, aes(x = V1, group = variable, fill = variable)) +
  stat_density(alpha = 0.5)

Sys.time() - start
gc()
Running time is 3.4 seconds.
Normally distributed samples centered at zero are created:
1,000 samples of 10,000 pairs of values,
10,000 samples of 1,000 pairs of values,
100,000 samples of 100 pairs of values.
The difference between the means (expected value == 0) is computed for each sample.
The densities of the sampling distributions of the mean difference are plotted for the different sample sizes.
Note, though, that sampling_100 means the sample size is 10,000,000 / 100, i.e. 100,000 observations per sample. That is, the smaller the sample, the larger the standard error...
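A quick numerical check of that last point (a sketch meant to be run after the code above; for a per-sample size of n pairs, the theoretical standard error of mean(x) - mean(y) is sqrt(2 / n)):

# Empirical spread of the mean differences for each grouping scheme
critical_t[, .(empirical_sd = sd(V1)), by = variable]
# Theoretical standard errors for per-sample sizes 1e7/100, 1e7/1000 and 1e7/10000
sqrt(2 / (1e7 / c(100, 1000, 10000)))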