Machine learning in trading: theory, models, practice and algo-trading - page 3333
https://www.mql5.com/ru/articles/9138
It's been a year since anyone cared about this.
I have written a dozen or two such algorithms, and some of them have performed well. The article is not the best in terms of stability of results - it was a first attempt.
so there's nothing to discuss, because there's nothing better yet.
Well, it's not that nobody cares - I think Python is simply not yet widespread enough among traders for people to move on to an active discussion.
I will try your approach later on my sample.
Have you tried the CatBoost out-of-the-box method?
sibirqk #:
They are synchronised. As I wrote at the beginning of the post, I 'aligned them by dates', which is exactly what synchronising the pairs in time means.
"But unfortunately, imho, it's complete randomness again. The picture is a fragment of the charts, as an illustration."
You're right, it's not that simple
In the example there are 2 predictors, i.e. we measure the distance in 2-dimensional space (we calculate the hypotenuse). If there are 5000 features, you will measure the distance in 5000-dimensional space (for how to measure it, see the k-means code in ALGLIB - measuring distances is the main task there; take it as a basis).
It is the square root of the sum of the squared differences along each dimension: https://wiki.loginom.ru/articles/euclid-distance.html.
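The distance formula described above works identically for 2 or 5000 predictors. A minimal numpy sketch (the 5000-dimensional points here are random, purely for illustration):

```python
import numpy as np

def euclidean(a, b):
    """Distance between two points in N-dimensional feature space:
    the square root of the sum of squared coordinate differences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

# 2 predictors: the classic hypotenuse of a 3-4-5 triangle
print(euclidean([0, 0], [3, 4]))  # → 5.0

# 5000 predictors: the same formula, just more dimensions
rng = np.random.default_rng(0)
p, q = rng.normal(size=5000), rng.normal(size=5000)
print(euclidean(p, q))
```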
An understanding is now emerging - thanks - I'll think about it.
If you really do it, don't forget to scale the predictors, so that, for example, volumes of 1...100000 do not swallow price deltas of 0.00001...0.01000 in the calculations.
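One simple way to do this scaling is min-max normalisation. A sketch with a hypothetical two-column feature matrix (volume and price delta, the ranges from the post above):

```python
import numpy as np

# Hypothetical feature matrix: column 0 is volume (1...100000),
# column 1 is a price delta (0.00001...0.01000)
X = np.array([[120.0,   0.00005],
              [98000.0, 0.00940],
              [53000.0, 0.00410]])

# Min-max scaling puts every predictor into [0, 1], so no single
# feature dominates the distance calculation
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)
```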
Right, normalisation is necessary. However, what if we un-quantise them and calculate the metric purely by the quantum indices? :) And I don't like the idea of computing via the legs of the triangle - it feels artificial.
Although the right thing would be to reproduce the proposed algorithm first and then think about improving it.
How do you detect it? That's the question. Especially on market data, where there won't be such a clear separation of the noisy area as in the example. Everything will be noisy - 90-99 per cent.
It may be easier to use ready-made packages for removing noisy rows; maybe they include a detector...
Actually, did you watch the video? Near the end it says that a model is built which detects which area the data belongs to, and if it falls into a non-target area (relative to the sample on which training took place), the signal is ignored, as I understand it. Clearly our data is much worse than what is discussed there, but if it gives 20%-30% of target "1", I will already be happy.
Another option is to train the model to detect these excluded examples by marking up those rows in the overall sample.
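The second option above - marking up the excluded rows and training a detector on that markup - could be sketched roughly like this. Everything here is hypothetical (the data, the "noise" rule, the choice of a random forest as the detector); it only illustrates the workflow:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))                     # hypothetical predictors
excluded = (np.abs(X[:, 0]) < 0.3).astype(int)    # rows marked as "noise"

# Train a detector on the "excluded" markup itself
detector = RandomForestClassifier(n_estimators=50, random_state=0)
detector.fit(X, excluded)

# At prediction time, skip signals the detector flags as excluded
X_new = rng.normal(size=(10, 4))
mask = detector.predict(X_new) == 0   # keep only rows outside the noisy area
print(mask)
```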
One feature is quantised into 2 quanta, another into 32. It won't work.
Haven't looked at it.
You can detect excluded examples without all these calculations. I already told you - just exclude the leaves where the probability of one of the classes is around 50%.
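The "exclude leaves around 50%" idea amounts to ignoring any signal whose predicted class probability sits in a band around 0.5. A minimal sketch (the band width of 0.1 is an arbitrary assumption):

```python
import numpy as np

def filter_uncertain(probs, band=0.1):
    """Return a mask keeping only confident predictions: anything
    with P(class=1) within `band` of 0.5 is treated as noise and dropped."""
    probs = np.asarray(probs, dtype=float)
    return np.abs(probs - 0.5) > band

probs = np.array([0.51, 0.95, 0.48, 0.10, 0.62])
mask = filter_uncertain(probs)
print(probs[mask])  # → [0.95 0.1  0.62]
```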
No, they will be in the same relative place - the number of splitters (splits) is fixed for all.
There can be many different methods. I am interested in the variant with processing before model building - it seems to me that it leaves fewer ways of building combinations, which reduces the error in the final conclusion about whether the final model was trained successfully or not.
Besides, if we are talking about being able to "throw out" something, we should specify which models we mean. If it's a forest, should we count the percentage of leaves without the "discarded" ones, or count the number of activations of leaves near 50% and ignore the signal if their threshold is exceeded?
With boosting it's even more fun - uncertain leaves can, in total, shift the probability in one direction or the other. I keep meaning to make a graph showing how the weights are distributed depending on the probability shift, but I keep putting it off. The computer has been calculating the similarity of model leaves for three days - I'm thinking about optimising the algorithm; it takes too long...
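The point about boosting is that each tree contributes a small raw leaf score, and the sum of those scores is passed through a sigmoid to get the final probability - so many individually "uncertain" (near-zero) leaves can still shift the result. A toy illustration with hypothetical leaf scores:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw leaf scores from 100 boosted trees for one observation.
# Each leaf is individually near-neutral, but their sum is not.
rng = np.random.default_rng(1)
leaf_scores = rng.normal(loc=0.02, scale=0.05, size=100)

raw = leaf_scores.sum()
print(sigmoid(0.0))  # a single neutral leaf → 0.5
print(sigmoid(raw))  # 100 slightly-positive leaves shift the probability
```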
Have you tried the out-of-the-box method from CatBoost?
What is the "out of the box" method?
This is the functionality.
There are different ways of dividing/separating the data, and they have been tried in this thread before - they did not show significant results, so they were "forgotten".
There are Bayesian networks - at first glance they are interesting just because of their ability to restore cause-and-effect relationships.
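The "restoring cause-and-effect relationships" part can be shown with the smallest possible Bayesian network: one cause node, one effect node, and Bayes' rule to invert the direction. All the numbers here are made up purely for illustration:

```python
# A minimal two-node Bayesian network, hand-coded with hypothetical numbers:
# Cause = "news event", Effect = "volatility spike".
p_cause = 0.1                  # P(news)
p_effect_given_cause = 0.8     # P(spike | news)
p_effect_given_no_cause = 0.2  # P(spike | no news)

# Forward (generative) direction: total probability of the effect
p_effect = (p_effect_given_cause * p_cause
            + p_effect_given_no_cause * (1 - p_cause))

# Inverse direction via Bayes' rule: probability of the cause given
# the effect - this is the "restoring cause-and-effect" part
p_cause_given_effect = p_effect_given_cause * p_cause / p_effect

print(round(p_effect, 4))              # → 0.26
print(round(p_cause_given_effect, 4))  # → 0.3077
```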
You have plenty of binary predictors with 0s and 1s. They won't split into 32 quanta. If you normalise them, you might get something with uniform quantisation. If the quanta are non-uniform, then distances computed purely from the indices will all be distorted - you need the absolute values after normalisation.
The error will show up in prediction if you can't get rid of the noise the way you did in training.
It doesn't matter whether it's a tree, a forest, or boosting. If the model's prediction is 50%, then the predictions will be 50% 0s and 50% 1s.