Machine learning in trading: theory, models, practice and algo-trading - page 2552

 
Vladimir Perervenko #:

There is another problem with using predictors: their drift.

Do I understand correctly that drift is the same thing as non-stationarity?

What if we train a model that takes the price as input and outputs a maximally stochastic series that correlates with the price? Have you tried something like that?

 
Vladimir Perervenko #:

There is another problem when using predictors - their drift. This problem must be detected and taken into account both in testing and in operation. In the appendix you can find a translation of the article (look for others on the net); there is also a drifter package, and it is not the only one. The point is that when selecting predictors, you need to consider not only their importance but also their drift. If they drift strongly, they should be discarded or transformed; if they drift little, the drift should be taken into account (corrected for) in testing and operation.

I agree, non-stationarity (drift) complicates things a lot. Unfortunately, it is much more severe than in the spam example. But it has to be taken into account.

 
elibrarius #:
The color of a candle, even guessed with a 30% error, can still be a loser. We don't know how much profit we'll get from it: the color is usually guessed well during slow price moves (at night), and one missed strong daytime candle can outweigh 10 small night ones. I think guessing candle colors is again a random outcome (because of the random candle sizes).
That's why I did classification with TP and SL. If they are equal, then 52% successful trades is already profitable. If TP = 2*SL, then anything above 33% successful trades will be profitable. The best I've had is 52-53% successful trades with TP = 2*SL over 2 years. But in general, I'm thinking of using regression with a fixed TP/SL - or rather, somehow building a classification on top of regression.
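The breakeven percentages above follow from simple arithmetic. A minimal sketch (the function name is mine, not from the thread; spread and commission are ignored):

```python
# Expected profit per trade with fixed take-profit (TP) and stop-loss (SL) is
#   E = p*TP - (1 - p)*SL,
# so the breakeven win rate is p = SL / (TP + SL).

def breakeven_win_rate(tp: float, sl: float) -> float:
    """Minimum fraction of winning trades needed to break even."""
    return sl / (tp + sl)

print(breakeven_win_rate(1, 1))  # 0.5 -> equal TP/SL needs >50% wins
print(breakeven_win_rate(2, 1))  # 0.333... -> TP = 2*SL needs >33% wins
```

This matches the poster's figures: with equal TP/SL, 52% wins clears the 50% bar; with TP = 2*SL, anything above one third of trades winning is profitable before costs.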

Yes, that reminds me: I don't think such a target is very effective, because it doesn't take volatility into account.

Am I correct in assuming that a position is opened virtually every bar to prepare a sample?

 
SanSanych Fomenko #:

In principle, there are not and cannot be mathematical methods that will make candy out of garbage. Either there is a set of predictors that predicts the teacher (the target variable), or there isn't.

And the models play practically no role, as do the various cross-validations and other computationally expensive contortions.


PS.

By the way, the "importance" of predictors in a model has nothing to do with the ability to predict a teacher.

You are deeply mistaken - there are no perfect model-building methods capable of selecting the "right" predictors on their own. Or at least I am not aware of any.

Maybe the market cannot be described perfectly, but by analyzing the sample and the predictors you can significantly improve the model's result, albeit with a peek at the data on which the training takes place.

The question is how to effectively select the predictors and control their abnormal changes when applying the model.

 
Vladimir Perervenko #:

There are three options for handling noise samples: delete them, re-partition (fix the markup), or separate the noise samples into their own class. In my experience, about 25% of the sample is "noise". The quality improvement is about 5%, depending on the models and the data preparation. I use it sometimes.
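One simple way to flag candidate "noise" samples, in the spirit of the three options above, is to look at out-of-fold predictions: rows whose own label receives a very low predicted probability are suspects for dropping, relabeling, or a separate class. This is my illustration of the idea, not the poster's method; the threshold is arbitrary:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)
y[:100] = 1 - y[:100]  # inject 10% label noise to be found

# Out-of-fold class probabilities, so each row is scored by a model
# that never saw it during training
proba = cross_val_predict(RandomForestClassifier(random_state=0), X, y,
                          cv=5, method="predict_proba")
own = proba[np.arange(len(y)), y]  # probability assigned to the row's own label
noisy = own < 0.3                   # illustrative threshold
print(f"flagged as noise: {noisy.mean():.1%}")
```

The flagged rows can then be deleted, relabeled, or moved to a third class, matching the three options described.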

There is one more problem when using predictors - their drift. This problem must be detected and taken into account both in testing and in operation. In the appendix you can find a translation of the article (look for others on the net); there is also a drifter package, and it is not the only one. The point is that when selecting predictors, you need to consider not only their importance but also their drift: throw away or transform the strong drifters, and take the weak drifters into account (correct for them) in testing and operation.

Good luck

As I understand it, the authors of the article propose analyzing the distribution of a predictor's values per window and signaling an anomaly if it changes strongly. If I understand correctly, the example uses a window of 1000 values - a large window, but apparently statistically justified. The question is: what metrics are used to compare the two distributions in order to detect a significant change?
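On the metrics question: common choices for comparing two windows are the two-sample Kolmogorov-Smirnov test, the Population Stability Index, and the Wasserstein distance. A minimal sketch with the KS test (the window size and significance level here are illustrative assumptions, not values from the article):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 1000)  # training-time window of a predictor
current = rng.normal(0.5, 1.0, 1000)    # live window with a shifted mean

# KS statistic = max distance between the two empirical CDFs
stat, p_value = ks_2samp(reference, current)
if p_value < 0.01:  # illustrative significance level
    print(f"drift detected: KS={stat:.3f}, p={p_value:.2e}")
```

A small p-value signals that the live distribution differs significantly from the reference one, which is exactly the "anomaly" signal described above.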

Further thoughts: the change itself may be predicted by some other predictor. Say we have a global trend change on the weekly scale caused by a change in the interest rate; in the whole sample there are few such changes - say 3 - and the model may simply fail to pick up these predictors. But if the two predictors are combined, the "abnormal" change becomes interpretable. Thus I come to the idea that drift by itself is not a reason to throw a predictor out, but a reason to look for a factor that explains it, i.e. to try to find a correlated predictor and combine them into a new predictor.
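The "combine the drifting predictor with its explanation" idea can be sketched on synthetic data: if a predictor's level shifts with some regime variable (here a step change standing in for an interest-rate move), conditioning on that variable removes the drift. The regime variable and the subtraction are my illustration, not the author's construction:

```python
import numpy as np

rng = np.random.default_rng(5)
regime = np.repeat([0.0, 2.0], 500)  # e.g. before/after a rate change
x = regime + rng.normal(size=1000)   # raw predictor drifts with the regime
x_combined = x - regime              # new predictor: drift explained away

# Gap between the two halves of history: large before, small after combining
print(abs(x[:500].mean() - x[500:].mean()))
print(abs(x_combined[:500].mean() - x_combined[500:].mean()))
```

The raw predictor would fail a drift check between the two halves, while the combined one passes - the drift is absorbed into an interpretable factor instead of discarding the predictor.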

In turn, I will briefly describe my method: I split predictors into "quanta" (segments) and evaluate the binary response of each quantum by its predictive ability. By taking a cross-section of such estimates over history, it is possible to select good sets of quanta, which can serve as individual predictors or be used to select base predictors. This approach also improves the results. Accordingly, evaluating the stability of the quanta's behavior, and selecting them on control samples, substantially improves the results of model training in CatBoost - and here I wonder whether this is admissible or already self-deception.
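A sketch of the "quanta" idea as I read it: split a predictor into quantile bins and score each bin by how far its target rate deviates from the base rate. The function name, binning scheme, and data are my own illustration, not the author's code:

```python
import numpy as np

def score_quanta(x: np.ndarray, y: np.ndarray, n_bins: int = 10):
    """Return (bin edges, per-bin target rate) for a binary target y."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    rates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        rates.append(y[mask].mean() if mask.any() else np.nan)
    return edges, np.array(rates)

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y = (x + rng.normal(scale=2.0, size=5000) > 0).astype(int)  # weak signal in x
edges, rates = score_quanta(x, y)
base = y.mean()
# Bins whose rate deviates strongly from the base rate are candidate "quanta"
print([f"{r:.2f}" for r in rates], f"base={base:.2f}")
```

Stability would then be checked by recomputing the per-bin rates on control periods and keeping only the bins whose deviation persists.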

 
Aleksey Vyazmikin #:

Yes, that reminds me: I don't think such a target is very effective, because it doesn't take volatility into account.

I agree. At night a trade will hang for several hours, while in the daytime it may finish in 5 minutes. I'm wondering how to attach a regression model to the classification. I cannot predict the figures 0, 1, 2 head-on - I have to do something smarter.

Do I correctly understand that a position is opened virtually every bar in order to prepare the sample?

Yes, if there is a predicted buy/sell class. There is also a class - wait.

 
elibrarius #:

I agree. At night a trade will hang for several hours, while in the daytime it may finish in 5 minutes. That's why I'm thinking how to attach a regression model to the classification. I cannot predict the figures 0, 1, 2 head-on - I have to do something smarter.

Logistic regression
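Logistic regression is indeed one bridge between regression and classification: it fits a linear function and squashes it through a sigmoid, so the output is a class probability rather than a hard 0/1. A minimal sketch on synthetic data (not the poster's setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X[:5])[:, 1]  # probabilities instead of hard labels
print(np.round(proba, 3))
```

The probabilities can then be thresholded per trade, e.g. only entering a position when the model's confidence exceeds a chosen cutoff instead of acting on every prediction.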
 
SanSanych Fomenko #:

I haven't been on the forum for several years, and it's still the same. As in the song: "What you were is what you have remained, steppe eagle, dashing Cossack...".

Statistics begins with an axiom, which, being an axiom, is not discussed:


"Garbage in, garbage out."


In principle, there are not and cannot be mathematical methods which will make candy out of garbage. Either there is a set of predictors that PREDICTS the teacher, or there isn't.

And the models play practically no role, as do the various cross-validations and other computationally expensive contortions.


PS.

By the way, the "importance" of predictors in a model has nothing to do with the ability to predict the teacher.

There are always those who, like Comrade Sukhov, think: "Better, of course, to suffer a while." )

I agree that finding the right predictors is more important than the specific model. And it is better to build them based, first of all, on studying the subject domain, rather than relying only on the power of ML algorithms (and constructing predictors in an uninterpretable way from raw bars).

No less important than the predictors is the loss function, which must correspond well to the subject area and the problem being solved.
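One way to make the loss "correspond to the problem" is to weight errors asymmetrically, e.g. penalizing under-prediction harder than over-prediction. This is my illustration of the idea, not anything proposed in the thread; the weight is arbitrary:

```python
import numpy as np

def asymmetric_loss(y_true: np.ndarray, y_pred: np.ndarray,
                    under_weight: float = 2.0) -> float:
    """Squared error that punishes under-prediction `under_weight` times harder."""
    err = y_true - y_pred
    w = np.where(err > 0, under_weight, 1.0)  # err > 0 means we predicted too low
    return float(np.mean(w * err ** 2))

y_true = np.array([1.0, -1.0, 0.5])
print(asymmetric_loss(y_true, np.zeros(3)))
```

Libraries such as CatBoost and LightGBM accept custom objectives of this kind, so a domain-specific asymmetry can be pushed directly into training rather than applied only at evaluation time.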

 
I don't even know whether working with ML algorithms can be called a search for patterns; it's more like simple approximation/fitting to the target function.
Can an ML algorithm come up with anything clever?
 
mytarmailS #:
Can an ML algorithm come up with anything clever?

No, it's a database of memorized history. What is a leaf in a tree? 10-20-100-1000 examples/rows from the past, somehow selected as similar. The leaf's answer: for classification - the share of the most frequent class (or simply the most frequent class); for regression - the arithmetic mean of all the values.

Further, if it's a forest, it averages the values of all the trees in the forest. If it's boosting, it sums the values of all the trees (each subsequent tree corrects the sum of all previous trees to get a more accurate answer).
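The ensemble mechanics described above can be verified directly in scikit-learn (illustrative data, not from the thread): a random forest's prediction is the mean over its trees, while gradient boosting accumulates tree outputs stage by stage.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
tree_preds = np.stack([t.predict(X[:3]) for t in forest.estimators_])
# Forest prediction == arithmetic mean of the individual trees
assert np.allclose(tree_preds.mean(axis=0), forest.predict(X[:3]))

boost = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)
# Boosting prediction == the accumulated sum over stages;
# staged_predict exposes the running total after each tree
staged = list(boost.staged_predict(X[:3]))
assert np.allclose(staged[-1], boost.predict(X[:3]))
```

Both assertions pass, confirming the averaging-vs-summing distinction: each forest tree votes independently, while each boosting tree is a correction added to the running sum of its predecessors.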