Machine learning in trading: theory, models, practice and algo-trading - page 2878

 
Maxim Dmitrievsky #:

Well done, I even picked up something interesting for myself about varying window lengths.

If you have any more questions, jot them down and I will ask them after the New Year.

Ok, Happy New Year to all of us!)

 
Aleksey Nikolayev #:

Ok, Happy New Year to all of us :)

Likewise :)

 

I'm not quite sure what you got out of the GPT conversation.

It sometimes answers the wrong question. Here's an example:

Ideally, the algorithm should receive all the available history as input, which obviously grows over time. It should determine for itself which pieces to cut it into and what to do with them.

*Yes, ideally the algorithm should be able to handle any number of features

You asked about varying row lengths, and it answered about varying column lengths.

Practically, the length of the history can be changed by retraining the model. For example, train on 1 day, 3 days, 7 days, 1 month, 2 months, ..., 1 year, 2 years, 3 years, ... and use whichever history length predicts well.
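A minimal sketch of that idea, under stated assumptions: the "model" here is just a rolling mean standing in for whatever is actually retrained, the series is synthetic, and the candidate window lengths are hypothetical. Each candidate length is scored by walk-forward one-step-ahead error over the same evaluation range, and the length with the lowest error is kept.

```python
import random
from statistics import fmean


def walk_forward_error(series, window, start):
    """Mean squared one-step-ahead error of a rolling-mean forecast.

    At each step t the stand-in model is "retrained" on the last
    `window` observations only, then asked to predict series[t].
    """
    errors = []
    for t in range(start, len(series)):
        forecast = fmean(series[t - window:t])
        errors.append((series[t] - forecast) ** 2)
    return fmean(errors)


def best_window(series, candidates, start):
    """Return (best window length, {window: error}) over the candidates."""
    scores = {w: walk_forward_error(series, w, start) for w in candidates}
    return min(scores, key=scores.get), scores


# Synthetic data (an assumption for illustration): the level shifts
# every 200 steps, so very long windows mix old and new regimes.
random.seed(0)
series = [(t // 200) * 5.0 + random.gauss(0, 1) for t in range(1000)]

candidates = [5, 20, 50, 100, 200]   # hypothetical history lengths
start = max(candidates)              # same evaluation range for all lengths
window, scores = best_window(series, candidates, start)

for w in candidates:
    print(f"window={w:4d}  mse={scores[w]:.3f}")
print("selected window:", window)
```

The selected length is itself fitted to the evaluation period, so it should be checked on data that played no part in choosing it.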
 
Aleksey Nikolayev #:

It's a very important question, I'm always thinking about it) Let's just talk about the length of the history used. There should be a reasonable compromise between relevance and length for calculations. The shorter, the more relevant, but the longer, the more accurate the calculations. Sometimes a good compromise is unattainable in principle.

I also started thinking about this a long time ago; IMHO it is one of the most important points in building a working trading system (TS). My own approach is this: I roughly analyse some characteristics of a financial asset over a long known history, find the points where its regime changes (trend, volatility, etc.), and then work from the last change point, assuming that this global characteristic will persist for some time.
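A rough sketch of "work from the last change point", not the poster's actual procedure: the characteristic is taken to be absolute returns (a volatility proxy), a change point is the split that most reduces the squared deviation from segment means, and the most recent segment is split again until no split helps much. The function names, the `min_size` and `min_gain` thresholds, and the synthetic data are all assumptions made for illustration.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(values, min_size):
    """Best single mean-shift split and its relative SSE reduction."""
    total = sse(values)
    best_idx, best_cost = None, total
    for i in range(min_size, len(values) - min_size + 1):
        cost = sse(values[:i]) + sse(values[i:])
        if cost < best_cost:
            best_idx, best_cost = i, cost
    gain = (total - best_cost) / total if total > 0 else 0.0
    return best_idx, gain


def last_change_point(values, min_size=30, min_gain=0.15):
    """Start index of the most recent segment that no split improves much."""
    start = 0
    while len(values) - start >= 2 * min_size:
        idx, gain = best_split(values[start:], min_size)
        if idx is None or gain < min_gain:
            break
        start += idx   # keep only the part after the detected change
    return start


# Synthetic absolute returns: calm regime, then a volatile regime.
rng = random.Random(42)
abs_returns = ([abs(rng.gauss(0, 1)) for _ in range(300)]
               + [abs(rng.gauss(0, 4)) for _ in range(300)])

cp = last_change_point(abs_returns)
recent = abs_returns[cp:]   # history used from here on
print("last change point at index", cp, "- using", len(recent), "points")
```

The scan is quadratic in the history length, which is fine for a few thousand bars; longer histories would need prefix sums or a dedicated change-point library.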

 
elibrarius #:

You asked about varying row lengths, and it answered about varying column lengths.
Practically, the length of the history can be changed by retraining the model. For example, train on 1 day, 3 days, 7 days, 1 month, 2 months, ..., 1 year, 2 years, 3 years, ... and use whichever history length predicts well.

Columns have not been discussed at all yet; that is still a long way off. The confusion arises because it was never stated that the features are prices (bars, renko, etc.). That is, we are talking about a vector of homogeneous features of arbitrary length. Wanting arbitrary feature types on top of an arbitrary vector length would be clear overkill.

 
Aleksey Nikolayev #:

The problem is that a random walk (SB) is quite good at making it look as if there are rules; the catch is that they will differ from one segment of the history to another.

Then, if you think about it, it's not a problem of an arbitrary number of features; it's first of all a problem of feature invariance.

https://homes.esat.kuleuven.be/~tuytelaa/tutorial-ECCV06.pdf
 
Aleksey Nikolayev #:

It's a very important question, I'm always thinking about it) Let's just talk about the length of the history used. There should be a reasonable compromise between relevance and length for calculations. The shorter, the more relevant, but the longer, the more accurate the calculations. Sometimes a good compromise is unattainable in principle.

You need a criterion, and the only criterion is the model fitting error.

Here's a picture


This is a sample of 2000 bars with 43 variables. We see that increasing the number of trees beyond 100 is pointless. I also varied the sample size: the picture does not change above 1500 bars. This means that my predictors contain about 100 patterns for my target (teacher), and all of them can be found in 1500 bars of history. Beyond that, the patterns simply repeat.
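The sample-size check can be sketched as a learning curve. This is only an illustration: the data are synthetic, and a per-bin majority vote stands in for the random forest in the post, so the numbers say nothing about the poster's model. The error is measured on a fixed test set while the model is trained on the most recent n points, and the plateau is taken as the smallest n whose error is within a tolerance of the error at the largest n.

```python
import math
import random


def make_sample(n, rng):
    """Synthetic (x, y) pairs: a periodic rule in x with 20% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if math.sin(6 * math.pi * x) > 0 else 0
        if rng.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data


def fit_bins(train, n_bins=20):
    """Majority label per bin of x (a stand-in model, not a random forest)."""
    votes = [0] * n_bins
    for x, y in train:
        votes[min(int(x * n_bins), n_bins - 1)] += 1 if y else -1
    return [1 if v > 0 else 0 for v in votes]


def error_rate(model, test):
    """Share of test points whose bin label differs from the true label."""
    n_bins = len(model)
    wrong = sum(model[min(int(x * n_bins), n_bins - 1)] != y for x, y in test)
    return wrong / len(test)


def plateau_size(sizes, errors, tol=0.01):
    """Smallest size whose error is within `tol` of the largest-size error."""
    final = errors[-1]
    for size, err in zip(sizes, errors):
        if err - final <= tol:
            return size


rng = random.Random(1)
history = make_sample(2000, rng)   # plays the role of 2000 bars
test = make_sample(2000, rng)      # fixed out-of-sample set

sizes = [50, 100, 250, 500, 1000, 1500, 2000]
errors = [error_rate(fit_bins(history[-n:]), test) for n in sizes]
plateau = plateau_size(sizes, errors)

for n, e in zip(sizes, errors):
    print(f"train size={n:5d}  test error={e:.3f}")
print("error stops improving at about", plateau, "points")
```

A plateau found this way is an average over the whole sample, so it can hide segments where the relationship between predictors and target changes.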

 
mytarmailS #:

Then, if you think about it, it's not a problem of an arbitrary number of features; it's first of all a problem of feature invariance.

https://homes.esat.kuleuven.be/~tuytelaa/tutorial-ECCV06.pdf

If you compare it to picture recognition, it is roughly a matter of finding, for each point, the boundary of the object (blob) in which that point is located.

The problem is that the picture is of extremely poor quality, and it is not quite clear what it actually depicts.

Picking out a small object is simply unrealistic under such conditions, and a large object will be delineated ambiguously.

 
СанСаныч Фоменко #:

You need a criterion, and the only criterion is model fitting error.

Here's a picture


This is a sample of 2000 bars with 43 variables. We see that increasing the number of trees beyond 100 is pointless. I also varied the sample size: the picture does not change above 1500 bars. This means that my predictors contain about 100 patterns for my target (teacher), and all of them can be found in 1500 bars of history. Beyond that, the patterns simply repeat.

1500 bars is the "average temperature across the hospital". There will be break points where the two halves of the history differ greatly, and where it is better simply not to compute or trade anything.

 
Aleksey Nikolayev #:

1500 bars is the "average temperature across the hospital". There will be break points where the two halves of the history differ greatly, and where it is better simply not to compute or trade anything.

There is no maths for break points, so there is nothing to discuss.