Machine learning in trading: theory, models, practice and algo-trading - page 2788

 
СанСаныч Фоменко #:

Nothing can be done just like that, simply after reading textbooks and articles: that is a separate stage, and it is called study. Without systematic knowledge of statistics there is nothing you can do in ML.

You always have to do the work while trying to achieve the goal.

If we take an intermediate goal, the maximum predictive ability of a predictor, then:

1. Removing outliers is mandatory. If values beyond the 0.5% quantile (on each tail) are treated as outliers, then outliers amount to less than 1% of the sample. Incidentally, this is about the percentage of stops that will be triggered in the future. We are developing a trading system, so we have numerical constraints. (A quantile-trimming sketch follows after this list.)

2. Preprocessing is mandatory, but again it depends on which kind. If we are talking about the predictive ability of a predictor, then, for example, you cannot correct away slopes, since they increase the predictive ability. In general, take a particular preprocessing algorithm and evaluate its influence on predictive power; that is the answer.

3. Always keep in mind the meaning of ML, which to my mind is the search for patterns. This is most obvious in RF: how many patterns are contained in, say, 5000 bars? Or from what window size does increasing the number of patterns no longer reduce the error? Or, for some fixed window, after what number of patterns does the error stop falling?
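A minimal sketch of the quantile-based outlier trimming described in point 1 (Python; the 0.5% threshold per tail, the helper name and the column names are illustrative assumptions, not the poster's exact procedure):

```python
import pandas as pd

def trim_outliers(df: pd.DataFrame, cols, q=0.005):
    """Drop rows whose value in any of `cols` lies outside the
    [q, 1 - q] quantile range; for q = 0.005 this removes
    roughly 1% of the rows."""
    mask = pd.Series(True, index=df.index)
    for c in cols:
        lo, hi = df[c].quantile([q, 1 - q])
        mask &= df[c].between(lo, hi)
    return df[mask]

# hypothetical usage on a frame of predictors:
# clean = trim_outliers(features, cols=["ret_1", "ret_5"], q=0.005)
```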

Answers for RF.

1. It makes no sense to increase the window beyond 1500 bars.

2. The relationship between the error and the number of patterns (trees) can be clearly seen on the graph:

The minimum is 50, generally 100 to 200. The graph does not change when the window is increased to 5000.
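A hedged sketch of how such a curve (error vs. number of trees for a fixed window) could be reproduced; `X`, `y`, the 1500-bar window and the use of the out-of-bag error are my assumptions, not the poster's own code:

```python
from sklearn.ensemble import RandomForestClassifier

def error_vs_trees(X, y, window=1500, tree_counts=(10, 50, 100, 200, 400)):
    """Train a random forest on the last `window` bars and record the
    out-of-bag error for different numbers of trees."""
    Xw, yw = X[-window:], y[-window:]
    errors = {}
    for n in tree_counts:
        rf = RandomForestClassifier(n_estimators=n, oob_score=True,
                                    bootstrap=True, random_state=0)
        rf.fit(Xw, yw)
        errors[n] = 1.0 - rf.oob_score_
    return errors

# hypothetical usage:
# print(error_vs_trees(features.values, labels.values))
```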

It is always necessary to clearly formulate the goal and the criterion for achieving the goal. Everything else is blah blah.

I detected outliers with isolation forest and deleted them; the training result did not change. Tried to train on the outliers alone: no change either. I got the impression that the model (CatBoost) does not care about outliers. It is as if they are recognised well through anomaly search, but removing them is not necessary.
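A rough sketch of that experiment, assuming scikit-learn's IsolationForest and the catboost package; the contamination level, iteration count and the accuracy comparison are illustrative choices, not the poster's actual setup:

```python
from sklearn.ensemble import IsolationForest
from catboost import CatBoostClassifier

def compare_with_without_outliers(X, y, contamination=0.01):
    """Flag about 1% of rows as outliers with IsolationForest, then train
    CatBoost on the full sample and on the cleaned sample."""
    iso = IsolationForest(contamination=contamination, random_state=0)
    inliers = iso.fit_predict(X) == 1            # -1 marks outliers

    model_full = CatBoostClassifier(iterations=300, verbose=False, random_seed=0)
    model_full.fit(X, y)

    model_clean = CatBoostClassifier(iterations=300, verbose=False, random_seed=0)
    model_clean.fit(X[inliers], y[inliers])

    return model_full, model_clean

# hypothetical usage, comparing accuracy on a held-out set:
# m_full, m_clean = compare_with_without_outliers(X_train, y_train)
# print(m_full.score(X_test, y_test), m_clean.score(X_test, y_test))
```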
 
Maxim Dmitrievsky #:
I detected outliers with isolation forest and deleted them; the training result did not change. Tried to train on the outliers alone: no change either. I got the impression that the model (CatBoost) does not care about outliers. It is as if they are recognised well through anomaly search, but removing them is not necessary.

The outliers strongly affect the predictive power, and the stability of the predictive power affects the stability of the prediction error.

As for the model itself, it depends on the model, and especially on how the training sample is obtained from the full sample.

 
Aleksey Nikolayev #:

The idea of a local decision tree came to mind. It is something like an analogue of KNN or local regression (so it is also potentially suitable for non-stationarity). The idea is that we keep splitting only the box containing the point of interest (down to at least a given number K of points in it) and do not care about the other boxes. It may be better than KNN or local regression when the boundaries between classes are sharp and the point lies close to such a boundary.

I wonder if the approach makes sense at all.
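A hedged sketch of the idea as I read it (Python); the median split on the widest feature is my simplification, and a real version would presumably use an impurity-based split criterion instead:

```python
import numpy as np
from collections import Counter

def local_tree_predict(X, y, x0, k=20):
    """'Local decision tree': keep splitting only the box that contains
    the query point x0, until at most k training points remain in it,
    then predict by majority vote among those points."""
    idx = np.arange(len(X))
    while len(idx) > k:
        sub = X[idx]
        j = int(np.argmax(sub.max(axis=0) - sub.min(axis=0)))  # widest feature
        thr = np.median(sub[:, j])
        keep = idx[sub[:, j] <= thr] if x0[j] <= thr else idx[sub[:, j] > thr]
        if len(keep) in (0, len(idx)):
            break                      # no useful split left
        idx = keep
    return Counter(y[idx]).most_common(1)[0][0]

# hypothetical usage:
# label = local_tree_predict(X_train, y_train, x_query, k=20)
```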

It seems to me that you are comparing incomparable things: scaling is scaling (even multidimensional, if you like, as long as the distance metric suits you), while noise filtering can be done with derivatives (1st and 2nd). Or you could switch to vector matrices in a completely unsupervised manner, instead of proving the significance of class differences (labelled) through covariance matrices of the labelled data and then exploiting the confirmed significance for classifying whatever interests you...

Hypotheses, gentlemen: hypotheses are not a method of calculation but a subject of proof (or refutation)...

 
JeeyCi #:

It seems to me that you are comparing incomparable things: scaling is scaling (even multidimensional, if you like, as long as the distance metric suits you), while noise filtering can be done with derivatives (1st and 2nd). Or you could switch to vector matrices in a completely unsupervised manner, instead of proving the significance of class differences (labelled) through covariance matrices of the labelled data and then exploiting the confirmed significance for classifying whatever interests you...

Hypotheses, gentlemen: hypotheses are not a method of calculation but a subject of proof (or refutation)...

Didn't understand anything, but very interesting.

 
СанСаныч Фоменко #:

The outliers strongly affect the predictive power, and the stability of the predictive power affects the stability of the prediction error.

As for the model itself, it depends on the model, and especially on how the training sample is obtained from the full sample.

What is the R2 between your method of determining predictive ability and the feature importance from random forest?
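For reference, one hedged way to put a number on that question: fit a simple linear regression between the two importance vectors and take its R2 (the names `imp_custom` and `rf.feature_importances_` below are placeholders, not anything from the thread):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def r2_between_importances(imp_a, imp_b):
    """R^2 of a linear fit between two feature-importance vectors,
    e.g. a custom predictive-ability score vs. RF feature_importances_."""
    a = np.asarray(imp_a).reshape(-1, 1)
    b = np.asarray(imp_b)
    return LinearRegression().fit(a, b).score(a, b)

# hypothetical usage:
# print(r2_between_importances(imp_custom, rf.feature_importances_))
```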

 

Hi all.
I have a question: is it even realistic to use a hash as a predictor?

For example
LlLCmywDpe8dj_j8t8DWwoMjaIhTLnOedRh6KET7R7k

where the target is
1.04.

Does it make sense to somehow convert it to a number or other form?

 
Roman #:

Hi all.
A question came up: is it even realistic to use a hash as a predictor?

like this
LlLCmywDpe8dj_j8t8DWwoMjaIhTLnOedRh6KET7R7k

where the target
1.04

Does it make sense to somehow convert it to a number or other form?

So it is a number in base-256 notation (if the string is ANSI-encoded). Since hashes have a fixed length, you can also represent them as vectors of numbers from 0 to 255.

Do you want to crack bitcoin?)
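A small sketch of that representation (Python); using the ASCII code of each character is the simplest reading of the suggestion, and treating the string as base64url in the commented line is my assumption about its encoding:

```python
import numpy as np

def hash_to_vector(h: str) -> np.ndarray:
    """Turn a fixed-length hash string into a vector of numbers 0..255:
    here simply the ASCII code of each character."""
    return np.frombuffer(h.encode("ascii"), dtype=np.uint8)

# hypothetical usage:
# v = hash_to_vector("LlLCmywDpe8dj_j8t8DWwoMjaIhTLnOedRh6KET7R7k")
# if the string is base64url, the underlying raw bytes can be recovered too:
# import base64
# raw = base64.urlsafe_b64decode("LlLCmywDpe8dj_j8t8DWwoMjaIhTLnOedRh6KET7R7k" + "=")
```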

 
Aleksey Nikolayev #:

So it is a number in base-256 notation (if the string is ANSI-encoded). Since hashes have a fixed length, they can also be represented as vectors of numbers from 0 to 255.

Do you want to crack bitcoin?)

Man, how the string type spoils you: you completely forget about ANSI encoding.
No, not bitcoin, online sweepstakes :))))


 
Evgeni Gavrilovi #:

what is the R2 value between your method of determining predictive ability and feature importance from random forest?

Explained many times.

 
СанСаныч Фоменко #:

Is she sick or something!

Moderator! At least clean this lady out of the forum!

Yes, I have already banned her; it does not help.
True, sometimes she makes off-topic posts, reminiscent of Maxim's posts here in the thread.
But often she is on topic.
For now, just ignore...