Machine learning in trading: theory, models, practice and algo-trading - page 3519

 

The original idea is that if there is some structure in the data labelling and the dataset is profitable, that structure will persist on new data.

Whereas if there is no structure in the labels, then the model is simply fitted to noise.

 

I changed the auto-partitioning method and the entropy decreased compared to the previous approach.

I will run more detailed tests with results later; for now it is hard to compare.

Iteration: 0, Cluster: 5, PE: 0.4486169556211518
R2: 0.9768988376324667
Iteration: 0, Cluster: 10, PE: 0.4669088793243399
R2: 0.9741796468253388
Iteration: 0, Cluster: 2, PE: 0.46459273655864924
R2: 0.9574754821062279
Iteration: 0, Cluster: 9, PE: 0.4617870814003918
R2: 0.9710405592188178
Iteration: 0, Cluster: 7, PE: 0.44973808720127306
R2: 0.9756506928909877
Iteration: 0, Cluster: 14, PE: 0.4659979449727183
R2: 0.9517591162928314
Iteration: 0, Cluster: 11, PE: 0.4459637631384201
R2: 0.9734964538920183
Iteration: 0, Cluster: 1, PE: 0.47443727646415623
R2: 0.969078168370678
Iteration: 0, Cluster: 8, PE: 0.46687563336247373
R2: 0.9791214350912251
Iteration: 0, Cluster: 4, PE: 0.4566470188416623
R2: 0.9812114918306366
Iteration: 0, Cluster: 12, PE: 0.47363779551886176
R2: 0.9336815818695965
Iteration: 0, Cluster: 0, PE: 0.4478528064300205
R2: 0.9741209899697653
Iteration: 0, Cluster: 3, PE: 0.4375103321344003
R2: 0.9036380849528642
Iteration: 0, Cluster: 6, PE: 0.433662587848933
R2: 0.9280739709436694
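
For reference, a minimal sketch of how permutation entropy of a binary label series could be computed. The embedding order and the actual code behind the log above are not shown in the thread, so order=3 here is an assumption:

import numpy as np
from math import factorial
from collections import Counter

def perm_entropy(x, order=3, normalize=True):
    # Normalized permutation entropy of a 1-D series.
    # Ordinal patterns are taken over sliding windows of length `order`;
    # ties (frequent in a binary label series) are broken by position
    # via a stable argsort.
    x = np.asarray(x, dtype=float)
    n = len(x) - order + 1
    patterns = [tuple(np.argsort(x[i:i + order], kind="stable")) for i in range(n)]
    probs = np.array(list(Counter(patterns).values()), dtype=float) / n
    h = -np.sum(probs * np.log2(probs))
    return h / np.log2(factorial(order)) if normalize else h

# Example: entropy of the binary labels in one cluster (cluster_labels is hypothetical)
# print(perm_entropy(cluster_labels, order=3))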
 
The exact same check can be made when labelling on the SB (a random walk) and on the quotes, to determine the difference. The values can be compared because the label series are binary in both cases. The lengths of the labelled series should roughly coincide, as should the parameters of the entropy calculation.

Then it will confirm or refute my long-standing thesis that the problem is not in the features but in the quality of the labelling.
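
A rough sketch of that check, reusing the perm_entropy helper above; label_series() and quotes are hypothetical placeholders for the actual markup procedure and price data:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical: label_series(prices) applies the same markup procedure to any
# price series and returns a binary (0/1) label series.
quote_labels = label_series(quotes)

# Synthetic random walk (SB) of the same length, labelled the same way.
rw_prices = np.cumsum(rng.normal(size=len(quotes)))
rw_labels = label_series(rw_prices)

# Same series length and same entropy parameters, so the values are comparable.
print("PE, labels on quotes:     ", perm_entropy(quote_labels, order=3))
print("PE, labels on random walk:", perm_entropy(rw_labels, order=3))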
 
Maxim Dmitrievsky #:
the problem is not in the features but in the quality of the labelling.

Unfortunately, the problem is in both.

 
Maxim Dmitrievsky #:
You need other datasets, you can do the math on yours.
Permutation Entropy (train): 0.49329517325507943
Permutation Entropy (test): 0.4724266203519978
Permutation Entropy (exam): 0.4889780996100367
 
Aleksey Vyazmikin #:

You have to measure it on the train set before training and see how this metric affects trading on the OOS. When the entropy decreases, the OOS results should improve.

 
Maxim Dmitrievsky #:

You have to measure it on the train set before training and see how this metric affects trading on the OOS. When the entropy decreases, the OOS results should improve.

I took three samples and measured the metric. Or what?

 

A different sample and labelling:

Permutation Entropy (train): 0.7652171231205366
Permutation Entropy (test): 0.7575991534095268
Permutation Entropy (exam): 0.7589746432648261

Isn't this metric affected by class balance?

 
Aleksey Vyazmikin #:

I took three samples and measured the metric. Or what?

Measure the entropy of the labels before training. Then compare it with the trained model's OOS results, using your own evaluation. Lower entropy should mean better trading on the OOS.
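
A sketch of that workflow under stated assumptions: labelling_variants, make_labels, train_model and oos_score are hypothetical placeholders, and perm_entropy is the helper sketched earlier:

# Measure label entropy BEFORE training, then compare with OOS results.
results = []
for variant in labelling_variants:          # hypothetical list of markup variants
    y_train = make_labels(variant)          # binary labels for the train period
    pe = perm_entropy(y_train, order=3)     # entropy of the raw labels
    model = train_model(X_train, y_train)   # any classifier
    results.append((pe, oos_score(model)))  # trading metric on out-of-sample data

# If the thesis holds, lower label entropy should pair with better OOS scores.
for pe, score in sorted(results):
    print(f"PE = {pe:.3f}   OOS = {score:.3f}")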
 

ChatGPT:
Permutation entropy measures the complexity or unpredictability of a time series based on permutations of its values. This indicator is based on the frequency of occurrence of different permutations in the data series.

How permutation entropy depends on class balance is determined by which classes you consider in your time series and how often they occur. If classes occur about equally often in the data, this can lead to a more even distribution of permutations and, as a result, a lower permutation entropy.

However, if one or more classes significantly dominate the others, this may lead to a more uneven distribution of permutations and hence a higher permutation entropy.

Thus, it can be expected that balanced classes may lead to lower permutation entropy, while unbalanced classes may lead to higher permutation entropy. However, this may depend on the specific data and its distribution.
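
Whether class balance affects the metric can be checked directly on synthetic labels; a quick sketch, reusing the perm_entropy helper above (the probabilities are arbitrary):

import numpy as np

rng = np.random.default_rng(42)
n = 10_000

balanced   = rng.binomial(1, 0.50, n)   # roughly 50/50 labels
imbalanced = rng.binomial(1, 0.95, n)   # one class strongly dominates

print("PE, balanced labels:  ", perm_entropy(balanced,   order=3))
print("PE, imbalanced labels:", perm_entropy(imbalanced, order=3))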