Machine learning in trading: theory, models, practice and algo-trading - page 3254

 
Forester #:

Alglib has a correlation calculation function that works on doubles. I think you can just change all the variables to char/uchar and everything will work. There are dozens of other functions it uses that would also have to be reworked. And CMatrixDouble would have to be replaced with dynamic arrays or something else.

//+------------------------------------------------------------------+
//| Pearson product-moment correlation matrix                        |
//| INPUT PARAMETERS:                                                |
//|     X   -   array[N,M], sample matrix:                           |
//|             * J-th column corresponds to J-th variable           |
//|             * I-th row corresponds to I-th observation           |
//|     N   -   N>=0, number of observations:                        |
//|             * if given, only leading N rows of X are used        |
//|             * if not given, automatically determined from input  |
//|               size                                               |
//|     M   -   M>0, number of variables:                            |
//|             * if given, only leading M columns of X are used     |
//|             * if not given, automatically determined from input  |
//|               size                                               |
//| OUTPUT PARAMETERS:                                               |
//|     C   -   array[M,M], correlation matrix (zero if N=0 or N=1)  |
//+------------------------------------------------------------------+
static bool CBaseStat::PearsonCorrM(const CMatrixDouble &cx,const int n,
                                    const int m,CMatrixDouble &c)


And if you have a home-grown program, you will have to do the quantisation yourself too, unless you have a ready-made package that does it.
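For illustration, a minimal Python sketch of that quantisation idea (my own construction, not anyone's actual code): map each double feature linearly onto 256 levels stored as uint8, so the data kept for row-by-row correlation is eight times smaller.

import numpy as np

def quantise_uint8(x, n_levels=256):
    """Map each column of x linearly onto 0..n_levels-1 and store it as uint8."""
    lo = x.min(axis=0)
    hi = x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # avoid division by zero for flat columns
    q = np.floor((x - lo) / span * (n_levels - 1))
    return q.astype(np.uint8)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))                 # hypothetical sample matrix
Xq = quantise_uint8(X)

# The correlation itself is still computed in float, but the stored history is 8x smaller.
print(np.corrcoef(X[:, 0], X[:, 1])[0, 1])
print(np.corrcoef(Xq[:, 0].astype(float), Xq[:, 1].astype(float))[0, 1])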

I think I'm being dense... it is very fast to compute via NumPy ) and slow and memory-hungry via pandas. I'll double-check everything later.
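A small Python sketch of the two routes mentioned (my own illustration, not the poster's script): np.corrcoef over the rows gives the row-by-row matrix in one call, while pandas' DataFrame.corr() is column-oriented, so the data has to be transposed first and, as noted above, tends to be slower and heavier on memory.

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 50))          # 2000 observations (rows) x 50 features

C_np = np.corrcoef(X)                    # 2000 x 2000 row-by-row correlation matrix
C_pd = pd.DataFrame(X.T).corr()          # same matrix: the columns of X.T are the rows of X

print(np.allclose(C_np, C_pd.to_numpy()))   # True - identical result, very different cost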

 
Maxim Dmitrievsky #:

there is no single pattern; patterns are searched for via the corr. matrix

Maybe there's something I don't understand.

 
mytarmailS #:

Maybe there's something I don't understand.

Pattern = a whole set of samples with high correlation between each other.

There can be many such patterns in the entire dataset, each with a different number of coincidences in the history.

Without a matrix, you won't find anything, or you'll pick a fragmented part of it, and I'm counting all possible variants.

So you have to take each row and calculate the correlation with all the other rows, you get a matrix.
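A minimal Python sketch of this idea, under my own assumptions about thresholds and group sizes (not the poster's actual code): build the row-by-row correlation matrix once, then, for every row, collect the rows correlated with it above a threshold; each sufficiently large group is a candidate "pattern".

import numpy as np

def all_patterns(X, threshold=0.5, min_size=5):
    """For every row, collect the rows whose correlation with it exceeds the threshold;
    keep only groups with at least min_size members."""
    C = np.corrcoef(X)                          # N x N matrix: each row against every other row
    groups = []
    for i in range(len(X)):
        members = np.where(C[i] >= threshold)[0]
        if len(members) >= min_size:
            groups.append(members)
    return groups

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 20))                  # hypothetical dataset: 500 rows x 20 features
for g in all_patterns(X)[:5]:
    print(g)                                    # indices of rows forming one candidate pattern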
 
Maxim Dmitrievsky #:

Pattern = entire set of samples with high correlation between each other

There may be many such sets in the entire dataset

Without a matrix you won't find anything, or you'll pick a fragmented part of it, and I'm counting all possible variants.

So you have to take each row and calculate the correlation with all the others, you get a matrix.
I did something similar back in 2015-2016. I took the current situation, for example the last 20-50 bars, searched history for the 20 most similar examples, and drew the average future from those 20 examples. Almost always I got a straight line within +-5 pts. At the time a profit of 5 pts seemed too small to me, on the edge of noise. In the end I switched to ML, hoping it would be bigger. But here it is the same.
In general, it is similar to clustering: here the similarity between the examples themselves is maximised.


Classification/regression in trees maximises the similarity of these examples' futures, and makes their similarity in the past worse.
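A rough Python reconstruction of the procedure described above (my own sketch, with made-up window sizes): correlate the last window of prices with every historical window, take the 20 most similar ones and average what followed them.

import numpy as np

def average_future(prices, window=30, horizon=10, top_k=20):
    """Average the 'future' that followed the top_k windows most correlated with the last one."""
    current = prices[-window:]
    scores = []
    for start in range(len(prices) - window - horizon):
        past = prices[start:start + window]
        scores.append((np.corrcoef(current, past)[0, 1], start))
    best = sorted(scores, reverse=True)[:top_k]
    futures = [prices[s + window:s + window + horizon] - prices[s + window - 1]
               for _, s in best]
    return np.mean(futures, axis=0)

rng = np.random.default_rng(3)
prices = np.cumsum(rng.normal(size=5000))       # synthetic price series
print(average_future(prices))                   # on pure noise this is usually close to a flat line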

 
Forester #:

I also did a similar thing a long time ago; now I'm redoing it with new ideas

 
Maxim Dmitrievsky #:

Pattern = the whole set of samples with high correlation between each other

There may be many such patterns in the entire dataset, each with a different number of matches in history

Without a matrix, you won't find anything, or you'll pick a fragmented part of it, and I'm counting all possible variants.

So you have to take each row and calculate the correlation with all the other rows, you get a matrix.

Suppose we have some three-dimensional data (three features).

A row is an observation, a column is a feature.

The first row is, say, the most recent data.

X
      [,1] [,2] [,3]
 [1,]    1    4    1   <- most recent row
 [2,]    4    1    2
 [3,]    1    2    5
 [4,]    2    5    3
 [5,]    5    3    2
 [6,]    3    2    3
 [7,]    2    3    3
 [8,]    3    3    1
 [9,]    3    1    5
[10,]    1    5    5
[11,]    5    5    2
[12,]    5    2    2
[13,]    2    2    1
[14,]    2    1    5
[15,]    1    5    5
[16,]    5    5    1
[17,]    5    1    1
[18,]    1    1    5
[19,]    1    5    5
[20,]    5    5    2
[21,]    5    2    2
[22,]    2    2    1
[23,]    2    1    4
[24,]    1    4    1
[25,]    4    1    4
[26,]    1    4    3
[27,]    4    3    2
[28,]    3    2    2

We can calculate the correlation of the last/current row (printed first above) with each of the other rows.

             cor
 [1,] 1 4 1  1.0000000
 [2,] 4 1 2 -0.7559289
 [3,] 1 2 5 -0.2773501
 [4,] 2 5 3  0.9449112
 [5,] 5 3 2 -0.1889822
 [6,] 3 2 3 -1.0000000
 [7,] 2 3 3  0.5000000
 [8,] 3 3 1  0.5000000
 [9,] 3 1 5 -0.8660254
[10,] 1 5 5  0.5000000
[11,] 5 5 2  0.5000000
[12,] 5 2 2 -0.5000000
[13,] 2 2 1  0.5000000
[14,] 2 1 5 -0.6933752
[15,] 1 5 5  0.5000000
[16,] 5 5 1  0.5000000
[17,] 5 1 1 -0.5000000
[18,] 1 1 5 -0.5000000
[19,] 1 5 5  0.5000000
[20,] 5 5 2  0.5000000
[21,] 5 2 2 -0.5000000
[22,] 2 2 1  0.5000000
[23,] 2 1 4 -0.7559289
[24,] 1 4 1  1.0000000
[25,] 4 1 4 -1.0000000
[26,] 1 4 3  0.7559289
[27,] 4 3 2  0.0000000
[28,] 3 2 2 -0.5000000

And we get this "similarity pattern" between the last/current row and the history.
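For reference, a short Python sketch that reproduces the column above from the same 28x3 example (assuming Pearson correlation, as in the post):

import numpy as np

X = np.array([
    [1, 4, 1], [4, 1, 2], [1, 2, 5], [2, 5, 3], [5, 3, 2], [3, 2, 3], [2, 3, 3],
    [3, 3, 1], [3, 1, 5], [1, 5, 5], [5, 5, 2], [5, 2, 2], [2, 2, 1], [2, 1, 5],
    [1, 5, 5], [5, 5, 1], [5, 1, 1], [1, 1, 5], [1, 5, 5], [5, 5, 2], [5, 2, 2],
    [2, 2, 1], [2, 1, 4], [1, 4, 1], [4, 1, 4], [1, 4, 3], [4, 3, 2], [3, 2, 2],
], dtype=float)

current = X[0]                                   # the row marked above as the most recent one
cor = np.array([np.corrcoef(current, row)[0, 1] for row in X])
print(np.round(cor, 7))                          # 1.0, -0.7559289, -0.2773501, 0.9449112, ...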

You could do clustering and get something like that too.

                cor    cluster
 [1,] 1 4 1  1.0000000      10
 [2,] 4 1 2 -0.7559289       6
 [3,] 1 2 5 -0.2773501       5
 [4,] 2 5 3  0.9449112      10
 [5,] 5 3 2 -0.1889822       7
 [6,] 3 2 3 -1.0000000       3
 [7,] 2 3 3  0.5000000       1
 [8,] 3 3 1  0.5000000       4
 [9,] 3 1 5 -0.8660254       5
[10,] 1 5 5  0.5000000       1
[11,] 5 5 2  0.5000000       2
[12,] 5 2 2 -0.5000000       9
[13,] 2 2 1  0.5000000       4
[14,] 2 1 5 -0.6933752       5
[15,] 1 5 5  0.5000000       1
[16,] 5 5 1  0.5000000       4
[17,] 5 1 1 -0.5000000       9
[18,] 1 1 5 -0.5000000       5
[19,] 1 5 5  0.5000000       1
[20,] 5 5 2  0.5000000       2
[21,] 5 2 2 -0.5000000       9
[22,] 2 2 1  0.5000000       4
[23,] 2 1 4 -0.7559289       5
[24,] 1 4 1  1.0000000      10
[25,] 4 1 4 -1.0000000       8
[26,] 1 4 3  0.7559289       1
[27,] 4 3 2  0.0000000       7
[28,] 3 2 2 -0.5000000       9
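One possible way to obtain a clustering like the one above (an assumption on my part, the poster did not show the code): hierarchical clustering of the rows using 1 - correlation as the distance. The cluster labels will not necessarily match the numbering in the table.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

X = np.array([[1,4,1],[4,1,2],[1,2,5],[2,5,3],[5,3,2],[3,2,3],[2,3,3],[3,3,1],[3,1,5],
              [1,5,5],[5,5,2],[5,2,2],[2,2,1],[2,1,5],[1,5,5],[5,5,1],[5,1,1],[1,1,5],
              [1,5,5],[5,5,2],[5,2,2],[2,2,1],[2,1,4],[1,4,1],[4,1,4],[1,4,3],[4,3,2],
              [3,2,2]], dtype=float)             # the same 28 x 3 matrix as above

C = np.corrcoef(X)                               # row-by-row correlation matrix
D = np.clip(1.0 - C, 0.0, None)                  # correlation distance, clipped against rounding error
np.fill_diagonal(D, 0.0)
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=10, criterion="maxclust") # ask for 10 clusters, as in the table
print(labels)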


But I don't understand why we need to compute the whole correlation matrix if we only need the picture relative to the current/last row.

            [,1]        [,2]        [,3]        [,4]       [,5]       [,6]       [,7]
 [1,]  1.0000000 -0.75592895 -0.27735010  0.94491118 -0.1889822 -1.0000000  0.5000000
 [2,] -0.7559289  1.00000000 -0.41931393 -0.92857143  0.7857143  0.7559289 -0.9449112
 [3,] -0.2773501 -0.41931393  1.00000000  0.05241424 -0.8910421  0.2773501  0.6933752
 [4,]  0.9449112 -0.92857143  0.05241424  1.00000000 -0.5000000 -0.9449112  0.7559289
 [5,] -0.1889822  0.78571429 -0.89104211 -0.50000000  1.0000000  0.1889822 -0.9449112
 [6,] -1.0000000  0.75592895  0.27735010 -0.94491118  0.1889822  1.0000000 -0.5000000
 [7,]  0.5000000 -0.94491118  0.69337525  0.75592895 -0.9449112 -0.5000000  1.0000000
 [8,]  0.5000000  0.18898224 -0.97072534  0.18898224  0.7559289 -0.5000000 -0.5000000
 [9,] -0.8660254  0.32732684  0.72057669 -0.65465367 -0.3273268  0.8660254  0.0000000
[10,]  0.5000000 -0.94491118  0.69337525  0.75592895 -0.9449112 -0.5000000  1.0000000
[11,]  0.5000000  0.18898224 -0.97072534  0.18898224  0.7559289 -0.5000000 -0.5000000
[12,] -0.5000000  0.94491118 -0.69337525 -0.75592895  0.9449112  0.5000000 -1.0000000
[13,]  0.5000000  0.18898224 -0.97072534  0.18898224  0.7559289 -0.5000000 -0.5000000
[14,] -0.6933752  0.05241424  0.88461538 -0.41931393 -0.5765567  0.6933752  0.2773501
[15,]  0.5000000 -0.94491118  0.69337525  0.75592895 -0.9449112 -0.5000000  1.0000000
[16,]  0.5000000  0.18898224 -0.97072534  0.18898224  0.7559289 -0.5000000 -0.5000000
[17,] -0.5000000  0.94491118 -0.69337525 -0.75592895  0.9449112  0.5000000 -1.0000000
[18,] -0.5000000 -0.18898224  0.97072534 -0.18898224 -0.7559289  0.5000000  0.5000000
[19,]  0.5000000 -0.94491118  0.69337525  0.75592895 -0.9449112 -0.5000000  1.0000000
[20,]  0.5000000  0.18898224 -0.97072534  0.18898224  0.7559289 -0.5000000 -0.5000000
[21,] -0.5000000  0.94491118 -0.69337525 -0.75592895  0.9449112  0.5000000 -1.0000000
[22,]  0.5000000  0.18898224 -0.97072534  0.18898224  0.7559289 -0.5000000 -0.5000000
[23,] -0.7559289  0.14285714  0.83862787 -0.50000000 -0.5000000  0.7559289  0.1889822
[24,]  1.0000000 -0.75592895 -0.27735010  0.94491118 -0.1889822 -1.0000000  0.5000000
[25,] -1.0000000  0.75592895  0.27735010 -0.94491118  0.1889822  1.0000000 -0.5000000
[26,]  0.7559289 -1.00000000  0.41931393  0.92857143 -0.7857143 -0.7559289  0.9449112
[27,]  0.0000000  0.65465367 -0.96076892 -0.32732684  0.9819805  0.0000000 -0.8660254
[28,] -0.5000000  0.94491118 -0.69337525 -0.75592895  0.9449112  0.5000000 -1.0000000
            [,8]       [,9]      [,10]      [,11]      [,12]      [,13]       [,14]
 [1,]  0.5000000 -0.8660254  0.5000000  0.5000000 -0.5000000  0.5000000 -0.69337525
 [2,]  0.1889822  0.3273268 -0.9449112  0.1889822  0.9449112  0.1889822  0.05241424
 [3,] -0.9707253  0.7205767  0.6933752 -0.9707253 -0.6933752 -0.9707253  0.88461538
 [4,]  0.1889822 -0.6546537  0.7559289  0.1889822 -0.7559289  0.1889822 -0.41931393
 [5,]  0.7559289 -0.3273268 -0.9449112  0.7559289  0.9449112  0.7559289 -0.57655666
 [6,] -0.5000000  0.8660254 -0.5000000 -0.5000000  0.5000000 -0.5000000  0.69337525
 [7,] -0.5000000  0.0000000  1.0000000 -0.5000000 -1.0000000 -0.5000000  0.27735010
 [8,]  1.0000000 -0.8660254 -0.5000000  1.0000000  0.5000000  1.0000000 -0.97072534
 [9,] -0.8660254  1.0000000  0.0000000 -0.8660254  0.0000000 -0.8660254  0.96076892
[10,] -0.5000000  0.0000000  1.0000000 -0.5000000 -1.0000000 -0.5000000  0.27735010
[11,]  1.0000000 -0.8660254 -0.5000000  1.0000000  0.5000000  1.0000000 -0.97072534
[12,]  0.5000000  0.0000000 -1.0000000  0.5000000  1.0000000  0.5000000 -0.27735010
[13,]  1.0000000 -0.8660254 -0.5000000  1.0000000  0.5000000  1.0000000 -0.97072534
[14,] -0.9707253  0.9607689  0.2773501 -0.9707253 -0.2773501 -0.9707253  1.00000000
[15,] -0.5000000  0.0000000  1.0000000 -0.5000000 -1.0000000 -0.5000000  0.27735010
[16,]  1.0000000 -0.8660254 -0.5000000  1.0000000  0.5000000  1.0000000 -0.97072534
[17,]  0.5000000  0.0000000 -1.0000000  0.5000000  1.0000000  0.5000000 -0.27735010
[18,] -1.0000000  0.8660254  0.5000000 -1.0000000 -0.5000000 -1.0000000  0.97072534
[19,] -0.5000000  0.0000000  1.0000000 -0.5000000 -1.0000000 -0.5000000  0.27735010
[20,]  1.0000000 -0.8660254 -0.5000000  1.0000000  0.5000000  1.0000000 -0.97072534
[21,]  0.5000000  0.0000000 -1.0000000  0.5000000  1.0000000  0.5000000 -0.27735010
[22,]  1.0000000 -0.8660254 -0.5000000  1.0000000  0.5000000  1.0000000 -0.97072534
[23,] -0.9449112  0.9819805  0.1889822 -0.9449112 -0.1889822 -0.9449112  0.99587059
[24,]  0.5000000 -0.8660254  0.5000000  0.5000000 -0.5000000  0.5000000 -0.69337525
[25,] -0.5000000  0.8660254 -0.5000000 -0.5000000  0.5000000 -0.5000000  0.69337525
[26,] -0.1889822 -0.3273268  0.9449112 -0.1889822 -0.9449112 -0.1889822 -0.05241424
[27,]  0.8660254 -0.5000000 -0.8660254  0.8660254  0.8660254  0.8660254 -0.72057669
[28,]  0.5000000  0.0000000 -1.0000000  0.5000000  1.0000000  0.5000000 -0.27735010
           [,15]      [,16]      [,17]      [,18]      [,19]      [,20]      [,21]
 [1,]  0.5000000  0.5000000 -0.5000000 -0.5000000  0.5000000  0.5000000 -0.5000000
 [2,] -0.9449112  0.1889822  0.9449112 -0.1889822 -0.9449112  0.1889822  0.9449112
 [3,]  0.6933752 -0.9707253 -0.6933752  0.9707253  0.6933752 -0.9707253 -0.6933752
 [4,]  0.7559289  0.1889822 -0.7559289 -0.1889822  0.7559289  0.1889822 -0.7559289
 [5,] -0.9449112  0.7559289  0.9449112 -0.7559289 -0.9449112  0.7559289  0.9449112
 [6,] -0.5000000 -0.5000000  0.5000000  0.5000000 -0.5000000 -0.5000000  0.5000000
 [7,]  1.0000000 -0.5000000 -1.0000000  0.5000000  1.0000000 -0.5000000 -1.0000000
 [8,] -0.5000000  1.0000000  0.5000000 -1.0000000 -0.5000000  1.0000000  0.5000000
 [9,]  0.0000000 -0.8660254  0.0000000  0.8660254  0.0000000 -0.8660254  0.0000000
[10,]  1.0000000 -0.5000000 -1.0000000  0.5000000  1.0000000 -0.5000000 -1.0000000
[11,] -0.5000000  1.0000000  0.5000000 -1.0000000 -0.5000000  1.0000000  0.5000000
[12,] -1.0000000  0.5000000  1.0000000 -0.5000000 -1.0000000  0.5000000  1.0000000
[13,] -0.5000000  1.0000000  0.5000000 -1.0000000 -0.5000000  1.0000000  0.5000000
[14,]  0.2773501 -0.9707253 -0.2773501  0.9707253  0.2773501 -0.9707253 -0.2773501
[15,]  1.0000000 -0.5000000 -1.0000000  0.5000000  1.0000000 -0.5000000 -1.0000000
[16,] -0.5000000  1.0000000  0.5000000 -1.0000000 -0.5000000  1.0000000  0.5000000
[17,] -1.0000000  0.5000000  1.0000000 -0.5000000 -1.0000000  0.5000000  1.0000000
[18,]  0.5000000 -1.0000000 -0.5000000  1.0000000  0.5000000 -1.0000000 -0.5000000
[19,]  1.0000000 -0.5000000 -1.0000000  0.5000000  1.0000000 -0.5000000 -1.0000000
[20,] -0.5000000  1.0000000  0.5000000 -1.0000000 -0.5000000  1.0000000  0.5000000
[21,] -1.0000000  0.5000000  1.0000000 -0.5000000 -1.0000000  0.5000000  1.0000000
[22,] -0.5000000  1.0000000  0.5000000 -1.0000000 -0.5000000  1.0000000  0.5000000
[23,]  0.1889822 -0.9449112 -0.1889822  0.9449112  0.1889822 -0.9449112 -0.1889822
[24,]  0.5000000  0.5000000 -0.5000000 -0.5000000  0.5000000  0.5000000 -0.5000000
[25,] -0.5000000 -0.5000000  0.5000000  0.5000000 -0.5000000 -0.5000000  0.5000000
[26,]  0.9449112 -0.1889822 -0.9449112  0.1889822  0.9449112 -0.1889822 -0.9449112
[27,] -0.8660254  0.8660254  0.8660254 -0.8660254 -0.8660254  0.8660254  0.8660254
[28,] -1.0000000  0.5000000  1.0000000 -0.5000000 -1.0000000  0.5000000  1.0000000
           [,22]      [,23]      [,24]      [,25]       [,26]      [,27]      [,28]
 [1,]  0.5000000 -0.7559289  1.0000000 -1.0000000  0.75592895  0.0000000 -0.5000000
 [2,]  0.1889822  0.1428571 -0.7559289  0.7559289 -1.00000000  0.6546537  0.9449112
 [3,] -0.9707253  0.8386279 -0.2773501  0.2773501  0.41931393 -0.9607689 -0.6933752
 [4,]  0.1889822 -0.5000000  0.9449112 -0.9449112  0.92857143 -0.3273268 -0.7559289
 [5,]  0.7559289 -0.5000000 -0.1889822  0.1889822 -0.78571429  0.9819805  0.9449112
 [6,] -0.5000000  0.7559289 -1.0000000  1.0000000 -0.75592895  0.0000000  0.5000000
 [7,] -0.5000000  0.1889822  0.5000000 -0.5000000  0.94491118 -0.8660254 -1.0000000
 [8,]  1.0000000 -0.9449112  0.5000000 -0.5000000 -0.18898224  0.8660254  0.5000000
 [9,] -0.8660254  0.9819805 -0.8660254  0.8660254 -0.32732684 -0.5000000  0.0000000
[10,] -0.5000000  0.1889822  0.5000000 -0.5000000  0.94491118 -0.8660254 -1.0000000
[11,]  1.0000000 -0.9449112  0.5000000 -0.5000000 -0.18898224  0.8660254  0.5000000
[12,]  0.5000000 -0.1889822 -0.5000000  0.5000000 -0.94491118  0.8660254  1.0000000
[13,]  1.0000000 -0.9449112  0.5000000 -0.5000000 -0.18898224  0.8660254  0.5000000
[14,] -0.9707253  0.9958706 -0.6933752  0.6933752 -0.05241424 -0.7205767 -0.2773501
[15,] -0.5000000  0.1889822  0.5000000 -0.5000000  0.94491118 -0.8660254 -1.0000000
[16,]  1.0000000 -0.9449112  0.5000000 -0.5000000 -0.18898224  0.8660254  0.5000000
[17,]  0.5000000 -0.1889822 -0.5000000  0.5000000 -0.94491118  0.8660254  1.0000000
[18,] -1.0000000  0.9449112 -0.5000000  0.5000000  0.18898224 -0.8660254 -0.5000000
[19,] -0.5000000  0.1889822  0.5000000 -0.5000000  0.94491118 -0.8660254 -1.0000000
[20,]  1.0000000 -0.9449112  0.5000000 -0.5000000 -0.18898224  0.8660254  0.5000000
[21,]  0.5000000 -0.1889822 -0.5000000  0.5000000 -0.94491118  0.8660254  1.0000000
[22,]  1.0000000 -0.9449112  0.5000000 -0.5000000 -0.18898224  0.8660254  0.5000000
[23,] -0.9449112  1.0000000 -0.7559289  0.7559289 -0.14285714 -0.6546537 -0.1889822
[24,]  0.5000000 -0.7559289  1.0000000 -1.0000000  0.75592895  0.0000000 -0.5000000
[25,] -0.5000000  0.7559289 -1.0000000  1.0000000 -0.75592895  0.0000000  0.5000000
[26,] -0.1889822 -0.1428571  0.7559289 -0.7559289  1.00000000 -0.6546537 -0.9449112
[27,]  0.8660254 -0.6546537  0.0000000  0.0000000 -0.65465367  1.0000000  0.8660254
[28,]  0.5000000 -0.1889822 -0.5000000  0.5000000 -0.94491118  0.8660254  1.0000000

What is the deep idea here?

Is it that we will find all patterns at once? Do we need all patterns? Or do we need the one that corresponds to the current situation and the last observation?

 
mytarmailS #:

Suppose we have some three-dimensional data (three features).

A row is an observation, a column is a feature.

The first row is, say, the most recent data.

You can calculate the correlation from the last row to each of the other rows.

And we get this "similarity pattern" between the last/current row and the history

You can do clustering and get something like that too.


but I don't understand why we need to compute the whole correlation matrix if we only need the picture relative to the current/last row.

What is the deep idea here?

Is it that we will find all patterns at once? Do we need all of them, or just the one that corresponds to the current situation and the last observation?

There's no current, it's just a history search.

Then you sort the patterns by your metrics and hard-code the best ones into the bot.
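A hedged illustration of "sort the patterns by your metrics" - the metric used here (mean outcome after the pattern's member rows) is purely my own assumption for the example:

import numpy as np

def rank_patterns(patterns, outcomes):
    """patterns: list of arrays of row indices; outcomes: per-row result (e.g. future return)."""
    scored = [(float(np.mean(outcomes[p])), p) for p in patterns]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored                               # best-scoring patterns first

rng = np.random.default_rng(4)
outcomes = rng.normal(size=500)                 # hypothetical outcome after each row
patterns = [rng.choice(500, size=20, replace=False) for _ in range(10)]
for score, members in rank_patterns(patterns, outcomes)[:3]:
    print(round(score, 4), members[:5], "...")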

 
Maxim Dmitrievsky #:

there's no current, it's just a history search

Well, you can search for one pattern at a time almost for free in terms of RAM, so why would you want to see all the patterns across the whole history if at any given moment you can only be in one pattern, not in all of them at once...


Or do I not understand something?

 
mytarmailS #:

Well you can search one pattern at a time for almost free in terms of RAM, why would you want to see all the patterns in the whole history if at any given moment you can only be in one pattern, not all of them...


Or do I not understand something?

you still need to go through all of them and choose the best ones to check with the new data.

#32456
Machine learning in trading: theory, models, practice and algo-trading - If the size of additions to a position depends on the current drawdown, then no trading system works.
  • 2023.09.21
  • www.mql5.com
A correlation matrix between the rows of the given features. Statistics are then gathered over all those rows: what happened in the future, on average. In the tester we look for the correlation of the current values with the reference. But I do it in Python and compute the correlation for all possible pairs at once.
 
Maxim Dmitrievsky #:

you still need to go through all of them and pick the best ones to check with the new data.

#32456
a correlation matrix between the rows of the given features; then the most correlated rows are selected


You build a (large) correlation matrix between the feature rows and then select the most correlated rows? Are those like patterns?
