Machine learning in trading: theory, models, practice and algo-trading - page 3253

 
I won't be able to do it that quickly; closer to this evening, or another day.
 
Maxim Dmitrievsky #:
I won't be able to do it that quickly; closer to this evening, or another day.
The weather is good))))
 
Memory overflow on small TFs. Memory overflows even with 16 GB of RAM and a 30 GB swap file (swap on a Mac). Well, there is a 50k by 50k correlation matrix, for example.

Pandas and NumPy crash; they are not designed for data this large. I'll try dask. Or filter the history.

In short, either ML doesn't cope on ordinary hardware, or this approach doesn't.
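For scale: a 50k by 50k matrix is about 20 GB in float64 on its own, and np.corrcoef builds full-size float64 temporaries on top of that. A minimal sketch of one workaround, assuming the raw data fits in RAM as float32 and the result is memory-mapped to disk block by block (the file name and block size are illustrative):

import numpy as np

def corr_blocked(X: np.ndarray, out_path: str = "corr.npy", block: int = 2000):
    """Pearson correlation of the columns of X, computed block by block.
    The m x m result is memory-mapped to disk, so RAM holds only one
    block of rows of the output at a time."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)          # standardise columns
    n, m = Xs.shape
    C = np.lib.format.open_memmap(out_path, mode="w+",
                                  dtype=np.float32, shape=(m, m))
    for i in range(0, m, block):
        C[i:i + block] = (Xs[:, i:i + block].T @ Xs) / n
    return C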
 
Maxim Dmitrievsky #:
Memory overflow on small TFs. Memory overflows even with 16 GB of RAM and a 30 GB swap file (swap on a Mac). There is a 50k by 50k correlation matrix, for example.

Pandas and NumPy crash; they are not designed for data this large. I'll try dask. Or filter the history.

In short, either ML doesn't cope on ordinary hardware, or this approach doesn't.
Are you doing quantisation? The main purpose of quantisation is to reduce data size: a 4-byte float becomes a 1-byte uchar or char, so a 16 GB matrix becomes a 4 GB one.

And all calculations are done in RAM - you need to add more of it. Memory is inexpensive nowadays.
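In Python terms (since pandas/numpy is what overflows above), a minimal sketch of that idea, assuming a plain linear mapping into one byte is acceptable; the function names are illustrative:

import numpy as np

def quantise_uint8(x: np.ndarray):
    """Map float values to 0..255 codes; keep scale/offset to invert."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0                   # guard constant input
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantise(codes, scale, lo):
    return codes.astype(np.float32) * scale + lo

x = np.cumsum(np.random.randn(50_000)).astype(np.float32)
codes, scale, lo = quantise_uint8(x)
print(x.nbytes, codes.nbytes)                          # 200000 vs 50000: 4x smaller

The round trip loses at most half a quantisation step per value, i.e. (hi - lo) / 510.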

 
Forester #:
Are you doing quantisation? The main purpose of quantisation is to reduce data size: a 4-byte float becomes a 1-byte uchar or char, so a 16 GB matrix becomes a 4 GB one.

And all calculations are done in RAM - you need to add more of it. Memory is inexpensive nowadays.

I don't know how to calculate the correlation afterwards.

It's not that easy to add memory to a MacBook.) It is still extremely inefficient for time series; I need to redo it somehow.

Especially since I'll drop down another TF and will need 5 times more resources.

Would it be efficient to calculate it through SQL?

 
Maxim Dmitrievsky #:

I don't know how to calculate the correlation afterwards.

It's not that easy to add memory to a MacBook.) It is still extremely inefficient for time series; I need to redo it somehow.

Especially since I'll drop down another TF and will need 5 times more resources.

ALGLIB has a correlation calculation function for double. I think you can just change all the variables to char/uchar and everything will work. There are dozens of other functions in use that would need to be redone too. And you would have to move from CMatrixDouble to dynamic arrays or something else.

//+------------------------------------------------------------------+
//| Pearson product-moment correlation matrix                        |
//| INPUT PARAMETERS:                                                |
//|     X   -   array[N,M], sample matrix:                           |
//|             * J-th column corresponds to J-th variable           |
//|             * I-th row corresponds to I-th observation           |
//|     N   -   N>=0, number of observations:                        |
//|             * if given, only leading N rows of X are used        |
//|             * if not given, automatically determined from input  |
//|               size                                               |
//|     M   -   M>0, number of variables:                            |
//|             * if given, only leading M columns of X are used     |
//|             * if not given, automatically determined from input  |
//|               size                                               |
//| OUTPUT PARAMETERS:                                               |
//|     C   -   array[M,M], correlation matrix (zero if N=0 or N=1)  |
//+------------------------------------------------------------------+
static bool CBaseStat::PearsonCorrM(const CMatrixDouble &cx,const int n,
                                    const int m,CMatrixDouble &c)


And if your program is self-written, you will have to do the quantisation yourself too, unless there is a ready-made package that does it.
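Worth making explicit why this is safe for correlation specifically: the Pearson coefficient is invariant under positive affine transforms, so a linear 1-byte quantisation changes the result only by rounding error. A quick illustrative check:

import numpy as np

def q(v):
    # linear 1-byte quantisation, as sketched above
    lo, hi = float(v.min()), float(v.max())
    return np.round((v - lo) * (255.0 / (hi - lo))).astype(np.uint8)

x = np.random.randn(10_000).astype(np.float32)
y = 0.5 * x + 0.3 * np.random.randn(10_000).astype(np.float32)
r_float = np.corrcoef(x, y)[0, 1]
r_uint8 = np.corrcoef(q(x).astype(np.float32), q(y).astype(np.float32))[0, 1]
print(r_float, r_uint8)   # typically agree to about 3 decimal places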

 
Maxim Dmitrievsky #:
Would it be efficient to calculate it through SQL?
I don't know
 
Maxim Dmitrievsky #:
Memory overflow on small TFs. Memory overflows even with 16 GB of RAM and a 30 GB swap file (swap on a Mac). There is a 50k by 50k correlation matrix, for example.

Pandas and NumPy crash; they are not designed for data this large. I'll try dask. Or filter the history.

In short, either ML doesn't cope on ordinary hardware, or this approach doesn't.

Why do you even need a correlation matrix?

There is a pattern, there is an array of history to compare the pattern with, what's the problem?

 
mytarmailS #:

Why do you even need a correlation matrix?

There is a pattern, there is an array of history to compare the pattern with, what's the problem?

There is no pattern; patterns are found via the correlation matrix.

 
Maxim Dmitrievsky #:

Would it be efficient to calculate it through SQL?

Never in my life.

....

Try Apache Arrow or DuckDB.

But RAM is still the fastest way.
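If the out-of-core route gets tried anyway, a minimal DuckDB sketch; the Parquet file and column names here are hypothetical. DuckDB streams the file, so RAM usage stays bounded:

import duckdb

# 'quotes.parquet' with columns sym_a, sym_b is assumed for illustration
r = duckdb.sql("SELECT corr(sym_a, sym_b) FROM 'quotes.parquet'").fetchone()
print(r[0])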

.....

The problem itself is being solved in a G... ugly way; your real problem is the cor. matrix, which is not needed.
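To make that point concrete: correlating one pattern against every window of the history needs no NxN matrix at all. A minimal sketch, assuming NumPy 1.20+ for sliding_window_view; the names are illustrative:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_corr(history: np.ndarray, pattern: np.ndarray) -> np.ndarray:
    """Pearson correlation of `pattern` with every same-length window
    of `history`; memory is O(n * w) instead of O(n^2)."""
    w = len(pattern)
    wins = sliding_window_view(history, w)              # zero-copy view
    wins = wins - wins.mean(axis=1, keepdims=True)      # this step copies
    p = pattern - pattern.mean()
    num = wins @ p
    den = np.sqrt((wins ** 2).sum(axis=1) * (p ** 2).sum())
    return num / den

scores = rolling_corr(np.cumsum(np.random.randn(100_000)),
                      np.sin(np.linspace(0, 3, 50)))
print(scores.argmax(), scores.max())                    # best-matching window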