Machine learning in trading: theory, models, practice and algo-trading - page 3256
It must be some peculiarity of Python, because the algorithm is the same in MQL.
This is the brute-force variant. The sieve is even faster.
Let's assume one million bars and a pattern length of 10. Then a 1-D array of 10 million double values is 80 MB (10,000,000 × 8 bytes; p.3). Well, let it be 500 MB in terms of memory consumption. What haven't I taken into account?
The correlation matrix of all rows with all rows is computed many times faster than nested loops (one row with every other row) and even a single loop (one row with all rows). There is some algorithmic speed-up in there. I checked it on the ALGLIB version of the correlation calculation.
Give me the code, let's check it.
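A minimal sketch of such a check, assuming the CBaseStat API from statistics.mqh quoted below; the include path, the data sizes, the MathRand() filler and the x[i].Set(j,v) access to CMatrixDouble are assumptions, not code from the thread:

#include <Math\Alglib\statistics.mqh>
//+------------------------------------------------------------------+
//| Rough timing: one PearsonCorrM call vs. pairwise PearsonCorr2.   |
//| ALGLIB correlates COLUMNS (variables), so each pattern is stored |
//| as a column of x: n observations per pattern, m patterns.        |
//+------------------------------------------------------------------+
void OnStart()
  {
   int n=100;    // pattern length (observations)
   int m=2000;   // number of patterns (variables)
   CMatrixDouble x;
   x.Resize(n,m);
   for(int i=0; i<n; i++)
      for(int j=0; j<m; j++)
         x[i].Set(j,MathRand()/32767.0);
//--- 1) all columns against all columns in a single call
   uint t0=GetTickCount();
   CMatrixDouble c;
   CBaseStat::PearsonCorrM(x,n,m,c);
   Print("PearsonCorrM:      ",GetTickCount()-t0," ms");
//--- 2) the same upper triangle with pairwise PearsonCorr2 calls
   t0=GetTickCount();
   double a[],b[],sum=0;
   ArrayResize(a,n);
   ArrayResize(b,n);
   for(int j1=0; j1<m; j1++)
     {
      for(int i=0; i<n; i++)
         a[i]=x[i][j1];
      for(int j2=j1+1; j2<m; j2++)
        {
         for(int i=0; i<n; i++)
            b[i]=x[i][j2];
         sum+=CBaseStat::PearsonCorr2(a,b,n);
        }
     }
   Print("PearsonCorr2 loop: ",GetTickCount()-t0," ms (checksum ",sum,")");
  }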
MathAbs() seems unnecessary to me.
I was baffled myself that no Python library could calculate it, which left me confused.
Pandas overflows the RAM; the overhead is gigantic.
NumPy just crashes and kills the interpreter session :) without displaying any errors.
The functions in statistics.mqh:
PearsonCorrM - correlation of all rows with all rows; the fastest.
//+------------------------------------------------------------------+
//| Pearson product-moment correlation matrix                        |
//| INPUT PARAMETERS:                                                |
//|   X - array[N,M], sample matrix:                                 |
//|       * J-th column corresponds to J-th variable                 |
//|       * I-th row corresponds to I-th observation                 |
//|   N - N>=0, number of observations:                              |
//|       * if given, only leading N rows of X are used              |
//|       * if not given, automatically determined from input size   |
//|   M - M>0, number of variables:                                  |
//|       * if given, only leading M columns of X are used           |
//|       * if not given, automatically determined from input size   |
//| OUTPUT PARAMETERS:                                               |
//|   C - array[M,M], correlation matrix (zero if N=0 or N=1)        |
//+------------------------------------------------------------------+
static bool CBaseStat::PearsonCorrM(const CMatrixDouble &cx,const int n,
                                    const int m,CMatrixDouble &c)
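For reference, a minimal call of PearsonCorrM as a sanity check on hand-made data; the include path and the Resize()/Set() access to CMatrixDouble are assumptions:

#include <Math\Alglib\statistics.mqh>
void OnStart()
  {
//--- 4 observations, 2 variables; column 1 is 2*column 0,
//--- so their Pearson correlation must be exactly 1.0
   CMatrixDouble x;
   x.Resize(4,2);
   double v[4]={1,2,3,4};
   for(int i=0; i<4; i++)
     {
      x[i].Set(0,v[i]);
      x[i].Set(1,2*v[i]);
     }
   CMatrixDouble c;
   if(CBaseStat::PearsonCorrM(x,4,2,c))
      Print("c[0][1]=",c[0][1]);   // expected: 1.0
  }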
PearsonCorr2 - correlation of one row with another. For a full matrix: the 1st row is compared with all rows after the 1st, the 2nd row with all rows after the 2nd, and so on.
//+------------------------------------------------------------------+
//| Pearson product-moment correlation coefficient                   |
//| Input parameters:                                                |
//|   X - sample 1 (array indexes: [0..N-1])                         |
//|   Y - sample 2 (array indexes: [0..N-1])                         |
//|   N - N>=0, sample size:                                         |
//|       * if given, only N leading elements of X/Y are processed   |
//|       * if not given, automatically determined from input sizes  |
//| Result:                                                          |
//|   Pearson product-moment correlation coefficient                 |
//|   (zero for N=0 or N=1)                                          |
//+------------------------------------------------------------------+
static double CBaseStat::PearsonCorr2(const double &cx[],const double &cy[],
                                      const int n)
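A sketch of that triangular pass on plain arrays; the toy series, the sliding-window pattern extraction and the 0.9 threshold are illustration assumptions, not code from the thread:

#include <Math\Alglib\statistics.mqh>
void OnStart()
  {
   int bars=1000;   // toy series length
   int len=10;      // pattern length
   double series[];
   ArrayResize(series,bars);
   for(int i=0; i<bars; i++)
      series[i]=MathRand()/32767.0;
   int count=bars-len+1;        // number of overlapping patterns
   double a[],b[];
   ArrayResize(a,len);
   ArrayResize(b,len);
//--- row i is compared only with rows i+1..count-1:
//--- the upper triangle is enough, corr(i,j)==corr(j,i)
   for(int i=0; i<count; i++)
     {
      ArrayCopy(a,series,0,i,len);
      for(int j=i+1; j<count; j++)
        {
         ArrayCopy(b,series,0,j,len);
         double r=CBaseStat::PearsonCorr2(a,b,len);
         if(r>0.9)
            PrintFormat("patterns %d and %d: r=%.3f",i,j,r);
        }
     }
  }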
And through PearsonCorrM2 you can pass the full matrix as one argument and the row to be checked as the other, so you can compare one row with all rows at once. But there is obviously redundant work: for the 10th row, the correlations with the rows above it have already been calculated.
static bool CBaseStat::PearsonCorrM2(const CMatrixDouble &cx,const CMatrixDouble &cy,
                                     const int n,const int m1,const int m2,
                                     CMatrixDouble &c)
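A sketch of that "one row against all at once" call; the first line of the signature above is reconstructed from the ALGLIB port, and the sizes and matrix access are again assumptions. The pattern to check goes into cy as a single column, so the result c is an m1×1 column of correlations:

#include <Math\Alglib\statistics.mqh>
void OnStart()
  {
   int n=10;    // pattern length (observations, rows)
   int m1=500;  // stored patterns (columns of cx)
   CMatrixDouble cx;
   cx.Resize(n,m1);
   for(int i=0; i<n; i++)
      for(int j=0; j<m1; j++)
         cx[i].Set(j,MathRand()/32767.0);
//--- the single pattern to check, as a one-column matrix
   CMatrixDouble cy;
   cy.Resize(n,1);
   for(int i=0; i<n; i++)
      cy[i].Set(0,MathRand()/32767.0);
   CMatrixDouble c;   // result: m1 x 1 column of correlations
   if(CBaseStat::PearsonCorrM2(cx,cy,n,m1,1,c))
      Print("corr of stored pattern 0 with the new one: ",c[0][0]);
  }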
A 20k*20k NumPy matrix is 400 million double values, and 400 million doubles weigh about 3 GB.
You could also check the signs separately. But that's not the point.
It's understandable that there isn't enough memory for all this joy: a full all-rows correlation matrix for a million rows would be 10^12 doubles, on the order of 8 TB.