Dependency statistics in quotes (information theory, correlation and other feature selection methods) - page 16
I have also thought through a further course of research (I'm writing for those who are interested, not for those who swear in the language of formulas). If we have volatility dependence (on the nearest lags as well as cyclic, 24-hour for H1), then why not compute the same mutual information for the absolute values of returns (that will be pure volatility) and subtract it from the corresponding amount of information for the signed returns? If everything is calculated correctly, what remains is the dependence of the signs. This result can then be compared with a noise time series.
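The proposal above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the poster's actual calculation: the names `quantile_bins`, `mutual_information` and `lagged_information` are hypothetical, the estimator is the simple plug-in one on equal-count bins, and the data is a synthetic series with independent signs and switching volatility. The signed symbol is built as (sign, volatility bin), so the signed estimate can never come out below the absolute one.

```python
import math
import random
from collections import Counter

def quantile_bins(values, n_bins):
    """Assign each value an equal-count bin index 0..n_bins-1 by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def mutual_information(a, b):
    """Plug-in estimate of I(A;B) in bits for two equally long symbol sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())

def lagged_information(returns, lag, n_bins=4):
    """Return (I_signed, I_abs) between returns at t and t + lag (lag >= 1)."""
    abs_sym = quantile_bins([abs(r) for r in returns], n_bins)
    # Signed symbol = (sign, volatility bin): a refinement of the absolute symbol.
    signed_sym = [(r >= 0, b) for r, b in zip(returns, abs_sym)]
    i_signed = mutual_information(signed_sym[:-lag], signed_sym[lag:])
    i_abs = mutual_information(abs_sym[:-lag], abs_sym[lag:])
    return i_signed, i_abs

# Synthetic stand-in for H1 returns (an assumption for illustration):
# independent signs, volatility switching between calm and turbulent
# blocks of 50 bars.
random.seed(0)
returns = [random.gauss(0.0, 2.0 if (t // 50) % 2 else 0.5) for t in range(5000)]

i_signed, i_abs = lagged_information(returns, lag=1)
residual = i_signed - i_abs  # what is left after subtracting pure volatility
print(f"signed: {i_signed:.4f}  abs: {i_abs:.4f}  residual: {residual:.4f}")
```

On a series like this one, where the signs are independent by construction, the residual should be small but not exactly zero, because the plug-in estimate is biased upwards and the bias grows with the number of symbols. That is one reason to compare the residual with the same quantity computed on a noise or shuffled series rather than with zero.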
Even if something non-trivial is found, there will always be the question of whether the formulas were applied correctly and, most importantly, how to use the result in practice. So the interest is purely academic)))
But could you, Alexey, formulate more clearly (using your table) which hypothesis about the distribution of returns the chi-square estimate corresponds to?
The trivial "Brownian" one, or something cooler?
None. The chi-square test of independence makes no hypothesis about the distributions. In other words, it is a non-parametric test.
What do you mean none?
Can you write down the dependence to be estimated?
Maybe after the formulas I'll get the idea. Or are you hoping for a uniform distribution...
;)
No, none at all, I'm telling you the truth. Hypotheses non fingo.
Have you ever tried applying the chi-square test of independence? A few months ago I didn't know how to do it myself, but I sat down and did it. Try it, there's nothing difficult about it. Find a mathematical statistics textbook written for some non-specialist college and read it. The simpler and clearer the description of the method, the faster you will understand it.
Actually, there are several chi-square tests. I mean the one that tests the independence of two variables. It does not rely on any distributions given a priori. It only tests the hypothesis of independence of two variables at a given confidence level (usually 0.95 or 0.99, i.e. a significance level of 0.05 or 0.01). The closer the confidence level is to 1, the more reliable a conclusion that the variables are dependent.
The idea behind the test is the usual formula for the joint probability of two variables. In plain terms: if P(X=x1 && Y=y1) = P(X=x1)*P(Y=y1) for all admissible x1, y1, then X and Y are independent, and vice versa. Roughly speaking, the chi-square statistic is a weighted sum of squared deviations from this equality over all possible cases, which is compared with a critical value. If the sum exceeds the critical value, the null hypothesis of independence is rejected. If it is smaller, the null hypothesis is not rejected.
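To make the description concrete, here is a minimal sketch of the test of independence on a made-up 2x2 example. The function name `chi_square_independence` and the toy data are my own assumptions for illustration, and the critical value 3.841 is the tabulated 0.95 quantile of the chi-square distribution with 1 degree of freedom.

```python
from collections import Counter

def chi_square_independence(xs, ys):
    """Return (statistic, degrees_of_freedom) for paired categorical samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed counts for each (x, y) cell
    row = Counter(xs)             # marginal counts of X
    col = Counter(ys)             # marginal counts of Y
    stat = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n  # count implied by P(x) * P(y)
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof

# Hypothetical toy data: direction of one bar (xs) and of the next bar (ys).
xs = ["up"] * 50 + ["down"] * 50
ys = ["up"] * 30 + ["down"] * 20 + ["up"] * 20 + ["down"] * 30

stat, dof = chi_square_independence(xs, ys)
CRITICAL_95_DOF1 = 3.841  # tabulated chi-square quantile, 1 degree of freedom
reject_independence = stat > CRITICAL_95_DOF1
print(stat, dof, reject_independence)
```

Note that nothing in the calculation refers to the shape of the distribution of X or Y. Only the observed cell counts and the counts implied by the product of the marginals are used, which is why no distribution hypothesis is needed.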
Don't be ridiculous...
You were asked about the distribution hypothesis, and you answer that you only learned about this method yesterday.
I insist on knowing: what is the null hypothesis? That they're independent?
The null hypothesis is "the returns are independent". Nothing funny about it, really!
I did not test any hypothesis about the distributions! That is a different chi-square test. I only tested for dependence!
If you want to test the distribution, be my guest. It is Laplace to a decent accuracy.
Okay.
Let's take a look.
---
Is the hypothesis of independence the same as the hypothesis of a uniform distribution or a normal distribution?
That's what I want to find out.
Then with "Laplace-like" it all makes sense.
Can I put it in my own way?
So, the chosen approach shows that dependencies exist. The most obvious and plausible one, visible to the naked eye, is the daily periodicity of volatility.
Therefore, the next logical step in the research, as I see it, would be to try to remove this obvious and very strong dependence from the data and see whether our (your) method still shows other dependencies.
As a way of removing it, I propose simply dividing the increments by the daily volatility profile.
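One possible reading of this proposal, as a sketch only: take the mean absolute increment for each hour of the day as the "daily volatility profile" and divide every increment by the value for its hour. The function names and the toy data are hypothetical, and the profile here is estimated on the same sample it is applied to, which is an assumption of this example and not something stated in the thread.

```python
from collections import defaultdict

def hourly_volatility_profile(hours, returns):
    """Mean absolute return for each hour of day present in the data."""
    total = defaultdict(float)
    count = defaultdict(int)
    for h, r in zip(hours, returns):
        total[h] += abs(r)
        count[h] += 1
    return {h: total[h] / count[h] for h in total}

def normalize_by_profile(hours, returns):
    """Divide every return by the typical volatility of its hour."""
    profile = hourly_volatility_profile(hours, returns)
    # An hour whose returns are all zero has a zero profile; keep zeros there.
    return [r / profile[h] if profile[h] > 0 else 0.0
            for h, r in zip(hours, returns)]

# Hypothetical toy data: two days, two hourly bars per day.
hours = [9, 15, 9, 15]
returns = [0.1, -0.4, -0.3, 0.8]

normalized = normalize_by_profile(hours, returns)
print(normalized)
```

After this step the mean absolute value of the normalized increments is 1 for every hour, so the 24-hour cycle in volatility is removed by construction. Dependence on the nearest lags, if there is any, is left in the data for the method to detect.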