Dependency statistics in quotes (information theory, correlation and other feature selection methods)

 
joo:
... and in what combination?
It's more complicated than that. Much more complicated, and computationally expensive. Let me put it this way: selecting a set of informative variables is easy; removing redundant (mutually informative) variables is more difficult; and selecting the pairs, triples... of variables whose combinations influence the target variable is exponentially more difficult, primarily because of the enormous amount of computation involved.
 
alexeymosc:
Selecting a set of informative variables is easy; removing redundant (mutually informative) variables is more difficult; and selecting the pairs, triples... of variables whose combinations influence the target variable is exponentially more difficult, primarily because of the enormous amount of computation involved.
And how is the search carried out? Not by brute force?
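
To make the combinatorics concrete: exhaustive search over triples of 50 candidate variables already means C(50, 3) = 19,600 subsets to evaluate, and the count explodes with the tuple size. A common compromise (not necessarily what the posters here use) is greedy forward selection that scores each candidate by its mutual information with the target minus its redundancy with the variables already chosen, as in the mRMR family of methods. A minimal sketch, assuming discretised inputs; all names below are illustrative:

```python
from math import comb

import numpy as np
from sklearn.metrics import mutual_info_score  # MI between two discrete series

# Exhaustive search over k-tuples of n candidates is C(n, k) evaluations:
print(comb(50, 3), comb(50, 5))  # 19600, 2118760 -- it explodes quickly

def greedy_mi_selection(X, y, k):
    """mRMR-style forward selection over discretised columns of X.

    Each candidate is scored by MI(candidate, target) minus its mean MI
    with the columns already selected (relevance minus redundancy).
    """
    selected, candidates = [], list(range(X.shape[1]))
    while len(selected) < k and candidates:
        def score(j):
            relevance = mutual_info_score(X[:, j], y)
            redundancy = (np.mean([mutual_info_score(X[:, j], X[:, s])
                                   for s in selected])
                          if selected else 0.0)
            return relevance - redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

This is linear in the number of candidates per step instead of exponential, at the cost of missing combinations that are only informative jointly.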
 

It is very strange that the resulting distribution for GARCH(1,1) looks normal. More to the point, it simply cannot be: the trademark of such models is fat tails and excess kurtosis, put there precisely to imitate real market distributions. Apparently the resulting chart is simply not representative, or the volatility memory (P=1, Q=1) is too short, which is why the tails look thin.
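
For reference, a minimal GARCH(p, q) simulation in Python with illustrative parameter values (nothing here is taken from the posts above): even with conditionally normal innovations, the unconditional distribution of the simulated returns shows positive excess kurtosis, i.e. exactly the fat tails mentioned above.

```python
import numpy as np
from scipy.stats import kurtosis

def simulate_garch(omega, alphas, betas, n, seed=0):
    """Simulate n GARCH(p, q) returns with conditionally normal innovations:
    sigma2[t] = omega + sum_i alphas[i]*eps[t-1-i]**2 + sum_j betas[j]*sigma2[t-1-j]
    """
    rng = np.random.default_rng(seed)
    m = max(len(alphas), len(betas))                   # warm-up length
    uncond = omega / (1.0 - sum(alphas) - sum(betas))  # unconditional variance
    eps = np.zeros(n + m)
    sigma2 = np.full(n + m, uncond)
    eps[:m] = np.sqrt(uncond) * rng.standard_normal(m)
    for t in range(m, n + m):
        sigma2[t] = (omega
                     + sum(a * eps[t - 1 - i] ** 2 for i, a in enumerate(alphas))
                     + sum(b * sigma2[t - 1 - j] for j, b in enumerate(betas)))
        eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return eps[m:]

r = simulate_garch(omega=0.1, alphas=[0.15], betas=[0.80], n=100_000)
print(kurtosis(r))  # excess kurtosis > 0: fat tails, not a normal distribution
```

The same function simulates the GARCH(20, 20) series discussed below by passing 20 alphas and 20 betas.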

But another thing is interesting:

The trace of the GARCH(1,1) influence is clearly visible on the calculated graph: there is a significant perturbation of the "relationship" at the first lag and uncertainty at all the others. This is exactly how it should be, because the model only remembers the volatility of the previous bar. I am sure that for GARCH(3,3) the first three lags would stand out clearly, for GARCH(20,20) the first twenty lags, and so on.

I'll try to wrestle with MATLAB and obtain GARCH(20,20) data. If their analysis shows correlation over 20 lags, the matter is clear: the formula is showing the correlation of volatility.

 
C-4:


I'll try to wrestle with MATLAB and obtain GARCH(20,20) data. If their analysis shows correlation over 20 lags, the matter is clear: the formula is showing the correlation of volatility.

It doesn't. I already know that the formula takes THIS into account... Take a look at the 5-minute chart: the obvious relationship of volatility at the closest lags and at lag 288 (288 five-minute bars = one day) is the daily cycle. If you want to, go ahead, though. I'll check it.

We are trying to find the "other dependencies", because mutual information absorbs all possible kinds of dependence; we must be able to separate them.

 

EURUSD H1.

I on the original series (same discretisation into 5 quantiles):

Sum of mutual information: 3.57 bits! The highest value of all the timeframes tested.

Now let's take returns ^ 2, get rid of the sign, and study volatility:

It looks similar. But the sum is I = 5.35 bits.

Makes sense! After all, the uncertainty of the pure volatility series is lower.

And what happens if you subtract one from the other?
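
A minimal sketch of the procedure as I read it from this post, with placeholder data: discretise the series into 5 quantile bins and sum the mutual information between the binned series and each of its lags. The lag count and the data source are assumptions.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def quantile_bins(x, n_bins=5):
    """Discretise a series into n_bins equal-frequency (quantile) bins."""
    inner_edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(x, inner_edges)

def mi_sum(x, max_lag=500, n_bins=5):
    """Sum of mutual information (in bits) between the binned series
    and each of its first max_lag lags."""
    b = quantile_bins(np.asarray(x), n_bins)
    # mutual_info_score returns nats; divide by ln 2 to get bits
    return sum(mutual_info_score(b[lag:], b[:-lag]) / np.log(2)
               for lag in range(1, max_lag + 1))

# returns = np.diff(np.log(close))  # placeholder: EURUSD H1 closes
# print(mi_sum(returns))            # original (signed) series
# print(mi_sum(returns ** 2))       # squared returns: volatility only
```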

 
alexeymosc: This is what a frequency matrix looks like (1st lag is the target variable) for random data with 5-minute characteristics.

Bloody hell. Your matrices struck me as some breakthrough, and then there was this "logit". I looked it up in a search engine to find out what it was all about, and then realised it was just a probability and its logarithm.

P.S. Pe-e-e-ople, is everything in these tables clear to you? If you don't understand, ask away. No one will beat you up for silly questions (I feel a bit of a dummy here myself).

For once there is a decent topic with almost no humour, sniping, or focus on getting a fish right away instead of a fishing rod; instead there is a very interesting process of searching for the truth...

Where in econometrics can one find such questions?

 
anonymous:


True, the market is more complex. But that is no reason to ignore the observed phenomenon

I don't do scientific research on the market. The specific objective is to make a prediction one step ahead.

About the tests: heteroscedasticity is a generally accepted fact in the literature

That is a slogan along the lines of "somebody saw something somewhere". Reading these publications does not make me any more money.

To be precise, heteroscedasticity tests are applied not only to returns, which can be calculated using different formulas, but also to model residuals, which is the standard in econometrics packages.

Sometimes heteroscedasticity tests are applied to predictors and model errors.

If by "model errors" you mean the residual from the model = the difference between the original quote and the model, then I agree. And heteroscedasticity tests are applied not sometimes, but always. If there is heteroscedasticity in the residual from the model, then it is modelled, and the ultimate goal of the aggregate model is to obtain a stationary residual (mo and variance are a constant). If mo and/or variance are variables - then prediction is not possible, as prediction error becomes a variable.

 
Mathemat:

For once there's a decent topic...

I would like to point out that all the posts that questioned the decency of the topic were ignored.

 
Do not generate random walks (SB) based on GARCH. You need to take a real series and generate a random walk based on real volatility. I posted a script here https://forum.mql4.com/ru/41986/page10 which replaces the offline history of a real instrument with a random walk using tick volumes. Such a random walk reproduces the real volatility almost 100%. GARCH and the like do not take many nuances into account, such as the different wave cycles and many others. If there is any difference between this random-walk series and the series it is generated from, that is all the more interesting :)
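
The linked script is for MQL4; purely as an illustration of the idea (not a reproduction of that script), here is one way to build such a surrogate in Python: keep the real bar-to-bar magnitudes, i.e. the real volatility, and randomise only the signs.

```python
import numpy as np

def volatility_preserving_rw(close, seed=0):
    """Random-walk surrogate of a price series: real |increments|, random signs.

    Keeps the instrument's actual volatility profile while destroying any
    directional (sign) dependence.
    """
    rng = np.random.default_rng(seed)
    close = np.asarray(close, dtype=float)
    increments = np.abs(np.diff(close))
    signs = rng.choice([-1.0, 1.0], size=increments.size)
    return np.concatenate(([close[0]], close[0] + np.cumsum(signs * increments)))

# Any dependence present in the real series but absent from the surrogate
# cannot be explained by volatility alone.
```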
 
alexeymosc:

EURUSD H1.

I on the original series (same discretisation into 5 quantiles):

Do I remember correctly that the raw data here is the absolute value of the percentage increment?

But if so, it is in fact the same volatility (i.e. a monotonic, single-valued function of it), so we can expect all volatility-related effects to show up here too, albeit in a somewhat filtered form. And since the effects of volatility seem to far outweigh all other market phenomena, the prospect of seeing "something else" against their background looks rather problematic. I repeat, I think it is more promising to try to consistently exclude known but "useless" effects from the raw data.

By the way, Alexey (Mathemat), is your raw data also in the form of absolute values?
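
One hedged sketch of the "exclude known effects" suggestion above (my reading, not the poster's recipe): standardise returns by a rolling volatility estimate and re-run the lag analysis on the result; whatever dependence survives is not a pure volatility effect. The window length is arbitrary.

```python
import numpy as np
import pandas as pd

def standardise_by_vol(returns, window=50):
    """Divide returns by a rolling std estimate to strip the volatility effect."""
    r = pd.Series(np.asarray(returns, dtype=float))
    vol = r.rolling(window).std().shift(1)  # shift: no look-ahead into r[t]
    return (r / vol).dropna().to_numpy()

# z = standardise_by_vol(returns)  # then rerun the MI-by-lag analysis on z:
# any dependence left in z cannot be explained by volatility alone
```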