Dependency statistics in quotes (information theory, correlation and other feature selection methods) - page 2

 
alexeymosc:

Firstly, the cyclicality is not on the daily chart but on the hourly one! I mentioned that there, by the way.

And for daily charts the result will not be cyclical, you are right.

Pardon me, let's repeat it for the hourly chart.

The original chart for 120 hours.

I don't see cyclicality in the graph, but the trend is there. Let's check for normality:

By Jarque-Bera, it is not normal at all. Check the ACF:

There is a trend and no cyclicality - different result.

If there is a trend, there is no point in doing statistical analysis yet. Let's detrend with the same Hodrick-Prescott filter:

The residual is white noise. Look at the cycles in it:


Of course there is a wave, but it is not solid and not nearly as clean as yours. I think the whole difference is in the detrending. Without removing the trend component, no statistics can be done.
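The detrending step can be sketched in a few lines. This is a minimal illustration on made-up data, not the poster's actual calculation; the smoothing parameter `lam` is a free choice (the value below is just a common convention), and the series is a synthetic trend plus a 24-hour wave:

```python
import numpy as np

def hp_filter(y, lam=129_600.0):
    """Hodrick-Prescott filter: split y into trend + cycle by
    solving (I + lam * D'D) trend = y, D = second-difference operator."""
    n = len(y)
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i], D[i, i + 1], D[i, i + 2] = 1.0, -2.0, 1.0
    trend = np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)
    return trend, y - trend

# made-up series: linear trend + a 24-hour wave + noise, 120 "hours"
rng = np.random.default_rng(0)
t = np.arange(120)
y = 0.05 * t + np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(120)
trend, cycle = hp_filter(y)
# the cycle (detrended residual) is what gets examined for periodicity
```

With a large `lam` the trend is nearly a straight line, so the 24-hour wave stays in the residual.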

 
You are doing something of your own. No connection at all to what I do ))) Let's start with the fact that I work with a series of increments. Then, if you take the absolute values of this series (i.e., drop the pluses and minuses) and build an autocorrelogram, I bet you get a nice cyclicity with period 24. That is logically closer to what I am doing.
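That bet is easy to test on synthetic data. A minimal sketch, assuming (hypothetically) that hourly volatility varies with the hour of day: the ACF of the raw increments shows nothing, while the ACF of their absolute values peaks at lag 24:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function."""
    x = np.asarray(x, float) - np.mean(x)
    d = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / d
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
hours = np.arange(24 * 500) % 24
# hypothetical intraday pattern: volatility depends on the hour of day
sigma = 1.0 + 0.8 * np.sin(2 * np.pi * hours / 24)
r = sigma * rng.standard_normal(len(hours))

acf_raw = acf(r, 48)          # raw increments: nothing visible
acf_abs = acf(np.abs(r), 48)  # absolute increments: a peak at lag 24
```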
 
Yesterday I added an excerpt on information theory to the Habr article. It may help in understanding the process of finding important variables.
 
alexeymosc:
You are doing something of your own. No connection at all to what I do ))) Let's start with the fact that I work with a series of increments. Then, if you take the absolute values of this series (i.e., drop the pluses and minuses) and build an autocorrelogram, I'll bet you get a nice cyclicity with period 24. That is logically closer to what I am doing.

Whatever you say. I calculate the increments as the difference between each successive value and the previous one. I get this graph:

For these increments I calculate the ACF:

Please note the last column is the probability of no correlation between the bars.

I take the square of the increments. Here is the graph:

These are volatility peaks; what does the cyclicality of increments have to do with it? Perhaps the cyclicality of volatility? That is interesting too. Let's check the squared increments for cyclicality:

Well, there is no cyclicality here, and note the last column: an extremely high probability of no correlation.

Two other figures are interesting. Let's check for normality of the increments:

Note that according to Jarque-Bera the probability of normality is equal to zero!

What kind of distribution is this? I wish it were normal. I have always found questionable the idea of working with increments computed as the difference between the subsequent value and the previous one.
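For reference, the Jarque-Bera statistic used above is simple to compute by hand. This is a generic sketch on synthetic data, not the poster's actual test run; the 5.99 cutoff is the 5% critical value of the chi-squared distribution with 2 degrees of freedom:

```python
import numpy as np

def jarque_bera(x):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4);
    asymptotically chi-squared with 2 degrees of freedom."""
    x = np.asarray(x, float)
    n = len(x)
    d = x - x.mean()
    s2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / s2 ** 1.5
    kurt = np.mean(d ** 4) / s2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

CHI2_2_CRIT_5PCT = 5.99  # 5% critical value of chi-squared(2)

rng = np.random.default_rng(2)
normal_sample = rng.standard_normal(5000)
fat_tailed = rng.standard_t(df=3, size=5000)  # heavy tails, return-like

jb_norm = jarque_bera(normal_sample)  # compare with the critical value
jb_fat = jarque_bera(fat_tailed)      # far above it: normality rejected
```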

Finally: for some reason I cannot reproduce your result.

 

faa1947, your calculations have nothing to do with the average information flow the topic starter was talking about. You are processing data for the last 5 days, while Alexey's graph is the result of processing hourly data over a dozen years. Alexey's is a statistic; yours is a single, isolated case that proves nothing in the context of the discussion.

The periodicity shown by the topic starter is not directly related to volatility or returns. It is not a price periodicity, but an in-for-ma-tion-al one. On the abscissa is the lag, and on the ordinate is the average mutual information in bits. And the autocorrelogram was mentioned by Alexey only to confuse everyone :) It is not the autocorrelation of returns! We are not talking about that at all, because these information dependencies are mostly non-linear, and they cannot be detected by an ACF of returns at all.

Have you read the Habr article carefully? It has nothing to do with your beloved stationarity, nor with the normality of the return stream, nor even with the conditional periodicity of volatility. Of course, it would be nice to check for stationarity here as well, but it would be stationarity of a very different, informational kind (if it exists at all).

2 Avals: I'm afraid I cannot find a deep tick history to test your volatility hypothesis directly. Besides, the calculations here would be completely insane in volume (they are already quite voluminous). We will judge what has been found by direct prediction attempts (if it works at all, of course; there are many, many pitfalls).
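The construction described above (lag on the abscissa, average mutual information in bits on the ordinate) can be sketched roughly as follows. This is a crude plug-in estimator on synthetic data with a hypothetical 24-hour volatility pattern, not Alexey's actual calculation over a dozen years of hours; the bin count and sample size are arbitrary choices:

```python
import numpy as np

def mutual_info_bits(x, y, bins=10):
    """Plug-in estimate of average mutual information I(X;Y) in bits.
    Rank-transform first so equal-width bins act as quantile bins."""
    rx = np.argsort(np.argsort(x)) / len(x)
    ry = np.argsort(np.argsort(y)) / len(y)
    c, _, _ = np.histogram2d(rx, ry, bins=bins)
    p = c / c.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

rng = np.random.default_rng(3)
# synthetic "hourly" increments: volatility repeats every 24 steps
hours = np.arange(24 * 1000) % 24
r = (1.0 + 0.8 * np.sin(2 * np.pi * hours / 24)) * rng.standard_normal(len(hours))

# average mutual information between the series and its lagged copy
mi = [mutual_info_bits(r[:-k], r[k:]) for k in range(1, 49)]
# lag 24 (index 23) carries visibly more information than, say, lag 7
```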

 
Mathemat:

Alexey's is a statistic, while yours is a single, isolated case, which proves nothing in the context of the discussion.

I just want to note that once the number of observations exceeds 30, the t-statistic converges to the z-statistic. It is big news to me that 10,000 observations are necessarily better than 1,000. To reveal weekly periodicity in hourly data you need several weeks' worth of hours. But that is beside the point.
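The convergence of the t-statistic to the z-statistic can be illustrated by Monte Carlo, using the representation t = Z / sqrt(chi2_df / df); the quantiles below are estimates, and the rep count is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(4)

def t_quantile_mc(df, q=0.975, reps=200_000):
    """Monte Carlo estimate of a Student-t quantile via
    t = Z / sqrt(chi2_df / df)."""
    z = rng.standard_normal(reps)
    chi2 = rng.chisquare(df, reps)
    return float(np.quantile(z / np.sqrt(chi2 / df), q))

t5 = t_quantile_mc(5)        # roughly 2.57: far from the normal 1.96
t30 = t_quantile_mc(30)      # roughly 2.04: already close
t1000 = t_quantile_mc(1000)  # practically indistinguishable from 1.96
```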


The periodicity shown by the topic starter has nothing to do with volatility or returns. It is not a price periodicity at all, but an in-for-ma-tion-al one.

Much more important is the methodological value of the approach. It is axiomatic to me that any mathematical calculation must have a qualitative economic interpretation. "Information periodicity" is some formula that reveals a periodicity in the data, which is in essence a relationship between increments. Going back, we must be able to return to the original time series, find those places and find an economic explanation; that is, going back to prices is mandatory, otherwise it is just another piece of mathematical cleverness. That is why I was linking this topic to regular cycles.
 
Mathemat: This is not autocorrelation of returns! We are not talking about it at all, as these information dependencies are mostly non-linear and are not detectable by an ACF of returns at all.

Actually, the usual methods of mathematical statistics were applied at the end of the article.

To make up for my misunderstanding, I take the ratio of neighbouring prices.

Graph of the price ratio:

Check for normality

Surprisingly, normality is strictly rejected.

We plot the ACF, i.e. the dependencies between the lags, together with the partial ACF, which is cleaned of the dependencies already captured by the ACF:

Note the last column - very high probability of no dependencies.

I have a clear economic explanation for these pictures, well supported by the quote chart. How is your result confirmed on the original quotes, and what is its economic justification? Without answers to these questions I cannot understand the meaning of "information dependence".
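For reference, the ACF/PACF pair used above can be computed from scratch. The sketch below uses the Durbin-Levinson recursion on a synthetic AR(1) series (the coefficient 0.6 is arbitrary), where the ACF decays geometrically while the PACF cuts off after lag 1:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function."""
    x = np.asarray(x, float) - np.mean(x)
    d = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / d
                     for k in range(max_lag + 1)])

def pacf(x, max_lag):
    """Partial ACF via the Durbin-Levinson recursion: the lag-k
    correlation with lags 1..k-1 partialled out."""
    r = acf(x, max_lag)
    phi = np.zeros((max_lag + 1, max_lag + 1))
    out = np.zeros(max_lag + 1)
    out[0] = 1.0
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[k - 1, j] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[k - 1, j] * r[j] for j in range(1, k))
        phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        out[k] = phi[k, k]
    return out

# synthetic AR(1): ACF decays geometrically, PACF cuts off after lag 1
rng = np.random.default_rng(5)
x = np.zeros(20_000)
for t in range(1, len(x)):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()

a, p = acf(x, 10), pacf(x, 10)
```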

 
The easiest answer for you is this. You are using autocorrelation, i.e. you are looking only for linear dependencies. Mutual information indicates the presence of dependencies of an arbitrary kind, and that is where all the difference comes from. Also, I experimented with statistically redundant samples of thousands and tens of thousands of increments, while you took one week. That week could be anything; it is a special case. There is no significance in your results.
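The difference between autocorrelation and mutual information is easy to demonstrate. In the sketch below (synthetic data, a hypothetical purely quadratic link), the correlation between x and y is essentially zero while the mutual information is clearly positive:

```python
import numpy as np

def mutual_info_bits(x, y, bins=12):
    """Plug-in estimate of mutual information I(X;Y) in bits
    from a 2-D histogram."""
    c, _, _ = np.histogram2d(x, y, bins=bins)
    p = c / c.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

rng = np.random.default_rng(6)
x = rng.standard_normal(50_000)
y = x ** 2 + 0.1 * rng.standard_normal(50_000)  # purely non-linear link

corr = float(np.corrcoef(x, y)[0, 1])  # near zero: correlation sees nothing
mi = mutual_info_bits(x, y)            # clearly positive: MI sees the link
```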
 
faa1947: Information periodicity is some formula that reveals a periodicity in the data, which is in essence a relationship between increments.

Fundamentally wrong. There is no question of any periodicity in the data in the form of a relationship between increments.

What is revealed is an information dependence, which does not have to lead to any periodicity in the ratio of increments at all. That is the point of Data Mining: it makes it possible to identify structures that do not lie on the surface.

Going back, we must be able to return to the original time series, find those places and find an economic explanation; that is, going back to prices is mandatory, otherwise it is just another piece of mathematical cleverness. That is why I linked this topic to regular cycles.

Yes, we should, I'm not arguing. But there does not have to be an economic explanation; it is enough to go back to prices. Your linking of this phenomenon to regular cycles, however, is wrong. I am not so blind as to miss the lack of pronounced periodicity on the chart.

Alexey has already told you about the difference between linear and non-linear dependencies.

 
alexeymosc:
The easiest answer for you is this. You are using autocorrelation, i.e. you are looking only for linear dependencies. Mutual information indicates the presence of dependencies of an arbitrary kind, and that is where all the difference comes from. Also, I experimented with statistically redundant samples of thousands and tens of thousands of increments, while you took one week. That week could be anything; it is a special case. There is no significance in your results.


It seems to me that increasing the sample size is of interest only within the central limit theorem, i.e. convergence to the normal law. I must disappoint you: if we do not set ourselves such a goal, simply increasing the sample size gives nothing. Below I increase the sample 10-fold.
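One thing a larger sample does change is the width of the significance band around a true-zero ACF, which shrinks as 1.96/sqrt(N); a minimal sketch on independent synthetic data:

```python
import numpy as np

rng = np.random.default_rng(7)

def acf_lag1(x):
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

n_small, n_big = 1_000, 10_000
band_small = 1.96 / np.sqrt(n_small)  # approx 95% band, ~0.062
band_big = 1.96 / np.sqrt(n_big)      # ~0.020: sqrt(10) times tighter
r_small = acf_lag1(rng.standard_normal(n_small))
r_big = acf_lag1(rng.standard_normal(n_big))
# independent data: both estimates hover near zero, but the threshold
# for calling a correlation "significant" tightens with N
```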

The graph of increments as the ratio of the next price to the previous one:

The square of this graph:

The graph is somewhat similar to yours. I had a question about the economic interpretation of this graph, but you did not give an answer.


Next:


If you compare with a sample 10 times smaller, nothing has changed!



Something new here: the probability of no relationship is zero.


Mutual information indicates the presence of dependencies of an arbitrary kind, hence all the difference.

I would also be careful with "linearity" and "non-linearity", because this question can and must be posed within the framework of the model with which you approximate the time series. By analysing the coefficients of this model you can conclude whether these coefficients are constants (or almost constants), deterministic functions, or stochastic functions. That is a perfectly concrete and constructive procedure for analysing the type of dependencies. And what is constructive about discovering this "information dependence"? And again, how do you see it on the original time series?