Dependency statistics in quotes (information theory, correlation and other feature selection methods)

 
TheXpert:

How can they be discrete if you are working with relative increments?

And the second question: how many symbols are there? :)


We discretize them. There are two main schemes: quantile binning (which equalizes the PDF, i.e. every symbol becomes equally frequent) and equal-width binning (the resulting PDF closely follows that of the raw data).

The number of symbols (the alphabet size) is chosen by the researcher.
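A minimal sketch of the two discretization schemes, for readers who want to try them (this is not code from the thread). The synthetic random-walk prices, the alphabet size k = 5 and all function names are illustrative assumptions. Quantile edges split the sorted increments into equal-frequency bins; equal-width edges split the [min, max] range into intervals of the same length, so the symbol counts keep the bell shape of the raw distribution.

```python
import bisect
import random
from collections import Counter


def relative_increments(prices):
    """Relative increments (simple returns): p[t] / p[t-1] - 1."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def quantile_edges(values, k):
    """Inner bin edges at the empirical i/k quantiles, i = 1..k-1 (equal-frequency bins)."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]


def equal_width_edges(values, k):
    """Inner bin edges that split [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(1, k)]


def discretize(values, edges):
    """Map each value to a symbol 0..len(edges): the number of edges <= value."""
    return [bisect.bisect_right(edges, v) for v in values]


# Synthetic price series (an assumption, standing in for real quotes)
random.seed(1)
prices = [100.0]
for _ in range(2000):
    prices.append(prices[-1] * (1.0 + random.gauss(0.0, 0.01)))

r = relative_increments(prices)  # 2000 relative increments
k = 5                            # alphabet size, chosen by the researcher (assumed here)
q = discretize(r, quantile_edges(r, k))
w = discretize(r, equal_width_edges(r, k))

print("quantile:   ", sorted(Counter(q).items()))  # each symbol appears 400 times
print("equal width:", sorted(Counter(w).items()))  # counts concentrated in the middle symbols
```

The quantile scheme maximizes the entropy of the symbol stream for a given k, while equal width preserves information about where the bulk of the data and the tails lie; which one suits a given study is the researcher's call.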

 
Mathemat:

For me, in this task, TI (information theory) is primarily a data-mining tool. What to do with this data is another matter. The important thing is that we do see something that is not visible to the naked eye. And which other sciences are you talking about?

When I open the "data mining" tab in the STATISTICS package, I see about 20 section names and individual procedures. All of this is fully in line with the textbooks and monographs in the field, but there is nothing about TI for data mining.

 
alexeymosc:
Evidently, in our interpretation of the process, these are discrete values of returns.

If you leave out "economic and other meanings", then what processes are we talking about? A process is a "physical" phenomenon; it has causes and consequences, for example an apple falling on Newton's head. Applied to markets, it is the process of buying and selling. Where is any of this in the market?

Next point. Probability theory, on which information theory is based, requires the events (or symbols) under consideration to be independent; otherwise applying this mathematical apparatus is incorrect. Where does that independence come from? Suppose that, with speculative intent, I buy some shares (I mean the real market, not a brokerage house), and a return occurs in prices. Some time later I decide to sell those shares, and another return occurs. These two events are clearly linked to each other through me and my speculative intentions. Since there are many fools like me in the market, all buying and selling the same way, all returns turn out to be linked, i.e. dependent. So why apply a mathematical apparatus built for independent events to dependent ones? Is that correct?

On this point, nothing is obvious at all.

 
faa1947:

When I open the "data mining" tab in the STATISTICS package, I see about 20 section names and individual procedures. All of this is fully in line with the textbooks and monographs in the field, but there is nothing about TI for data mining.


That is a shortcoming of STATISTICS. I use the package myself, by the way.
 
alexeymosc:

We discretize them. There are two main schemes: quantile binning (which equalizes the PDF, i.e. every symbol becomes equally frequent) and equal-width binning (the resulting PDF closely follows that of the raw data).

The number of symbols (the alphabet size) is chosen by the researcher.

In other words, if we don't know the market's alphabet, we make one up ourselves and then study exactly that.

I could be wrong, of course, and I often am, but this approach does not strike me as a sound one.

 
HideYourRichess:

In other words, if we don't know the market's alphabet, we make one up ourselves and then study exactly that.

I could be wrong, of course, and I often am, but this approach does not strike me as a sound one.


Look, I don't want to argue, and I don't enjoy it, but this is what researchers do with continuous variables: they discretize them. There is no other way; the alternative is not to apply TI to continuous variables at all.

How to do it is a separate topic. There is a method for determining the alphabet size by analysing the distribution of the continuous values (it's called Parzen windows; Google is your friend), but I haven't used it in this case, and I think I've lost a little because of that.
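For anyone who hasn't met Parzen windows: the estimate places a small window function (here a Gaussian) on every observation and averages them into a smooth PDF. The sketch below only computes that density estimate; the post does not say how it is turned into an alphabet size, so no such rule is implied. The Gaussian kernel, Silverman's rule-of-thumb bandwidth and the synthetic increments are all assumptions made for illustration.

```python
import math
import random
import statistics


def silverman_bandwidth(values):
    """Silverman's rule of thumb for a Gaussian window: 1.06 * sd * n**(-1/5)."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-0.2)


def parzen_density(x, samples, h):
    """Parzen-window estimate of the PDF at x: average of Gaussian windows of width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


# Synthetic increments (an assumption, standing in for real data)
random.seed(2)
r = [random.gauss(0.0, 0.01) for _ in range(1000)]
h = silverman_bandwidth(r)

# Evaluate the estimate on a grid that extends 3 window widths past the data
lo, hi = min(r) - 3 * h, max(r) + 3 * h
m = 200
dx = (hi - lo) / m
grid = [lo + i * dx for i in range(m + 1)]
dens = [parzen_density(x, r, h) for x in grid]

# Sanity check: the trapezoidal integral of the estimated PDF should be close to 1
area = dx * (sum(dens) - 0.5 * (dens[0] + dens[-1]))
mode = grid[dens.index(max(dens))]
print(f"h = {h:.5f}, area = {area:.4f}, mode near {mode:.4f}")
```

The window width h controls how much detail survives: a narrow window shows every bump in the sample, a wide one smooths the distribution into a single hump.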

 
You don't seem to have understood what I was talking about at all. Well, good luck.
 
HideYourRichess:
You don't seem to have understood what I was talking about at all. Well, good luck.

I do understand your reasoning about the independence of increments; I'm just not sure I can fully agree. I would also ask Mathemat's opinion on this.
 
HideYourRichess:

If you leave out "economic and other meanings", then what processes are we talking about? A process is a "physical" phenomenon; it has causes and consequences, for example an apple falling on Newton's head. Applied to markets, it is the process of buying and selling. Where is any of this in the market?

Next point. Probability theory, on which information theory is based, requires the events (or symbols) under consideration to be independent; otherwise applying this mathematical apparatus is incorrect. Where does that independence come from? Suppose that, with speculative intent, I buy some shares (I mean the real market, not a brokerage house), and a return occurs in prices. Some time later I decide to sell those shares, and another return occurs. These two events are clearly linked to each other through me and my speculative intentions. Since there are many fools like me in the market, all buying and selling the same way, all returns turn out to be linked, i.e. dependent. So why apply a mathematical apparatus built for independent events to dependent ones? Is that correct?

On this point, nothing is obvious at all.


As I understand it, in this case independence is not required; it is precisely what is being evaluated.
 
Many examples of applying TI in the Russian-language literature deal with the alphabets of Russian and other languages, as well as with words and phrases (word sequences). None of these symbols are statistically independent a priori, and in these examples what gets estimated is mutual information, a quantity that measures the amount of dependence. So a priori independence of the values under study is not a prerequisite for applying TI correctly.
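To illustrate that dependence is what gets measured rather than assumed away, here is a minimal plug-in (histogram) estimate of the mutual information between each symbol and the next one. The persistent three-symbol Markov chain is a synthetic stand-in (an assumption) for a dependent sequence such as letters in a text; shuffling the same sequence destroys the ordering and gives a near-zero baseline. The function names are hypothetical, and this is one common estimator, not necessarily the one used in the thread.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two paired symbol sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def lagged_mi(symbols, lag=1):
    """Mutual information between the symbol at time t and the one at time t + lag."""
    return mutual_information(symbols[:-lag], symbols[lag:])


# Synthetic dependent sequence (an assumption): repeat the previous symbol with
# probability 0.8, otherwise draw one of 3 symbols uniformly at random
random.seed(3)
seq = [0]
for _ in range(5000):
    seq.append(seq[-1] if random.random() < 0.8 else random.randrange(3))

# Same symbols, order destroyed: any dependence between neighbours is removed
shuffled = seq[:]
random.shuffle(shuffled)

print(f"ordered:  {lagged_mi(seq):.3f} bits")
print(f"shuffled: {lagged_mi(shuffled):.3f} bits")
```

The plug-in estimator is biased upward on short samples, so comparing against shuffled copies of the same data is a simple way to judge whether a measured value reflects real dependence.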