Machine learning in trading: theory, models, practice and algo-trading - page 637

 
Alexander_K2:

:)))) In this case we should call Warlock for help :)))).

The only thing I can say is that it is the negentropy (non-entropy) that is responsible for the trend/flat state. A trend is the "memory" of the process, the "tail" of its distribution, and there the negentropy is huge, while in a flat market it is almost zero. I am only just starting to work with it myself, but I understand the importance of this little-studied parameter.

Nothing can help here right now. The trend/flat switch is like the proverbial spoon at dinner: it is only good if it comes in time.

The red line is the actual series, the blue line is the model. In this example it is late. In the bottom picture the model is late.


 
Mihail Marchukajtes:
Brothers, there is only one small step left for us, but it will be a huge step for all of mankind.....

Misha, I believed! I knew it! Handsome! You are the best!))

 
Vizard_:

Misha, I believed! I knew it! Handsome! You're the best!))

Thank you for your support FRIEND!!!!! I really need it. Just spell it right next time you're addressing me. HandsomeGGGGG!!!! That sounds so much more solid....

 

Let's continue...

There are two columns of values. I calculated the probability of an event in column A and the probability of another event in column B. The condition is very simple: > 0. I counted the number of events greater than zero in column A and divided by the total number of rows, and likewise I counted the number of values greater than zero in column B and divided by the total number of observations.

Next, using my values, how do I calculate the conditional probability???? Considering that I have two columns and both have 40 rows????
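A minimal sketch in R of what is described above, assuming the two columns are paired row by row; the names a and b and the random data are made up purely for illustration:

a <- runif(40, -1, 1)                    # hypothetical column A, 40 rows
b <- runif(40, -1, 1)                    # hypothetical column B, 40 rows
p_a <- sum(a > 0) / length(a)            # P(A > 0): events above zero divided by number of rows
p_b <- sum(b > 0) / length(b)            # P(B > 0): the same count for column B
p_ab <- sum(a > 0 & b > 0) / length(a)   # joint probability P(A > 0 and B > 0), needs the paired rows
p_b_given_a <- p_ab / p_a                # conditional probability P(B > 0 | A > 0)

The two marginal probabilities alone are not enough to get the conditional one; it is the joint count of rows where both conditions hold that makes the division possible.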

 
Mihail Marchukajtes:

Okay, I'll come at it from the other side. Suppose I have an input set of 100 inputs. I calculate the entropy for each input and get results from -10 to 10. Question: Which inputs are preferable to take????

Let's say I have 10 inputs below zero, the rest are higher, BUT all values lie between -10 and 10.....

Mihail Marchukajtes:

And also... I cannot calculate the mutual information... or rather the conditional probability, which I need for the further calculation of entropy and mutual information.

Can someone explain it in simple terms, or better yet with an example?

First column: 40 rows, the input variable.

Second column: 40 rows, the output....

I did a lot of work overnight to pin down the hypothesis. I am stuck on these things and cannot move forward. Please help, and I will share my thoughts on my hypothesis...
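Not an answer about the hypothesis itself, but a minimal "on the fingers" sketch in R of mutual information between a 40-row input column and a 40-row output column. The data, the names x and y, and the crude two-bin discretization by sign are all assumptions made purely for illustration:

x <- rnorm(40)                        # hypothetical input column, 40 rows
y <- rnorm(40)                        # hypothetical output column, 40 rows
xd <- x > 0                           # discretize by sign into two bins (illustrative choice)
yd <- y > 0
pxy <- prop.table(table(xd, yd))      # joint probabilities P(x, y) from the frequency table
px <- rowSums(pxy)                    # marginal P(x)
py <- colSums(pxy)                    # marginal P(y)
hx <- -sum(px[px > 0] * log2(px[px > 0]))        # entropy H(X)
hy <- -sum(py[py > 0] * log2(py[py > 0]))        # entropy H(Y)
hxy <- -sum(pxy[pxy > 0] * log2(pxy[pxy > 0]))   # joint entropy H(X, Y)
mi <- hx + hy - hxy                   # mutual information I(X; Y) = H(X) + H(Y) - H(X, Y)
hy_given_x <- hxy - hx                # conditional entropy H(Y | X) = H(X, Y) - H(X)

With only 40 rows the estimates are noisy, so the binning should stay coarse.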


I haven't studied information theory, but I have some experience with entropy in R.

Basically, the higher the entropy the more chaos in the data. A predictor with high entropy is rather poorly related to the target. Conversely, low entropy indicates that the target is easily determined from the predictor.

Negentropy (non-entropy) is the opposite of entropy; it brings no new knowledge compared to entropy and is just introduced for convenience. If the predictor has large entropy, then its negentropy is small. If the entropy is small, the negentropy is large. It is like heat and cold, light and darkness, etc.; one flows seamlessly into the other.

But that is not all, there is also cross-entropy. This is how two predictors together relate to the target: high cross-entropy is bad, low cross-entropy is good. In machine learning it often happens that two predictors, each with high entropy, give low cross-entropy when used together, which is what we all need. Even though each predictor by itself may be poorly related to the target (high entropy for both), together they can hit the bullseye (low cross-entropy). So you cannot just measure the entropy of each predictor separately and choose a set based on that estimate. You have to pick the whole set of predictors so that it has low cross-entropy; I, for example, do not look at their individual entropies at all.

Here are some examples.

1) A predictor with high entropy. It makes it impossible to predict the target class at all.

2) A predictor with low entropy. If you look closely: if the value of the predictor is from 0 to 0.25, or less than 0.4, then the class = 1; otherwise the class = 2. This is a very handy predictor to use in machine learning.

3) Two predictors, each with high entropy; a model will never be able to predict the target using only the first or only the second predictor. But by plotting them together (the X-axis is the value of the first, the Y-axis the value of the second) we can immediately see that together they give very good information about the target class (same sign for both predictors = class 1, different sign = class 2). This is an example of low cross-entropy.
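Example 3 can be reproduced numerically. Here is a small R sketch with synthetic data where the class is 1 when the two predictors have the same sign and 2 otherwise; the mutual-information measure is my own substitute for the "cross-entropy" wording above, chosen because it makes the "useless alone, decisive together" effect easy to see:

set.seed(1)
p1 <- runif(200, -1, 1)                      # first predictor
p2 <- runif(200, -1, 1)                      # second predictor
cls <- ifelse(sign(p1) == sign(p2), 1, 2)    # same sign = class 1, different sign = class 2

mi <- function(a, b) {                       # mutual information of two discrete vectors
    p <- prop.table(table(a, b))             # joint probabilities
    h <- function(q) -sum(q[q > 0] * log2(q[q > 0]))   # Shannon entropy helper
    h(rowSums(p)) + h(colSums(p)) - h(p)     # I(A; B) = H(A) + H(B) - H(A, B)
}

mi(p1 > 0, cls)                              # close to 0: the first predictor alone says nothing
mi(p2 > 0, cls)                              # close to 0: neither does the second
mi(interaction(p1 > 0, p2 > 0), cls)         # about 1 bit: together they determine the class exactly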


 
Mihail Marchukajtes:

Thank you for your support FRIEND!!!!! I really need it. Just spell it right next time when you refer to me. HandsomeGGGGG!!!! That sounds so much more solid....

That's why we love you, Teacher! Always giving tips, always correcting! You are our dear man!!!))

"Mishanin's Witnesses. February 2018.


 
Notwithstanding the reproach, I will continue. I calculated the conditional probability as follows. The number of entries satisfying the condition in the first column is 19, in the second 20. To find the conditional probability, I add 19+20 and divide it all by the total number of entries, which is 80 (40 in the first column and 40 in the second). And then you have to divide by a probability.... If column A is the entry and column B is the exit, then to find the conditional probability of the exit given the entry you divide the total probability by the probability of the entry column. Is this correct???
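For reference, the textbook estimate, assuming the entry and exit columns are paired row by row, is P(exit > 0 | entry > 0) = P(entry > 0 and exit > 0) / P(entry > 0): the count of rows where both are above zero, divided by the count of rows where the entry is above zero. Summing the two counts and dividing by 80 gives something else (the average of the two marginal probabilities).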
 
Mihail Marchukajtes:

Again, a question. There are 8 NS (neural network) models. For the current signal, the entropies of the NS outputs are:

5.875787568 -5.702601649 5.066989592 9.377441857 7.41065367 1.401022575 4.579082852 5.119647925

Which one should I choose? The red one, because it has negative entropy, or the blue one, which is closer to zero? I will say that these two models point in different directions, but time will show which one was right.... In the end one or the other will win. What does everyone think?

Summarizing what I wrote above: you first need to determine the cross-entropy of the predictor combinations and take the combination of predictors where the cross-entropy is lower. It is strange that it comes out negative; in my case it only runs from infinity down to zero, but never mind, take the most negative one then.

The entropy of the NS output is, in my opinion, a poor way to evaluate the neural network itself. You can tune the outputs of the network to give the correct answer in 100% of cases, and it will have low entropy, but it may be badly overfitted. Overfitting is bad.

 

The thing is that I found an add-on for Excel that calculates entropy. I finished it off the way I wanted, without changing the logic of the calculation, and consequently I have this question. Explain what is going on in these loops in the calculation. What exactly they do I can see, but I understand it somewhat differently.... Hm....

For Each Value In ActiveSheet.Range(Data1)  ' iterate over the cells of the data range
    X(I) = Value                            ' store each cell value in the array X
    Nn = Nn + Value                         ' accumulate the total sum of all values
    I = I + 1
Next Value

In this loop the array X is filled and a running total is accumulated; so far no questions, but further on....

For I = 1 To N
    X(I) = X(I) / Nn   ' normalize: each element divided by the total sum
Next I

We divide each element of the array by the total sum of the values; I suspect this is exactly the calculation of the frequencies. Right????

Okay... I think I get it, we need to add up all the frequencies to find the probability. Right?
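If it helps, here is the same idea sketched in R rather than VBA. This is a guess at what the add-on does after normalization, not its actual code, and the sample vector x is made up:

x <- c(3, 1, 4, 1, 5, 9, 2, 6)          # hypothetical non-negative data, like the Excel range
p <- x / sum(x)                         # same as the second VBA loop: each value over the total sum
h <- -sum(p[p > 0] * log2(p[p > 0]))    # Shannon entropy of the resulting distribution

The normalized values p always sum to 1 by construction; the entropy is then the sum of the -p*log(p) terms, not the sum of the frequencies themselves.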

 
Dr. Trader:

Summarizing what I wrote above: you first need to determine the cross-entropy of predictor combinations and take the combination of predictors where the cross-entropy is lower. It is strange that it comes out negative; in my case it only runs from infinity down to zero, but never mind, take the most negative one then.

The entropy of the NS output is, in my opinion, a poor way to evaluate the neural network itself. You can tune the outputs of the network to give the correct answer in 100% of cases, and it will have low entropy, but it may be badly overfitted. Overfitting is bad.

To find the cross entropy you first need to find the conditional entropy of two events, which is what I'm doing now....

And the estimate of the model's entropy is needed when the model is running out of sample (OOS). Once it has given a signal, we can calculate the entropy of that signal and use it to draw conclusions: if the entropy of the signal has gone up, to hell with it; if it has fallen, that is our locomotive....
