Machine learning in trading: theory, models, practice and algo-trading - page 1957

 
elibrarius:

That's right. It would be good to describe the sequence of actions right away...
I thought about your description once again; I assume the following sequence:

1. Calculate correlation of all predictors on train
2. Build the tree
3. At the final split, remember e.g. the 100 best candidate splits. Keep up to 100 in reserve, so there is plenty to choose from.
4. From these 100, choose 5 that are uncorrelated with the predictor of the best split and with each other (a sketch of steps 1-4 follows below).

What is still unclear is which of these 5 different splits to choose.
If at random, then we get an analogue of a random forest, which gives each tree a random subset of predictors and builds the tree on them.
If we average, that is again an analogue of a random forest, since the forest takes the arithmetic mean over its random trees.
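A minimal sketch of steps 1-4, assuming pandas; the 100/5 counts follow the list above, while the 0.9 correlation cutoff, the function name, and the (predictor, threshold) representation of a split are hypothetical illustrations, not the poster's actual code:

```python
import pandas as pd

def pick_uncorrelated_splits(train_df, candidate_splits, n_keep=5, max_corr=0.9):
    """candidate_splits: up to ~100 (predictor_name, threshold) pairs, best first."""
    corr = train_df.corr().abs()            # step 1: correlation of predictors on train
    chosen = [candidate_splits[0]]          # always keep the best split
    for pred, thr in candidate_splits[1:]:  # step 3: the reserved candidates
        # step 4: keep only splits uncorrelated with the best split and with each other
        if all(corr.loc[pred, p] < max_corr for p, _ in chosen):
            chosen.append((pred, thr))
        if len(chosen) == n_keep:
            break
    return chosen
```

Steps 2-3 (building the tree and collecting the candidate splits) happen inside the tree builder itself; this sketch only covers the filtering that follows.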

Now you've got it right!

Random forests are random precisely because they are full of trash: the conditions described are not necessarily met, although similar situations can occur, and it is possible that a successful model is found exactly in such a case. Here, however, it will be a more controlled process.

The weight of each split in the leaf still has to be set, of course; perhaps we give them equal coefficients, or we can fit the coefficients on the same history. That is essentially what I do now when assembling a model from leaves.

 
Valeriy Yastremskiy:

Nah, analog operations are addition, subtraction, multiplication, division, and possibly more complex logarithmic and power-law dependencies. And these are not computations but analog gauges in each cell. The DACs and ADCs are only the input/output; they take no part in the computation, they just provide the digital side.

In the von Neumann architecture both procedures and data are stored in memory, and there is no parallel access to procedures and data: you access the data, then the procedure, then the data again, hence the limitations in data processing. Here the procedure is stored in each cell by a small device, so the procedure is accessed at once, together with the data.

What I don't understand about the data: does each instruction, roughly speaking, have direct access to memory and to calculation results, without a pipeline?

 
Aleksey Vyazmikin:

Now you've got it right!

Random forests are random precisely because they are full of trash: the conditions described are not necessarily met, although similar situations can occur, and it is possible that a successful model is found exactly in such a case. Here, however, it will be a more controlled process.

The weight of each split in the leaf still has to be set, of course; perhaps we give them equal coefficients, or we can fit the coefficients on the same history. That is essentially what I do now when assembling a model from leaves.

I did not understand the final step: which of the 5 splits do we choose?
 
elibrarius:
I did not understand the final step: which of the 5 splits do we choose?

We need to take the readings of all 5 splits into account; this is what increases stability.

Suppose we give a weight of 0.6 to the best split and 0.1 to each of the other four; if the weighted sum reaches 0.8, or some other threshold determined on the sample, then we consider the answer to be class "1" (or whatever class is expected in that leaf).

We also need to check Recall, i.e. how many responses the splits have on that subsample.
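A toy sketch of this vote (the 0.6/0.1 weights and the 0.8 threshold come from the post; the function name and the binary "readings" representation are hypothetical):

```python
import numpy as np

def leaf_vote(readings, weights=(0.6, 0.1, 0.1, 0.1, 0.1), threshold=0.8):
    """readings: 0/1 answers of the 5 selected splits for one sample."""
    score = float(np.dot(weights, readings))
    return 1 if score >= threshold else 0  # class "1" expected in the leaf

print(leaf_vote([1, 1, 1, 0, 0]))  # best split plus two others: 0.8 -> class 1
print(leaf_vote([0, 1, 1, 1, 1]))  # best split silent: only 0.4 -> class 0
```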
 
Aleksey Vyazmikin:

What I don't understand about the data: does each instruction, roughly speaking, have direct access to memory and to calculation results, without a pipeline?

There is no 'data' there, only electrons whose current is controlled by transistors and so on. The NN architecture itself is printed on the board, not held in digital form. Analog neural networks like this have long been made as coprocessors, for example in iPhones.

There is nothing new in the article.
 
Maxim Dmitrievsky:

There is no 'data' there, only electrons whose current is controlled by transistors and so on. The NN architecture itself is printed on the board, not held in digital form. Analog neural networks like this have long been made as coprocessors, for example in iPhones.

There is nothing new in the article.

And I understood it as being about dynamic calculations, not static, predetermined ones.

 
Aleksey Vyazmikin:

And I understood it as being about dynamic calculations, not static, predetermined ones.

For example, the signal from the iPhone camera sensor is fed directly to the analog NN, bypassing digitization. The NN does image preprocessing to improve quality (filters out noise, etc.),

and only then is it converted into a digital picture.

 
Aleksey Vyazmikin:

And I understood it as being about dynamic calculations, not static, predetermined ones.

A rough analogy: electronic gate valves and compressors. And the calculations can be dynamic too: if we change the input signal, we get a dynamic output.

 
Aleksey Vyazmikin:

We need to take the readings of all 5 splits into account; this is what increases stability.

Suppose we give a weight of 0.6 to the best split and 0.1 to each of the other four; if the weighted sum reaches 0.8, or some other threshold determined on the sample, then we consider the answer to be class "1" (or whatever class is expected in that leaf).

We also need to check Recall, i.e. how many responses the splits have on that subsample.
So we are mixing the cleanest split with less clean ones, i.e. we worsen the result on train, which in itself does not matter to us. But it is also not a given that it will improve the result on test, i.e. generalizability. Someone should try it... Personally, I don't think generalization will be any better than a random forest's.

It is much easier to limit the depth of the tree and skip the last split, stopping at the previous one. We end up with the same, less pure leaf than if we had done the extra split. Your option gives something in between doing the split and not doing it: in effect, your method averages a leaf at depth 7, which will be slightly purer than the leaf at depth 6. I don't think generalization will change much from this, and it is a lot of work to test the idea. You could also average several trees of depths 6 and 7, which would give roughly the same thing as your method.
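A minimal sketch of that last comparison, assuming scikit-learn (the synthetic dataset, the 0.5 cutoff, and all names here are illustrative, not anyone's actual setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two trees that differ only in whether the last split level is made.
t6 = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_tr, y_tr)
t7 = DecisionTreeClassifier(max_depth=7, random_state=0).fit(X_tr, y_tr)

# Average the class-1 probabilities of the two depths and threshold at 0.5,
# approximating the "in-between" leaf purity discussed above.
p = (t6.predict_proba(X_te)[:, 1] + t7.predict_proba(X_te)[:, 1]) / 2
acc = np.mean((p >= 0.5) == y_te)
print(f"averaged depth-6/7 test accuracy: {acc:.3f}")
```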
 
Aleksey Vyazmikin:

So in essence it is some kind of operation on waves? Incoming data is converted to a polynomial, then the polynomial is converted to a wave, and the waves are somehow "collided/merged"?

Well, yes.

There have been attempts to build analog computers before, but they were either very slow or energy-intensive.
