Machine learning in trading: theory, models, practice and algo-trading - page 1613

 
Aleksey Mavrin:

What arguments? If that is what you do, then either you are misunderstanding something, or I am.

The essence of my surprise is this: a trainable model (and that is what we are talking about here) should be trained on the raw data.

If the input data is correlated, you should reduce it to uncorrelated inputs.

Here is an example: we teach the model to classify a color shade by 3 numbers, RGB. Three numbers, that's the pure raw data!!! With your approach, you would have to build predictors like these:

1 - R, 2 - G, 3 - B, 4 - more red, 5 - more green, 6 - more red than green and blue together ... 100500 - not as red as it would be if green were as red as blue. ))

Isn't the model supposed to learn that by itself? It has the raw data, and that's what it's for!


And you do the opposite: you multiply the original data into features that are correlated with each other.

Maybe I'm wrong, but it seems to me that you can extract a clean pattern only by breaking features down into elementary parts (such as logical rules). Let's go back to the same example with the candles.

We have 45 variants. Suppose the one and only clean pattern is

open[-1]<low

and that's it! There is nothing else in those 45 variants. I ran the enumeration, picked one rule (one feature), and I use it.

You propose taking the "original series" because "the network will find it by itself". As I understand it, in your case that would be:

open[1:2] ; high[1:2] ; low[1:2] ; close[1:2]

So the bottom line is:

I have one thing: a clean pattern in the form of one rule, one value, no noise.

You have four series (OHLC) with two values each, 8 values in total, plus noise.

Question: who has the more redundant and correlated sample?
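To make the comparison concrete, here is a minimal Python sketch (not either poster's actual code) that builds both representations from two candles: the 8 raw OHLC values and the set of binary "a < b" rules over them. The candle values are invented, and reading `open[-1]<low` as "previous candle's open is below the current candle's low" is an assumption. This sketch yields 28 pairwise rules from 8 values; how the 45 variants in the post were constructed is not shown, so the counts need not match.

```python
from itertools import combinations

# Hypothetical two-candle sample: index 0 is the previous candle, 1 the current one.
candles = [
    {"open": 1.10, "high": 1.15, "low": 1.08, "close": 1.12},
    {"open": 1.20, "high": 1.25, "low": 1.18, "close": 1.22},
]

def raw_features(candles):
    """Flatten the OHLC values of all candles into one dict of raw inputs."""
    return {
        f"{field}[{i}]": candle[field]
        for i, candle in enumerate(candles)
        for field in ("open", "high", "low", "close")
    }

def pairwise_rules(features):
    """Build one binary rule 'a < b' for every unordered pair of raw values."""
    return {
        f"{a} < {b}": features[a] < features[b]
        for a, b in combinations(features, 2)
    }

raw = raw_features(candles)
rules = pairwise_rules(raw)

print(len(raw))                   # 8 raw values
print(len(rules))                 # 28 candidate rules
print(rules["open[0] < low[1]"])  # the single rule from the example
```

The rule-based side of the argument keeps one entry of `rules`; the raw-data side feeds all of `raw` to the model and expects it to discover the relation.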

 
mytarmailS:


And you do the opposite: you multiply the original data into features that are correlated with each other.

Maybe I'm wrong, but it seems to me that you can extract a clean pattern only by breaking features down into elementary parts (such as logical rules). Let's go back to the same example with the candles.

We have 45 variants. Suppose the one and only clean pattern is

open[-1]<low

and that's it! There is nothing else in those 45 variants. I ran the enumeration, picked one rule (one feature), and I use it.

You propose taking the "original series" because "the network will find it by itself". As I understand it, in your case that would be:

open[1:2] ; high[1:2] ; low[1:2] ; close[1:2]

So the bottom line is:

I have one thing: a clean pattern in the form of one rule, one value, no noise.

You have four series (OHLC) with two values each, 8 values in total, plus noise.

Question: who has the more redundant and correlated sample?

1. Most likely you are wrong.

2-3. This does not happen, because the series is non-stationary. You have simply fitted the model to the series. It is not even clear what ML has to do with it if you have supposedly "isolated a pure pattern". If such a pure regularity existed, ML would never have needed to be invented; it would be found by other, elementary methods.

4. If, as you say, you have singled out one feature, then you have a model that is 100% overtrained (which is the same as undertrained), 100% blind, seeing nothing, and therefore 100% "dumb", sorry to be blunt )

P.S. Long live ML for the masses! )))

 
Aleksey Mavrin:

1. Most likely you are wrong.

OK, most likely you're right, but I think you understand that the whole description with one rule was just an example to express the thought more clearly; of course one should build an ensemble of rules... What is interesting is which is better: an ensemble of 100 hard (statistical) rules or an ensemble of 3000 weaker (probabilistic) rules. I think that if we solve the problem head-on, i.e. train on the raw input data, the second option is better because of the same non-stationarity, but if we want to build a market model with stationary properties, the first option is probably better... I won't argue any further; you have convinced me more than not...

 
Aleksey Vyazmikin:

I am not very happy with the results. I have collected a decent number of leaves, but then the question arises of how to make them work better together. The thing is that they often overlap each other by 20%-50% or more, so they end up giving the same signal, which is not very good. The idea is to group them and set an activation threshold for each group; I am now thinking about how best to do this.

The question of leaf selection is still not fully solved. Even when selecting leaves that showed good results in each of the 5 years, one can expect 20%-40% of them to stop working. What is even sadder is the inability to tell whether to switch them off or not: I ran a dedicated test by quarters, and it turned out that leaves that lost in one quarter recouped the loss in subsequent quarters (many of them did).

The leaf selection method itself seems promising, but the process is extremely slow.

A bit of necroposting to ask - why can't you initially build a tree based on the optimality condition of a portfolio of its leaves (roughly like Markowitz's theory)? Perhaps this has already been discussed somewhere, but I haven't seen it.

 
Aleksey Nikolayev:

A bit of necroposting to ask - why can't you initially build a tree based on the optimality condition of a portfolio of its leaves (roughly like Markowitz's theory)? Perhaps this has already been discussed somewhere, but I haven't seen it.

I've already written many times that the available algorithms for building ML models are not suitable for trading, because they do not take into account the nuances of noisy time series. This is evident, for example, when a predictor value is chosen for a split because it gives a favorable aggregate shift in the probability of correct classification over the entire sample, while that shift may be caused only by a rare phenomenon clustered in one part of the sample. I examined the selected leaves for activation frequency, and this became obvious to me.

So yes, you can build what you need from the start, but for that you need to change the learning algorithm (I lack the programming competence for that) or estimate randomness by various methods, which is what I do. Although I do not understand what is meant by "portfolio optimality condition".

There is also another option: select the ranges of predictor values that improve the shift in the probability of classifying the targets relative to the entire sample, and make separate predictors out of them. I am implementing this idea right now, but I do not know what the result will be.

By the way, I don't recall any discussion of grids that split a predictor into ranges for later use in building tree models. It seems to me this topic has important aspects to discuss and directly affects model building, and hence the final result.

 
Aleksey Vyazmikin:

I've already written many times that the available algorithms for building ML models are not suitable for trading

Maybe you meant to say that the standard ways of representing information for ML are not suitable for trading... It's not ML's fault.

Aleksey Vyazmikin:


By the way, I don't recall any discussion of grids that split a predictor into ranges for later use in building tree models

What do you mean by a partitioning grid?

 
Aleksey Vyazmikin:

Although I do not understand what is meant by "portfolio optimality condition".

Maximization of portfolio profitability at a fixed (acceptable) level of risk (volatility or drawdown).

Apparently yes, the algorithms will have to change. Many correlations between the equity curves of different leaves will have to be calculated, and that may turn out to be quite expensive in terms of time.

I just thought that such a topic might have been discussed on the forum before.
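As a rough illustration of the quantities such a criterion would need, here is a minimal Python sketch that evaluates a candidate set of leaves as a portfolio: expected return and volatility for given weights, using the covariances between per-leaf return series. The leaf names, return values and weights are invented, and the sketch only scores a fixed set of leaves; it does not build a tree or optimize the weights.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def covariance(a, b):
    """Sample covariance of two equal-length return series."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def portfolio_stats(returns, weights):
    """Expected return and volatility of a weighted set of leaves.

    returns: dict leaf name -> list of per-period returns (hypothetical data)
    weights: dict leaf name -> portfolio weight
    """
    names = list(returns)
    expected = sum(weights[n] * mean(returns[n]) for n in names)
    variance = sum(
        weights[i] * weights[j] * covariance(returns[i], returns[j])
        for i in names
        for j in names
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return expected, sqrt(max(variance, 0.0))

# Two made-up leaves whose returns are perfectly negatively correlated.
leaf_returns = {
    "leaf_a": [0.01, -0.01, 0.02, 0.0],
    "leaf_b": [-0.01, 0.01, -0.02, 0.0],
}
print(portfolio_stats(leaf_returns, {"leaf_a": 1.0, "leaf_b": 0.0}))
print(portfolio_stats(leaf_returns, {"leaf_a": 0.5, "leaf_b": 0.5}))
```

The double loop over leaf pairs is where the cost mentioned above comes from: the number of covariances grows with the square of the number of leaves.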

 
Aleksey Vyazmikin:

By the way, I don't recall there being any discussion of a grid of predictor ranges for later use in building tree models, and it seems to me that this topic has important aspects to discuss and directly affects model building, and thus the final result.

The tree does just that: it takes different ranges of each predictor and checks which one is better.

First it divides in half, then the best half in half again, then the best quarter in half again, and so on, and it does this with each predictor. The node becomes the best split among all those slices across all the predictors.
Do you do this manually? The algorithm does it perfectly and quickly.
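For reference, here is a minimal Python sketch of how a CART-style tree commonly chooses a node: it tries every threshold between adjacent sorted values of each predictor and keeps the split with the lowest weighted Gini impurity. This is the standard exhaustive search rather than the literal halving procedure described above, and the data and predictor names are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best 'value <= threshold' split.

    Returns (None, parent_gini) if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best

def best_node(columns, labels):
    """Pick the predictor whose best split has the lowest weighted Gini."""
    results = {name: best_split(vals, labels) for name, vals in columns.items()}
    name = min(results, key=lambda k: results[k][1])
    return name, results[name]

# Hypothetical sample: x1 separates the classes cleanly, x2 does not.
columns = {"x1": [1, 2, 3, 4], "x2": [4, 1, 3, 2]}
labels = [0, 0, 1, 1]
print(best_node(columns, labels))
```

Note that the score is aggregated over the whole sample reaching the node, which is exactly the property criticized earlier in the thread.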

Aleksey Vyazmikin:

But this distribution can only be caused by a rare phenomenon that is clustered in one part of the sample. I examined the sampled leaves for activation frequency and it became obvious to me.

It is necessary to look for predictors by which this rare phenomenon can be detected. If there are predictors, then the simplest standard models will find everything.

 
mytarmailS:

You probably meant that the standard ways of representing information for ML are not suitable for trading... It's not ML's fault.)

I said what I wanted to say: there are many nuances that the common methods of model building do not take into account during training. The problem can be addressed by refining these methods, by selection based on performance results, and by additional training of predictors. Maybe there are other options, but so far I do not know them.


mytarmailS:

What does the partitioning grid mean?

It is an algorithm that checks ranges of a predictor's values for predictive ability and tries to partition the range so that the parts better highlight that ability. Say there is a sample with 3 targets distributed as 1: 24%, 2: 50%, 3: 26%, and there is some predictor with a range of values. The goal of the grid is to find regions of predictor values where, say, target 1 occurs in more than 24% of cases, and the split will single out that region. There are different variants of algorithms for building such grids.
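The idea can be sketched in a few lines of Python, under the assumption that the grid is a fixed list of bin edges and that "predictive ability" is measured as the share of one target inside a bin minus its share in the whole sample. The function names, edges and data are hypothetical; the actual grid-building algorithms mentioned above are not shown in the thread.

```python
from collections import Counter

def class_shares(labels):
    """Share of each class in the whole sample."""
    n = len(labels)
    return {cls: cnt / n for cls, cnt in Counter(labels).items()}

def lift_by_bin(values, labels, edges, target):
    """For each [edges[i], edges[i+1]) bin return (lo, hi, count, lift).

    lift = share of `target` inside the bin minus its overall share;
    it is None for empty bins.
    """
    baseline = class_shares(labels).get(target, 0.0)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [y for x, y in zip(values, labels) if lo <= x < hi]
        if not in_bin:
            rows.append((lo, hi, 0, None))
            continue
        share = in_bin.count(target) / len(in_bin)
        rows.append((lo, hi, len(in_bin), share - baseline))
    return rows

# Made-up predictor values and three-class targets.
values = [0, 1, 2, 3, 4, 5, 6, 7]
labels = [1, 1, 2, 2, 2, 3, 3, 2]
for row in lift_by_bin(values, labels, edges=[0, 2, 4, 8], target=1):
    print(row)
```

Bins with a clearly positive lift are the regions the grid would try to single out; in practice the count per bin also matters, since a large lift on a handful of observations is likely noise.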


Aleksey Nikolayev:

Maximization of the portfolio profitability at a fixed (acceptable) level of risk (volatility or drawdown).

Apparently yes, the algorithms will have to be changed. Many correlations between equities of different leaves will have to be calculated and it may turn out to be quite expensive in terms of time.

It just occurred to me that a similar topic might have been discussed on the forum before.

Now the correlation of activations is taken into account, the conditionally correlated leaves go into the same group, and that is how a portfolio is created. But the portfolio has one underlying strategy, and you have to do a lot of them for stability. Strategies simply should not overlap in activation on the time interval, if the same predictors are used. In general, this is a realistic thing to do.


elibrarius:

The tree does exactly that, it takes a different range from each predictor and checks which one is better.

First splits in half, the best half further in half, the best quarter further in half, etc., and so on with each predictor. The node becomes the best division out of all those pieces across all the predictors.
Do you do this manually? The algorithm does it perfectly and quickly.

You have to look for predictors by which you can detect this rare phenomenon. If there are such predictors, then the simplest standard ML models will find everything.

What do you mean it does it perfectly? I don't do it manually, I write a script that will do it the way I see it now.

In my case, the ideal would be a separate evaluation of each variant of the predictor's values. I also want to merge the activation ranges of a predictor that reinforce the same target into a single predictor, which the grids I know of do not do, since they divide the range sequentially. In the same way, the ranked ranges could then be paired (by building a node) with another predictor. So far this is only theory.

The figure below shows an ordinary time predictor, literally the hour of the day. I filtered out predictor activations whose deviation from each target's share in the entire sample is less than 10%. It turns out that hours 18 and 19 are favorable for the target Minus and hour 15 is unfavorable. As output I get a new predictor with the value 1 (combining predictor values 18 and 19), -1 (value 15), and 0 (all other values).

What kind of partitioning grid would aggregate the split ranges into a single split, eliminating the intermediate values, as in the figure below, values 1 and 4?
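The recoding described above can be sketched in Python as follows. This is only an illustration under assumptions: the 10% filter is read as an absolute deviation of the target's share from its share in the whole sample, the function name and the `min_count` guard are hypothetical, and the data is invented so that hours 18 and 19 favor the target while hour 15 works against it.

```python
from collections import defaultdict

def recode_by_lift(values, labels, target, min_lift, min_count=1):
    """Map each distinct predictor value to +1, -1 or 0.

    +1: the target's share for this value exceeds its overall share by min_lift
    -1: it falls short of the overall share by min_lift
     0: everything else, including values seen fewer than min_count times
    """
    baseline = labels.count(target) / len(labels)
    groups = defaultdict(list)
    for x, y in zip(values, labels):
        groups[x].append(y)
    mapping = {}
    for x, ys in groups.items():
        lift = ys.count(target) / len(ys) - baseline
        if len(ys) < min_count:
            mapping[x] = 0
        elif lift >= min_lift:
            mapping[x] = 1
        elif lift <= -min_lift:
            mapping[x] = -1
        else:
            mapping[x] = 0
    return mapping

# Made-up sample: four observations for each of four hours.
hours = [15] * 4 + [16] * 4 + [18] * 4 + [19] * 4
targets = (
    ["plus"] * 4
    + ["minus", "minus", "plus", "plus"]
    + ["minus"] * 4
    + ["minus", "minus", "minus", "plus"]
)
mapping = recode_by_lift(hours, targets, target="minus", min_lift=0.10)
new_predictor = [mapping[h] for h in hours]
print(mapping)
```

Non-adjacent values with the same sign end up in the same category, which is the merging that a sequential range split does not perform in a single step.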


 
Aleksey Vyazmikin:

Now the correlation of activations is taken into account, the conditionally correlated leaves go into the same group, and that is how a portfolio is created. But the portfolio has one underlying strategy, and you have to build many of them for stability. Strategies simply should not overlap in activation over a time interval if the same predictors are used. In general, this is a realistic thing to do.

If, for example, all strategies only BUY, then everything will probably boil down to minimizing their overlap in time (minimizing correlations that are always positive). If both BUY and SELL are allowed, then overlaps in time can be useful for mutually compensating the bad stretches of the strategies (negative correlations are usually good for a portfolio).

Probably, the correlation can be determined simply from the time each strategy is active and the time during which they overlap.
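A minimal Python sketch of that last idea, assuming each strategy is represented as a 0/1 activation flag per bar: the Pearson correlation of two such series (the phi coefficient) depends only on how long each is active and how long both are active at once. The function names and the sample series are hypothetical, and this measures correlation of activations, not of returns, so it ignores trade direction.

```python
from math import sqrt

def activation_correlation(a, b):
    """Pearson (phi) correlation of two equal-length 0/1 activation series.

    Returns None when either strategy is always or never active, because
    the correlation is undefined in that case.
    """
    n = len(a)
    p_a = sum(a) / n                                # share of time A is active
    p_b = sum(b) / n                                # share of time B is active
    p_ab = sum(x * y for x, y in zip(a, b)) / n     # share of time both are active
    denom = sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        return None
    return (p_ab - p_a * p_b) / denom

def overlap_share(a, b):
    """Overlapping bars as a share of the less active strategy's bars."""
    both = sum(x * y for x, y in zip(a, b))
    shorter = min(sum(a), sum(b))
    return both / shorter if shorter else 0.0

# Made-up activation flags over four bars.
s1 = [1, 1, 0, 0]
s2 = [0, 0, 1, 1]
s3 = [1, 0, 1, 0]
print(activation_correlation(s1, s1))  # identical activations
print(activation_correlation(s1, s2))  # never active together
print(activation_correlation(s1, s3), overlap_share(s1, s3))
```

To cover the BUY and SELL case one could replace the 0/1 flags with signed positions (+1, -1, 0) and use an ordinary correlation on those series, which would let opposite trades during an overlap show up as negative correlation.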