Machine learning in trading: theory, models, practice and algo-trading - page 1614

 
Aleksey Nikolayev:

If, for example, all strategies only BUY, then everything will probably come down to minimizing their overlap in time (minimizing correlations, which are always positive). If both BUY and SELL are allowed, then overlaps in time can be useful for mutually compensating the bad parts of strategies (negative correlations are usually good for a portfolio).

I guess the correlation can be determined simply from the time each strategy spends in the market and the time their positions overlap.

In general, I agree about the mutual compensation of oppositely directed signals, but for that, in my case, I would need to apply different strategies and make a markup for each - that is a separate task, but I plan to do it as well.

And to find similar strategies, in order to weed them out of a group or to split the risk (lot) between them, I will need to consider not only entry and exit times but also entry direction. I need to think about how best to do this.

 
Aleksey Vyazmikin:

What do you mean it does it perfectly? I don't do it by hand; I write a script that does it the way I currently see it.

In my case, the ideal would be a separate evaluation of each variant of the predictor's value. And I want to merge the activation ranges of a predictor that reinforce one target into a single split, which the grids I know of do not do, since they divide the ranges sequentially; it is similar to pairing a predictor (by building a node) with another predictor in the ranking. So far this is only theory.


What kind of partitioning grid would aggregate ranges into a single split, eliminating the intermediate values (as with values 1 and 4 in the figure below)?


Perfect - in the sense of perfectly accurate according to the split evaluation function. It will evaluate thousands of options and remember the best one, it will become a node.

It is easiest to train 24 standard forests/boostings, each fed the predictors of the corresponding hour.

 
elibrarius:

Perfect - in the sense of perfectly accurate according to the split evaluation function. It will evaluate thousands of options and remember the best one, and it will become a node.

It is clear that it follows the algorithm, but which algorithm is the right one? CatBoost alone has 3 algorithms for building the grid.

elibrarius:

The easiest way is to train 24 standard forests/boostings, each fed the predictors of the corresponding hour.

This will reduce the sample by roughly 24 times (and my sample is small as it is). Then, following the greedy principle of tree building (not always correct, as it turned out from my experiments with splitting trees), we will select for branching only those predictors which statistically had the best probability at that particular hour. In my opinion, we instead need to find predictors that gave an advantage over the whole sample regardless of other conditions and put those into the tree; then we get not a fit to a particular hour of the day (conditionally, a more precise description of one activation event), but an accumulation of independent probabilities in one leaf.

 
Aleksey Vyazmikin:

This is an algorithm that checks the range of predictor values for predictive ability and tries to partition the range so that the parts better highlight that predictive ability. Say there is a sample with 3 target classes, distributed as 1 - 24%, 2 - 50%, 3 - 26%, and there is some predictor with a range of values; the goal of the grid is to find areas of predictor values where, say, target 1 "predicts" more often than its 24% base rate, and a split will highlight that area. There are different variants of algorithms for building such grids.
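The grid idea above could be sketched roughly like this (a minimal illustration, not any particular library's algorithm; the border placement, minimum bin size, and all names are my own assumptions):

```python
# Sketch: find the predictor value range where the share of class 1
# most exceeds its base rate in the whole sample.
# Border placement, bin-size threshold and names are illustrative assumptions.

def best_range(values, targets, wanted=1, n_borders=10):
    """Scan all (lo, hi) border pairs over a crude quantile grid and return
    the range with the highest share of `wanted` among samples inside it."""
    pairs = sorted(zip(values, targets))
    n = len(pairs)
    # candidate borders at evenly spaced ranks (a crude quantile grid)
    borders = [pairs[i * n // (n_borders + 1)][0] for i in range(1, n_borders + 1)]
    base = sum(t == wanted for _, t in pairs) / n
    best = (base, None, None)
    for i, lo in enumerate(borders):
        for hi in borders[i + 1:]:
            inside = [t for v, t in pairs if lo <= v < hi]
            if len(inside) < 10:          # skip statistically empty bins
                continue
            share = sum(t == wanted for t in inside) / len(inside)
            if share > best[0]:
                best = (share, lo, hi)
    return best  # (share of class `wanted` inside, lo border, hi border)
```

If no range beats the base rate, the borders come back as None, i.e. no split is worth highlighting.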

It's an interesting idea, but isn't it easier to do it this way?

p.1) We have some machine learning algorithm, call it the MO.

p.2) There is a sample, divided into train and test.

p.3) There is the price, clustered by some principle (time, chart pattern, both together, something else...) (a cluster can be thought of as a market state, or just a cluster)

P.S. there should be many, or very many, clusters


Algorithm of actions:

1) train the MO on the train set

2) predict the test set with the trained model

3) on the test set, find the points which the model predicted without errors; let's call them ht (good points)

4) each ht corresponds to some cluster from step p.3 above...

That's it - now we know in which clusters (states) of the market the model trades well... The clusters (states) are an analogue of your grid, i.e. we try to separate, through the clusters (states), what we can forecast from what we cannot...
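Steps 3) and 4) above can be sketched as follows (a hedged illustration: the per-point cluster ids and the 0.8 threshold are my assumptions, not part of the original proposal):

```python
from collections import Counter

# Sketch of steps 3)-4): given test predictions, true labels and a
# precomputed cluster id per test point, find the clusters where the
# model was (almost) error-free. All data and thresholds are illustrative.

def good_clusters(y_true, y_pred, cluster_ids, min_share=0.8):
    """Return {cluster: accuracy} for clusters where the share of correct
    predictions ('ht', good points) is at least `min_share`."""
    hits, totals = Counter(), Counter()
    for t, p, c in zip(y_true, y_pred, cluster_ids):
        totals[c] += 1
        if t == p:
            hits[c] += 1
    return {c: hits[c] / totals[c]
            for c in totals if hits[c] / totals[c] >= min_share}
```

On new data, one would then trade only when the current point falls into one of the returned clusters.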


But both your approach and the one I suggested have conceptual defects that should be eliminated first. More precisely, the problem is not even in the approach, but in how the information is represented.

 
mytarmailS:


p.3) There is the price, clustered by some principle (time, chart pattern, both together, something else...) (a cluster can be thought of as a market state, or just a cluster)

P.S. there should be many, or very many, clusters

I don't understand - do you want to take the raw price at the markup points and cluster it, or what?


mytarmailS:


3) on the test set we single out the points which the model predicted without errors; let's call them ht (good points)

4) each ht corresponds to some cluster from step p.3 above...

That's it - now we know in which clusters (states) of the market the model trades well... The clusters (states) are an analogue of your grid, i.e. we try to separate, through the clusters (states), what we can predict from what we cannot...

The idea is interesting, but I don't see how its principle relates to what I suggested, or I don't fully understand it. Suppose we find out that the model activated on a certain percentage of clusters - what do we do with that next? As I understand it, one would look at which leaf (if we are talking about a single tree) activates on how many clusters: if a leaf activates significantly more often on one cluster, that only says it has learned to identify that cluster. It may well turn out that most leaves activate correctly, evenly across different clusters, which would apparently indicate randomness. And again, you have to trust the clustering algorithm - make sure it yields distinct clusters rather than many similar ones...
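The leaf-versus-cluster check described above could be sketched like this (per-sample leaf ids and cluster ids are hypothetical inputs; real libraries expose leaf indices in different ways):

```python
from collections import defaultdict, Counter

# Sketch: cross-tabulate which cluster each leaf activates on.
# `leaf_ids` and `cluster_ids` are hypothetical per-sample labels.

def leaf_cluster_table(leaf_ids, cluster_ids):
    """For each leaf, count how often it fired on each cluster."""
    table = defaultdict(Counter)
    for leaf, cluster in zip(leaf_ids, cluster_ids):
        table[leaf][cluster] += 1
    return table

def dominant_share(counter):
    """Share of the most frequent cluster within one leaf: close to 1.0
    means the leaf has learned to identify a single cluster; a flat
    distribution hints at randomness."""
    total = sum(counter.values())
    return max(counter.values()) / total
```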

 
Aleksey Vyazmikin:

It is clear that it follows the algorithm, but which algorithm is the right one? CatBoost alone has 3 algorithms for building the grid.

This will reduce the sample by roughly 24 times (and my sample is already small). Then, following the greedy principle of tree building (not always correct, as it turned out from my experiments with splitting trees), we will select for branching only those predictors which statistically had the best probability at that particular hour; and in my opinion we need to find predictors that gave an advantage over the whole sample regardless of other conditions and put those into the tree, so that we get not a fit to a particular hour of the day (conditional...

What difference does it make where you reduce the sample - outside, by making 24 forests, or inside, for example by making the first 24 nodes splits on the hour? Below those 24 nodes, each remaining branch will receive 1/24th of the sample.

 
By the way, what I don't like about boosting is that the recommended tree depth is 7-10.
That is, if we have 100 predictors and splitting starts at the middle of each predictor's range, then with high probability we will get 7 different predictors split at their midpoints. Maybe 1 or 2 will be split down to a quarter; hardly any finer.
Or do boosting algorithms split not at the half, but in smaller steps? Does anyone know?
And what tree depth does everyone use?
 
elibrarius:

What difference does it make where you reduce the sample - outside, by making 24 forests, or inside, for example by making the first 24 nodes splits on the hour? Below those 24 nodes, each remaining branch will receive 1/24th of the sample.

It is not about the reduction; it is about the statistics of the predictor's behavior on the sample outside the split - this should reduce the randomness of the selected predictor split value.

By the way, does AlgLib build the grid on every split, or once, reusing that grid afterwards? As I understand it, the CatBoost developers claim that they build the grid once.

 
Aleksey Vyazmikin:

I don't get it - are you suggesting to take the raw price at the target markup points and cluster it, or what?

The target is yours, any target... I was a bit unclear about that...

the clusters are needed for only one thing:


Here: we found new ht's on the test set and accepted them as good...

Now on new data we need to find these ht's in order to apply the model to them, since the model works well only on ht's. And how do we recognize them on new data? One option is by the cluster number.

 
Aleksey Vyazmikin:

In general, I agree about the mutual compensation of oppositely directed signals, but for that, in my case, I would need to apply different strategies and make a markup for each - that is a separate task, but I plan to do it as well.

And to find similar strategies, in order to weed them out of a group or to split the risk (lot) between them, I will need to consider not only entry and exit times but also entry direction. I need to think about how best to do this.

I will bring the idea to some logical conclusion. Suppose we have a set of systems on one asset. Each system, when in the market, holds a position of a fixed volume, but the direction can vary. The yields and volatilities of the strategies are known. Let us define the correlation between two strategies by the formula (t1-t2)/sqrt(T1*T2), where T1 and T2 are the durations of their time in the market, and t1 and t2 are the durations of the time when the strategies are simultaneously in the market in the same direction and in opposite directions, respectively. This is a simplified formula derived under the assumption that price is close to a random walk (SB). Now we have all the data needed to apply Markowitz theory and find an optimal portfolio.
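The correlation formula above can be computed directly if each strategy is represented as a per-bar position series (+1 long, -1 short, 0 flat); that representation is my assumption for the sketch, not something from the post:

```python
import math

# Sketch of the correlation (t1 - t2) / sqrt(T1 * T2) from the formula
# above, computed from two per-bar position series (+1 long, -1 short,
# 0 flat). The per-bar representation is an illustrative assumption.

def strategy_corr(pos_a, pos_b):
    T1 = sum(p != 0 for p in pos_a)                       # time in market, A
    T2 = sum(p != 0 for p in pos_b)                       # time in market, B
    t1 = sum(a == b != 0 for a, b in zip(pos_a, pos_b))   # same direction
    t2 = sum(a == -b != 0 for a, b in zip(pos_a, pos_b))  # opposite direction
    return (t1 - t2) / math.sqrt(T1 * T2)
```

Two identical position series give +1, two always-opposite series give -1, and balanced same/opposite overlap cancels to 0, matching the intuition that negative values are the ones useful for the portfolio.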

Obviously, we will not get any meaningful portfolios this way (at least, because only one asset is used). We need some modifications.

1) Change the optimization algorithm (parameter limits, penalties). Refine the definition of the correlation between strategies.

2) Optimize the portfolio already at the moment of strategy creation, that is, look for strategies based on the portfolio-optimality condition. It is not quite clear how to formalize this in a practically applicable way, but the approach seems more logical in general. Although, as you already wrote, the algorithms would need to be rewritten, etc. Not sure it's worth the trouble.