Machine learning in trading: theory, models, practice and algo-trading - page 1531

 

I hacked together some rough Python code today and got this:

train

test

Maybe the tester is wrong, maybe it's something else; I'll check tomorrow night. I'm doing this for an article (I want to describe my approach). Maybe CatBoost copes with this strategy better than the forest.

 
Maxim Dmitrievsky:

I hacked together some rough Python code today and got this:

train

test

Maybe the tester is wrong, maybe it's something else; I'll check tomorrow night. I'm doing this for an article (I want to describe my approach). Maybe CatBoost copes with this strategy better than the forest.

But CatBoost is trained on test, going by their naming of the samples. Is something wrong?

 
Aleksey Vyazmikin:

But CatBoost is trained on test, going by their naming of the samples. Is something wrong?

I mean completely new data, a different piece of history. On train it should look like that, but the test result is confusing. Maybe the tester is peeking somewhere.

I train the forest to an error of 0.6-0.7 on roughly the same data, and it shows the same on train, while the test is not always good and is almost always worse than train. The boosting error is about the same, but the test is suspiciously good; that does not happen.

How are things on your side, did you collect any leaves?
 
Maxim Dmitrievsky:

I mean completely new data, a different piece of history. On train it should look like that, but the test result is confusing. Maybe the tester is peeking somewhere.

I train the forest to an error of 0.6-0.7 on roughly the same data, and it shows the same on train, while the test is not always good and is almost always worse than train. The boosting error is about the same, but the test is suspiciously good; that does not happen.

How are things on your side, did you collect any leaves?

Well, then I don't know; without the sample it's hard to tell what's wrong.

I'm not very happy with my results. I have collected a decent number of leaves, but then the question is how best to make them work together. They often overlap each other by 20%-50% or more, so they end up giving the same signal, which is not very good. The idea is to group them and set an activation threshold for each group; I'm still thinking about how best to do that.
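To make the grouping idea concrete, here is a minimal Python sketch. It is not the actual implementation used in this thread: the overlap measure (shared activations divided by the smaller leaf's activation count), the greedy grouping rule, the 0.2 cutoff and all names (`overlap`, `group_leaves`, `group_signal`, the toy `activations` data) are assumptions chosen for illustration.

```python
def overlap(a, b):
    """Share of the smaller leaf's activations that coincide with the other leaf."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    smaller = min(sum(a), sum(b))
    return both / smaller if smaller else 0.0


def group_leaves(activations, min_overlap=0.2):
    """Greedy grouping: a leaf joins the first group whose first member it overlaps enough."""
    groups = []
    for name, acts in activations.items():
        for group in groups:
            if overlap(acts, activations[group[0]]) >= min_overlap:
                group.append(name)
                break
        else:
            # No existing group is similar enough, so the leaf starts a new one.
            groups.append([name])
    return groups


def group_signal(activations, group, bar, threshold):
    """A group fires on a bar when at least `threshold` of its leaves are active."""
    active = sum(activations[name][bar] for name in group)
    return active >= threshold


# Toy data (hypothetical): 1 means the leaf is active on that bar.
activations = {
    "leaf_a": [1, 1, 0, 0, 1, 0],
    "leaf_b": [1, 1, 0, 0, 0, 0],
    "leaf_c": [0, 0, 1, 1, 0, 0],
    "leaf_d": [0, 0, 1, 0, 0, 1],
}
groups = group_leaves(activations)
```

With this toy data the leaves fall into two groups, and `group_signal(activations, groups[0], 0, 2)` checks whether both leaves of the first group agree on bar 0. One signal per group instead of one per leaf would also reduce the number of lots opened on duplicated signals.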

So far I'm experimenting with mutual filtering: I trained a tree in R on the leaves, but so far only for buys.

The first screenshot is without the tree, the second with the tree. The period is March-September of this year.

Of course the balance curve is not very attractive, but this is a trend strategy whose essence is not to lose more in a flat market than you earn in a trend, and the market (Si) has been flat all year with rare strong moves. I can see that adding the tree improved the relative metrics. I will do the same for sells later; if the result is positive, it will become part of the model.

The leaf selection question is not fully solved. Even when selecting leaves that showed good results in each of 5 years, one can expect 20%-40% of them to stop working. What is even sadder is the inability to tell whether to turn them off or not: I specifically ran the test by quarters, and it turned out that leaves that were unprofitable in one quarter recover the loss in subsequent quarters (many of them do).

The leaf selection method itself seems promising, but the process is extremely slow.

 
Aleksey Vyazmikin:

Well, then I don't know; without the sample it's hard to tell what's wrong.

I'm not very happy with my results. I have collected a decent number of leaves, but then the question is how best to make them work together. They often overlap each other by 20%-50% or more, so they end up giving the same signal, which is not very good. The idea is to group them and set an activation threshold for each group; I'm still thinking about how best to do that.

So far I'm experimenting with mutual filtering: I trained a tree in R on the leaves, but so far only for buys.

The first screenshot is without the tree, the second with the tree. The period is March-September of this year.

Of course the balance curve is not very attractive, but this is a trend strategy whose essence is not to lose more in a flat market than you earn in a trend, and the market (Si) has been flat all year with rare strong moves. I can see that adding the tree improved the relative metrics. I will do the same for sells later; if the result is positive, it will become part of the model.

The leaf selection question is not fully solved. Even when selecting leaves that showed good results in each of 5 years, one can expect 20%-40% of them to stop working. What is even sadder is the inability to tell whether to turn them off or not: I specifically ran the test by quarters, and it turned out that leaves that were unprofitable in one quarter recover the loss in subsequent quarters (many of them do).

The leaf selection method itself seems promising, but the process is extremely slow.

Yeah, you won't make a million quickly on that, which is a pity.

There are very few trades, so it is not representative. And I would filter out the losing ones; they aren't needed anyway.
 
Maxim Dmitrievsky:

Yeah, you won't make a million quickly on that, which is a pity.

There are very few trades, so it is not representative. And I would filter out the losing ones; they aren't needed anyway.

Yes, the problem is that each leaf signal is allocated 1 lot, so if many leaves activate, more lots are needed. It reaches 71 lots, but very rarely, and if you keep funds for 71 lots all the time, the total works out to 25% per annum. The GO (margin requirement) on the exchange is large, and this is Si.

As for removing the unprofitable ones, it is a double-edged sword. On the one hand, one can increase the number of filters, which reduces the number of losing trades, but it also reduces the number of profitable ones. As I wrote above, many leaves become profitable over the year, so it's a matter of time and of market conditions. The problem with this approach is that it is very labor-intensive, and there is no way to promptly add new predictors, for which I regularly get ideas; I am in fact still working with a sample from February of this year.

Another way to improve the metrics is to work on profit-taking: right now positions are closed mostly by a stop and, in rare cases, by one condition.

Despite all the shortcomings, the system has not lost money for almost a year, which may indicate that the direction is right.

 
Aleksey Vyazmikin:

Yes, the problem is that each leaf signal is allocated 1 lot, so if many leaves activate, more lots are needed. It reaches 71 lots, but very rarely, and if you keep funds for 71 lots all the time, the total works out to 25% per annum. The GO (margin requirement) on the exchange is large, and this is Si.

As for removing the unprofitable ones, it is a double-edged sword. On the one hand, one can increase the number of filters, which reduces the number of losing trades, but it also reduces the number of profitable ones. As I wrote above, many leaves become profitable over the year, so it's a matter of time and of market conditions. The problem with this approach is that it is very labor-intensive, and there is no way to promptly add new predictors, for which I regularly get ideas; I am in fact still working with a sample from February of this year.

Another way to improve the metrics is to work on profit-taking: right now positions are closed mostly by a stop and, in rare cases, by one condition.

Despite all the shortcomings, the system has not lost money for almost a year, which may indicate that the direction is right.

You can train a second model on top of the first, on the same features. However, it will only correct entries based on equity, i.e. it will allow or prohibit trades. At least there won't be unnecessary trades.

 
Maxim Dmitrievsky:

You can train a second model on top of the first, on the same features. However, it will only correct entries based on equity, i.e. it will allow or prohibit trades. At least there won't be unnecessary trades.

Or something like a portfolio of leaf systems with rolling recalculation.

 
Maxim Dmitrievsky:

You can train a second model on top of the first, on the same features. However, it will only correct entries based on equity, i.e. it will allow or prohibit trades. At least there won't be unnecessary trades.

I don't quite understand how you propose to change the target. In fact I already have 3 targets: buy/sell/don't trade. I select leaves for buy/sell, and then for each such leaf I look for filter leaves from the "do not trade" class, up to 3 of them (for some leaves that is enough, for others it is not; I ran tests). From the results of these pairs (activation leaf + filter leaf) I built a tree on the final signal, which takes into account the responses of all leaves and their mutual exclusion, i.e. I obtained an additional filter.

Here is the tree

I want to try a CB model instead of the tree; maybe it will generalize all the leaves used better. Accuracy here also increases by only 1%, which is certainly not much, but the result is positive.

 
Aleksey Vyazmikin:

I don't quite understand how you propose to change the target. In fact I already have 3 targets: buy/sell/don't trade. I select leaves for buy/sell, and then for each such leaf I look for filter leaves from the "do not trade" class, up to 3 of them (for some leaves that is enough, for others it is not; I ran tests). From the results of these pairs (activation leaf + filter leaf) I built a tree on the final signal, which takes into account the responses of all leaves and their mutual exclusion, i.e. I obtained an additional filter.

Here is the tree

I want to try a CB model instead of the tree; maybe it will generalize all the leaves used better.

Well, you should start with the theory. For example, what is the point of selecting models separately for sells and for buys?

Everything that is not a buy is a sell, and vice versa. So an ordinary binary classifier is enough. Further, if we do it that way, why have a separate "do not trade" class when you can simply filter entries with a higher threshold? The model may give the "do not trade" class excessive weight, which lowers the model's error while its predictive (generalizing) ability falls overall.
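A rough sketch of what filtering by threshold could look like with a binary classifier that outputs a buy probability. The function name `decide`, the `margin` parameter and its default of 0.1 are made up for illustration; the right margin would have to be tuned on real data.

```python
def decide(p_buy, margin=0.1):
    """Turn a binary classifier's buy probability into buy / sell / skip.

    margin is a hypothetical confidence band around 0.5: the wider it is,
    the fewer (but more confident) entries remain.
    """
    if p_buy >= 0.5 + margin:
        return "buy"
    if p_buy <= 0.5 - margin:
        return "sell"
    return "skip"  # no separate "do not trade" class is needed


# Hypothetical probabilities from some already trained binary model.
probabilities = [0.75, 0.2, 0.55, 0.48]
decisions = [decide(p) for p in probabilities]
```

Raising `margin` plays the role of the "do not trade" class without letting that class dominate the training target.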

The point of the second model is that the 1st model will make errors of the first and second kind: false positives and false negatives. We want to remove them. To do this, you feed the same features to the input of the 2nd model and, as its target, the result of the first model's trades, where 0 means the trade was profitable and 1 means it was unprofitable. Train the second classifier and trade only when it outputs 0, i.e. it filters the signals of the 1st model. Losing trades will almost disappear on train; you then have to check it on the test. That's one.
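Roughly, in Python, the two-model scheme could look like the sketch below. The `Stump` class is only a toy stand-in for whatever second classifier you actually use (forest, boosting), and the features, signals and returns are invented numbers; the labeling rule (0 = profitable, 1 = unprofitable) follows the description above.

```python
def build_meta_labels(signals, returns):
    """Target for the 2nd model: 0 = the 1st model's trade was profitable, 1 = not.

    signals: +1 (buy) or -1 (sell) from the 1st model.
    returns: price change over the trade.
    """
    return [0 if s * r > 0 else 1 for s, r in zip(signals, returns)]


class Stump:
    """Toy second classifier: a single feature compared with a single threshold."""

    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                for above in (0, 1):
                    pred = [above if row[j] > t else 1 - above for row in X]
                    errors = sum(p != label for p, label in zip(pred, y))
                    if best is None or errors < best[0]:
                        best = (errors, j, t, above)
        _, self.feature, self.threshold, self.above = best
        return self

    def predict(self, X):
        return [self.above if row[self.feature] > self.threshold else 1 - self.above
                for row in X]


def filtered_signals(signals, meta_pred):
    """Keep a 1st-model signal only where the 2nd model predicts 0 (profitable)."""
    return [s if m == 0 else 0 for s, m in zip(signals, meta_pred)]


# Invented train data: the same features go to both models.
X = [[1.0, 0.2], [-0.5, 0.3], [0.4, 0.9], [-0.2, 0.8], [0.3, 0.1], [-0.9, 0.7]]
signals = [1, -1, 1, -1, 1, -1]            # output of the 1st model
returns = [2.0, -1.5, -1.0, 0.5, 1.0, 0.8]

meta_labels = build_meta_labels(signals, returns)
second_model = Stump().fit(X, meta_labels)
trades = filtered_signals(signals, second_model.predict(X))
```

On this toy train set the losing trades are filtered out completely, which is exactly why the result means nothing until it is repeated on data neither model has seen.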

You can also train the second model not only on train but on the CB as well; then it will correct trades on the new data too. That's two. And then the tests.

You can split the sample into folds and train the models in a checkerboard order, on alternating chunks. For example, take 5 folds. The first model is trained on folds 1, 3 and 4 at once. The second, corrective model is trained on folds 2 and 5. This should further improve generalization.
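The split into 5 chunks described above could be sketched like this. The helper names `split_into_folds` and `rows_for` are hypothetical, and the chunks are assumed to be contiguous and time-ordered, which the post does not state explicitly.

```python
def split_into_folds(n_rows, n_folds):
    """Contiguous, time-ordered chunks as (start, stop) index pairs."""
    base, extra = divmod(n_rows, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        # The first `extra` chunks get one additional row each.
        stop = start + base + (1 if i < extra else 0)
        folds.append((start, stop))
        start = stop
    return folds


def rows_for(folds, fold_numbers):
    """Row indices of the chosen chunks (numbered from 1, as in the post)."""
    rows = []
    for k in fold_numbers:
        start, stop = folds[k - 1]
        rows.extend(range(start, stop))
    return rows


folds = split_into_folds(n_rows=10, n_folds=5)
first_model_rows = rows_for(folds, [1, 3, 4])   # train the 1st model here
second_model_rows = rows_for(folds, [2, 5])     # train the corrective model here
```

The two index lists do not intersect, so the corrective model learns from trades the first model made on data it was not fitted to.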