Machine learning in trading: theory, models, practice and algo-trading - page 3023
It was justified above that you can't simply dismiss model errors.
I would like to change my opinion.
But for that I need:
an evaluation of the initial model on and outside the training sample;
an evaluation of the "cleaned" model outside the training sample, on data that does NOT coincide with the previous two.
Can we do that?
I've added a couple of screenshots above.
Algorithms for separating the grain from the chaff can differ; I'll show how I do it.
To the left of the dotted line is the OOS, which does not participate in training in any way.
Training is done on simple features like increments.
The yellow curve is the quote chart itself - don't pay attention to it as such, but you can use it to understand in which situations the model works better or worse.
If you throw away a lot of errors at once, the teacher degenerates (there may be many errors and zero grains left), so discard them gradually, at each iteration,
and the OOS error decreases gradually; in this case R² increases.
In essence, this is an analogue of bestinterval from fxsaber, only here a ready-made TS comes out at once.
To me this looks overfitted to the quotes.
Where is the "Out of sample"?
It's not funny anymore.
This is easily automated and works without human intervention;
a similar algorithm was shown in the last article.
In essence, it is filtering the model's errors into a separate "do not trade" class, preferably via a second model that learns to separate the grain from the chaff,
so that only the grains remain in the first model (see the sketch below).
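A hedged sketch of that two-model scheme; the error threshold and the classifier choice are my assumptions:

```python
# Second model learns "grain vs. chaff": rows where the first model
# erred badly become class 0 ("do not trade"); trade only where the
# meta model predicts 1. All parameter values here are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_do_not_trade_filter(main_model, X, y, err_quantile=0.7):
    resid = np.abs(y - main_model.predict(X))
    grain = (resid <= np.quantile(resid, err_quantile)).astype(int)
    meta = RandomForestClassifier(n_estimators=300).fit(X, grain)
    # refit the main model on the grains only
    main_model.fit(X[grain == 1], y[grain == 1])
    return main_model, meta
```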
It's the same as with tree rules, only approached from the other side. But rules still have to be sifted through and compared with each other, whereas here a refined TS comes out at the end.
For example, the first iteration of selecting grains from chaff (to the left of the vertical dotted line - OOS):
And here is the 10th:
Yes, the point is the same - ultimately to work with data that is better described by the predictors.
How to do this most efficiently is still an open question - each method has pros and cons.
I think it's all listed there. This is a book by Jeremy Howard, former president of Kaggle and co-founder of the fast.ai framework.
Fast.ai.
The book in the original
Book in Russian
Free version
Thanks! I'll have to look for a free one in Russian - the translator sometimes produces real gems, telling me about "brine" (Python's pickle, presumably), which can be useful :)
I propose to build this thing in Python with a tree and leaf selection, in Colab, so you can feed your own datasets into it.
If you have any ideas about what works better or worse - taking only the best rules, or passing them through some filters - suggest them.
I want to compare by running one dataset through both approaches. Then we'll understand what's what :)
Interesting idea!
First of all, we need to understand which tree implementation makes it easy to pull out the rules of a leaf so that we can work with them further.
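One candidate (my suggestion, not necessarily what will be chosen): sklearn's trees expose their structure, so a leaf's rule chain can be read off directly:

```python
# Walk a fitted sklearn tree from the root and collect the condition
# chain that leads to every leaf; returns (leaf_id, [conditions]) pairs.
from sklearn.tree import DecisionTreeClassifier

def extract_leaf_rules(fitted_tree):
    t = fitted_tree.tree_
    rules = []
    def walk(node, conds):
        if t.children_left[node] == -1:  # -1 marks a leaf in sklearn
            rules.append((node, conds))
            return
        f, thr = t.feature[node], t.threshold[node]
        walk(t.children_left[node],  conds + [f"x[{f}] <= {thr:.4g}"])
        walk(t.children_right[node], conds + [f"x[{f}] >  {thr:.4g}"])
    walk(0, [])
    return rules
```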
Then the way of building the tree - greedy or genetic. I checked the leaves of the trees of all populations (if I recall correctly :)).
Of course, you can use a forest instead of genetics, but then you need more trees to search leaves in, and you need to prune to a percentage of examples per leaf relative to the whole sample (see the sketch below). Forest trees may be faster to build than genetics, and they will obviously have fewer settings.
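If the forest route is taken, the "percentage of examples in the leaf" pruning maps directly onto a standard parameter; a sketch under that assumption:

```python
# min_samples_leaf accepts a float, interpreted as a fraction of the
# training sample, which is exactly "prune to a percentage per leaf".
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=1000,      # more trees -> more candidate leaves to sift
    min_samples_leaf=0.01,  # every leaf must cover >= 1% of the rows
    max_features="sqrt",    # random predictor subset at each split
    n_jobs=-1,
)
```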
The process of generating new leaves should continue until the required (specified) number of selected leaves is reached.
Before building a tree, we need to generate a random subsample of one of two types: the first selects N parts from contiguous intervals of a specified size, as a percentage of the training sample; the second is a completely random subsample.
A random set of predictors is also used to construct each tree - a sketch of all three follows below.
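A sketch of those two subsample types plus the random predictor set; the function names and RNG setup are mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def contiguous_subsample(n_rows, n_parts, part_frac):
    # type 1: N contiguous intervals, each part_frac of the training sample
    size = max(1, int(n_rows * part_frac))
    starts = rng.integers(0, n_rows - size + 1, size=n_parts)
    return np.unique(np.concatenate([np.arange(s, s + size) for s in starts]))

def random_subsample(n_rows, frac):
    # type 2: a completely random subsample
    return rng.choice(n_rows, size=int(n_rows * frac), replace=False)

def random_predictor_set(n_cols, n_take):
    # random subset of predictors for building one tree
    return rng.choice(n_cols, size=n_take, replace=False)
```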
Preprocessing for all the data needs more thought.
Criteria for evaluating leaves can also be added later; the gist is that each metric has a set threshold. I don't know what metrics you have, and I don't remember exactly what I used - I'd need to go through the code. You could take balance, expected payoff, and recovery factor.
The evaluation should take place on each interval of the whole training sample, with the number of intervals specified. If the required criterion is not met on any one interval, the leaf is archived or discarded. I kept a database of leaves, removing duplicates so as not to check them again (sketch below).
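A sketch of that per-interval screening with a duplicate database; the metric and threshold are placeholders, with expected payoff as the example criterion:

```python
import numpy as np

checked = set()  # "database" of leaves already evaluated, keyed by rule text

def screen_leaf(rule_key, trade_results, n_intervals=5, min_payoff=0.0):
    if rule_key in checked:
        return False            # duplicate leaf: skip, don't re-check
    checked.add(rule_key)
    # the leaf must clear the threshold on EVERY interval of the sample
    for chunk in np.array_split(np.asarray(trade_results), n_intervals):
        if chunk.size == 0 or chunk.mean() <= min_payoff:
            return False        # failed on some interval -> archive/discard
    return True
```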
After selecting the leaves, they should be grouped by similarity - perhaps rank correlation does this correctly. Then distribute weights within each group and decide on the voting rules for the groups. However, maybe that's already too much, and for now it's worth at least learning how to select leaves that will remain effective on new data.
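And a sketch of the grouping step via Spearman rank correlation; the hierarchical-clustering choice is mine, one simple option among many:

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def group_leaves(signals, corr_threshold=0.7):
    # signals: rows = time, columns = one trade-signal series per leaf
    corr, _ = spearmanr(signals)              # pairwise rank correlations
    dist = 1.0 - np.abs(np.atleast_2d(corr))  # similar leaves -> small distance
    np.fill_diagonal(dist, 0.0)
    labels = fcluster(linkage(squareform(dist, checks=False), method="average"),
                      t=1.0 - corr_threshold, criterion="distance")
    return labels  # leaves sharing a label form one voting group
```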
I'm not quite sure which sample you want to run the experiment on - the one I'll provide, or one generated randomly?
In any case, to compare the methods the sample should be the same and cover a large time interval, which will allow taking into account, if not cyclicality, then at least the trends of different market phases on higher timeframes.
Let me say right away that the method I used is very slow. Perhaps it is better to do the leaf-evaluation process in MQL5 - that would allow distributing the load across CPU cores.