Machine learning in trading: theory, models, practice and algo-trading - page 3604
Imho, someone should also make a proper forum on machine learning in trading ))
I want it done, but I don't want to be the one to do it )
If you don't make the forward period too long, it works like a charm. Then retrain, taking the new data into account.
Any features will do, but preferably ones close to the prices. You don't need many; 10 is enough. It's desirable to normalise them for clustering if their values differ greatly in scale.
It's desirable to make the markup dense, i.e. there should be a label for every change of the features, though you can also do it on a sparse dataset. The labels should correspond to profitable trades in most cases; otherwise you get the opposite effect. Apparently nobody reads this, only flooding in the thread.
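Roughly what that feature preparation could look like. This is a minimal sketch, not the poster's actual code: the DataFrame `df` with a 'close' column, the particular features, and the choice of KMeans are all assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def make_features(df: pd.DataFrame) -> pd.DataFrame:
    feats = pd.DataFrame(index=df.index)
    # A handful of price-derived features ("close to the prices")
    for lag in (1, 2, 3, 5, 10):
        feats[f"ret_{lag}"] = df["close"].pct_change(lag)
    feats["ma_dist_20"] = df["close"] / df["close"].rolling(20).mean() - 1
    return feats.dropna()

feats = make_features(df)  # df is an assumed price DataFrame
# Normalisation matters here because the feature scales differ
X = StandardScaler().fit_transform(feats)
clusters = KMeans(n_clusters=50, n_init=10, random_state=0).fit_predict(X)
```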
I think I already wrote back then that I have something similar...
The larger subset_size...
What is this parameter responsible for?
The parameters can be tuned.
So if you search through them, you're bound to find a good-looking result - whereas it should give a good result with any parameters :).
Then retrain, taking the new data into account.
How to tell when something has stopped working is the eternal question.
You don't need many; 10 is enough.
I was planning to make a variant that selects random features from my assortment - I never got around to it; I haven't touched Python at all for almost a month.
I had a task of finding not the average but the average-maximum spread: the maximum spread within each minute was taken, and those maxima were then averaged for each minute of the day.
In the highlighted place I substituted the time-weighted average spread of the minute instead. There are spikes at 15:15, 15:30 and 17:00.
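For illustration, a minimal pandas sketch of that calculation. The tick DataFrame `ticks` with a DatetimeIndex and 'bid'/'ask' columns is an assumption; only the max-within-minute, then average-per-minute-of-day logic is from the post.

```python
import pandas as pd

ticks["spread"] = ticks["ask"] - ticks["bid"]

# Maximum spread within each calendar minute
minute_max = ticks["spread"].resample("1min").max().dropna()

# Average those maxima across days, for each minute of the trading day
avg_max_by_minute = minute_max.groupby(
    minute_max.index.strftime("%H:%M")
).mean()

# The minutes where the post reports spikes (if present in the data)
print(avg_max_by_minute.loc[["15:15", "15:30", "17:00"]])
```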
I think I already wrote back then that I have something similar...
My code fixes the labels; what does your code do?
Now we'll start going round in circles again, because we're looking for analogies where there are none :)
How to tell when something has stopped working is the eternal question.
It is very simple to understand - it stops working :)
The idea of the approach has already been described in the links. The dataset is grouped by features into similar clusters (patterns, if you like). Then the labels in a given number (subset_size) of clusters are fixed, i.e. all labels in a cluster become 1 or 0, depending on which is the majority in that cluster. This removes ambiguity for the final model: it stops overfitting to noise and making unnecessary splits.
In the list of clusters sorted by "probability bias", the most biased clusters come at the very beginning. These are corrected first, so that they become fully unambiguous for subsequent model training. The others, which sit in the tail with a probability close to 0.5, are not touched at all and continue to introduce noise into the model.
By varying the number of clusters and subset_size, we find a balance between good and bad clusters that satisfies the user.
The function is transparent in the sense that it gives a predictable, expected result: the more clusters are corrected, the more stable the model, but the less "beautiful", and vice versa. Hence the additional setting to adjust this.
As a result, this one small function does almost all the work of searching for stable patterns in the data and improving the model. If there are no stable patterns, the result will be predictably bad even on the train set, whereas without this function the model would have overfitted and shown a grail on the train set.
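A rough sketch of the label fixing as described above. The poster's actual implementation isn't shown, so `fix_labels`, the use of KMeans, and the default values are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def fix_labels(X: np.ndarray, y: np.ndarray,
               n_clusters: int = 50, subset_size: int = 20) -> np.ndarray:
    """Force labels in the most biased clusters to the majority class."""
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=0).fit_predict(X)
    y_fixed = y.copy()
    # Mean label per cluster ~ P(y=1 | cluster); bias = distance from 0.5
    means = np.array([y[clusters == c].mean() for c in range(n_clusters)])
    bias_order = np.argsort(-np.abs(means - 0.5))  # most biased first
    for c in bias_order[:subset_size]:
        # Make the most biased clusters fully unambiguous (all 1 or all 0);
        # clusters near 0.5 in the tail are left untouched
        y_fixed[clusters == c] = int(means[c] > 0.5)
    return y_fixed
```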
My code fixes the labels; what does your code do?
Your code removes from the sample the rows belonging to clusters whose mean share of ones is (as you write) around 0.5.
My code does much the same thing; in brief (a sketch follows the list):
1. Open the train sample.
2. Cluster it on the tree principle, i.e. by sequentially going deeper into each cluster.
3. Evaluate the metrics of each cluster on all samples.
4. Select from the train sample the clusters where the probability of meeting the target "1" is biased by more than 5%.
5. Form a new sample from the selected clusters.
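A hedged sketch of these five steps. The poster's tree-style clustering isn't shown, so the leaves of a shallow decision tree stand in for the clusters here, and the 5% bias is measured against the overall share of ones; both choices are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# 2. "Tree principle" clustering stand-in: leaves of a shallow tree
tree = DecisionTreeClassifier(max_leaf_nodes=50, random_state=0)
tree.fit(X_train, y_train)            # X_train, y_train assumed numpy arrays
leaf_id = tree.apply(X_train)         # cluster id = leaf index

# 3-4. Per-cluster share of target "1"; keep clusters biased by > 5%
# (baseline here is the overall share of ones; 0.5 would also be defensible)
base_rate = y_train.mean()
keep = []
for leaf in np.unique(leaf_id):
    p1 = y_train[leaf_id == leaf].mean()
    if abs(p1 - base_rate) > 0.05:    # the 5% bias threshold
        keep.append(leaf)

# 5. New sample formed only from the selected clusters
row_mask = np.isin(leaf_id, keep)
X_new, y_new = X_train[row_mask], y_train[row_mask]
```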
But I haven't tried using that many clusters...
There is no particular stability after that; there is an improvement if the random clustering happens to partition things well, but without peeking into new data it is not guaranteed.
Then the labels in a given number (subset_size) of clusters are fixed, i.e. all labels in a cluster become 1 or 0, depending on which is the majority in that cluster. This removes ambiguity for the final model: it stops overfitting to noise and making unnecessary splits.
That's something I haven't done. But it's essentially binarisation. Again, if the probabilities are preserved on new data, the effect will be there; if not, it all goes to hell.
I get a similar effect through quantisation.
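As a loose illustration of the quantisation idea: binning a feature into quantiles and checking the per-bin share of ones plays the same role as the "biased" clusters above. The use of pandas qcut and the bin count are assumptions, not the poster's actual quantisation scheme.

```python
import pandas as pd

def bin_bias(feature: pd.Series, y: pd.Series, n_bins: int = 10) -> pd.Series:
    # Quantise the feature into (up to) n_bins quantile bins
    bins = pd.qcut(feature, q=n_bins, duplicates="drop")
    # Share of target "1" per bin; bins far from the base rate
    # are the analogue of the biased clusters
    return y.groupby(bins, observed=True).mean()
```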