Machine learning in trading: theory, models, practice and algo-trading - page 2208
That link I gave to the wiki was about semi-supervised learning. As I understand it, the labels should mark the edges of the stable sections.
ZigZag won't do, because it marks everything indiscriminately, without distinguishing between sections, and training then proceeds the same way; with ZigZag labels you get too many examples with very different features, and the result of training cannot be good.
Labels are the known targets/classes. The rest of the data comes without them, only as features.
These labels are supposed to carry some meaning, for example labels saying this is a cat or this is a crocodile.
In our case we have no idea where the cats are. That is, we don't know any of the patterns or how they differ, which makes the task even harder.
So the initial markup of labels can be brute-forced by going through the variants.
It's like setting the right search direction.))
A full brute-force is always better than an incomplete one. The point about the markup not being fully correct has always been there. And the curse of dimensionality is only beaten by getting the search direction right; it comes down to finding a good region in which to search for variants.
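For what it's worth, a minimal sketch of what brute-forcing the markup could look like, assuming the variants are labelings built from different look-ahead horizons; the function name, horizon range and CatBoost settings are my own illustration, not anything from the thread:

```python
# Try one labeling variant per horizon and keep the one that generalizes
# best to the second half of the data. Purely illustrative.
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.metrics import accuracy_score

def search_markup(close: pd.Series, X: pd.DataFrame, horizons=range(1, 16)):
    best = (None, -np.inf)
    for h in horizons:
        y = (close.shift(-h) > close).astype(int)[:-h]   # candidate markup
        Xh = X.iloc[:-h]
        split = len(Xh) // 2                             # walk-forward split
        model = CatBoostClassifier(iterations=100, verbose=False)
        model.fit(Xh.iloc[:split], y.iloc[:split])
        score = accuracy_score(y.iloc[split:], model.predict(Xh.iloc[split:]))
        if score > best[1]:
            best = (h, score)
    return best  # (best horizon, its out-of-sample accuracy)
```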
I tried to extend the idea that small samples are acceptable for GMM. Train on 6 months, test on 5 years. I split the labeled examples into n parts of fixed size. For each part I fitted a separate GMM model, sampled 1000 examples from each, stacked them together and trained CatBoost. I picked the features and got this:
Second version, same labels and the same partitioning, but with pre-shuffling:
X = X.sample(frac=1.0)  # pandas: returns a shuffled copy of all rows
One fixed target was used in both cases. I can reproduce the experiment if you like. I am not strong at interpreting such phenomena; perhaps there is an explanation.
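A rough reconstruction of the pipeline described above, assuming the GMM is fitted jointly on the features plus the label column and the sampled label is thresholded back to {0, 1}; the part count, sample count and shuffle flag mirror the post, while the component count and everything else are guesses:

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

def augment_with_gmm(df: pd.DataFrame, label_col='label', n_parts=4,
                     n_samples=1000, n_components=5, shuffle=False, seed=0):
    if shuffle:
        df = df.sample(frac=1.0, random_state=seed)      # the pre-mixing step
    sampled = []
    for part in np.array_split(df, n_parts):             # roughly equal parts
        gmm = GaussianMixture(n_components=n_components,
                              random_state=seed).fit(part.values)
        new_rows, _ = gmm.sample(n_samples)              # 1000 synthetic rows
        sampled.append(pd.DataFrame(new_rows, columns=df.columns))
    out = pd.concat(sampled, ignore_index=True)
    out[label_col] = (out[label_col] > 0.5).astype(int)  # back to hard labels
    return out       # n_parts * n_samples rows to train CatBoost on
```

With shuffle=True this corresponds to the pre-mixing variant; with shuffle=False each GMM sees one contiguous chunk of the series.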
Sorry, guys, a question.
How many weight coefficients are there in your networks, and how many trades are they trained on?
I want to understand the ratio between those two numbers and speculate on how overfitting depends on that ratio. Thank you.
Is this shuffling done before the GMM or before the boosting? You need to check the class balance in train/test; maybe the zeros went to train and the ones to test. You could also try clustering the buy and sell labels separately.
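Two quick checks along these lines; the variable names (y_train, y_test, df, 'label') and the component count are assumptions for illustration:

```python
from sklearn.mixture import GaussianMixture

def check_balance(y_train, y_test):
    # Did the split skew the classes, e.g. all zeros in train, ones in test?
    print(y_train.value_counts(normalize=True))
    print(y_test.value_counts(normalize=True))

def fit_gmm_per_class(df, label_col='label', n_components=5):
    # Separate clustering: one GMM for the buy labels, one for the sell labels.
    feats = df.drop(columns=label_col)
    gmm_buy = GaussianMixture(n_components=n_components).fit(feats[df[label_col] == 1])
    gmm_sell = GaussianMixture(n_components=n_components).fit(feats[df[label_col] == 0])
    return gmm_buy, gmm_sell
```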
The shuffling is done before fitting the GMM.
Before that I drop labels by a condition; this always brings the class balance to roughly 1/1, with slight variations.
In this case the 115 labeled examples were shuffled and divided into 4 parts. Then 4 GMM models were fitted on them. From each, 1000 examples are sampled and combined into one dataframe, which is then split into train and test.
The class balance of the samples in this case was slightly off the ideal, but the train and the test sets had approximately the same ratio.
Below are the results with the same sample of 115 labels split into 4 parts, but without shuffling. The class balance is, of course, a little better, but I don't think it significantly affects the result.
This may sound silly, but it seems to me there is some kind of temporal correlation in the series that the GMM models find in its different parts. It disappears if you break the ordering by shuffling the rows.
I hadn't thought about separate clustering; I'll try it tonight.
I'd have to draw it, it's hard to explain in words... Well, it's a fact that the distributions are different in the two cases. Plus you have already removed the seriality. Most likely the distributions are highly uninformative, and after sampling the new points start to land in the middle of nowhere. That is, the information in the series is lost, yes, since the quotes are not independent.
Or do it on some simple example (not quotes) and then compare.
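One way to follow that suggestion, reusing the augment_with_gmm sketch above on a synthetic AR(1) series where the serial dependence is known by construction; purely illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal()           # strongly autocorrelated series

df = pd.DataFrame({'lag1': x[:-2], 'lag2': x[1:-1]})
df['label'] = (x[2:] > x[1:-1]).astype(int)        # next-step direction

aug_ordered = augment_with_gmm(df, shuffle=False)
aug_shuffled = augment_with_gmm(df, shuffle=True)
# Train a model on each augmented set and score both on fresh AR(1) data:
# if shuffling kills the result here too, the effect is the serial
# dependence itself, not something specific to quotes.
```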
Maxim, hi. It's been a long time since I've been here... I got interested in your latest article.) I have installed Python.) I'm trying to understand it and I have lots of questions.))) I take it MARKUP is the spread? The markup labeling is a simple comparison of the current value with the current value plus a random number, and depending on the sign (> or <) you assign 1 or 0, right? And for the test you set markup=0.0 (while for the train it's MARKUP=0.00001), right?
Hi. Yes, that's right. The same markup is used in the tester. Questions about the article are probably better asked under the article itself, so that everything is in one place.
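For readers of the thread, a hedged sketch of the markup logic as described in the question above: compare the current price with the price a random number of bars ahead and label the move only if it clears the markup; the skip band and all parameter values are my assumptions, not the article's exact code:

```python
import numpy as np
import pandas as pd

def label_with_markup(close: pd.Series, markup=0.00001, max_horizon=15, seed=0):
    rng = np.random.default_rng(seed)
    labels = []
    for i in range(len(close) - max_horizon):
        h = int(rng.integers(1, max_horizon + 1))  # random look-ahead
        future, cur = close.iloc[i + h], close.iloc[i]
        if future > cur + markup:
            labels.append(1)                       # up move beyond the spread
        elif future < cur - markup:
            labels.append(0)                       # down move beyond the spread
        else:
            labels.append(np.nan)                  # inside the spread: no label
    return pd.Series(labels, index=close.index[:len(labels)])
# With markup=0.0 (as in the tester) every move gets a label.
```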
I analyze the feedback and see what can be improved