Machine learning in trading: theory, models, practice and algo-trading - page 58
I did try classifying the zigzag, though not the pivot points: I classified the whole trend the zigzag shows, with a target of 0 while the current zz trend is going down and 1 while it is going up. The zz trend classes do look rather unbalanced, but that is not why I moved away from them. What I didn't like is that the model needs to be extremely accurate. If the model makes a mistake or two inside a trend and reverses the trade at the wrong time, even for just one bar, it usually leads to additional losses, plus the spread and commission paid on each reversal. The model will be profitable only if it opens a trade, waits for the end of the trend, and then reverses, without a single error within each trend.
If it predicts the next bar rather than the trend, each error will result in less money lost.
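To make the point about one-bar mistakes concrete, here is a minimal sketch that counts position reversals. The label sequences and the `cost_per_reversal` figure are made up for illustration; they are not taken from any real model or broker.

```python
def count_reversals(signals):
    """Count how many times the position flips between 0 (short) and 1 (long)."""
    return sum(1 for prev, cur in zip(signals, signals[1:]) if prev != cur)


def extra_cost(true_trend, predicted, cost_per_reversal):
    """Cost of the reversals the model makes beyond those in the true trend labels."""
    extra = count_reversals(predicted) - count_reversals(true_trend)
    return extra * cost_per_reversal


# One up-trend of 6 bars followed by a down-trend of 4 bars (zigzag trend labels).
true_trend = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# Same labels, but the model is wrong on a single bar inside the up-trend.
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# cost_per_reversal is a hypothetical spread-plus-commission figure, in points.
print(count_reversals(true_trend))  # 1
print(count_reversals(predicted))   # 3
print(extra_cost(true_trend, predicted, cost_per_reversal=2.0))  # 4.0
```

A single wrong bar inside a trend adds two reversals, so the spread and commission are paid twice more, on top of whatever the position loses on that bar.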
I don't do balancing; when predicting the next bar the class imbalance is minimal, and I don't think a ±10% skew toward one class will greatly affect the result.
The article below says that balancing can be replaced by a proper model evaluation metric (F-measure or R-Precision). It is the Russian counterpart of the article that SanSanych linked earlier.
http://bazhenov.me/blog/2012/07/21/classification-performance-evaluation.html
...
Nevertheless, this metric [accuracy] has one peculiarity that must be taken into account. It gives equal weight to all documents, which may be incorrect if the distribution of documents in the training set is strongly skewed towards one or several classes. In that case the classifier has more information about those classes and, accordingly, makes more adequate decisions within them. In practice this leads to a situation where you have an accuracy of, say, 80%, but within some particular class the classifier performs very poorly, failing to identify correctly even a third of the documents.
One way out of this situation is to train the classifier on a specially prepared, balanced corpus of documents. The disadvantage of this solution is that you take away information from the classifier about the relative frequency of documents. This information, all other things being equal, can be very helpful in making the right decision.
Another way out is to change the approach to formal quality assessment.
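The 80% situation described in the quote is easy to reproduce. Below is a minimal sketch with made-up labels: 80 documents of class "A", 20 of class "B", and a hypothetical classifier that is mostly right on "A" and mostly wrong on "B". The helper names are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Share of all documents that were classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_hit_rate(y_true, y_pred, label):
    """Share of the documents of class `label` that were classified correctly."""
    total = sum(1 for t in y_true if t == label)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    return hits / total


# Skewed sample: 80 documents of class "A" and 20 of class "B".
y_true = ["A"] * 80 + ["B"] * 20
# Hypothetical predictions: 75 of the "A" documents and 5 of the "B" documents are right.
y_pred = ["A"] * 75 + ["B"] * 5 + ["B"] * 5 + ["A"] * 15

print(accuracy(y_true, y_pred))                 # 0.8
print(per_class_hit_rate(y_true, y_pred, "A"))  # 0.9375
print(per_class_hit_rate(y_true, y_pred, "B"))  # 0.25
```

Overall accuracy is 80%, yet only a quarter of the rare class is recognized.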
Precision and recall
Precision and recall are the metrics used to evaluate most information extraction algorithms. Sometimes they are used on their own, sometimes as the basis for derived metrics such as F-measure or R-Precision. The essence of precision and recall is very simple.
The precision of a system within a class is the proportion of documents that actually belong to that class among all documents the system has assigned to that class. Recall is the proportion of documents of the class found by the classifier among all documents of that class in the test sample.
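The two definitions can be written out directly. This is a minimal sketch on made-up next-bar labels (1 = up, 0 = down); the function name `precision_recall` and the convention of returning 0.0 when a denominator is empty are my own choices, not something from the article.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall of the predictions for one class `label`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # found correctly
    fp = sum(1 for t, p in pairs if t != label and p == label)  # assigned by mistake
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up next-bar labels: 1 = bar closes up, 0 = bar closes down.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred, label=1)
print(round(precision, 3))  # 0.667
print(recall)               # 0.5
```

For class 1 the model assigned three bars to the class and two of them were right (precision 2/3), while it found two of the four real "up" bars (recall 1/2).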
....
F-measure
It is clear that the higher the precision and recall, the better. But in real life maximum precision and maximum recall are not achievable simultaneously, and we have to look for a balance. That is why we would like a single metric that combines information about the precision and the recall of our algorithm. Then it is easier to decide which implementation to launch in production (whichever scores higher is better). The F-measure is exactly such a metric.
The F-measure is the harmonic mean of precision and recall. It tends to zero if either precision or recall tends to zero.
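As a minimal sketch, the balanced F-measure (the plain harmonic mean, with precision and recall weighted equally) looks like this. The function name and the input values are illustrative only.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    if precision == 0 or recall == 0:
        return 0.0  # the harmonic mean is zero as soon as one component is zero
    return 2 * precision * recall / (precision + recall)


print(f_measure(0.5, 0.5))            # 0.5
print(round(f_measure(0.9, 0.1), 2))  # 0.18
print(f_measure(1.0, 0.0))            # 0.0
```

The second case shows why the harmonic mean is used: the arithmetic mean of 0.9 and 0.1 would also be 0.5, but the F-measure is pulled down towards the weaker of the two values.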
etc., there are various beautiful graphs in the article
I have a question for Yuri. When I check the results of the ternary model by entering the data manually, the result sometimes shows a dash. That is, there is 0, there is 1, and there is a dash. What does the dash mean?
It means the same as the famous Socratic phrase "I know what I do not know". The ternary classifier, answering with a minus, says that the training sample contained no examples similar to the pattern being classified, so it cannot unambiguously assign the pattern to any class, i.e. it cannot give an affirmative answer for the pattern presented. It honestly admits its lack of competence in some areas of knowledge, rather than brazenly trying to give a positive answer to questions it does not know the answers to.
Hmm. Well, I see... Tell me, is there any chance that in the foreseeable future the ternary model could be exported to a file, so that it can be used later in MQL? The same way as the binary one; when you enter it by hand there is a chance of making a mistake, and so on...
Judging by the attached picture, do I understand the point correctly? On the left is a binary classifier; on the right is a ternary classifier (the white zone is "minus")
If so, then I think the idea is good; for some reason I have never come across it before. Could you recommend some articles on ternary classifiers?
Added later:
Intuitively, the task is fairly simple. Suppose there are 2 predictors (X and Y), so we work in a 2-dimensional space (as in the pictures above). We need to fence off a region of this 2-dimensional space that contains all the "buy" examples (blue fill), then fence off a second region that contains all the "sell" examples (red). The two fenced regions must not overlap. To classify new data, just look at which fenced region the point falls into. If it falls into neither (white in the right-hand picture), then the model can say nothing about that point and should not trade at that moment.
With 3 predictors there will be a 3-dimensional space in which the classes are enclosed by three-dimensional shapes. And so on: the more predictors, the higher the dimension of the shapes.
Do such models exist? Usually classifiers find a hyperplane in the space that separates the classes. But here we need two closed hyper-shapes.
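Just to illustrate the idea, here is a minimal sketch that uses the simplest possible "fence": an axis-aligned bounding box around each class. That choice is my own assumption for illustration, not a known model from this thread. The training points are made up, and the code works for any number of predictors. If the two boxes happen to overlap, a point inside both is also answered with a dash.

```python
def bounding_box(points):
    """Return (mins, maxs) of the axis-aligned box enclosing all points."""
    dims = range(len(points[0]))
    mins = tuple(min(p[d] for p in points) for d in dims)
    maxs = tuple(max(p[d] for p in points) for d in dims)
    return mins, maxs


def inside(box, point):
    """True if the point lies within the box on every predictor axis."""
    mins, maxs = box
    return all(lo <= x <= hi for lo, x, hi in zip(mins, point, maxs))


def classify(buy_box, sell_box, point):
    """Return "buy", "sell", or "-" when the point is in neither box or in both."""
    in_buy = inside(buy_box, point)
    in_sell = inside(sell_box, point)
    if in_buy and not in_sell:
        return "buy"
    if in_sell and not in_buy:
        return "sell"
    return "-"


# Made-up training examples with 2 predictors (X, Y).
buy_points = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
sell_points = [(6.0, 6.0), (7.0, 8.0), (8.0, 7.0)]
buy_box = bounding_box(buy_points)
sell_box = bounding_box(sell_points)

print(classify(buy_box, sell_box, (2.0, 2.0)))  # buy
print(classify(buy_box, sell_box, (7.0, 7.0)))  # sell
print(classify(buy_box, sell_box, (4.5, 4.5)))  # -
```

Real class regions are rarely box-shaped, so this only shows the "answer or abstain" logic, not a usable trading model.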
Mihail Marchukajtes:
...
In the first versions of the predictor it took about 40 minutes to optimize 6 inputs, which was extremely inconvenient, but now it takes 10 minutes for 9 inputs. This only increased the quality of the model. Now the problem is where to find so many inputs. But we are not in the know. We still have something to offer to the predictor :-).
Judging by the attached picture, do I get the point right?
Binary classifier on the left; ternary classifier on the right (the white zone is "minus")
As a primitive explanation for dummies, it is good enough as a visual aid.
If so, it's a good idea, for some reason I've never seen it before, can you please advise some articles on the ternary classifier?
If you are not banned from google, you can search by the phrase "ternary classifier machine learning".
In other words "Look for the first google link that leads to my site" :)
I found it. You have a committee of two models; that is not at all what I had understood and described above.
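For readers who have not seen the idea, here is a minimal sketch of how a committee of two binary models can produce a ternary answer: if both models agree, their common answer is returned, otherwise the committee answers with a dash. This is only my assumption about the general scheme, not Yuri's actual implementation, and the two threshold rules `model_a` and `model_b` are hypothetical stand-ins for trained models.

```python
def model_a(pattern):
    """Hypothetical binary model: looks only at the first predictor."""
    return 1 if pattern[0] > 0 else 0


def model_b(pattern):
    """Hypothetical binary model: looks only at the second predictor."""
    return 1 if pattern[1] > 0 else 0


def committee(first, second, pattern):
    """Return 0 or 1 when both models agree, and "-" when they disagree."""
    a = first(pattern)
    b = second(pattern)
    return a if a == b else "-"


print(committee(model_a, model_b, (1.0, 2.0)))    # 1
print(committee(model_a, model_b, (-1.0, -2.0)))  # 0
print(committee(model_a, model_b, (1.0, -2.0)))   # -
```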