Machine learning in trading: theory, models, practice and algo-trading - page 804
Don't be sad, I know.) Although, if you feel more comfortable, you can continue.
But you're right.
Tell me, please: for selecting data at the initial stage, is it enough to look for correlation with the target, and if so, what correlation threshold should be used?
Correlation is a linear measure. If it is present, there is no point in over-engineering things with a neural network; linear regression is enough.
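A minimal sketch of what such a correlation screen plus a linear baseline could look like. The synthetic data, the column names, and the 0.1 threshold are arbitrary assumptions for illustration, not values from this thread:

```python
import numpy as np
import pandas as pd

# Hypothetical illustration: screen candidate features by absolute Pearson
# correlation with the target, then fit a plain least-squares baseline on the
# survivors. The 0.1 threshold and the synthetic data are arbitrary assumptions.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 10)), columns=[f"f{i}" for i in range(10)])
y = 0.7 * X["f0"] - 0.4 * X["f3"] + rng.normal(scale=0.5, size=500)

corr = X.corrwith(y).abs().sort_values(ascending=False)
selected = corr[corr > 0.1].index.tolist()      # crude linear screen
print(corr.round(3))

# If the relationship really is linear, ordinary least squares is enough:
A = np.column_stack([np.ones(len(X)), X[selected].to_numpy()])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(dict(zip(["intercept"] + selected, beta.round(3))))
```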
Thank you for your answer.
And how do you apply linear regression to identify the most stable relationships as additional features are added?
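One possible reading of that question, offered only as a sketch: add features to an ordinary least-squares fit and call a relationship "stable" if its coefficient keeps the same sign when the model is refit on random halves of the data. The function name, the half-sample resampling, and the sign criterion are all assumptions for illustration, not something proposed in the thread.

```python
import numpy as np

def coef_stability(X, y, n_resamples=20, seed=0):
    """Crude stability check for a linear fit: refit ordinary least squares on
    random halves of the data and report how often each coefficient keeps the
    sign it has in the full-sample fit."""
    rng = np.random.default_rng(seed)
    A = np.column_stack([np.ones(len(X)), X])
    full, *_ = np.linalg.lstsq(A, y, rcond=None)
    agree = np.zeros(A.shape[1])
    for _ in range(n_resamples):
        idx = rng.choice(len(X), size=len(X) // 2, replace=False)
        b, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        agree += (np.sign(b) == np.sign(full))
    return full, agree / n_resamples   # coefficients and their sign-agreement rate
```

A feature whose agreement rate stays near 1.0 as more features are added is at least not an obvious artifact of one particular sample.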
Actually, I've been thinking about this nuance for a while now...
We have a ten by ten matrix, what can we say about it?
The amount of data is 100.
Then we can calculate the amount of information in this data, which will also be expressed in some units. What else does this data set contain, besides the amount of data and the amount of information? I won't keep you in suspense; I'll answer the question myself: the amount of knowledge. All of this is, naturally, relative to the target. So if we reason in terms of causality, we get the following model.
The amount of knowledge -> amount of data -> amount of information.
So in order to predict, you need to extract KNOWLEDGE about the target value from the data set, not merely the amount of information.
Knowledge itself is a very fragile thing that can be lost if the data is not transformed skillfully. A careless change, even in a single record, by a small amount can significantly reduce that knowledge, if not destroy it completely.
That is why it is not recommended to complicate the input data with transformations. The more complex the transformation, the less knowledge is left in the end result.
So... thoughts aloud about lofty matters; some will not understand them and will continue on their way without ever reaching the final station...
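The "amount of information" above can at least be given concrete units (bits) with Shannon entropy, and mutual information between a column and the target is one hedged way to put a number on what the post calls knowledge about the target. The histogram binning below is an arbitrary assumption:

```python
import numpy as np

def entropy_bits(x, bins=10):
    """Shannon entropy of a discretized sample, in bits."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_info_bits(x, y, bins=10):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint[joint > 0] / joint.sum()
    h_xy = -np.sum(p * np.log2(p))
    return entropy_bits(x, bins) + entropy_bits(y, bins) - h_xy
```

On the ten-by-ten example, the 100 values are the amount of data, the column entropies measure the amount of information, and the mutual information of each column with the target is the closest of the three to "knowledge" in the sense of the post.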
Moreover, you thought about it, rounded the number to tens, and lost part of the data. Thinking is harmful, yet some continue...
Thank you for your answer.
And how do you apply linear regression to find the most robust relationships by adding more features?
https://www.mql5.com/ru/articles/349
I do not understand the question. However, linear regression does not work in financial markets.
So there is no correlation? I think Maxim Dmitrievsky answered the question below.
https://www.mql5.com/ru/articles/349
Thanks for the answer.
For fans of cross-validation, test samples, OOS, and the rest (SanSanych and Vladimir Perervenko in particular), I will never tire of repeating:
Out-of-sample tests
This is the most popular and also the most abused validation method. Briefly, out-of-sample tests require setting aside a portion of the data to be used for testing the strategy after it is developed, so as to obtain an unbiased estimate of future performance. However, out-of-sample tests:
- reduce the power of the tests because of the smaller sample
- give biased results if the strategy was developed via multiple comparisons
In other words, out-of-sample tests are useful only in the case of unique hypotheses. Using out-of-sample tests for strategies developed via data mining shows a lack of understanding of the process. In that case the test can be used to reject strategies, but not to accept any. In this sense the test is still useful, but trading strategy developers know that good out-of-sample performance for strategies developed via multiple comparisons is in most cases a random result.
A few methods have been proposed for correcting out-of-sample significance for multiple-comparisons bias, but in almost all real cases the result is a non-significant strategy. However, as we show in Ref. 1 with two examples corresponding to two major market regimes, strategies that remain highly significant even after bias corrections can still fail because markets change. Therefore, out-of-sample tests are unbiased estimates of future performance only if future returns are distributed identically to past returns. In other words, non-stationarity may invalidate any results of out-of-sample testing.
Conclusion: Out-of-sample tests apply only to unique hypotheses and assume stationarity. Under those conditions they are useful, but when the conditions are not met they can be quite misleading.
OOS can be used only to reject hypotheses, or only for problems known to be stationary.
But not for searching for strategies, selecting features, or evaluating the stability of a system.
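A small Monte-Carlo sketch of the multiple-comparisons argument quoted above, assuming a driftless market with no exploitable edge: the best of many random "strategies" found by in-sample search still shows an impressive in-sample Sharpe, while its out-of-sample Sharpe is essentially noise, so a good OOS number after such a search proves little. Every number below (2000 bars, 500 candidate strategies, 252-day annualization) is an arbitrary assumption.

```python
import numpy as np

# Monte-Carlo illustration of the multiple-comparisons problem described above.
rng = np.random.default_rng(42)
n_bars, n_strategies = 2000, 500
market = rng.normal(0.0, 0.01, size=n_bars)                 # driftless returns, no edge
signals = rng.choice([-1, 1], size=(n_strategies, n_bars))  # random long/short rules

pnl = signals * market                                      # per-strategy daily P&L
ins, oos = pnl[:, : n_bars // 2], pnl[:, n_bars // 2:]

def ann_sharpe(r):
    return r.mean(axis=1) / r.std(axis=1) * np.sqrt(252)

best = int(np.argmax(ann_sharpe(ins)))                      # "strategy development" by search
print("in-sample Sharpe of the selected strategy :", round(float(ann_sharpe(ins)[best]), 2))
print("out-of-sample Sharpe of the same strategy :", round(float(ann_sharpe(oos)[best]), 2))
# The selected strategy looks good in-sample only because 500 candidates were tried;
# out of sample its Sharpe collapses toward zero, which is the quoted point about
# OOS results after multiple comparisons being mostly random.
```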