Machine learning in trading: theory, models, practice and algo-trading - page 804

 
Yuriy Asaulenko:

Don't be sad, I know.) Although, if you feel more comfortable, you can continue.

But you're right.

 
Aleksey Vyazmikin:
Tell me please: for selecting data at the initial stage, is it enough to look for correlation with the target? If so, what correlation threshold should be used?
Correlation is a linear method. If correlation is there, there is no point in going to the trouble of building a neural network; linear regression is enough.
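A minimal sketch of what such a correlation screen plus linear fit might look like (Python, synthetic placeholder data; the 0.1 threshold is an arbitrary example, not a value recommended in the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 500 samples, 20 candidate features, one target.
X = rng.normal(size=(500, 20))
y = X[:, 0] * 0.5 + rng.normal(size=500)  # target linearly tied to feature 0

# Pearson correlation of each candidate feature with the target.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

threshold = 0.1  # arbitrary example threshold, not a recommendation
selected = np.where(np.abs(corr) > threshold)[0]
print("selected features:", selected, "correlations:", np.round(corr[selected], 3))

# If a clear linear relation exists, ordinary least squares already captures it.
X_sel = X[:, selected]
beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), X_sel]), y, rcond=None)
print("OLS coefficients (intercept first):", np.round(beta, 3))
```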
 
Grigoriy Chaunin:
Correlation is a linear method. If correlation is there, there is no point in going to the trouble of building a neural network; linear regression is enough.

Thank you for your answer.

And how do you apply linear regression to identify the most stable relationships by adding additional features?

 
I don't understand the question. However, linear regression does not work in financial markets.
 

Actually, I've been thinking about this nuance for a while now....

We have a ten by ten matrix, what can we say about it?

The amount of data is 100.

Then we can calculate the amount of information in this data, which will also be expressed in some units. What else does this data set contain, besides the amount of data and the amount of information? I won't keep you in suspense; I'll answer the question myself: the amount of knowledge. All of this, of course, is relative to the target. So if we reason in terms of causality, we get the following model.

The amount of knowledge -> amount of data -> amount of information.

So in order to predict, it is necessary to extract KNOWLEDGE about the target value from the data set, not merely the amount of information.

Knowledge itself is a very fragile thing that can be lost if the data is not transformed skilfully. A careless change, even to a single entry, by a small amount can significantly reduce that knowledge, if not destroy it completely.

That's why it is not recommended to overcomplicate the input data with transformations. The more complex the transformation, the less knowledge is left in the end result.

So... thinking aloud about lofty matters; some will not understand it and will continue on their way without ever reaching the final station....

 
Mihail Marchukajtes:

Actually, I've been thinking about this nuance for a while now....

We have a ten by ten matrix, what can we say about it?

The amount of data is 100.

Then we can calculate the amount of information in this data, which will also be expressed in some units. What else does this data set contain, besides the amount of data and the amount of information? I won't keep you in suspense; I'll answer the question myself: the amount of knowledge. All of this, of course, is relative to the target. So if we reason in terms of causality, we get the following model.

The amount of knowledge -> amount of data -> amount of information.

So in order to predict, it is necessary to extract KNOWLEDGE about the target value from the data set, not merely the amount of information.

Knowledge itself is a very fragile thing that can be lost if the data is not transformed skilfully. A careless change, even to a single entry, by a small amount can significantly reduce that knowledge, if not destroy it completely.

That's why it is not recommended to overcomplicate the input data with transformations. The more complex the transformation, the less knowledge is left in the end result.

So... thinking aloud about lofty matters; some will not understand it and will continue on their way without ever reaching the final station....

Moreover, you thought about it, rounded the numbers to tens, and lost part of the data. Thinking is harmful, and yet some continue...

 
Aleksey Vyazmikin:

Thank you for your answer.

And how do you apply linear regression to identify the most stable relationships by adding additional features?

https://www.mql5.com/ru/articles/349

Multiple Regression Analysis: Strategy Generator and Tester in One
  • 2011.12.07
  • ArtemGaleev
  • www.mql5.com
An acquaintance of mine, while attending a training course on forex trading, was given a homework assignment: build a trading system. After tinkering with it for about a week, he said that this task was probably harder than writing a dissertation. I then suggested that he try multiple regression analysis. As a result, a trading system was built "from scratch" in a single evening...
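A minimal sketch of the "add features one at a time" idea from the question above, using plain ordinary least squares on synthetic placeholder data; the greedy forward selection and the 1% stopping rule are assumptions for illustration, not the linked article's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder data: 300 samples, 10 candidate features, target depends on two of them.
X = rng.normal(size=(300, 10))
y = 0.7 * X[:, 2] - 0.4 * X[:, 5] + rng.normal(scale=0.5, size=300)

def ols_sse(Xc, y):
    """Sum of squared residuals of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), Xc])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

selected = []
remaining = list(range(X.shape[1]))
current_sse = float(((y - y.mean()) ** 2).sum())

# Greedy forward selection: add the feature that reduces residual error the most.
while remaining:
    best_j, best_sse = None, current_sse
    for j in remaining:
        sse = ols_sse(X[:, selected + [j]], y)
        if sse < best_sse:
            best_j, best_sse = j, sse
    # Stop when the improvement is negligible (1% here, an arbitrary choice).
    if best_j is None or (current_sse - best_sse) / current_sse < 0.01:
        break
    selected.append(best_j)
    remaining.remove(best_j)
    current_sse = best_sse
    print(f"added feature {best_j}, residual SSE = {best_sse:.2f}")

print("selected features:", selected)
```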
 
Grigoriy Chaunin:
I do not understand the question. However, linear regression does not work in financial markets.

So there is no correlation? I think Maxim Dmitrievsky answered the question below.

Thanks for the answer.

 
Try plotting the autocorrelation of the price data and you will immediately see whether there is any correlation or not. It is useless to add indicators: an indicator is a function of price. Therefore, the plot should be built on the price data only.
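A minimal sketch of such an autocorrelation plot (Python; the random-walk prices are a placeholder for real close prices, and log returns are used here instead of raw prices, since raw prices are trivially autocorrelated):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: replace with real closing prices, e.g. loaded from a CSV.
rng = np.random.default_rng(1)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 5000)))  # random-walk prices

returns = np.diff(np.log(prices))  # log returns, roughly stationary

def autocorr(x, max_lag=50):
    """Sample autocorrelation of x for lags 1..max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

acf = autocorr(returns)
plt.bar(range(1, len(acf) + 1), acf)
plt.axhline(0, color="black", linewidth=0.5)
# Rough 95% band for white noise: +/- 1.96 / sqrt(N)
band = 1.96 / np.sqrt(len(returns))
plt.axhline(band, color="red", linestyle="--")
plt.axhline(-band, color="red", linestyle="--")
plt.title("Autocorrelation of returns")
plt.xlabel("lag")
plt.ylabel("ACF")
plt.show()
```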
 

For fans of cross-validation, test samples, OOS and other such things, I will never tire of repeating myself:

SanSanych and Vladimir Perervenko in particular

Out-of-sample tests
This is the most popular and also abused validation method. Briefly, out-of-sample tests require setting aside a portion of the data to be used in testing the strategy after it is developed and obtaining an unbiased estimate of future performance. However, out-of-sample tests:
- reduce the power of tests due to a smaller sample
- produce biased results if the strategy is developed via multiple comparisons
In other words, out-of-sample tests are useful in the case of unique hypotheses only. Use of out-of-sample tests for strategies developed via data-mining shows a lack of understanding of the process. In this case the test can be used to reject strategies but not to accept any. In this sense, the test is still useful, but trading strategy developers know that good performance in out-of-sample tests for strategies developed via multiple comparisons is in most cases a random result.
A few methods have been proposed for correcting out-of-sample significance for the presence of multiple-comparisons bias, but in almost all real cases the result is a non-significant strategy. However, as we show in Ref. 1 with two examples that correspond to two major market regimes, highly significant strategies can also fail due to changing markets, even after corrections for bias are applied. Therefore, out-of-sample tests are unbiased estimates of future performance only if future returns are distributed in identical ways as past returns. In other words, non-stationarity may invalidate any results of out-of-sample testing.


Conclusion: Out-of-sample tests apply only to unique hypotheses and assume stationarity. In this case they are useful but if these conditions are not met, they can be quite misleading.

OOS can be used only for rejecting hypotheses, or only for problems known to be stationary.

But not for searching for strategies, selecting features, or evaluating system stability.
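A minimal sketch (Python, pure-noise returns) of the multiple-comparisons point in the quoted text: when thousands of candidate strategies are screened on the same data, some will look good both in-sample and out-of-sample purely by chance, even though none of them has any real edge.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure noise: no strategy can have a genuine edge on these returns.
n_days = 2000
returns = rng.normal(0.0, 0.01, n_days)

# Time-ordered split: first 70% "in-sample", last 30% "out-of-sample".
split = int(n_days * 0.7)

# Generate many random long/short "strategies" (multiple comparisons).
n_strategies = 5000
signals = rng.choice([-1, 1], size=(n_strategies, n_days))

is_pnl = (signals[:, :split] * returns[:split]).mean(axis=1)
oos_pnl = (signals[:, split:] * returns[split:]).mean(axis=1)

# Of the strategies that look "good" in-sample, how many also look good OOS
# purely by chance?
good_is = is_pnl > np.quantile(is_pnl, 0.95)
good_both = good_is & (oos_pnl > 0)
print(f"strategies passing the in-sample screen: {good_is.sum()}")
print(f"of those, also profitable out-of-sample: {good_both.sum()}")
# Roughly half of the in-sample "winners" are also positive OOS by luck alone,
# which is why a passed OOS test cannot by itself validate a data-mined strategy.
```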