Machine learning in trading: theory, models, practice and algo-trading - page 560

 
SanSanych Fomenko:

I have similar numbers for forest and ada.

Now, getting back to the point: how do we discard noise from an arbitrary list of predictors? I have an empirical algorithm of my own, which selected my 27 predictors out of 170. I have also used it to analyze other people's sets of predictors, likewise successfully. Based on this experience, I maintain that none of the methods in R that use variable "importance" in their algorithms can clean a predictor set of noise.

I appeal to all readers of the thread: I'm willing to do the appropriate analysis if the raw data is presented as RData or an Excel file that doesn't require processing.

In addition.

I am attaching a number of articles that supposedly solve the problem of clearing the original set of predictors of noise, and with much better quality. Unfortunately, I don't have time to try them at the moment. Maybe someone will try them and post the result?


I decided to read the thread from the beginning (it turns out I hadn't). A number of questions came up along the way, such as:

1. The forest is trained on random subsets of features - does this mean that features which, by chance, never make it into training will be labeled as "not important"?

2. What to do with categorical features, given that the forest will a priori give them less importance than features with more categories? (The importance in question is sketched right after this list.)

3. Won't PCA on each new sample extract "new" components that are very different from the components of the training sample - how do you deal with that?
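Not an answer, just for orientation: a minimal R sketch of where the forest's "importance" in questions 1-2 comes from, assuming a hypothetical data frame dat with a factor target y (names invented for the example, not from this thread).

# minimal sketch, hypothetical data frame "dat" with factor target "y"
library(randomForest)

set.seed(1)                                  # forest training is randomized
rf <- randomForest(y ~ ., data = dat,
                   importance = TRUE)        # also compute permutation importance
importance(rf)     # MeanDecreaseAccuracy / MeanDecreaseGini for each feature
varImpPlot(rf)     # the same rankings as a plot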

And one more thing about recursive feature elimination: http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/

The article has links to the previous parts, which cover random forests as well.

Selecting good features – Part IV: stability selection, RFE and everything side by side
  • 2014.12.20
  • blog.datadive.net
In this post, I’ll look at two other methods: stability selection and recursive feature elimination (RFE), which can both be considered wrapper methods. They both build on top of other (model based) selection methods such as regression or SVM, building models on different subsets of data and extracting the ranking from the aggregates. As a wrap-up...
 
Maxim Dmitrievsky:

I decided to read the thread from the beginning (it turns out I hadn't). A number of questions came up along the way, such as:

1. The forest is trained on random subsets of features - does this mean that features which, by chance, never make it into training will be labeled as "not important"?

2. What to do with categorical features, given that the forest will a priori give them less importance than features with more categories?

3. Won't PCA on each new sample extract "new" components that are very different from the components of the training sample - how do you deal with that?

And one more thing about recursive feature elimination: http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/

The article also has links to the previous parts, which likewise cover random forests.


Unfortunately, my personal answer will not add anything to what is written here or in other literature on the subject.

There are quite a number of algorithms for determining the "importance" of features (regression or classification, it doesn't matter) - all of them are available in R.

I spent a lot of time mastering them, and in the end concluded that I first have to get rid of the noisy features that are irrelevant to the target variable, and only then apply the methods from R, which reduces the error by 5-7%. I could not get the error below 20%.

Once again I note that I re-select the important features on each bar and retrain the forest on the resulting set.


The numbers are as follows (a rough sketch of the pipeline is given after the list).

  • Out of several hundred predictors I selected 27 by the noise criterion, i.e. those "relevant" to the target variable.
  • On each bar I select from those 27 by the importance criterion (RFE).
  • On the resulting 5-15 features I train the forest. This list changes all the time within the 27 used.
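A rough sketch of my reading of this pipeline, with hypothetical names (window would be a data frame holding the 27 pre-filtered predictors plus a factor target y for the current window of bars); not the actual code used here.

library(caret)
library(randomForest)

fit_on_window <- function(window) {
  ctrl <- rfeControl(functions = rfFuncs,        # random forest as the base model
                     method = "cv", number = 5)
  sel  <- rfe(window[, setdiff(names(window), "y")], window$y,
              sizes = 5:15, rfeControl = ctrl)   # try subsets of 5-15 features
  keep <- predictors(sel)                        # the features picked on this window
  randomForest(x = window[, keep, drop = FALSE], # retrain the forest on them
               y = window$y)
}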


Quality criterion for this approach: I take two files; on the first I do training, testing and validation = approximately the same error. I then check the resulting model on the second file = approximately the same error as on the first. From this I conclude that the model is not overfitted and that in the future, at least for one bar ahead, it will behave as it did in training.
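A hedged sketch of that two-file check, assuming file1 and file2 are data frames with identical columns and a factor target y (hypothetical names):

library(randomForest)

rf   <- randomForest(y ~ ., data = file1)
err1 <- mean(predict(rf) != file1$y)                   # out-of-bag error on the first file
err2 <- mean(predict(rf, newdata = file2) != file2$y)  # error on the untouched second file
c(err1, err2)    # roughly equal errors -> no gross overfitting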

Any other approaches do not work for me, and replacing one model type with another does not improve anything in terms of overfitting.



Regarding PCA: my result was negative, in the sense that using the principal components did not reduce the error compared with the original set. I don't understand why, though in theory it should.
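One thing worth checking (a guess, not a diagnosis): the components must be estimated once on the training set and only applied to new data, never re-estimated on it. A minimal sketch with prcomp, assuming hypothetical data frames train_x and new_x with identical numeric columns:

pca       <- prcomp(train_x, center = TRUE, scale. = TRUE)   # rotation fitted on training data only
train_pcs <- pca$x[, 1:5]                                    # training scores, first 5 components
new_pcs   <- predict(pca, newdata = new_x)[, 1:5]            # new bars projected with the SAME rotation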

 
SanSanych Fomenko:

In my opinion, PCA is generally useless when the model is used on heterogeneous features such as forex... because we trained on certain components, and in the test a different combination of components shows up that the model does not account for in any way.

If we go back to the forest:

I'll have to dig into the source code to understand how features that didn't make it into the training subsample are evaluated; the documentation says nothing about it (most likely they are scored as bad).

Plus, forest training is randomized: several trainings in a row can give different, sometimes significantly different, results... and it's not quite clear how to work with this quirk either. Fine if we have fine-tuned the model once, saved it and use it later... but if the model is self-training, it has to be trained several times in a row and the run with the minimum error chosen, something like that... otherwise multiple runs in the tester produce different results - by my observations up to 5 or more distinct ones, after which they repeat on subsequent runs.

 
Maxim Dmitrievsky:

In my opinion, PCA is generally useless when the model is used on heterogeneous features such as forex... since we trained on certain components, and in the test a different combination of components shows up that the model does not account for in any way.

If we go back to the forest:

I'll have to dig into the source code to understand how features that didn't make it into the training subsample are evaluated; the documentation says nothing about it (most likely they are scored as bad).

Plus, forest training is randomized: several trainings in a row can give different, sometimes significantly different, results... and it's not quite clear how to work with this quirk either. Fine if we have fine-tuned the model once, saved it and use it later... but if the model is self-training, it has to be trained several times in a row and the run with the minimum error chosen, something like that... otherwise multiple runs in the tester produce different results - by my observations up to 5 or more distinct ones, after which they repeat on subsequent runs.


I don't recall anything like the horrors you describe. With the same seed the results are consistently the same.

 
SanSanych Fomenko:

I don't recall anything like the horrors you describe. With the same seed the result is consistently the same.


What is seed responsible for? I don't remember... it's for the number of features, right? I use the alglib forest.

 
revers45:
A teacher who does not know the multiplication table, and an NS developer - do not impose random "correct" solutions on it; don't pour them any more!

I second that.

It's just shaking the air. Post an example, even with made-up data, so it can be checked.

There are three main kinds of training: unsupervised (no target is presented), supervised (the target is fully labeled) and semi-supervised (I don't know how to translate it properly) - when the model is presented with a target that is only partially labeled. Everything else is from the evil one.
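For illustration only, a toy R snippet of the three settings (made-up numbers, nobody's real data):

x <- data.frame(a = rnorm(6), b = rnorm(6))     # features only
y_supervised <- factor(c(1, 0, 1, 1, 0, 0))     # supervised: target fully labeled
y_semisup    <- factor(c(1, NA, NA, 1, 0, NA))  # semi-supervised: target partially labeled
km <- kmeans(x, centers = 2)                    # unsupervised: no target at all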

Good luck

 
Maxim Dmitrievsky:

What is seed responsible for? I don't remember... it's for the number of features, right? I use the alglib forest.

Come on...

set.seed puts the random number generator into a defined state, so that when you repeat the calculation you get a reproducible result (a small sketch follows below).

Learn your math.
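A minimal sketch of what this buys you, assuming the same hypothetical data frame dat with a factor target y as above:

library(randomForest)

set.seed(123)
rf1 <- randomForest(y ~ ., data = dat)
set.seed(123)                            # same seed -> same random draws
rf2 <- randomForest(y ~ ., data = dat)
identical(rf1$err.rate, rf2$err.rate)    # TRUE: repeated training reproduces the same forest errors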

 
Vladimir Perervenko:

Come on...

set.seed puts the random number generator into a defined state, so that when the calculation is repeated you get a reproducible result.

Learn the math.


I don't have such a setting, that's why I asked.

I can do it in MT5 too, thank you.

 
Vladimir Perervenko:

I second that.

It's just shaking the air. Post an example, even with made-up data, so it can be checked.

There are three main kinds of training: unsupervised (no target is presented), supervised (the target is fully labeled) and semi-supervised (I don't know how to translate it properly) - when the model is presented with a target that is only partially labeled. Everything else is from the evil one.

Good luck

Well, well.)) If you fully know the algorithm and can find the target yourself, then why do you need an NS? You can do everything without it.)

You need NS and other DM exactly when you don't know.

As for learning algorithms, they are developed and modified for each specific task. Mostly based on the basic ones.

And what you're saying is just shaking the air. Read anything beyond the introduction. )

Good luck.

 
Yuriy Asaulenko:

Well, well.)) If you fully know the algorithm and can find the target yourself, then why do you need an NS? You can do everything without it.)

Did you understand what you wrote? The target isn't found, it's pre-defined as what the model needs to learn. What algorithm are you talking about?

NS and other DMs are needed precisely when you don't know.

What don't we know?

As for learning algorithms, they are developed and modified for each specific task. Basically based on the basic ones.

We're talking about two tasks to be solved here: regression and classification (omitting clustering and ranking). What other "specific tasks" do you have in mind?

And what you are talking about is just shaking the air. Read anything beyond the introduction. )

???

Good luck.