Discussion of article "Evaluation and selection of variables for machine learning models"

 

New article Evaluation and selection of variables for machine learning models has been published:

This article focuses on the specifics of selecting, preprocessing and evaluating input variables for use in machine learning models. Several normalization methods and their features are described. Key points of the process that strongly influence the final result of model training are also covered. We take a closer look at, and evaluate, new and little-known methods for determining the informativeness of the input data and for visualizing it.

With the "RandomUniformForests" package we will calculate and analyze the concept of variable importance at different levels and in various combinations, the correspondence between predictors and the target, the interaction between predictors, and the selection of an optimal set of predictors taking into account all aspects of importance.

With the "RoughSets" package we will look at the same issue of choosing predictors from a different angle, based on a different concept. We will show that not only the set of predictors can be optimized; the set of training examples can be optimized as well.

All calculations and experiments will be executed in the R language, specifically in Revolution R Open 3.2.1.

Fig. 2. Training error (OOB error) depending on the number of trees

Author: Vladimir Perervenko

 

Congratulations!

This is effectively a reference guide on the subject, based on the latest tools available in the field. Anyone can replicate it, and everything is free, maintained, developed and documented.

Much respect!

 
SanSanych Fomenko:

Congratulations!

This is effectively a reference guide on the subject, based on the latest tools available in the field. Anyone can replicate it, and everything is free, maintained, developed and documented.

Much respect!

Greetings, SanSanych.

Of all the predictor selection options tried so far, this one is the clearest and most detailed.

Good luck!

 
Yes! Yes! I join SanSanych. My thanks! The topic of deep learning, with all its related nuances, is relevant and promising. As someone without very deep mathematical knowledge, I need this kind of information in a systematic, maximally accessible form, with the option to go deeper into particular points. Thanks again!
 
I don't want to seem ignorant, and I'm not detracting from the author's merits (he has clearly done a great job), but does all this actually help you make money?
 
Alexey Oreshkin:
I don't want to seem ignorant, and I'm not detracting from the author's merits (he has certainly done a great job), but does all this actually help you make money?

That depends.

Lone seekers of millions who want to spend a couple of weeks on it? No, it is useless for them.

People who trade for a living and realise that a decent EA takes several years? Yes. And here it is very important to follow the right path and not wander through the forest...

PS.

And what tools do professional securities market participants use to make money? Not technical analysis (TA)? After all, the whole of TA is taught at universities in a fortnight, as a pass/fail course. Professional securities market participants are staffed with graduates holding degrees in "Statistics", "Econometrics", "Artificial Intelligence"... For them the article under discussion is quite understandable, although in many respects it is new.

PSPS.

I am not writing this to discourage anyone. Let's not look at some Merrill Lynch with 100,000 employees trading all instruments in all markets.

We are talking about a very limited trading system (TS): a couple of models, a dozen instruments. And it is realistic to achieve a return of over 20% per month.

Here's the plan.

We install R. Next, we take Rattle. This is within reach of anyone who has written even the simplest Expert Advisor. An hour of work. Using Excel, we prepare a source file. After that, Rattle makes 6 very decent models available, three of which (ada, random forest, SVM) are very promising and far surpass, in their capabilities, any indicator-based variants and, in particular, neural nets (which are also available in Rattle and can be compared).

Then the tedious, but in many respects substantive, work begins: digging through the list of input data. This is done mostly in Excel, with the results evaluated in Rattle. Once this is mastered, you are on the right track.

And in TA... You write an Expert Advisor, and it seems to bring profit... Then it is sure to go bad, and it is great luck if the trader is distrustful enough to throw it away before this latest "grail" drains his deposit... And so it goes on for a lifetime. Experience does not accumulate; that is theoretically impossible.

 

Here's the plan.

We install R. Next, we take Rattle. This is within reach of anyone who has written even the simplest Expert Advisor. An hour of work. Using Excel, we prepare a source file. After that, Rattle makes 6 very decent models available, three of which (ada, random forest, SVM) are very promising and far surpass, in their capabilities, any indicator-based variants and, in particular, neural nets (which are also available in Rattle and can be compared).

Will it make money? If so, the richest and most successful traders would be mathematicians. Searching for mythical patterns in a non-stationary series is analogous to flipping a coin.

 

"Searching for mythical patterns in a non-stationary series is analogous to flipping a coin."

This should go on record as the most arrogant of the most stupid statements.

And the question "Will it make money?" says a lot about the level of training.

Indeed: "The mind of man is limited, stupidity is limitless".

 
Alexey Oreshkin:

Searching for mythical patterns in a non-stationary series is analogous to flipping a coin.

Yeah. Lamers do not know that the outcomes of a coin toss are known to be stationary. Provided, of course, that the coin is not made of plasticine, i.e. that it keeps its shape over time. However, fools who would let you make money on such stationary processes are hard to find.
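To make the distinction in this reply concrete, here is a minimal sketch in Python (standard library only). It is not from the thread: the +1/-1 coding of tosses, the window size, the seed, and the use of a running sum as a toy "price" are all illustrative assumptions. It compares the window-by-window mean of independent coin tosses with that of a random walk built from the same tosses.

```python
import random
import statistics


def window_means(series, window):
    """Mean of each consecutive, non-overlapping window of the series."""
    return [
        statistics.fmean(series[i:i + window])
        for i in range(0, len(series) - window + 1, window)
    ]


def simulate(n_tosses=20_000, window=2_000, seed=42):
    """Return (window means of coin tosses, window means of their running sum)."""
    rng = random.Random(seed)

    # Fair coin (assumed coding): +1 for heads, -1 for tails, tosses independent.
    tosses = [rng.choice((-1, 1)) for _ in range(n_tosses)]

    # Random walk: running sum of the tosses, a toy stand-in for a price series.
    walk = []
    level = 0
    for toss in tosses:
        level += toss
        walk.append(level)

    return window_means(tosses, window), window_means(walk, window)


if __name__ == "__main__":
    toss_means, walk_means = simulate()
    # Spread = largest window mean minus smallest window mean.
    print("coin tosses, spread of window means:", max(toss_means) - min(toss_means))
    print("random walk, spread of window means:", max(walk_means) - min(walk_means))
```

Under these assumptions, the window means of the tosses stay close to zero, while those of the running sum drift from window to window. That is the sense in which a coin toss is stationary and a cumulated series is not.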
 
In general, everyone should learn statistical analysis; in any case it will not be superfluous :) But the problem remains: what data to use as input, and as output too :)
 
Maxim Dmitrievsky:
In general, everyone should learn statistical analysis; in any case it will not be superfluous :) But the problem remains: what data to use as input, and as output too :)

It will never be superfluous. But what do input and output have to do with it, if even in this thread people suggest using 6 very decent models? What are they decent for? For predicting heat or traffic consumption they may well be decent. What does the market have to do with it?

I am not judging, and certainly not arguing with this approach to making money. For me this is just a matter of interest and discussion, nothing more. As for all the lamers who long ago created their super-smart neural networks ... and are still in the same place ... it's not even interesting to talk to them.