Machine learning in trading: theory, models, practice and algo-trading - page 162

 
SanSanych Fomenko:

Thanks, I read it.

I think the author is too optimistic.

The problem of overtraining is not solvable in principle.

...

In theory it is solvable in the sense of Laplace's universal determinism, for example if all the necessary factors are known and there is informational access to them. But in practice such "solvability" runs into plenty of problems (not all factors are known, not all are accessible, and those that are accessible are often heavily contaminated with noise).

SanSanych Fomenko:


...

In my conviction, if the input predictors are not first cleaned of noise ones, i.e. those "irrelevant" to the target variable, then neither the "coarsening" method nor other methods that rely on the notion of predictor "importance" will work.

According to your conviction, and also judging by the confirmation from my own experiments with jPrediction, it would seem that this is exactly how it should be?

But the problem is that not every experiment confirms the above statement. It all depends on which machine learning methods are used.

For example, Victor Tsaregorodtsev did research on neural networks trained with backpropagation and, based on the results, came to quite the opposite conclusion in his article "Reduction of neural network size does not lead to increased generalization ability", which I quote:

"This contradicts the view that eliminating noisy, uninformative features and redundant neurons is mandatory and useful in practice."

So it turns out that it is pointless to draw general conclusions for all machine learning methods without exception (lumping them all together). For some methods such "conclusions" will be correct, while for others they will be flatly wrong.

 
Yury Reshetov:

Increasing the complexity of jPrediction models implies a gradual increase in the number of predictors, because in jPrediction the number of neurons in the hidden layer is 2^(2*n+1), where n is the number of predictors. Accordingly, as the number of predictors grows, so does the complexity of the model (the number of neurons in the hidden layer).


If there are 100 predictors, then by your formula the number of neurons in the hidden layer will be somewhere near the number of atoms in the Universe (I am afraid even to think about 200 predictors). You seem to have divine resources, both computational and in terms of time.
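For a sense of scale, here is a quick back-of-the-envelope check of the quoted 2^(2*n+1) formula (purely illustrative; the predictor counts are arbitrary):

```python
# Hidden-layer size implied by the quoted formula 2^(2*n+1),
# evaluated for a few arbitrary predictor counts n.
for n in (10, 30, 100, 200):
    print(n, format(2 ** (2 * n + 1), ".2e"))
# 10  -> 2.10e+06
# 30  -> 2.31e+18
# 100 -> 3.21e+60
# 200 -> 5.17e+120   (for comparison: ~1e80 atoms in the observable Universe)
```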



 
Andrey Dik:

CC just gave a very unfortunate example, while continuing to persist in his ignorance...

What do you mean, "other forces"? The same forces act on the ball and on the fluff: the force of gravity (weight) and the force of the wind flow distributed over half the surface area of the body.

...

Andrew, I remind you that this thread is about machine learning, not physics problems.

Be so kind as not to flood it with off-topic matters, which are not welcome in this thread.

If you are so eager to show off your knowledge of physics, start a separate thread devoted to it.

Especially since, with a knowing air, you are trying to dispute a metaphor, putting yourself in an obviously foolish position.

 
Yury Reshetov:

Andrew, I remind you that this thread is about machine learning, not physics problems.

Be so kind as not to flood it with off-topic matters, which are not welcome in this thread.

If you are so eager to show off your knowledge of physics, start a separate thread devoted to it.

Especially since, with a knowing air, you are trying to dispute a metaphor, putting yourself in an obviously foolish position.

Okay, if you think that metaphors built on incorrect examples have any value, then I won't interfere any further.

My apologies. And you, CC, forgive me too.

 
sibirqk:

If there are 100 predictors, then by your formula the number of neurons in the hidden layer will be somewhere near the number of atoms in the Universe (I am afraid even to think about 200 predictors). You seem to have divine resources, both computational and in terms of time.

Even if there are 10,000 predictors, it doesn't matter. It is by no means certain that all of them are informative. jPrediction will find some of the most informative ones among them, gradually complicating the models, and it will stop as soon as the generalization ability starts to decrease.

It never comes anywhere near divine resources; an ordinary personal computer is quite enough.
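For the curious, here is a minimal sketch of the kind of search described above: greedily adding predictors one at a time and stopping once generalization ability stops improving. This is not jPrediction's actual code; the logistic-regression model and the cross-validated accuracy score are stand-ins chosen purely for illustration.

```python
# Greedy forward selection of predictors, stopping when cross-validated
# accuracy (a proxy for "generalization ability") stops improving.
# Illustrative sketch only -- not jPrediction's implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_select(X, y, max_features=10):
    selected, best_score = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # score every candidate predictor added to the current set
        scores = {
            j: cross_val_score(LogisticRegression(max_iter=1000),
                               X[:, selected + [j]], y, cv=5).mean()
            for j in remaining
        }
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:   # generalization stopped improving
            break
        best_score = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best_score
```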

 
Andrey Dik:

Okay, if you think that metaphors built on incorrect examples have any value, then I won't interfere any further.

My apologies. And you, CC, forgive me too.

Metaphors have no value other than a rhetorical one, regardless of how successful they are rhetorically. And nit-picking at them is bad form.

Apology accepted, of course.

 
Yury Reshetov:
Metaphors have no value other than a rhetorical one, regardless of how successful they are rhetorically. And nit-picking at them is bad form.

If something said has no value, then it is just idle blather. I don't think CC meant to blather; it simply came out that way.

Metaphors are used when one wants to convey an idea in accessible language by means of comparison. Some examples work for a politician, other examples are understandable to a nuclear physicist, and for a politician and a nuclear physicist to understand each other they resort to comparisons, i.e. metaphors. So metaphors serve a definite purpose: to make it easier for interlocutors to understand one another.

But never mind, forget it.

 
Andrey Dik:

If something said has no value, then it is just idle blather. I don't think CC meant to blather; it simply came out that way.

He simply used a metaphor that was not very apt. So what? Should he be put up against the wall for that?

We are all human and we all make mistakes sometimes.

Another matter is that it has spawned so much off-topic flooding, which needlessly reduces the informativeness of the thread. And that is not good.

 
Yury Reshetov:

In theory it is solvable in the sense of Laplace's universal determinism, for example if all the necessary factors are known and there is informational access to them. But in practice such "solvability" runs into plenty of problems (not all factors are known, not all are accessible, and those that are accessible are often heavily contaminated with noise).

According to your conviction, and also judging by the confirmation from my own experiments with jPrediction, it would seem that this is exactly how it should be?

But the problem is that not every experiment confirms the above statement. It all depends on which machine learning methods are used.

For example, Victor Tsaregorodtsev did research on neural networks trained with backpropagation and, based on the results, came to quite the opposite conclusion in his article "Reduction of neural network size does not lead to increased generalization ability", which I quote:

"This contradicts the view that eliminating noisy, uninformative features and redundant neurons is mandatory and useful in practice."

So it turns out that it is pointless to draw general conclusions for all machine learning methods without exception (lumping them all together). For some methods such "conclusions" will be correct, while for others they will be flatly wrong.

If you look at the early publications by the author of the random forest algorithm, he claimed in all seriousness that RF is not prone to overtraining at all and gave plenty of examples. The randomForest package itself is built so as to exclude even the slightest suspicion of overtraining.

At the same time, random forest is the most overtrainable algorithm of all. I have been burned by it personally.

I only trust the figures obtained using the following methodology.

We take two files that follow one another in time.

The first file is split randomly into three parts: training, testing and validation.

  • We train on the training part; within it, the algorithm itself learns on one subset and evaluates on another, held-out subset, the so-called out-of-sample. This gives us the training error. The out-of-sample slice is obtained by cross-validation, i.e. it changes all the time.
  • We test the trained model on the testing and validation parts of the first file.
  • We get the error of applying the previously trained model. These three errors should be close.

We move on to the second file, which follows the first one in time.

We apply the trained model to this second file. The resulting error should NOT differ much from the first three errors.

AS A RESULT, WE HAVE FOUR ERROR VALUES, WHICH ARE NOT VERY DIFFERENT FROM EACH OTHER.

To me, this is the only proof of the lack of overtraining. And if we also get an error close to these four in the tester, we can trade.

That's all I believe in.

The overwhelming majority of machine learning publications are not tested on any analogue of the second file. The reason is trivial: the algorithms are NOT being applied to time series, so a random split of file number one turns out to be quite sufficient. And that is indeed the case, for example, in handwritten text recognition.
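As a rough sketch of this four-error check, assuming the two chronologically consecutive files are already loaded as (X1, y1) and (X2, y2) and using a random forest classifier as a stand-in model:

```python
# Four-error check: train/test/validation errors on file 1 plus the error
# on a later-in-time file 2; a large gap on file 2 signals overtraining.
# Sketch under assumed inputs (X1, y1, X2, y2 as numpy arrays).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

def four_errors(X1, y1, X2, y2, seed=42):
    # random split of the first file: 60% train, 20% test, 20% validation
    X_tr, X_rest, y_tr, y_rest = train_test_split(
        X1, y1, test_size=0.4, random_state=seed)
    X_te, X_va, y_te, y_va = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=seed)

    model = RandomForestClassifier(n_estimators=500, random_state=seed)

    # training error estimated by cross-validation on the training part
    # (the out-of-sample slice changes on every fold)
    err_train = 1 - cross_val_score(model, X_tr, y_tr, cv=5).mean()

    model.fit(X_tr, y_tr)
    err_test = 1 - model.score(X_te, y_te)    # test part of file 1
    err_valid = 1 - model.score(X_va, y_va)   # validation part of file 1
    err_future = 1 - model.score(X2, y2)      # file 2, later in time

    return err_train, err_test, err_valid, err_future
```

If the four numbers stay close to one another, that matches the criterion described above; a much larger err_future is the signature of overtraining.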

 

Regarding my metaphors and analogies.

My degree is in applied mathematics, and my teachers believed that I, like all my classmates, was capable of mastering any mathematical tool. What they saw as the main problem of our future work was deciding whether a particular tool is applicable to a particular practical problem. That is what I have been doing all my life, while mastering yet another tool... R has hundreds or thousands of them, so what?

As for all this trolling directed at me...

Arguing with a troll only feeds him.

Of course, it would be nice to clean up this thread; it used to be a great one.
