Machine learning in trading: theory, models, practice and algo-trading - page 2602

 
mytarmailS #:
Specific example:
There is a logical rule that predicts something with 80% probability on both train and test, passes cross-validation, etc. Yet on validation data (completely unknown new data) the rule works at the level of chance...

There is another rule that behaves exactly like the first on train and test, and it also passes validation without problems; those are real regularities...

Question: how can I distinguish one rule from the other at the train/test/cross-validation stage, before the validation stage?

I wonder whether there are any signs by which the line between the two can be drawn, perhaps some statistical tests for randomness or determinism, etc...
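As one concrete candidate for such a test: a permutation (shuffled-target) check of whether a rule's hit rate is distinguishable from chance. This is only an illustrative sketch, not something proposed in the thread; all names are made up, and note that it only measures in-sample significance and does nothing about selection over many candidate rules.

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_pvalue(pred, target, n_perm=2000, rng=rng):
    """One-sided p-value: how often does the rule's hit rate on a
    randomly shuffled target reach its observed hit rate?"""
    observed = (pred == target).mean()
    hits = sum(
        (pred == rng.permutation(target)).mean() >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

n = 500
target = rng.integers(0, 2, n)

# a rule with a real edge: agrees with the target ~60% of the time
real_pred = np.where(rng.random(n) < 0.6, target, 1 - target)
# a rule with no edge at all
random_pred = rng.integers(0, 2, n)

p_real = permutation_pvalue(real_pred, target)
p_random = permutation_pvalue(random_pred, target)
print(p_real, p_random)  # the rule with a real edge gets a far smaller p-value
```

A tiny p-value says the hit rate is unlikely under pure chance on this sample; it still cannot tell a stable regularity from a pattern that existed only in this period.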

The question of randomness vs. regularity is a cornerstone for the whole algo and for ML inside the algo.

There are lots of tips and tricks of all sorts.


The easiest thing is to look at how you got the result. If you got your best result... by picking the best out of a set of results, then of course the likelihood of overfitting is very high (especially if most of the other results amount to nothing). The results on the test: how did you get them? If you took the top 5% of the best on train, ran them all on the test and selected the best on the test, then of course the probability of overfitting is still considerable (especially if the average result of the others is not much). Ruling out this kind of "how not to do it" will, I'm sure, cut the probability of ending up overfitted very substantially. It's for this very reason that I don't see how someone else's robot/model could be evaluated from its equity curve at all: there is no way.
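The selection effect described here is easy to reproduce numerically. The sketch below (illustrative only; synthetic zero-edge strategies, all names made up) applies exactly the "how not to do it" recipe: keep the top 5% of 1000 pure-noise strategies on "train", pick the single best of those on "test", and then score the pick on fresh data.

```python
import numpy as np

rng = np.random.default_rng(1)

n_strats, n_days = 1000, 250

def sharpe(r):
    """Annualized Sharpe ratio of daily return series (one per row)."""
    return r.mean(axis=-1) / r.std(axis=-1) * np.sqrt(252)

# three independent periods of pure noise: every strategy's true edge is zero
train = rng.normal(0.0, 0.01, (n_strats, n_days))
test = rng.normal(0.0, 0.01, (n_strats, n_days))
fresh = rng.normal(0.0, 0.01, (n_strats, n_days))

top5 = np.argsort(sharpe(train))[-n_strats // 20:]   # top 5% on train
best = top5[np.argmax(sharpe(test)[top5])]           # best of those on test

print("train Sharpe of the pick:", round(sharpe(train)[best], 2))
print("test Sharpe of the pick: ", round(sharpe(test)[best], 2))
print("fresh Sharpe of the pick:", round(sharpe(fresh)[best], 2))
```

On fresh data the pick is just another zero-edge strategy, even though it looked excellent on both train and test: the high numbers came entirely from selecting the luckiest of many draws.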


Beyond that, as I said, all sorts of tips and tricks.

 
Replikant_mih #:

The question of randomness vs. regularity is a cornerstone for the whole algo and for ML inside the algo.

There are lots of tips and tricks of all sorts.

The easiest thing is to look at how you got the result. If you got your best result... by picking the best out of a set of results, then of course the likelihood of overfitting is very high (especially if most of the other results amount to nothing). The results on the test: how did you get them? If you took the top 5% of the best on train, ran them all on the test and selected the best on the test, then of course the probability of overfitting is still considerable (especially if the average result of the others is not much). Ruling out this kind of "how not to do it" will, I'm sure, cut the probability of ending up overfitted very substantially. It's for this very reason that I don't see how someone else's robot/model could be evaluated from its equity curve at all: there is no way.

Beyond that, as I said, all sorts of tips and tricks.

That's all obvious stuff, a tautology... I'm interested in the specific tricks.

==============================

For example, one option is to analyze the optimization surface (OS) of a TS (trading system), of rules, of ML models, etc.

For example, the OS of a TS based on the crossover of two moving averages, with "recovery factor" as the target metric.

Naturally, this TS does not work, never has and never will.


=======================================

And here is a working TS, which has been making a very stable profit so far (Valeriy knows :) )


So to speak, feel the difference.

 

So I have a nagging idea that if you can see the OS of a TS, you can tell what kind of TS it is and whether it will work on new data...

But computing the OS is slow and hard; maybe it can be bypassed in some more elegant way that is less demanding of computing resources.
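For reference, here is a minimal sketch of what computing such a surface for the two-moving-average crossover on the "recovery factor" target involves. Everything here is illustrative and my own (synthetic random-walk prices, made-up parameter ranges); the point is that the cost grows with the parameter grid, which is exactly the expense being complained about.

```python
import numpy as np

rng = np.random.default_rng(2)

# synthetic price series (random walk): a stand-in for real quotes
price = 100 + np.cumsum(rng.normal(0, 1, 2000))

def recovery_factor(prices, fast, slow):
    """Recovery factor (net profit / max drawdown) of a simple two-MA
    crossover system: long one unit whenever fast MA > slow MA."""
    if fast >= slow:
        return np.nan
    def sma(x, w):
        return np.convolve(x, np.ones(w) / w, mode="valid")
    f = sma(prices, fast)[slow - fast:]   # align the two MAs in time
    s = sma(prices, slow)
    pos = (f > s).astype(float)[:-1]      # position held over the next bar
    pnl = pos * np.diff(prices[slow - 1:])
    equity = np.cumsum(pnl)
    drawdown = np.maximum.accumulate(equity) - equity
    max_dd = drawdown.max()
    return equity[-1] / max_dd if max_dd > 0 else np.nan

# the optimization surface: one cell per (fast, slow) parameter pair
fasts = range(5, 50, 5)
slows = range(10, 100, 10)
surface = np.array([[recovery_factor(price, f, s) for s in slows] for f in fasts])
print(surface.shape)
```

In practice one would plot `surface` as a heatmap and look at its shape, and the grid above is tiny; real grids over more parameters are where the computational cost explodes.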

 
mytarmailS #:

So I have a nagging idea that if you can see the OS of a TS, you can tell what kind of TS it is and whether it will work on new data...

But computing the OS is slow and hard; maybe it can be bypassed in some more elegant way that is less demanding of computing resources.

I have some idea of how algo traders do it. I have no idea how data scientists do it. And I know exactly how I do it.)


The optimization surface, as I understand it, is (in this case) a 3-dimensional space where two axes are parameter axes (of the model or strategy) and one is the target metric. Yes, of course, you can approach it through that. I have a couple of ways, and if needed, something more could be invented. Actually, though, I'm coming at it from a different side now. And of course there is no desire to share useful information with someone who comes in with "That's all obvious stuff, a tautology.")
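One cheap proxy for "reading" such a surface, instead of eyeballing the full 3-D plot, is a robustness check around the optimum: a genuine regularity tends to sit on a broad plateau of good parameter sets, while a fit to noise tends to show up as an isolated spike. A rough illustrative metric (my own sketch, not anyone's established method):

```python
import numpy as np

def plateau_score(surface):
    """Mean of the 3x3 neighbourhood around the best cell, divided by
    the best cell itself. Near 1: the optimum sits on a plateau
    (robust). Near 0: an isolated spike (likely a fit to noise).
    Assumes a mostly positive target metric such as recovery factor."""
    s = np.where(np.isfinite(surface), surface, np.nan)
    i, j = np.unravel_index(np.nanargmax(s), s.shape)
    window = s[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
    return np.nanmean(window) / s[i, j]

plateau = np.ones((5, 5)); plateau[2, 2] = 1.1   # broad plateau
spike = np.zeros((5, 5));  spike[2, 2] = 5.0     # isolated spike
print(plateau_score(plateau))  # close to 1
print(plateau_score(spike))    # close to 0
```

This only needs the cells around the chosen optimum, so it is far cheaper than characterizing the whole surface.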

 
mytarmailS #:
It is not always possible to trace causality

Then there can only be assumptions about the causes and about the presence of a pattern. Causes are primary, behavior is secondary. In TA the primacy of causes is sometimes forgotten, and random repetitions of behavior are taken for regularities, which they are not.

 
Replikant_mih #:

I have some idea of how algo traders do it. I have no idea how data scientists do it. And I know exactly how I do it.)


The optimization surface, as I understand it, is (in this case) a 3-dimensional space where two axes are parameter axes (of the model or strategy) and one is the target metric. Yes, of course, you can approach it through that. I have a couple of ways, and if needed, something more could be invented. Actually, though, I'm coming at it from a different side now. And of course there is no desire to share useful information with someone who comes in with "That's all obvious stuff, a tautology.")

Look, if your answer to the question is: "lots of tips and tricks", and then, as you said, "all sorts of tips and tricks"...

Thank you for this in-depth knowledge, which is definitely not "a tautology".

Try answering like that on specialized sites like SA or CV; I wonder how many upvotes you'd get ...

If it bothers you that much, you can always cry ))

 
mytarmailS #:

Look, if your answer to the question is: "lots of tips and tricks", and then, as you said, "all sorts of tips and tricks"...

Thank you for this in-depth knowledge, which is definitely not "a tautology".

Try answering like that on specialized sites like SA or CV; I wonder how many upvotes you'd get ...

If it bothers you that much, you can always cry ))

Glad you liked it).

 
Valeriy Yastremskiy #:

Then there can only be assumptions about the causes and about the presence of a pattern. Causes are primary, behavior is secondary. In TA the primacy of causes is sometimes forgotten, and random repetitions of behavior are taken for regularities, which they are not.

I agree, it's a difficult question.

That is why one has to turn to mathematics, so that the answer is a number and not "lots of tips and tricks".

 

1) I think it is obvious that there is not, and cannot be, any way to prove that a pattern established on history will necessarily keep working in the future.

2) The existence of a method that establishes the determinism (non-randomness) of a pattern for the future from data of the past would be a negation of (1).

All we have is cross-validation, which can only establish the homogeneity of a pattern over history. We can only interpolate the pattern, not extrapolate it. We have only a very weak ASSUMPTION that a well-interpolated pattern will also turn out to extrapolate well. This is not deductive inference but inductive: a variant of reasoning by analogy.
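The interpolation/extrapolation distinction can at least be made mechanical: in shuffled k-fold CV every held-out point is surrounded by training data, while in a walk-forward split each block is predicted from the past only. On a drifting series the two give very different answers. An illustrative sketch (synthetic data, a deliberately naive 1-nearest-neighbour-in-time predictor, all names mine):

```python
import numpy as np

rng = np.random.default_rng(3)

# a drifting target: the future is systematically unlike the past
n = 400
y = 0.05 * np.arange(n) + rng.normal(0, 0.5, n)

def nn_mse(train_idx, test_idx, y):
    """MSE of a naive 1-nearest-neighbour-in-time predictor: each test
    point is predicted by y at the closest training index."""
    d = np.abs(train_idx[None, :] - test_idx[:, None])
    nearest = train_idx[np.argmin(d, axis=1)]
    return np.mean((y[test_idx] - y[nearest]) ** 2)

# shuffled k-fold: every held-out point is surrounded by training data
idx = rng.permutation(n)
folds = np.array_split(idx, 5)
cv = np.mean([nn_mse(np.setdiff1d(idx, f), f, y) for f in folds])

# walk-forward: every block is predicted from the past only
blocks = np.array_split(np.arange(n), 5)
wf = np.mean([nn_mse(np.arange(b[0]), b, y) for b in blocks[1:]])

print(f"k-fold (interpolation) MSE:       {cv:.2f}")
print(f"walk-forward (extrapolation) MSE: {wf:.2f}")
```

The k-fold score stays near the noise floor because the drift is invisible when test points sit between training points; the walk-forward score exposes it. This matches the point above: interpolating well says little about extrapolating well.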

 
Aleksey Nikolayev #:

1) I think it is obvious that there is not, and cannot be, any way to prove that a pattern established on history will necessarily keep working in the future.

2) The existence of a method that establishes the determinism (non-randomness) of a pattern for the future from data of the past would be a negation of (1).


So what makes (1) obvious, and what are the arguments for its validity?