Machine learning in trading: theory, models, practice and algo-trading - page 3285

 
Aleksey Vyazmikin #:

And here's the result - the last two columns

Indeed, the results have improved. We can tentatively assume that the larger the sample, the better the training result.

Next I need to try training on parts 1 and 2 of the training sample: if the results are not much worse than on parts 2 and 3, then sample freshness can be considered a less significant factor than sample volume.
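A minimal sketch of such a comparison, assuming a time-ordered dataset split into three training parts plus a later exam period; the data, the split sizes and the GradientBoostingClassifier stand-in are all hypothetical:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 10))                          # rows ordered oldest -> newest
y = (X[:, 0] + 0.5 * rng.normal(size=4000) > 0).astype(int)

p1, p2, p3 = np.array_split(np.arange(3000), 3)          # three chronological train parts
exam = np.arange(3000, 4000)                             # later, untouched exam period

for name, idx in [("p1+p2", np.r_[p1, p2]), ("p2+p3", np.r_[p2, p3])]:
    model = GradientBoostingClassifier(random_state=0).fit(X[idx], y[idx])
    acc = accuracy_score(y[exam], model.predict(X[exam]))
    print(f"trained on {name}: exam accuracy = {acc:.3f}")
# If p1+p2 scores close to p2+p3, freshness matters less than volume.
```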

Well, the training is finished; the results are in the table below (the last two columns).


We can tentatively conclude that the success of training does indeed depend on the sample size. However, I note that the results for the "-1p1-2" sample are comparable to, and by some criteria even better than, those for the "-1p2-3" sample, while for the "0p1-2" sample the results are twice as poor in terms of the number of models meeting the given criterion.

Now I have started a run with inverted chronology: the train sample consists of the original exam + test + train_p3 parts, the test sample is train_p2, and the exam sample is train_p1. The goal is to see whether it is possible to build a successful model on more recent data that would have worked 10 years ago.
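In index terms, the inverted split can be sketched like this (placeholder sizes; the real samples and their proportions are not given in the thread):

```python
import numpy as np

n = 5000                                    # placeholder history length
parts = np.array_split(np.arange(n), 5)     # chunks ordered oldest -> newest
train_p1, train_p2, train_p3, test, exam = parts

# Normal chronology: fit on train_p1..p3, tune on test, final check on exam.
# Inverted chronology: train on the newest data, examine on the oldest.
train_inv = np.r_[exam, test, train_p3]
test_inv = train_p2
exam_inv = train_p1
```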

What do you think will be the result?

 
Aleksey Vyazmikin #:

...

What do you think will be the result?

A little more and a most trivial result will be obtained... or maybe it won't be obtained, but then it will be a discovery that turns the world of ML upside down!

Way to go!

 
СанСаныч Фоменко #:

I have written many times about the "predictive power of predictors", which is calculated as the distance between two vectors.

I came across a list of tools for calculating distances:

That is in addition to the standard one, which has its own set of distances.
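As a minimal sketch of the quoted idea, a predictor can be scored by the distance between its two class-conditional distributions; the histogram binning and the Jensen-Shannon metric below are illustrative assumptions, not the author's exact procedure:

```python
import numpy as np
from scipy.spatial import distance

def predictive_power(x, y, bins=20, metric=distance.jensenshannon):
    """Distance between the class-0 and class-1 histograms of predictor x."""
    edges = np.histogram_bin_edges(x, bins=bins)
    h0, _ = np.histogram(x[y == 0], bins=edges, density=True)
    h1, _ = np.histogram(x[y == 1], bins=edges, density=True)
    return metric(h0, h1)

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 5000)
informative = rng.normal(loc=y, scale=1.0)   # distribution shifts with the class
noise = rng.normal(size=5000)                # unrelated to the class
print(predictive_power(informative, y))      # clearly positive distance
print(predictive_power(noise, y))            # near zero
```

Any other two-vector metric from scipy.spatial.distance (cosine, euclidean, cityblock, ...) can be passed in place of jensenshannon.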

Nice collection.
 
Here's a task with no input: ...
What do you think the result will be? 😀

Just like before: here are the values of the features, without the features themselves...

And then he'll write: "and no one guessed, the result is such-and-such" 😁😁😁😁🥳
 
Maxim Dmitrievsky #:
Here's a task with no input: ...
What do you think the result will be? 😀

Just like before: here are the values of the features, without the features themselves...

And then he'll write: "and no one guessed, the result is such-and-such" 😁😁😁😁🥳

Max, I don't understand why you're making fun of me.

If there are no assumptions, don't say anything; if there are, state them, like "the result will suck".

 
Aleksey Vyazmikin #:
...

What do you think the result will be?

I don't know, but I'm curious to find out.

 
СанСаныч Фоменко #:

A little more and a most trivial result will be obtained... or maybe it won't be obtained, but then it will be a discovery that turns the world of ML upside down!

Way to go!

So you think the number of models in the first two columns will be comparable? Even though they differ by a factor of two. Please be more specific about what the triviality is.

 
Andrey Dik #:

Max, I don't understand why you're making fun of me.

If there are no assumptions, don't say anything; if there are, state them, like "the result will suck".

I wrote above about matstat (mathematical statistics). Before that I wrote about "kozul" (causal inference). Even earlier I wrote about oracle errors (markup errors), when the data is labelled in a way you don't understand. What follows from this with certainty is that on different chunks and lengths of the training history the results will vary. It depends on the data, which is neither provided nor described.
Markup errors affect both the results and the time periods. Whichever paw the chicken did the markup with, that is the result you will get.

People here like to talk about the basic pillars of learning: preprocessing, quantisation, the relation of predictors to targets... But nobody writes about which paw the markup is done with, left or right. More depends on that than on all of the above.
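A minimal sketch of that point on synthetic data: the same features, with the markup progressively corrupted; every name and number here is an illustrative assumption:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)
X = rng.normal(size=(4000, 5))
y_true = (X[:, 0] > 0).astype(int)          # the "correct" markup
tr, te = np.arange(3000), np.arange(3000, 4000)

for noise in (0.0, 0.1, 0.3):
    y = y_true.copy()
    flip = rng.random(len(y)) < noise       # randomly corrupted markup
    y[flip] ^= 1
    model = GradientBoostingClassifier(random_state=0).fit(X[tr], y[tr])
    acc = accuracy_score(y_true[te], model.predict(X[te]))  # judged vs clean labels
    print(f"markup noise {noise:.0%}: accuracy on clean labels = {acc:.3f}")
```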
 
Maxim Dmitrievsky #:
I wrote above about matstat (mathematical statistics). Before that I wrote about "kozul" (causal inference). Even earlier I wrote about oracle errors (markup errors), when the data is labelled in a way you don't understand. What follows from this with certainty is that on different chunks and lengths of the training history the results will vary. It depends on the data, which is neither provided nor described.
Markup errors affect both the results and the time periods. Whichever paw the chicken did the markup with, that is the result you will get.

People here like to talk about the basic pillars of learning: preprocessing, quantisation, the relation of predictors to targets... But nobody writes about which paw the markup is done with, left or right. More depends on that than on all of the above.

Well, that already sounds like a pro's opinion (whether it's right or wrong is another question).
And there's no need to make fun.
 
Andrey Dik #:

Well, that already sounds like a pro's opinion (whether it's right or wrong is another question).
And there's no need to make fun.

It's absolutely correct, and that's the biggest problem. Because of the lack of resources for proper markup (it is usually the most expensive part), active learning was even invented: algorithms themselves try to label the dataset as truthfully as they can, with the help of human annotators. In our case the labels are buy or sell.

And then people fiddle with their own markup errors. It's just obvious.
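A minimal sketch of active learning by uncertainty sampling as described: the model repeatedly asks the annotator to label only the points it is least sure about; the synthetic oracle below stands in for the buy/sell markup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 4))
oracle = (X[:, 0] + X[:, 1] > 0).astype(int)   # ground-truth buy(1)/sell(0) markup

labelled = list(rng.choice(len(X), size=20, replace=False))  # tiny seed set
for step in range(5):
    model = LogisticRegression().fit(X[labelled], oracle[labelled])
    print(f"step {step}: {len(labelled)} labels, accuracy = {model.score(X, oracle):.3f}")
    proba = model.predict_proba(X)[:, 1]
    uncertainty = np.abs(proba - 0.5)          # 0 = model is most unsure
    uncertainty[labelled] = np.inf             # never re-ask about known points
    ask = np.argsort(uncertainty)[:20]         # query the 20 hardest points
    labelled.extend(ask.tolist())              # the annotator labels them
```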