Machine learning in trading: theory, models, practice and algo-trading - page 2787

 
Aleksey Nikolayev #:

You can simply compare histograms of the sample before and after the transformation. If the resulting one is closer to the target form (a normal or uniform distribution, for example), then the transformation is fine). Instead of drawing histograms, you can run goodness-of-fit tests against the target (for normality or uniformity, respectively).
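A minimal sketch of that idea (Python with NumPy assumed; the log transform here is purely illustrative, not anything prescribed above): instead of a formal test, compare simple normality indicators, sample skewness and excess kurtosis, before and after the transformation — both are near zero for a normal sample.

```python
import numpy as np

def skew_kurtosis(x):
    """Sample skewness and excess kurtosis; both are ~0 for a normal sample."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z**3), np.mean(z**4) - 3.0

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)  # clearly non-normal

before = skew_kurtosis(sample)           # large skew and kurtosis
after = skew_kurtosis(np.log(sample))    # candidate transformation

# The transformation is "fine" if both indicators move toward 0.
print(before, after)
```

The same comparison works with any candidate transformation — whichever moves the indicators closer to the target wins.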

Don't they make satellite dishes parabolic in shape? Exactly according to the formula)

Yes, look and visually select whatever is closer to the target). But there is no logic in what this transformation does or why it is better than the others.

It took a long time to get to these parabolas)))))) And the filters are really crackers))))

 
JeeyCi #:

and the goal is always the same - logic and adequacy instead of a mindless greedy-algorithm search through all the rubbish and complaints about the lack of computing power for that business....

yes, the estimates should be valid - you call it "the error should not change"; the prediction itself will change over the time series (in dynamics)...

you can't get any further than your advertising remarks about tools -- without knowing how those tools work... you have been handed a sledgehammer and you are waving it around (are you Chapayev???) with reference to your IV=0.02 threshold - THAT IS A WEAK(!) relationship - so why wave your slogans here... and call proposals for adequate analysis "mashka" (moving averages, where they never existed in the first place)... open your own advertising thread

and ML, yes - it works the same everywhere and for the SAME PURPOSE - and in Python and other libraries it is not IV at all -- but the essence doesn't change -- apparently, not understanding the essence of data analysis, you can only shout slogans about candidates and tools and mindlessly load rubbish into your "black box" -- and you didn't even bother to use your predictions for their intended purpose....

well, open a thread for your advertising campaigns and shout there -- if you can't do anything but churn out analyses (not even proper conclusions) -- you look like a bloody scrap collector trying to grab other people's ideas for your scrap metal (apart from the word "tool", you haven't even understood how it works) -- what was wrong with LogisticRegression?

=== you don't have to answer! (your personal Information Value = 0 for me)... your interpretations of linear algebra have an even lower IV

The previous text made sense, but it reflected a misunderstanding of what is being done here.

I replied NOT to you, but to other readers who constantly forget about the goal and the criteria for achieving it, although there are plenty of people here who understand this quite professionally and have the appropriate tools.

And this text makes no sense - some kind of little girl's resentment. I don't see the point in answering.

 
JeeyCi #:

and the goal is always the same - logic and adequacy instead of a mindless greedy-algorithm search through all the rubbish and complaints about the lack of computing power for that business....

Let's assume that it didn't happen)))))

Logic, adequacy of estimates and understanding of processes are certainly better than their absence. But in statistics and probability theory there is quite often no explanation of why a given methodology works. One person threw a needle, measured something and computed the number Pi; another looked at the history of Nile floods and found something to measure that would predict the next one. The logic in their actions is minimal.

In the same way with time series, I think, you need to find the right features - what to measure at all))))
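The needle anecdote above is Buffon's needle; a minimal Monte Carlo sketch of it (Python with NumPy assumed) estimates Pi from nothing but random throws — "measure something there and calculate the number Pi":

```python
import numpy as np

def buffon_pi(n_throws: int, needle_len: float = 1.0, line_gap: float = 1.0,
              seed: int = 0) -> float:
    """Estimate Pi by dropping a needle of length L onto a floor with lines
    spaced D apart (L <= D). The crossing probability is 2*L / (pi*D),
    so pi ~ 2*L*n_throws / (D*hits)."""
    rng = np.random.default_rng(seed)
    # Distance from needle centre to the nearest line, and the needle angle.
    d = rng.uniform(0.0, line_gap / 2.0, n_throws)
    theta = rng.uniform(0.0, np.pi / 2.0, n_throws)
    hits = np.count_nonzero(d <= (needle_len / 2.0) * np.sin(theta))
    return 2.0 * needle_len * n_throws / (line_gap * hits)

print(buffon_pi(1_000_000))  # close to 3.14, up to Monte Carlo noise
```

Which is the point being made: the procedure works, yet nothing in the throwing itself "explains" Pi.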

 
the picketers have come out again... the rally is not here. And the guessing isn't here either. And they haven't learnt to quote, attributing their own speculations to others.
 
Valeriy Yastremskiy #:

Logic, adequacy of estimates and understanding of processes are certainly better than their absence. But in statistics and probability theory there is quite often no explanation of why a given methodology works. One person threw a needle, measured something and computed the number Pi; another looked at the history of Nile floods and found something to measure that would predict the next one. The logic in their actions is minimal.

In the same way with time series, I think, you need to find the right features - what to measure at all))))

It is hard to disagree with this.


But one should realise that without understanding the PURPOSE of the whole research one very quickly slips into lecturing, into retelling the relevant textbook without any prospect of obtaining a final product.

 

about needles and floods... it was just a coincidence:

we generate 100500*10^3 one-dimensional random walks; if we take a single trajectory from the whole bundle of random walks and scrutinise it, it does not really follow the general ensemble conclusions. In places it outright contradicts them.

and here we work/trade/live always with one single sample. We don't have any others.
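A minimal sketch of that point (Python with NumPy assumed; a far smaller bundle than 100500*10^3, for speed): across the ensemble the mean endpoint of a symmetric random walk is near zero with spread ~sqrt(T), yet any single trajectory — the only "sample" a trader ever has — can sit far from that mean.

```python
import numpy as np

rng = np.random.default_rng(42)
n_walks, n_steps = 10_000, 1_000

# Each row is one 1D random walk: cumulative sum of +/-1 steps.
steps = rng.choice([-1, 1], size=(n_walks, n_steps))
walks = steps.cumsum(axis=1)

ensemble_mean = walks[:, -1].mean()   # near 0 across the whole bundle
one_trajectory = walks[0, -1]         # a single path's endpoint: anything goes
typical_spread = np.sqrt(n_steps)     # theory: std of endpoint ~ sqrt(T)

print(ensemble_mean, one_trajectory, walks[:, -1].std(), typical_spread)
```

The ensemble statistics match theory closely; the individual path is under no such obligation.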

 

СанСаныч Фоменко #:
  ...

that's typical of those who have written an "article" and cannot defend it... as evidenced by his earlier claims against the research institute.

... everyone's an arsehole...

 
СанСаныч Фоменко #:

It's hard to disagree with that.


But one should realise that without understanding the PURPOSE of the whole research one very quickly slips into lecturing, into retelling the relevant textbook without any prospect of obtaining a final product.

A goal without understanding the road to it is just a dream))))))

In general, market research without global tools of evaluation and analysis is akin to thinking about the cosmos or the structure of matter: correct theories are still possible, and they can even be confirmed with cut-down tools.))))))

For now I am closer to the search for what and how to measure in the series - something new that has not yet been noticed.))))) That would assess the state more correctly. The prediction paradigm is close to this, but the task is still different.

The logic should be like this. We measure something, and that measurement defines the state. We measure different parameters for different states. And we simply register the change of state. Of course, there should be a library / set of states. We measure on all timeframes and on ticks. I hope the logic of measurements on different scales will be the same, and tick scales will not differ much from candlestick scales. Something like that)))))
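A hypothetical sketch of that state-library idea (Python with NumPy assumed; the two measurements — drift and realised volatility — and the thresholds are invented here purely for illustration, not anything the author specified): classify each window into a state from a tiny "library" and register every change of state.

```python
import numpy as np

def classify_state(window: np.ndarray) -> str:
    """Assign a window of prices to a state from a tiny hand-made 'library'
    using two illustrative measurements: net drift and realised volatility."""
    returns = np.diff(window)
    drift = returns.mean()
    vol = returns.std()
    if vol > 1.5:                       # hypothetical volatility threshold
        return "volatile"
    return "trend_up" if drift > 0 else "trend_down"

def state_changes(prices: np.ndarray, window: int = 50) -> list[tuple[int, str]]:
    """Register (index, new_state) each time the classified state changes."""
    changes, prev = [], None
    for i in range(window, len(prices)):
        state = classify_state(prices[i - window:i])
        if state != prev:
            changes.append((i, state))
            prev = state
    return changes

rng = np.random.default_rng(1)
prices = np.cumsum(rng.normal(0.1, 1.0, 500))  # synthetic series with drift
print(state_changes(prices)[:5])
```

The same loop runs unchanged on any timeframe — candles or ticks — which is exactly the scale-invariance hoped for above; only the state library and thresholds would need tuning.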

 
You get a lot of inconsistencies, including with the removal of outliers. By various estimates they usually amount to about 10% of the dataset. You delete them - and then what, how will the model trade when an outlier is actually caught? )
With transformations the situation is about the same.
If you do preprocessing by the book, the results become worse than on raw data.
Or random improvements in metrics are passed off as systematic.
 
Maxim Dmitrievsky #:
You get a lot of inconsistencies, including with the removal of outliers. By various estimates they usually amount to about 10% of the dataset. You delete them - and then what, how will the model trade when an outlier is actually caught? )
With transformations the situation is about the same.
If you do preprocessing by the book, the results become worse than on raw data.
Or random improvements in metrics are passed off as systematic.

You can't just do it off the cuff, after reading textbooks and articles - this is a separate stage and it is called learning. Without systematic knowledge of statistics there is nothing for you to do in ML.

You always have to do it while trying to achieve the goal.

If we take as an intermediate goal the maximum predictive ability of the predictor, then:

1. Removing outliers is mandatory. If values beyond the 0.5% quantile are treated as outliers, then outliers make up less than 1% of the data. Incidentally, this is the percentage of triggered stops in the future. We are developing the trading system itself; we have numerical limitations.

2. Preprocessing is mandatory, but again it depends on what kind. If we are talking about the predictive ability of a predictor, you must not smooth out the skews that increase that predictive ability. That is just one example. In general: take some preprocessing algorithm and evaluate its effect on predictive power. That is the answer here.

3. Always keep in mind the meaning of ML, which to my mind is the search for patterns. This is most obvious in RF: how many patterns are contained in, say, 5000 bars? Or: from what window size does increasing the number of patterns stop reducing the error? Or: for some fixed window, from what number of patterns does the error stop falling?
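A minimal sketch of point 1 (Python with NumPy assumed; a heavy-tailed synthetic series stands in for real returns): treat everything beyond the 0.5% and 99.5% quantiles as outliers and drop it, which removes less than 1% of the sample per tail.

```python
import numpy as np

def remove_outliers(x: np.ndarray, q: float = 0.005) -> np.ndarray:
    """Drop values outside the [q, 1-q] quantile range (q = 0.5% per tail,
    so about 1% of the sample is removed in total)."""
    lo, hi = np.quantile(x, [q, 1.0 - q])
    return x[(x >= lo) & (x <= hi)]

rng = np.random.default_rng(7)
x = rng.standard_t(df=3, size=100_000)  # heavy-tailed, outlier-prone "returns"

trimmed = remove_outliers(x)
removed_share = 1.0 - trimmed.size / x.size
print(removed_share)  # about 0.01, i.e. 1% of the data
```

Per point 2, the same pattern applies to any preprocessing step: run the predictor's predictive-power estimate on the data before and after the step, and keep the step only if the estimate improves.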

Answers for RF.

1. It does not make sense to increase the window above 1500 bars.

2. The relationship between the error and the number of patterns (trees) is clearly visible on the graph:

The minimum is 50 trees. Generally from 100 to 200. The graph does not change when the window is increased up to 5000.
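That error-vs-trees curve can be reproduced in outline (Python with scikit-learn assumed; a synthetic classification dataset stands in for real bar data, so the specific figures of 50 and 100-200 trees above are the author's, not something this sketch establishes):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a labelled bar dataset.
X, y = make_classification(n_samples=3000, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Hold-out error as a function of the number of trees in the forest.
errors = {}
for n_trees in (10, 50, 100, 200, 400):
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    rf.fit(X_tr, y_tr)
    errors[n_trees] = 1.0 - rf.score(X_te, y_te)

# Typically the error drops quickly and then flattens as trees are added.
print(errors)
```

Plotting `errors` gives the kind of curve described: a fast initial drop followed by a plateau, after which adding trees no longer reduces the error.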

You should always clearly formulate the goal and the criterion for achieving it. Everything else is blah-blah.