Machine learning in trading: theory, models, practice and algo-trading - page 3685

 
Andrey Dik #:

It occurred to me.

What do we see in the real world around us? Trees, houses and many other things. This world has its own physical laws: gravity, the inertia of bodies, the laws of wave refraction and many others. Now, trees and houses are not laws, they are data. Roughly speaking, from the location of trees and houses on the current street and a few previous ones, try to predict the location of trees and houses on the next street. Absurd? Of course it is. But this is what happens in most cases when ML is applied to market data: trying to predict the trees on the next street (whatever you like - price values or increments, direction or range) in no way reveals the underlying laws. And it even goes as far as "look, if you slightly undertrain the model that predicts trees and houses on the next street, you will definitely get generalisation and will be able to work out the law of gravity of this flat world of the price chart". No, you won't.

So yes, an agent like this on a chart must perceive the "trees" as effects, not as causes or laws. It must literally live in this candlestick world and improve its skills by learning and navigating its laws.

There don't seem to be any fixed price patterns at all. I can't say for sure, but it's like looking for familiar shapes in the clouds (they are just clumps of vapour; there are no patterns in their shape). The real patterns are elsewhere: temperature, pressure and humidity are what form the clouds.

Drawing an analogy with forex, it sounds like you are talking about fundamental analysis.

If chaotic trees and houses are a visual consequence of physics - temperature, pressure and so on - then a chaotic visual chart is a consequence of fundamentals.

That is, the machine cannot extract anything from the chaos on the chart by simply swallowing that same chaos.


This makes sense, but given that we have no fundamental input data for the machine (or it is scarce and insufficient), we have to work with what we have.

And here's a thought: indirect signs. By analogy with nature: moss grows on the north side, and now we know our compass bearing. And so on.

It turns out that we are looking for indirect signs in the chaos of the chart: indirect traces of the fundamentals, of some action by a big player, and so on. That is the point of the machine: to find them.

Roughly speaking, there is still a point in digging through the chart.
 

Maybe I should start ranting about haters too) Haters of common sense, of the fundamentals of ML and of science itself.

Science often combines inductive and deductive methods. First from the particular to the general: models are built from data. Then from the general to the particular: the models are tested against new data. For example, looking at a few trees leads to an assumption about the direction of gravity relative to the direction of their growth; from that, an assumption is made about all trees, which is then tested on new trees. Without this approach many sciences (cosmology, for example) would not exist at all.

By the way, the sizes and shapes of clouds obey interesting probabilistic patterns of a fractal nature.

 
Aleksey Nikolayev #:

You obviously want to talk about your own issues, using my post as an excuse to do so. In principle, I understand what you wanted to say and I agree that it is very important, but I am talking about something else.

In the context of my message, we are simply talking about the standard condition for algorithms to be ensembled: each one's error must be less than that of a naive algorithm (and, of course, we also need independence between them).

Your post is not the reason for my post.

It's not about a burning issue. It's about the basics.

It is pointless to talk about the overall performance of a model when it is replaced by local criteria within a single file - criteria that the overall performance then completely refutes.

I went down your path and got remarkable results: I can reduce the classification error arbitrarily when testing within the boundaries of the pre-trained file. But these results do NOT extrapolate outside the file in a step-by-step run.
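To make the contrast concrete, here is a minimal sketch (synthetic data and a generic sklearn model, not the actual setup): the same model scored inside the file it was fitted on versus in a step-by-step, train-on-the-past / test-on-the-future run.

```python
# Minimal sketch, not the poster's code: "inside the file" vs step-by-step.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))            # stand-in for price-based features
y = (rng.random(2000) > 0.5).astype(int)   # labels with no real structure

model = GradientBoostingClassifier(n_estimators=300, max_depth=3, random_state=0)

# "Inside the pre-trained file": score on the very data the model has already seen.
model.fit(X, y)
in_file_acc = model.score(X, y)

# Step-by-step run: always train on the past and test on the future.
wf_acc = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5)).mean()

print(f"accuracy inside the file: {in_file_acc:.2f}")  # can be pushed arbitrarily high
print(f"step-by-step accuracy:    {wf_acc:.2f}")       # stays near 0.5 on noise
```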

 
mytarmailS #:
In my understanding, your "boosting of a TS ensemble" by complicating the ensemble is just the iterative creation of a single but complex trading system with iterative improvement of the equity curve. And as practice has shown (in my case), it does not work precisely because of the high complexity.
Perhaps so. I would just like to clarify that the initial "weak" systems are not necessarily obtained by means of ML. For example, one of them could be "buy and hold" in a rising market with unexpected corrections.
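As a hedged illustration of such an ensemble of simple, non-ML "weak" systems (synthetic prices and made-up rules): buy-and-hold plus two hypothetical moving-average rules, combined by averaging their positions.

```python
# A sketch on synthetic data: combining simple rule-based "weak" systems.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Synthetic rising market with corrections (placeholder for real quotes).
returns = pd.Series(rng.normal(0.0004, 0.01, 2500))
price = (1 + returns).cumprod()

def ma_rule(price, fast, slow):
    """Position 1 when the fast moving average is above the slow one, else flat."""
    return (price.rolling(fast).mean() > price.rolling(slow).mean()).astype(float)

systems = {
    "buy_and_hold": pd.Series(1.0, index=price.index),
    "ma_20_100":    ma_rule(price, 20, 100),
    "ma_50_200":    ma_rule(price, 50, 200),
}

# Shift positions by one bar so a signal computed on today's close trades tomorrow.
positions = pd.DataFrame(systems).shift(1).fillna(0.0)
ensemble_pos = positions.mean(axis=1)   # the ensemble: average of the positions

final_equity = {name: float((1 + pos * returns).cumprod().iloc[-1])
                for name, pos in positions.items()}
final_equity["ensemble"] = float((1 + ensemble_pos * returns).cumprod().iloc[-1])
print(final_equity)
```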
 
Maxim Dmitrievsky #:

If I am not mistaken, you have written a lot about bootstrap for multiplying data and, consequently, models. I would like to know your opinion on how meaningfully it can be used to build an ensemble of trading systems (TSs).
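One possible reading of that idea, sketched with hypothetical parameters (CatBoost is used here only because it comes up later in the thread): resample the training set with replacement, fit one model per resample, and let the resulting models vote.

```python
# A hedged sketch of a bootstrap (bagging-style) ensemble; data and settings
# are placeholders, not a recommendation.
import numpy as np
from catboost import CatBoostClassifier
from sklearn.utils import resample

def bootstrap_ensemble(X, y, n_models=10, seed=0):
    """Fit one model per bootstrap resample of (X, y)."""
    models = []
    for i in range(n_models):
        Xb, yb = resample(X, y, replace=True, random_state=seed + i)
        m = CatBoostClassifier(iterations=200, depth=4, verbose=False,
                               random_seed=seed + i)
        m.fit(Xb, yb)
        models.append(m)
    return models

def vote(models, X):
    """Majority vote over the individual models' class predictions."""
    preds = np.stack([m.predict(X).astype(int).ravel() for m in models])
    return (preds.mean(axis=0) > 0.5).astype(int)

# Hypothetical usage:
# models = bootstrap_ensemble(X_train, y_train, n_models=20)
# signal = vote(models, X_new)
```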

 
Ivan Butko #:

This makes sense, but given that we have no fundamental input data for the machine (or it is scarce and insufficient), we have to work with what we have.

And here's a thought: indirect signs. By analogy with nature: moss grows on the north side, and now we know our compass bearing. And so on.

It turns out that we are looking for indirect signs in the chaos of the chart: indirect traces of the fundamentals, of some action by a big player, and so on. That is the point of the machine: to find them.

Roughly speaking, there is still a point in digging through the chart.

We just make the assumption that events on the chart are influenced by some laws (it doesn't matter what kind - fundamentals, the actions of big players or Musk's fart). If there are no laws behind the chart (hidden or explicit), there is no point in trying to trade.

It is comparable to observing what happens in nature: how leaves fall, how water flows, on which side of rocks and trees moss grows, and so on. That won't directly predict the location of trees deeper in the forest, but it lets us make assumptions - at least about where trees cannot grow, where there cannot be water, or, conversely, where there can be a lot of water.

So yes, roughly speaking, there is a point in digging through the chart after all.

"Insanity: doing the same thing over and over again and expecting different results."

 
Random text generators have naturally migrated over to another topic :)
 
Aleksey Nikolayev #:

If I am not mistaken, you have written a lot about bootstrap for multiplying data and, consequently, models. I would like to know your opinion on how meaningfully it can be used to build an ensemble of trading systems.

Generative models manage it somehow. In the comments to this or some other article I did the ensembling - I don't remember which :).

 
Maxim Dmitrievsky #:

Generative models manage it somehow. In the comments to this or some other article I did the ensembling - I don't remember which :).

Yes, starting from here and over a couple of pages of examples. It seems that models built on synthetic data (at the end of the attempts, on the second page) ensemble quite well.

Discussion of the article "Advanced resampling and selection of CatBoost models by brute force method" - The best clustering into good/bad sets is obtained from a random subset of features, not the full set.
  • 2020.11.24
  • Forester
  • www.mql5.com
Averaging with a bad classifier worsens the overall result. I hope you will find time to compare the averaged and the best results on the exam sample. As practice has shown, it does not worsen the result much.
 

I tried different generation methods; of them, only GMM pleased me, so the article features only GMM.

Rating of generators (from memory):

  • GMM
  • copulas
  • autoencoders, including conditional ones (already bad) - or maybe I lacked the understanding of how to use them better
  • GAN (also bad, and slow)
  • GANs on transformers and other complex models (tGAN, tsGAN) - very slow and very bad
  • other, very bad, approaches...
There are already various packages for generating synthetic tabular data and time series; I don't follow their development.

You could probably ask some bullshit generator to generate :)
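For reference, a minimal sketch of the GMM approach at the top of the ranking (placeholder data and parameters, not the article's actual code): fit a Gaussian mixture on the feature-plus-label table, sample synthetic rows from it, and train a classifier on them.

```python
# A hedged sketch of generating synthetic tabular data with a GMM.
import numpy as np
from sklearn.mixture import GaussianMixture
from catboost import CatBoostClassifier  # any classifier would do here

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                          # stand-in for real features
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # stand-in for real labels

# Fit the mixture on features and label together, so that sampled rows
# preserve the feature-label relationship.
table = np.column_stack([X, y.astype(float)])
gmm = GaussianMixture(n_components=10, covariance_type="full", random_state=0)
gmm.fit(table)

# Sample a synthetic table and split it back into features and labels.
synthetic, _ = gmm.sample(5000)
X_syn = synthetic[:, :-1]
y_syn = (synthetic[:, -1] > 0.5).astype(int)

# Train on the synthetic rows, then check against the original table.
model = CatBoostClassifier(iterations=300, verbose=False, random_seed=0)
model.fit(X_syn, y_syn)
print("accuracy on the original table:", model.score(X, y))
```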