Machine learning in trading: theory, models, practice and algo-trading - page 3005

 
mytarmailS #:
I've been saying for a long time that the task is not set correctly, the last time 10-20 pages ago, so there is nothing to teach me here, I understand it all perfectly well...

And to transform the task, you have to transform the data...

So while you're telling me to think, I've already thought about it and I'm going through the options.
And again you'll get memorisation instead of generalisation.
 
Maxim Dmitrievsky #:
The presence of a constant signal in the data almost immediately means generalisation, and the noise goes into the errors. If there is no signal, we get memorisation, where the classification error is simply overlapping samples with the same feature values but different labels. A model of the second kind has no predictive value. This is the answer to my puzzle, and it is confirmed by tests on synthetic and real data.

When patterns are present (wavelets, shapelets and so on) but there is no signal, only a split into learnable and unlearnable examples will work: the learnable part is classified, the unlearnable part is filtered out. Here good memorisation works as a filter, and generalisation as generalisation of patterns.

When there are no pronounced persistent patterns (as in the market), but there are inefficiencies, the approach has to be even more refined, because inefficiencies cannot always be described by patterns. Here you have to tease apart one from the other: what needs to be remembered and what needs to be generalised. Algorithmically.
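A minimal sketch of the kind of synthetic test this claim refers to (my illustration, not the author's code; assumes scikit-learn, and the data generator and random forest are arbitrary choices):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Case 1: a constant signal is present - one feature really separates the classes.
X_signal = rng.normal(size=(n, 5))
y_signal = (X_signal[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int)

# Case 2: pure noise - labels are independent of the features.
X_noise = rng.normal(size=(n, 5))
y_noise = rng.integers(0, 2, size=n)

for name, X, y in [("signal", X_signal, y_signal), ("noise", X_noise, y_noise)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    # With a signal, train and test accuracy stay close (generalisation);
    # with noise, train accuracy is high but test accuracy is ~0.5 (memorisation).
    print(name, model.score(X_tr, y_tr), model.score(X_te, y_te))
```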

Transforming the data will change its representation, but will not solve the problem in itself, since it is the original series that is ultimately traded.

Regularities have a floating probability, which may be cyclical, or may disappear altogether. I work with large time ranges and have observed these phenomena.

I will say that it is possible to train a model on 2008-2017 (10 years) that still works today. Such a model will have few signals - Recall of 30% at most - but this also says that over ten years there are few patterns that keep working for the next couple of years (the test sample).

But what kind of patterns are these - cyclical or one-time (maybe with a cycle of tens of years) - cannot be established yet, and therefore it is impossible to select a model that will continue to work.

Ideally, it is necessary to find cyclic patterns with a frequency of at least once every 3 months, and then we can expect that the model will be able to get out of the drawdown.

A package of such models trained on different cyclic patterns will reduce drawdown.

Therefore, what matters is the initial signal + target + predictors that have a cyclic positive regularity with respect to the target.


Your approach - shaking and sifting - is like the work of a prospector: of course you can find a lot of interesting, as-yet-unlearnt things, but how can you be sure they will be stable?

I like to shake the data up myself, and I have ideas in this direction, but right now I want to dig into increasing the probability that the model stays stable.

 
Aleksey Vyazmikin #:

Your approach - shaking and sifting - is like the work of a prospector: of course you can find a lot of interesting, as-yet-unlearnt things, but how can you be sure they will be stable?

I like to shake the data up myself, and I have ideas in this direction, but right now I want to dig into increasing the probability that the model stays stable.

I wrote down the basic understanding without which it is not worth doing anything at all. At least confirm each of the points (if you didn't get it the first time 🙂 )

To verify stability: tests, tests and more tests, nothing else. ML is best suited for that.

Because theoretical whiners talk a lot about DSP and other nonsense, apparently without even understanding the basics of ML. This entertainment is not for the mentally and morally weak; it is easier to whine :).
 
Maxim Dmitrievsky #:
I wrote down the basic understanding without which it is not worth doing anything at all. At least confirm each of the points (if you didn't get it the first time 🙂 )

To verify stability: tests, tests and more tests, nothing else. ML is best suited for that.

I'm not disputing your descriptions and the approach in general - I myself described it in essence long ago, albeit with a different implementation.

My artificial experiments have shown that it is often enough to pull out the rows matching 10 simple regularities (in my case, the quantum segment that the predictor value fell into) for the probability over the whole sample to shift by 15-20% towards class one, while reducing the sample by 35%. This cannot be achieved by machine learning methods alone. In essence, it is a washing-out of useless/contradictory data. But we want to do it closer to the current date, i.e. not from statistical knowledge of the history, but by some method of selecting precisely the false/contradictory patterns.
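A rough sketch of this kind of row selection (my illustration, assuming pandas; reading a "quantum segment" as a quantile bin of a predictor is my assumption, and the column names and bin counts are made up):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# hypothetical sample: one predictor and a binary target
df = pd.DataFrame({
    "predictor": rng.normal(size=10_000),
    "target": rng.integers(0, 2, size=10_000),
})

base_rate = df["target"].mean()

# split the predictor range into quantile bins (my reading of "quantum segments")
df["bin"] = pd.qcut(df["predictor"], q=10, labels=False)

# keep only the rows falling into the bins where the class-1 rate is highest
stats = df.groupby("bin")["target"].mean().sort_values(ascending=False)
selected_bins = stats.head(3).index          # e.g. the 3 "best" segments
subset = df[df["bin"].isin(selected_bins)]

print("base rate:", base_rate)
print("rate in selected segments:", subset["target"].mean())
print("share of sample kept:", len(subset) / len(df))
```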

If we set the task so that the model should work on any instrument, then testing is possible, but there are even fewer such stable patterns. And simply racing through history or, even worse, taking synthetics - I don't think that is effective. At most, it is possible to make synthetics from daily increments by shuffling the days, but this is not available to me yet. Have you tried it?
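One straightforward way to build such synthetics from daily increments (a sketch under my own assumptions about the data layout: a minute-level close series with a DatetimeIndex):

```python
import numpy as np
import pandas as pd

def shuffle_days(close: pd.Series, seed: int = 0) -> pd.Series:
    """Build a synthetic series by shuffling whole days of increments.

    Increments are computed inside each calendar day, the order of the days is
    permuted, and the increments are re-accumulated from the original starting
    price. Within-day structure is preserved, the order of days is not.
    """
    rng = np.random.default_rng(seed)
    increments = close.diff().dropna()
    days = [g.to_numpy() for _, g in increments.groupby(increments.index.date)]
    rng.shuffle(days)
    synthetic = close.iloc[0] + np.concatenate(days).cumsum()
    return pd.Series(synthetic)

# usage with a made-up random-walk close series
idx = pd.date_range("2024-01-01", periods=5 * 24 * 60, freq="min")
close = pd.Series(100 + np.random.default_rng(0).normal(size=len(idx)).cumsum(), index=idx)
print(shuffle_days(close).head())
```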

 
mytarmailS #:
I'm completely out of the loop here, but I'm wondering whether a price can be represented as a graph, and whether that has any advantage over the usual two-dimensional representation.
Who knows about this?

You could take Mandelbrot's representation of price as a forest of ternary trees, where each move is split into three (two moves in the same direction and a correction between them).

The advantage is access to collecting any statistics about the fractal structure of price. The disadvantages are the complexity of algorithms and the difficulty of avoiding looking ahead.

 
Stanislav Korotky #:

And how does this relate to the matrices and examples of "parallel" platforms I gave above?

For example, I take matrices from the link to keras, call for them:

and get zeros.

The control example does not add up.

Categorical cross-entropy is used in classification models with more than two classes, and it is applied after softmax. Softmax converts a set of values into a set of probabilities that sums to 1.

Try a control example like this:

pred: 0.1, 0.1, 0.2, 0.5, 0.1

true: 0, 0, 1, 0, 0
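For reference, the value for such a control example can be checked by hand; a minimal numpy sketch of the standard formula -sum(true * log(pred)), applied after softmax:

```python
import numpy as np

pred = np.array([0.1, 0.1, 0.2, 0.5, 0.1])  # softmax output, sums to 1
true = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # one-hot label for class index 2

# categorical cross-entropy: only the predicted probability of the true class matters
loss = -np.sum(true * np.log(pred))
print(loss)  # -ln(0.2) ≈ 1.609
```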

 
Aleksey Vyazmikin #:

At most, it is possible to make synthetics from daily increments by shuffling the days, but this is not available to me yet. Have you tried it?

I don't understand what such assumptions are based on, and this is again working with features, not improving the way of learning. So it is secondary.

It's like school. You can teach from one textbook and you can teach from another. The information will not change in essence, just like with transformations. But it will fly into one head and out of the other :)

It is difficult to work with synthetics when the final goal is not clear. It is better to use them to test certain properties of sequences: mix them into different parts and see how the results change - a kind of "what if?". Also to even out biases in the data.
It is generally useful for improving generalisation by reducing memorisation of the original data, but on its own it is not enough to create a ready-made trading system (TS). A similar result can be obtained on the original data by training an ensemble of different models, including simple ones right down to linear models.
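A minimal sketch of such a mixed ensemble (my illustration, assuming scikit-learn; the particular estimators and the synthetic data are arbitrary):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=2000) > 0).astype(int)

# ensemble mixing a linear model with simple and boosted trees
ensemble = VotingClassifier(
    estimators=[
        ("linear", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("boost", GradientBoostingClassifier()),
    ],
    voting="soft",  # average the predicted probabilities
)
ensemble.fit(X[:1500], y[:1500])
print("holdout accuracy:", ensemble.score(X[1500:], y[1500:]))
```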

Well, you roughly know the answer already: you won't get something outstanding out of it, but you can improve things.

 
Aleksey Nikolayev #:

You could take Mandelbrot's representation of price as a forest of ternary trees, where each move is split into three (two moves in the same direction and a correction between them).

The advantage is access to collecting any statistics about the fractal structure of price. The disadvantages are the complexity of algorithms and the difficulty of avoiding looking ahead.

Can it be represented as associative rules?
 
mytarmailS #:
Can it be represented as associative rules?

I haven't thought about it, but I think it is unlikely, because the order of movements is important in prices.

Just in case, here is a picture to illustrate Mandelbrot's idea. Each price movement, where possible, is split into three movements (by selecting the maximum correction within it) and then becomes a node of the tree. If there is no correction inside the movement (or it is smaller than a given value), it becomes a leaf of the tree.
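A rough sketch of this decomposition as I read it (my illustration: upward moves only, the correction is kept as a leaf rather than decomposed further, and the threshold and data structure are arbitrary):

```python
import numpy as np

def max_correction(prices):
    """Largest internal pullback of an upward move: (size, peak index, trough index)."""
    best, best_i, best_j, peak = 0.0, 0, 0, 0
    for k in range(1, len(prices)):
        if prices[k] > prices[peak]:
            peak = k
        elif prices[peak] - prices[k] > best:
            best, best_i, best_j = prices[peak] - prices[k], peak, k
    return best, best_i, best_j

def build_move(prices, min_corr):
    """Recursively split an upward move into (leg up, correction, leg up);
    a move with no correction larger than min_corr becomes a leaf."""
    node = {"from": prices[0], "to": prices[-1], "children": []}
    corr, i, j = max_correction(prices)
    if corr < min_corr:
        return node                                            # leaf
    node["children"] = [
        build_move(prices[: i + 1], min_corr),                 # first leg up
        {"from": prices[i], "to": prices[j], "children": []},  # the correction (left as a leaf here)
        build_move(prices[j:], min_corr),                      # second leg up
    ]
    return node

# usage on a made-up drifting random walk
prices = 100 + np.random.default_rng(0).normal(loc=0.05, scale=1.0, size=500).cumsum()
tree = build_move(prices, min_corr=2.0)
print(len(tree["children"]))  # 0 if the whole move is a leaf, otherwise 3
```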


 
Aleksey Nikolayev #:

because the order of movements is important in prices.

Is that a statement?