Machine learning in trading: theory, models, practice and algo-trading - page 3183

 
fxsaber #:

One.

I don't understand this statement. What is meant by the following two options?

  1. It takes many iterations of randomisation to get a workable one.
  2. If you create a lot of randomised symbols, the probability increases that a workable one will be among them.

The randomisation algorithm is as follows:

  1. A real tick history is taken.
  2. A sequence of increments of the mid ((bid+ask)/2) price is built from it.
  3. In this sequence each term is randomly multiplied either by +1 or -1.
  4. A new tick history is assembled from the resulting sequence of increments; the time and spread coincide with point 1.
  5. The new tick history is written in a custom symbol.
I.e. some real symbol is randomised. Item 3 can be applied any number of times: repeating all five steps after item 5 is the same as applying item 3 twice.
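
As an illustration, here is a minimal R sketch of steps 2-4 (my sketch, not fxsaber's code); `mid` is a made-up stand-in for a real mid-price series, and steps 1 and 5 (reading the real ticks and writing the custom symbol) are left out:

# illustration of steps 2-4; `mid` is a stand-in for a real (bid+ask)/2 price series
set.seed(1)
mid <- 1.10 + cumsum(rnorm(1000, sd = 0.0001))

d        <- diff(mid)                                  # step 2: increments of the mid price
signs    <- sample(c(-1, 1), length(d), replace = TRUE)
d_rand   <- d * signs                                  # step 3: each increment times a random +1/-1
mid_rand <- mid[1] + cumsum(c(0, d_rand))              # step 4: rebuild a new price path

plot(mid, type = "l"); lines(mid_rand, col = "red")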

Yes, the highlighted part.

You need to run it many times, on many symbols. I've shown an example of my oversampler above. It just randomly pulls samples for training from the same series, and the results are always different on OOS.

Exactly the same sharp dips on OOS.
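
A toy sketch of that oversampler idea (my illustration, not the actual code): training rows are drawn at random from the same series, the model is refitted, and the OOS result changes from draw to draw. The data frame `df` and the glm model are made up for the example:

set.seed(2)
df <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))
df$y <- as.integer(df$x1 + rnorm(1000) > 0)                  # made-up labels

n_train <- 600
idx <- sample(seq_len(n_train), n_train, replace = TRUE)     # random draw of training rows from the same series
fit <- glm(y ~ x1 + x2, data = df[idx, ], family = binomial)
oos <- df[(n_train + 1):nrow(df), ]
mean((predict(fit, oos, type = "response") > 0.5) == oos$y)  # OOS accuracy for this draw
# a new `idx` gives a different fit and, as a rule, a different OOS result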
 
I just don't understand the point of this SB stuff; what does it prove? ) You can get whatever curve you want on the OOS if you shuffle it enough times.
 
mytarmailS #:
Shit, I don't know how to put it in simple terms.

You choose the better variants "by hand" on the OOS/test after optimisation, and that is NOT fitting....

But if an algorithm chooses the best variants on the OOS after optimisation, that is fitting. Why?

Choosing the best variant from the full set of variants is optimisation... It doesn't matter whether you do it by hand or by algorithm.

Perhaps you have only worked with the tester in MT and think somewhat formulaically about optimisation itself and the ways it can be applied; that's why we have some misunderstanding.

Your statement.

Forum on trading, automated trading systems and testing trading strategies

Machine learning in trading: theory, models, practice and algo-trading

mytarmailS, 2023.08.16 13:23

Imagine that you have only 1000 variants of the TS in total.


your steps 1 and 2

1) You start to optimise/search for a good TS; this is the train data (fitting/searching/optimisation).

Let's say you have found 300 variants where the TS makes money...

2) Now you look among these 300 variants for a TS that will also pass OOS, i.e. the test data. You have found, say, 10 TSs that earn both on the train and on the test (OOS).


So what is point 2?

It is just a continuation of the fitting, only your search (fitting/searching/optimisation) has become a bit deeper or more complex, because now you have not one optimisation condition (to pass the train) but two (to pass the test + to pass the train).

Let's imagine that there are a million times more variants: 1 billion TSs, of which 300 million variants are found that make money on the train sample. This is point 1.

In point 1 the optimisation is done on some fitness function: the higher its value, the higher the fitness is assumed to be. So the optimisation is concerned with finding the global maximum. All of this is point 1.


  • When the optimisation is over, you can search among the 300 million positive results for the five that pass OOS. I don't do that.
  • Or you can take the five results closest to the global maximum and only then look at their OOS.
So if you do the first one, it is an optimisation of the form

Forum on trading, automated trading systems and testing trading strategies

Machine learning in trading: theory, models, practice and algo-trading

fxsaber, 2023.08.19 13:32

Do you think you should trust the train_optim + test_forward model more than (train+test)_optim?

I.e. it's fitting in its purest form.


If you do the second, however, it is not fitting.
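
To make the difference concrete, a toy simulation (mine, purely illustrative): every "TS variant" is just random returns with zero edge, so any attractive OOS figure in the first procedure comes only from the fact that OOS was used for the selection:

set.seed(3)
n_variants <- 10000; n_train <- 250; n_oos <- 250
train <- matrix(rnorm(n_variants * n_train, sd = 0.01), nrow = n_variants)
oos   <- matrix(rnorm(n_variants * n_oos,  sd = 0.01), nrow = n_variants)
train_pnl <- rowSums(train); oos_pnl <- rowSums(oos)

positive <- which(train_pnl > 0)                            # the "train-positive" variants

# procedure 1: among the train-positives, pick the 5 best on OOS  ->  selection uses OOS
sel1 <- positive[order(oos_pnl[positive], decreasing = TRUE)][1:5]
# procedure 2: take the 5 closest to the global maximum of the train fitness, then only look at their OOS
sel2 <- order(train_pnl, decreasing = TRUE)[1:5]

mean(oos_pnl[sel1])   # looks profitable, but only because OOS was part of the choice
mean(oos_pnl[sel2])   # about zero: OOS remained an honest out-of-sample check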

 
Maxim Dmitrievsky #:

In exactly the same way I get both models that work on OOS and models that don't, through the same algorithm. The symbol is the same; no new randomisation has been added.

My training was not on just one symbol. Obviously, there are series with any characteristic in the randomisation cloud.

 

The forward is worse and the back is better. And the reverse situations occur just the same. I just haven't done many rebuilds at the moment.

 
fxsaber #:

Your statement.

Let's imagine that there are a million times more variants in total: 1 billion TSs, of which 300 million variants are found that make money on the train sample. This is point 1.

In point 1 the optimisation is done on some fitness function: the higher its value, the higher the fitness is assumed to be. So the optimisation is concerned with finding the global maximum. All of this is point 1.


  • When the optimisation is over, you can search among the 300 million positive results for the five that pass OOS. I don't do that.
  • Or you can take the five results closest to the global maximum and only then look at their OOS.
So if you do the first one, it's an optimisation of the form

So it's pure fitting.


If you do the second, it's not fitting.

Got it. I apologise.

 
fxsaber #:

I have had training on more than one characteristic. Obviously, in the randomisation cloud there are series with any characteristic.

Well, I don't see a problem. All these TSs are random because they trade in a non-stationary market. But some variants can bring profit over some horizon.

 
Maxim Dmitrievsky #:

Yes, the highlighted part.

You have to run it many times, on many symbols.


I've shown an example of my oversampler above. It just randomly pulls samples for training from the same series, and the results are always different on OOS.

On the real symbol I do not observe such an effect. I choose any 40% of the optimisation interval and after that the results are very similar on OOS.

This is the symbol I chose for randomisation, and I gave its training graphs.

Exactly the same sharp dips on the OOS.

I don't see them always.

 
fxsaber #:

On the real symbol I do not observe such an effect. I choose any 40% of the optimisation interval and after that the results are very similar on OOS.

This is the symbol I chose for randomisation, and I gave its training graphs.

I don't see them always.

That still means there is more alpha in the ticks. I've found a way to search through them quickly (via ML it would have taken very long). I'll roll out the results later when I'm done.

 

I looked at several types of time series simulation and their characteristics, and created a synthetic series from sinusoids and noise (I took sinusoids for better clarity).

The conclusion is... this simulation still needs to be properly understood...


The first series is the original one (top left); all the other series are simulations built on the characteristics of the first.

another run


library(forecast)   # ets(), nnetar(), Arima() and the simulate() methods for fitted models

par(mar = c(2, 2, 2, 2), mfrow = c(3, 2))

n <- 1:1000
# original series: two sinusoids plus gaussian noise
s <- sin(n/50 + 1) + sin(n/20 + 15)/2 + rnorm(n, sd = 0.1)
s |> plot(type = "l", main = "original series (2 sin + noise)")

# length(s) i.i.d. normal draws with the mean/sd of s, accumulated into a random walk
s |> rnorm(mean = mean(s), sd = sd(s)) |> cumsum() |> plot(type = "l", main = "random generation")

# simulate new series from models fitted to s
s |> ets()    |> simulate() |> plot(type = "l", main = "Exponential smoothing state space model")
s |> ar()     |> simulate() |> plot(type = "l", main = "Fit Autoregressive Models to Time Series")
s |> nnetar() |> simulate() |> plot(type = "l", main = "Neural Network Time Series Forecasts")
s |> Arima()  |> simulate() |> cumsum() |> plot(type = "l", main = "ARIMA model to univariate time series")