Machine learning in trading: theory, models, practice and algo-trading - page 3468

 
mytarmailS #:

You're overcomplicating it a bit.

You run many TS settings (many datasets) through the model, and the model returns the probability of the better TS settings, right?

So on each new bar you create a huge number of datasets.


I think it's better to take a regression algorithm with many outputs, where the outputs are ready-made parameters for the TS.


Your scheme is as follows: many datasets + model with one output

Or better: one dataset + a model with many outputs.


example

https://stackoverflow.com/questions/57704609/multi-target-regression-using-scikit-learn
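The "one dataset + model with many outputs" idea from the link can be sketched like this; the data and feature count are made up for illustration, and `RandomForestRegressor` is just one of several sklearn estimators that handle multiple outputs natively:

```python
# Sketch of multi-target regression: one row of market features in,
# several TS parameters out. All data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))   # market features per bar
Y = rng.normal(size=(500, 2))   # two TS parameters as targets

# RandomForestRegressor supports a 2-D y out of the box
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, Y)

params = model.predict(X[:1])   # predicted parameter pair for the latest bar
print(params.shape)             # (1, 2)
```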


It's a bit redundant, yes. Each new bar generates a number of new rows equal to the number of parameter combinations tested for the strategy. The factors are market features + TS parameters; the output is a binary label: whether the bar is profitable or unprofitable.

The regression model that outputs parameters for the strategy must somehow be trained to produce parameter values that yield a positive profit. That is, the profit has to be used as a predictor. So we train a model whose X contains the market features plus either the absolute profit from trading within a minute bar or just its sign, and whose y is the strategy parameters. Then, when trading, we feed it the features and the desired profit (or profit sign) as input, so that it regresses the parameters for that combination of features and profit back to us?

 
Arty G #:


I ran across this article earlier. I'll read it again, thanks!


The goal is not so much to find market regimes in which a strategy works well as to define, for each regime, a set of strategy parameters that will work well.

What I do is run a hundred backtests on the same time interval with different settings. Then I put it all into one dataframe and train the model. For the model, a combination of trading-strategy settings is a feature just like the other market features, such as volatility or, say, the imbalance in the trade flow. This way the model learns which strategy parameters can be profitable under which market conditions. That is, X for the model looks like this: [volatility_short, volatility_long, rsi, trade_imbalamce, strategy_param1, strategy_param2]. And y is the sign of what was earned per minute. Since orders are sent at up to 10 per second, enough trading happens within a minute to see how the parameters perform. In the dataframe into which we dump all of this, the market features are duplicated as many times as there are unique combinations of strategy parameters. In other words, the dataframe contains not just the trading results for the whole backtest, but minute-by-minute data:

X:[volatility_short, volatility_long, rsi, trade_imbalamce, strategy_param1, strategy_param2], y:[step_profit].


So we have the model. Now, running a backtester or trading live, we can query the model every minute, passing it the current feature set and the available combinations of strategy parameters, until we find a combination for which the model sees a high probability of profitable trading.
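That per-minute scan could look something like this; the data is synthetic and `RandomForestClassifier` stands in for whatever model is actually used:

```python
# Minimal sketch of the per-minute parameter scan: score every parameter
# combination with the trained classifier and pick the most promising one.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 6))              # 4 market features + 2 TS params
y = (X[:, 4] * X[:, 0] > 0).astype(int)    # toy label: profit sign
clf = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)

market_now = rng.normal(size=4)            # current minute's market features
param_grid = [(p1, p2) for p1 in (-1.0, 1.0) for p2 in (0.5, 1.5)]

# one candidate row per parameter combination, market features repeated
candidates = np.array([[*market_now, p1, p2] for p1, p2 in param_grid])
proba = clf.predict_proba(candidates)[:, 1]   # P(profitable) for each combo
best = param_grid[int(np.argmax(proba))]
print(best)
```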

That is, the hypothesis is that for different market regimes there are different parameter sets at which the trading system is profitable, and also that we can teach an ML model to find a suitable parameter set for each regime.

Well, it turns out that first you need to define the regimes, and then do for each of them the same thing you do now, excluding the data from the other regimes :)

 
Arty G #:


It's a bit redundant, yes. Each new bar generates a number of new rows equal to the number of parameter combinations tested for the strategy. The factors are market features + TS parameters; the output is a binary label: whether the bar is profitable or unprofitable.

The regression model that outputs parameters for the strategy must somehow be trained to produce parameter values that yield a positive profit. That is, the profit has to be used as a predictor. So we train a model whose X contains the market features plus either the absolute profit from trading within a minute bar or just its sign, and whose y is the strategy parameters. Then, when trading, we feed it the features and the desired profit (or profit sign) as input, so that it regresses the parameters for that combination of features and profit back to us?

I don't know exactly what you're doing and how, so I can't give specific advice, but in general it should look like this.

You feed the price series and its characteristics into the model, and the model answers:

"For the most profitable trading you need to set the following periods:

stochastic n = 12, rsi n = 33, MA n = 55, ..."


----------------------------------------------------------------------

But overall I recommend you keep doing what you're doing rather than spend time learning a new, but not necessarily better, approach.
Your approach shouldn't be any worse in accuracy; it's just more cumbersome.

----------------------------------------------------------------------

Essentially this is called adaptive filtering, a hundred-year-old idea from DSP.

 
Maxim Dmitrievsky #:
Well, it turns out that first you need to define the regimes, and then do for each of them the same thing you do now, excluding the data from the other regimes :)

In the current version, I feed into the model the features that seem important to me for a TS of this type, plus a spread of TS parameters picked more or less arbitrarily, and train the model to understand at which combinations of features + TS parameters there will be a profit and at which a loss.

Do you advise defining the regimes more explicitly (for example, with the clustering methods from your article) and then teaching the model to choose optimal parameters for each regime? Do I understand correctly that clustering the regimes would let us determine a profitable parameter set with higher probability than my current setup?

 
Arty G #:

In the current version, I feed into the model the features that seem important to me for a TS of this type, plus a spread of TS parameters picked more or less arbitrarily, and train the model to understand at which combinations of features + TS parameters there will be a profit and at which a loss.

Do you advise defining the regimes more explicitly (for example, with the clustering methods from your article) and then teaching the model to choose optimal parameters for each regime? Do I understand correctly that clustering the regimes would let us determine a profitable parameter set with higher probability than my current setup?

For a particular regime, yes, if the TS is sensitive to volatility, for example. That is, a second model determines which regime we are in now, and if it's the right one, trading is allowed. And the first model is trained on your parameters for that regime.

But you can also select parameters on the whole dataset, without splitting it into regimes, and then see how the TS performs in specific regimes. Whichever it does best in, trade only in that one. That's essentially an ordinary filter. This is how it's done in the article.

 
mytarmailS #:

I don't know exactly what you're doing and how, so I can't give specific advice, but in general it should look like this.

You feed the price series and its characteristics into the model, and the model answers:

"For the most profitable trading you need to set the following periods:

stochastic n = 12, rsi n = 33, MA n = 55, ..."


----------------------------------------------------------------------

But overall I recommend you keep doing what you're doing rather than spend time learning a new, but not necessarily better, approach.
Your approach shouldn't be any worse in accuracy; it's just more cumbersome.

----------------------------------------------------------------------

Essentially this is called adaptive filtering, a hundred-year-old idea from DSP.


Got it. I'm already trying it. To regress suitable parameters, the model has to be given the earnings per minute bar. Then we feed it the current features and the "desired" earnings, and at the output we get a set of parameters for the strategy. To begin with I'll try supplying just the sign of the earnings. Then we can try creating categories - say, small profit, big profit. I wonder how the trained model will behave if, in the current state of the market, the "big profit" category is impossible in principle - the market is dead, for example, and there are few trades.


Maxim Dmitrievsky #:

For a particular regime, yes, if the TS is sensitive to volatility, for example. That is, a second model determines which regime we are in now, and if it's the right one, trading is allowed. And the first model is trained on your parameters for that regime.

But you can also select parameters on the whole dataset, without splitting it into regimes, and then see how the TS performs in specific regimes. Whichever it does best in, trade only in that one. That's essentially an ordinary filter. This is how it's done in the article.

Got it, thanks. Basically, that's labelling by regimes, where we either trade or don't trade at all. I want to go a bit further and somehow select more optimal (from the ML model's point of view) parameters for the next minute of trading.

 
Arty G #:
That is, X for the model looks something like this: [volatility_short, volatility_long, rsi, trade_imbalamce, strategy_param1, strategy_param2]. And y is the sign of what was earned per minute.

As I understand it, there are predictor values that change once a minute, and there are strategy settings values that you cycle through and see the result.

I think it is still appropriate to use clustering here, but only on those predictors that change once a minute - they determine the market state.

Then, for each cluster, you can look at the probability distribution of the specific strategy settings. You can look at the settings individually, or treat them as a numbered set. As a result, you get a kind of lookup base with probability indicators and an estimate of the financial result for each setting. Then you simply determine the current cluster and choose the optimal settings.
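A minimal sketch of this cluster-then-rank idea on synthetic data; feature names, the number of clusters, and the settings-set encoding are all illustrative:

```python
# Cluster only the per-minute market features, then within each cluster
# rank the numbered settings sets by mean profit.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({
    "volatility": rng.normal(size=n),
    "rsi": rng.normal(size=n),
    "param_set": rng.integers(0, 4, size=n),  # id of a numbered settings set
    "profit": rng.normal(size=n),
})

km = KMeans(n_clusters=3, n_init=10, random_state=2)
df["cluster"] = km.fit_predict(df[["volatility", "rsi"]])

# mean profit per (cluster, settings) pair; the argmax per cluster is the pick
table = df.groupby(["cluster", "param_set"])["profit"].mean()
best_per_cluster = table.groupby(level=0).idxmax()
print(best_per_cluster)
```

In live use you would assign the current minute to a cluster with `km.predict` and look up its best settings set.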

Another option is a multiclass model for each setting. If the settings affect the TS through a smooth function, you can use regression. Then take the responses from each model and use them as the settings.

Another option is to look at ranking and recommendation systems - in theory you can get sets of optimal settings for the current situation. However, I haven't gotten my hands on that yet, so here it's theory only.

Arty G #:
I'll read your article about quantisation. I used deciles, but when I switched to real values the prediction accuracy increased dramatically. My understanding was that for trees data normalisation is not so important, but with deciles a lot of the data's granularity is lost.

I've read it. Head-on quantisation is not always effective; the article proposes a tool to assess the quality of the quantisation, i.e. to estimate the loss of accuracy.

Arty G #:
I'll give CatBoost a try, thanks. It's just that my backtester and live-trading code is written in Python with Numba, and trained sklearn models don't work there. For that I found a way to export a random forest model into a set of lists and dicts, which let me use the trained model under Numba. I'll investigate whether this or something similar is possible with CatBoost.

I don't really see what difficulties could arise in applying the model - look at their site, there are lots of examples with detailed Python code; I'm not a Python expert myself. Again, you could use ONNX - in theory.
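The random-forest export mentioned in the quote can be sketched as flattening a tree into plain arrays that a Numba-jitted loop could traverse; this is an illustration on a toy tree, not the author's actual code:

```python
# Flatten an sklearn decision tree into plain numpy arrays, so a jitted
# function can traverse it without touching sklearn objects.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
tree = DecisionTreeClassifier(random_state=0).fit(X, y).tree_

# plain arrays: feature index, threshold, child pointers, leaf class counts
feature, threshold = tree.feature, tree.threshold
left, right, value = tree.children_left, tree.children_right, tree.value

def predict_one(x):
    """Traverse the exported arrays; a child index of -1 marks a leaf."""
    node = 0
    while left[node] != -1:
        node = left[node] if x[feature[node]] <= threshold[node] else right[node]
    return int(np.argmax(value[node]))

print(predict_one([2.5]))  # class 1
```

For a whole forest you would export these arrays per tree and average the leaf votes; the same flattening idea should apply to any tree ensemble whose structure you can read out.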

Arty G #:
About data size - I meant that the data for the backtest weighs too much, and it takes a lot of space and time to run a backtest with a hundred different parameter sets over a year of HFT data (including all order-book updates and trades).

Ah, you mean the data on which you trade and from which you form the strategy settings, right?

What do you store the data in?

 
Ivan Butko #:

It seems to me that predicting the future is a half-measure.

You have to learn to exit a position in time.

And the price of a mistake is a small loss.

That is, we teach it to enter and we teach it to exit.

Simply forecasting a trend means sitting to one side and waiting to see whether it plays out.


Ideally, yes, but in practice I'm better at identifying favourable conditions for a trend starting out of a flat than a specific reversal point. The probability of the trend ending soon can be predicted, but that gives a scatter of points long before the extremum of the same ZZ (ZigZag). As a result we'd close, on average, about the same as with a trailing stop based on the same ZZ.

I was thinking about a system for confirming the trend: we check the state not on every bar, and if expectations have decreased, we close.

 

I made it for buying.

But it fails for an obvious reason: volatility.

If the candle on which the signal triggered was abnormally volatile, the entry at the open gets blown away.


But if volatility is normal and the market is calm, the entry is fine.


It so happened that 4 out of 5 triggers came on extreme volatility...

So for now... the signal strength needs to be normalised to volatility...


The picture is also interesting if you switch from M1 to M15 and raise the entry threshold.


 
Aleksey Vyazmikin #:

As I understand it, there are predictor values that change once a minute and there are strategy setting values that you cycle through and see the result.

Yes, that's right. And the trading happens within that minute, since it's HFT.


Aleksey Vyazmikin #:

I think it is still appropriate to use clustering here, but only on those predictors that change once a minute - they determine the market state.

Then, for each cluster, you can look at the probability distribution of the specific strategy settings. You can look at the settings individually, or treat them as a numbered set. As a result, you get a kind of lookup base with probability indicators and an estimate of the financial result for each setting. Then you simply determine the current cluster and choose the optimal settings.


So we take the set of predictors, cluster them into different clusters, and then build a model that tells us which settings in which cluster give the highest probability of profitable trading. This adds a clustering layer to my original idea - the hope being that if the regime-determining features are grouped into clusters, it will be easier for the model to learn to identify profitable parameter combinations?


Aleksey Vyazmikin #:
Another option is a multiclass model for each setting. If the settings affect the TS through a smooth function, you can use regression. Then take the responses from each model and use them as the settings.


Do you mean regressing the settings for profitable trading, roughly as discussed a couple of posts above? I tried training a RandomForestRegressor, feeding it the minute predictors as features along with the profit received for the minute (as just its sign), and as the dependent values two TS parameters that strongly influence the TS's behaviour. I haven't run it in the backtester yet, but so far I can't see that the prediction distributions for positive and negative trading differ much.


Aleksey Vyazmikin #:
Another option is to look at ranking and recommendation systems - in theory you can get sets of optimal settings for the current situation. However, I haven't gotten my hands on that yet, so here it's theory only.

Have you come across any useful articles on this topic that you could point me to? I'd be grateful.


Aleksey Vyazmikin #:
I don't really see what difficulties could arise in applying the model - look at their site, there are lots of examples with detailed Python code; I'm not a Python expert myself. Again, you could use ONNX - in theory.

Ah, you mean the data on which you trade and from which you form the strategy settings, right?

What do you store the data in?


Yes, I'm talking about the data for the backtester. The data is initially downloaded in CSV format from the data vendor and then converted to NumPy's npz format, in which it is stored. It's a ZIP archive, but the data inside is not compressed.
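As a side note, `np.savez_compressed` writes the same zip layout with DEFLATE compression, which may help with the disk-space problem mentioned earlier; a small sketch (using an in-memory buffer instead of a file):

```python
# np.savez stores arrays in an uncompressed zip; savez_compressed trades
# some load time for disk space.
import io
import numpy as np

buf = io.BytesIO()
trades = np.arange(1_000, dtype=np.float64)
np.savez_compressed(buf, trades=trades)  # zip with DEFLATE compression

buf.seek(0)
loaded = np.load(buf)["trades"]          # read back by array name
print(loaded.shape)                      # (1000,)
```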

Machine learning in trading: theory, models, practice and algo-trading
  • 2024.04.09
  • mytarmailS
  • www.mql5.com
Good afternoon everyone. I know there are machine learning and statistics enthusiasts on the forum...