Machine learning in trading: theory, models, practice and algo-trading - page 3555

 
Aleksey Vyazmikin #:

Definitely. But in the process of development/research, on the contrary, a lot of additional tricks and complications appear. When the work is complete and everything is clear and obvious, then you can optimise and reduce/accelerate something.

I see that you can analyse bins, for example, as suggested in the article, and then select successful ones. It will take little code and will be very clear.

 

Supervised discretisation is a very curious thing, but it consumes a lot of computational resources:

discretization::mdlp()
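For anyone who wants to see what supervised (MDLP-style) discretisation does, here is a rough pure-Python sketch of entropy-based cut selection with the Fayyad-Irani stopping rule. It is not the code of the R package, only an illustration; the function names (`best_cut`, `mdlp_accepts`, `mdlp_cuts`) are made up for this example. The nested scan over candidate cuts also hints at why the method is computationally heavy.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Best binary cut by information gain: (gain, threshold, left, right) or None."""
    pairs = sorted(zip(values, labels))
    ys = [y for _, y in pairs]
    n = len(pairs)
    base = entropy(ys)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal feature values
        left, right = ys[:i], ys[i:]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if best is None or gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2, left, right)
    return best

def mdlp_accepts(labels, gain, left, right):
    """Fayyad-Irani MDL criterion: accept the cut only if the gain pays for it."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n

def mdlp_cuts(values, labels):
    """Recursively collect the accepted cut points for one feature."""
    if len(values) < 2:
        return []
    found = best_cut(values, labels)
    if found is None:
        return []
    gain, thr, left, right = found
    if not mdlp_accepts(labels, gain, left, right):
        return []
    lo = [(v, y) for v, y in zip(values, labels) if v <= thr]
    hi = [(v, y) for v, y in zip(values, labels) if v > thr]
    return (
        mdlp_cuts([v for v, _ in lo], [y for _, y in lo])
        + [thr]
        + mdlp_cuts([v for v, _ in hi], [y for _, y in hi])
    )
```

On a feature where the class flips once, this should return a single threshold; on a feature whose labels alternate with no structure, it should return no cuts at all.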

 
Maxim Dmitrievsky #:

I can see that you could analyse bins, for example as suggested in the article, and then select the successful ones. It will take little code and will be very clear.

Which article? If you mean the link above, there is essentially no analysis there, only an enumeration of quantisation tables. In general, CatBoost has various quantisation methods built in (ones that KBinsDiscretizer does not have), so experiment with the settings. It is also possible to save the quantisation tables and then use them to transform a sample for other training methods.
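To illustrate the idea of saving a quantisation table and reusing it to transform a sample, here is a minimal sketch in plain Python. It does not use CatBoost's own API or file format: the quantile borders, the JSON file and the function names (`quantile_borders`, `save_table`, `load_table`, `apply_table`) are all assumptions made for the example.

```python
import bisect
import json

def quantile_borders(values, n_bins):
    """Equal-frequency borders for one feature (a stand-in for any quantisation method)."""
    s = sorted(values)
    borders = []
    for k in range(1, n_bins):
        idx = k * len(s) // n_bins
        if idx == 0:
            continue  # not enough points for this border
        b = (s[idx - 1] + s[idx]) / 2
        if not borders or b > borders[-1]:
            borders.append(b)  # keep borders strictly increasing
    return borders

def save_table(table, path):
    """Store {feature_name: [borders]} so it can be reused later."""
    with open(path, "w") as f:
        json.dump(table, f)

def load_table(path):
    with open(path) as f:
        return json.load(f)

def apply_table(table, row):
    """Map a {feature_name: value} row to {feature_name: bin_index}."""
    return {name: bisect.bisect_right(table[name], x) for name, x in row.items()}
```

The table is built once on the training sample, saved, and then applied unchanged to any new sample, so every model sees the same bins.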

 
Aleksey Vyazmikin #:

Which article? If you mean the link above, there is essentially no analysis there, only an enumeration of quantisation tables. In general, CatBoost has various quantisation methods built in (ones that KBinsDiscretizer does not have), so experiment with the settings. It is also possible to save the quantisation tables and then use them to transform a sample for other training methods.

You can select the number of bins and the individual bins that outperform the others. It is an analogue of clustering. That's why I asked earlier why not just use clustering.

In any case, without a brute-force search or a competent choice of targets, it's a shot in the dark, because the options are artificially limited.

In the classification of (financial) time series, where the choice is when to trade and when not to trade (i.e. already a discrete representation of the time series), the main emphasis should be on the labelling, not on the features.

I have already written that my approach lets you select bins (clusters) and labels such that you can limit yourself to only 2-5 features. And it is done in minutes.

 
Maxim Dmitrievsky #:
You can select the number of bins and the individual bins that outperform the others. It is an analogue of clustering. That's why I asked earlier why not just use clustering.

And by what selection criterion? I don't see such an option in the library... but then again, this is binarisation, only under the bonnet.

 
Aleksey Vyazmikin #:

And what are the criteria for selection? I don't see it in the library... but then again, it's binarisation, only under the bonnet.

By a criterion on new data, after training.

 
Maxim Dmitrievsky #:

By a criterion on new data, after training.

So where is the selection of multiple bins from the set, or did I misunderstand you?

 
Aleksey Vyazmikin #:

So where is the selection of multiple bins from the set, or did I misunderstand you?

Make a choice )) just choose a bin and trade only on it, what's the problem?
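A toy sketch of what "choose a bin by a criterion on new data" could look like. The criterion here (mean outcome per bin on held-out data, with a minimum number of samples) is only an assumption for illustration, as are the names `bin_scores` and `pick_bin`; any other metric can be plugged in.

```python
from collections import defaultdict

def bin_scores(bin_ids, outcomes):
    """Mean outcome and sample count per bin."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for b, r in zip(bin_ids, outcomes):
        sums[b] += r
        counts[b] += 1
    return {b: (sums[b] / counts[b], counts[b]) for b in counts}

def pick_bin(new_bin_ids, new_outcomes, min_count=2):
    """Pick the bin with the best mean outcome on new (held-out) data.

    Bins with fewer than min_count samples are ignored; returns None if
    no bin qualifies.
    """
    scores = bin_scores(new_bin_ids, new_outcomes)
    eligible = {b: mean for b, (mean, n) in scores.items() if n >= min_count}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

The bin is chosen after training, on data the model has not seen, and then only that bin is traded.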

 

The problem of searching for a trading system or for patterns reduces to a simple two-dimensional optimisation problem. This is dictated by the very nature of time series (their two-dimensionality).

When two errors are minimised:

  1. classification error within a bin (buy/sell)
  2. error in determining the current bin

the end result is a suboptimal trading system.

How exactly you solve this is a purely technical question.

And here, again, it is not the discretisation method that plays the absolutely critical role (although it can also matter), but the way trades are labelled inside the bins.

 
Maxim Dmitrievsky #:

Make a choice )) just pick a bin and trade only on it, what's the problem?

You can always do that; I just thought there was ready-made functionality for it. I was interested in the selection criteria.

And so, in essence, if you need clustering, you can use any clustering method that can later be applied to new data: just pull out the thresholds between clusters and record them in a quantisation table, which can then be fed into CatBoost. This will speed things up, since you don't have to recompute the clusters when experimenting.
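A minimal sketch of pulling thresholds out of clusters, for a single feature. It assumes plain 1D k-means with a deterministic start, where the border between two neighbouring clusters is the midpoint between their centroids; the names `kmeans_1d` and `cluster_thresholds` are made up for the example. How the resulting borders are passed to CatBoost is not shown here.

```python
import bisect

def kmeans_1d(values, k, n_iter=50):
    """Plain 1D k-means; centroids start at evenly spaced order statistics."""
    s = sorted(values)
    centroids = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for v in s:
            # assign each value to its nearest centroid
            j = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[j].append(v)
        # empty clusters keep their previous centroid
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)

def cluster_thresholds(centroids):
    """Borders between neighbouring clusters: midpoints of adjacent centroids."""
    c = sorted(set(centroids))
    return [(a + b) / 2 for a, b in zip(c, c[1:])]

def cluster_of(thresholds, x):
    """Cluster (bin) index of a new value, using only the saved thresholds."""
    return bisect.bisect_right(thresholds, x)
```

Once the thresholds are stored, new data is assigned with a simple lookup, so the clustering itself does not have to be rerun in each experiment.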