Machine learning in trading: theory, models, practice and algo-trading - page 3553

 

In essence, a quantum segment is a bin in the concept of discretisation.

But again, if we translate this "bin" into Russian, we get the meaning of a container into which digital values are placed - in essence a more primitive concept than the one I proposed. Which is better: "container" (Yandex suggests simply treating it as a bin, and the CB (CatBoost) developers use the term "basket") or "quantum segment"?

And this is how ChatGPT suggests translating it:

"

In the context of data processing and analysis, the term "bin" is most often translated as "interval" or "basket". Both terms are used to refer to groups of values into which a continuous range of data is divided.

Possible translations:

  1. Interval:

    • This is the most technically accurate translation. It emphasises that it refers to dividing a range of values into intervals.
  2. Basket:

    • This term is less formal and can be used in contexts where it is important to emphasise that values are grouped or 'gathered' together.

Examples of usage:

  • Interval:

    • "We divided the data into 10 intervals to analyse the distribution."
    • "The histogram shows the frequency of values in each interval."
  • Basket:

    • "We grouped the data into baskets to make it easier to analyse."
    • "Each basket contains values in a specific range."

Application options in different contexts:

  1. In the context of histograms and statistical analysis:
    • "Interval" is more commonly used when referring to histograms, where data is divided into intervals to show the distribution of values.
  2. In the context of machine learning algorithms:
    • "Basket" may be more appropriate when talking about grouping data to prepare inputs for algorithms.

Examples:

  • If you are using the scikit-learn library and the KBinsDiscretizer function, you might say: "The KBinsDiscretizer function breaks continuous data into multiple intervals."
  • If you're creating a histogram, "A histogram breaks data into bins to show the distribution of values."

Both are valid terms, and the choice between them depends on context and preference.

"

"Interval" and "segment" are essentially similar concepts.

 
Here's an explanation from physics
https://physics.stackexchange.com/questions/206790/difference-between-discretization-and-quantization-in-physics


I still lean towards the view that in your case it is better to call it discretisation, and quantisation in the case of information compression. Actually, in ML that is roughly the meaning: quantisation is a synonym for compression, and discretisation is the division of a continuous quantity into bins.

So I have an irrational (unconscious) revolt against your definitions :)
 
Maxim Dmitrievsky #:
Here's an explanation from physics

A good example of people who are proficient in a topic not being able to come to a consensus.

The discussion ends with: "And to answer the question 'why do we use the name "quantum mechanics" instead of "discrete mechanics"?' it's probably because the Germans discovered it..." :)

Maxim Dmitrievsky #:

I still lean towards the view that in your case it is better to call it discretisation, and quantisation in the case of information compression. Actually, in ML that is roughly the meaning: quantisation is a synonym for compression, and discretisation is the division of a continuous quantity into bins.

No, it's all about the same thing. By dividing into bins you lose information, which is compression.

In general, following the physicists, the idea is that quantisation is division into bins that carry meaning, rather than just uniformly or by some other formula.

If that is closer to the truth, then my task is precisely to find ranges whose information content follows a different logic, i.e. where there is a shift of probability, so "quantisation" is closer to my case than plain "discretisation".

 
Aleksey Vyazmikin #:

A good example of people who are proficient in a topic not being able to come to a consensus.

The discussion ends with: "And to answer the question 'why do we use the name "quantum mechanics" instead of "discrete mechanics"?' it's probably because the Germans discovered it..." :)

Nah, it's all about one thing. By dividing into bins you lose information, which is what compression is.

In general, following the physicists, the idea is that quantisation is division into bins that carry meaning, rather than just uniformly or by some other formula.

If that is closer to the truth, then my task is precisely to find ranges whose information content follows a different logic, i.e. where there is a shift of probability, so "quantisation" is closer to my case than plain "discretisation".

Well in signal processing there is such a thing:

https://www.audiolabs-erlangen.de/resources/MIR/FMP/C2/C2S2_DigitalSignalQuantization.html


They say that quantisation is a 2-stage discretisation :) first by X, then by Y

 

But there is a feeling that they confuse sampling with discretisation. That is, discretisation by X is sampling, and discretisation by Y is quantisation.

But if we split into bins (discretisation), we automatically get Y levels.

 

Well from this picture it follows that quantisation is discretisation by amplitude and frequency. Okay. But that's not what you have :) You have only amplitude.

Well, let's assume that you first do sampling from the continuous value, and only then discretise it by amplitude. Then it's really quantisation.

Okay, got it.
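For illustration, the two-stage view from that page in a minimal numpy sketch: first discretisation of the time axis (X), then quantisation of the amplitude (Y). The signal and the number of levels are arbitrary assumptions:

```python
import numpy as np

fs = 100                                      # sampling rate, Hz
t = np.arange(0, 1, 1 / fs)                   # step 1: discretise the time axis (X)
signal = np.sin(2 * np.pi * 3 * t)            # "continuous" signal evaluated at the samples

levels = 8                                    # step 2: quantise the amplitude (Y)
edges = np.linspace(-1, 1, levels + 1)        # uniform amplitude bins
quantised = np.digitize(signal, edges[1:-1])  # amplitude bin index (0..levels-1) per sample

print(quantised[:10])
```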

 
So where did you end up? Does it give you a decent model now, or is it just so-so? What's the problem?
Your bots are built on a different logic - there is no ML-based TS (trading system) in them?
 
Aleksey Vyazmikin #:

No, it's all about one thing. By dividing into bins you lose information, which is compression.

Don't give me that shit.
There are lossless compression algorithms
 
Maxim Dmitrievsky #:
So where did you end up?

I haven't stopped yet. The concept is to take only the useful information from a predictor, then binarise it, and then build a model on that data. But there is the problem of few responses - a super-sparse sample that standard models find difficult to train on. The alternative is clustering these binary predictors; for that purpose I wrote the code for a clustering tree, but for now I have put that development on pause, because the main problem is that a large share of the selected quantum segments lose their effectiveness on new data, which leads to errors in classical models. So now I am concentrating on raising the percentage of effective quantum segments that get collected.

How to measure effectiveness is also an open question, but I assume that a quantum segment should contain more members of one class than the sample average. A probability bias means precisely that the percentage of representatives of class 1 or 0 in the quantum segment is greater than in the subsample by a threshold value.
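Roughly, in code, the criterion might look like this (a minimal sketch, not my actual code; the threshold value and the synthetic data are arbitrary assumptions):

```python
import numpy as np

def biased_bins(x_binned, y, threshold=0.05):
    """Return bins whose class-1 share differs from the subsample average by >= threshold."""
    base_rate = y.mean()                      # share of class 1 in the whole subsample
    selected = []
    for b in np.unique(x_binned):
        mask = x_binned == b
        bin_rate = y[mask].mean()             # share of class 1 inside the bin
        if abs(bin_rate - base_rate) >= threshold:
            selected.append((int(b), float(bin_rate), int(mask.sum())))
    return base_rate, selected

# synthetic check: bin 7 is given an artificial probability bias
rng = np.random.default_rng(1)
x_binned = rng.integers(0, 10, size=5000)
y = (rng.random(5000) < 0.4 + 0.10 * (x_binned == 7)).astype(int)
print(biased_bins(x_binned, y))
```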

Thus, if we have a set of quantum segments with a probability bias, we can build both new rules and ensembles, combining quantum segments into groups by the probability of their synchronous triggering, which in theory should add confidence to the model.
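The probability of synchronous triggering can be estimated in a couple of lines (again just a sketch with an arbitrary 0/1 activation matrix):

```python
import numpy as np

def co_trigger_matrix(segments):
    """segments: (n_samples, n_segments) 0/1 matrix of quantum-segment activations.
    Returns the pairwise probability that two segments trigger on the same sample."""
    s = segments.astype(float)
    return (s.T @ s) / len(s)
```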

Maxim Dmitrievsky #:
Does it give you a decent model now, or is it just so-so?

Even fitting the quantisation table for a predictor can improve learning.

So far I do not build final models using this method - I am not satisfied with the selection of quantum segments.

And models on the binarised sample train more easily in CatBoost and are not inferior to those on the full data, but again there is no guarantee that the model will be profitable - which is understandable: after all, the problem is the shift of probability on new data....
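A minimal sketch of what training on a binarised sample looks like in CatBoost (synthetic 0/1 features and arbitrary parameters, purely for illustration):

```python
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
X_bin = rng.integers(0, 2, size=(2000, 50))       # binary "quantum segment" features
y = (X_bin[:, :5].sum(axis=1) > 2).astype(int)    # synthetic target

model = CatBoostClassifier(iterations=300, depth=4, verbose=False)
model.fit(X_bin[:1500], y[:1500])
print(model.score(X_bin[1500:], y[1500:]))        # accuracy on the hold-out part
```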

Maxim Dmitrievsky #:
What is the problem?

Apart from the main problem, there is a production problem - you need to think and code :)

Lately, unsuccessful ideas knock me off track for a few days, sometimes weeks, once I have tested them. It's still summer - I try to go for walks in the park more often.

Maxim Dmitrievsky #:
Your bots are built on a different logic - there is no ML-based TS (trading system) in them?

There, in fact, a similar approach is used: a database of effective individual settings of different filters/predictors is built, and then they are selected at random (not all are used at once) with their particular settings. This approach saves a lot of resources, and the result is quite good when there are hundreds of settings to optimise. Essentially the same approach as with quantisation.
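Schematically, that selection looks something like this (the filter names and parameters below are purely hypothetical placeholders):

```python
import random

# hypothetical pool of individually validated filter/predictor settings
settings_pool = [
    {"filter": "rsi", "period": 14, "level": 30},
    {"filter": "ma_cross", "fast": 9, "slow": 21},
    {"filter": "atr_stop", "period": 14, "mult": 2.0},
    # ... in practice, hundreds of validated single settings
]

active = random.sample(settings_pool, k=2)   # each bot uses only a random subset
print(active)
```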

Most of the predictors I use in ML follow the logic of that EA.

As for ML, perhaps I will mass-produce low-priced bots, but a little later.