Machine learning in trading: theory, models, practice and algo-trading - page 3552

 
Aleksey Vyazmikin #:

That's good to hear.

There seems to be an unconscious internalisation going on :)

I don't know what was so difficult to verbalise?

Why is it pleasant? ))

It is not internalised, because these words already mean something else in your head :) the memory cells are already reserved

 
Maxim Dmitrievsky #:
Why is it pleasant? ))

Because I've been saying for a long time that CV with our data doesn't have the same effect as on representative samples, and now that we've realised that, there will be less controversy further down the line, which is nice.

Maxim Dmitrievsky #:
It is not internalised, because these words already mean something else in your head :) the memory cells are already reserved

Let's take a term like this as an example.

 
Aleksey Vyazmikin #:

Because I've been saying for a long time that CV with our data doesn't have the same effect as on representative samples, and now that we've realised that, there will be less controversy further down the line, which is nice.

Let's take a term like this as an example.

Well, maybe in its current application it's not very good... I even know why, but it takes a long time to explain.

Let's do it after the weekend.

 
Aleksey Vyazmikin #:

Because I've been saying for a long time that CV with our data doesn't have the same effect as on representative samples, and now that we've realised that, there will be less controversy further down the line, which is nice.

Let's take a term like this as an example.

It's kind of your topic.

Demonstrating the different strategies of KBinsDiscretizer (scikit-learn.org)
This example presents the different strategies implemented in KBinsDiscretizer: 'uniform': The discretization is uniform in each feature, which means that the bin widths are constant in each dimension. 'quantile': The discretization is done on the quantiled values, which means that each...
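For concreteness, here is a minimal sketch of what the linked example demonstrates; the data and the parameter choices are made up for illustration, not taken from the scikit-learn page:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Made-up skewed feature, just so 'uniform' and 'quantile' differ visibly
X = np.random.RandomState(0).exponential(size=(100, 1))

# 'uniform': equal-width bins; 'quantile': equal-frequency bins;
# 'kmeans': bin edges derived from 1-D k-means cluster centres
for strategy in ("uniform", "quantile", "kmeans"):
    est = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy=strategy)
    est.fit(X)
    print(strategy, "bin edges:", np.round(est.bin_edges_[0], 2))
```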
 
Maxim Dmitrievsky #:

It's kind of your topic.

Well, that kind of thing, a couple of off-the-shelf functions - yes, I use something like that.

Thanks for thinking about my needs.

 
Aleksey Vyazmikin #:

Well, that kind of thing, a couple of off-the-shelf functions - yes, I use something like that.

Thanks for thinking about my needs.

Yay, we found the right terminology for you, now we can find common ground.

You're doing k-bin discretisation, not "quantum cuts"
 
Now let's look at the differences:

K-bin discretisation and clustering are two different techniques in data science and machine learning which are used for different purposes. The following are the key differences between them:

1. Purpose:

- K-bin discretisation: the purpose of this technique is to convert continuous features into discrete or categorical variables. It is used to simplify data analysis, reduce the number of distinct values and create more manageable features.

- Clustering: the purpose of clustering is to group data objects into clusters based on their similarities. It is used to uncover hidden structures or patterns in the data, find communities or segments, and classify objects based on their characteristics.

2. Data it applies to:

- K-bin discretisation: this technique is applied to continuous features or variables. It decomposes the range of values of these features into bins or intervals.

- Clustering: clustering is applied to sets of data objects that may have both continuous and discrete features. It groups the objects themselves rather than transforming the features.

3. Number of groups:

- K-bin discretisation: the number of bins (K) is specified by the user or determined using certain methods. Each observation belongs to one bin.

- Clustering: the number of clusters (K) can also be user-defined or determined by methods such as the elbow method or silhouette analysis. Each observation can be assigned to one or more clusters depending on the clustering method used.

4. Cluster membership:

- K-bin discretisation: each observation is unambiguously assigned to one bin based on its value.

- Clustering: cluster membership can be fuzzy or probabilistic depending on the method used. An observation may have a higher probability of belonging to one cluster than another, or may belong to more than one cluster at the same time.

5. Interpretation:

- K-bin discretisation: bins usually have specific ranges of values, making their interpretation more direct and related to the original feature values.

- Clustering: clusters may not have clear boundaries or interpretations related to the original features. They represent groups of objects that are similar to each other, but the specific value or range of values may not have a direct meaning.

6. Algorithms:

- K-bin discretisation: usually uses uniform partitioning of a range of values or methods based on statistical criteria such as quantiles or standard deviation.

- Clustering: there are many clustering algorithms, including k-means, hierarchical clustering, DBSCAN, expectation-maximisation (EM) clustering and others. Each algorithm uses different methods to identify clusters.

In general, K-bin discretisation is used to transform continuous features, while clustering is used to group data objects based on their similarity. They serve different purposes and are used in different data analysis and machine learning scenarios.
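To make the contrast concrete, a minimal Python sketch with synthetic data and illustrative parameters: discretisation transforms each feature column independently, while clustering assigns one label to each whole observation.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.cluster import KMeans

X = np.random.RandomState(42).normal(size=(200, 2))  # synthetic 2-feature data

# Discretisation: each feature is binned independently; every value is
# replaced by the index of the interval it falls into.
X_binned = KBinsDiscretizer(n_bins=4, encode="ordinal",
                            strategy="quantile").fit_transform(X)

# Clustering: whole observations (rows) are grouped; each row gets a
# single cluster label based on its position in the full feature space.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

print(X_binned[:3])  # per-feature bin indices, shape (3, 2)
print(labels[:3])    # one cluster label per observation
```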
 
Maxim Dmitrievsky #:
k-bin discretisation
KBinsDiscretizer is a class from the scikit-learn library, not some established term.

Binarisation is one possible final form of the whole process.

Quantisation and discretisation are related but not identical concepts that are often used in data processing.

Quantisation

Quantisation is the process of converting continuous values into discrete values, often in the context of digital signal processing. In quantisation, a continuous signal or data is divided into defined levels and each level is assigned a fixed value. This process is used, for example, when converting an analogue signal into a digital signal.

Examples of quantisation:

  • Converting audio signals to digital format when recording audio.
  • Converting an image from a continuous spectrum of colours to a limited number of colour levels.

Discretisation

Discretisation is a broader term that includes quantisation, but can also refer to dividing data into separate categories or bins. Discretisation is used to convert continuous data into categorical or discrete intervals.

Examples of discretisation:

  • Separating people's ages into categories (e.g., "young," "middle-aged," "elderly").
  • Converting continuous income values into categories ("low", "medium", "high").

Comparison

  1. Quantisation:

    • More commonly used in the context of signal and image processing.
    • Converts continuous data into discrete levels.
    • Each level represents a fixed value.
  2. Discretisation:

    • A more general term that includes quantisation.
    • Used to convert continuous data into categorical or discrete intervals.
    • Each interval can represent a category or bin, which can be encoded in different ways.

In the context of machine learning and data analytics, discretisation is often used to prepare data before using it in models, especially if the models perform better with categorical features. Thus, we can say that quantisation is a form of discretisation specialised for processing signals and data with fixed levels.
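A small illustrative sketch of the two operations (all values here are made up): quantisation snaps a signal to a fixed set of levels, while discretisation maps continuous values into named categories, as in the age example above.

```python
import numpy as np
import pandas as pd

# Quantisation: snap a continuous signal to the nearest of a fixed set
# of levels (8 evenly spaced levels on [-1, 1] here).
signal = np.sin(np.linspace(0, 2 * np.pi, 10))
levels = np.linspace(-1, 1, 8)
quantised = levels[np.abs(signal[:, None] - levels).argmin(axis=1)]

# Discretisation: map continuous values into named categories (bins).
ages = pd.Series([15, 34, 52, 70, 8])
groups = pd.cut(ages, bins=[0, 30, 60, 120],
                labels=["young", "middle-aged", "elderly"])

print(np.round(quantised, 2))
print(groups.tolist())
```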

 

The term "quantum cutoff" is not a standard or widely used term in science or engineering. However, it can be assumed that it may refer to the context of quantisation in data or signal processing. In such a case, a "quantum cutoff" can be interpreted as an interval or range of values that is assigned to a particular quantum level during the quantisation process.

Quantisation and quantum levels

In the process of quantising a continuous signal or data, the range of possible values is broken down into discrete levels called quantum levels. Each quantum level corresponds to a specific range of values of the original continuous signal.

Quantum segment

If we speak of a "quantum segment" in this context, it is:

  • An interval of values: a range of values that belongs to a single quantum level. For example, if we have a continuous range of values from 0 to 10 and we break it down into 5 quantum levels, then each quantum segment could be the range 0 to 2, 2 to 4, and so on.
  • Quantisation width: the difference between the upper and lower boundaries of a quantum segment. In the example above, the width of each quantum segment is 2.

Example

Let's consider an example of quantising continuous data in the interval [0, 10] into 5 quantum levels:

  1. Range 0-2: This is the first quantum segment.
  2. Range 2-4: This is the second quantum segment.
  3. Range 4-6: This is the third quantum segment.
  4. Range 6-8: This is the fourth quantum segment.
  5. Range 8-10: This is the fifth quantum segment.

Each of these quantum segments will be assigned a corresponding quantum level, which can be represented, for example, by the numbers 1, 2, 3, 4 and 5.

Thus, a "quantum segment" can be thought of as an interval of values that is converted into a certain discrete level in the process of quantisation.
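For concreteness, a minimal Python sketch of the [0, 10] example above (the input values are made up):

```python
import numpy as np

# 5 equal-width "quantum segments" on [0, 10], each mapped to a level 1..5
values = np.array([0.3, 1.9, 4.2, 7.7, 9.6])   # illustrative inputs
edges = np.linspace(0, 10, 6)                  # boundaries 0, 2, 4, 6, 8, 10
levels = np.digitize(values, edges[1:-1]) + 1  # segment/level for each value

for v, lvl in zip(values, levels):
    print(f"{v:4.1f} -> segment {lvl} ({edges[lvl - 1]:.0f}-{edges[lvl]:.0f})")
```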

 

The above is almost all ChatGPT. I'm posting it to show that the model understands everything correctly, i.e. the term does occur in this context.

I don't mind if you use similar terms to decode my messages, but that doesn't mean I will change mine - I wrote about this in my articles; if you had read them, you would have understood long ago.