Discussing the article: "Quantization in machine learning (Part 1): Theory, sample code, analysis of implementation in CatBoost"

 

Check out the new article: Quantization in machine learning (Part 1): Theory, sample code, analysis of implementation in CatBoost.

The article covers the theory behind quantization in the construction of tree-based models and examines the quantization methods implemented in CatBoost. No complex mathematical equations are used.

So what is quantization and why is it used? Let's figure it out!

First, let's talk a little about the data. To create models (that is, to train them), we need data carefully collected into a table. The source of such data can be any information capable of explaining the target variable (the one determined by the model, for example, a trading signal). Data sources go by different names - predictors, features, attributes or factors. The frequency of data rows is determined by how often an observation of the phenomenon occurs - the process about which information is being collected and which will be studied using machine learning. The collected data as a whole is called a sample.

A sample can be representative - when the observations recorded in it describe the entire process of the phenomenon under study - or non-representative, when there is only as much data as it was possible to collect, allowing merely a partial description of that process. As a rule, when dealing with financial markets, we are dealing with non-representative samples, because everything that could happen has not yet happened. For this reason, we do not know how a financial instrument will behave in case of new events (ones that have not occurred before). However, everyone knows the saying "history repeats itself". It is this observation that the algorithmic trader relies on in their research, hoping that among the new events there will be some similar to previous ones, and their outcomes will be similar with the identified probability.
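To make the idea of quantization concrete before diving into the article: quantization replaces raw feature values with bin indices defined by a set of borders. Below is a minimal illustrative sketch of uniform (equal-width) binning in pure Python; the function names are my own, not CatBoost's API.

```python
from bisect import bisect_left

def uniform_borders(values, n_bins):
    """Split the value range into n_bins equal-width intervals and
    return the n_bins - 1 interior borders (illustrative sketch,
    similar in spirit to a 'Uniform' border-selection mode)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]

def quantize(value, borders):
    """Map a raw feature value to its bin index via the borders."""
    return bisect_left(borders, value)

# Toy price series quantized into 4 bins.
prices = [1.05, 1.07, 1.10, 1.21, 1.30, 1.42, 1.55, 1.60]
borders = uniform_borders(prices, 4)       # 3 interior borders
bins = [quantize(p, borders) for p in prices]
```

The model then works with the small set of bin indices instead of the raw continuous values, which shrinks memory use and limits the number of candidate splits a tree has to consider.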

Author: Aleksey Vyazmikin

 

Typos:

3. Saving quantization tables to the specified file - key "--input-borders-file"

4. Loading quantisation tables from the specified file - key "--output-borders-file"

These are swapped - the keys should be the other way round.
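For context, the two keys exchange quantization (border) tables between runs: one writes the computed borders to a file, the other reads them back so a later run reuses the same bins. A minimal sketch of round-tripping such a table, assuming a tab-separated "feature_index&lt;TAB&gt;border_value" layout (the exact file format is my assumption here, not taken from the article):

```python
import io

# Hypothetical borders table: feature index -> list of border values.
borders = {0: [1.1875, 1.325], 2: [0.5]}

# Write it out, one "feature<TAB>border" pair per line,
# as a save-borders key would.
buf = io.StringIO()
for feat, vals in borders.items():
    for b in vals:
        buf.write(f"{feat}\t{b}\n")

# Read it back, as a load-borders key would.
loaded = {}
for line in buf.getvalue().splitlines():
    feat, b = line.split("\t")
    loaded.setdefault(int(feat), []).append(float(b))
```

Reusing a saved table guarantees that training and later experiments quantize features identically, which is the point of having both keys.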

 
Quantisation in machine learning is not a quantum neural network (nor is it quantum neural network training).
 
Stanislav Korotky #:

Typos:

These are swapped - the keys should be the other way round.

Thank you!

 
Sergey Pavlov #:
Quantisation in machine learning is not a quantum neural network (nor is quantum neural network training).

Where is this asserted? Does the word "quantisation" seem to mislead and distort expectations?

 
Thanks for the article, interesting!
 
Andrey Dik #:
Thanks for the article, interesting!

Very glad to hear it!

 
Very interesting article! Can I add you as a friend? I am new to ML. I try to code models and save them in ONNX, but I get complete nonsense or just simple memorisation of historical data(
 
Yevgeniy Koshtenko #:
Very interesting article!

Thank you!

Yevgeniy Koshtenko #:
Can I add you as a friend? I am new to ML. I try to code models and save them in ONNX, but I get complete nonsense or just simple memorisation of historical data(

Added you, although anyone can write to me - there are no restrictions on my profile.
