Machine learning in trading: theory, models, practice and algo-trading - page 2032

 
Rorschach:

For the forest, you can look at feature importance and clusters. In CatBoost it is probably plot_tree.

I will prepare the data and post it.

I made a test version for 6 columns; it took 11 GB. Notepad++ couldn't open it, says the file is too big. DB Browser for SQLite has been hanging for about 20 minutes.

Show me a picture of what tree clusters look like, I don't understand what we are talking about yet.

Why open it? :) I'm just making a mini copy with a similar structure for debugging.

 
elibrarius:

I wonder how they train trees without loading all the data into memory. If the table is 6 gigabytes, then about 6 gigabytes of memory should be used, since the tree needs to sort each column as a whole. If you don't keep everything in memory but read the data from disk every time, it will be too slow.
The only option is to keep the data in memory as float instead of double, but that reduces accuracy. For us, with 5 digits of accuracy, it may not be too bad, but CatBoost is general-purpose software, and I think physics and maths problems should be solved in double precision.
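The float-vs-double trade-off mentioned above is easy to see with the standard-library `array` module (just an illustration of the memory arithmetic, not how CatBoost actually stores data):

```python
from array import array

n = 1_000_000  # one predictor column with a million rows

col_double = array('d', [0.12345] * n)  # 64-bit double, 8 bytes per value
col_float = array('f', [0.12345] * n)   # 32-bit float, 4 bytes per value

# halving the element size halves the column's memory footprint
print(col_double.itemsize, col_float.itemsize)  # 8 4

# the accuracy cost: float keeps ~7 significant digits,
# which is enough for 5-digit quotes
print(abs(col_float[0] - 0.12345) < 1e-7)  # True
```

So a 6 GB table of doubles would shrink to roughly 3 GB as floats, at the cost of precision beyond about 7 significant digits.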

I have not studied the code, but thinking about it logically, CatBoost probably converts the sample data table and stores exactly that converted version, which lends itself to fast compression.

The transformation is done on a quantization grid for each predictor (6 grid-building algorithms are available): for example, out of 1000 different values of a variable, only 32 variants remain (the indices of the grid ranges the values fall into). Such a data vector is easy to compress, and besides, the numbers are integers only (judging by the grid-size limits, ushort data, 2 bytes per number), while the quantization table itself simply sits in memory and is used later during model building. That already reduces the size decently. The volume can be reduced further by selecting only a subset of predictors for evaluation, which is recommended on large samples; the randomizer then lets other trees use the predictors that did not make it into the "bag" the first time, so training is also faster, though the model ends up with more trees. There are surely other tricks, but the main one is quantization.

Quantization in general deserves separate attention: ideally you pick an individual grid for each predictor and feed it to training together with the data, which the algorithm allows you to do.

So the sample can be quantized in advance, and it will compress well.
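The quantization idea described above can be sketched in a few lines: build a grid of borders for a predictor, then replace each raw value with its bin index and store the indices as 2-byte unsigned integers (a toy sketch using a quantile grid, not CatBoost's internal implementation):

```python
import bisect
import random
from array import array

random.seed(1)
values = [random.gauss(0.0, 1.0) for _ in range(1000)]  # 1000 raw doubles

# build a 32-bin grid from quantiles (one of several possible grid algorithms)
n_bins = 32
sorted_vals = sorted(values)
borders = [sorted_vals[i * len(values) // n_bins] for i in range(1, n_bins)]

# replace each value with its bin index; 'H' = unsigned short, 2 bytes
bins = array('H', (bisect.bisect_right(borders, v) for v in values))

print(max(bins) < n_bins)  # True: only 32 distinct codes remain
print(bins.itemsize)       # 2 bytes per number instead of 8 for a double
```

A vector of small repeated integer codes like this also compresses far better with a general-purpose compressor than the original doubles would.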

 
Aleksey Vyazmikin:

CatBoost probably transforms the sample data table and stores the quantized version, which compresses well; the main trick is quantization on a per-predictor grid.

Right, I remember, I think there's a default grid of 1024 split variants. If you replace all the data with split numbers, you can store them in ushort format, which is 2 bytes instead of 8 - 4x compression. That's probably why it used 2 GB instead of 6.
Now I understand what this grid is used for - it turns out it's for compression. And sorting it is faster, too.
MQL5 Documentation: Language Basics / Data Types / Integer Types / char, short, int and long (www.mql5.com)
 
elibrarius:
Right, I remembered, it seems to use a default grid of 1024 split variants. If you replace all the data with split numbers, you can store in ushort format, which is 2 bytes instead of 8, 4 times compression. That's probably why you used 2 gb instead of 6.

The default is 254, but I don't remember how much I set back then. Maybe even less than 2 GB of memory was consumed - I remember being very surprised that it was so little.

Anyway, this approach makes it possible to compress the data significantly, even the sample itself. You can't do that with neural networks.

elibrarius:
Now I understand what this grid is used for, it turns out for compression. And it is faster to sort it.
The grid also reduces overfitting, because a whole range of values is used. But that won't always be good - to catch theoretical levels, I think you have to cut the grid yourself.
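"Cutting the grid yourself" can be sketched like this: instead of automatic quantile borders, put borders exactly at known levels (the level values here are made up purely for illustration):

```python
import bisect

# hypothetical "theoretical levels" for a price-like predictor
custom_borders = [1.1000, 1.1050, 1.1100, 1.1200]

def quantize(value, borders):
    """Bin index of value on a hand-made grid (0 .. len(borders))."""
    return bisect.bisect_right(borders, value)

print(quantize(1.1049, custom_borders))  # 1: between the 1st and 2nd level
print(quantize(1.1250, custom_borders))  # 4: above the last level
```

With borders placed at the levels themselves, every bin boundary coincides with a level, so a split can separate "just below the level" from "just above it" exactly, instead of depending on where an automatic grid happened to land.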
 
Aleksey Vyazmikin:

The default is 254, but I don't remember how much I set then. Probably less than 2 gb of memory was consumed - I remember being very surprised that it was so small.

You could use uchar then, it's 1 byte.

Aleksey Vyazmikin:
The grid allows to fit less, because a range of values is used. But it won't always be good. For catching theoretical levels I think you have to cut the grid yourself.


The maximum value in ushort is 65 thousand - if you set such a grid, you don't have to bother with it manually.

 
Maxim Dmitrievsky:

https://www.mql5.com/ru/articles/8385

not a fact that there is a good implementation )

I'll pass on the Russian one.

I read this one))) it has a lot of errors in the calculations, and the network gives rather random answers.

 
elibrarius:

Then you can use uchar , it is 1 byte.


In ushort the maximum value is 65 thousand - if you set such a grid, you can manually not bother

Their maximum size is 65535, but I can't influence the variables in the code.

And about the maximum size - no, it doesn't guarantee the result, because the model may fit one narrow band of data and skip nearby ones.

In general it would be nice to have a learning algorithm that always checks whether it makes sense to close a split range (A>10 && A<=15), but usually that happens only by chance - there is no such mandatory condition, although it is sometimes reproduced.
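The difference between an ordinary open-ended split and a closed range like (A>10 && A<=15) can be shown with a toy condition check (a sketch of the idea only, not how boosting libraries actually build splits):

```python
def single_split(a, threshold=10):
    # ordinary tree condition: one open-ended threshold
    return a > threshold

def range_split(a, low=10, high=15):
    # closed range: both borders checked, isolating a band of values
    return low < a <= high

samples = [5, 12, 14, 20]
print([single_split(a) for a in samples])  # [False, True, True, True]
print([range_split(a) for a in samples])   # [False, True, True, False]
```

An ordinary tree can only reproduce the closed range by stacking two single-threshold splits (A>10, then A<=15 in the child node), which is why the range appears in trained models only occasionally rather than as a guaranteed condition.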

 
Alexander Alexeevich:

I read this one))) there are a lot of errors in the calculations and the network gives rather random answers

Do you want to write the network yourself?

Here you have a minimum of words and a maximum of code in Python, though it's in English.

https://datascience-enthusiast.com/DL/Building_a_Recurrent_Neural_Network-Step_by_Step_v1.html

 
Maxim Dmitrievsky:

Do you want to write the network yourself? Here you have a minimum of words and a maximum of code in python...

For example, the sigmoid function is always assumed to be 1/1+exp(-x), while it needs to be 1.0/1.0+exp(-x). It seems to be the same thing, but the terminal gives different calculations) - check whether you get the same results) hence the error.
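Worth double-checking the claim above: in C-like languages (and Python) the real trap in `1/1+exp(-x)` is operator precedence, not the integer-vs-float literals - division binds tighter than addition, so both `1/1+exp(-x)` and `1.0/1.0+exp(-x)` evaluate as `(1/1) + exp(-x)`. The denominator has to be parenthesized:

```python
from math import exp

def sigmoid_wrong(x):
    # evaluates as (1/1) + exp(-x) because / binds tighter than +
    return 1.0 / 1.0 + exp(-x)

def sigmoid(x):
    # correct sigmoid: the denominator must be parenthesized
    return 1.0 / (1.0 + exp(-x))

print(sigmoid_wrong(0.0))  # 2.0, clearly not a probability
print(sigmoid(0.0))        # 0.5
```

The same precedence rules apply in MQL5, so a network whose activation is written without the parentheses would indeed produce near-random answers.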
 
Alexander Alexeyevich:
Yes, I want to do it myself) because in all the articles and examples the activation functions are computed incorrectly) for example, sigmoid is everywhere counted as 1/1+exp(-x), while it needs 1.0/1.0+exp(-x)...
Writing neural networks in the terminal is not an option at all - any function there may suddenly work differently than expected. Use ready-made and tested libraries.