Machine learning in trading: theory, models, practice and algo-trading - page 2032
For the forest, it is possible to see the importance and the clusters. In CatBoost it is probably plot_tree.
I will prepare the data and post it.
I made a test version with 6 columns, it took 11 GB. Notepad++ couldn't open it, says the file is too big. DB Browser for SQLite has been hanging for about 20 minutes.
Show me a picture of what tree clusters look like, I don't understand what we are talking about yet.
Why open it? :) I'm just making a mini copy with a similar structure for debugging.
I wonder how they train trees without taking all the data into memory. If the table is 6 gigabytes, then about 6 gigabytes of memory should be used. The tree needs to sort each column as a whole. If you don't take everything into memory, but read the data from disk every time, it will be too slow.
The only option is to keep the data in memory as float instead of double, but that reduces accuracy. For us, with 5 digits of precision, it may not be too bad, but CatBoost is universal software, and I think physical and mathematical problems should be solved in double precision.
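Just to put numbers on the float-versus-double point, here is a quick memory-footprint check (a numpy sketch; the table size is made up):

```python
import numpy as np

rows, cols = 20_000_000, 6  # hypothetical table of ~120 million values

as_double = np.zeros((rows, cols), dtype=np.float64)  # 8 bytes per value
as_float = np.zeros((rows, cols), dtype=np.float32)   # 4 bytes per value

print(as_double.nbytes / 2**30)  # ~0.89 GiB
print(as_float.nbytes / 2**30)   # ~0.45 GiB: half the memory, ~7 significant digits
```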
I have not studied the code, but thinking about it logically, CatBoost probably converts the sample data table and stores exactly that converted version, which compresses well.
The conversion happens on a quantization grid for each predictor (6 algorithms are available): out of, say, 1000 different values of a variable only 32 variants remain (whichever quantization interval each value falls into), and such a data vector compresses easily; besides, the numbers are integers only (judging by the limit on grid size, the type is ushort, i.e. 2 bytes per number), while the quantization table simply sits in memory and is used later when the model is built in code. That already cuts the size considerably, and the volume can be reduced further by taking only a subset of the predictors for evaluation, which is what is recommended for large samples; the randomization algorithm then lets other trees use the predictors that did not make it into the "bag" right away, so training is also faster, but the model ends up with more trees. Surely there are other tricks, but the main one is quantization.
Quantization in general deserves separate attention here: ideally a grid should be picked for each predictor individually and fed to training together with the data, and the algorithm allows this.
Therefore the sample itself can be pre-quantized, and it will compress well.
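As a rough illustration of that idea, here is a minimal sketch (plain numpy, not CatBoost's actual code) of quantizing one predictor into 32 bins and keeping only the bin indices instead of the raw doubles:

```python
import numpy as np

# A hypothetical predictor column stored as 8-byte doubles.
x = np.random.normal(size=100_000).astype(np.float64)

# Build a 31-border grid (32 bins) by quantiles; CatBoost offers
# several different border-selection algorithms for this step.
borders = np.quantile(x, np.linspace(0, 1, 33)[1:-1])

# Replace each value by its bin index: integers 0..31 fit into one byte
# (ushort, 2 bytes, would allow grids of up to 65535 bins).
codes = np.digitize(x, borders).astype(np.uint8)

print(x.nbytes, codes.nbytes)  # 800000 vs 100000 bytes, 8x smaller
```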
Now I understand what this grid is used for: it turns out it's for compression. And it's faster to sort.
Right, I remembered: it seems to use a default grid of 1024 split variants. If you replace all the data with split numbers, you can store them in ushort format, which is 2 bytes instead of 8, i.e. 4x compression. That's probably why you used 2 GB instead of 6.
The default is 254, but I don't remember how much I set then. Maybe less than 2 GB of memory was consumed - I remember being very surprised that it was so little.
Anyway, this approach allows the data to be compressed significantly, even the sample itself. You can't do that with neural networks.
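For reference, both the grid size and the border-selection algorithm are exposed as ordinary CatBoost training parameters; a minimal sketch (border_count=254 matches the default mentioned above, feature_border_type is just one of the available grid algorithms):

```python
from catboost import CatBoostClassifier

# border_count sets the number of quantization borders per float feature
# (254 by default, up to 65535); feature_border_type picks one of the
# available border-selection algorithms.
model = CatBoostClassifier(
    iterations=500,
    border_count=254,
    feature_border_type='Median',
)
# model.fit(X_train, y_train)  # X_train / y_train stand for the actual sample
```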
You could use uchar then, it's 1 byte.
The grid lets the model fit the data less, because a whole range of values is used. But that won't always be good: to catch theoretical levels I think you have to build the grid yourself.
The maximum value of ushort is 65535 - if you set a grid that size, you don't have to bother with it manually.
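If I remember correctly, CatBoost also lets the grid be overridden per feature, which is what "building the grid yourself" looks like in practice; a sketch (treat the per_float_feature_quantization syntax as an assumption to check against the docs):

```python
from catboost import CatBoostRegressor

# Keep the default grid for most features, but ask for a much finer
# 1024-border grid on feature 0, where splits should land near
# specific price levels.
model = CatBoostRegressor(
    iterations=500,
    border_count=254,
    per_float_feature_quantization=['0:border_count=1024'],
)
# model.fit(X_train, y_train)  # placeholders for the real sample
```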
https://www.mql5.com/ru/articles/8385
It's not a given that the implementation there is good )
I'll pass on the Russian one.
I read this one))) it has a lot of errors in the calculations, and the network gives pretty much random answers.
The maximum grid size there is 65535, but I can't influence the variable types in the code.
And about the maximum size - no, it doesn't guarantee the result, because the model may fit to one band of the data and skip nearby values.
In general it would be nice to have a learning algorithm that always checks whether it makes sense to close a split range (A>10 && A<=15), but usually that happens randomly - there is no such mandatory condition, although it is sometimes reproduced.
Do you want to write the network yourself?
Here there's a minimum of words and a maximum of Python code, though it's also in English.
https://datascience-enthusiast.com/DL/Building_a_Recurrent_Neural_Network-Step_by_Step_v1.html
Yes, I want to do it myself), because in all the articles and examples the activation functions are not calculated correctly). For example, the sigmoid is calculated everywhere as 1/1+exp(-x), while it needs to be 1.0/1.0+exp(-x). It seems to be the same thing, but the terminal gives different calculations); check whether you get the same values), hence the error.
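Taking those formulas literally, what actually changes the result in most languages is the parentheses rather than the literals; a quick check in Python (just an illustration of the precedence pitfall, not the MQL5 terminal behaviour described above):

```python
from math import exp

x = 2.0

# Written without parentheses, 1/1+exp(-x) parses as (1/1) + exp(-x).
no_parens = 1 / 1 + exp(-x)

# The usual logistic sigmoid needs the denominator grouped explicitly.
sigmoid = 1.0 / (1.0 + exp(-x))

print(no_parens)  # ~1.1353
print(sigmoid)    # ~0.8808
```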