Machine learning in trading: theory, models, practice and algo-trading - page 3283
I wonder: what if we compute the correlation matrix ourselves, and compute the same matrix with the fast AlgLib function PearsonCorrM - which would be faster?
PearsonCorrM is 40-50 times faster than AlgLib's row-by-row algorithm; even a fast homemade algorithm is unlikely to close such a gap.
Here the homemade version is about twice as slow as PearsonCorrM.
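I can't reproduce the MQL5/AlgLib call here, but the same kind of gap shows up in Python: a row-by-row loop of pairwise correlations versus one vectorised whole-matrix call. A minimal sketch, assuming NumPy; the array sizes are arbitrary:

```python
import time
import numpy as np

def pearson_rowwise(data):
    """Naive row-by-row pairwise Pearson correlation: one call per pair."""
    n = data.shape[0]
    corr = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            corr[i, j] = np.corrcoef(data[i], data[j])[0, 1]
    return corr

rng = np.random.default_rng(0)
data = rng.standard_normal((60, 300))  # 60 series of length 300

t0 = time.perf_counter()
slow = pearson_rowwise(data)
t_slow = time.perf_counter() - t0

t0 = time.perf_counter()
fast = np.corrcoef(data)  # whole correlation matrix in one vectorised call
t_fast = time.perf_counter() - t0

print(np.allclose(slow, fast))  # identical results, very different timings
```

The vectorised version wins because the means, variances and dot products are computed once for the whole matrix instead of being recomputed for every pair.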
I compared the speed of CatBoost training in Python and via the command line:
- 20% faster from startup to saving the models, including reading the samples
- 12% faster for the training process itself
Tested on the same model - the training result is identical.
As expected, the command line is faster.
Do you still use the command line to run EXEs?
You can run them via WinExec and even optimise them in the tester.
I haven't tried CatBoost, but I think something can be worked out.
Another possible problem: if the model file is huge, it may take a long time to write, i.e. the file already exists but is not fully written yet. You may need to add a delay, or check that the file is not locked for reading by the writing process.
The hardest part is detecting the moment training has finished. I do it by checking once per second for the appearance of the model file in the folder agent_dir+"\model.bin". Where CatBoost puts its model file I don't know - it may have to be looked for elsewhere.
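The "file exists but is still being written" problem can be handled with a size-stability heuristic: consider the file ready only after its size stops changing between polls. A sketch of that idea in Python; the path, poll interval and thresholds are my own assumptions, and a stable size is a heuristic, not a guarantee that the writer is done:

```python
import os
import time

def wait_for_model(path, poll=1.0, stable_checks=2, timeout=3600):
    """Wait until `path` appears AND its size stops changing.

    Returns True once the size has been identical for `stable_checks`
    consecutive polls (the writer has most likely finished),
    or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    last_size, stable = -1, 0
    while time.monotonic() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            if size == last_size and size > 0:
                stable += 1
                if stable >= stable_checks:
                    return True
            else:
                stable = 0  # still growing - restart the stability count
            last_size = size
        time.sleep(poll)
    return False
```

On Windows an alternative is simply trying to open the file for exclusive access, since a writer that still holds the file will usually make that fail.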
I run bat files - so far that is the most convenient option for my current tasks.
For now there is only one task that needs training launched automatically after a sample is obtained - exploratory search. I plan to make an automatic bat file for it later.
Your solution is interesting - I'll have a look, maybe I'll adopt it.
But I think it is possible to get a completion response from the executable programme itself - that is the approach I wanted to use.
As for the file option: if you launch things from a bat file, then again, you can simply create a new file at the end of the task.
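The "create a new file at the end of the task" idea is a sentinel file: the launcher creates a marker only after the training command has exited, so whoever polls the folder never sees a half-written state. A minimal sketch, assuming Python as the launcher; the function and file names are hypothetical:

```python
import pathlib
import subprocess

def run_and_mark(cmd, done_file):
    """Run a command and create a sentinel file once it has exited.

    A watcher (e.g. an EA polling the folder) only needs to check for
    `done_file`; its mere existence means the whole task is finished.
    """
    done = pathlib.Path(done_file)
    done.unlink(missing_ok=True)             # clear any stale marker first
    result = subprocess.run(cmd)             # blocks until the process exits
    done.write_text(str(result.returncode))  # marker also carries the exit code
    return result.returncode
```

Because the marker is written only after `subprocess.run` returns, there is no window where the model file exists but training is still in progress.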
Maybe the relative success of the -1 and 0 variants lies in the size of the train sample, and it should be reduced? In any case, Recall seems to react to this.
In your opinion, should the results of such combinations be comparable to each other in our case? Or is the data irretrievably outdated?
I split the train sample into 3 parts, trained 3 models for variant -1 and for variant 0, and also trained on the original train as three samples.
This is what I got.
I made this generalisation: PR = (Precision - 0.5) * Recall
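For reference, the PR formula above is trivial to compute from a confusion matrix. A sketch, assuming binary classification; the helper name and the example counts are mine, not from the post:

```python
def pr_score(tp, fp, fn):
    """Combined metric PR = (Precision - 0.5) * Recall from the post.

    It is positive only when precision beats a coin flip (0.5),
    scaled by how much of the positive class the model captures.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (precision - 0.5) * recall

score = pr_score(tp=60, fp=40, fn=20)  # precision 0.6, recall 0.75, PR ≈ 0.075
```

One property worth noting: a model with precision below 0.5 gets a negative PR, so the metric penalises worse-than-random entries rather than merely scoring them low.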
It seems the training succeeds mainly thanks to the 2nd and 3rd parts of the train sample - I have now combined them and run the training - let's see what happens.
Still, this looks like a decent method for estimating how random the training is. Ideally, training should be relatively successful on each segment; otherwise there is no guarantee that the model won't simply stop working tomorrow.
And here are the results - the last two columns.
Indeed, the results improved. We can assume that the larger the sample, the better the training result.
We should try training on the 1st and 2nd parts of the training sample - if the results are not much worse than on the 2nd and 3rd parts, then the freshness of the sample can be considered a less significant factor than its volume.
I have written many times about the "predictive power of predictors", which is calculated as the distance between two vectors.
I came across a list of tools for calculating distances:
This is in addition to the standard package, which has its own set of distances.
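The post's list of distance tools is not shown, but the "distance between two vectors" idea is easy to illustrate. A sketch of a few common distances, assuming NumPy; the selection of metrics is mine, not the poster's list:

```python
import numpy as np

def distances(u, v):
    """A few common vector distances (illustrative choices only)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    euclidean = np.linalg.norm(u - v)
    # cosine distance: 1 - cosine similarity of the raw vectors
    cosine = 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # correlation distance: cosine distance of the mean-centred vectors
    du, dv = u - u.mean(), v - v.mean()
    correlation = 1.0 - np.dot(du, dv) / (np.linalg.norm(du) * np.linalg.norm(dv))
    return {"euclidean": euclidean, "cosine": cosine, "correlation": correlation}
```

Identical vectors give zero for all three; which metric is appropriate depends on whether scale and offset of the predictor should matter.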
Can you show me an example of how to use it?
Cuto, so we're back to not knowing mathematical statistics.
This is your moment - shine with your knowledge and expose the ignoramus!
And seriously, you'd better think about the results - I publish them for free, although it doesn't cost me much.