Machine learning in trading: theory, models, practice and algo-trading - page 1194
And then the learning algorithm is tuned to split the classes at 0.5, so it's fairly logical that the main cluster sits there.
Logloss is almost useless to look at; it's a metric that tells you nothing about how well the classes are separated.
The higher the predicted probability of an event, the more accurate the signal; that pretty much follows from the definition :) You won't get two humps on noisy data, but the model should at least capture the extreme values properly, otherwise it is never confident about its entries at all.
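As a rough, hypothetical illustration of the "higher probability means more accurate signal" point, the sketch below bins predictions by predicted probability and measures the hit rate in each bin; the y_true and proba arrays are placeholders for your own test labels and model outputs.

```python
# Minimal sketch: check whether higher predicted probability really gives a
# more accurate signal by binning predictions and measuring the hit rate.
import numpy as np

y_true = np.random.randint(0, 2, 1000)        # placeholder test labels
proba = np.random.rand(1000)                  # placeholder predicted P(class = 1)

bins = np.linspace(0.5, 1.0, 6)               # confidence bins for "buy" signals
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (proba >= lo) & (proba < hi)
    if mask.any():
        hit_rate = (y_true[mask] == 1).mean()
        print(f"P in [{lo:.1f}, {hi:.1f}): {mask.sum():4d} signals, hit rate {hit_rate:.2f}")
```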
I think it's not that clear-cut, you have to take the learning (loss) function into account... because the probability is actually computed from its output values (that's how the model's algorithm works).
So far the facts tell me that a smeared model is simply unsure, and I have never seen a dip in the center of the distribution...
I don't understand the terminology: what is the learning function? Is there a softmax at the end or what?
I don't know about the dip, but an unstable model definitely won't work on new data, while a smeared one will, if you set a probability threshold.
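For what it's worth, "set a probability threshold" can be as simple as the sketch below; the proba array and the 0.65 cut-off are made-up examples.

```python
# Minimal sketch: trade only when the smeared model is confident enough.
import numpy as np

proba = np.array([0.48, 0.51, 0.63, 0.71, 0.55, 0.82])  # hypothetical P(class = 1)
threshold = 0.65                                         # example cut-off

signals = (proba >= threshold).astype(int)   # 1 = take the trade, 0 = stay out
print(signals)                               # [0 0 0 1 0 1]
```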
The model is evaluated by logloss, and all the gradient boosting steps are aimed at improving that function. The model itself produces raw values that have to be transformed through the logistic function. That's why I suspect that probability is not so simple in this method...
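To make the previous post concrete: a boosting classifier outputs a raw margin (log-odds), and the "probability" is just the logistic function applied to it. A minimal sketch, not tied to any particular library's internals:

```python
# Minimal sketch: raw boosting output (log-odds margin) -> "probability".
import numpy as np

def logistic(raw_margin):
    # p = 1 / (1 + exp(-margin)); a margin of 0 maps to p = 0.5
    return 1.0 / (1.0 + np.exp(-raw_margin))

raw = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])  # hypothetical raw model outputs
print(logistic(raw))                         # ~[0.018 0.269 0.5 0.731 0.982]
```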
The function has a min and a max, and they should end up at the edges of the logit range... if they are not there, it's underfitting or something similar (it always happens to me when the model is underfitted, e.g. too few neurons or trees), together with a big classification error and a high logloss.
I mean the coefficients the model outputs, https://en.wikipedia.org/wiki/Logit - the distribution there is not linear.
It seems to me that underfitting is better than overfitting, especially if you focus on class 1 and take the range with a large percentage of correctly classified targets, and then you can combine models by limiting each one's range of application.
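On the "not linear" point, a small numeric sketch shows how equal steps in probability map to very unequal steps on the logit scale, which is why raw values pile up towards the edges:

```python
# Minimal sketch: the logit transform is strongly non-linear near 0 and 1.
import numpy as np

p = np.array([0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99])
logit = np.log(p / (1.0 - p))
for pi, li in zip(p, logit):
    print(f"p = {pi:4.2f} -> logit = {li:+6.2f}")
# the step from 0.9 to 0.99 spans roughly as much logit range as from 0.5 to 0.9
```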
To cut a long story short... I'll say it again: you have to train properly, so that there are no humps (overfitting) and no cropped tails (underfitting).
The red curve looks more or less normal, in my opinion.
And the underfitted one is no good at all... everything sits in the neighborhood of 0.5.
The bias could be pulled out with Bayes, through conditional probabilities, while the model is running. I haven't worked out how yet, but intuitively there is some untapped power in it.
Bayesian models can keep learning incrementally... what if you just put a Bayesian add-on on top of the model so that it doesn't have to be retrained as often... I haven't figured it out yet.
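One simple way to check for the humps (overfit) and cropped tails (underfit) mentioned above is to look at the histogram of the model's predicted probabilities; in the sketch below the proba array is a placeholder for something like model.predict_proba(X)[:, 1].

```python
# Minimal sketch: inspect the shape of the predicted-probability distribution.
# Spikes at the extremes may point to overfitting; everything squeezed around
# 0.5 with cropped tails points to underfitting.
import numpy as np
import matplotlib.pyplot as plt

proba = np.random.beta(2, 2, 10000)   # placeholder for model.predict_proba(X)[:, 1]

plt.hist(proba, bins=50, range=(0.0, 1.0))
plt.xlabel("predicted P(class = 1)")
plt.ylabel("count")
plt.title("Distribution of model outputs")
plt.show()
```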
Yes, I like the red one better too, nice normal-looking distribution and all that, but so far, across 512 models, that kind of distribution loses on visual inspection... Soon there will be many models, around 100,000; I'll see what they show... Theory and practice don't always agree, so I have to adapt, or I'll be left with nothing...
CatBoost is Bayesian, as it happens, and supports continued training, but I don't know, endlessly adding trees looks like curve fitting...
Adding trees without rebuilding the whole structure seems strange somehow... or maybe it's fine, hard to say... for a short horizon it's probably okay, maybe just to shift the center.
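On the continued-training point: if I'm not mistaken, recent catboost versions accept an init_model argument in fit(), which adds new trees on top of an already trained model instead of rebuilding the structure. A hedged sketch with placeholder data:

```python
# Sketch of continued training in CatBoost via init_model: new trees are added
# on top of the existing ones, the old structure is not rebuilt. Data is random
# placeholder material, not a real dataset.
import numpy as np
from catboost import CatBoostClassifier

X_old, y_old = np.random.rand(500, 10), np.random.randint(0, 2, 500)
X_new, y_new = np.random.rand(200, 10), np.random.randint(0, 2, 200)

base = CatBoostClassifier(iterations=200, loss_function="Logloss", verbose=False)
base.fit(X_old, y_old)

# continue boosting on newer data, starting from the already built trees
updated = CatBoostClassifier(iterations=100, loss_function="Logloss", verbose=False)
updated.fit(X_new, y_new, init_model=base)

print(base.tree_count_, updated.tree_count_)   # e.g. 200 and 300
```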
And how else can you continue training? In boosting, as I understand it, that is the only option. You could, of course, throw away the last third of the model, a third of the trees, and see what comes out when you feed in new data. But I'm also thinking about zeroing out leaves with unimportant "probabilities", cleaning out the noise, so to speak. In general, I'm thinking about automating the assembly of ensembles from models: find a good interval of a model's predictive ability, restrict classification to it (for example from 0.7 to 0.8), and keep it as a blank for combining with other models.
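A rough sketch of the "blank for combining models" idea: each model only votes inside its own trusted probability interval (the 0.7 to 0.8 band from the post is used as an example; the models and their predictions are hypothetical).

```python
# Minimal sketch: each model is trusted only inside its own probability band;
# outside that band its vote is ignored. Bands and predictions are made up.
import numpy as np

def band_signal(proba, lo, hi):
    # 1 = take the signal, 0 = the model abstains outside its trusted interval
    return ((proba >= lo) & (proba <= hi)).astype(int)

proba_a = np.array([0.55, 0.72, 0.78, 0.91, 0.40])   # model A predictions
proba_b = np.array([0.66, 0.61, 0.75, 0.52, 0.83])   # model B predictions

sig_a = band_signal(proba_a, 0.70, 0.80)   # model A trusted on [0.7, 0.8]
sig_b = band_signal(proba_b, 0.60, 0.70)   # model B trusted on [0.6, 0.7]

combined = np.maximum(sig_a, sig_b)        # trade if either model fires in its band
print(sig_a, sig_b, combined)
```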