Machine learning in trading: theory, models, practice and algo-trading - page 1194

 
And then, the learning algorithm is tuned to split the classes at 0.5 (that is what logloss targets), so it seems logical that the main cluster sits there.
 
Aleksey Vyazmikin:
And then, the learning algorithm is tuned to split the classes at 0.5 (that is what logloss targets), so it seems logical that the main cluster sits there.

Logloss is almost useless to look at; it is a metric that tells you nothing about how the classes are actually separated.

 
Maxim Dmitrievsky:

The higher the probability of the event, the more accurate the signal; that more or less follows from the definition :) You won't get two humps on noisy data, but the model should at least capture the extreme values properly, otherwise it is never confident about its inputs at all.

I think it is not that clear-cut; you have to look at the learning (loss) function... because the probability is actually computed afterwards, from its values (that is how it works in the model's algorithm).

So far the facts tell me that a smeared model is simply unsure, and I have not yet come across a dip in the middle...

Maxim Dmitrievsky:

Logloss is almost useless to look at; it is a metric that tells you nothing about how the classes are actually separated.

But the gradient descent runs on it...
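(Just to illustrate the point: a minimal Python sketch, with made-up numbers, of logloss and its gradient with respect to the raw score - in gradient boosting each new tree is fitted against exactly this residual, which is why the whole procedure revolves around logloss.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logloss(y, p, eps=1e-15):
    # Binary cross-entropy averaged over the sample
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Made-up labels and raw scores (margins) from a boosted model
y   = np.array([1, 0, 1, 1, 0])
raw = np.array([2.0, -1.5, 0.1, 3.0, -0.2])
p   = sigmoid(raw)

print("logloss :", logloss(y, p))
# The gradient of logloss w.r.t. the raw score is simply (p - y);
# each new tree in gradient boosting is fitted against this residual.
print("gradient:", p - y)
```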
 
Aleksey Vyazmikin:

I think it is not that clear-cut; you have to look at the learning (loss) function... because the probability is actually computed afterwards, from its values (that is how it works in the model's algorithm).

So far the facts tell me that a smeared model is simply unsure, and I have not yet come across a dip in the middle...

I don't understand the terminology - what is the learning function? Is there a softmax at the end, or what?

I don't know about the dip, but an unstable model definitely won't work on new data, while a smeared one will, if you set a probability threshold.
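(Roughly what "set a probability threshold" means in practice - a minimal Python sketch on synthetic data; the classifier and the 0.7 cut-off here are just placeholders for illustration.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for real features and labels
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)[:, 1]

THRESHOLD = 0.7                  # only act on the confident predictions
signals = proba >= THRESHOLD
print(f"{signals.sum()} of {len(proba)} samples pass the {THRESHOLD} threshold")
```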

 
Maxim Dmitrievsky:

I don't understand the terminology - what is the learning function? Is there a softmax at the end, or what?

The model is evaluated by logloss, and all the gradient boosting work is aimed at improving this function. The model itself produces raw values, which then have to be transformed through the logistic function. That is why I suspect that probability is not such a simple thing in this method...
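(A small illustration of that transformation, assuming the usual logistic function: equal steps in the raw score do not give equal steps in probability, which is exactly why the "probabilities" should not be read too literally.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Equal steps on the raw (logit) scale give very unequal steps in probability
for z in [-4.0, -2.0, 0.0, 2.0, 4.0]:
    print(f"raw score {z:+.1f} -> probability {sigmoid(z):.3f}")
# Near zero the curve is steep (a small margin moves the probability a lot);
# in the tails it saturates, so extreme probabilities require very large margins.
```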

 
Aleksey Vyazmikin:

The model is evaluated by logloss, and all the gradient boosting work is aimed at improving this function. The model itself produces raw values, which then have to be transformed through the logistic function. That is why I suspect that probability is not such a simple thing in this method...

The function has a min and a max, and they will in any case be at the edges of the logit... if they are not there, it is underfitting or something else (it always happens to me with underfitting, e.g. too few neurons or trees), along with a large classification error and logloss.

 
Maxim Dmitrievsky:

The function has a min and a max, and they will in any case be at the edges of the logit... if they are not there, it is underfitting or something else (it always happens to me with underfitting, e.g. too few neurons or trees), along with a large classification error and logloss.

I am talking about the coefficients the model outputs https://en.wikipedia.org/wiki/Logit - the distribution there is not linear.

It seems to me that underfitting is better than overfitting, especially if you focus on class 1 and take a high percentage of correctly classified targets among those that get classified; then you can combine models by limiting each one's range of application.

 
Aleksey Vyazmikin:

I am talking about the coefficients the model outputs https://en.wikipedia.org/wiki/Logit - the distribution there is not linear.

It seems to me that underfitting is better than overfitting, especially if you focus on class 1 and take a high percentage of correctly classified targets among those that get classified; then you can combine models by limiting each one's range of application.

To cut a long story short... I repeat: you have to train properly, so that there are no humps (overfitting) and no cropped tails (underfitting).

The red curve looks more or less normal, in my opinion.

And underfitting is no use at all... everything sits around 0.5.

The bias could be pulled up with Bayes, with conditional probabilities, while the model is already running. I haven't worked out exactly how yet, but intuitively there is some untapped power in it.

Bayesian models can keep learning... what if you just bolt a Bayesian add-on onto the model so that you don't have to retrain it often... I haven't figured it out yet.
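(For reference, the shapes being discussed - humps at the edges, a smear around 0.5, a broad single hump - are just the histogram of out-of-sample predicted probabilities; a quick Python sketch with synthetic beta-distributed numbers standing in for real model output.)

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Synthetic probabilities imitating three typical shapes
overfit  = np.concatenate([rng.beta(0.3, 5, 500), rng.beta(5, 0.3, 500)])  # humps at 0 and 1
underfit = rng.beta(20, 20, 1000)                                          # narrow spike near 0.5
moderate = rng.beta(2, 2, 1000)                                            # broad single hump

for name, p in [("overfit", overfit), ("underfit", underfit), ("moderate", moderate)]:
    plt.hist(p, bins=40, alpha=0.5, density=True, label=name)
plt.xlabel("predicted probability of class 1")
plt.legend()
plt.show()
```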

 
Maxim Dmitrievsky:

To cut a long story short... I repeat: you have to train properly, so that there are no humps (overfitting) and no cropped tails (underfitting).

The red curve looks more or less normal, in my opinion.

And underfitting is no use at all... everything sits around 0.5.

The bias could be pulled up with Bayes, with conditional probabilities, while the model is already running. I haven't worked out exactly how yet, but intuitively there is some untapped power in it.

Bayesian models can keep learning... what if you just bolt a Bayesian add-on onto the model so that you don't have to retrain it often... I haven't figured it out yet.

Yes, I like the red one better too - like a normal distribution and all that, but so far, across 512 models, that distribution loses by eye... Soon there will be many more models, around 100,000 - I'll see what they show... Theory and practice sometimes don't add up, so I have to adapt, otherwise I could end up with nothing to live on...

CatBoost is Bayesian, as it happens, and supports continued training, but I don't know - endlessly adding trees looks like curve-fitting...
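(If I remember the API correctly, recent CatBoost versions let you continue training by passing an already fitted model via init_model in fit(); a rough sketch on synthetic data - all the parameters here are arbitrary.)

```python
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# First stage: a small model
base = CatBoostClassifier(iterations=200, depth=4, verbose=False)
base.fit(X, y)

# Second stage: continue from the existing trees on newer data
X_new = rng.normal(size=(300, 5))
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)

extra = CatBoostClassifier(iterations=100, depth=4, verbose=False)
extra.fit(X_new, y_new, init_model=base)   # appends 100 trees to the 200 already built

print(extra.tree_count_)   # should report the combined tree count, if I recall correctly
```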

 
Maxim Dmitrievsky:

Adding trees without reorganising the whole structure is somehow strange... or maybe it's fine, hard to say... for some short horizon it seems okay - maybe just to shift the centre.

And how else can you do continued training - in boosting, as I understand it, this is the only option. You could, of course, throw away the last third of the model - a third of the trees - and see what comes out when you feed in new data. But I am also thinking about zeroing out leaves with insignificant "probabilities" - cleaning out the noise, so to speak. In general, I am thinking about automating the assembly of ensembles from models: find a good interval of a model's predictive ability, trim the classification to it (for example from 0.7 to 0.8), and set it aside as a blank for combining with other models.
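(A rough sketch of that combination idea: each model only "fires" when its class-1 probability falls inside the interval where it was found to be reliable; the probabilities and intervals below are made up for illustration.)

```python
import numpy as np

def combine_by_interval(probas, intervals):
    """probas: list of class-1 probability arrays, one per model.
    intervals: list of (low, high) ranges where each model was found reliable.
    Returns True where at least one model fires inside its trusted range."""
    signal = np.zeros_like(probas[0], dtype=bool)
    for p, (lo, hi) in zip(probas, intervals):
        signal |= (p >= lo) & (p <= hi)
    return signal

# Made-up predictions from two models and the ranges chosen for them
p1 = np.array([0.55, 0.72, 0.91, 0.40, 0.76])
p2 = np.array([0.66, 0.52, 0.83, 0.78, 0.60])
print(combine_by_interval([p1, p2], [(0.70, 0.80), (0.75, 0.90)]))
# -> [False  True  True  True  True]
```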