Machine learning in trading: theory, models, practice and algo-trading - page 552

 
Maxim Dmitrievsky:
so I don't know what to believe in this life...everything has to be double-checked

+1 - I first did it without removing the outliers, and got strong shifts of the center (or zero) of the data. After removing the outliers, everything became more stable.

 
Maxim Dmitrievsky:

So it's no longer jPredictor at last? :)


Why exactly that one?... Because it builds better models than all those libraries and so on; it shakes the data up thoroughly...

 
Mihail Marchukajtes:

Why exactly that one?... Because it builds better models than all those libraries and so on; it shakes the data up thoroughly...


but it takes a long time... :)

 
elibrarius:

+1 - I first did it without removing the outliers, and got strong shifts of the center (or zero) of the data. After removing the outliers, everything became more stable.


Well, yes, there is sometimes a problem with the shift because of normalization, so I pass all the inputs through logarithms of increments, and there the center will always be at zero if they need to be normalized.

But in general, the outliers are removed mostly so that the neural network does not shift its weights too much on the anomalies... and random forests, for example, do not care about them at all.
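For illustration, a minimal MQL5 sketch of such a series of log increments might look like this (the function name, the lookback n and the newest-first indexing are only assumptions, not code from the thread):

void LogIncrements(const double &close[],const int n,double &out[])
  {
   int count=ArraySize(close)-n;
   if(count<=0) return;
   ArrayResize(out,count);
   for(int i=0;i<count;i++)
      out[i]=MathLog(close[i]/close[i+n]);   // log increment, naturally centered near zero
  }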

 
Maxim Dmitrievsky:

But it takes a long time... :)


That's why I statistically reduce the number of inputs, and then training takes a reasonable amount of time. I've done the model plus 4th-level boosting in one day. Now I'll make the rollback model, and it should work for weeks. At least I hope so...

 
Maxim Dmitrievsky:

Well, yes, there is sometimes a problem with the shift because of normalization, so I pass all the inputs through logarithms of increments, and there the center will always be at zero if they need to be normalized.

But in general, the outliers are removed mostly so that the weights are not shifted on the anomalies... and random forests, for example, do not care about them at all.

Well, in fact the logarithm does something similar, only it doesn't discard strong outliers, it pulls them closer. However, I don't discard them either, I clip them to the maximum: if(v>max){v=max;}

In general, I want to define in advance a valid range for each predictor and conduct all my experiments within it, because even with my method and with the logarithm there will be some shift in the data from sample to sample.
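As a minimal sketch (the function name and passing the range as parameters are assumptions, not code from the thread), clipping a predictor to a pre-defined valid range could look like this in MQL5:

void ClipToRange(double &a[],const double lo,const double hi)
  {
   for(int i=0;i<ArraySize(a);i++)
     {
      if(a[i]>hi) a[i]=hi;   // same idea as if(v>max){v=max;}
      if(a[i]<lo) a[i]=lo;   // and symmetrically for the lower bound
     }
  }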

 
elibrarius:
Well, in fact the logarithm does something similar, only it doesn't discard strong outliers, it pulls them closer. However, I don't discard them either, I clip them to the maximum: if(v>max){v=max;}

Oh yeah, that's right... I guess)

I first take some series like log(close[0]/close[n])

and then

void NormalizeArrays(double &a[]) // normalize to the range -1..1
  { 
   double multiplier;
   double x_min=MathAbs(a[ArrayMinimum(a)]);   // |minimum| of the series
   double x_max=MathAbs(a[ArrayMaximum(a)]);   // |maximum| of the series
   if(x_min>=x_max) multiplier = x_min;        // scale by the larger absolute value
     else multiplier = x_max;
   for(int i=0;i<ArraySize(a);i++)
     {
      a[i]=a[i]/multiplier;                    // the center (zero) stays at zero
     }
  }
And if you take small samples, then you can define the maximal value in advance on the large sample, so it won't change.
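A hypothetical variant of the same idea (the name NormalizeArraysFixed and its parameter are only an illustration): compute the multiplier once on the large sample and reuse it, so small samples share the same scale and the zero center does not float:

void NormalizeArraysFixed(double &a[],const double multiplier)
  {
   if(multiplier<=0.0) return;                 // guard against a degenerate scale
   for(int i=0;i<ArraySize(a);i++)
      a[i]=a[i]/multiplier;                    // multiplier fixed in advance on the large sample
  }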
 
Maxim Dmitrievsky:

Oh yeah, that's right... I guess)

I first take some series like log(close[0]/close[n])

and then

If min and max are not mirrored (e.g. -100 and 90), then the normalization will be, for example, from -1 to 0.9, but the center will always stay at 0. That's an interesting approach for dealing with offsets.

And apparently you have to take

a[i]=a[i]/MathAbs(multiplier);

Otherwise, a negative min will turn everything upside down.

 
elibrarius:

If min and max are not mirrored (e.g. -100 and 90), then the normalization will be, for example, from -1 to 0.9, but the center will always stay at 0. That's an interesting approach for dealing with offsets.

And apparently we should.

Otherwise a negative min will turn everything upside down.


Yes, the main thing is not to shift the center.

The MathAbs for max and min is already there.

double x_min=MathAbs(a[ArrayMinimum(a)]);
double x_max=MathAbs(a[ArrayMaximum(a)]);
 
Maxim Dmitrievsky:

The MathAbs for max and min is already there.

Missed)

Another point: if you take, say, not 0 but 0.5 as the center, then even with your method it will "float" from sample to sample.

Only manually setting a range for each input will help, but it's not clear how to determine it. Maybe, for example, run the data for a year and discard 1-5% of the outliers, then work with those ranges during the year. Although in a year they will change.
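A sketch of that idea (the function name and the trimming fraction are assumptions, not code from the thread): sort a year of values for a predictor, trim a given fraction from each tail, and use the remaining bounds as the working range:

void PercentileRange(const double &a[],const double pct,double &lo,double &hi)
  {
   double sorted[];
   int n=ArrayCopy(sorted,a);                 // work on a copy
   if(n<=0) return;
   ArraySort(sorted);                         // ascending order
   int k=(int)MathFloor(pct*n);               // values to trim per tail, e.g. pct=0.01..0.05
   if(2*k>=n) k=0;                            // degenerate case: keep the full range
   lo=sorted[k];
   hi=sorted[n-1-k];
  }

The resulting lo/hi could then be fed to something like the ClipToRange sketch above.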