Machine learning in trading: theory, models, practice and algo-trading - page 2749

 
Maxim Dmitrievsky #:

added in the previous post

#299

Someone deleted my posts where I posted the video... I don't know how that could happen.

 
mytarmailS #:

Someone deleted my posts where I posted the video... I don't know how that could happen.

Well, well, you're showing off a lot, I guess.

Here's something on tree models.

#4469

#4467

And about CatBoost - search who started that topic 😄

Machine learning in trading: theory, models, practice and algo-trading - I keep retraining it every week
  • 2017.07.12
  • www.mql5.com
The most interesting thing is that I don't understand why it opens trades in one direction or the other. I'm experimenting with predictors and different ways of opening positions. Especially since it copies the working mechanism of real neurons very primitively, not the way it actually happens in the brain. The only NN approach that works well and looks promising is convolutional NNs for recognising all kinds of patterns.
 
Maxim Dmitrievsky #:

Well, well, you're showing off a lot, I guess.

Here's something on tree models.

#4469

#4467

And about CatBoost - search who started that topic 😄

Look, trees are no argument at all... 95 per cent of ML tutorials start with trees; it's the first thing a beginner learns, that's no credit to you.

 
mytarmailS #:

Look, trees are no argument at all... 95% of ML tutorials start with trees; it's the first thing a beginner learns, that's no credit to you.

It was mostly NNs discussed here before, and I wondered why not trees. I ran tests, and it turned out they're no worse. I hadn't seen that information here before, otherwise I wouldn't have asked the question.

Then I switched over to boosting, found the CatBoost lib and started popularising it here.

 
Maxim Dmitrievsky #:

It was mostly NNs discussed here before, and I wondered why not trees. I ran tests, and it turned out they're no worse. I hadn't seen that information here before, otherwise I wouldn't have asked the question.

and trees, by the way, can serve as a worthy alternative to clustering and FS (feature selection) - "2 in 1".
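(A minimal R sketch of that "2 in 1" idea, assuming the randomForest package: variable importance doubles as feature selection, and the proximity matrix can feed a clustering routine. The toy data is mine.)

library(randomForest)

# toy data: 200 rows, 10 noisy features, target depends on the first two
X <- matrix(rnorm(200 * 10), ncol = 10)
y <- factor(ifelse(X[, 1] + X[, 2] > 0, "up", "down"))

rf <- randomForest(X, y, importance = TRUE, proximity = TRUE)

importance(rf)                           # FS: rank predictors by importance
hc <- hclust(as.dist(1 - rf$proximity))  # clustering on tree proximities
plot(hc)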

Maxim Dmitrievsky #:

Then I switched to boosting, found the CatBoost lib and started popularising it here.

on the algorithm: mathematically it adds a 2nd derivative (or, statistically, residual averaging) -- BUT how does it help you personally in training, and in what cases?... beyond the standard marketing clichés like "CatBoost will give better and more accurate results"... because point precision is not always what matters; sometimes the generative ability of the model may be more important.
 
JeeyCi #:

and trees, by the way, can serve as a worthy alternative to clustering and FS (feature selection).

on the algorithm: mathematically it adds a 2nd derivative (or, statistically, residual averaging) -- BUT how does it help in training, and in what cases?... beyond the standard marketing clichés like "CatBoost will give better and more accurate results"... because point precision is not always what matters; sometimes the generative ability of the model may be more important?

There are also tree-based models for causal inference; I haven't had time to figure them out yet.

Boosting reduces both bias and variance, whereas a random forest only reduces variance, I think. Those are its proven advantages; you can google them. And the library itself is well developed and convenient to work with.
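(A textbook-level sketch of why, under the usual assumptions; here $\rho$ is the between-tree correlation and $\sigma^2$ the single-tree variance. The "2nd derivative" remark above refers to Newton-style boosting, which also uses the Hessian of $L$.)

$$\operatorname{Var}\Big(\tfrac{1}{B}\textstyle\sum_{b=1}^{B} T_b(x)\Big) = \rho\,\sigma^2 + \tfrac{1-\rho}{B}\,\sigma^2 \quad \text{(forest: averaging shrinks variance, bias stays)}$$

$$F_m(x) = F_{m-1}(x) + \eta\, h_m(x), \qquad h_m \approx -\frac{\partial L(y, F)}{\partial F}\Big|_{F = F_{m-1}} \quad \text{(boosting: each step chips away at the bias)}$$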

It's not quite clear what you mean about generative ones; maybe they are sometimes more important. But generative NNs don't work well on forex, if we're talking about generating synthetic data.

Forest Based Estimators — econml 0.13.1 documentation
  • econml.azurewebsites.net
Orthogonal Random Forests are a combination of causal forests and double machine learning that allow for controlling for a high-dimensional set of confounders , while at the same time estimating non-parametrically the heterogeneous treatment effect , on a lower dimensional set of variables . Moreover, the estimates are asymptotically normal and...
 
Maxim Dmitrievsky #:

if we're talking about generating synthetic data.

no - I meant generalising... yes, I put it wrongly... sorry.

I'm thinking of distinguishing a risk-on/risk-off environment -- still thinking about how to generalise that division... it's all just my own musings (I'm on this forum by accident)...

thanks for the reply!

 
JeeyCi #:

no - I meant generalising... yes, I put it wrongly... sorry.

I'm thinking of distinguishing a risk-on/risk-off environment -- still thinking about how to generalise that division... it's all just my own musings (I'm on this forum by accident)...

thanks for the reply!

try it, CatBoost has a bunch of different features, I like it.

There is early stopping based on the error on a validation sample, plus continued training. Generalisation is no worse than with NNs, where on top of that you have to write your own routines to stop training.

And it trains quickly; you don't have to wait for hours.
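(A minimal sketch of that early stopping with the catboost R package, installed from CatBoost's GitHub releases; parameter names follow the CatBoost docs, and the X/y variables here are placeholders of mine, not code from this thread.)

library(catboost)

train_pool <- catboost.load_pool(data = X_train, label = y_train)
valid_pool <- catboost.load_pool(data = X_valid, label = y_valid)

params <- list(
  loss_function = 'Logloss',
  iterations    = 1000,
  od_type       = 'Iter',   # overfitting detector: watch validation error
  od_wait       = 50        # stop after 50 rounds without improvement
)

model <- catboost.train(train_pool, valid_pool, params = params)
pred  <- catboost.predict(model, valid_pool, prediction_type = 'Probability')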

 
Can a random forest find the maximum in a row of data, i.e. simulate the max() function?

I'm just wondering whether ML can simulate primitive functions from a programming language.

There is a matrix; each row of the matrix is a training example.

head(X)
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    4    1    3    1
[2,]    3    1    4    5    3
[3,]    1    2    4    4    1
[4,]    1    1    5    3    5
[5,]    3    4    1    3    3
[6,]    4    4    5    1    2

We need to find the maximum of each row; the sample size is 20k rows.


Solving the problem through regression:
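(The post doesn't show the model code, so here is a minimal R reconstruction, assuming the randomForest package; the names and the train/test split are my assumptions.)

library(randomForest)

set.seed(42)
n <- 20000
X <- matrix(sample(1:5, n * 5, replace = TRUE), ncol = 5)  # 5 columns, values 1..5
y <- apply(X, 1, max)                                      # target: the row maximum

idx  <- 1:(n - 50)                      # hold out 50 rows as the test
rf   <- randomForest(X[idx, ], y[idx])  # regression forest
pred <- predict(rf, X[-idx, ])

head(data.frame(pred = pred, actual = y[-idx]), 11)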

 pred actual
1  4.967619      5
2  4.996474      5
3  4.127626      4
4  4.887233      5
5  5.000000      5
6  4.881568      5
7  4.028334      4
8  4.992406      5
9  3.974674      4
10 4.899804      5
11 4.992406      5

rounded for clarity

 pred actual
1     5      5
2     5      5
3     4      4
4     5      5
5     5      5
6     5      5
7     4      4
8     5      5
9     4      4
10    5      5
11    5      5

Pretty good, only a few errors on the test out of 50 new rows.


But the data in matrix X is very simple: only 5 unique values from 1 to 5 and only 5 columns, yet there are already errors.

Although I think that with classification there would be no errors; that's easy to check, e.g. as sketched below.
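(A hypothetical variant of that check: turn the row maximum into a class label via factor(), reusing X, y and idx from the sketch above.)

rf_cls <- randomForest(X[idx, ], factor(y[idx]))     # same data, labels as classes
pred_c <- predict(rf_cls, X[-idx, ])
mean(as.character(pred_c) != as.character(y[-idx]))  # test error rate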

Well, yes, but if we're looking for the maximum in arbitrary data, classification isn't suitable, because the spread of values can be huge...

So let's go back to regression and complicate the data.

head(X)
      [,1]  [,2]  [,3]  [,4]  [,5]
[1,]  0.93 -2.37 -0.35  0.16 -0.11
[2,] -0.53  0.19 -0.42  1.35 -0.16
[3,]  1.81  0.19 -0.68  0.31 -0.05
[4,]  0.08 -1.43  0.15 -0.96  0.43
[5,]  0.40  1.36  1.17 -0.99 -0.18
[6,] -2.19 -0.65  0.42 -1.12  1.46
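(Presumably the only change to the sketch above is the data generator, e.g.:)

X <- matrix(round(rnorm(n * 5), 2), ncol = 5)  # continuous values instead of 1..5
y <- apply(X, 1, max)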

We get this result:

  pred actual
1   1.75   1.78
2   1.33   1.31
3   1.67   1.69
4   1.15   1.16
5   0.51   0.41
6   1.00   0.99
7   0.80   0.78
8   1.75   1.76
9   0.35   0.36
10  1.78   1.79
11  2.02   2.13
12  1.26   1.21
13  1.60   1.57
14  0.19   0.06

In principle it's not bad, but the ordinary max() function does it better and can replace this whole model...

By the way, I wonder how other models would do, whether they could reproduce the max() function without errors.

 
mytarmailS #:
I remember, but that's the fitting error, I mean, on the training sample...
What's the error on the next candle, the test?

It's the error on the next 300 bars. Predictors were generated on each bar, then filtered; the model was trained and the next bar was predicted.
