Machine learning in trading: theory, models, practice and algo-trading - page 1181

 

The translator works. Either translate the whole page, or copy-paste the text into the translator.

But a single word or a paragraph - no way.

 
Maxim Dmitrievsky:

There are too many settings there - you'd need more than one bottle to figure them out... :) Maybe the sample is just too small, since tree-based methods are mostly designed for large ones; something needs tweaking.


Of course it can be tweaked, no doubt - I even suspect that by default the share of the sample fed to each tree is reduced. Still, two times two is a telling indicator...)
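If the per-tree subsampling guess is right, it can be checked directly. A minimal sketch (my own toy, assuming the usual scikit-learn and CatBoost options rather than anyone's actual settings): turn off row and feature sampling so every tree sees the full nine-row table.

from sklearn.ensemble import RandomForestRegressor
from catboost import CatBoostRegressor

x = [[1,2],[2,2],[3,2],[4,2],[5,2],[6,2],[7,2],[8,2],[9,2]]
y = [2,4,6,8,10,12,14,16,18]

# random forest: bootstrap=False means every tree is trained on all nine rows
rf = RandomForestRegressor(n_estimators=100, bootstrap=False).fit(x, y)

# CatBoost: bootstrap_type='No' turns off row sampling, rsm=1.0 keeps all features
cb = CatBoostRegressor(iterations=100, learning_rate=0.3, depth=2,
                       bootstrap_type='No', rsm=1.0, verbose=False).fit(x, y)

print(rf.predict([[2, 2]])[0], cb.predict([[2, 2]])[0])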

 
Maxim Dmitrievsky:

Translate one word at a time, through the Google Translate plugin for Chrome. Without English there's no way. Even if you have to look up every word or two, the general meaning will be clear. I use it myself when I forget a word - just click on it. You can also highlight phrases / sentences.

Of course it is pointless to translate all the text at once - that way you will never remember the words or really grasp the meaning of the text.

Thanks, I'll try translating with your method; maybe it will turn out more productive than making up my own hypotheses, but languages are a weak spot of mine...

 
Ivan Negreshniy:

I do not understand why you would need to manually edit the splits and leaves of decision trees. Yes, I have all branches automatically converted to logical operators, but frankly I do not remember ever correcting them myself.

Because what is the point of using leaves with less than 50-60% prediction probability? That is essentially random - it is better for the model not to react to the situation at all than to react on a guess.
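The same idea can be applied outside the tree itself. A rough sketch (illustrative names and a generic scikit-learn classifier, assumed by me, not anyone's actual model): act only when the predicted probability clears the threshold, otherwise output no signal at all.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def filtered_signals(model, features, threshold=0.6):
    # return +1/-1 only where the class probability reaches the threshold, else 0 (no trade)
    up = model.predict_proba(features)[:, 1]
    signals = np.zeros(len(features), dtype=int)
    signals[up >= threshold] = 1          # confident 'up'
    signals[up <= 1.0 - threshold] = -1   # confident 'down'
    return signals

# toy usage on random data, just to show the call shape
X = np.random.rand(200, 5)
labels = (X[:, 0] > 0.5).astype(int)
clf = RandomForestClassifier(n_estimators=50).fit(X, labels)
print(filtered_signals(clf, X[:10]))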


Ivan Negreshniy:

And is it even worth digging into the CatBoost code at all - how can you be sure of it?

For example, above I posted a Python test of my neural network learning the multiplication-by-two table, and now I used it to test trees and forests (DecisionTree, RandomForest, CatBoost),

and here is the result - you can clearly see it is not in favor of CatBoost, something like two times two equals zero point five... :)


True, if you take thousands of trees the results improve.

I'm not sure that trees are better than neural networks, but trees require fewer resources to build. For example, right now I have about 400 predictors, and a network with 400 input neurons (and however many layers) would take too long to train.

I can share my sample - maybe use it to see which method is better?

The settings, yes, do make sense - I'm digging into them right now and trying to figure them out.

 
Ivan Negreshniy:

I do not understand why you would need to manually edit the splits and leaves of decision trees. Yes, I have all branches automatically converted to logical operators, but frankly I do not remember ever correcting them myself.

And is it even worth digging into the CatBoost code at all - how can you be sure of it?

For example, above I posted a Python test of my neural network learning the multiplication-by-two table, and now I used it to test trees and forests (DecisionTree, RandomForest, CatBoost),

and here is the result - you can clearly see it is not in favor of CatBoost, something like two times two equals zero point five... :)


True, if you take thousands of trees the results improve.
import catboost
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from catboost import CatBoostRegressor
from sklearn.ensemble import GradientBoostingRegressor

x = [[1,2],[2,2],[3,2],[4,2],[5,2],[6,2],[7,2],[8,2],[9,2]]
y = [2,4,6,8,10,12,14,16,18]

print('-------- 1 DecisionTree')
tree = DecisionTreeRegressor().fit(x,y)
for ix in x: print(' {:2.2f}*{:2.2f}={:2.2f} '.format(ix[0],ix[1],tree.predict([ix])[0]))

print('-------- RandomForest 10 Tree')
regr = RandomForestRegressor(bootstrap=True).fit(x,y)
for ix in x: print(' {:2.2f}*{:2.2f}={:2.2f} '.format(ix[0],ix[1],regr.predict([ix])[0]))

print('-------- CatBoost 10 Tree')
cat = CatBoostRegressor(iterations=100, learning_rate=0.1, depth=2, verbose=False).fit(x,y)
for ix in x: print(' {:2.2f}*{:2.2f}={:2.2f} '.format(ix[0],ix[1],cat.predict([ix])[0]))

print('-------- Gboost 100 Trees')
gboost  = GradientBoostingRegressor(n_estimators=100, verbose = False).fit(x,y)
for ix in x: print(' {:2.2f}*{:2.2f}={:2.2f} '.format(ix[0],ix[1],gboost.predict([ix])[0]))
-------- 1 DecisionTree
 1.00*2.00=2.00 
 2.00*2.00=4.00 
 3.00*2.00=6.00 
 4.00*2.00=8.00 
 5.00*2.00=10.00 
 6.00*2.00=12.00 
 7.00*2.00=14.00 
 8.00*2.00=16.00 
 9.00*2.00=18.00 
-------- RandomForest 10 Tree
 1.00*2.00=3.60 
 2.00*2.00=4.40 
 3.00*2.00=6.00 
 4.00*2.00=8.00 
 5.00*2.00=9.20 
 6.00*2.00=11.80 
 7.00*2.00=13.20 
 8.00*2.00=15.60 
 9.00*2.00=17.40 
-------- CatBoost 10 Tree
 1.00*2.00=2.97 
 2.00*2.00=2.97 
 3.00*2.00=5.78 
 4.00*2.00=8.74 
 5.00*2.00=10.16 
 6.00*2.00=12.88 
 7.00*2.00=14.67 
 8.00*2.00=15.77 
 9.00*2.00=15.77 
-------- Gboost 100 Trees
 1.00*2.00=2.00 
 2.00*2.00=4.00 
 3.00*2.00=6.00 
 4.00*2.00=8.00 
 5.00*2.00=10.00 
 6.00*2.00=12.00 
 7.00*2.00=14.00 
 8.00*2.00=16.00 
 9.00*2.00=18.00 

I tweaked it a little and added gradient boosting - it works best out of the box.

The rest is pretty much junk, of course...

 
Maxim Dmitrievsky:
About a year ago I saw a post where a simple NN showed very decent results on the multiplication table. It surprised me at the time.
What of it now?
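For comparison, a rough sketch of that kind of test (my own toy using scikit-learn's MLPRegressor, not whatever network was used back then): a small fully connected net fitted to the same times-two table as the tree models. With the lbfgs solver it usually interpolates these nine points closely, though the exact numbers depend on the seed.

from sklearn.neural_network import MLPRegressor

x = [[1,2],[2,2],[3,2],[4,2],[5,2],[6,2],[7,2],[8,2],[9,2]]
y = [2,4,6,8,10,12,14,16,18]

# two small hidden layers; lbfgs copes well with tiny datasets
nn = MLPRegressor(hidden_layer_sizes=(20, 20), activation='tanh',
                  solver='lbfgs', max_iter=5000, random_state=0).fit(x, y)
for ix in x: print(' {:2.2f}*{:2.2f}={:2.2f} '.format(ix[0], ix[1], nn.predict([ix])[0]))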
 
import catboost
import lightgbm as gbm
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from catboost import CatBoostRegressor
from sklearn.ensemble import GradientBoostingRegressor

x = [[1,2],[2,2],[3,2],[4,2],[5,2],[6,2],[7,2],[8,2],[9,2]]
y = [2,4,6,8,10,12,14,16,18]

print('-------- 1 DecisionTree')
tree = DecisionTreeRegressor().fit(x,y)
for ix in x: print(' {:2.2f}*{:2.2f}={:2.2f} '.format(ix[0],ix[1],tree.predict([ix])[0]))

print('-------- RandomForest 10 Tree')
regr = RandomForestRegressor(bootstrap=True, n_estimators=100).fit(x,y)
for ix in x: print(' {:2.2f}*{:2.2f}={:2.2f} '.format(ix[0],ix[1],regr.predict([ix])[0]))

print('-------- CatBoost 10 Tree')
cat = CatBoostRegressor(iterations=100, learning_rate=0.1, depth=2, verbose=False).fit(x,y)
for ix in x: print(' {:2.2f}*{:2.2f}={:2.2f} '.format(ix[0],ix[1],cat.predict([ix])[0]))

print('-------- Gboost 100 Trees')
gboost  = GradientBoostingRegressor(n_estimators=100, verbose = False).fit(x,y)
for ix in x: print(' {:2.2f}*{:2.2f}={:2.2f} '.format(ix[0],ix[1],gboost.predict([ix])[0]))

print('-------- LGBM 100 Trees')
gbbm = gbm.LGBMRegressor(n_estimators=100,boosting_type='dart').fit(x,y)
for ix in x: print(' {:2.2f}*{:2.2f}={:2.2f} '.format(ix[0],ix[1],gbbm.predict([ix])[0]))
-------- 1 DecisionTree
 1.00*2.00=2.00 
 2.00*2.00=4.00 
 3.00*2.00=6.00 
 4.00*2.00=8.00 
 5.00*2.00=10.00 
 6.00*2.00=12.00 
 7.00*2.00=14.00 
 8.00*2.00=16.00 
 9.00*2.00=18.00 
-------- RandomForest 10 Tree
 1.00*2.00=2.84 
 2.00*2.00=3.74 
 3.00*2.00=5.46 
 4.00*2.00=7.70 
 5.00*2.00=9.66 
 6.00*2.00=11.44 
 7.00*2.00=13.78 
 8.00*2.00=15.46 
 9.00*2.00=16.98 
-------- CatBoost 10 Tree
 1.00*2.00=2.97 
 2.00*2.00=2.97 
 3.00*2.00=5.78 
 4.00*2.00=8.74 
 5.00*2.00=10.16 
 6.00*2.00=12.88 
 7.00*2.00=14.67 
 8.00*2.00=15.77 
 9.00*2.00=15.77 
-------- Gboost 100 Trees
 1.00*2.00=2.00 
 2.00*2.00=4.00 
 3.00*2.00=6.00 
 4.00*2.00=8.00 
 5.00*2.00=10.00 
 6.00*2.00=12.00 
 7.00*2.00=14.00 
 8.00*2.00=16.00 
 9.00*2.00=18.00 
-------- LGBM 100 Trees
 1.00*2.00=10.00 
 2.00*2.00=10.00 
 3.00*2.00=10.00 
 4.00*2.00=10.00 
 5.00*2.00=10.00 
 6.00*2.00=10.00 
 7.00*2.00=10.00 
 8.00*2.00=10.00 
 9.00*2.00=10.00 
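The constant 10.00 from LGBM above is just the mean of y: with only nine rows, LightGBM's default min_child_samples=20 forbids any split, so every tree stays a single leaf. A sketch of loosening those limits - my guess at the cause, and only sensible for a toy dataset like this:

import lightgbm as gbm

x = [[1,2],[2,2],[3,2],[4,2],[5,2],[6,2],[7,2],[8,2],[9,2]]
y = [2,4,6,8,10,12,14,16,18]

# min_child_samples defaults to 20 (> 9 rows) and min_data_in_bin to 3,
# so with the defaults no split is possible and the model predicts the mean
gbbm = gbm.LGBMRegressor(n_estimators=100, learning_rate=0.3,
                         min_child_samples=1, min_data_in_bin=1).fit(x, y)
for ix in x: print(' {:2.2f}*{:2.2f}={:2.2f} '.format(ix[0], ix[1], gbbm.predict([ix])[0]))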
 
Yuriy Asaulenko:

But a single word or a paragraph - nothing at all.

https://www.mql5.com/ru/forum/86386/page1180#comment_9543249

 
Maxim Dmitrievsky:

CatBoost there is at iterations=100 trees, not 10, and GBM is a beauty :)

 
Aleksey Vyazmikin:

Because what is the point of using leaves with less than 50-60% prediction probability? That is essentially random - it is better for the model not to react to the situation at all than to react on a guess.


I'm not sure that trees are better than neural networks, but trees require fewer resources to build. For example, right now I have about 400 predictors, and a network with 400 input neurons (and however many layers) would take too long to train.

I can share my sample - maybe use it to see which method is better?

And the settings, yes, do make sense - I'm digging into them now and trying to understand their essence.

Sure, dig in and choose as carefully as you can while you are still at the early stages.

Besides the two-times-two confusion, try to disable CatBoost's intrusive creation of its temporary directories on every startup - in a protected environment it crashes because of that.
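If the problem is the per-run catboost_info directory, a minimal sketch of what I believe are the standard options for suppressing or redirecting it (allow_writing_files / train_dir - worth checking against the CatBoost docs for your version):

from catboost import CatBoostRegressor

# allow_writing_files=False stops CatBoost from writing its log/snapshot files;
# train_dir would instead redirect them to a writable location
model = CatBoostRegressor(iterations=100, depth=2, learning_rate=0.1,
                          allow_writing_files=False, verbose=False)
# alternative: CatBoostRegressor(iterations=100, train_dir='catboost_tmp', verbose=False)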

And in general, with glitches like these it looks somewhat unprofessional, so if you can't beat them then personally, in my opinion, it is not worth it even at a price cheaper than free - better to give this product up right away :)