Machine learning in trading: theory, models, practice and algo-trading - page 1144

 
Grail:

I would not say "exactly so". The formula itself is correct, but it should be calculated from daily (hourly, etc.) returns, not from per-trade returns. Otherwise, if the number is computed from trades and the strategies have significantly different trade counts, it is meaningless: if, say, one strategy has a Sharpe of 0.01 and another has 5, it is still unclear which is better or worse; only the sign matters (Sharpe above or below zero).
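A minimal sketch of what is being suggested here (not from the thread): compute the Sharpe ratio from periodic (e.g. daily) returns, so that strategies with very different trade counts stay comparable. The annualization factor of 252 trading days and a zero risk-free rate are my assumptions for simplicity.

```python
import numpy as np

def sharpe_from_returns(returns, periods_per_year=252):
    """Annualized Sharpe ratio from periodic (e.g. daily) returns.
    Risk-free rate is assumed to be zero for simplicity."""
    r = np.asarray(returns, dtype=float)
    sd = r.std(ddof=1)
    if sd == 0:
        return 0.0
    return r.mean() / sd * np.sqrt(periods_per_year)

# One year of synthetic daily returns; per-trade returns are never used.
rng = np.random.default_rng(0)
daily = rng.normal(0.0005, 0.01, 252)
print(round(sharpe_from_returns(daily), 2))
```

Because the input is a fixed-frequency return series, two strategies can be compared on the same footing regardless of how many trades each one makes.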

So although pantural was not talking about the classic Sharpe ratio exactly, he still raised an important question about it. Personally, though, I prefer not to use the Sharpe ratio; as a measure of strategy performance I prefer the ratio of profit to maximum drawdown.
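The profit-to-maximum-drawdown measure mentioned here (often called the recovery factor) is easy to compute from an equity curve; a small sketch, with a toy equity curve of my own:

```python
import numpy as np

def profit_to_max_drawdown(equity):
    """Ratio of total profit to the maximum drawdown of an equity curve
    (sometimes called the recovery factor)."""
    eq = np.asarray(equity, dtype=float)
    profit = eq[-1] - eq[0]
    running_max = np.maximum.accumulate(eq)   # peak equity so far
    max_dd = (running_max - eq).max()         # deepest drop below a peak
    return profit / max_dd if max_dd > 0 else float('inf')

equity = [100, 110, 105, 120, 112, 130]       # toy equity curve
print(profit_to_max_drawdown(equity))         # 30 profit / 8 max drawdown
```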

I would say it depends on the Expert Advisor. If it generates a strict sequence of deals, i.e. a position is opened, then closed, and its volume does not change between opening and closing, it is better to calculate by trades. If the position volume changes smoothly over time, then identifying the moments of individual trades is less meaningful, and you can calculate it your way.
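For the smoothly-varying-volume case described above, marking the position to market bar by bar sidesteps the question of where one "trade" ends and the next begins; a sketch under my own toy data:

```python
import numpy as np

def periodic_pnl(prices, position):
    """Mark-to-market P&L per bar: the position held during bar i earns
    position[i] * (prices[i+1] - prices[i]). This works even when the
    position volume changes smoothly and trades are hard to delimit."""
    p = np.asarray(prices, dtype=float)
    pos = np.asarray(position, dtype=float)
    return pos[:-1] * np.diff(p)

prices   = [100.0, 101.0, 100.5, 102.0]
position = [1.0, 0.5, 0.5, 0.0]        # volume scales down over time
print(periodic_pnl(prices, position))  # [ 1.   -0.25  0.75]
```

The resulting per-bar P&L series can then be fed into a periodic Sharpe calculation regardless of trade boundaries.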

pantural's method is better suited for selling the TS and finding investors) So over time, I suppose, everyone will switch to it)

 
Aleksey Nikolayev:

I would say it depends on the Expert Advisor. If it generates a strict sequence of deals, i.e. a position is opened, then closed, and its volume does not change between opening and closing, it is better to calculate by trades. If the position volume changes smoothly over time, then identifying the moments of individual trades is less meaningful, and you can calculate it your way.

So over time, I suppose, everyone will switch to it).

In any case, pantural already has no way to object :))

What are you doing now, just wandering around at random? Don't you want to discuss serious things in the ML field? :) I need someone who is good with formulas. The thread is empty, there is no one to discuss it with.
 
Maxim Dmitrievsky:

What are you doing now, just wandering around at random? Don't you want to discuss serious things in the ML field? :) I need someone who is good with formulas. The thread is empty, there is no one to discuss it with.

In principle, I am ready to express my opinion on any subject. But I cannot guarantee that my statements will make sense to you)

 
Maxim Dmitrievsky:

I think I already sent you a link about bandits? It's a very interesting topic, but there are a lot of formulas.

Yes, there was something like that. But refresh the link and write roughly what interests you.

 
Aleksey Nikolayev:

Yes, there was something like that. But refresh the link and write roughly what interests you.

The link is above. I'm interested in adversarial bandits for non-stationary processes, with combinatorial algorithms (apparently something like GMDH). I'm still in the process of getting acquainted with the material myself.

I'll write later what exactly.

 
Maxim Dmitrievsky:

In their book I immediately came across:

All the learner knows is that the true environment lies in some set E called the environment class.

How do you see this set E for trading?

 
Aleksey Nikolayev:

In their book I immediately came across:

All the learner knows is that the true environment lies in some set E called the environment class.

How do you see this set E for trading?

Well, it's an arbitrary environment set for the bandit, for example a set of indicators.

For example, one RSI indicator, or, for simplicity, a set of several price increments.
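To make the "environment class" idea concrete (this is my own toy sketch, not anything from the book or the thread): from the learner's point of view the environment is just an unknown member of some set, here represented by a list of unknown per-arm reward functions, and a simple epsilon-greedy bandit learns which arm pays best.

```python
import random

def epsilon_greedy(reward_fns, steps=1000, eps=0.1, seed=42):
    """Minimal epsilon-greedy bandit. The 'environment class' here is the
    set of possible reward-function lists; the learner sees only rewards."""
    rng = random.Random(seed)
    n = len(reward_fns)
    counts = [0] * n
    values = [0.0] * n                # running mean reward per arm
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(n)                          # explore
        else:
            a = max(range(n), key=lambda i: values[i])    # exploit
        r = reward_fns[a](rng)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]          # incremental mean
    return values, counts

# Two Bernoulli arms with success probabilities 0.3 and 0.6
arms = [lambda rng: float(rng.random() < 0.3),
        lambda rng: float(rng.random() < 0.6)]
values, counts = epsilon_greedy(arms)
print(counts[1] > counts[0])   # the better arm ends up pulled more often
```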
 
Maxim Dmitrievsky:

Well, it's an arbitrary environment set for the bandit, for example a set of indicators.

For example, one RSI indicator, or, for simplicity, a set of several price increments.

However, I do not understand how their model relates to trading. From their definition of a strategy (policy) it follows that they look only at the actions taken and their results. They do not, and perhaps cannot, look at the environment (in your view, a set of indicators) at all.

A_t should only depend on the history H_{t-1} = (A_1, X_1, ..., A_{t-1}, X_{t-1}). A policy is a mapping from histories to actions.

Moreover, it seems that their environment can even track our behavior, so the reward will depend not only on the action itself but also on the entire history preceding it.

An environment is a mapping from history sequences ending in actions to rewards.
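The book's definition of a policy as a mapping from histories to actions can be illustrated with a toy sketch (my own example, not from the book): "follow the leader", which maps the history H_{t-1} to the action with the best average reward so far.

```python
from typing import List, Tuple

History = List[Tuple[int, float]]   # [(action, reward), ...]

def policy(history: History, n_actions: int = 2) -> int:
    """A policy in the book's sense: a mapping from the history
    H_{t-1} = (A_1, X_1, ..., A_{t-1}, X_{t-1}) to the next action A_t.
    Here: 'follow the leader', i.e. pick the best average reward so far."""
    totals = [0.0] * n_actions
    counts = [0] * n_actions
    for a, x in history:
        totals[a] += x
        counts[a] += 1
    means = [totals[a] / counts[a] if counts[a] else float('inf')
             for a in range(n_actions)]   # untried actions are tried first
    return max(range(n_actions), key=lambda a: means[a])

print(policy([]))                     # 0: nothing tried yet, ties go to arm 0
print(policy([(0, 0.1), (1, 0.9)]))   # 1: action 1 has the higher mean
```

Note that, exactly as Aleksey observes, nothing in this mapping looks at the environment itself; the policy sees only past actions and rewards.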

 
Aleksey Nikolayev:

However, I do not understand how their model relates to trading. From their definition of a strategy (policy) it follows that they look only at the actions taken and their results. They do not, and perhaps cannot, look at the environment (in your view, a set of indicators) at all.

A_t should only depend on the history H_{t-1} = (A_1, X_1, ..., A_{t-1}, X_{t-1}). A policy is a mapping from histories to actions.

Moreover, it seems that their environment can even track our behavior, so the reward will depend not only on the action itself but also on the entire history preceding it.

An environment is a mapping from history sequences ending in actions to rewards.

If the policy is approximated by some model (say, a linear one), then on new data we simply substitute the data into the model and get a decision, and that's it.

What you described is the process of searching for the highest reward.

The main problem is non-stationarity, when it stops working on new data. Non-stationary bandits are described there, but I haven't gotten to them yet. Admittedly, it turns out there is nothing there that I don't already know :) But I need some ideas/solutions on how to assign rewards properly.
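One standard trick for the non-stationary case (my own sketch, not what the book or Maxim necessarily uses): replace the sample-average value update with a constant step size, which exponentially discounts old rewards so the agent can track a drifting environment.

```python
import random

def discounted_bandit(reward_fns, steps=3000, eps=0.1, gamma=0.99, seed=1):
    """Epsilon-greedy with exponentially discounted value estimates:
    recent rewards weigh more, so the agent can follow a non-stationary
    environment. gamma closer to 1 means longer memory."""
    rng = random.Random(seed)
    n = len(reward_fns)
    values = [0.0] * n
    for t in range(steps):
        if rng.random() < eps:
            a = rng.randrange(n)                          # explore
        else:
            a = max(range(n), key=lambda i: values[i])    # exploit
        r = reward_fns[a](t, rng)
        values[a] += (1 - gamma) * (r - values[a])        # constant step size
    return values

# Arm 0 is best at first; after step 1000 the environment switches and arm 1 is best.
arms = [lambda t, rng: float(rng.random() < (0.7 if t < 1000 else 0.2)),
        lambda t, rng: float(rng.random() < (0.2 if t < 1000 else 0.7))]
values = discounted_bandit(arms)
print(values[1] > values[0])   # after the switch the estimates track the new best arm
```

A sample-average update would keep averaging over the stale pre-switch rewards; the constant step size forgets them at a geometric rate.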

By the way, yesterday I implemented a linear bandit; the result is something like this:

In fact, the example is still the one described in my article, but there a random forest is used instead of a linear model. The linear model should be less prone to overfitting.
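Maxim's actual implementation isn't shown in the thread; purely as an illustration of the idea, here is a toy linear contextual bandit with one ridge-regression reward model per arm and epsilon-greedy action choice (all names and the synthetic environment are my own assumptions).

```python
import numpy as np

class LinearBandit:
    """Toy linear contextual bandit: one ridge-regression reward model per
    arm, epsilon-greedy choice. A linear model has far fewer degrees of
    freedom than a random forest, so it should overfit less on small data."""
    def __init__(self, n_arms, dim, lam=1.0, eps=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.eps = eps
        self.A = [lam * np.eye(dim) for _ in range(n_arms)]  # X'X + lam*I
        self.b = [np.zeros(dim) for _ in range(n_arms)]      # X'y

    def act(self, x):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.A)))       # explore
        preds = [x @ np.linalg.solve(A, b) for A, b in zip(self.A, self.b)]
        return int(np.argmax(preds))                         # exploit

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Synthetic environment: the reward of arm a is x @ w[a] plus small noise
rng = np.random.default_rng(1)
w = np.array([[1.0, -1.0], [-1.0, 1.0]])
bandit = LinearBandit(n_arms=2, dim=2)
for _ in range(500):
    x = rng.normal(size=2)
    a = bandit.act(x)
    bandit.update(a, x, x @ w[a] + rng.normal(scale=0.1))

theta0 = np.linalg.solve(bandit.A[0], bandit.b[0])
print(np.round(theta0, 1))   # close to the true weights [1., -1.]
```

On new data, acting is just the `act` call, i.e. substituting the context into the fitted linear models, which matches the "substitute it into the model and that's it" description above.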

 
Maxim Dmitrievsky:


Training on the future and testing on the past: that you can only see on this forum)))
