Discussing the article: "MQL5 Wizard Techniques you should know (Part 43): Reinforcement Learning with SARSA"

Check out the new article: MQL5 Wizard Techniques you should know (Part 43): Reinforcement Learning with SARSA.
SARSA, short for State-Action-Reward-State-Action, is another algorithm that can be used when implementing reinforcement learning. Unlike Q-Learning and DQN, which we covered earlier, SARSA is on-policy: its update uses the action actually selected by the current policy rather than the greedy maximum. As with those algorithms, we look into how SARSA can be explored and implemented as an independent model, rather than just a training mechanism, in wizard-assembled Expert Advisors.
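The on-policy nature of SARSA can be seen in its update rule. The following is a minimal sketch (not the article's MQL5 implementation); the function name, state/action encodings, and default learning parameters are illustrative:

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy target: bootstraps from the action a_next that the current
    # policy actually chose in s_next. Q-Learning would instead use the
    # greedy max over all actions in s_next, making it off-policy.
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q

# Illustrative single step: states and actions are plain integers here.
Q = defaultdict(float)
Q = sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=0)
print(round(Q[(0, 1)], 3))  # 0.1
```

Because the target follows the action the policy will actually take, SARSA tends to learn more conservative values under an exploratory (e.g. epsilon-greedy) policy than Q-Learning does.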
Reinforcement Learning (RL) allows trading systems to learn from their environment, or market data, and thus improve their trading decisions over time. This adaptability to changing conditions makes RL well suited to dynamic financial markets and securities, which often feature a high degree of uncertainty. RL excels at decision-making under uncertainty by continuously adjusting its actions based on received feedback (rewards), which can be very helpful to traders handling volatile market conditions.
A parallel to this could be an Expert Advisor that is attached to a chart and periodically self-optimizes on recent price history to fine-tune its parameters. RL aims to do the same thing with less fanfare. In the articles of this series that have covered RL so far, we have used it in its strict-definition sense, as a third approach to training in machine learning (besides supervised and unsupervised learning). We have not yet treated it as an independent model that can be used for forecasting.
That changes in this article. We do not just introduce a different RL algorithm, SARSA; we implement it as an independent signal model within another custom signal class for wizard-assembled Expert Advisors. Used as a signal model, RL automates decision-making and reduces the need for constant human intervention, which in theory could allow for high-frequency trading and real-time response to market movements. Also, because of the continuous feedback from its reward mechanism, an RL model tends to learn to manage risk better: high-risk actions are penalized with low rewards, and the net effect is that the model minimizes exposure to volatile or loss-making trades.
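One common way to realize the risk-penalizing reward described above is reward shaping, where a risk measure is subtracted from the raw trade outcome. This is a hypothetical sketch, not the article's reward definition; the function name, the use of drawdown as the risk proxy, and the penalty weight are all illustrative assumptions:

```python
def risk_adjusted_reward(pnl, drawdown, risk_penalty=0.5):
    # Hypothetical shaping: subtract a penalty proportional to the
    # drawdown the action incurred, so a high-risk trade earns a lower
    # reward even when it ends up profitable.
    return pnl - risk_penalty * drawdown

# A profitable but volatile trade scores lower than its raw PnL suggests.
print(risk_adjusted_reward(pnl=1.0, drawdown=0.4))  # 0.8
```

With such shaping, two actions with identical profit but different interim drawdowns receive different rewards, steering the learned policy away from the riskier one.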
Test runs on the daily time frame for EURJPY over the year 2022, which are strictly meant to demonstrate the usability of the Expert Advisor, give the following results:
Author: Stephen Njuki