Better NN EA

 

Neural networks made easy (Part 68): Offline Preference-guided Policy Optimization 

Reinforcement learning is a universal framework for learning optimal behavior policies in the environment under exploration. Policy optimality is achieved by maximizing the rewards received from the environment during interaction with it. But herein lies one of the main problems of this approach: designing an appropriate reward function often requires significant human effort, and the rewards may be sparse and/or insufficient to express the true learning goal. As one option for solving this problem, the authors of the paper "Beyond Reward: Offline Preference-guided Policy Optimization" proposed the Offline Preference-guided Policy Optimization (OPPO) method. They suggest replacing the reward given by the environment with the preferences of a human annotator between two trajectories completed in the environment under exploration. Let's take a closer look at the proposed algorithm.
  • www.mql5.com
Since the first articles devoted to reinforcement learning, we have in one way or another touched upon two problems: exploring the environment and determining the reward function. Recent articles have been devoted to the problem of exploration in offline learning. In this article, I would like to introduce you to an algorithm whose authors completely eliminated the reward function.
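
To make the idea more tangible, below is a minimal sketch of learning from pairwise trajectory preferences, the kind of supervision OPPO substitutes for an explicit reward. This is a generic Bradley-Terry style preference objective in PyTorch, not the full OPPO algorithm (which additionally learns a latent preference context); all class and function names are illustrative.

```python
import torch
import torch.nn as nn

# Minimal sketch of learning from pairwise trajectory preferences
# (Bradley-Terry style). OPPO itself additionally learns a latent
# preference context; the names here are purely illustrative.

class TrajectoryScore(nn.Module):
    """Assigns a scalar score to a trajectory of (state, action) pairs."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states: torch.Tensor, actions: torch.Tensor):
        # states: [T, obs_dim], actions: [T, act_dim]
        x = torch.cat([states, actions], dim=-1)
        return self.net(x).sum()  # sum of per-step scores

def preference_loss(score_fn, traj_a, traj_b, a_preferred: bool):
    """Cross-entropy on P(a preferred over b) = sigmoid(score_a - score_b)."""
    logit = score_fn(*traj_a) - score_fn(*traj_b)
    target = torch.tensor(1.0 if a_preferred else 0.0)
    return nn.functional.binary_cross_entropy_with_logits(logit, target)
```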
 

Neural networks made easy (Part 69): Density-based support constraint for the behavioral policy (SPOT)

Offline reinforcement learning allows the training of models on data previously collected from interactions with the environment. This significantly reduces the need for direct interaction with the environment. Moreover, given the complexity of environment modeling, we can collect real-time data from multiple research agents and then train the model on this data.

At the same time, using a static training dataset significantly reduces the environment information available to us. Due to limited resources, we cannot capture the entire diversity of the environment in the training dataset.

  • www.mql5.com
In offline learning, we use a fixed dataset, which limits the coverage of environment diversity. During learning, our Agent can generate actions outside this dataset. Without feedback from the environment, how can we be sure that our assessments of such actions are correct? Keeping the Agent's policy within the support of the training dataset thus becomes an important aspect of ensuring reliable training. This is what we will talk about in this article.
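
As a rough illustration of a density-based support constraint, the sketch below penalizes the actor for proposing actions whose estimated density under the behavior policy is low, using the negative ELBO of a conditional VAE fitted to the offline dataset as a density proxy. The `vae` interface (`encode`/`decode`), the penalty form, and the weight `lam` are assumptions for illustration, not the exact SPOT formulation.

```python
import torch

# Density-based support constraint, sketched: a conditional VAE fitted
# to the offline dataset serves as a density proxy, and the actor is
# penalized for actions with low estimated behavior density.

def vae_action_neg_elbo(vae, state, action):
    """Negative ELBO, an upper bound on -log p_beta(action | state)."""
    mu, log_std = vae.encode(state, action)
    z = mu + log_std.exp() * torch.randn_like(mu)
    recon = vae.decode(state, z)
    recon_loss = ((recon - action) ** 2).sum(-1)
    kl = 0.5 * (mu ** 2 + (2 * log_std).exp() - 2 * log_std - 1).sum(-1)
    return recon_loss + kl

def actor_loss(critic, actor, vae, state, lam=0.1):
    action = actor(state)
    q = critic(state, action)
    # Low density under the behavior policy -> large penalty.
    support_penalty = vae_action_neg_elbo(vae, state, action)
    return (-q + lam * support_penalty).mean()
```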
 

Neural networks made easy (Part 70): Closed-Form Policy Improvement Operators (CFPI)

The approach of optimizing the Agent's policy with constraints on its behavior has proven promising for solving offline reinforcement learning problems. By exploiting historical transitions, the Agent's policy is trained to maximize a learned value function.

A behavior-constrained policy helps avoid a significant distribution shift in the Agent's actions, which provides sufficient confidence in the estimates of action values. In the previous article, we got acquainted with the SPOT method, which exploits this approach. As a continuation of the topic, I propose to get acquainted with the Closed-Form Policy Improvement (CFPI) algorithm, which was presented in the paper "Offline Reinforcement Learning with Closed-Form Policy Improvement Operators".

  • www.mql5.com
In this article, we will get acquainted with an algorithm that uses closed-form policy improvement operators to optimize Agent actions in offline mode.
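
The core intuition behind a closed-form improvement operator can be sketched as a single analytic step: approximate Q(s, a) by its first-order Taylor expansion around the behavior action and shift the action along the Q-gradient. The snippet below captures only this basic idea in PyTorch; the paper derives the exact operators for Gaussian and mixture behavior policies, and the step size `alpha` here is an illustrative knob.

```python
import torch

# One-step, closed-form policy improvement, sketched: Q(s, a) is
# approximated by its first-order Taylor expansion around the behavior
# action, and the action is shifted along the Q-gradient.

def improved_action(critic, state, behavior_action, alpha=0.05):
    a = behavior_action.clone().requires_grad_(True)
    q = critic(state, a).sum()
    (grad,) = torch.autograd.grad(q, a)
    # Stay near the behavior action, but move where Q grows fastest.
    return (behavior_action + alpha * grad).detach()
```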
 
Neural networks made easy (Part 71): Goal-Conditioned Predictive Coding (GCPC)
Goal-Conditioned Behavior Cloning (BC) is a promising approach for solving various offline reinforcement learning problems. Instead of assessing the value of states and actions, BC directly trains the Agent behavior policy, building dependencies between the set goal, the analyzed environment state and the Agent's action. This is achieved using supervised learning methods on pre-collected offline trajectories. The familiar Decision Transformer method and its derivative algorithms have demonstrated the effectiveness of sequence modeling for offline reinforcement learning.
  • www.mql5.com
In previous articles, we discussed the Decision Transformer method and several algorithms derived from it, experimenting with various ways of setting goals. However, the model's study of the already traversed trajectory always remained outside our attention. In this article, I want to introduce you to a method that fills this gap.
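
For context, here is a minimal sketch of the supervised backbone this family of methods builds on: a policy mapping (state, goal, trajectory context) to an action, fit by regression on offline trajectories. In GCPC the trajectory context would come from a learned predictive encoding of the traversed trajectory; in this sketch it is just a placeholder tensor, and all names are illustrative.

```python
import torch
import torch.nn as nn

# Goal-conditioned behavior cloning, sketched. GCPC would feed a
# learned trajectory encoding into `traj_context`; here it is a
# plain tensor slot.

class GoalConditionedPolicy(nn.Module):
    def __init__(self, obs_dim, goal_dim, ctx_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim + ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, state, goal, traj_context):
        return self.net(torch.cat([state, goal, traj_context], dim=-1))

def bc_loss(policy, batch):
    """Plain regression of predicted actions onto dataset actions."""
    pred = policy(batch["state"], batch["goal"], batch["traj_context"])
    return ((pred - batch["action"]) ** 2).mean()
```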
 

Neural networks made easy (Part 72): Trajectory prediction in noisy environments 

The noise prediction module solves the auxiliary problem of identifying noise in the analyzed trajectories. This helps the motion prediction model capture potential spatial diversity and better understand the underlying representation, thereby improving future forecasts.

The authors of the method conducted additional experiments to empirically demonstrate the critical importance of the spatial consistency and noise prediction modules for SSWNP. When using only the spatial consistency module to solve the movement prediction problem, suboptimal performance of the trained model is observed. Therefore, they integrate both modules in their work.

  • www.mql5.com
The quality of future state predictions plays an important role in the Goal-Conditioned Predictive Coding method, which we discussed in the previous article. In this article I want to introduce you to an algorithm that can significantly improve the prediction quality in stochastic environments, such as financial markets.
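
The two auxiliary objectives described above can be sketched as follows: a noisy view of each past trajectory is generated, a spatial consistency loss aligns the forecasts produced from the clean and noisy views, and a noise prediction head learns to recover the injected perturbation. Module interfaces, output shapes, and the equal loss weights are assumptions for illustration, not the exact SSWNP recipe.

```python
import torch

# SSWNP-style auxiliary objectives, sketched: consistency between
# forecasts from clean and noisy views, plus recovery of the noise.

def sswnp_losses(encoder, forecaster, noise_head,
                 past_traj, future_traj, noise_std=0.05):
    noise = torch.randn_like(past_traj) * noise_std
    noisy_traj = past_traj + noise

    h_clean = encoder(past_traj)
    h_noisy = encoder(noisy_traj)

    pred_clean = forecaster(h_clean)
    pred_noisy = forecaster(h_noisy)

    forecast_loss = ((pred_clean - future_traj) ** 2).mean()
    consistency_loss = ((pred_clean - pred_noisy) ** 2).mean()  # spatial consistency
    noise_loss = ((noise_head(h_noisy) - noise) ** 2).mean()    # noise prediction
    return forecast_loss + consistency_loss + noise_loss
```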
 

Neural networks made easy (Part 73): AutoBots for predicting price movements 

The proposed method is based on the Encoder-Decoder architecture. It was developed to solve problems of safe control of robotic systems. It allows the generation of sequences of trajectories for multiple agents consistent with the scene. AutoBots can predict the trajectory of one ego-agent or the distribution of future trajectories for all agents in the scene. In our case, we will try to apply the proposed model to generate sequences of price movements of currency pairs consistent with market dynamics.
  • www.mql5.com
We continue to discuss algorithms for training trajectory prediction models. In this article, we will get acquainted with a method called "AutoBots".
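
A compact sketch of the Encoder-Decoder pattern described above: a Transformer encoder summarizes the observed history, and a decoder with K learnable mode queries emits K candidate future trajectories in a single pass. Flattening agents and time steps into one token sequence, and all dimensions, are simplifications of the actual AutoBots architecture.

```python
import torch
import torch.nn as nn

# Encoder-Decoder trajectory forecaster, sketched in the spirit of
# AutoBots: learnable mode queries decode K futures from the encoded
# scene history.

class MiniAutoBots(nn.Module):
    def __init__(self, feat_dim=4, d_model=64, horizon=12, modes=6):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.mode_queries = nn.Parameter(torch.randn(modes, d_model))
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(d_model, horizon * 2)  # (x, y) per future step
        self.horizon, self.modes = horizon, modes

    def forward(self, history):
        # history: [batch, agents * time_steps, feat_dim]
        memory = self.encoder(self.embed(history))
        queries = self.mode_queries.unsqueeze(0).expand(history.size(0), -1, -1)
        decoded = self.decoder(queries, memory)       # [batch, K, d_model]
        return self.head(decoded).view(-1, self.modes, self.horizon, 2)
```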
 

Neural networks made easy (Part 74): Trajectory prediction with adaptation

Building a trading strategy is inseparable from analyzing the market situation and forecasting the most likely movement of a financial instrument. This movement is often correlated with other financial assets and macroeconomic indicators. It can be compared with road traffic, where each vehicle follows its own individual destination, yet their actions on the road are interconnected to a certain extent and strictly regulated by traffic rules. Also, because drivers perceive the road situation individually, a degree of stochasticity remains on the roads.

In this article, I want to introduce you to ADAPT, a method for efficient joint prediction of the trajectories of all agents in the scene with dynamic weight learning, which was proposed to solve problems in the field of autonomous vehicle navigation. The method was first presented in the paper "ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation".
  • www.mql5.com
This article introduces a fairly effective method of multi-agent trajectory forecasting, which is able to adapt to various environmental conditions.
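
One way to read "dynamic weight learning" is as a hypernetwork: a small module generates the weights of the prediction head from the scene encoding, so the predictor adapts to each situation. The sketch below is an illustrative reduction of this idea, not the paper's exact adaptive head; all shapes and names are assumptions.

```python
import torch
import torch.nn as nn

# Hypernetwork-style adaptive prediction head, sketched: the scene
# encoding produces the weights and bias of a per-sample linear head
# that is then applied to every agent.

class AdaptiveHead(nn.Module):
    def __init__(self, scene_dim=64, agent_dim=64, out_dim=2):
        super().__init__()
        self.agent_dim, self.out_dim = agent_dim, out_dim
        # Hypernetwork: scene encoding -> weights and bias of the head.
        self.hyper = nn.Linear(scene_dim, agent_dim * out_dim + out_dim)

    def forward(self, scene_enc, agent_enc):
        # scene_enc: [batch, scene_dim]; agent_enc: [batch, agents, agent_dim]
        params = self.hyper(scene_enc)
        w = params[:, : self.agent_dim * self.out_dim]
        b = params[:, self.agent_dim * self.out_dim :]
        w = w.view(-1, self.agent_dim, self.out_dim)
        # Apply the per-sample linear head to every agent in the scene.
        return torch.einsum("bad,bdo->bao", agent_enc, w) + b.unsqueeze(1)
```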
 

Neural networks made easy (Part 75): Improving the performance of trajectory prediction models

Forecasting the trajectory of the upcoming price movement plays one of the key roles in constructing trading plans for the desired planning horizon. The accuracy of such forecasts is critical. In an attempt to improve the quality of trajectory forecasting, we make our trajectory forecasting models ever more complex.
  • www.mql5.com
The models we create are becoming larger and more complex. This increases the costs of not only their training but also their operation. At the same time, the time required to make a decision is often critical. In this regard, let us consider methods for optimizing model performance without loss of quality.
 

Neural networks made easy (Part 76): Exploring diverse interaction patterns with Multi-future Transformer

The authors of the paper "Multi-future Transformer: Learning diverse interaction modes for behavior prediction in autonomous driving" suggest using the Multi-future Transformer (MFT) method to solve such problems. Its main idea is to decompose the multimodal distribution of the future into several unimodal distributions, which makes it possible to effectively model the various modes of interaction between agents in the scene.

In MFT, forecasts are generated by a neural network with fixed parameters in a single feed-forward pass, without the need to stochastically sample latent variables, pre-determine anchors, or run an iterative post-processing algorithm. This allows the model to operate in a deterministic, repeatable manner.

  • www.mql5.com
This article continues the topic of predicting the upcoming price movement. I invite you to get acquainted with the Multi-future Transformer architecture. Its main idea is to decompose the multimodal distribution of the future into several unimodal distributions, which makes it possible to effectively model the various modes of interaction between agents in the scene.
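
The decomposition into unimodal distributions can be sketched with K parallel heads, each producing one trajectory hypothesis (here, just a mean path), plus a classifier scoring the probability of each mode, all in one deterministic forward pass with no latent sampling. This mirrors the idea of MFT rather than its exact architecture; all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

# Multimodal future decomposed into K unimodal hypotheses, sketched:
# K parallel heads plus a mode-probability classifier, computed in a
# single deterministic forward pass.

class MultiModeHead(nn.Module):
    def __init__(self, enc_dim=64, horizon=12, num_modes=6):
        super().__init__()
        self.mode_heads = nn.ModuleList(
            [nn.Linear(enc_dim, horizon * 2) for _ in range(num_modes)])
        self.mode_prob = nn.Linear(enc_dim, num_modes)
        self.horizon = horizon

    def forward(self, scene_enc):
        # scene_enc: [batch, enc_dim]
        trajs = torch.stack(
            [h(scene_enc).view(-1, self.horizon, 2) for h in self.mode_heads],
            dim=1)                                     # [batch, K, horizon, 2]
        probs = self.mode_prob(scene_enc).softmax(-1)  # [batch, K]
        return trajs, probs
```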
 

Neural networks made easy (Part 77): Cross-Covariance Transformer (XCiT)

Transformers show great potential in solving problems of analyzing various sequences. The Self-Attention operation, which underlies the Transformer, provides global interactions between all tokens in the sequence, making it possible to evaluate interdependencies within the entire analyzed sequence. However, this comes with quadratic complexity in computation time and memory usage, which makes the algorithm difficult to apply to long sequences.
  • www.mql5.com
In our models, we often use various attention algorithms, and probably most often we use Transformers. Their main disadvantage is their high resource requirements. In this article, we will consider a new algorithm that can help reduce computing costs without losing quality.
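
The key trick in XCiT is cross-covariance attention: the attention map is computed between feature channels (a d×d matrix) rather than between tokens (an N×N matrix), so the cost grows linearly with sequence length. The single-head sketch below follows the paper's recipe of L2-normalizing queries and keys along the token axis and scaling by a learnable temperature; the multi-head packaging of the full model is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Cross-covariance attention (core of XCiT), single-head sketch:
# attention acts over channels, so cost is linear in token count N.

class XCAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.temperature = nn.Parameter(torch.ones(1))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: [batch, tokens N, channels d]
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = F.normalize(q, dim=1)  # unit norm per channel, across tokens
        k = F.normalize(k, dim=1)
        attn = (q.transpose(1, 2) @ k) * self.temperature  # [batch, d, d]
        attn = attn.softmax(dim=-1)
        out = (attn @ v.transpose(1, 2)).transpose(1, 2)   # [batch, N, d]
        return self.proj(out)
```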