Discussion of article "Neural networks made easy (Part 27): Deep Q-Learning (DQN)"

 

New article Neural networks made easy (Part 27): Deep Q-Learning (DQN) has been published:

We continue to study reinforcement learning. In this article, we will get acquainted with the Deep Q-Learning method. This method enabled the DeepMind team to create a model that can outperform a human at Atari computer games. I think it will be useful to evaluate the potential of this technology for solving trading problems.

You have probably already guessed that deep Q-learning involves using a neural network to approximate the Q-function. What is the advantage of such an approach? Recall the implementation of the tabular cross-entropy method in the previous article. I emphasized that a tabular implementation assumes a finite number of possible states and actions. That is why we limited the number of possible states by clustering the initial data. But is that always a good idea? Will clustering always produce good results? Using a neural network does not limit the number of possible states, which I think is a great advantage when solving trading-related problems.
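For reference, here is the standard Q-learning update rule (textbook material, not quoted from the article), where alpha is the learning rate and gamma is the discount factor:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

In deep Q-learning, the table lookup Q(s, a) is replaced by a forward pass of the network, and the bracketed term becomes the regression target for the output neuron of the chosen action.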

The first and most obvious approach is to replace the table from the previous article with a neural network. Unfortunately, it is not that simple: in practice, this naive substitution turned out to be far less effective than it seemed. To make the approach work, we need to add a few heuristics.
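One of the classic DQN heuristics is experience replay: transitions are stored in a buffer, and the network is trained on transitions drawn at random rather than in the order they occurred, which breaks the correlation between consecutive samples. Below is a minimal illustrative sketch in MQL5 — my own, not the author's code; STATE_SIZE and the struct layout are assumptions for the example:

#define STATE_SIZE 20          // hypothetical length of the state vector

//--- one stored transition: (state, action, reward, next state)
struct STransition
  {
   double            state[STATE_SIZE];       // state before the action
   int               action;                  // action taken
   double            reward;                  // reward received
   double            next_state[STATE_SIZE];  // resulting state
  };

//--- ring buffer that overwrites the oldest transition when full
class CReplayBuffer
  {
private:
   STransition       m_data[];
   int               m_capacity;
   int               m_pos;
   int               m_size;
public:
                     CReplayBuffer(const int capacity)
     {
      m_capacity = capacity;
      m_pos      = 0;
      m_size     = 0;
      ArrayResize(m_data, m_capacity);
     }
   //--- store a transition, recycling the oldest slot once full
   void              Add(const STransition &t)
     {
      m_data[m_pos] = t;
      m_pos  = (m_pos + 1) % m_capacity;
      m_size = MathMin(m_size + 1, m_capacity);
     }
   //--- draw one transition uniformly at random
   //--- (MathRand() tops out at 32767, fine for a small demo buffer)
   bool              Sample(STransition &t)
     {
      if(m_size == 0)
         return(false);
      t = m_data[MathRand() % m_size];
      return(true);
     }
  };

A second classic heuristic is the target network: a periodically refreshed copy of the trained network that is used to compute the bracketed target in the update rule above, which stabilizes training.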

First, let's look at the agent's training goal. In general terms, its goal is to maximize the total reward. Look at the figure below. The agent has to move from the Start cell to the Finish cell. The agent receives a reward only once, when it reaches the Finish cell. In all other states, the reward is zero.

Discount factor

The figure shows two paths. To us it is obvious that the orange path is shorter and therefore preferable. But in terms of maximizing the total reward, the two paths are equivalent: each ends with the same single reward at the Finish cell.
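This is exactly the problem the discount factor shown in the figure solves. In the standard formulation (not quoted from the article), the agent maximizes the discounted return

$$G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad 0 < \gamma \le 1$$

With gamma below 1, a reward received after fewer steps is discounted less, so the shorter orange path yields a strictly higher discounted return than the longer one.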

Author: Dmitriy Gizlyk

 

Hello Mr. Gizlyk, first of all I would like to thank you for your well-founded series. However, as a latecomer, I am struggling with some problems understanding your current article. After reconstructing the VAE.mqh file and the CBufferDouble class from your previous articles, I can compile the sample application from this article. To test it, I tried to create a network with your NetCreator program, but gave up after many tries: the saved networks were not accepted by the application from this article. Could you also offer the network you created for download? Thanks again for your work!

 
sfonti #:

Hello Mr. Gizlyk, first of all I would like to thank you for your well-founded series. However, as a latecomer, I am struggling with some problems understanding your current article. After reconstructing the VAE.mqh file and the CBufferDouble class from your previous articles, I can compile the sample application from this article. To test it, I tried to create a network with your NetCreator program, but gave up after many tries: the saved networks were not accepted by the application from this article. Could you also offer the network you created for download? Thanks again for your work!

Same problem here... the loaded file is always damaged. Did you find a solution?