Deep learning & Level 2 market data

 

Hello everyone,


I've been trying for a year now to develop an automated trading algorithm using a neural network as the model, with level 1 data (open, high, low, close candlestick prices) and 14 technical indicators as inputs. I collected this data on a 1-minute timeframe using MQL5 scripts I wrote myself.


Unfortunately, I haven't obtained any really satisfactory results so far :(

The network only ended up returning the closing price of the previous candle, since that was the value that minimized the loss function. As the aim is to predict the closing prices of the next n candles (or, at the very least, the trend), this is not appropriate.

The advantage of predicting the closing price n candles ahead rather than the trend is that you can take the spread into account and therefore know whether the position will be profitable (whether the move will sufficiently exceed the spread).

If you have any advice on the choice of data or how to pre-process it, or even on the choice of neural network model or how to train it, I'd love to hear from you! 


Remaining hopeful, I kept reading scientific publications and discovered level 2 data (depth of market, the Limit Order Book (LOB)): all the pending orders placed on an asset across the world's exchanges, rather than just the best bid and ask prices of a single broker (level 1). This is where it gets complicated! This data seems to be in great demand, and apparently some people are prepared to pay several hundred euros a month for access to it. As I'm a student and my trading isn't yet profitable, I can't afford that. So I've found a rather slow, low-cost solution: streaming this data from the Interactive Brokers API and saving it to a CSV.
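
To give an idea, here is a stripped-down sketch of that kind of streaming loop using the ib_insync Python library (illustrative only: the library choice, connection parameters and symbol are assumptions, not necessarily my exact setup):

from ib_insync import IB, Forex
import csv
import datetime as dt

ib = IB()
ib.connect('127.0.0.1', 7497, clientId=1)   # TWS/Gateway defaults (placeholder)

contract = Forex('EURUSD')                  # placeholder symbol
ticker = ib.reqMktDepth(contract)           # subscribe to level 2 market depth

with open('lob.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['time', 'side', 'level', 'price', 'size'])

    def on_update(t):
        # Dump the current book snapshot on every depth update
        now = dt.datetime.utcnow().isoformat()
        for i, lvl in enumerate(t.domBids):
            writer.writerow([now, 'bid', i, lvl.price, lvl.size])
        for i, lvl in enumerate(t.domAsks):
            writer.writerow([now, 'ask', i, lvl.price, lvl.size])

    ticker.updateEvent += on_update
    ib.run()  # stream until interrupted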

Again, if you have any tips or know of a way to download a level 2 market database for free, please don't hesitate! 


If I haven't been clear enough in my explanations, I'll be happy to elaborate.

Thank you for taking the time to read my "little" message. It's my first time posting on a forum, so I hope I've respected all the rules.

If you ever want to share other tips, advice, etc. about algorithmic trading, go ahead!

Any info is good to take.


Thanks :)


PS: I'm looking for people with at least some relevant background to work with me on this project, so if you're interested please let me know.

Neural Networks: From Theory to Practice
www.mql5.com
Nowadays, every trader must have heard of neural networks and knows how cool it is to use them. The majority believes that those who can deal with neural networks are some kind of superhuman. In this article, I will try to explain to you the neural network architecture, describe its applications and show examples of practical use.
 

Hello 

How was the network instructed to output the closing price, and with how many output nodes?

 
I used a neural network with LSTM layers first and then Dense layers. I fed the network the last 5 quotes (the last five 1-minute candlesticks). Each input candle was represented by this data:
- Price data: open, high, low, close, tick_volume, spread, average_price, volume_profile
- Indicator data: ADX, AO, ATR, BearsPower, BullsPower, CCI, DEMA, Tenkan, Kijun, SBB, SSA, MACD, Momentum, RSI, RVI, Stochastic, UO
This gives 25 data items per candle.

So the input layer has 125 nodes (25 × 5). I then decreased the number of nodes per layer, layer by layer. I simply took the structure of a neural network shared in a scientific publication, as I don't yet have enough knowledge to adapt it to my problem.

I ended up with something like this:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
# Two stacked LSTM layers, then a "funnel" of Dense layers down to 5 outputs
model.add(layers.LSTM(125, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(layers.LSTM(100, return_sequences=False))
model.add(layers.Dense(75))
model.add(layers.Dense(50))
model.add(layers.Dense(25))
model.add(layers.Dense(5))

The network had 5 nodes on its output layer, and I tried to train it so that each node predicted the k-th next candle (with k from 1 to 5). No success :(
 
Mathis_ #:
I used a neural network with LSTM layers first and then Dense layers. I fed the network the last 5 quotes (the last five 1-minute candlesticks). Each input candle was represented by this data:
- Price data: open, high, low, close, tick_volume, spread, average_price, volume_profile
- Indicator data: ADX, AO, ATR, BearsPower, BullsPower, CCI, DEMA, Tenkan, Kijun, SBB, SSA, MACD, Momentum, RSI, RVI, Stochastic, UO
This gives 25 data items per candle.

So the input layer has 125 nodes (25 × 5). I then decreased the number of nodes per layer, layer by layer. I simply took the structure of a neural network shared in a scientific publication, as I don't yet have enough knowledge to adapt it to my problem.

I ended up with something like this:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
# Two stacked LSTM layers, then a "funnel" of Dense layers down to 5 outputs
model.add(layers.LSTM(125, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(layers.LSTM(100, return_sequences=False))
model.add(layers.Dense(75))
model.add(layers.Dense(50))
model.add(layers.Dense(25))
model.add(layers.Dense(5))

The network had 5 nodes on its output layer, and I tried to train it so that each node predicted the k-th next candle (with k from 1 to 5). No success :(

I see. The 5 output nodes were "guessing" the OHLCV of the k-th candle? (ReLU, I guess.)

 

Yes, I tested my model with different activation functions such as tanh or ReLU, but there was no real conclusive change.

Not having enough knowledge of Deep Learning yet, I simply tested a lot of combinations and tried to draw conclusions, but without success.
If you know more, I'd love to hear from you :)
 
Mathis_ #:

Yes, I tested my model with different activation functions such as tanh or ReLU, but there was no real conclusive change.

Not having enough knowledge of Deep Learning yet, I simply tested a lot of combinations and tried to draw conclusions, but without success.
If you know more, I'd love to hear from you :)

Okay 

One thing you could do is increase the tail of the features, i.e. feed in more past bars.

I'd drop the indicators too; in theory, if you feed in a lot of bars it may sort of "build" its own indicators inside the hidden layers.

To prevent it from "cheating" (see the sketch after this list):

  • Establish a value that is the sum of the movement of all bars within a window.
  • Divide each bar's (close-open) by that value; you can opt to have 2 input nodes per bar, one with (close-open)/range and one with (high-low)/range.
  • Each bar is then a value from -1.0 to 1.0 depending on its size and direction.
  • In the features it is now impossible for any value to escape the -1.0 to 1.0 range, and samples are more related to each other in how they are presented to the network.
  • The output for the training set is also a bar per node with a similar encoding: -1.0 to 1.0 for tanh, or 0.0 to 1.0 (split in half for +/-) for sigmoid. You can also use ReLU, but you'd have to do some squashing of the weights or of the hidden layers' outputs. So far I've found that normalizing the weights after each batch confused the network, however (in my tests; that doesn't mean it's the case in general, as the tests may have flaws and each solution is a different spectrum).
  • Now the problem is that the output can go beyond the -1.0 to 1.0 range, but in training you can allow that.
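
A minimal NumPy sketch of that encoding (the function name and the zero-division guard are my own additions):

import numpy as np

def encode_window(opens, highs, lows, closes):
    """Encode one window of bars: 2 values per bar, both divided by the
    total movement of all bars in the window."""
    total_move = np.sum(np.abs(closes - opens))  # sum of movement in the window
    total_move = max(total_move, 1e-9)           # guard against a flat window
    body = (closes - opens) / total_move   # signed: size and direction, in [-1, 1]
    rng = (highs - lows) / total_move      # unsigned relative range per bar
    return np.stack([body, rng], axis=-1)  # shape (n_bars, 2)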

Furthermore, you could use the MT5 built-in calendar and:

  • Not feed in training data from black swan events, BUT use that data in your estimates of accuracy, because these things will occur when you trade live (prefer to only measure the downside of these events).
  • Use an autoencoder-type deployment at first with the features (a "butterfly" setup): let that network understand the features, then take out the left portion and the squeeze, connect it to the outputs, and train only the right part of the network (see the sketch below).
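
A minimal Keras sketch of that butterfly idea (layer sizes and n_features are placeholders, not recommendations):

from tensorflow import keras
from tensorflow.keras import layers

n_features = 40  # placeholder: e.g. 20 bars x 2 encoded values per bar

# Butterfly/autoencoder: the encoder squeezes the features, the decoder
# reconstructs them, so the squeeze must keep the "essential signals".
inputs = keras.Input(shape=(n_features,))
x = layers.Dense(16, activation="tanh")(inputs)
squeeze = layers.Dense(8, activation="tanh", name="squeeze")(x)
x = layers.Dense(16, activation="tanh")(squeeze)
reconstruction = layers.Dense(n_features, activation="linear")(x)

autoencoder = keras.Model(inputs, reconstruction)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, ...)  # learns to reproduce its own input

# Then freeze the left part (encoder) and train only the new right part.
encoder = keras.Model(inputs, squeeze)
encoder.trainable = False
head = layers.Dense(5, activation="tanh")(encoder.output)
predictor = keras.Model(encoder.input, head)
predictor.compile(optimizer="adam", loss="mse")
# predictor.fit(x_train, y_train, ...)  # only the head's weights move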

 
Yes, I see. I have since dropped the technical indicators, which seem superfluous to me. How many bars would you advise me to use as network input?

On the new level 2 data that I am currently collecting, I have also tried to link the data more together as you suggested: normalized price differences rather than raw prices (normalized by volume, size, spread, etc.). Do you think it is essential for the data to be linked together for the network to be more efficient?

I hadn't thought of excluding black swan events from the training data. Thanks for the advice!
I didn't know about autoencoder neural networks. From my research, I gather that they reduce the dimensionality of the data and filter out noise more effectively. Is that right?
By the way, what is the point of training only the right part of the network and not the entire network?

Thank you very much for your advice! I will continue to learn about autoencoder networks and try your strategy on my model.
 
Mathis_ #:
Yes, I see. I have since dropped the technical indicators, which seem superfluous to me. How many bars would you advise me to use as network input?

On the new level 2 data that I am currently collecting, I have also tried to link the data more together as you suggested: normalized price differences rather than raw prices (normalized by volume, size, spread, etc.). Do you think it is essential for the data to be linked together for the network to be more efficient?

I hadn't thought of excluding black swan events from the training data. Thanks for the advice!
I didn't know about autoencoder neural networks. From my research, I gather that they reduce the dimensionality of the data and filter out noise more effectively. Is that right?
By the way, what is the point of training only the right part of the network and not the entire network?

Thank you very much for your advice! I will continue to learn about autoencoder networks and try your strategy on my model.

I don't have an exact answer. In my experiment with my autoencoder I tried 20, if I recall correctly.

A recurrent neural net could be more optimal for this task, but I have not stepped into that territory yet (in the sense that you keep feeding it the relative change from the previous tick and you also send the previous state alongside it; but I may be butchering the "techniques" of an RNN, as I don't understand it yet).

You could estimate what your desirable holding time for a trade is and then use that amount in M1 bars (because H1 information can be found inside M1 bars, but M1 information cannot be recovered from H1 bars).
However, you might need to collect (construct) these M1 bars yourself from your broker's tick data.

You could also train 3 networks with varying feature amounts and then "receive the opinion of 3 forecasters" for the upcoming bar(s), as in the sketch below.
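
A rough sketch of those "3 forecasters" (lookback depths and shapes are placeholders):

import numpy as np

LOOKBACKS = (10, 20, 40)  # feature amounts per network (placeholder values)

def ensemble_forecast(models, bars):
    """bars: array of shape (n_bars, n_features_per_bar), newest bar last.
    Each model sees a different lookback; the forecast is the mean opinion."""
    preds = [
        model.predict(bars[-lookback:][np.newaxis, ...], verbose=0)
        for model, lookback in zip(models, LOOKBACKS)
    ]
    return np.mean(preds, axis=0)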

The hypothesis is that the left part of the "butterfly" is a way to generalize the most essential signals of the features you send in, because it is with those signals that the feature space can be reconstructed (not 100%). That means the squeeze will contain more similarities than the left side of the network (so you could use the squeeze as inputs too, in other words, instead of building a partial training library). In my mind it's like asking it to use the most essential signals from the window of bars to estimate the future. You could even mix the 3-nets idea and the butterfly idea together and have the 3 "squeezes" as the features.

(EDIT: I also suppose -this is an untested assumption- that the squeeze would allow you to detect outlier cases, as the "cluster" of the most common cases would be more tightly packed. You could then -hypothetically- train on "varying terrains", so to speak: simple tarmac road first, then dirt road, then mountain, etc.)

Then, if you also take the right side of the butterfly, from the middle to the right, and tune the "essential signals", you may be able to create "unseen" data based on what the encoder has seen so far, of course. That's -kinda- how GANs work: they get the essentials of the inputs, then they relate the essentials to the classifications, and then a human can toggle them and get new "unique" features.

 
In fact, I'm already building M1 bars from ticks using an algorithm on MT5. But instead of estimating a holding time for my positions, I thought I'd predict the prices every minute and estimate whether it's better to keep the position or close it (even if it means reopening it a few bars later).

The idea of having three networks is an interesting one, as it would make it possible to predict over several timeframes and thus get a better overview. Thanks!

However, I can't see how we could use the squeeze or the right-hand layers as inputs, since at that depth in the network the data has been combined and the nodes no longer correspond to the input nodes. But the autoencoder is indeed interesting for feature selection.

By the way, is your autoencoder model profitable for you?
 
Mathis_ #:
In fact, I'm already building M1 bars from ticks using an algorithm on MT5. But instead of estimating a holding time for my positions, I thought I'd predict the prices every minute and estimate whether it's better to keep the position or close it (even if it means reopening it a few bars later).

The idea of having three networks is an interesting one, as it would make it possible to predict over several timeframes and thus get a better overview. Thanks!

However, I can't see how we could use the squeeze or the right-hand layers as inputs, since at that depth in the network the data has been combined and the nodes no longer correspond to the input nodes. But the autoencoder is indeed interesting for feature selection.

By the way, is your autoencoder model profitable for you?

No, it's still in development. I'm trying to streamline the process for now -taking advantage of the summer- so I want a system that will do all the steps and advance the "training" and "testing" sets as it goes.

For example (the figures are not decided yet):

Training window size 4 weeks, testing window size 1 week, inspection window size 3 weeks, re-training every week (so in the real application this would be trained on the weekends).

It will do the above processes on its own: butterfly the training-set samples, have the left side spit out new inputs (squeeze nodes) per training sample, find the 75% most "common" samples, train the net on those, then allow the outliers in too. When done, move to the test set. The predictions there go from features → squeeze nodes → network → prediction, and I evaluate what it does. I also run it on more weeks for an "inspection". What is the inspection?

If I find one model that has a steep drop in performance in the "inspection" weeks and another that does not drop off as steeply, this may indicate which network "memorized" and which network "learned" something.

Then keep sliding the "windows", train with the previous model as a basis, and so on, as in the sketch below.
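
A minimal sketch of that sliding schedule, using the example figures above (all values are placeholders):

def walk_forward_windows(n_weeks, train=4, test=1, inspect=3, step=1):
    """Yield (training, testing, inspection) week indices for each cycle."""
    start = 0
    while start + train + test + inspect <= n_weeks:
        yield (
            range(start, start + train),                 # train on these weeks
            range(start + train, start + train + test),  # evaluate here
            range(start + train + test,
                  start + train + test + inspect),       # inspect here
        )
        start += step  # re-train per week, reusing the previous model as a basis

for tr, te, ins in walk_forward_windows(12):
    print(list(tr), list(te), list(ins))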

The biggest issue in all that is crunch time.

The thing with networks is that you will either have to be ultra-specific, or optimize one aspect of the strategy, or create a custom loss function (see the sketch after this list).

  • A specific example: "I have SL X and TP Y in my strategy, and these are the points where entry would be profitable and these are the points where entry would not be profitable; learn to find the profitable points for that strategy."
  • A one-aspect example: "These are the tops, these are the bottoms, no SL or TP; learn to find the tops and bottoms, and the rest will be done separately."
  • A custom-loss-function example: "Choose a direction (none, buy, sell) and find the best SL + TP; the custom loss function actually goes into the future, "simulates" the trade and returns an "error"."
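
A heavily simplified sketch of such a custom loss in Keras (my illustration only: here y_true carries the future price path, y_pred is a single direction value in [-1, 1], and SL/TP and spread handling are left out):

import tensorflow as tf

def trade_sim_loss(y_true, y_pred):
    """y_true: future prices over the horizon, shape (batch, horizon).
    y_pred: chosen direction in [-1, 1] (sell..none..buy), shape (batch, 1).
    Minimizing the loss maximizes the simulated profit."""
    future_move = y_true[:, -1] - y_true[:, 0]  # price change over the horizon
    direction = tf.squeeze(y_pred, axis=-1)     # the network's position choice
    profit = direction * future_move            # "simulated" trade outcome
    return -tf.reduce_mean(profit)              # the "error" is negated profit

# model.compile(optimizer="adam", loss=trade_sim_loss)  # hypothetical usage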

However, I can't see how we could use the squeeze or the right-hand layers as inputs

Yeah, you won't know what the network's output on these nodes means, but to the network it "could" be the generalized version of the features.

 

Hi Lorentzos,

I've been following your insightful posts on RNNs and LSTMs.

Have you been successful in developing a profitable model based on the information you've shared?