Discussing the article: "Neural networks made easy (Part 78): Decoder-free Object Detector with Transformer (DFFT)"


Check out the new article: Neural networks made easy (Part 78): Decoder-free Object Detector with Transformer (DFFT).

In this article, I propose looking at the problem of building a trading strategy from a different angle: instead of predicting future price movements, we will try to build a trading system based on the analysis of historical data.

The Decoder-Free Fully Transformer-based (DFFT) method is an efficient object detector built entirely on decoder-free Transformers. Its detection-oriented Transformer backbone extracts features at four scales and passes them to a single-level, encoder-only dense prediction module. This prediction module first aggregates the multi-scale features into a single feature map using the Scale-Aggregated Encoder.
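To make the data flow concrete, here is a minimal Python (PyTorch) sketch of that aggregation step, assuming four backbone feature maps at successively coarser scales. The class name SimpleScaleAggregator and the channel widths are my own illustrative choices, not part of DFFT; the real Scale-Aggregated Encoder fuses scales with attention blocks, while this sketch uses plain 1x1 convolutions and resizing only to show the idea of collapsing several scales into one map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleScaleAggregator(nn.Module):
    """Hypothetical sketch: fuse four multi-scale feature maps into one map.

    DFFT's Scale-Aggregated Encoder uses attention blocks; here 1x1
    convolutions and bilinear resizing stand in for it purely as illustration.
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # One projection per input scale, mapping to a common channel width.
        self.proj = nn.ModuleList([nn.Conv2d(c, out_channels, kernel_size=1)
                                   for c in in_channels])

    def forward(self, features):
        # features: list of 4 tensors [B, C_i, H_i, W_i] from the backbone stages.
        target_size = features[-1].shape[-2:]    # aggregate at the coarsest scale
        fused = 0
        for f, proj in zip(features, self.proj):
            f = proj(f)                           # unify channel dimension
            f = F.interpolate(f, size=target_size, mode="bilinear",
                              align_corners=False)
            fused = fused + f                     # element-wise aggregation
        return fused

# Usage: four dummy feature maps, as if taken from four backbone stages.
feats = [torch.randn(1, c, s, s) for c, s in [(96, 32), (192, 16), (384, 8), (768, 4)]]
agg = SimpleScaleAggregator([96, 192, 384, 768], out_channels=256)
print(agg(feats).shape)   # torch.Size([1, 256, 4, 4])
```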

The authors then apply the Task-Aligned Encoder to align features for the classification and regression tasks simultaneously from this single aggregated map.
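The sketch below, again in PyTorch, shows one simplified way a single feature map can yield separate classification and regression features. The names SimpleTaskAlignedHead, cls_gate and reg_gate are hypothetical; DFFT's Task-Aligned Encoder relies on group channel-wise attention, whereas this version substitutes simple per-task channel gates over a shared trunk.

```python
import torch
import torch.nn as nn

class SimpleTaskAlignedHead(nn.Module):
    """Hypothetical sketch: derive classification and regression features
    from one shared map via per-task channel gates (a stand-in for DFFT's
    group channel-wise attention)."""
    def __init__(self, channels, num_classes, num_anchors=1):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
        )
        # Each task re-weights the shared channels with its own gate.
        self.cls_gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.reg_gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.cls_out = nn.Conv2d(channels, num_anchors * num_classes, 1)
        self.reg_out = nn.Conv2d(channels, num_anchors * 4, 1)

    def forward(self, x):
        shared = self.shared(x)
        cls_feat = shared * self.cls_gate(shared)   # classification-aligned features
        reg_feat = shared * self.reg_gate(shared)   # regression-aligned features
        return self.cls_out(cls_feat), self.reg_out(reg_feat)

# Usage with the aggregated map from the previous sketch.
head = SimpleTaskAlignedHead(channels=256, num_classes=80)
logits, boxes = head(torch.randn(1, 256, 4, 4))
print(logits.shape, boxes.shape)   # [1, 80, 4, 4]  [1, 4, 4, 4]
```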

The Detection-Oriented Transformer (DOT) backbone is designed to extract multi-scale features with strong semantics. It hierarchically stacks one embedding module and four DOT stages. A new semantic-augmented attention module aggregates the low-level semantic information of every two consecutive DOT stages.
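The following PyTorch sketch shows only the hierarchical shape flow of that stacking: one patch-embedding module followed by four stages, each returning its feature map. The classes DummyDOTStage and DummyDOTBackbone are hypothetical placeholders; real DOT stages contain windowed attention plus the semantic-augmented attention module, while strided convolutions are used here just to trace the four scales.

```python
import torch
import torch.nn as nn

class DummyDOTStage(nn.Module):
    """Stand-in for one DOT stage: a strided convolution models the
    downsampling; the attention blocks of a real stage are omitted."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.down(x)

class DummyDOTBackbone(nn.Module):
    """Hypothetical sketch of the hierarchy: one patch-embedding module
    followed by four stages, keeping every stage output as a scale."""
    def __init__(self, widths=(96, 192, 384, 768)):
        super().__init__()
        self.embed = nn.Conv2d(3, widths[0], kernel_size=4, stride=4)  # patch embedding
        chans = (widths[0],) + tuple(widths)
        self.stages = nn.ModuleList([DummyDOTStage(chans[i], chans[i + 1])
                                     for i in range(4)])

    def forward(self, x):
        x = self.embed(x)
        outs = []
        for stage in self.stages:
            x = stage(x)
            outs.append(x)            # keep every scale for the prediction module
        return outs

feats = DummyDOTBackbone()(torch.randn(1, 3, 256, 256))
print([tuple(f.shape) for f in feats])
# [(1, 96, 32, 32), (1, 192, 16, 16), (1, 384, 8, 8), (1, 768, 4, 4)]
```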

When processing high-resolution feature maps for dense prediction, conventional transformer blocks reduce computational cost by replacing global multi-head Self-Attention (MSA) with local spatial attention, such as shifted-window multi-head Self-Attention (SW-MSA). However, this structure degrades detection performance because it extracts multi-scale features with only limited low-level semantics.
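To illustrate why window-based attention is cheaper, here is a minimal PyTorch sketch that restricts self-attention to non-overlapping windows, so the cost grows with the number of windows rather than quadratically with the full map. It is a simplification: the cyclic shift that gives SW-MSA its name, the relative position bias, and the attention masking are all omitted, and the helper name window_attention is my own.

```python
import torch
import torch.nn as nn

def window_attention(x, window_size, attn):
    """Minimal window-based self-attention: tokens attend only inside
    non-overlapping windows (the shift step of SW-MSA is omitted).

    x: [B, H, W, C]; attn: nn.MultiheadAttention with batch_first=True.
    """
    B, H, W, C = x.shape
    ws = window_size
    # Split the map into (H//ws * W//ws) windows of ws*ws tokens each.
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
    out, _ = attn(windows, windows, windows)       # MSA within each window only
    # Reverse the partition back to the original layout.
    out = out.reshape(B, H // ws, W // ws, ws, ws, C)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

attn = nn.MultiheadAttention(embed_dim=96, num_heads=4, batch_first=True)
x = torch.randn(1, 32, 32, 96)        # high-resolution feature map
y = window_attention(x, window_size=8, attn=attn)
print(y.shape)                         # torch.Size([1, 32, 32, 96])
```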

Author: Dmitriy Gizlyk