This repository contains the official PyTorch implementation of "Unifying Dimensions: A Linear Adaptive Mixer for Lightweight Image Super-Resolution".
[Paper] [Code] [Visual Results] [Pretrained Models]
Abstract: Window-based Transformers have demonstrated outstanding performance in super-resolution tasks thanks to the adaptive modeling provided by local self-attention (SA). However, they exhibit higher computational complexity and inference latency than convolutional neural networks. In this paper, we first identify that the adaptability of Transformers derives from their adaptive spatial aggregation and advanced structural design, while their high latency stems from the computational cost and memory layout transformations associated with local SA. To simulate this aggregation approach, we propose an effective convolution-based linear focal separable attention (FSA), which allows long-range dynamic modeling with linear complexity. Additionally, we introduce an effective dual-branch structure combined with an ultra-lightweight information exchange module (IEM) to enhance information aggregation in the Token Mixer. Finally, with respect to the structure, we modify the existing spatial-gate-based feed-forward networks by incorporating a self-gate mechanism to preserve high-dimensional channel information, enabling the modeling of more complex relationships. With these advancements, we construct a convolution-based Transformer framework named the Linear Adaptive Mixer Network (LAMNet). Extensive experiments demonstrate that LAMNet achieves better performance than existing SA-based Transformer methods while maintaining the computational efficiency of convolutional neural networks, achieving up to a $3\times$ speedup in inference time. The code will be publicly available at: https://github.com/zononhzy/LAMNet
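For intuition, the sketch below illustrates the self-gate idea mentioned in the abstract: the gating signal is derived from the feature itself and multiplied back onto the full channel dimension, instead of splitting channels as a spatial gate does. This is a minimal, hypothetical PyTorch sketch written for this README; the module name, expansion ratio, and kernel size are assumptions, not the authors' implementation (see the paper and `basicsr/` for the real code).

```python
import torch
import torch.nn as nn

class SelfGateFFN(nn.Module):
    """Minimal sketch of a self-gated feed-forward block (illustrative only)."""

    def __init__(self, dim: int, expansion: int = 2):
        super().__init__()
        hidden = dim * expansion
        self.expand = nn.Conv2d(dim, hidden, kernel_size=1)       # channel expansion
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3,
                                padding=1, groups=hidden)         # per-channel spatial mixing
        self.project = nn.Conv2d(hidden, dim, kernel_size=1)      # channel reduction
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.expand(x)
        # Self-gate: the gate comes from the feature itself, so all hidden
        # channels are preserved (no channel split as in a spatial gate).
        gate = self.act(self.dwconv(y))
        return self.project(y * gate)

if __name__ == "__main__":
    block = SelfGateFFN(dim=48)
    out = block(torch.randn(1, 48, 64, 64))
    print(out.shape)  # torch.Size([1, 48, 64, 64])
```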
- Python 3.8
- PyTorch == 2.1
```bash
cd LAMNet
pip install -r requirements.txt
python setup.py develop

# install fsa
cd basicsr/ops/fsa
python setup.py build install
```
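The FSA op appears to be a compiled extension, so a CUDA-enabled PyTorch build and a local CUDA toolkit are needed before running `python setup.py build install`. A quick sanity check using only standard PyTorch APIs (nothing specific to this repo):

```python
import torch
from torch.utils.cpp_extension import CUDA_HOME

# Building the extension requires a CUDA-enabled PyTorch and a CUDA toolkit.
print("PyTorch:", torch.__version__)                 # expect >= 2.1 per the requirements above
print("CUDA available:", torch.cuda.is_available())
print("PyTorch built with CUDA:", torch.version.cuda)
print("CUDA_HOME:", CUDA_HOME)                       # None means the toolkit was not found
```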
We use only the DIV2K dataset to train our model. Please download the DIV2K dataset from here.
The test set contains five benchmark datasets: Set5, Set14, B100, Urban100, and Manga109. The benchmarks can be downloaded from here.
- If you do not use LMDB datasets, you may need to crop the training images into sub-images to reduce I/O time. Please follow the instructions here.
- After downloading the test datasets you need, you may also need to generate the bicubic-downsampled LR images. Please follow the instructions here (a rough stand-in sketch is shown after this list).
- Please download the datasets and place them in the folders specified by the training options in `/options`.
- Follow the instructions below to train our LAMNet (todo).
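If you only need a quick stand-in for the LR-generation step above, the sketch below creates bicubic LR images with Pillow. This is an assumption for illustration: the linked data-preparation scripts (MATLAB-style bicubic, as used by BasicSR) are the reference way to prepare the benchmarks, and Pillow's bicubic can differ slightly, which may shift PSNR/SSIM by a small margin.

```python
import os
from PIL import Image

def generate_lr(hr_dir: str, lr_dir: str, scale: int = 4) -> None:
    """Bicubic-downsample every HR image in hr_dir into lr_dir (illustrative only)."""
    os.makedirs(lr_dir, exist_ok=True)
    for name in sorted(os.listdir(hr_dir)):
        if not name.lower().endswith((".png", ".jpg", ".bmp")):
            continue
        hr = Image.open(os.path.join(hr_dir, name)).convert("RGB")
        # Crop so height and width are divisible by the scale before resizing.
        w, h = hr.size
        hr = hr.crop((0, 0, w - w % scale, h - h % scale))
        lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
        lr.save(os.path.join(lr_dir, name))

# Example (paths are placeholders):
# generate_lr("datasets/Set5/HR", "datasets/Set5/LR_bicubic/X4", scale=4)
```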
```bash
# test LAMNet for the lightweight SR task
python basicsr/test.py -opt options/test_LAMNet_SRx2.yml
python basicsr/test.py -opt options/test_LAMNet_SRx3.yml
python basicsr/test.py -opt options/test_LAMNet_SRx4.yml
```
We provide the results on lightweight image SR. More results can be found in the paper. The visual results of LAMNet can be found in [Visual Results].
Visual Results
Performance comparison
You may want to cite:
This code is based on BasicSR, Swin Transformer, and SwinIR. Please also follow their licenses. Thanks for their awesome work.
If you have any questions, please email zhenyuhu@whu.edu.cn.