Continuous Remote Sensing Image Super-Resolution based on Context Interaction in Implicit Function Space

Keyan Chen^1,2,3,4

Wenyuan Li^1,2,3,4

Sen Lei⁵

Jianqi Chen^1,2,3,4

Xiaolong Jiang¹

Zhengxia Zou^1,4

Zhenwei Shi^✉,1,2,3,4

Beihang University¹

Beijing Key Laboratory of Digital Media²

State Key Laboratory of Virtual Reality Technology and Systems³

Shanghai Artificial Intelligence Laboratory⁴

AVIC Chengdu Aircraft Industrial (Group) Company Ltd.⁵

Code [GitHub]

Demo [HuggingFace]

Paper [arXiv]

Cite [BibTeX]

Teaser

Our proposed FunSR can produce images of arbitrary resolution with a single trained model. FunSR maps location coordinates, together with additional attributes such as the scale factor, to RGB values using functions parameterized by the transformed LR image.
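As a rough illustration of what "location coordinates with additional attributes" can look like, the sketch below builds the query set for an arbitrary scale factor. The function name `make_query_grid`, the pixel-center normalization to [-1, 1], and the round-half-up rule for the output size are assumptions made for illustration; they are not taken from the FunSR code.

```python
import math


def make_query_grid(lr_height, lr_width, scale):
    """Build queries for a hypothetical implicit SR model.

    Returns (hr_height, hr_width, queries), where each query is
    (y, x, scale) and y, x are HR pixel-center coordinates normalized
    to [-1, 1] (an assumed convention).
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Round half up so the output size is predictable for fractional scales.
    hr_height = max(1, math.floor(lr_height * scale + 0.5))
    hr_width = max(1, math.floor(lr_width * scale + 0.5))
    queries = []
    for i in range(hr_height):
        y = -1.0 + (2 * i + 1) / hr_height
        for j in range(hr_width):
            x = -1.0 + (2 * j + 1) / hr_width
            # The scale factor is attached to every coordinate as an attribute.
            queries.append((y, x, float(scale)))
    return hr_height, hr_width, queries


# The same LR size can be queried at any real-valued magnification.
for s in (2.0, 2.5, 4.0):
    h, w, q = make_query_grid(48, 48, s)
    print(s, (h, w), len(q))
```

Because the queries are continuous coordinates rather than a fixed upsampling layer, changing the magnification only changes how many points are evaluated, not the model.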

Abstract

Despite its fruitful applications in remote sensing, image super-resolution is troublesome to train and deploy because different resolution magnifications are handled by separate models. Accordingly, we propose a highly applicable super-resolution framework called FunSR, which handles different magnifications with a unified model by exploiting context interaction within implicit function space. FunSR comprises a functional representor, a functional interactor, and a functional parser. Specifically, the representor transforms the low-resolution image from Euclidean space to multi-scale pixel-wise function maps; the interactor enables pixel-wise function expression with global dependencies; and the parser, which is parameterized by the interactor's output, converts the discrete coordinates with additional attributes to RGB values. Extensive experimental results demonstrate that FunSR achieves state-of-the-art performance in both fixed-magnification and continuous-magnification settings. Its unified nature also enables many user-friendly applications.

Architecture

An outline of the proposed FunSR for continuous-magnification remote sensing image SR. The LR image is first converted to multi-scale parameter maps by the functional representor. Then, a functional interactor, i.e., a Transformer encoder, captures the relationships between functions at different pixel-wise locations and contextual levels. It returns a parameter map with global interaction for the local parser and, via an additional learnable token, a semantic parameter vector for the global parser. Finally, the RGB values produced by the local parser (parameterized by the local parameter map) and the global parser (parameterized by the global parameter vector) are weighted to generate the final RGB value in the HR image.
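The final parsing step can be sketched as follows: a small MLP whose weights are read from a parameter vector maps a query (coordinate plus scale) to RGB, and the local and global outputs are blended. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the two-layer ReLU MLP, the hidden width, the fixed scalar blend weight `alpha`, and the names `num_params`, `mlp_from_params`, and `parse_rgb` are all hypothetical. In FunSR the parameters come from the interactor, whereas random values stand in for them here.

```python
import random


def num_params(in_dim, hidden, out_dim):
    """Number of scalars needed for a 2-layer MLP (weights and biases)."""
    return in_dim * hidden + hidden + hidden * out_dim + out_dim


def mlp_from_params(params, in_dim, hidden, out_dim):
    """Slice a flat parameter vector into a 2-layer ReLU MLP (assumed layout)."""
    if len(params) != num_params(in_dim, hidden, out_dim):
        raise ValueError("parameter vector has the wrong length")
    n1 = in_dim * hidden
    w1 = [params[i * in_dim:(i + 1) * in_dim] for i in range(hidden)]
    b1 = params[n1:n1 + hidden]
    off = n1 + hidden
    w2 = [params[off + i * hidden:off + (i + 1) * hidden] for i in range(out_dim)]
    b2 = params[off + out_dim * hidden:]

    def forward(x):
        h = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return [sum(w * v for w, v in zip(row, h)) + b
                for row, b in zip(w2, b2)]

    return forward


def parse_rgb(coord, scale, local_params, global_params, alpha=0.5, hidden=8):
    """Blend local and global parser outputs for one query point.

    alpha is a fixed scalar here purely for illustration; the source only
    says that the two RGB values are weighted.
    """
    query = [coord[0], coord[1], scale]
    local_rgb = mlp_from_params(local_params, 3, hidden, 3)(query)
    global_rgb = mlp_from_params(global_params, 3, hidden, 3)(query)
    return [alpha * l + (1.0 - alpha) * g for l, g in zip(local_rgb, global_rgb)]


# Random stand-ins for the interactor's per-pixel and global parameters.
random.seed(0)
n = num_params(3, 8, 3)
local_params = [random.uniform(-1, 1) for _ in range(n)]
global_params = [random.uniform(-1, 1) for _ in range(n)]
rgb = parse_rgb((0.25, -0.5), 4.0, local_params, global_params)
print(rgb)
```

In this sketch the local parameters would differ per LR pixel while the global parameters are shared across the image, which is one way to read the split between the parameter map and the semantic parameter vector.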

Quantitative Results

R1: Benchmark on UCMerced

The results of FunSR versus the comparison methods on the UCMerced dataset are shown in the table, with the best performance in bold. For simplicity, we show only the upscale factors x2.0, x2.5, x3.0, x3.5, x4.0, x6.0, x8.0, and x10.0. FunSR achieves the highest performance in terms of PSNR and SSIM across nearly all backbones and upscale factors. Specifically, under x4 magnification, FunSR outperforms the state-of-the-art fixed-magnification Transformer-based SR method TransENet (26.98/0.7755 PSNR/SSIM), reaching 27.11/0.7781, 27.24/0.7799, and 27.29/0.7798 with the EDSR, RCAN, and RDN image encoders, respectively. FunSR also performs comparably to continuous image SR algorithms across different backbones, for magnifications both inside and outside the training distribution.
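For readers unfamiliar with the metric, the standard PSNR definition, 10 log10(MAX^2 / MSE), can be sketched as below, followed by the x4 PSNR margins over TransENet computed from the numbers quoted above. The evaluation protocol (color space, border cropping, value range) is not stated here, so the 8-bit range and flat pixel lists are assumptions, and `psnr` is an illustrative helper rather than the authors' evaluation code.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# PSNR values at x4 on UCMerced as quoted in the text above.
transenet_psnr = 26.98
funsr_psnr = {"EDSR": 27.11, "RCAN": 27.24, "RDN": 27.29}
gains = {name: round(value - transenet_psnr, 2) for name, value in funsr_psnr.items()}
print(gains)
```

Since PSNR is logarithmic in the mean squared error, margins of a few tenths of a dB correspond to modest but consistent reductions in reconstruction error.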

R2: Benchmark on AID

We conduct comparison experiments on the AID dataset to further validate FunSR's effectiveness. Compared with the UCMerced dataset, AID is larger and has more scene categories, 30 in total. The following table displays the overall results of the various methods on this dataset. Compared with the other approaches, FunSR produces the best results on the majority of the presented magnifications across different image encoders.

Visualizations

R1: Different Upscale Factors

Visual comparisons of example images upsampled by FunSR-RDN at different scale factors. The LR image is downsampled from the HR reference image with a scale ratio of 1/4. The first two rows are from the UCMerced test set ("tenniscourt99" and "airplane35"), while the last two are from the AID test set ("bridge_28" and "denseresidential_20").

R2: Comparisons on the UCMerced Test Set

Comparisons of different methods on the UCMerced test set at the x4 factor. Image crops are from "parkinglot17" and "denseresidential58", respectively. Zoom in for better visualization.

R3: Comparisons on the AID Test Set

Comparisons of different methods on the AID test set at the x4 factor. Image crops are from "viaduct_271" and "storagetanks_336", respectively. Zoom in for better visualization.

Acknowledgements

Based on a template by Phillip Isola and Richard Zhang.