IJRASET Journal for Research in Applied Science and Engineering Technology
Authors: Hemalatha K, Dheekshith V, Likhith B S, Aditya Hegde
DOI Link: https://doi.org/10.22214/ijraset.2025.73667
Remote sensing video super-resolution (VSR) is a vital technology enabling fine-grained Earth observation from satellites. With growing demands in applications such as environmental monitoring, urban development, and disaster management, improving the resolution of remote sensing videos has become paramount. Traditional video super-resolution methods, designed primarily for natural scenes, often fail to address the unique challenges posed by satellite imagery. This survey comprehensively reviews recent developments in feature diversity enhancement for VSR, focusing on the challenges of spatial, channel, and temporal heterogeneity. We place particular emphasis on MADNet, a novel architecture that integrates Spatial Diversity Enhancement (SDE) and Channel Diversity Enhancement (CDE) into a Multi-Axis Diversity Module (MADM). Furthermore, we compare MADNet with state-of-the-art VSR models, analyze its architectural innovations, and identify future research directions. This paper aims to serve as a foundational resource for researchers and practitioners interested in high-fidelity satellite video reconstruction.
Satellite video is emerging as a powerful tool for Earth observation because it can continuously monitor areas over time. However, these videos often suffer quality degradation from factors such as:
Platform vibrations
Atmospheric interference
Compression and downsampling
These distortions lead to the loss of high-frequency spatial details, making tasks like object tracking, segmentation, and classification challenging.
VSR aims to reconstruct high-resolution (HR) videos from low-resolution (LR) inputs. Compared to single-image super-resolution (SISR), VSR is more complex due to:
Temporal misalignment between frames
The need to aggregate spatial-temporal features across video sequences
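For concreteness, LR inputs in VSR benchmarks are commonly synthesized by downsampling HR frames. Below is a minimal PyTorch sketch of the standard bicubic ×4 protocol; the survey does not specify MADNet's exact degradation pipeline, so treat this as a representative setup rather than the paper's method.

```python
# Synthesizing LR training pairs from HR video clips (a common VSR setup,
# assumed here; not necessarily MADNet's exact degradation pipeline).
import torch
import torch.nn.functional as F

def make_lr(hr_frames: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """hr_frames: (T, C, H, W) clip -> (T, C, H/scale, W/scale) LR clip."""
    return F.interpolate(hr_frames, scale_factor=1 / scale,
                         mode="bicubic", align_corners=False)

clip_hr = torch.rand(5, 3, 256, 256)   # 5-frame HR clip
clip_lr = make_lr(clip_hr)             # (5, 3, 64, 64) LR input to a VSR model
```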
Deep learning methods have outperformed traditional techniques, using:
Sliding-window architectures: Capture local frame redundancy
Recurrent networks: Model temporal dependencies more effectively
Limitations: Most models use static convolutions, which fail to capture the spatial and frequency diversity in satellite data.
To overcome these limitations, MADNet (Multi-Axis Diversity Network) introduces a novel Multi-Axis Diversity Enhancement Module (MADM), which enhances features along spatial, channel, and frequency domains.
Key Components of MADNet:
Spatial Diversity Enhancement (SDE) Module
Uses dynamic convolution to capture fine-grained, spatially varying patterns.
Aggregates learnable kernel bases and applies them location-wise.
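To make the SDE idea concrete, here is a minimal PyTorch sketch of spatially varying dynamic convolution: a lightweight head predicts per-pixel coefficients that mix the responses of a small set of learnable kernel bases. The names (SpatialDynamicConv, num_bases) are illustrative assumptions, not MADNet's actual API.

```python
# A minimal sketch of spatially varying dynamic convolution in the spirit
# of the SDE module. Assumed names, not the paper's implementation.
import torch
import torch.nn as nn

class SpatialDynamicConv(nn.Module):
    def __init__(self, channels: int, num_bases: int = 4):
        super().__init__()
        # K learnable kernel bases, each a standard 3x3 convolution.
        self.bases = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_bases)]
        )
        # Lightweight head predicting per-pixel mixing coefficients.
        self.coeff = nn.Sequential(
            nn.Conv2d(channels, num_bases, 1),
            nn.Softmax(dim=1),  # one weight per basis at every location
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, K, C, H, W): response of each kernel basis.
        responses = torch.stack([conv(x) for conv in self.bases], dim=1)
        # (B, K, 1, H, W): location-wise mixing weights.
        weights = self.coeff(x).unsqueeze(2)
        # By linearity, mixing basis outputs per pixel is equivalent to
        # applying a per-pixel mixture of the basis kernels.
        return (responses * weights).sum(dim=1)

feats = torch.randn(1, 32, 64, 64)
out = SpatialDynamicConv(32)(feats)   # same shape as the input
```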
Channel Diversity Enhancement (CDE) Module
Uses 2D Discrete Cosine Transform (DCT) to capture frequency relationships between channels.
Splits and transforms feature chunks, then fuses them adaptively with attention mechanisms.
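A minimal sketch of DCT-based channel attention in this spirit (FcaNet-style frequency channel attention, used here as an assumed stand-in): each channel chunk is reduced to a scalar descriptor by a fixed 2D DCT basis, and a small MLP turns the descriptors into channel weights. The frequency pairs and module names are illustrative assumptions.

```python
# DCT-based channel attention sketch in the spirit of the CDE module.
# Assumes square inputs with H = W = size; real code would pool or resize.
import math
import torch
import torch.nn as nn

def dct_filter(u: int, v: int, h: int, w: int) -> torch.Tensor:
    """2D DCT-II basis function for frequency pair (u, v)."""
    ys = torch.cos(math.pi * u * (torch.arange(h) + 0.5) / h)
    xs = torch.cos(math.pi * v * (torch.arange(w) + 0.5) / w)
    return ys[:, None] * xs[None, :]

class FreqChannelAttention(nn.Module):
    def __init__(self, channels: int, size: int = 64,
                 freqs=((0, 0), (0, 1), (1, 0), (1, 1))):
        super().__init__()
        assert channels % len(freqs) == 0
        chunk = channels // len(freqs)
        # One DCT basis per channel chunk, stacked to shape (C, H, W).
        filters = torch.cat([
            dct_filter(u, v, size, size).expand(chunk, -1, -1)
            for (u, v) in freqs
        ])
        self.register_buffer("filters", filters)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scalar frequency descriptor per channel: <x_c, DCT basis>.
        desc = (x * self.filters).sum(dim=(2, 3))
        attn = self.fc(desc)[:, :, None, None]
        return x * attn  # reweight channels by their frequency content

feats = torch.randn(1, 32, 64, 64)
out = FreqChannelAttention(32, size=64)(feats)
```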
Auxiliary Branch
Applies static convolutions to retain global spatial patterns.
MADNet Architecture
Fuses outputs from the SDE, CDE, and auxiliary branches using grouped and depthwise convolutions (sketched below).
Enables robust spatial-temporal aggregation while preserving high-frequency details.
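A minimal sketch of how the three branch outputs might be fused with grouped and depthwise convolutions; the exact fusion used in MADNet may differ, so read this as one plausible realization.

```python
# Fusing SDE, CDE, and auxiliary branch outputs with grouped and
# depthwise convolutions. Illustrative sketch, not MADNet's exact fusion.
import torch
import torch.nn as nn

class BranchFusion(nn.Module):
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        # Grouped 1x1 convolution mixes the concatenated branches cheaply.
        self.grouped = nn.Conv2d(3 * channels, channels, 1, groups=groups)
        # Depthwise 3x3 convolution refines each channel spatially.
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1,
                                   groups=channels)
        self.act = nn.GELU()

    def forward(self, sde, cde, aux):
        fused = torch.cat([sde, cde, aux], dim=1)   # (B, 3C, H, W)
        return self.depthwise(self.act(self.grouped(fused)))

c = 32
fuse = BranchFusion(c)
y = fuse(torch.randn(1, c, 64, 64), torch.randn(1, c, 64, 64),
         torch.randn(1, c, 64, 64))                 # (1, 32, 64, 64)
```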
Datasets: JiLin-1, Carbonite-2, SkySat-1, UrtheCast
Metrics: PSNR (Peak Signal-to-Noise Ratio), perceptual quality
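PSNR compares a reconstructed frame with its ground truth as 10·log10(MAX² / MSE), in decibels. A one-function reference implementation follows, assuming frames normalized to [0, 1]; this is the standard definition, not code from the paper.

```python
# Peak Signal-to-Noise Ratio between super-resolved and ground-truth frames.
import torch

def psnr(sr: torch.Tensor, hr: torch.Tensor, max_val: float = 1.0) -> float:
    """PSNR = 10 * log10(MAX^2 / MSE), in dB; higher is better."""
    mse = torch.mean((sr - hr) ** 2)
    return (10 * torch.log10(max_val ** 2 / mse)).item()
```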
Performance Comparison (on JiLin-1):
| Method | Type | PSNR (dB) | FLOPs | Parameters |
|---|---|---|---|---|
| EDVR | Sliding window | 35.51 | High | Large |
| BasicVSR++ | Recurrent | 35.94 | Medium | Medium |
| MADNet | Recurrent + MADM | 36.35 | Low | Compact |
MADNet outperforms state-of-the-art models in both accuracy and efficiency.
Effectiveness of Modules:
SDE contributes +0.27 dB in PSNR.
CDE contributes +0.24 dB in PSNR.
Satellite videos are prone to viewpoint inconsistencies, low temporal resolution, and spectral complexity.
Traditional CNNs assume spatial/channel homogeneity, which fails on heterogeneous satellite data.
MADNet is motivated by the need for adaptive, diverse feature modeling.
HRSiam (2021): Fast object tracking with pixel-level refinement; limited for stationary objects.
HSPA (2024): Focuses attention on relevant features; performance drops with low-detail images.
BasicVSR & BasicVSR++ (2021, 2022): Use optical flow for alignment and propagation; lack full real-time performance or are computationally heavy.
DUF (2018): Learns dynamic filters without explicit motion estimation; lacks long-term consistency.
FlowNetSD (2017): Good for fine motion, but large and resource-intensive.
MADNet addresses these challenges with lightweight, adaptive, and frequency-aware design.
MADNet's multi-axis feature diversity design makes it a strong candidate for remote sensing VSR tasks. With its lightweight structure and superior performance, it paves the way for real-time, onboard satellite video enhancement.
[1] Y. Xiao et al., "Multi-Axis Feature Diversity Enhancement for Remote Sensing Video Super-Resolution," IEEE Trans. Image Process., vol. 34, 2025.
[2] Shao, B. Du, C. Wu, M. Gong, and T. Liu, "HRSiam: High-resolution Siamese network, towards space-borne satellite video tracking," IEEE Trans. Image Process., vol. 30, pp. 3056–3068, 2021.
[3] Kaselimi, A. Voulodimos, I. Daskalopoulos, N. Doulamis, and A. Doulamis, "A vision transformer model for convolution-free multilabel classification of satellite imagery in deforestation monitoring," IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 7, pp. 3299–3307, Jul. 2023.
[4] Yang, X. Tang, Y.-M. Cheung, X. Zhang, and L. Jiao, "SAGN: Semantic-aware graph network for remote sensing scene classification," IEEE Trans. Image Process., vol. 32, pp. 1011–1025, 2023.
[5] Guo, Q. Shi, A. Marinoni, B. Du, and L. Zhang, "Deep building footprint update network: A semi-supervised method for updating existing building footprint from bi-temporal remote sensing images," Remote Sens. Environ., vol. 264, Oct. 2021, Art. no. 112589.
[6] Li, W. He, W. Cao, L. Zhang, and H. Zhang, "UANet: An uncertainty-aware network for building extraction from remote sensing images," IEEE Trans. Geosci. Remote Sens., vol. 62, 2024, Art. no. 5608513.
[7] He, X. Sun, W. Diao, Z. Yan, F. Yao, and K. Fu, "Multimodal remote sensing image segmentation with intuition-inspired hypergraph modeling," IEEE Trans. Image Process., vol. 32, pp. 1474–1487, 2023.
[8] Hou, Q. Cao, R. Ran, C. Liu, J. Li, and L.-J. Deng, "Bidomain modeling paradigm for pansharpening," in Proc. 31st ACM Int. Conf. Multimedia, Oct. 2023, pp. 347–357.
[9] Zhang, Q. Yuan, M. Song, H. Yu, and L. Zhang, "Cooperated spectral low-rankness prior and deep spatial prior for HSI unsupervised denoising," IEEE Trans. Image Process., vol. 31, pp. 6356–6368, 2022.
[10] Li, K. Zheng, W. Liu, Z. Li, H. Yu, and L. Ni, "Model-guided coarse-to-fine fusion network for unsupervised hyperspectral image super-resolution," IEEE Geosci. Remote Sens. Lett., vol. 20, pp. 1–5, 2023.
[11] N. Su, M. Gan, G.-Y. Chen, W. Guo, and C. L. P. Chen, "High Similarity-Pass attention for single image super-resolution," IEEE Trans. Image Process., 2024.
[12] K. Jiang, Z. Wang, P. Yi, and J. Jiang, "Hierarchical dense recursive network for image super-resolution," Pattern Recognit., vol. 107, Nov. 2020, Art. no. 107475.
[13] Chen, L. Zhang, and L. Zhang, "Cross-scope spatial-spectral information aggregation for hyperspectral image super-resolution," IEEE Trans. Image Process., vol. 33, pp. 5878–5891, 2024.
[14] M. Protter, M. Elad, H. Takeda, and P. Milanfar, "Generalizing the nonlocal-means to super-resolution reconstruction," IEEE Trans. Image Process., vol. 18, no. 1, pp. 36–51, Jan. 2009.
[15] S. D. Babacan, R. Molina, and A. K. Katsaggelos, "Variational Bayesian super resolution," IEEE Trans. Image Process., vol. 20, no. 4, pp. 984–999, Apr. 2011.
[16] Chen, L. Zhang, and L. Zhang, "MSDformer: Multiscale deformable transformer for hyperspectral image super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 61, 2023, Art. no. 5525614.
[17] Y. Xiao, D. Kai, Y. Zhang, X. Sun, and Z. Xiong, "Asymmetric event-guided video super-resolution," in Proc. ACM Int. Conf. Multimedia, 2024, pp. 2409–2418.
[18] Xu, L. Zhang, B. Du, and L. Zhang, "Hyperspectral anomaly detection based on machine learning: An overview," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 3351–3364, 2022.
[19] X. Wang, K. C. K. Chan, K. Yu, C. Dong, and C. C. Loy, "EDVR: Video restoration with enhanced deformable convolutional networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2019.
[20] K. C. K. Chan, X. Wang, K. Yu, C. Dong, and C. C. Loy, "BasicVSR: The search for essential components in video super-resolution and beyond," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 4947–4956.
[21] T. Isobe, X. Jia, S. Gu, S. Li, S. Wang, and Q. Tian, "Video super-resolution with recurrent structure-detail network," in Proc. Eur. Conf. Comput. Vis. (ECCV), Aug. 2020, pp. 645–660.
[22] Y. Xiao, D. Kai, Y. Zhang, Z.-J. Zha, X. Sun, and Z. Xiong, "Event-adapted video super-resolution," in Proc. Eur. Conf. Comput. Vis. (ECCV), Oct. 2024, pp. 217–235.
[23] Y. Jo, S. W. Oh, J. Kang, and S. J. Kim, "Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 3224–3232.
[24] J. Yu, J. Liu, L. Bo, and T. Mei, "Memory-augmented non-local attention for video super-resolution," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 17834–17843.
[25] Y. Xiao et al., "Local-global temporal difference learning for satellite video super-resolution," IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 4, pp. 2789–2802, Apr. 2024.
[26] M. S. M. Sajjadi, R. Vemulapalli, and M. Brown, "Frame-recurrent video super-resolution," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 6626–6634.
[27] T. Isobe, F. Zhu, X. Jia, and S. Wang, "Revisiting temporal modeling for video super-resolution," 2020, arXiv:2008.05765.
[28] K. C. K. Chan, S. Zhou, X. Xu, and C. C. Loy, "BasicVSR++: Improving video super-resolution with enhanced propagation and alignment," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 5972–5981.
[29] C. Liu, H. Yang, J. Fu, and X. Qian, "Learning trajectory-aware transformer for video super-resolution," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 5687–5696.
[30] A. Kappeler, S. Yoo, Q. Dai, and A. K. Katsaggelos, "Video super-resolution with convolutional neural networks," IEEE Trans. Comput. Imag., vol. 2, no. 2, pp. 109–122, Jun. 2016.
[31] M. Haris, G. Shakhnarovich, and N. Ukita, "Recurrent back-projection network for video super-resolution," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 3897–3906.
[32] Y. Huang, W. Wang, and L. Wang, "Bidirectional recurrent convolutional networks for multi-frame super-resolution," in Proc. Adv. Neural Inf. Process. Syst., vol. 28, Dec. 2015, pp. 235–243.
[33] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, "FlowNet 2.0: Evolution of optical flow estimation with deep networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2462–2470.
[34] A. Ranjan and M. J. Black, "Optical flow estimation using a spatial pyramid network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4161–4170.
[35] J. Caballero et al., "Real-time video super-resolution with spatio-temporal networks and motion compensation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2017, pp. 4778–4787.
[36] L. Wang, Y. Guo, L. Liu, Z. Lin, X. Deng, and W. An, "Deep video super-resolution using HR optical flow estimation," IEEE Trans. Image Process., vol. 29, pp. 4323–4336, 2020.
[37] S. Shi, J. Gu, L. Xie, X. Wang, Y. Yang, and C. Dong, "Rethinking alignment in video super-resolution transformers," in Proc. Adv. Neural Inf. Process. Syst., 2022, pp. 36081–36093.
[38] Wen, W. Ren, Y. Shi, Y. Nie, J. Zhang, and X. Cao, "Video super-resolution via a spatial-temporal alignment network," IEEE Trans. Image Process., vol. 31, pp. 1761–1773, 2022.
[39] Y. Tian, Y. Zhang, Y. Fu, and C. Xu, "TDAN: Temporally-deformable alignment network for video super-resolution," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 3360–3369.
[40] P. Yi, Z. Wang, K. Jiang, J. Jiang, and J. Ma, "Progressive fusion video super-resolution network via exploiting non-local spatial-temporal correlations," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 3106–3115.
[41] Luo, L. Zhou, S. Wang, and Z. Wang, "Video satellite imagery super-resolution via convolutional neural networks," IEEE Geosci. Remote Sens. Lett., vol. 14, no. 12, pp. 2398–2402, Dec. 2017.
[42] A. Xiao, Z. Wang, L. Wang, and Y. Ren, "Super-resolution for Jilin-1 satellite video imagery via a convolutional network," Sensors, vol. 18, no. 4, p. 1194, 2018.
[43] K. Jiang, Z. Wang, P. Yi, and J. Jiang, "A progressively enhanced network for video satellite imagery super-resolution," IEEE Signal Process. Lett., vol. 25, no. 11, pp. 1630–1634, Nov. 2018.
[44] Liu, Y. Gu, T. Wang, and S. Li, "Satellite video super-resolution based on adaptively spatiotemporal neighbors and nonlocal similarity regularization," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 12, pp. 8372–8383, Dec. 2020.
[45] Liu and Y. Gu, "Deep joint estimation network for satellite video super-resolution with multiple degradations," IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5621015.
[46] He and D. He, "A unified network for arbitrary scale super-resolution of video satellite images," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 10, pp. 8812–8825, Oct. 2021.
[47] Y. Xiao, X. Su, Q. Yuan, D. Liu, H. Shen, and L. Zhang, "Satellite video super-resolution via multiscale deformable convolution alignment and temporal grouping projection," IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5610819.
[48] Ni and L. Zhang, "Deformable convolution alignment and dynamic scale-aware network for continuous-scale satellite video super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 62, 2024, Art. no. 5610017.
[49] S. Bako et al., "Kernel-predicting convolutional networks for denoising Monte Carlo renderings," ACM Trans. Graph., vol. 36, no. 4, pp. 1–14, Aug. 2017.
[50] B. Mildenhall, J. T. Barron, J. Chen, D. Sharlet, R. Ng, and R. Carroll, "Burst denoising with kernel prediction networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 2502–2510.
[51] Y. Jiang, B. Wronski, B. Mildenhall, J. T. Barron, Z. Wang, and T. Xue, "Fast and high-quality image denoising via malleable convolution," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2022, pp. 429–446.
[52] Y. Chen, X. Dai, M. Liu, D. Chen, L. Yuan, and Z. Liu, "Dynamic convolution: Attention over convolution kernels," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 11030–11039.
[53] W. Li, X. Tao, T. Guo, L. Qi, J. Lu, and J. Jia, "MuCAN: Multi-correspondence aggregation network for video super-resolution," in Proc. 16th Eur. Conf. Comput. Vis. (ECCV), Glasgow, U.K., Aug. 2020, pp. 335–351.
[54] T. Isobe et al., "Video super-resolution with temporal group attention," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 8008–8017.
Copyright © 2025 Hemalatha K, Dheekshith V, Likhith B S, Aditya Hegde. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET73667
Publish Date : 2025-08-14
ISSN : 2321-9653
Publisher Name : IJRASET