TITLE:
Polyp Segmentation Network with Dual-Decoder Pyramid Vision Transformer
AUTHORS:
Qing’an Yao, Jiapeng Liu, Yuncong Feng, Dongwei Zhuang, Yougang Wang
KEYWORDS:
Colorectal Polyp Segmentation, Dual-Decoder Architecture, Reverse Attention Mechanism, Multi-Scale Feature Aggregation, Deep Learning
JOURNAL NAME:
Journal of Computer and Communications, Vol.13 No.6, June 30, 2025
ABSTRACT: To address the morphological irregularity and boundary ambiguity that complicate colorectal polyp image segmentation, we propose a Dual-Decoder Pyramid Vision Transformer Network (DDPVT-Net). The architecture couples a Pyramid Vision Transformer (PVT) encoder with a dual-decoder design that uses reverse attention mechanisms and multi-scale feature aggregation to handle complex tissue patterns and texture variations. Compared with the standard U-Net, DDPVT-Net improves mean Intersection over Union (mIoU) by 5.65% and the Dice coefficient by 3.83% on the Kvasir-SEG dataset, and by 5.95% and 4.54%, respectively, on the CVC-ClinicDB dataset. In independent testing on the ETIS-LaribPolypDB benchmark, the gains reach 26.59% in mIoU and 27.43% in Dice coefficient. These results indicate that DDPVT-Net handles polyps of diverse shapes and sizes more reliably through enhanced multi-scale contextual understanding and more precise boundary localization. The proposed framework demonstrates superior segmentation accuracy and generalization capability and offers a state-of-the-art solution for computer-assisted clinical diagnosis in gastrointestinal endoscopy.
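For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how the Dice coefficient and IoU are commonly computed for a pair of binary masks, and how per-image scores can be averaged. It is not the authors' evaluation code: the function names `dice_and_iou` and `mean_scores`, the smoothing term `eps`, the nested-list mask format, and per-image averaging are all assumptions made for illustration, since the abstract does not describe the evaluation protocol.

```python
def dice_and_iou(pred, target, eps=1e-7):
    """Return (dice, iou) for two binary masks given as nested lists of 0/1.

    eps is a small smoothing term (an assumption of this sketch) that avoids
    division by zero when both masks are empty.
    """
    # Flatten both masks to lists of 0/1 integers.
    p = [int(bool(v)) for row in pred for v in row]
    t = [int(bool(v)) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")

    inter = sum(a & b for a, b in zip(p, t))   # |P ∩ T|
    p_sum, t_sum = sum(p), sum(t)              # |P|, |T|
    union = p_sum + t_sum - inter              # |P ∪ T|

    dice = (2 * inter + eps) / (p_sum + t_sum + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou


def mean_scores(pairs):
    """Average Dice and IoU over an iterable of (pred, target) mask pairs."""
    scores = [dice_and_iou(pred, target) for pred, target in pairs]
    if not scores:
        raise ValueError("no mask pairs given")
    n = len(scores)
    mean_dice = sum(d for d, _ in scores) / n
    mean_iou = sum(i for _, i in scores) / n
    return mean_dice, mean_iou


if __name__ == "__main__":
    # Toy 2x3 masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    dice, iou = dice_and_iou(pred, target)
    print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single mask pair the two metrics are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. Whether the reported gains are absolute percentage points or relative improvements is not stated in the abstract.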