A novel encoder–decoder architecture integrating Graph Attention Networks and Fixed-Volume Compression for robust medical image segmentation.
Medical image segmentation demands precise boundary delineation even when structures are densely distributed or have fuzzy edges. GAT_ASP-UNet couples a Fixed-Volume Compressor (FSCM) with a Graph Attention (GAT) Bridge to introduce structured relational reasoning into skip connections. These operations run on a fixed 32×32 grid of 1,024 spatial positions (written O(1024) in this document), so their cost is constant and independent of input resolution.
| Highlight | Detail |
|---|---|
| 🎯 Task | Medical Image Segmentation |
| 🏗️ Architecture | Dual-path Encoder + Triple-concat Decoder |
| 🧠 Key Innovation | GAT-enhanced skip connections at fixed O(1024) complexity |
| 🏆 Best Result | 90.88% Dice on ISIC2018 (dermoscopic) |
| 🏫 Institution | Bangladesh University of Engineering and Technology (BUET) |
GAT_ASP-UNet is built around a dual-path encoder and triple-concatenation decoder with four key components:
The primary engine for multi-scale context capture. Employs 10 parallel branches with 1×1, 3×3, 5×5, and dilated kernels alongside an ASPP block. Each branch incorporates downsampled self-attention, followed by a global Multi-Head Attention (MHA) layer and 1×1 fusion with a residual connection.
Forces features into a compact 32×32 spatial volume via 1×1 channel reduction and adaptive average pooling. All subsequent attention and GAT operations therefore run on a fixed 32 × 32 = 1,024 positions (the O(1024) bound), so the most compute-intensive parts of the network do not scale with input resolution.
Input Tensor (B, in_ch, H, W)
→ Channel Reduction (1×1 Conv, BN, ReLU)
→ Feature Refinement (Deformable Conv 3×3)
→ Forced Spatial Resize (AdaptiveAvgPool2d → 32×32)
→ Multi-Scale ASPP (DSConv dil=12, dil=6, Conv 1×1)
→ Final Projection (1×1 Conv, BN, ReLU)
Output Tensor (B, out_ch, 32, 32)
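
The forced spatial resize is the step that makes the output size independent of `H` and `W`. The following is a minimal pure-Python sketch of that one step only, not the project's implementation: it assumes the bin rule commonly documented for adaptive average pooling (bin `i` spans `floor(i*n/out)` up to `ceil((i+1)*n/out)`), and the helper names `adaptive_avg_pool2d` and `attention_pairs` are hypothetical. The channel reduction, deformable convolution, and ASPP stages are left out.

```python
def adaptive_avg_pool2d(grid, out_h=32, out_w=32):
    """Average-pool a 2D list of floats to a fixed (out_h, out_w) size.

    Assumed bin rule: bin i along an axis of length n covers indices
    floor(i * n / out) up to (but excluding) ceil((i + 1) * n / out).
    """
    in_h, in_w = len(grid), len(grid[0])
    pooled = []
    for i in range(out_h):
        r0 = (i * in_h) // out_h
        r1 = ((i + 1) * in_h + out_h - 1) // out_h  # integer ceiling
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = ((j + 1) * in_w + out_w - 1) // out_w
            window = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def attention_pairs(h, w):
    """Number of query-key pairs in dense self-attention over an h x w map."""
    tokens = h * w
    return tokens * tokens


if __name__ == "__main__":
    feature_map = [[float(r + c) for c in range(48)] for r in range(64)]
    pooled = adaptive_avg_pool2d(feature_map)
    print(len(pooled), len(pooled[0]))   # spatial size after pooling
    print(attention_pairs(32, 32))       # pairs on the fixed grid
    print(attention_pairs(256, 256))     # pairs on a 256 x 256 map
```

Under these assumptions, dense attention on the pooled grid involves 1,024² = 1,048,576 query-key pairs regardless of the input size, whereas the same operation on an unpooled 256×256 map would involve 65,536² ≈ 4.3 × 10⁹ pairs.
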
Converts spatial features into a 4-neighbourhood grid graph and applies GATConv (PyTorch Geometric) for relational reasoning between spatial nodes. Features are pooled to a P×P grid, processed, then upsampled back to the original skip connection dimensions.
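
As an illustration of the grid-graph construction described above, the sketch below builds the directed edge list of a P×P 4-neighbourhood grid in plain Python. It is an assumption about how such a graph could be laid out, not the project's code: the function name `grid_edge_index` and the row-major node numbering are hypothetical. The two-row `[sources, targets]` layout mirrors the 2×E `edge_index` format that PyTorch Geometric layers such as `GATConv` take as connectivity.

```python
def grid_edge_index(p):
    """Return [sources, targets] for a p x p grid with 4-neighbour edges.

    Node id = row * p + col (row-major, an assumed convention).
    Every neighbouring pair appears once in each direction.
    """
    sources, targets = [], []
    for row in range(p):
        for col in range(p):
            node = row * p + col
            # Up, down, left, right neighbours.
            for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                n_row, n_col = row + d_row, col + d_col
                if 0 <= n_row < p and 0 <= n_col < p:
                    sources.append(node)
                    targets.append(n_row * p + n_col)
    return [sources, targets]


if __name__ == "__main__":
    edges = grid_edge_index(32)
    print(len(edges[0]))  # number of directed edges for a 32 x 32 grid
```

A P×P grid has 2·P·(P−1) undirected neighbour pairs, so the directed edge count is 4·P·(P−1), which is 3,968 for P = 32. The edge list depends only on P, so it can be built once and reused for every sample in a batch.
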
A lightweight bottleneck combining:
- GhostModule for cheap feature expansion
- LiteRFB (multi-branch dilated depthwise convolutions) to expand the receptive field
- CoordinateAttention for spatial-aware channel gating
Dual-Path Encoder (per stage):
Path A: Encoder Output → FSCM → ADFM → ChannelAdapt → Upsample
Path B: Encoder Output → MaxPool/Downsample
→ Concatenate & Fuse (1×1 Conv)
Triple-Concat Decoder:
(1) Direct encoder features
(2) GAT Bridge outputs
(3) Previous decoder features re-processed via FSCM → ADFM
| Dataset | Modality | Split | IoU (%) | Dice (%) | Precision (%) | Recall (%) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| Kvasir-SEG | Endoscopic | Val | 76.17 | 86.14 | 88.99 | 84.27 | 95.86 |
| Kvasir-SEG | Endoscopic | Train | 74.48 | 85.03 | 88.07 | 83.13 | 95.60 |
| ISIC2018 | Dermoscopic | Val | 83.61 | 90.88 | 93.64 | 88.81 | 96.24 |
| ISIC2018 | Dermoscopic | Train | 82.62 | 90.28 | 92.37 | 88.88 | 96.04 |
| Breast US B | Ultrasound | Val | 66.64 | 79.40 | 82.75 | 76.88 | 98.17 |
| Breast US B | Ultrasound | Train | 63.54 | 77.37 | 73.91 | 83.34 | 97.86 |
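
For reference, the five metric columns above can all be derived from pixel-level confusion counts. The sketch below shows the standard definitions on a single binary mask; the function name `segmentation_metrics` and the `eps` stabiliser are illustrative assumptions, and how the project aggregates scores across images (per-image mean or pooled counts) is not stated in the source.

```python
def segmentation_metrics(pred, target, eps=1e-7):
    """Compute IoU, Dice, precision, recall, accuracy for one binary mask.

    pred, target: flat sequences of 0/1 pixel labels of equal length.
    eps avoids division by zero on empty masks (an assumed convention).
    """
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return {
        "iou": tp / (tp + fp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
        "accuracy": (tp + tn) / (tp + tn + fp + fn + eps),
    }


if __name__ == "__main__":
    prediction = [1, 1, 0, 0, 1, 0]
    ground_truth = [1, 0, 0, 1, 1, 0]
    print(segmentation_metrics(prediction, ground_truth))
```

On a single mask, Dice and IoU are linked by Dice = 2·IoU / (1 + IoU). If the reported values are averages over images (the source does not say), they need not satisfy this identity exactly.
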
| Model / Variant | Val Loss | Val IoU (%) | Val Dice (%) | Val Acc (%) | Val Prec (%) | Val Rec (%) |
|---|---|---|---|---|---|---|
| DDSUNet – Dice loss | 0.11762 | 86.74 | 92.63 | 98.61 | 94.69 | 91.33 |
| DDSUNet – Dice + BCE | 0.04925 | 85.28 | 91.77 | 98.51 | 93.77 | 90.53 |
| DDSUNet + GATConv | 0.12696 | 86.39 | 92.47 | 98.60 | 93.77 | 91.91 |
| Proposed – Focal Tversky | 0.21342 | 72.58 | 83.22 | 96.76 | 83.59 | 84.99 |
| Proposed – Dice + BCE | 0.23182 | 76.30 | 86.14 | 97.54 | 88.98 | 84.58 |
Key finding: The DDSUNet + GATConv variant achieved the highest validation recall (91.91%, versus 91.33% for the Dice-loss DDSUNet baseline), which is consistent with the thesis that graph-based relational reasoning helps capture complex non-local boundaries.
| Dataset | Modality / Target | Images |
|---|---|---|
| CVC-ClinicDB | Endoscopic / Polyp detection | 612 |
| ISIC2018 | Dermoscopic / Skin lesions | 2,596 |
| Kvasir-SEG | Endoscopic / Polyp images | 1,000 |
| Breast Ultrasound B | Ultrasound / Breast lesions | 163 |
| Parameter | Value |
|---|---|
| Dataset Split | 80% Train / 20% Validation |
| Epochs | 100 |
| Learning Rate | 1e-4 |
| Optimizer | Adam |
| Primary Loss | Dice + BCE |
| Secondary Loss | Focal Tversky + IoU |
| Augmentations | 9 techniques (elastic transform, colour jitter, random occlusion, and more) |
| Evaluation Metrics | IoU, Dice, Precision, Recall, Accuracy |
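
The loss terms named in the table can be sketched on flat lists of predicted probabilities as follows. This is a minimal illustration under stated assumptions, not the project's training code: the function names, the `smooth` term, the equal weighting of Dice and BCE, and the Focal Tversky settings (`alpha=0.7`, `beta=0.3`, exponent `gamma=0.75`) are all assumed values, since the source does not give them.

```python
import math


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2*intersection + s) / (sum_p + sum_t + s)."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * inter + smooth) / (sum(probs) + sum(targets) + smooth)


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clamped away from 0 and 1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


def dice_bce_loss(probs, targets, bce_weight=1.0):
    """Primary loss sketch: Dice plus (assumed equally weighted) BCE."""
    return dice_loss(probs, targets) + bce_weight * bce_loss(probs, targets)


def focal_tversky_loss(probs, targets, alpha=0.7, beta=0.3, gamma=0.75, smooth=1.0):
    """(1 - Tversky index) ** gamma; alpha weights false negatives, beta false positives."""
    tp = sum(p * t for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    tversky = (tp + smooth) / (tp + alpha * fn + beta * fp + smooth)
    return (1.0 - tversky) ** gamma


def soft_iou_loss(probs, targets, smooth=1.0):
    """Soft IoU (Jaccard) loss: 1 - (intersection + s) / (union + s)."""
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(probs) + sum(targets) - inter
    return 1.0 - (inter + smooth) / (union + smooth)


if __name__ == "__main__":
    probs = [0.9, 0.2, 0.8, 0.1]
    targets = [1, 0, 1, 0]
    print(dice_bce_loss(probs, targets))
    print(focal_tversky_loss(probs, targets) + soft_iou_loss(probs, targets))
```

With `alpha > beta`, the Tversky term penalises false negatives more than false positives, which is a common choice when lesions occupy a small fraction of the image. Whether the project uses this weighting is not stated.
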
- ✅ Dermoscopic strength: The model's best results are on ISIC2018 (validation Dice: 90.88%, validation IoU: 83.61%)
- ✅ GAT recall boost: Graph-based skip connections improve boundary recall over standard baselines
- ⚠️ Overfitting on CVC-ClinicDB: Validation performance lags behind DDSUNet baselines
- ⚠️ Ultrasound domain gap: Lower Breast Ultrasound B scores suggest need for domain-specific augmentation
- Enhanced Regularisation — Dropout and weight decay within ADFM and GAT Bridge for improved robustness
- Structural Ablations — Investigate GAT layer depth and attention head counts to optimise relational reasoning vs. parameter count
- Data Augmentation — Stronger elastic transforms and random occlusion for clinical endoscopic variability
If you find this work useful, please cite:
@article{chowdhury2026gataspunet,
title = {GAT\_ASP-UNet: Unified Deep Learning Approaches — Ensemble-Based U-Net for Medical Image Segmentation},
author = {Chowdhury, Rahul Drabit and Hasnain, Masab and Islam, Mareful},
institution = {Bangladesh University of Engineering and Technology},
year = {2026}
}

| Name | Student ID | Email |
|---|---|---|
| Rahul Drabit Chowdhury | 0424057003 | rahuldrabit@gmail.com |
| Masab Hasnain | 0424052099 | masabhasnain1@gmail.com |
| Mareful Islam | 0424056005 | 0424056005@grad.cse.buet.ac.bd |
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (BUET)
- O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," MICCAI, 2015.
- Y. Ou et al., "Enhanced medical image segmentation via deep dynamic self-adjusting U-Net with multi-scale attention and semantic mitigation," The Visual Computer, vol. 41, 2025.
- Y. Wang, S. Wang, and J. He, "MFA U-Net: a U-Net like multi-stage feature analysis network," Pattern Analysis and Applications, vol. 27, 2024.
- M. R. Ahmed et al., "DoubleU-NetPlus: a novel attention and context-guided dual U-Net," Neural Computing and Applications, vol. 35, 2023.
- H. Wang et al., "UCTransNet: rethinking the skip connections in U-Net from a channel-wise perspective with transformer," AAAI, vol. 36, 2022.