Synergistic fusion of a multilevel visual transformer in CNN for variable-length volumetric radiographic data analysis and content-based retrieval.
Authors
Affiliations (5)
- Khalifa University Center for Autonomous Robotic Systems (KU-CARS), Khalifa University, Abu Dhabi, United Arab Emirates.
- School of Computer Science, University of Galway, H91 TK33, Galway, Ireland.
- Department of Applied Artificial Intelligence and Robotics, School of Computer Science and Digital Technologies, Aston University, Birmingham, United Kingdom.
- Khalifa University Center for Autonomous Robotic Systems (KU-CARS), Khalifa University, Abu Dhabi, United Arab Emirates.
- Khalifa University Center for Autonomous Robotic Systems (KU-CARS), Khalifa University, Abu Dhabi, United Arab Emirates.
Abstract
Volumetric radiographic data analysis poses significant challenges because the data are three-dimensional and the number of slices per volume varies. The analysis is further complicated by the unpredictable distribution of diseased regions, which often span multiple slices and are interspersed with normal tissue within abnormal volumes. Despite recent advances, existing 3D volumetric analysis methods predominantly rely on 2D slice selection and expert intervention, which limits their scalability and efficiency, and they do not handle volumes of variable length in a uniform way. To address these limitations, we introduce a novel deep learning framework that fuses a lightweight multilevel vision transformer with a convolutional neural network (CNN). The proposed approach independently extracts and aggregates spatial features from 2D slices while preserving multilevel contextual information. A second-stage recurrent module is then integrated to handle variable-length inputs; it requires only a single annotation per complete 3D volume and exploits the structural features of the volume. We validate the method empirically on a composite of three publicly accessible radiographic repositories, where it outperforms existing alternatives with statistical significance (p-value < 0.01). It achieves 98.54% accuracy, 98.51% F1-score, 98.77% average precision, and 98.25% average recall. To facilitate further research and development, we publicly release the proposed framework and associated resources; the implementation and materials are available at our GitHub.
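The two-stage idea described in the abstract (per-slice features, then a recurrent module over a variable number of slices) can be illustrated with a minimal, dependency-free sketch. This is not the released implementation: the feature vectors are random stand-ins for the CNN and transformer output, the plain tanh recurrence is an assumed substitute for whichever recurrent unit the authors use, and all names and dimensions (`FEATURE_DIM`, `HIDDEN_DIM`, `recurrent_aggregate`, `fake_volume`) are hypothetical. It only shows how volumes with different slice counts can map to embeddings of one fixed size, which can then be compared for content-based retrieval.

```python
import math
import random

FEATURE_DIM = 4  # hypothetical size of each per-slice feature vector
HIDDEN_DIM = 3   # hypothetical size of the volume-level embedding


def make_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def recurrent_aggregate(slice_features, w_in, w_rec):
    """Fold a variable-length sequence of slice features into one fixed-size state."""
    hidden = [0.0] * len(w_rec)
    for features in slice_features:
        # Simple tanh recurrence: new state depends on the slice and the old state.
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[i], features))
                + sum(w * h for w, h in zip(w_rec[i], hidden))
            )
            for i in range(len(hidden))
        ]
    return hidden


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings; 0.0 if either is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


rng = random.Random(0)
w_in = make_matrix(HIDDEN_DIM, FEATURE_DIM, rng)
w_rec = make_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)


def fake_volume(num_slices):
    """Random per-slice features; a placeholder for real CNN/transformer output."""
    return [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)] for _ in range(num_slices)]


short_volume = fake_volume(5)    # a volume with 5 slices
long_volume = fake_volume(12)    # a volume with 12 slices

short_embedding = recurrent_aggregate(short_volume, w_in, w_rec)
long_embedding = recurrent_aggregate(long_volume, w_in, w_rec)

# Both embeddings have HIDDEN_DIM entries, so they can be compared directly.
similarity = cosine_similarity(short_embedding, long_embedding)
print(len(short_embedding), len(long_embedding), similarity)
```

Because the recurrent state has a fixed size regardless of how many slices are consumed, a single volume-level label can supervise the whole sequence, and the final state can serve as a retrieval key under a similarity measure such as the cosine similarity used here.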