Page 25 of 40395 results

Federated Foundation Model for GI Endoscopy Images

Alina Devkota, Annahita Amireskandari, Joel Palko, Shyam Thakkar, Donald Adjeroh, Xiajun Jiang, Binod Bhattarai, Prashnna K. Gyawali

arXiv preprint · May 30, 2025
Gastrointestinal (GI) endoscopy is essential for identifying GI tract abnormalities in order to detect diseases in their early stages and improve patient outcomes. Although deep learning has shown success in supporting GI diagnostics and decision-making, these models require curated datasets with labels that are expensive to acquire. Foundation models offer a promising solution by learning general-purpose representations, which can be finetuned for specific tasks, overcoming data scarcity. Developing foundation models for medical imaging holds significant potential, but the sensitive and protected nature of medical data presents unique challenges. Foundation model training typically requires extensive datasets, and while hospitals generate large volumes of data, privacy restrictions prevent direct data sharing, making foundation model training infeasible in most scenarios. In this work, we propose a federated learning (FL) framework for training foundation models for gastroendoscopy imaging, enabling data to remain within local hospital environments while contributing to a shared model. We explore several established FL algorithms, assessing their suitability for training foundation models without relying on task-specific labels, and conduct experiments in both homogeneous and heterogeneous settings. We evaluate the trained foundation model on three critical downstream tasks (classification, detection, and segmentation) and demonstrate that it achieves improved performance across all tasks, highlighting the effectiveness of our approach in a federated, privacy-preserving setting.
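The server-side aggregation step shared by many of the established FL algorithms the abstract mentions can be sketched as weighted parameter averaging (FedAvg-style). A minimal illustration; the function name, values, and flat-list parameter representation are all hypothetical, not the paper's implementation:

```python
# Hypothetical FedAvg-style aggregation sketch: the server averages
# per-client model parameters, weighting each client by its local
# dataset size. Names and numbers are illustrative only.

def fedavg(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    aggregated = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            aggregated[i] += w * (size / total)
    return aggregated

# Two hospitals with unequal data volumes contribute local updates;
# the global model is their 0.25/0.75 weighted mean.
hospital_a = [1.0, 2.0]   # parameters after local training at site A
hospital_b = [3.0, 4.0]   # parameters after local training at site B
global_model = fedavg([hospital_a, hospital_b], client_sizes=[100, 300])
```

Because only parameters (not images) leave each site, this aggregation pattern is what lets the raw endoscopy data stay inside hospital environments.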

End-to-end 2D/3D registration from pre-operative MRI to intra-operative fluoroscopy for orthopedic procedures.

Ku PC, Liu M, Grupp R, Harris A, Oni JK, Mears SC, Martin-Gomez A, Armand M

PubMed · May 30, 2025
Soft tissue pathologies and bone defects are not easily visible in intra-operative fluoroscopic images; therefore, we develop an end-to-end MRI-to-fluoroscopic image registration framework, aiming to enhance intra-operative visualization for surgeons during orthopedic procedures. The proposed framework utilizes deep learning to segment MRI scans and generate synthetic CT (sCT) volumes. These sCT volumes are then used to produce digitally reconstructed radiographs (DRRs), enabling 2D/3D registration with intra-operative fluoroscopic images. The framework's performance was validated through simulation and cadaver studies for core decompression (CD) surgery, focusing on the registration accuracy of femur and pelvic regions. The framework achieved a mean translational registration accuracy of 2.4 ± 1.0 mm and rotational accuracy of 1.6 ± 0.8° for the femoral region in cadaver studies. The method successfully enabled intra-operative visualization of necrotic lesions that were not visible on conventional fluoroscopic images, marking a significant advancement in image guidance for femur and pelvic surgeries. The MRI-to-fluoroscopic registration framework offers a novel approach to image guidance in orthopedic surgeries, exclusively using MRI without the need for CT scans. This approach enhances the visualization of soft tissues and bone defects, reduces radiation exposure, and provides a safer, more effective alternative for intra-operative surgical guidance.
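The DRR step of the pipeline can be illustrated in miniature: a DRR is, at its simplest, a ray-sum projection of the (synthetic) CT volume onto an image plane. A toy sketch, assuming an axis-aligned parallel projection rather than the real C-arm projective geometry; all names are illustrative:

```python
# Toy DRR sketch: project a 3D attenuation volume onto a 2D image by
# summing voxel values along the depth axis (a parallel-ray simplification
# of the perspective ray casting used in real 2D/3D registration).

def simple_drr(volume):
    """Project a 3D volume (depth x rows x cols) by summing along depth."""
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    drr = [[0.0] * cols for _ in range(rows)]
    for z in range(depth):
        for r in range(rows):
            for c in range(cols):
                drr[r][c] += volume[z][r][c]
    return drr

# A 2x2x2 volume of unit attenuation projects to a 2x2 image where each
# pixel accumulates the two voxels along its ray.
vol = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]
image = simple_drr(vol)
```

In the full framework, DRRs rendered from candidate poses of the sCT are compared against the intra-operative fluoroscopic image to recover the 3D pose.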

Phantom-Based Ultrasound-ECG Deep Learning Framework for Prospective Cardiac Computed Tomography.

Ganesh S, Lindsey BD, Tridandapani S, Bhatti PT

PubMed · May 30, 2025
We present the first multimodal deep learning framework combining ultrasound (US) and electrocardiography (ECG) data to predict cardiac quiescent periods (QPs) for optimized computed tomography angiography (CTA) gating. The framework integrates a 3D convolutional neural network (CNN) for US data and an artificial neural network (ANN) for ECG data. A dynamic heart motion phantom, replicating diverse cardiac conditions, including arrhythmias, was used to validate the framework. Performance was assessed across varying QP lengths, cardiac segments, and motions to simulate real-world conditions. The multimodal US-ECG 3D CNN-ANN framework demonstrated improved QP prediction accuracy compared to single-modality ECG-only gating, achieving 96.87% accuracy compared to 85.56%, including in scenarios involving arrhythmic conditions. Notably, the framework shows higher accuracy for longer QP durations (100-200 ms) than for shorter durations (<100 ms), while still outperforming single-modality methods, which often fail to detect shorter quiescent phases, especially in arrhythmic cases. Consistently outperforming single-modality approaches, it achieves reliable QP prediction across cardiac regions, including the whole phantom, interventricular septum, and cardiac wall regions. Analysis of QP prediction accuracy across cardiac segments demonstrated an average accuracy of 92% in clinically relevant echocardiographic views, highlighting the framework's robustness. Combining US and ECG data using a multimodal framework improves QP prediction accuracy under variable cardiac motion, particularly in arrhythmic conditions. Since even small errors in cardiac CTA can result in non-diagnostic scans, the potential benefits of multimodal gating may improve diagnostic scan rates in patients with high and variable heart rates and arrhythmias.
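The underlying notion of a quiescent period, the window in which cardiac motion is minimal, can be sketched independently of the paper's CNN-ANN predictor by scanning a motion trace for its lowest-motion window. Illustrative only; the actual framework predicts QPs from raw US and ECG inputs rather than a precomputed motion signal:

```python
# Toy QP finder: given a per-frame motion magnitude trace, return the
# start index of the fixed-length window with the least total motion.
# This is the quantity a gating system wants to hit with the CT exposure.

def quiescent_window(motion, window):
    """Return start index of the window with minimal summed |motion|."""
    best_start, best_cost = 0, float("inf")
    for start in range(len(motion) - window + 1):
        cost = sum(abs(m) for m in motion[start:start + window])
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start

# Motion settles in mid-cycle (diastasis-like plateau), then resumes.
trace = [5, 4, 1, 0, 0, 0, 6, 7]
start = quiescent_window(trace, window=3)   # frames 3..5 are quietest
```

The paper's observation that longer QPs (100-200 ms) are predicted more accurately corresponds, in this picture, to wider windows being easier to localize than narrow ones.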

Pretraining Deformable Image Registration Networks with Random Images

Junyu Chen, Shuwen Wei, Yihao Liu, Aaron Carass, Yong Du

arXiv preprint · May 30, 2025
Recent advances in deep learning-based medical image registration have shown that training deep neural networks (DNNs) does not necessarily require medical images. Previous work showed that DNNs trained on randomly generated images with carefully designed noise and contrast properties can still generalize well to unseen medical data. Building on this insight, we propose using registration between random images as a proxy task for pretraining a foundation model for image registration. Empirical results show that our pretraining strategy improves registration accuracy, reduces the amount of domain-specific data needed to achieve competitive performance, and accelerates convergence during downstream training, thereby enhancing computational efficiency.
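The kind of randomly generated training image described above, noise given spatial correlation and randomized contrast, can be sketched as follows. This is a 1D toy with hypothetical parameters, not the paper's actual generation procedure:

```python
import random

def random_training_image(size, smoothing=3, seed=None):
    """Generate a 1D 'image' of smoothed random noise with random contrast.

    Box-filter smoothing imposes spatial correlation (structure without
    anatomy); the per-sample contrast factor varies intensity statistics,
    mimicking the designed noise/contrast properties described in the text.
    """
    rng = random.Random(seed)
    noise = [rng.random() for _ in range(size)]
    half = smoothing // 2
    smooth = []
    for i in range(size):
        lo, hi = max(0, i - half), min(size, i + half + 1)
        smooth.append(sum(noise[lo:hi]) / (hi - lo))
    contrast = rng.uniform(0.5, 2.0)
    return [contrast * v for v in smooth]

# Pairs of such images (plus a synthetic deformation between them) can
# serve as a proxy registration task with no medical data involved.
img = random_training_image(16, seed=0)
```

Seeding makes each sample reproducible, which is convenient for generating matched image pairs on the fly during pretraining.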

pyMEAL: A Multi-Encoder Augmentation-Aware Learning for Robust and Generalizable Medical Image Translation

Abdul-mojeed Olabisi Ilyas, Adeleke Maradesa, Jamal Banzi, Jianpan Huang, Henry K. F. Mak, Kannie W. Y. Chan

arXiv preprint · May 30, 2025
Medical imaging is critical for diagnostics, but clinical adoption of advanced AI-driven imaging faces challenges due to patient variability, image artifacts, and limited model generalization. While deep learning has transformed image analysis, 3D medical imaging still suffers from data scarcity and inconsistencies due to acquisition protocols, scanner differences, and patient motion. Traditional augmentation uses a single pipeline for all transformations, disregarding the unique traits of each augmentation and struggling with large data volumes. To address these challenges, we propose a Multi-encoder Augmentation-Aware Learning (MEAL) framework that leverages four distinct augmentation variants processed through dedicated encoders. Three fusion strategies, concatenation (CC), fusion layer (FL), and adaptive controller block (BD), are integrated to build multi-encoder models that combine augmentation-specific features before decoding. MEAL-BD uniquely preserves augmentation-aware representations, enabling robust, protocol-invariant feature learning. As demonstrated in a Computed Tomography (CT)-to-T1-weighted Magnetic Resonance Imaging (MRI) translation study, MEAL-BD consistently achieved the best performance on both unseen and predefined test data. On both geometric transformations (such as rotations and flips) and non-augmented inputs, MEAL-BD outperformed other competing methods, achieving higher mean peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) scores. These results establish MEAL as a reliable framework for preserving structural fidelity and generalizing across clinically relevant variability. By reframing augmentation as a source of diverse, generalizable features, MEAL supports robust, protocol-invariant learning, advancing clinically reliable medical imaging solutions.
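The fusion strategies can be illustrated schematically: CC concatenates per-encoder features, while a controller-style block can be read as gating each encoder's contribution before combining. A speculative sketch under those assumptions; the paper's actual CC/FL/BD blocks are learned network modules, and these function names are invented:

```python
# Schematic fusion sketches for a multi-encoder model: one feature vector
# per augmentation-specific encoder, combined before decoding.

def concat_fusion(features_per_encoder):
    """CC-style fusion: concatenate features from each encoder."""
    return [f for feats in features_per_encoder for f in feats]

def controller_fusion(features_per_encoder, gates):
    """Controller-style fusion sketch: gate each encoder's features
    by a scalar weight, then sum elementwise (weights would be learned)."""
    n = len(features_per_encoder[0])
    fused = [0.0] * n
    for feats, gate in zip(features_per_encoder, gates):
        for i, f in enumerate(feats):
            fused[i] += gate * f
    return fused

# Two encoders, two features each: CC grows the dimension, gating keeps it.
enc_a, enc_b = [1.0, 2.0], [3.0, 4.0]
wide = concat_fusion([enc_a, enc_b])
mixed = controller_fusion([enc_a, enc_b], gates=[0.5, 0.5])
```

The design trade-off is visible even in the toy: concatenation preserves every encoder's features at the cost of a wider decoder input, while gated combination keeps the dimension fixed but must learn how much each augmentation view should contribute.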

ACM-UNet: Adaptive Integration of CNNs and Mamba for Efficient Medical Image Segmentation

Jing Huang, Yongkang Zhao, Yuhan Li, Zhitao Dai, Cheng Chen, Qiying Lai

arXiv preprint · May 30, 2025
The U-shaped encoder-decoder architecture with skip connections has become a prevailing paradigm in medical image segmentation due to its simplicity and effectiveness. Many recent works aim to improve this framework by designing more powerful encoders and decoders, employing advanced convolutional neural networks (CNNs) for local feature extraction, Transformers or state space models (SSMs) such as Mamba for global context modeling, or hybrid combinations of both. However, these methods often struggle to fully utilize pretrained vision backbones (e.g., ResNet, ViT, VMamba) due to structural mismatches. To bridge this gap, we introduce ACM-UNet, a general-purpose segmentation framework that retains a simple UNet-like design while effectively incorporating pretrained CNNs and Mamba models through a lightweight adapter mechanism. This adapter resolves architectural incompatibilities and enables the model to harness the complementary strengths of CNNs and SSMs, namely fine-grained local detail extraction and long-range dependency modeling. Additionally, we propose a hierarchical multi-scale wavelet transform module in the decoder to enhance feature fusion and reconstruction fidelity. Extensive experiments on the Synapse and ACDC benchmarks demonstrate that ACM-UNet achieves state-of-the-art performance while remaining computationally efficient. Notably, it reaches 85.12% Dice score and 13.89 mm HD95 on the Synapse dataset with 17.93G FLOPs, showcasing its effectiveness and scalability. Code is available at: https://github.com/zyklcode/ACM-UNet.
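The adapter idea, reconciling mismatched feature dimensions between a pretrained backbone and the decoder via a lightweight linear (1x1-convolution-style) mapping, can be sketched per pixel as a small matrix-vector product. Hypothetical names and shapes, not the repository's implementation:

```python
# Lightweight adapter sketch: a learned linear map from the backbone's
# C_in channels to the decoder's expected C_out channels, applied at each
# spatial location (the per-pixel view of a 1x1 convolution).

def channel_adapter(features, weight):
    """Map a C_in feature vector to C_out via rows of `weight`."""
    return [sum(w * f for w, f in zip(row, features)) for row in weight]

# A 3-channel backbone feature adapted to the 2 channels a decoder stage
# expects; the weight matrix here just selects channels 0 and 2.
backbone_feat = [1.0, 2.0, 3.0]
adapter_w = [[1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0]]
decoder_feat = channel_adapter(backbone_feat, adapter_w)
```

Because the adapter is the only new component between backbone and decoder, the pretrained weights of both CNN and Mamba branches can be reused unchanged.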

Deep learning enables fast and accurate quantification of MRI-guided near-infrared spectral tomography for breast cancer diagnosis.

Feng J, Tang Y, Lin S, Jiang S, Xu J, Zhang W, Geng M, Dang Y, Wei C, Li Z, Sun Z, Jia K, Pogue BW, Paulsen KD

PubMed · May 29, 2025
The utilization of magnetic resonance (MR) imaging to guide near-infrared spectral tomography (NIRST) shows significant potential for improving the specificity and sensitivity of breast cancer diagnosis. However, the efficiency and accuracy of NIRST image reconstruction have been limited by the complexities of light propagation modeling and MRI image segmentation. To address these challenges, we developed and evaluated a deep learning-based approach for MR-guided 3D NIRST image reconstruction (DL-MRg-NIRST). Using a network trained on synthetic data, the DL-MRg-NIRST system reconstructed images from data acquired during 38 clinical imaging exams of patients with breast abnormalities. Statistical analysis of the results demonstrated a sensitivity of 87.5%, a specificity of 92.9%, and a diagnostic accuracy of 89.5% in distinguishing pathologically defined benign from malignant lesions. Additionally, the combined use of MRI and DL-MRg-NIRST diagnoses achieved an area under the receiver operating characteristic (ROC) curve of 0.98. Remarkably, the DL-MRg-NIRST image reconstruction process required only 1.4 seconds, significantly faster than state-of-the-art MR-guided NIRST methods.

Comparing the Effects of Persistence Barcodes Aggregation and Feature Concatenation on Medical Imaging

Dashti A. Ali, Richard K. G. Do, William R. Jarnagin, Aras T. Asaad, Amber L. Simpson

arXiv preprint · May 29, 2025
In medical image analysis, feature engineering plays an important role in the design and performance of machine learning models. Persistent homology (PH), from the field of topological data analysis (TDA), is robust and stable to data perturbations, addressing a limitation of traditional feature extraction approaches, where a small change in input can produce a large change in the feature representation. Using PH, we store persistent topological and geometrical features in the form of a persistence barcode, whereby large bars represent global topological features and small bars encapsulate geometrical information in the data. When multiple barcodes are computed from 2D or 3D medical images, two approaches can be used to construct the final topological feature vector in each dimension: aggregating the persistence barcodes and then featurizing, or concatenating the topological feature vectors derived from each barcode. In this study, we conduct a comprehensive analysis across diverse medical imaging datasets to compare the effects of these two approaches on the performance of classification models. The results of this analysis indicate that feature concatenation preserves detailed topological information from individual barcodes and yields better classification performance, and is therefore the preferred approach when conducting similar experiments.
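The two strategies compared above can be sketched with a toy featurizer. Here, simple bar-length statistics (sum, max, mean) stand in for whatever featurization is actually used; barcodes are lists of (birth, death) pairs:

```python
# Toy comparison of the two barcode-handling strategies:
#   aggregation  -> pool all bars into one barcode, featurize once
#   concatenation -> featurize each barcode, join the resulting vectors

def bar_stats(barcode):
    """Featurize one barcode via statistics of its bar lengths."""
    lengths = [death - birth for birth, death in barcode]
    return [sum(lengths), max(lengths), sum(lengths) / len(lengths)]

def aggregate_then_featurize(barcodes):
    """Aggregation: merge all bars, then compute one feature vector."""
    merged = [bar for bc in barcodes for bar in bc]
    return bar_stats(merged)

def featurize_then_concat(barcodes):
    """Concatenation: featurize each barcode, then join the vectors."""
    return [f for bc in barcodes for f in bar_stats(bc)]

# Two barcodes from two image slices: aggregation yields one 3-vector,
# concatenation yields a 6-vector that keeps per-barcode detail.
codes = [[(0, 1), (0, 3)], [(1, 2)]]
agg = aggregate_then_featurize(codes)
cat = featurize_then_concat(codes)
```

The toy makes the paper's finding plausible: aggregation collapses which barcode each bar came from, while concatenation keeps the per-barcode structure at the cost of a longer feature vector.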

ADC-MambaNet: A Lightweight U-Shaped Architecture with Mamba and Multi-Dimensional Priority Attention for Medical Image Segmentation.

Nguyen TN, Ho QH, Nguyen VQ, Pham VT, Tran TT

PubMed · May 29, 2025
Background: Medical image segmentation is becoming a crucial step in assisting disease detection and diagnosis. However, medical images often exhibit complex structures and textures, requiring highly complex methods. In particular, deep learning methods often require large-scale pretraining, leading to significant memory demands and increased computational costs. Convolutional Neural Networks (CNNs) have become the backbone of medical image segmentation tasks thanks to their effective feature extraction abilities, but they often struggle to capture global context due to the limited sizes of their kernels. To address this, various Transformer-based models have been introduced to learn long-range dependencies through self-attention mechanisms; these architectures, however, typically incur relatively high computational complexity.
Methods: To address the aforementioned challenges, we propose a lightweight and computationally efficient model named ADC-MambaNet, which combines conventional depthwise convolutional layers with the Mamba algorithm to address the computational complexity of Transformers. The proposed model introduces a new feature extractor, the Harmonious Mamba-Convolution (HMC) block, and the Multi-Dimensional Priority Attention (MDPA) block. These blocks enhance the feature extraction process and thereby improve the overall performance of the model; in particular, they enable the model to effectively capture local and global patterns from the feature maps while keeping computational costs low. A novel loss function, the Balanced Normalized Cross Entropy, is also introduced, showing promising performance compared to other losses.
Results: Evaluations on five public medical image datasets (ISIC 2018 Lesion Segmentation, PH2, Data Science Bowl 2018, GlaS, and Lung X-ray) demonstrate that ADC-MambaNet achieves higher evaluation scores while maintaining compact parameters and low computational complexity.
Conclusion: ADC-MambaNet offers a promising solution for accurate and efficient medical image segmentation, especially in resource-limited or edge-computing environments. Implementation code will be publicly accessible at: https://github.com/nqnguyen812/mambaseg-model.

Gaussian random fields as an abstract representation of patient metadata for multimodal medical image segmentation.

Cassidy B, McBride C, Kendrick C, Reeves ND, Pappachan JM, Raad S, Yap MH

PubMed · May 29, 2025
The growing rate of chronic wound occurrence, especially in patients with diabetes, is a concerning recent trend. Chronic wounds are difficult and costly to treat, and have become a serious burden on health care systems worldwide. Innovative deep learning methods for the detection and monitoring of such wounds have the potential to reduce the burden on patients and clinicians. We present a novel multimodal segmentation method which introduces patient metadata into the training workflow, whereby the patient data are expressed as Gaussian random fields. Our results indicate that the proposed method improves performance when utilising multiple models, each trained on a different metadata category. On the Diabetic Foot Ulcer Challenge 2022 test set, compared to the baseline results (intersection over union = 0.4670, Dice similarity coefficient = 0.5908), we demonstrate improvements of +0.0220 and +0.0229 in intersection over union and Dice similarity coefficient respectively. This paper presents the first study to focus on integrating patient data into a chronic wound segmentation workflow. Our results show significant performance gains when training individual models using specific metadata categories, followed by average merging of prediction masks using distance transforms. All source code for this study is available at: https://github.com/mmu-dermatology-research/multimodal-grf.
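One simple reading of the idea, mapping patient metadata to a reproducible spatially correlated random field, can be sketched as follows. The seeding and moving-average smoothing used here are illustrative assumptions, not the authors' construction (see their repository for the actual method):

```python
import hashlib
import random

def metadata_grf(metadata, size, corr=2):
    """Encode a metadata string as a reproducible 1D Gaussian random field.

    Hashing the metadata to a seed makes identical metadata map to the
    identical field; averaging adjacent Gaussian samples imposes the
    spatial correlation characteristic of a GRF. Sketch only.
    """
    seed = int(hashlib.sha256(metadata.encode()).hexdigest(), 16) % 2**32
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(size + corr)]
    return [sum(noise[i:i + corr]) / corr for i in range(size)]

# Identical metadata always yields the identical field, so the field can
# act as a stable image-shaped stand-in for non-image patient data.
field_a = metadata_grf("age:54,diabetes:type2", size=8)
field_b = metadata_grf("age:54,diabetes:type2", size=8)
```

Because the field has the same spatial form as an image channel, it can be fed to a segmentation network alongside the wound photograph, which is what makes the approach "multimodal" without changing the network architecture.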