Comparative Analysis of Vision Transformers and Convolutional Neural Networks for Medical Image Classification

Kunal Kawadkar

arXiv preprint · Jul 24 2025
The emergence of Vision Transformers (ViTs) has revolutionized computer vision, yet their effectiveness compared to traditional Convolutional Neural Networks (CNNs) in medical imaging remains under-explored. This study presents a comprehensive comparative analysis of CNN and ViT architectures across three critical medical imaging tasks: chest X-ray pneumonia detection, brain tumor classification, and skin cancer melanoma detection. We evaluated four state-of-the-art models - ResNet-50, EfficientNet-B0, ViT-Base, and DeiT-Small - across datasets totaling 8,469 medical images. Our results demonstrate task-specific model advantages: ResNet-50 achieved 98.37% accuracy on chest X-ray classification, DeiT-Small excelled at brain tumor detection with 92.16% accuracy, and EfficientNet-B0 led skin cancer classification at 81.84% accuracy. These findings provide crucial insights for practitioners selecting architectures for medical AI applications, highlighting the importance of task-specific architecture selection in clinical decision support systems.
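
A minimal sketch of how such a four-model comparison could be set up with the timm library is shown below; the model identifiers, the dummy loader, and the evaluation helper are illustrative assumptions, not the authors' code.

```python
import torch
import timm

MODELS = ["resnet50", "efficientnet_b0", "vit_base_patch16_224", "deit_small_patch16_224"]
NUM_CLASSES = 2  # e.g. pneumonia vs. normal for the chest X-ray task


def evaluate(model: torch.nn.Module, loader) -> float:
    """Top-1 accuracy of `model` over a loader of (images, labels) batches."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total


# Dummy batch standing in for a real medical-imaging test split.
dummy_loader = [(torch.randn(4, 3, 224, 224), torch.randint(0, NUM_CLASSES, (4,)))]

for name in MODELS:
    # Set pretrained=True to start from ImageNet weights (False here so the sketch runs offline).
    model = timm.create_model(name, pretrained=False, num_classes=NUM_CLASSES)
    # ... fine-tune on the task's training split here ...
    print(f"{name}: accuracy={evaluate(model, dummy_loader):.4f}")
```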

DGEAHorNet: high-order spatial interaction network with dual cross global efficient attention for medical image segmentation.

Peng H, An X, Chen X, Chen Z

PubMed · Jul 24 2025
Medical image segmentation is a complex and challenging task that aims to accurately segment various structures or abnormal regions in medical images. However, obtaining accurate segmentation results is difficult because of the great uncertainty in the shape, location, and scale of the target region. To address these challenges, we propose a high-order spatial interaction framework with dual cross global efficient attention (DGEAHorNet), which employs a neural network architecture based on recursive gated convolution to adequately extract multi-scale contextual information from images. Specifically, a Dual Cross-Attention (DCA) module is added to the skip connections to effectively blend multi-stage encoder features and narrow the semantic gap. In the bottleneck stage, a global channel spatial attention module (GCSAM) is used to extract global image information. To obtain a better feature representation, we feed the output of the GCSAM into a multi-branch dense layer (SENetV2) for excitation. Furthermore, we adopt the Depthwise Over-parameterized Convolutional Layer (DO-Conv) to replace the common convolutional layers in the input and output parts of our network, and add Efficient Attention (EA) to reduce computational complexity and enhance the model's performance. To evaluate the effectiveness of the proposed DGEAHorNet, we conduct comprehensive experiments on four publicly available datasets, achieving Dice similarity coefficients of 0.9320, 0.9337, 0.9312, and 0.7799 on ISIC2018, ISIC2017, CVC-ClinicDB, and HRF, respectively. Our results show that DGEAHorNet performs better than advanced methods. The code is publicly available at https://github.com/penghaixin/mymodel.
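
As a point of reference for the reported metric, a minimal sketch of the standard Dice similarity coefficient for binary segmentation masks follows; it is the generic definition, not code from the linked repository.

```python
import torch


def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary {0, 1} masks."""
    pred = pred.float().flatten()
    target = target.float().flatten()
    intersection = (pred * target).sum()
    return ((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)).item()


# A perfect prediction yields a Dice score of (approximately) 1.0.
mask = torch.randint(0, 2, (1, 256, 256))
print(dice_coefficient(mask, mask))
```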

Patient Perspectives on Artificial Intelligence in Health Care: Focus Group Study for Diagnostic Communication and Tool Implementation.

Foresman G, Biro J, Tran A, MacRae K, Kazi S, Schubel L, Visconti A, Gallagher W, Smith KM, Giardina T, Haskell H, Miller K

PubMed · Jul 24 2025
Artificial intelligence (AI) is rapidly transforming health care, offering potential benefits in diagnosis, treatment, and workflow efficiency. However, limited research explores patient perspectives on AI, especially in its role in diagnosis and communication. This study examines patient perceptions of various AI applications, focusing on the diagnostic process and communication. This study aimed to examine patient perspectives on AI use in health care, particularly in diagnostic processes and communication, identifying key concerns, expectations, and opportunities to guide the development and implementation of AI tools. This study used a qualitative focus group methodology with co-design principles to explore patient and family member perspectives on AI in clinical practice. A single 2-hour session was conducted with 17 adult participants. The session included interactive activities and breakout sessions focused on five specific AI scenarios relevant to diagnosis and communication: (1) portal messaging, (2) radiology review, (3) digital scribe, (4) virtual human, and (5) decision support. The session was audio-recorded and transcribed, with facilitator notes and demographic questionnaires collected. Data were analyzed using inductive thematic analysis by 2 independent researchers (GF and JB), with discrepancies resolved via consensus. Participants reported varying comfort levels with AI applications contingent on the level of patient interaction, with digital scribe (average 4.24, range 2-5) and radiology review (average 4.00, range 2-5) being the highest, and virtual human (average 1.68, range 1-4) being the lowest. In total, five cross-cutting themes emerged: (1) validation (concerns about model reliability), (2) usability (impact on diagnostic processes), (3) transparency (expectations for disclosing AI usage), (4) opportunities (potential for AI to improve care), and (5) privacy (concerns about data security). Participants valued the co-design session and felt they had a significant say in the discussions. This study highlights the importance of incorporating patient perspectives in the design and implementation of AI tools in health care. Transparency, human oversight, clear communication, and data privacy are crucial for patient trust and acceptance of AI in diagnostic processes. These findings inform strategies for individual clinicians, health care organizations, and policy makers to ensure responsible and patient-centered AI deployment in health care.

Elucidating the Design Space of Arbitrary-Noise-Based Diffusion Models

Xingyu Qiu, Mengying Yang, Xinghua Ma, Dong Liang, Yuzhen Li, Fanding Li, Gongning Luo, Wei Wang, Kuanquan Wang, Shuo Li

arXiv preprint · Jul 24 2025
EDM elucidates the unified design space of diffusion models, yet its fixed noise pattern, restricted to pure Gaussian noise, limits advancements in image restoration. Our study indicates that forcibly injecting Gaussian noise corrupts the degraded images, overextends the image transformation distance, and increases restoration complexity. To address this problem, we propose EDA, which Elucidates the Design space of Arbitrary-noise-based diffusion models. Theoretically, EDA expands the freedom of the noise pattern while preserving the original module flexibility of EDM, and we rigorously prove that the increased noise complexity incurs no additional computational overhead during restoration. EDA is validated on three typical tasks: MRI bias field correction (global smooth noise), CT metal artifact reduction (global sharp noise), and natural image shadow removal (local boundary-aware noise). With only 5 sampling steps, EDA outperforms most task-specific methods and achieves state-of-the-art performance in bias field correction and shadow removal.
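
The sketch below illustrates, under our own simplifying assumptions rather than EDA's actual formulation, the general idea of replacing pure Gaussian corruption with a degradation-aware forward process for restoration.

```python
import torch


def gaussian_forward(x0: torch.Tensor, t: float, sigma_max: float = 1.0) -> torch.Tensor:
    """EDM-style corruption: add Gaussian noise of scale t * sigma_max to the clean image."""
    return x0 + t * sigma_max * torch.randn_like(x0)


def degradation_forward(x0: torch.Tensor, x_degraded: torch.Tensor, t: float) -> torch.Tensor:
    """Arbitrary-noise corruption: blend the clean image toward the observed degraded
    image (e.g. a bias-field-corrupted scan), shortening the transformation distance
    the restoration sampler has to traverse."""
    return (1.0 - t) * x0 + t * x_degraded


x0 = torch.rand(1, 1, 64, 64)                 # clean image
bias_field = 0.5 + torch.rand(1, 1, 64, 64)   # stand-in for a smooth multiplicative degradation
print(degradation_forward(x0, x0 * bias_field, t=0.5).shape)
```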

UniSegDiff: Boosting Unified Lesion Segmentation via a Staged Diffusion Model

Yilong Hu, Shijie Chang, Lihe Zhang, Feng Tian, Weibing Sun, Huchuan Lu

arXiv preprint · Jul 24 2025
The Diffusion Probabilistic Model (DPM) has demonstrated remarkable performance across a variety of generative tasks. The inherent randomness in diffusion models helps address issues such as blurring at the edges of medical images and labels, positioning DPMs as a promising approach for lesion segmentation. However, we find that the current training and inference strategies of diffusion models result in an uneven distribution of attention across different timesteps, leading to longer training times and suboptimal solutions. To this end, we propose UniSegDiff, a novel diffusion model framework designed to address lesion segmentation in a unified manner across multiple modalities and organs. The framework introduces a staged training and inference approach that dynamically adjusts the prediction targets at different stages, forcing the model to maintain high attention across all timesteps, and achieves unified lesion segmentation by pre-training the feature extraction network for segmentation. We evaluate performance on six different organs across various imaging modalities. Comprehensive experimental results demonstrate that UniSegDiff significantly outperforms previous state-of-the-art (SOTA) approaches. The code is available at https://github.com/HUYILONG-Z/UniSegDiff.
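
One plausible reading of the staged prediction-target idea is sketched below; the two-stage split and its boundary are our assumptions for illustration, not the UniSegDiff implementation.

```python
import torch


def training_target(mask: torch.Tensor, noise: torch.Tensor,
                    t: int, num_steps: int, stage_boundary: float = 0.5) -> torch.Tensor:
    """Return the regression target for timestep t under a two-stage schedule."""
    if t >= stage_boundary * num_steps:
        return mask    # high-noise stage: predict the clean mask (x0) directly
    return noise       # low-noise stage: predict the injected noise (epsilon)


mask, noise = torch.rand(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
print(training_target(mask, noise, t=800, num_steps=1000).shape)
```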

Q-Former Autoencoder: A Modern Framework for Medical Anomaly Detection

Francesco Dalmonte, Emirhan Bayar, Emre Akbas, Mariana-Iuliana Georgescu

arXiv preprint · Jul 24 2025
Anomaly detection in medical images is an important yet challenging task due to the diversity of possible anomalies and the practical impossibility of collecting comprehensively annotated datasets. In this work, we tackle unsupervised medical anomaly detection by proposing a modernized autoencoder-based framework, the Q-Former Autoencoder, that leverages state-of-the-art pretrained vision foundation models such as DINO, DINOv2, and the Masked Autoencoder. Instead of training encoders from scratch, we directly utilize frozen vision foundation models as feature extractors, enabling rich, multi-stage, high-level representations without domain-specific fine-tuning. We propose using the Q-Former architecture as the bottleneck, which enables control over the length of the reconstruction sequence while efficiently aggregating multi-scale features. Additionally, we incorporate a perceptual loss computed using features from a pretrained Masked Autoencoder, guiding the reconstruction towards semantically meaningful structures. Our framework is evaluated on four diverse medical anomaly detection benchmarks, achieving state-of-the-art results on BraTS2021, RESC, and RSNA. Our results highlight the potential of vision foundation model encoders, pretrained on natural images, to generalize effectively to medical image analysis tasks without further fine-tuning. We release the code and models at https://github.com/emirhanbayar/QFAE.
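
A minimal sketch of the Q-Former-style bottleneck idea follows; the feature dimension, query count, and single cross-attention layer are illustrative assumptions rather than the released architecture.

```python
import torch
import torch.nn as nn


class QueryBottleneck(nn.Module):
    """A fixed number of learnable queries cross-attend to frozen foundation-model
    patch features, so the reconstruction sequence length is set by the query count."""

    def __init__(self, feat_dim: int = 768, num_queries: int = 32, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, feat_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (B, N_patches, feat_dim) from a frozen encoder such as DINOv2.
        q = self.queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        latent, _ = self.cross_attn(q, patch_feats, patch_feats)
        return latent  # (B, num_queries, feat_dim), fed to a decoder for reconstruction


feats = torch.randn(2, 196, 768)       # stand-in for frozen ViT patch features
print(QueryBottleneck()(feats).shape)  # torch.Size([2, 32, 768])
```

In such a setup, the anomaly score is typically derived from the reconstruction error between the frozen encoder's features and the decoder's output.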

Malignancy classification of thyroid incidentalomas using 18F-fluorodeoxy-d-glucose PET/computed tomography-derived radiomics.

Yeghaian M, Piek MW, Bartels-Rutten A, Abdelatty MA, Herrero-Huertas M, Vogel WV, de Boer JP, Hartemink KJ, Bodalal Z, Beets-Tan RGH, Trebeschi S, van der Ploeg IMC

PubMed · Jul 24 2025
Thyroid incidentalomas (TIs) are incidental thyroid lesions detected on fluorodeoxy-d-glucose (18F-FDG) PET/computed tomography (PET/CT) scans. This study aims to investigate the role of noninvasive PET/CT-derived radiomic features in characterizing 18F-FDG PET/CT TIs and distinguishing benign from malignant thyroid lesions in oncological patients. We included 46 patients with PET/CT TIs who underwent thyroid ultrasound and thyroid surgery at our oncological referral hospital. Radiomic features were extracted from regions of interest (ROIs) in both PET and CT images and analyzed for their association with thyroid cancer and their predictive ability. The TIs were graded using the ultrasound TIRADS classification, and histopathological results served as the reference standard. Univariate and multivariate analyses were performed using features from each modality individually and combined. The performance of radiomic features was compared to the TIRADS classification. Among the 46 included patients, 36 patients (78%) had malignant thyroid lesions, while 10 patients (22%) had benign lesions. The combined run length nonuniformity radiomic feature from PET and CT cubical ROIs demonstrated the highest area under the curve (AUC) of 0.88 (P < 0.05), with a negative correlation with malignancy. This performance was comparable to the TIRADS classification (AUC: 0.84, P < 0.05), which showed a positive correlation with thyroid cancer. Multivariate analysis showed higher predictive performance using CT-derived radiomics (AUC: 0.86 ± 0.13) compared to TIRADS (AUC: 0.80 ± 0.08). This study highlights the potential of 18F-FDG PET/CT-derived radiomics to distinguish benign from malignant thyroid lesions. Further studies with larger cohorts and deep learning-based methods could obtain more robust results.
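
A minimal sketch of how a univariate radiomics analysis of this kind could be set up with pyradiomics and scikit-learn follows; the `cases` placeholder, file paths, and the specific feature key are illustrative assumptions rather than the study's pipeline.

```python
from radiomics import featureextractor
from sklearn.metrics import roc_auc_score

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.enableFeatureClassByName("glrlm")  # run-length features, e.g. RunLengthNonUniformity

# Placeholder: one (image path, ROI mask path, malignant 0/1) triple per patient.
cases: list[tuple[str, str, int]] = []

features, labels = [], []
for image_path, mask_path, is_malignant in cases:
    result = extractor.execute(image_path, mask_path)
    features.append(float(result["original_glrlm_RunLengthNonUniformity"]))
    labels.append(is_malignant)

if features:
    # The feature correlates negatively with malignancy, so invert its sign for the AUC.
    auc = roc_auc_score(labels, [-f for f in features])
    print(f"Univariate AUC for run length non-uniformity: {auc:.2f}")
```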

Back to the Future-Cardiovascular Imaging From 1966 to Today and Tomorrow.

Wintersperger BJ, Alkadhi H, Wildberger JE

PubMed · Jul 23 2025
This article, on the 60th anniversary of the journal Investigative Radiology, a journal dedicated to cutting-edge imaging technology, discusses key historical milestones in CT and MRI technology, as well as the ongoing advancement of contrast agent development for cardiovascular imaging over the past decades. It specifically highlights recent developments and the current state-of-the-art technology, including photon-counting detector CT and artificial intelligence, which will further push the boundaries of cardiovascular imaging. What were once ideas and visions have become today's clinical reality for the benefit of patients, and imaging technology will continue to evolve and transform modern medicine.

Mitigating Data Bias in Healthcare AI with Self-Supervised Standardization.

Lan G, Zhu Y, Xiao S, Iqbal M, Yang J

PubMed · Jul 23 2025
The rapid advancement of artificial intelligence (AI) in healthcare has accelerated innovations in medical algorithms, yet its broader adoption faces critical ethical and technical barriers. A key challenge lies in algorithmic bias stemming from heterogeneous medical data across institutions, equipment, and workflows, which may perpetuate disparities in AI-driven diagnoses and exacerbate inequities in patient care. While AI's ability to extract deep features from large-scale data offers transformative potential, its effectiveness heavily depends on standardized, high-quality datasets. Current standardization gaps not only limit model generalizability but also raise concerns about reliability and fairness in real-world clinical settings, particularly for marginalized populations. Addressing these urgent issues, this paper proposes an ethical AI framework centered on a novel self-supervised medical image standardization method. By integrating self-supervised image style conversion, channel attention mechanisms, and contrastive learning-based loss functions, our approach enhances structural and style consistency in diverse datasets while preserving patient privacy through decentralized learning paradigms. Experiments across multi-institutional medical image datasets demonstrate that our method significantly improves AI generalizability without requiring centralized data sharing. By bridging the data standardization gap, this work advances technical foundations for trustworthy AI in healthcare.
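
As one concrete example of the kind of contrastive consistency objective described above, a generic InfoNCE-style loss is sketched below; it is a standard formulation and not necessarily the paper's exact loss function.

```python
import torch
import torch.nn.functional as F


def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1[i] and z2[i] are embeddings of two style-converted views of the same image i."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))   # positives lie on the diagonal
    return F.cross_entropy(logits, targets)


z_original = torch.randn(8, 128)       # embeddings of the original images
z_standardized = torch.randn(8, 128)   # embeddings of their standardized counterparts
print(info_nce(z_original, z_standardized).item())
```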

VGS-ATD: Robust Distributed Learning for Multi-Label Medical Image Classification Under Heterogeneous and Imbalanced Conditions

Zehui Zhao, Laith Alzubaidi, Haider A. Alwzwazy, Jinglan Zhang, Yuantong Gu

arXiv preprint · Jul 23 2025
In recent years, advanced deep learning architectures have shown strong performance in medical imaging tasks. However, the traditional centralized learning paradigm poses serious privacy risks, as all data is collected and trained on a single server. To mitigate this challenge, decentralized approaches such as federated learning and swarm learning have emerged, allowing model training on local nodes while sharing only model weights. While these methods enhance privacy, they struggle with heterogeneous and imbalanced data and suffer from inefficiencies due to frequent communication and weight aggregation. More critically, the dynamic and complex nature of clinical environments demands scalable AI systems capable of continuously learning from diverse modalities and multi-label data. Yet both centralized and decentralized models are prone to catastrophic forgetting during system expansion, often requiring full retraining to incorporate new data. To address these limitations, we propose VGS-ATD, a novel distributed learning framework. To validate VGS-ATD, we evaluated it in experiments spanning 30 datasets and 80 independent labels across distributed nodes. VGS-ATD achieved an overall accuracy of 92.7%, outperforming centralized learning (84.9%) and swarm learning (72.99%), while federated learning failed under these conditions due to its high computational resource requirements. VGS-ATD also demonstrated strong scalability, with only a 1% drop in accuracy on existing nodes after expansion, compared to a 20% drop with centralized learning, highlighting its resilience to catastrophic forgetting. Additionally, it reduced computational costs by up to 50% relative to both centralized and swarm learning, confirming its superior efficiency and scalability.
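
A small, generic sketch of the forgetting check reported above (the accuracy drop on existing nodes after the system is expanded) is shown below; the node names and values are illustrative, not results from the paper.

```python
def average_forgetting(acc_before: dict[str, float], acc_after: dict[str, float]) -> float:
    """Average accuracy drop on previously seen nodes after a new node is added."""
    drops = [acc_before[node] - acc_after[node] for node in acc_before]
    return sum(drops) / len(drops)


acc_before = {"node_A": 0.93, "node_B": 0.91}    # accuracy per existing node, pre-expansion
acc_after = {"node_A": 0.92, "node_B": 0.905}    # accuracy re-measured after expansion
print(f"Average forgetting: {average_forgetting(acc_before, acc_after):.3f}")
```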