
Can Large Language Models Challenge CNNs in Medical Image Analysis?

Shibbir Ahmed, Shahnewaz Karim Sakib, Anindya Bijoy Das

arXiv preprint · May 29, 2025
This study presents a multimodal AI framework designed to precisely classify medical diagnostic images. Utilizing publicly available datasets, the proposed system compares the strengths of convolutional neural networks (CNNs) and different large language models (LLMs). This in-depth comparative analysis highlights key differences in diagnostic performance, execution efficiency, and environmental impact. Model evaluation was based on accuracy, F1-score, average execution time, average energy consumption, and estimated CO2 emissions. The findings indicate that although CNN-based models can outperform various multimodal techniques that incorporate both images and contextual information, applying additional filtering on top of LLMs can lead to substantial performance gains. These findings highlight the transformative potential of multimodal AI systems to enhance the reliability, efficiency, and scalability of medical diagnostics in clinical settings.
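
The evaluation protocol described above (accuracy, F1-score, average execution time, energy use, and CO2 emissions) can be approximated with standard tooling. The Python sketch below is an illustration rather than the authors' code: classify_batch is a hypothetical stand-in for a CNN or LLM inference pipeline, and the codecarbon package is used as one common way to estimate emissions.

import time
from codecarbon import EmissionsTracker
from sklearn.metrics import accuracy_score, f1_score

def evaluate(classify_batch, images, labels):
    # Run one model over a test set and report the metrics listed in the abstract.
    tracker = EmissionsTracker()
    tracker.start()
    start = time.perf_counter()
    preds = classify_batch(images)        # model-specific inference (CNN or LLM pipeline)
    elapsed = time.perf_counter() - start
    emissions_kg = tracker.stop()         # estimated kg of CO2-equivalent
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average="macro"),
        "avg_time_s": elapsed / len(images),
        "co2_kg": emissions_kg,
    }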

Deep Modeling and Optimization of Medical Image Classification

Yihang Wu, Muhammad Owais, Reem Kateb, Ahmad Chaddad

arXiv preprint · May 29, 2025
Deep models, such as convolutional neural networks (CNNs) and vision transformers (ViTs), demonstrate remarkable performance in image classification. However, these deep models require large amounts of data for fine-tuning, which is impractical in the medical domain due to data privacy issues. Furthermore, despite the feasible performance of contrastive language-image pre-training (CLIP) in the natural domain, the potential of CLIP has not been fully investigated in the medical field. To face these challenges, we considered three scenarios: 1) we introduce a novel CLIP variant using four CNNs and eight ViTs as image encoders for the classification of brain cancer and skin cancer, 2) we combine 12 deep models with two federated learning techniques to protect data privacy, and 3) we involve traditional machine learning (ML) methods to improve the generalization ability of those deep models on unseen domain data. The experimental results indicate that maxvit shows the highest averaged (AVG) test metrics (AVG = 87.03%) on the HAM10000 dataset with multimodal learning, while convnext_l demonstrates remarkable test performance with an F1-score of 83.98%, compared to 81.33% for swin_b, in the federated learning setting. Furthermore, the use of a support vector machine (SVM) can improve the overall test metrics by ~2% (AVG) for the Swin Transformer series on ISIC2018. Our code is available at https://github.com/AIPMLab/SkinCancerSimulation.
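
Scenario 3 (classical ML on top of deep features) is straightforward to prototype. The sketch below is an assumption-laden illustration rather than the released code: it freezes a pretrained Swin Transformer from timm as a feature extractor and fits an SVM on its embeddings, with the model name, image size, and placeholder data chosen purely for illustration.

import timm
import torch
from sklearn.svm import SVC
from sklearn.metrics import f1_score

# Frozen pretrained backbone; num_classes=0 returns pooled features instead of logits.
extractor = timm.create_model("swin_base_patch4_window7_224",
                              pretrained=True, num_classes=0).eval()

@torch.no_grad()
def embed(images):
    # images: float tensor of shape (N, 3, 224, 224), already normalized
    return extractor(images).cpu().numpy()

# Placeholder tensors stand in for skin-lesion images (e.g. HAM10000 / ISIC2018).
X_train, y_train = torch.randn(16, 3, 224, 224), [0, 1] * 8
X_test, y_test = torch.randn(8, 3, 224, 224), [0, 1] * 4

svm = SVC(kernel="rbf").fit(embed(X_train), y_train)
pred = svm.predict(embed(X_test))
print("macro F1:", f1_score(y_test, pred, average="macro"))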

The use of imaging in the diagnosis and treatment of thromboembolic pulmonary hypertension.

Szewczuk K, Dzikowska-Diduch O, Gołębiowski M

PubMed · May 29, 2025
Chronic thromboembolic pulmonary hypertension (CTEPH) is a potentially life-threatening condition, classified as group 4 pulmonary hypertension (PH), caused by stenosis or occlusion of the pulmonary arteries due to unresolved thromboembolic material. The prognosis for untreated CTEPH patients is poor because it leads to elevated pulmonary artery pressure and right heart failure. Early and accurate diagnosis of CTEPH is crucial because it remains the only form of PH that is potentially curable. However, diagnosing CTEPH is often challenging and frequently delayed or misdiagnosed. This review discusses the current role of multimodal imaging in diagnosing CTEPH, guiding clinical decision-making, and monitoring post-treatment outcomes. The characteristic findings, strengths, and limitations of various imaging modalities, such as computed tomography, ventilation-perfusion lung scintigraphy, digital subtraction pulmonary angiography, and magnetic resonance imaging, are evaluated. Additionally, the role of artificial intelligence in improving the diagnosis and treatment outcomes of CTEPH is explored. Optimal patient assessment and therapeutic decision-making should ideally be conducted in specialized centers by a multidisciplinary team, utilizing data from imaging, pulmonary hemodynamics, and patient comorbidities.

Multimodal medical image-to-image translation via variational autoencoder latent space mapping.

Liang Z, Cheng M, Ma J, Hu Y, Li S, Tian X

PubMed · May 29, 2025
Medical image translation has become an essential tool in modern radiotherapy, providing complementary information for target delineation and dose calculation. However, current approaches are constrained by their modality-specific nature, requiring separate model training for each pair of imaging modalities. This limitation hinders the efficient deployment of comprehensive multimodal solutions in clinical practice. To develop a unified image translation method using variational autoencoder (VAE) latent space mapping, which enables flexible conversion between different medical imaging modalities to meet clinical demands. We propose a three-stage approach to construct a unified image translation model. Initially, a VAE is trained to learn a shared latent space for various medical images. A stacked bidirectional transformer is subsequently utilized to learn the mapping between different modalities within the latent space under the guidance of the image modality. Finally, the VAE decoder is fine-tuned to improve image quality. Our internal dataset comprised paired imaging data from 87 head-and-neck cases, with each case containing cone beam computed tomography (CBCT), computed tomography (CT), MR T1c, and MR T2w images. The effectiveness of this strategy is quantitatively evaluated on our internal dataset and a public dataset by the mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). Additionally, the dosimetric characteristics of the synthetic CT images are evaluated, and subjective quality assessments of the synthetic MR images are conducted to determine their clinical value. The VAE with the Kullback-Leibler (KL)-16 image tokenizer demonstrates superior image reconstruction ability, achieving a Fréchet inception distance (FID) of 4.84, a PSNR of 32.80 dB, and an SSIM of 92.33%. In synthetic CT tasks, the model shows greater accuracy in intramodality translations than in cross-modality translations, as evidenced by an MAE of 21.60 ± 8.80 Hounsfield units (HU) in the CBCT-to-CT task versus 45.23 ± 13.21 HU and 47.55 ± 13.88 HU in the MR T1c- and T2w-to-CT tasks, respectively. For the cross-contrast MR translation tasks, the results are very close, with mean PSNR and SSIM values of 26.33 ± 1.36 dB and 85.21% ± 2.21%, respectively, for the T1c-to-T2w translation and 26.03 ± 1.67 dB and 85.73% ± 2.66%, respectively, for the T2w-to-T1c translation. Dosimetric results indicate that all gamma pass rates for synthetic CTs are higher than 99% for photon intensity-modulated radiation therapy (IMRT) planning. However, the subjective quality assessment scores for synthetic MR images are lower than those for real MR images. The proposed three-stage approach successfully develops a unified image translation model that can effectively handle a wide range of medical image translation tasks. This flexibility and effectiveness make it a valuable tool for clinical applications.
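
To make the three-stage design concrete, here is a small PyTorch sketch of stages one and two: a toy VAE providing a shared latent space and a (bidirectional) transformer encoder that maps source-modality latent tokens toward a target modality under the guidance of a modality embedding. All module sizes, names, and the modality indexing are assumptions for illustration, not the paper's implementation.

import torch
import torch.nn as nn

class ToyVAE(nn.Module):
    # Tiny convolutional VAE standing in for the shared image tokenizer.
    def __init__(self, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),
                                 nn.Conv2d(32, 2 * latent_dim, 4, 2, 1))
        self.dec = nn.Sequential(nn.ConvTranspose2d(latent_dim, 32, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(32, 1, 4, 2, 1))
    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    def decode(self, z):
        return self.dec(z)

class LatentMapper(nn.Module):
    # Transformer that rewrites latent tokens toward a target modality.
    def __init__(self, latent_dim=64, n_modalities=4):
        super().__init__()
        self.modality_emb = nn.Embedding(n_modalities, latent_dim)
        layer = nn.TransformerEncoderLayer(d_model=latent_dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
    def forward(self, z, target_modality):
        b, c, h, w = z.shape
        tokens = z.flatten(2).transpose(1, 2)                 # (B, H*W, C) latent tokens
        guide = self.modality_emb(target_modality)[:, None]   # (B, 1, C) modality token
        out = self.transformer(torch.cat([guide, tokens], dim=1))[:, 1:]
        return out.transpose(1, 2).reshape(b, c, h, w)

vae, mapper = ToyVAE(), LatentMapper()
cbct = torch.randn(2, 1, 64, 64)                        # stand-in CBCT slices
z_tgt = mapper(vae.encode(cbct), torch.tensor([1, 1]))  # index 1 = "CT" in this toy scheme
synthetic_ct = vae.decode(z_tgt)                        # shape (2, 1, 64, 64)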

RadCLIP: Enhancing Radiologic Image Analysis Through Contrastive Language-Image Pretraining.

Lu Z, Li H, Parikh NA, Dillman JR, He L

PubMed · May 28, 2025
The integration of artificial intelligence (AI) with radiology signifies a transformative era in medicine. Vision foundation models have been adopted to enhance radiologic imaging analysis. However, the inherent complexities of 2D and 3D radiologic data present unique challenges that existing models, which are typically pretrained on general nonmedical images, do not adequately address. To bridge this gap and harness the diagnostic precision required in radiologic imaging, we introduce radiologic contrastive language-image pretraining (RadCLIP): a cross-modal vision-language foundational model that utilizes a vision-language pretraining (VLP) framework to improve radiologic image analysis. Building on the contrastive language-image pretraining (CLIP) approach, RadCLIP incorporates a slice pooling mechanism designed for volumetric image analysis and is pretrained using a large, diverse dataset of radiologic image-text pairs. This pretraining effectively aligns radiologic images with their corresponding text annotations, resulting in a robust vision backbone for radiologic imaging. Extensive experiments demonstrate RadCLIP's superior performance in both unimodal radiologic image classification and cross-modal image-text matching, underscoring its significant promise for enhancing diagnostic accuracy and efficiency in clinical settings. Our key contributions include curating a large dataset featuring diverse radiologic 2D/3D image-text pairs, pretraining RadCLIP as a vision-language foundation model on this dataset, developing a slice pooling adapter with an attention mechanism for integrating 2D images, and conducting comprehensive evaluations of RadCLIP on various radiologic downstream tasks.
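
The slice pooling adapter mentioned in the contributions can be pictured as a single cross-attention step from a learnable volume-level query to the per-slice embeddings produced by a 2D image encoder. The PyTorch sketch below is an assumed illustration of that idea; the dimensions and the learnable-query design are not taken from the RadCLIP release.

import torch
import torch.nn as nn

class SlicePoolingAdapter(nn.Module):
    def __init__(self, embed_dim=512, n_heads=8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, embed_dim))   # learnable volume query
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)

    def forward(self, slice_embeddings):
        # slice_embeddings: (batch, n_slices, embed_dim) from a 2D CLIP-style encoder
        q = self.query.expand(slice_embeddings.size(0), -1, -1)
        pooled, weights = self.attn(q, slice_embeddings, slice_embeddings)
        return pooled.squeeze(1), weights   # (batch, embed_dim) volume embedding

pool = SlicePoolingAdapter()
vol_emb, attn = pool(torch.randn(4, 32, 512))   # 4 volumes, 32 slices each -> (4, 512)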

Cascaded 3D Diffusion Models for Whole-body 3D 18-F FDG PET/CT synthesis from Demographics

Siyeop Yoon, Sifan Song, Pengfei Jin, Matthew Tivnan, Yujin Oh, Sekeun Kim, Dufan Wu, Xiang Li, Quanzheng Li

arXiv preprint · May 28, 2025
We propose a cascaded 3D diffusion model framework to synthesize high-fidelity 3D PET/CT volumes directly from demographic variables, addressing the growing need for realistic digital twins in oncologic imaging, virtual trials, and AI-driven data augmentation. Unlike deterministic phantoms, which rely on predefined anatomical and metabolic templates, our method employs a two-stage generative process. An initial score-based diffusion model synthesizes low-resolution PET/CT volumes from demographic variables alone, providing global anatomical structures and approximate metabolic activity. This is followed by a super-resolution residual diffusion model that refines spatial resolution. Our framework was trained on 18-F FDG PET/CT scans from the AutoPET dataset and evaluated using organ-wise volume and standardized uptake value (SUV) distributions, comparing synthetic and real data between demographic subgroups. The organ-wise comparison demonstrated strong concordance between synthetic and real images. In particular, most deviations in metabolic uptake values remained within 3-5% of the ground truth in subgroup analysis. These findings highlight the potential of cascaded 3D diffusion models to generate anatomically and metabolically accurate PET/CT images, offering a robust alternative to traditional phantoms and enabling scalable, population-informed synthetic imaging for clinical and research applications.

Integrating SEResNet101 and SE-VGG19 for advanced cervical lesion detection: a step forward in precision oncology.

Ye Y, Chen Y, Pan J, Li P, Ni F, He H

PubMed · May 28, 2025
Cervical cancer remains a significant global health issue, with accurate differentiation between low-grade (LSIL) and high-grade squamous intraepithelial lesions (HSIL) crucial for effective screening and management. Current methods, such as Pap smears and HPV testing, often fall short in sensitivity and specificity. Deep learning models hold the potential to enhance the accuracy of cervical cancer screening but require thorough evaluation to ascertain their practical utility. This study compares the performance of two advanced deep learning models, SEResNet101 and SE-VGG19, in classifying cervical lesions using a dataset of 3,305 high-quality colposcopy images. We assessed the models based on their accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). The SEResNet101 model demonstrated superior performance over SE-VGG19 across all evaluated metrics. Specifically, SEResNet101 achieved a sensitivity of 95%, a specificity of 97%, and an AUC of 0.98, compared to 89% sensitivity, 93% specificity, and an AUC of 0.94 for SE-VGG19. These findings suggest that SEResNet101 could significantly reduce both over- and under-treatment rates by enhancing diagnostic precision. Our results indicate that SEResNet101 offers a promising enhancement over existing screening methods, integrating advanced deep learning algorithms to significantly improve the precision of cervical lesion classification. This study advocates for the inclusion of SEResNet101 in clinical workflows to enhance cervical cancer screening protocols, thereby improving patient outcomes. Future work should focus on multicentric trials to validate these findings and facilitate widespread clinical adoption.
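
For reference, the reported screening metrics (sensitivity, specificity, AUC) follow directly from the binary LSIL/HSIL predictions; a short, generic scikit-learn sketch is shown below, where the threshold and variable names are assumptions rather than the study's pipeline.

import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def screening_metrics(y_true, y_score, threshold=0.5):
    # y_true: 1 = HSIL, 0 = LSIL; y_score: predicted probability of HSIL
    y_pred = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "auc": roc_auc_score(y_true, y_score)}

print(screening_metrics(np.array([0, 0, 1, 1, 1]),
                        np.array([0.1, 0.4, 0.35, 0.8, 0.9])))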

Estimation of time-to-total knee replacement surgery with multimodal modeling and artificial intelligence.

Cigdem O, Hedayati E, Rajamohan HR, Cho K, Chang G, Kijowski R, Deniz CM

PubMed · May 27, 2025
Existing methods for predicting time-to-total knee replacement (TKR) do not provide enough information to make robust and accurate predictions. We develop and evaluate an artificial intelligence-based model for predicting time-to-TKR by analyzing longitudinal knee data and identifying key features associated with accelerated knee osteoarthritis progression. A total of 547 subjects underwent TKR in the Osteoarthritis Initiative over nine years, and their longitudinal data were used for model training and testing. A further 518 subjects from the Multi-Center Osteoarthritis Study and 164 subjects from internal hospital data were used for external testing. Clinical variables, magnetic resonance (MR) images, radiographs, and quantitative and semi-quantitative assessments from images were analyzed. Deep learning (DL) models were used to extract features from radiographs and MR images. DL features were combined with clinical and image assessment features for survival analysis. A Lasso Cox feature selection method combined with a random survival forest model was used to estimate time-to-TKR. Utilizing only clinical variables for time-to-TKR prediction yielded an estimation accuracy of 60.4% and a C-index of 62.9%. Combining DL features extracted from radiographs and MR images with clinical, quantitative, and semi-quantitative image assessment features achieved the highest accuracy of 73.2% (p = .001) and a C-index of 77.3% for predicting time-to-TKR. The proposed predictive model demonstrates the potential of DL models and multimodal data fusion to accurately predict time-to-TKR, which may help physicians personalize treatment strategies and improve patient outcomes.
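
The survival-modeling step (lasso-penalized Cox feature selection followed by a random survival forest) can be sketched with scikit-survival. The snippet below uses synthetic data and an assumed penalty strength purely to illustrate the two-stage pipeline; it is not the study's configuration.

import numpy as np
from sksurv.linear_model import CoxnetSurvivalAnalysis
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import concordance_index_censored
from sksurv.util import Surv

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                       # fused clinical + imaging features
time = rng.exponential(np.exp(-X[:, 0]))             # years to TKR (or censoring), tied to feature 0
event = rng.integers(0, 2, size=200).astype(bool)    # True if TKR was observed
y = Surv.from_arrays(event=event, time=time)

# Stage 1: lasso Cox keeps only features with nonzero coefficients.
lasso_cox = CoxnetSurvivalAnalysis(l1_ratio=1.0, alphas=[0.01]).fit(X, y)
selected = np.flatnonzero(lasso_cox.coef_[:, 0])
if selected.size == 0:                               # fall back if the penalty zeroes everything
    selected = np.arange(X.shape[1])

# Stage 2: random survival forest on the selected features.
rsf = RandomSurvivalForest(n_estimators=200, random_state=0).fit(X[:, selected], y)
cindex = concordance_index_censored(event, time, rsf.predict(X[:, selected]))[0]
print(f"selected {selected.size} features, C-index {cindex:.3f}")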
