
Can Large Language Models Challenge CNNs in Medical Image Analysis?

Shibbir Ahmed, Shahnewaz Karim Sakib, Anindya Bijoy Das

arXiv preprint · May 29, 2025
This study presents a multimodal AI framework designed for precisely classifying medical diagnostic images. Utilizing publicly available datasets, the proposed system compares the strengths of convolutional neural networks (CNNs) and different large language models (LLMs). This in-depth comparative analysis highlights key differences in diagnostic performance, execution efficiency, and environmental impact. Model evaluation was based on accuracy, F1-score, average execution time, average energy consumption, and estimated CO2 emissions. The findings indicate that although CNN-based models can outperform various multimodal techniques that incorporate both images and contextual information, applying additional filtering on top of LLMs can lead to substantial performance gains. These findings highlight the transformative potential of multimodal AI systems to enhance the reliability, efficiency, and scalability of medical diagnostics in clinical settings.
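The abstract's evaluation compares two model families on the same test set using accuracy, F1-score, and execution time (energy and CO2 are typically estimated from measured power draw over the same runs). The sketch below shows one plausible scoring harness, not the authors' code; `cnn_model` and `llm_pipeline` are hypothetical stand-ins for the two classifiers.

```python
# Illustrative sketch: scoring two classifiers on the same test set with the
# metrics named in the abstract. `cnn_model` and `llm_pipeline` are hypothetical.
import time
from sklearn.metrics import accuracy_score, f1_score

def evaluate(model_predict, images, labels):
    start = time.perf_counter()
    preds = [model_predict(img) for img in images]   # one predicted label per image
    elapsed = time.perf_counter() - start
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
        "avg_time_s": elapsed / len(images),          # proxy for execution efficiency
    }

# results = {name: evaluate(fn, test_images, test_labels)
#            for name, fn in [("cnn", cnn_model), ("llm", llm_pipeline)]}
```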

The use of imaging in the diagnosis and treatment of thromboembolic pulmonary hypertension.

Szewczuk K, Dzikowska-Diduch O, Gołębiowski M

PubMed · May 29, 2025
Chronic thromboembolic pulmonary hypertension (CTEPH) is a potentially life-threatening condition, classified as group 4 pulmonary hypertension (PH), caused by stenosis or occlusion of the pulmonary arteries due to unresolved thromboembolic material. The prognosis for untreated CTEPH patients is poor because the disease leads to elevated pulmonary artery pressure and right heart failure. Early and accurate diagnosis of CTEPH is crucial because it remains the only form of PH that is potentially curable. However, the diagnosis is often challenging and is frequently delayed or missed. This review discusses the current role of multimodal imaging in diagnosing CTEPH, guiding clinical decision-making, and monitoring post-treatment outcomes. The characteristic findings, strengths, and limitations of various imaging modalities, such as computed tomography, ventilation-perfusion lung scintigraphy, digital subtraction pulmonary angiography, and magnetic resonance imaging, are evaluated. Additionally, the role of artificial intelligence in improving the diagnosis and treatment outcomes of CTEPH is explored. Optimal patient assessment and therapeutic decision-making should ideally be conducted in specialized centers by a multidisciplinary team, utilizing data from imaging, pulmonary hemodynamics, and patient comorbidities.

RadCLIP: Enhancing Radiologic Image Analysis Through Contrastive Language-Image Pretraining.

Lu Z, Li H, Parikh NA, Dillman JR, He L

PubMed · May 28, 2025
The integration of artificial intelligence (AI) with radiology signifies a transformative era in medicine. Vision foundation models have been adopted to enhance radiologic imaging analysis. However, the inherent complexities of 2D and 3D radiologic data present unique challenges that existing models, which are typically pretrained on general nonmedical images, do not adequately address. To bridge this gap and harness the diagnostic precision required in radiologic imaging, we introduce radiologic contrastive language-image pretraining (RadCLIP): a cross-modal vision-language foundation model that utilizes a vision-language pretraining (VLP) framework to improve radiologic image analysis. Building on the contrastive language-image pretraining (CLIP) approach, RadCLIP incorporates a slice pooling mechanism designed for volumetric image analysis and is pretrained using a large, diverse dataset of radiologic image-text pairs. This pretraining effectively aligns radiologic images with their corresponding text annotations, resulting in a robust vision backbone for radiologic imaging. Extensive experiments demonstrate RadCLIP's superior performance in both unimodal radiologic image classification and cross-modal image-text matching, underscoring its significant promise for enhancing diagnostic accuracy and efficiency in clinical settings. Our key contributions include curating a large dataset featuring diverse radiologic 2D/3D image-text pairs, pretraining RadCLIP as a vision-language foundation model on this dataset, developing a slice pooling adapter with an attention mechanism for integrating 2D images, and conducting comprehensive evaluations of RadCLIP on various radiologic downstream tasks.
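The slice pooling adapter mentioned above aggregates per-slice 2D embeddings into a single volume-level embedding via attention. The sketch below is one plausible realization of that idea in PyTorch, not the published RadCLIP implementation.

```python
# Hedged sketch of attention-based slice pooling, assuming each 2D slice has already
# been embedded by the CLIP vision backbone.
import torch
import torch.nn as nn

class SlicePoolingAdapter(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(1, 1, dim))          # learnable volume-level query
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, slice_embeddings: torch.Tensor) -> torch.Tensor:
        # slice_embeddings: (batch, n_slices, dim) from the 2D vision encoder
        q = self.query.expand(slice_embeddings.size(0), -1, -1)
        pooled, _ = self.attn(q, slice_embeddings, slice_embeddings)
        return pooled.squeeze(1)                                    # (batch, dim) volume embedding

# vol_emb = SlicePoolingAdapter(512)(torch.randn(2, 64, 512))      # e.g. 64 axial slices
```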

Integrating SEResNet101 and SE-VGG19 for advanced cervical lesion detection: a step forward in precision oncology.

Ye Y, Chen Y, Pan J, Li P, Ni F, He H

PubMed · May 28, 2025
Cervical cancer remains a significant global health issue, with accurate differentiation between low-grade (LSIL) and high-grade squamous intraepithelial lesions (HSIL) crucial for effective screening and management. Current methods, such as Pap smears and HPV testing, often fall short in sensitivity and specificity. Deep learning models hold the potential to enhance the accuracy of cervical cancer screening but require thorough evaluation to ascertain their practical utility. This study compares the performance of two advanced deep learning models, SEResNet101 and SE-VGG19, in classifying cervical lesions using a dataset of 3,305 high-quality colposcopy images. We assessed the models based on their accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). The SEResNet101 model demonstrated superior performance over SE-VGG19 across all evaluated metrics. Specifically, SEResNet101 achieved a sensitivity of 95%, a specificity of 97%, and an AUC of 0.98, compared to 89% sensitivity, 93% specificity, and an AUC of 0.94 for SE-VGG19. These findings suggest that SEResNet101 could significantly reduce both over- and under-treatment rates by enhancing diagnostic precision. Our results indicate that SEResNet101 offers a promising enhancement over existing screening methods, integrating advanced deep learning algorithms to significantly improve the precision of cervical lesion classification. This study advocates for the inclusion of SEResNet101 in clinical workflows to enhance cervical cancer screening protocols, thereby improving patient outcomes. Future work should focus on multicentric trials to validate these findings and facilitate widespread clinical adoption.
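Both architectures in this study augment a standard backbone with squeeze-and-excitation (SE) channel attention. The block below is a generic SE unit as introduced in the SENet literature, shown for orientation; it is not the authors' exact code.

```python
# Minimal squeeze-and-excitation (SE) block, the channel-attention unit that
# SEResNet101 and SE-VGG19 build on.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                         # squeeze: global spatial average
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        scale = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)   # excitation: per-channel weights
        return x * scale                                            # reweight feature maps

# y = SEBlock(256)(torch.randn(1, 256, 14, 14))
```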

Cascaded 3D Diffusion Models for Whole-body 3D 18-F FDG PET/CT synthesis from Demographics

Siyeop Yoon, Sifan Song, Pengfei Jin, Matthew Tivnan, Yujin Oh, Sekeun Kim, Dufan Wu, Xiang Li, Quanzheng Li

arXiv preprint · May 28, 2025
We propose a cascaded 3D diffusion model framework to synthesize high-fidelity 3D PET/CT volumes directly from demographic variables, addressing the growing need for realistic digital twins in oncologic imaging, virtual trials, and AI-driven data augmentation. Unlike deterministic phantoms, which rely on predefined anatomical and metabolic templates, our method employs a two-stage generative process. An initial score-based diffusion model synthesizes low-resolution PET/CT volumes from demographic variables alone, providing global anatomical structures and approximate metabolic activity. This is followed by a super-resolution residual diffusion model that refines spatial resolution. Our framework was trained on 18-F FDG PET/CT scans from the AutoPET dataset and evaluated using organ-wise volume and standardized uptake value (SUV) distributions, comparing synthetic and real data across demographic subgroups. The organ-wise comparison demonstrated strong concordance between synthetic and real images. In particular, most deviations in metabolic uptake values remained within 3-5% of the ground truth in subgroup analysis. These findings highlight the potential of cascaded 3D diffusion models to generate anatomically and metabolically accurate PET/CT images, offering a robust alternative to traditional phantoms and enabling scalable, population-informed synthetic imaging for clinical and research applications.
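The two-stage cascade described above (demographics to low-resolution volume, then residual super-resolution) can be wired roughly as in the schematic below. The samplers are hypothetical stand-ins for trained diffusion models, and the shapes and conditioning interface are assumptions for illustration only.

```python
# Schematic wiring of the cascade: a base diffusion sampler produces a low-resolution
# PET/CT volume from a demographic vector, and a second model refines an upsampled
# copy by predicting a residual. `base_sampler` and `sr_sampler` are hypothetical.
import torch
import torch.nn.functional as F

def sample_cascade(base_sampler, sr_sampler, demographics: torch.Tensor,
                   low_res=(32, 64, 64), high_res=(128, 256, 256)):
    # Stage 1: low-resolution volume conditioned on demographics (age, sex, BMI, ...)
    low = base_sampler(demographics, shape=(demographics.size(0), 2, *low_res))  # 2 channels: PET, CT
    # Upsample to the target grid
    up = F.interpolate(low, size=high_res, mode="trilinear", align_corners=False)
    # Stage 2: residual super-resolution refinement conditioned on the upsampled volume
    residual = sr_sampler(condition=up)
    return up + residual
```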

Estimation of time-to-total knee replacement surgery with multimodal modeling and artificial intelligence.

Cigdem O, Hedayati E, Rajamohan HR, Cho K, Chang G, Kijowski R, Deniz CM

PubMed · May 27, 2025
Current methods for predicting time-to-total knee replacement (TKR) do not provide enough information to make robust and accurate predictions. This study aimed to develop and evaluate an artificial intelligence-based model for predicting time-to-TKR by analyzing longitudinal knee data and identifying key features associated with accelerated knee osteoarthritis progression. A total of 547 subjects underwent TKR in the Osteoarthritis Initiative over nine years, and their longitudinal data were used for model training and testing. For external testing, 518 and 164 subjects from the Multi-Center Osteoarthritis Study and an internal hospital dataset, respectively, were used. Clinical variables, magnetic resonance (MR) images, radiographs, and quantitative and semi-quantitative image assessments were analyzed. Deep learning (DL) models were used to extract features from radiographs and MR images, and these DL features were combined with clinical and image-assessment features for survival analysis. A Lasso Cox feature selection method combined with a random survival forest model was used to estimate time-to-TKR. Using only clinical variables for time-to-TKR prediction yielded an estimation accuracy of 60.4% and a C-index of 62.9%. Combining DL features extracted from radiographs and MR images with clinical, quantitative, and semi-quantitative image assessment features achieved the highest accuracy of 73.2% (p = .001) and a C-index of 77.3%. The proposed predictive model demonstrates the potential of DL models and multimodal data fusion for accurately predicting time-to-TKR, which may help physicians personalize treatment strategies and improve patient outcomes.
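The two-step survival pipeline named above (a lasso-penalized Cox model for feature selection followed by a random survival forest) can be prototyped with scikit-survival. The sketch below runs on synthetic placeholder data; the penalty strength, forest settings, and feature construction are illustrative, not the study's tuned values.

```python
# Sketch of lasso-Cox feature selection followed by a random survival forest,
# on placeholder data (not the Osteoarthritis Initiative cohort).
import numpy as np
from sksurv.linear_model import CoxnetSurvivalAnalysis
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import concordance_index_censored
from sksurv.util import Surv

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 60))                                # placeholder multimodal feature matrix
linpred = X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 2]             # a few informative features
time = rng.exponential(scale=np.exp(-linpred))                # placeholder time-to-TKR
event = rng.random(500) < 0.7                                 # True = TKR observed, False = censored
y = Surv.from_arrays(event=event, time=time)

# Step 1: lasso-penalized Cox model for feature selection (l1_ratio=1.0 -> pure lasso)
cox = CoxnetSurvivalAnalysis(l1_ratio=1.0, alphas=[0.01]).fit(X, y)
selected = np.flatnonzero(cox.coef_[:, 0])                    # features with nonzero coefficients

# Step 2: random survival forest on the selected features
rsf = RandomSurvivalForest(n_estimators=200, random_state=0).fit(X[:, selected], y)
risk = rsf.predict(X[:, selected])
cindex = concordance_index_censored(event, time, risk)[0]
print(f"selected {selected.size} features, C-index = {cindex:.3f}")
```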

Rate and Patient Specific Risk Factors for Periprosthetic Acetabular Fractures during Primary Total Hip Arthroplasty using a Pressfit Cup.

Simon S, Gobi H, Mitterer JA, Frank BJ, Huber S, Aichmair A, Dominkus M, Hofstaetter JG

PubMed · May 26, 2025
Periprosthetic acetabular fractures following primary total hip arthroplasty (THA) with a cementless acetabular component range from occult to severe fractures. The aims of this study were to evaluate the perioperative periprosthetic acetabular fracture rate and patient-specific risk factors of a modular cementless acetabular component. We included 7,016 primary THAs (61.4% women, 38.6% men; age, 67 years; interquartile range, 58 to 74) that received a cementless, hydroxyapatite-coated, modular titanium press-fit acetabular component from a single manufacturer between January 2013 and September 2022. All perioperative radiographs and computed tomography (CT) scans were analyzed for all causes. Patient-specific data and the revision rate were retrieved, and radiographic measurements were performed using artificial intelligence-based software. Following matching on patient demographics, patients with and without periacetabular fractures were compared to identify patient-specific and radiographic risk factors. The fracture rate was 0.8% (56 of 7,016). Overall, 33.9% (19 of 56) were small occult fractures visible only on CT, and a further 37.5% (21 of 56) were stable small fractures; these two groups combined (40 of 56; 71.4%) were treated nonoperatively. Revision THA was necessary in 16 of 56 cases, resulting in an overall revision rate of 0.2% (16 of 7,016). Patient-specific risk factors were small acetabular component size (≤ 50), low body mass index (BMI) (< 24.5), higher age (> 68 years), female sex, a low lateral center-edge angle (< 24°), a high extrusion index (> 20%), a high Sharp angle (> 38°), and a high Tönnis angle (> 10°). A wide range of periprosthetic acetabular fractures were observed following primary cementless THA. In total, 71.4% of acetabular fractures were small cracks that did not necessitate revision surgery. By identifying patient-specific risk factors, such as advanced age, female sex, low BMI, and dysplastic hips, future complications may be reduced.

tUbe net: a generalisable deep learning tool for 3D vessel segmentation

Holroyd, N. A., Li, Z., Walsh, C., Brown, E. E., Shipley, R. J., Walker-Samuel, S.

bioRxiv preprint · May 26, 2025
Deep learning has become an invaluable tool for bioimage analysis but, while open-source cell annotation software such as Cellpose is widely used, an equivalent tool for three-dimensional (3D) vascular annotation does not exist. With the vascular system being directly impacted by a broad range of diseases, there is significant medical interest in quantitative analysis for vascular imaging. However, existing deep learning approaches for this task are specialised to particular tissue types or imaging modalities. We present a new deep learning model for segmentation of vasculature that is generalisable across tissues, modalities, scales and pathologies. To create a generalisable model, a 3D convolutional neural network was trained using data from multiple modalities including optical imaging, computed tomography and photoacoustic imaging. Through this varied training set, the model was forced to learn features of vessels that are common across modalities and scales. Following this, the general model was fine-tuned to different applications with a minimal amount of manually labelled ground truth data. It was found that the general model could be specialised to segment new datasets, with a high degree of accuracy, using as little as 0.3% of the volume of that dataset for fine-tuning. As such, this model enables users to produce accurate segmentations of 3D vascular networks without the need to label large amounts of training data.
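The fine-tuning step described above amounts to adapting the pretrained general network on a small labelled subvolume at a low learning rate. A generic PyTorch sketch of that loop is shown below; `model`, `small_loader`, and the checkpoint path are hypothetical stand-ins rather than the tUbe net code.

```python
# Generic fine-tuning loop for a pretrained 3D vessel segmentation network on a
# small labelled subset. All names here are placeholders.
import torch
import torch.nn as nn

def fine_tune(model: nn.Module, small_loader, epochs: int = 20, lr: float = 1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)        # low LR to preserve general vessel features
    loss_fn = nn.BCEWithLogitsLoss()                          # binary vessel / background mask
    model.train()
    for _ in range(epochs):
        for volume, mask in small_loader:                     # small labelled patches (~0.3% of the dataset)
            opt.zero_grad()
            loss = loss_fn(model(volume), mask)
            loss.backward()
            opt.step()
    return model

# model.load_state_dict(torch.load("general_vessel_model.pt"))  # hypothetical checkpoint
# model = fine_tune(model, small_loader)
```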

Rep3D: Re-parameterize Large 3D Kernels with Low-Rank Receptive Modeling for Medical Imaging

Ho Hin Lee, Quan Liu, Shunxing Bao, Yuankai Huo, Bennett A. Landman

arXiv preprint · May 26, 2025
In contrast to vision transformers, which model long-range dependencies through global self-attention, large kernel convolutions provide a more efficient and scalable alternative, particularly in high-resolution 3D volumetric settings. However, naively increasing kernel size often leads to optimization instability and degradation in performance. Motivated by the spatial bias observed in effective receptive fields (ERFs), we hypothesize that different kernel elements converge at variable rates during training. To support this, we derive a theoretical connection between element-wise gradients and first-order optimization, showing that structurally re-parameterized convolution blocks inherently induce spatially varying learning rates. Building on this insight, we introduce Rep3D, a 3D convolutional framework that incorporates a learnable spatial prior into large kernel training. A lightweight two-stage modulation network generates a receptive-biased scaling mask, adaptively re-weighting kernel updates and enabling local-to-global convergence behavior. Rep3D adopts a plain encoder design with large depthwise convolutions, avoiding the architectural complexity of multi-branch compositions. We evaluate Rep3D on five challenging 3D segmentation benchmarks and demonstrate consistent improvements over state-of-the-art baselines, including transformer-based and fixed-prior re-parameterization methods. By unifying spatial inductive bias with optimization-aware learning, Rep3D offers an interpretable and scalable solution for 3D medical image analysis. The source code is publicly available at https://github.com/leeh43/Rep3D.
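One way to read the mechanism described above is a large depthwise 3D kernel whose effective weights are rescaled elementwise by a spatial mask produced by a small modulation network, so that different kernel positions effectively receive different update magnitudes. The sketch below illustrates that idea; it is a hedged reconstruction, not the implementation released at the repository linked above.

```python
# Illustration of a spatially re-weighted large depthwise 3D kernel: a small
# modulation network produces a scaling mask over the kernel's spatial extent.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReweightedLargeKernel3d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 13):
        super().__init__()
        self.channels, self.k = channels, kernel_size
        self.weight = nn.Parameter(0.01 * torch.randn(channels, 1, *(kernel_size,) * 3))
        # lightweight two-stage modulation path over the kernel's spatial extent (assumption)
        self.modulate = nn.Sequential(
            nn.Conv3d(1, 4, 3, padding=1), nn.GELU(),
            nn.Conv3d(4, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = self.weight.mean(dim=0, keepdim=True)          # (1, 1, k, k, k) kernel summary
        mask = self.modulate(base)                             # receptive-biased scaling mask
        w = self.weight * mask                                 # spatially re-weighted kernel
        return F.conv3d(x, w, padding=self.k // 2, groups=self.channels)

# y = ReweightedLargeKernel3d(32)(torch.randn(1, 32, 48, 48, 48))
```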