Page 31 of 91907 results

Integrating Artificial Intelligence in Thyroid Nodule Management: Clinical Outcomes and Cost-Effectiveness Analysis.

Bodoque-Cubas J, Fernández-Sáez J, Martínez-Hervás S, Pérez-Lacasta MJ, Carles-Lavila M, Pallarés-Gasulla RM, Salazar-González JJ, Gil-Boix JV, Miret-Llauradó M, Aulinas-Masó A, Argüelles-Jiménez I, Tofé-Povedano S

pubmed logopapersJul 12 2025
The increasing incidence of thyroid nodules (TN) raises concerns about overdiagnosis and overtreatment. This study evaluates the clinical and economic impact of KOIOS, an FDA-approved artificial intelligence (AI) tool for the management of TN. A retrospective analysis was conducted on 176 patients who underwent thyroid surgery between May 2022 and November 2024. Ultrasound images were evaluated independently by expert and novice operators using the American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS), while KOIOS provided AI-adapted risk stratification. Sensitivity, specificity, and receiver operating characteristic (ROC) curve analyses were performed. The incremental cost-effectiveness ratio (ICER) was defined based on the number of optimal care interventions (fine-needle aspiration biopsy [FNAB] and thyroid surgery). Both deterministic and probabilistic sensitivity analyses were conducted to evaluate model robustness. KOIOS demonstrated diagnostic performance similar to the expert operator (AUC: 0.794, 95% CI: 0.718-0.871 vs. 0.784, 95% CI: 0.706-0.861; p = 0.754) and significantly outperformed the novice operator (AUC: 0.619, 95% CI: 0.526-0.711; p < 0.001). ICER analysis estimated the cost per additional optimal care decision at -€8,085.56, indicating that KOIOS is a dominant, cost-saving strategy from a third-party payer perspective over a one-year horizon. Deterministic sensitivity analysis identified surgical costs as the main driver of variability, while probabilistic analysis consistently favored KOIOS as the optimal strategy. KOIOS is a cost-effective alternative, particularly in reducing overdiagnosis and overtreatment of benign TNs. Prospective, real-life studies are needed to validate these findings and explore long-term implications.
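The ICER reported above is the standard incremental cost-effectiveness ratio: the extra cost divided by the extra effect of the new strategy versus the comparator. A minimal sketch of the computation, using hypothetical numbers rather than the study's data:

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of
    effect. A negative ICER with higher effect means the new strategy is
    dominant (both cheaper and more effective)."""
    return (cost_new - cost_old) / (effect_new - effect_old)

# Hypothetical numbers for illustration only (not the study's data):
# the AI strategy costs less and yields more optimal-care decisions.
print(icer(cost_new=90_000, cost_old=100_000, effect_new=120, effect_old=110))
# → -1000.0 (negative cost per extra optimal decision → cost-saving, dominant)
```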

Accuracy of large language models in generating differential diagnosis from clinical presentation and imaging findings in pediatric cases.

Jung J, Phillipi M, Tran B, Chen K, Chan N, Ho E, Sun S, Houshyar R

pubmed logopapersJul 12 2025
Large language models (LLMs) have shown promise in assisting medical decision-making. However, there is limited literature exploring the diagnostic accuracy of LLMs in generating differential diagnoses from text-based image descriptions and clinical presentations in pediatric radiology. To examine the performance of multiple proprietary LLMs in producing accurate differential diagnoses for text-based pediatric radiological cases without imaging. One hundred sixty-four cases were retrospectively selected from a pediatric radiology textbook and converted into two formats: (1) image description only, and (2) image description with clinical presentation. The ChatGPT-4V, Claude 3.5 Sonnet, and Gemini 1.5 Pro models were given these inputs and tasked with providing a top 1 diagnosis and a top 3 differential diagnosis. Accuracy of responses was assessed by comparison with the original literature. Top 1 accuracy was defined as whether the top 1 diagnosis matched the textbook, and top 3 differential accuracy as the number of diagnoses in the model-generated top 3 differential that matched any of the top 3 diagnoses in the textbook. McNemar's test, Cochran's Q test, the Friedman test, and the Wilcoxon signed-rank test were used to compare algorithms and to assess the impact of added clinical information. There was no significant difference in top 1 accuracy between ChatGPT-4V, Claude 3.5 Sonnet, and Gemini 1.5 Pro when only image descriptions were provided (56.1% [95% CI 48.4-63.5], 64.6% [95% CI 57.1-71.5], 61.6% [95% CI 54.0-68.7]; P = 0.11). Adding clinical presentation to the image description significantly improved top 1 accuracy for ChatGPT-4V (64.0% [95% CI 56.4-71.0], P = 0.02) and Claude 3.5 Sonnet (80.5% [95% CI 73.8-85.8], P < 0.001). For image description and clinical presentation cases, Claude 3.5 Sonnet significantly outperformed both ChatGPT-4V and Gemini 1.5 Pro (P < 0.001).
For top 3 differential accuracy, no significant differences were observed between ChatGPT-4V, Claude 3.5 Sonnet, and Gemini 1.5 Pro, regardless of whether the cases included only image descriptions (1.29 [95% CI 1.16-1.41], 1.35 [95% CI 1.23-1.48], 1.37 [95% CI 1.25-1.49]; P = 0.60) or both image descriptions and clinical presentations (1.33 [95% CI 1.20-1.45], 1.52 [95% CI 1.41-1.64], 1.48 [95% CI 1.36-1.59]; P = 0.72). Only Claude 3.5 Sonnet performed significantly better when clinical presentation was added (P < 0.001). Commercial LLMs performed similarly on pediatric radiology cases in top 1 accuracy and top 3 differential accuracy when only a text-based image description was used. Adding clinical presentation significantly improved top 1 accuracy for ChatGPT-4V and Claude 3.5 Sonnet, with Claude showing the largest improvement. Claude 3.5 Sonnet outperformed both ChatGPT-4V and Gemini 1.5 Pro in top 1 accuracy when both image and clinical data were provided. No significant differences were found in top 3 differential accuracy across models in any condition.
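The two endpoints above can be scored mechanically; a minimal sketch with hypothetical diagnoses for illustration:

```python
def top1_accuracy(predictions, truths):
    # Fraction of cases whose first-listed diagnosis matches the textbook's.
    return sum(p[0] == t[0] for p, t in zip(predictions, truths)) / len(truths)

def top3_overlap(predictions, truths):
    # Mean count (0-3) of model diagnoses appearing anywhere in the
    # textbook's top-3 differential, matching the paper's 0-3 scale.
    return sum(len(set(p[:3]) & set(t[:3]))
               for p, t in zip(predictions, truths)) / len(truths)

# Hypothetical cases for illustration only:
preds = [["pneumonia", "bronchiolitis", "asthma"],
         ["intussusception", "volvulus", "appendicitis"]]
truth = [["bronchiolitis", "pneumonia", "croup"],
         ["intussusception", "appendicitis", "hernia"]]
print(top1_accuracy(preds, truth))  # → 0.5
print(top3_overlap(preds, truth))   # → 2.0
```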

AI-powered disease progression prediction in multiple sclerosis using magnetic resonance imaging: a systematic review and meta-analysis.

Houshi S, Khodakarami Z, Shaygannejad A, Khosravi F, Shaygannejad V

pubmed logopapersJul 12 2025
Disability progression despite disease-modifying therapy remains a major challenge in multiple sclerosis (MS). Artificial intelligence (AI) models exploiting magnetic resonance imaging (MRI) promise personalized prognostication, yet their real-world accuracy is uncertain. To systematically review and meta-analyze MRI-based AI studies predicting future disability progression in MS. Five databases were searched from inception to 17 May 2025 following PRISMA. Eligible studies used MRI in an AI model to forecast changes in the Expanded Disability Status Scale (EDSS) or equivalent metrics. Two reviewers conducted study selection, data extraction, and QUADAS-2 assessment. Random-effects meta-analysis was applied when ≥3 studies reported compatible regression statistics. Twenty-one studies with 12,252 MS patients met the inclusion criteria. Five used regression on continuous EDSS, fourteen classification, one time-to-event analysis, and one both. Conventional machine learning predominated (57%), followed by deep learning (38%). The median classification area under the curve (AUC) was 0.78 (range 0.57-0.86); the median regression root-mean-square error (RMSE) was 1.08 EDSS points. The pooled RMSE across regression studies was 1.31 (95% CI 1.02-1.60; I<sup>2</sup> = 95%). Deep learning conferred only marginal, non-significant gains over classical algorithms. External validation appeared in six studies; calibration, decision-curve analysis, and code releases were seldom reported. QUADAS-2 indicated generally low patient-selection bias but frequent index-test concerns. MRI-driven AI models predict MS disability progression with moderate accuracy, but error margins exceeding one EDSS point limit individual-level utility. Harmonized endpoints, larger multicenter cohorts, rigorous external validation, and prospective clinician-in-the-loop trials are essential before routine clinical adoption.
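A pooled estimate like the 1.31 RMSE above typically comes from DerSimonian-Laird random-effects weighting. A minimal sketch, assuming each study reports an estimate and its standard error (the study values below are made up for illustration):

```python
import math

def pooled_random_effects(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of per-study estimates
    (e.g. RMSEs): fixed-effect weights give Cochran's Q, Q yields the
    between-study variance tau^2, and tau^2 widens the final weights."""
    w = [1 / se ** 2 for se in std_errors]            # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)   # between-study variance
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    half = 1.96 * math.sqrt(1 / sum(w_re))
    return pooled, pooled - half, pooled + half

# Hypothetical per-study RMSEs and standard errors (illustration only):
mean, lo, hi = pooled_random_effects([1.0, 1.3, 1.6], [0.10, 0.12, 0.15])
print(round(mean, 2), round(lo, 2), round(hi, 2))
```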

Vision-language model for report generation and outcome prediction in CT pulmonary angiogram.

Zhong Z, Wang Y, Wu J, Hsu WC, Somasundaram V, Bi L, Kulkarni S, Ma Z, Collins S, Baird G, Ahn SH, Feng X, Kamel I, Lin CT, Greineder C, Atalay M, Jiao Z, Bai H

pubmed logopapersJul 12 2025
Accurate and comprehensive interpretation of pulmonary embolism (PE) from Computed Tomography Pulmonary Angiography (CTPA) scans remains a clinical challenge due to the limited specificity and structure of existing AI tools. We propose an agent-based framework that integrates Vision-Language Models (VLMs) for detecting 32 PE-related abnormalities and Large Language Models (LLMs) for structured report generation. Trained on over 69,000 CTPA studies from 24,890 patients across Brown University Health (BUH), Johns Hopkins University (JHU), and the INSPECT dataset from Stanford, the model demonstrates strong performance in abnormality classification and report generation. For abnormality classification, it achieved AUROC scores of 0.788 (BUH), 0.754 (INSPECT), and 0.710 (JHU), with corresponding BERT-F1 scores of 0.891, 0.829, and 0.842. The abnormality-guided reporting strategy consistently outperformed the organ-based and holistic captioning baselines. For survival prediction, a multimodal fusion model that incorporates imaging, clinical variables, diagnostic outputs, and generated reports achieved concordance indices of 0.863 (BUH) and 0.731 (JHU), outperforming traditional PESI scores. This framework provides a clinically meaningful and interpretable solution for end-to-end PE diagnosis, structured reporting, and outcome prediction.
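The concordance indices reported for survival prediction are conventionally Harrell's C. A minimal sketch of the pairwise definition, with toy data rather than the study's:

```python
def concordance_index(times, events, risk):
    """Harrell's C-index: among comparable pairs (the earlier time is an
    observed event), the fraction where the higher-risk patient fails first.
    Ties in risk count as half-concordant."""
    concordant = comparable = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:   # i failed before j
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data (hypothetical): higher risk scores fail earlier.
times = [2, 5, 8, 12]
events = [1, 1, 0, 1]          # 0 = censored
risk = [0.9, 0.7, 0.3, 0.1]
print(concordance_index(times, events, risk))  # → 1.0, perfectly concordant
```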

Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models

Anita Kriz, Elizabeth Laura Janes, Xing Shen, Tal Arbel

arxiv logopreprintJul 12 2025
Multimodal large language models (MLLMs) hold considerable promise for applications in healthcare. However, their deployment in safety-critical settings is hindered by two key limitations: (i) sensitivity to prompt design, and (ii) a tendency to generate incorrect responses with high confidence. As clinicians may rely on a model's stated confidence to gauge the reliability of its predictions, it is especially important that when a model expresses high confidence, it is also highly accurate. We introduce Prompt4Trust, the first reinforcement learning (RL) framework for prompt augmentation targeting confidence calibration in MLLMs. A lightweight LLM is trained to produce context-aware auxiliary prompts that guide a downstream task MLLM to generate responses in which the expressed confidence more accurately reflects predictive accuracy. Unlike conventional calibration techniques, Prompt4Trust specifically prioritizes the aspects of calibration most critical for safe and trustworthy clinical decision-making. Beyond improvements driven by this clinically motivated calibration objective, our proposed method also improves task accuracy, achieving state-of-the-art medical visual question answering (VQA) performance on the PMC-VQA benchmark, which is composed of multiple-choice questions spanning diverse medical imaging modalities. Moreover, our framework trained with a small downstream task MLLM showed promising zero-shot generalization to larger MLLMs in our experiments, suggesting the potential for scalable calibration without the associated computational costs. This work demonstrates the potential of automated yet human-aligned prompt engineering for improving the trustworthiness of MLLMs in safety-critical settings. Our codebase can be found at https://github.com/xingbpshen/prompt4trust.
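Calibration of the kind Prompt4Trust targets is commonly summarized by expected calibration error (ECE): bin predictions by stated confidence and average the gap between confidence and accuracy. A minimal binned-ECE sketch (a generic metric, not the paper's reward function):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error: bin predictions by stated confidence and
    average the |confidence - accuracy| gap, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)   # clamp conf == 1.0
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(o for _, o in b) / len(b)
            ece += len(b) / len(confidences) * abs(avg_conf - acc)
    return ece

# Overconfident model: says 95% but is right only half the time.
print(expected_calibration_error([0.95, 0.95, 0.95, 0.95], [1, 1, 0, 0]))
# Well-calibrated model: stated 50% confidence, 50% accuracy → ECE 0.
print(expected_calibration_error([0.5, 0.5], [1, 0]))
```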

Incremental diagnostic value of AI-derived coronary artery calcium in 18F-flurpiridaz PET Myocardial Perfusion Imaging

Barrett, O., Shanbhag, A., Zaid, R., Miller, R. J., Lemley, M., Builoff, V., Liang, J., Kavanagh, P., Buckley, C., Dey, D., Berman, D. S., Slomka, P.

medrxiv logopreprintJul 11 2025
Background: Positron emission tomography (PET) myocardial perfusion imaging (MPI) is a powerful tool for predicting coronary artery disease (CAD). Coronary artery calcium (CAC) provides incremental risk stratification to PET-MPI and enhances diagnostic accuracy. We assessed the additive value of the CAC score, derived from PET/CT attenuation maps, over stress total perfusion deficit (TPD) for detecting significant CAD using the novel 18F-flurpiridaz tracer. Methods and Results: Patients from the 18F-flurpiridaz phase III clinical trial who underwent PET/CT MPI with the 18F-flurpiridaz tracer, had available CT attenuation correction (CTAC) scans for CAC scoring, and underwent invasive coronary angiography (ICA) within a 6-month period between 2011 and 2013 were included. TPD was quantified automatically, and CAC scores from CTAC scans were assessed using artificial intelligence (AI)-derived segmentation and manual scoring. Obstructive CAD was defined as ≥50% stenosis in the left main (LM) artery, or ≥70% stenosis in any of the other major epicardial vessels. Prediction performance for CAD was assessed by comparing the area under the receiver operating characteristic curve (AUC) for stress TPD alone and in combination with the CAC score. Among 498 patients (72% male, median age 63 years), 30.1% had CAD. Incorporating the CAC score resulted in a greater AUC: manual scoring (AUC=0.87, 95% confidence interval [CI] 0.34-0.90; p=0.015) and AI-based scoring (AUC=0.88, 95% CI 0.85-0.90; p=0.002) compared with stress TPD alone (AUC=0.84, 95% CI 0.80-0.92). Conclusions: Combining automatically derived TPD and CAC score enhances 18F-flurpiridaz PET MPI accuracy in detecting significant CAD, offering a method that can be used routinely with PET/CT scanners without additional scanning or technologist time.
Condensed Abstract. Background: We assessed the added value of the CAC score from hybrid PET/CT CTAC scans combined with stress TPD for detecting significant CAD using the novel 18F-flurpiridaz tracer. Methods and Results: Patients from the 18F-flurpiridaz phase III clinical trial (n=498, 72% male, median age 63) who underwent PET/CT MPI and ICA within 6 months were included. TPD was quantified automatically, and CAC scores were assessed by AI and manual methods. Adding the CAC score to TPD improved the AUC for manual (0.87) and AI-based (0.88) scoring versus TPD alone (0.84). Conclusions: Combining TPD and CAC score enhances 18F-flurpiridaz PET MPI accuracy for CAD detection. Graphical Abstract: Overview of the study design.

Advancing Rare Neurological Disorder Diagnosis: Addressing Challenges with Systematic Reviews and AI-Driven MRI Meta-Trans Learning Framework for NeuroDegenerative Disorders.

Gupta A, Malhotra D

pubmed logopapersJul 11 2025
Neurological Disorders (ND) affect a large portion of the global population, impacting the brain, spinal cord, and nerves. These disorders fall into categories such as NeuroDevelopmental (NDD), NeuroBiological (NBD), and NeuroDegenerative (ND<sub>e</sub>) disorders, which range from common to rare conditions. While Artificial Intelligence (AI) has advanced healthcare diagnostics, training Machine Learning (ML) and Deep Learning (DL) models for early detection of rare neurological disorders remains a challenge due to limited patient data. This data scarcity poses a significant public health issue. Meta_Trans Learning (M<sub>TA</sub>L), which integrates Meta-Learning (M<sub>t</sub>L) and Transfer Learning (TL), offers a promising solution by leveraging small datasets to extract expert patterns, generalize findings, and reduce AI bias in healthcare. This research systematically reviews studies from 2018 to 2024 to explore how ML and M<sub>TA</sub>L techniques are applied in diagnosing NDD, NBD, and ND<sub>e</sub> disorders. It also provides statistical and parametric analysis of ML and DL methods for neurological disorder diagnosis. Lastly, the study introduces an MRI-based ND<sub>e</sub>-M<sub>TA</sub>L framework to aid healthcare professionals in early detection of rare neurological disorders, aiming to enhance diagnostic accuracy and advance healthcare practices.

Performance of Radiomics and Deep Learning Models in Predicting Distant Metastases in Soft Tissue Sarcomas: A Systematic Review and Meta-analysis.

Mirghaderi P, Valizadeh P, Haseli S, Kim HS, Azhideh A, Nyflot MJ, Schaub SK, Chalian M

pubmed logopapersJul 11 2025
Predicting distant metastases in soft tissue sarcomas (STS) is vital for guiding clinical decision-making. Recent advancements in radiomics and deep learning (DL) models have shown promise, but their diagnostic accuracy remains unclear. This meta-analysis aims to assess the performance of radiomics and DL-based models in predicting metastases in STS by analyzing pooled sensitivity and specificity. Following PRISMA guidelines, a thorough search was conducted in PubMed, Web of Science, and Embase. A random-effects model was used to estimate the pooled area under the curve (AUC), sensitivity, and specificity. Subgroup analyses were performed based on imaging modality (MRI, PET, PET/CT), feature extraction method (DL radiomics [DLR] vs. handcrafted radiomics [HCR]), incorporation of clinical features, and dataset used. Heterogeneity was assessed with the I² statistic, robustness with leave-one-out sensitivity analyses, and publication bias with Egger's test. Nineteen studies involving 1712 patients were included. The pooled AUC for predicting metastasis was 0.88 (95% CI: 0.80-0.92). The pooled AUC values were 88% (95% CI: 77-89%) for MRI-based models, 80% (95% CI: 76-92%) for PET-based models, and 91% (95% CI: 78-93%) for PET/CT-based models, with no significant differences (p = 0.75). DL-based models showed significantly higher sensitivity than HCR models (p < 0.01). Including clinical features did not significantly improve model performance (AUC: 0.90 vs. 0.88, p = 0.99). Significant heterogeneity was noted (I² > 25%), and Egger's test suggested potential publication bias (p < 0.001). Radiomics models showed promising potential for predicting metastases in STS, with DL approaches outperforming traditional HCR.
While integrating this approach into routine clinical practice is still evolving, it can aid physicians in identifying high-risk patients and implementing targeted monitoring strategies to reduce the risk of severe complications associated with metastasis. However, challenges such as heterogeneity, limited external validation, and potential publication bias persist. Future research should concentrate on standardizing imaging protocols and conducting multi-center validation studies to improve the clinical applicability of radiomics predictive models.

HNOSeg-XS: Extremely Small Hartley Neural Operator for Efficient and Resolution-Robust 3D Image Segmentation.

Wong KCL, Wang H, Syeda-Mahmood T

pubmed logopapersJul 11 2025
In medical image segmentation, convolutional neural networks (CNNs) and transformers are dominant. For CNNs, given the local receptive fields of convolutional layers, long-range spatial correlations are captured through consecutive convolutions and pooling. However, as the computational cost and memory footprint can be prohibitively large, 3D models can only afford fewer layers than 2D models, with reduced receptive fields and abstract levels. For transformers, although long-range correlations can be captured by multi-head attention, its quadratic complexity with respect to input size is computationally demanding. Therefore, either model may require input size reduction to allow more filters and layers for better segmentation. Nevertheless, given their discrete nature, models trained on patches or downsampled images may produce suboptimal results when applied at higher resolutions. To address this issue, here we propose the resolution-robust HNOSeg-XS architecture. We model image segmentation by learnable partial differential equations through the Fourier neural operator, which has the zero-shot super-resolution property. By replacing the Fourier transform with the Hartley transform and reformulating the problem in the frequency domain, we created the HNOSeg-XS model, which is resolution robust, fast, memory efficient, and extremely parameter efficient. When tested on the BraTS'23, KiTS'23, and MVSeg'23 datasets with a Tesla V100 GPU, HNOSeg-XS showed superior resolution robustness with fewer than 34.7k model parameters. It also achieved the overall best inference time (< 0.24 s) and memory efficiency (< 1.8 GiB) compared to the tested CNN and transformer models.
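The Hartley transform at the heart of HNOSeg-XS is computable directly from the FFT as H = Re(F) - Im(F). A minimal NumPy sketch (not the authors' implementation) showing the two properties that make it attractive for frequency-domain weights: it is real-valued, and it is self-inverse up to a scale factor:

```python
import numpy as np

def dht(x):
    # Discrete Hartley transform via the FFT: H = Re(F) - Im(F).
    # Unlike the Fourier transform it is real-valued, which roughly halves
    # the memory needed for learnable frequency-domain weights.
    f = np.fft.fftn(x)
    return f.real - f.imag

x = np.random.rand(8, 8)
assert np.isrealobj(dht(x))
# The DHT is an involution up to a scale of N (the number of elements):
assert np.allclose(dht(dht(x)) / x.size, x)
```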

Ensemble of Weak Spectral Total Variation Learners: a PET-CT Case Study

Anna Rosenberg, John Kennedy, Zohar Keidar, Yehoshua Y. Zeevi, Guy Gilboa

arxiv logopreprintJul 11 2025
When solving computer vision problems through machine learning, one often encounters a lack of sufficient training data. To mitigate this we propose the use of ensembles of weak learners based on spectral total-variation (STV) features (Gilboa 2014). The features are related to nonlinear eigenfunctions of the total-variation subgradient and can characterize textures well at various scales. It was shown (Burger et al. 2016) that, in the one-dimensional case, orthogonal features are generated, whereas in two dimensions the features are empirically lowly correlated. Ensemble learning theory advocates the use of lowly correlated weak learners. We thus propose to design ensembles using learners based on STV features. To show the effectiveness of this paradigm we examine a hard real-world medical imaging problem: the predictive value of computed tomography (CT) data for high uptake in positron emission tomography (PET) in patients suspected of skeletal metastases. The database consists of 457 scans with 1524 unique pairs of registered CT and PET slices. Our approach is compared to deep-learning methods and to Radiomics features; STV learners perform best (AUC=0.87), compared to neural nets (AUC=0.75) and Radiomics (AUC=0.79). We observe that fine STV scales in CT images are especially indicative of high uptake in PET.
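The core ensemble argument — averaging many lowly correlated weak scorers beats any one of them — can be illustrated generically. The sketch below uses random linear scorers on synthetic data, not the paper's STV features; every name and number is illustrative:

```python
import numpy as np

def rank_auc(scores, labels):
    # Rank-based AUC (Mann-Whitney): probability a positive outranks a negative.
    ranks = scores.argsort().argsort()
    n1, n0 = int(labels.sum()), int((1 - labels).sum())
    return (ranks[labels == 1].sum() - n1 * (n1 - 1) / 2) / (n1 * n0)

rng = np.random.default_rng(0)
# Toy 2-class problem: class 1 is shifted by 0.5 in every feature.
X = np.vstack([rng.normal(0, 1, (200, 10)), rng.normal(0.5, 1, (200, 10))])
y = np.array([0] * 200 + [1] * 200)
signal = X[y == 1].mean(0) - X[y == 0].mean(0)

# Each "weak learner" is a noisy linear scorer, only weakly aligned with the
# signal, so individual scorers are poor and mutually lowly correlated.
weak_scores = [X @ (rng.normal(size=10) + 0.3 * signal) for _ in range(50)]
ensemble = np.mean(weak_scores, axis=0)   # averaging cancels the noise

print(np.mean([rank_auc(s, y) for s in weak_scores]))  # mediocre individuals
print(rank_auc(ensemble, y))                           # clearly higher
```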
