Latest Papers on Radiology AI. Tags: Benchmark SOTA

Explainable AI for Precision Oncology: A Task-Specific Approach Using Imaging, Multi-omics, and Clinical Data

Park, Y., Park, S., Bae, E.

•preprint•Jul 14 2025

Despite continued advances in oncology, cancer remains a leading cause of global mortality, highlighting the need for diagnostic and prognostic tools that are both accurate and interpretable. Unimodal approaches often fail to capture the biological and clinical complexity of tumors. In this study, we present a suite of task-specific AI models that leverage CT imaging, multi-omics profiles, and structured clinical data to address distinct challenges in segmentation, classification, and prognosis. We developed three independent models across large public datasets. Task 1 applied a 3D U-Net to segment pancreatic tumors from CT scans, achieving a Dice Similarity Coefficient (DSC) of 0.7062. Task 2 employed a hierarchical ensemble of omics-based classifiers to distinguish tumor from normal tissue and classify six major cancer types with 98.67% accuracy. Task 3 benchmarked classical machine learning models on clinical data for prognosis prediction across three cancers (LIHC, KIRC, STAD), achieving strong performance (e.g., C-index of 0.820 in KIRC, AUC of 0.978 in LIHC). Across all tasks, explainable AI methods such as SHAP and attention-based visualization enabled transparent interpretation of model outputs. These results demonstrate the value of modality-aware models tailored to each task and underscore their clinical potential for precision oncology.

Technical Foundations

Segmentation (Task 1): A custom 3D U-Net was trained on the Task07_Pancreas dataset from the Medical Segmentation Decathlon (MSD). CT images were preprocessed with MONAI-based pipelines, resampled to (64, 96, 96) voxels, and intensity-windowed to an HU range of -100 to 240.

Classification (Task 2): Multi-omics data from TCGA (gene expression, methylation, miRNA, CNV, and mutation profiles) were log-transformed and normalized. Five modality-specific LightGBM classifiers generated meta-features for a late-fusion ensemble. Stratified 5-fold cross-validation was used for evaluation.

Prognosis (Task 3): Clinical variables from TCGA were curated and imputed (median/mode), and columns with high missing rates were removed. Survival models (e.g., Cox-PH, Random Forest, XGBoost) were trained with early stopping. No omics or imaging data were used in this task.

Interpretability: SHAP values were computed for all tree-based models, and attention-based overlays were used in imaging tasks to visualize salient regions.
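Task 3 reports a C-index of 0.820 for KIRC. The abstract does not say which implementation was used. What follows is a minimal pure-Python sketch of Harrell's concordance index, not the authors' code. It assumes that higher risk scores mean earlier expected death. For simplicity it skips pairs with tied follow-up times, which full implementations handle more carefully.

```python
from itertools import combinations

def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  : observed follow-up times
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : model risk scores; higher means expected to fail sooner
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if times differ and the earlier subject had an event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half a concordant pair
    return concordant / comparable if comparable else float("nan")
```

A value of 0.5 corresponds to random ordering, and 1.0 corresponds to perfectly ranking who fails first.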

CT Segmentation Abdominal Methodology In Silico Academic Lab Benchmark SOTA

Deep Learning-Accelerated Prostate MRI: Improving Speed, Accuracy, and Sustainability.

Reschke P, Koch V, Gruenewald LD, Bachir AA, Gotta J, Booz C, Alrahmoun MA, Strecker R, Nickel D, D'Angelo T, Dahm DM, Konrad P, Solim LA, Holzer M, Al-Saleh S, Scholtz JE, Sommer CM, Hammerstingl RM, Eichler K, Vogl TJ, Leistner DM, Haberkorn SM, Mahmoudi S

•papers•Jul 14 2025

This study aims to evaluate the effectiveness of a deep learning (DL)-enhanced four-fold parallel acquisition technique (P4) in improving prostate MR image quality while optimizing scan efficiency, compared with the traditional two-fold parallel acquisition technique (P2). Patients undergoing prostate MRI with DL-enhanced acquisitions between January 2024 and July 2024 were analyzed. The participants prospectively received T2-weighted sequences in all imaging planes using both P2 and P4. Three independent readers assessed image quality, signal-to-noise ratio (SNR), and contrast-to-noise ratio (CNR). Radiomics analysis identified significant differences in contrast and gray-level properties between P2 and P4 (p < .05). A total of 51 participants (mean age 69.4 ± 10.5 years) underwent P2 and P4 imaging. P4 demonstrated higher CNR and SNR values than P2 (p < .001). P4 was consistently rated superior to P2, with enhanced image quality and greater diagnostic precision across all evaluated categories (p < .001). Furthermore, radiomics analysis confirmed that P4 significantly altered structural and textural differentiation compared with P2. The P4 protocol reduced T2w scan times by 50.8%, from 11:48 min to 5:48 min (p < .001). In conclusion, P4 imaging enhances diagnostic quality and reduces scan times, improving workflow efficiency and potentially contributing to a more patient-centered and sustainable radiology practice.
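The 50.8% reduction can be checked from the two reported T2w scan times. This check assumes that 11:48 and 5:48 denote minutes:seconds:

```latex
\begin{aligned}
t_{P2} &= 11{:}48\ \text{min} = 11 \times 60 + 48 = 708\ \text{s} \\
t_{P4} &= 5{:}48\ \text{min} = 5 \times 60 + 48 = 348\ \text{s} \\
\text{reduction} &= \frac{t_{P2} - t_{P4}}{t_{P2}} = \frac{360}{708} \approx 0.508 = 50.8\%
\end{aligned}
```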

MRI Reconstruction Abdominal Prospective Clinical Pilot Benchmark SOTA

Deep-learning reconstruction for noise reduction in respiratory-triggered single-shot phase sensitive inversion recovery myocardial delayed enhancement cardiac magnetic resonance.

Tang M, Wang H, Wang S, Wali E, Gutbrod J, Singh A, Landeras L, Janich MA, Mor-Avi V, Patel AR, Patel H

•papers•Jul 14 2025

Phase-sensitive inversion recovery late gadolinium enhancement (LGE) improves tissue contrast; however, it is challenging to combine with free-breathing acquisition. Deep-learning (DL) algorithms have growing applications in cardiac magnetic resonance imaging (CMR) to improve image quality. We compared a novel free-breathing, respiratory-triggered, single-shot phase-sensitive LGE sequence (FB-PS), combined with a DL noise reduction reconstruction algorithm, against a conventional segmented phase-sensitive LGE sequence acquired during breath holding (BH-PS). Sixty-one adult subjects (29 male, age 51 ± 15 years) underwent clinical CMR (1.5 T) with both the FB-PS sequence and the conventional BH-PS sequence. DL noise reduction was incorporated into the image reconstruction pipeline. Qualitative metrics included image quality, artifact severity, and diagnostic confidence. Quantitative metrics included septal-blood border sharpness, LGE sharpness, blood-myocardium apparent contrast-to-noise ratio (CNR), LGE-myocardium CNR, LGE apparent signal-to-noise ratio (SNR), and LGE burden. The sequences were compared via paired t-tests. Twenty-seven subjects had positive LGE. Average time to acquire a slice was 4-12 s for FB-PS versus ~32-38 s for BH-PS (including breath-hold instructions and breaks between breath holds). FB-PS with medium DL noise reduction had better image quality (FB-PS 3.0 ± 0.7 vs. BH-PS 1.5 ± 0.6, p < 0.0001), less artifact (4.8 ± 0.5 vs. 3.4 ± 1.1, p < 0.0001), and higher diagnostic confidence (4.0 ± 0.6 vs. 2.6 ± 0.8, p < 0.0001). Septal sharpness did not differ significantly between FB-PS with DL reconstruction and BH-PS, and neither did LGE sharpness or LGE burden. FB-PS had superior blood-myocardium CNR (17.2 ± 6.9 vs. 16.4 ± 6.0, p = 0.040) and LGE SNR (59.8 ± 26.8 vs. 31.2 ± 24.1, p < 0.001), with a non-significant trend toward higher LGE-myocardium CNR (12.1 ± 7.2 vs. 10.4 ± 6.6, p = 0.054); these metrics further improved with DL noise reduction.
The FB-PS sequence shortens scan time by more than 5-fold and reduces motion artifact. Combined with a DL noise reduction algorithm, FB-PS provides better or similar image quality compared with BH-PS. This is a promising solution for patients who cannot hold their breath.
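The FB-PS versus BH-PS comparisons above rely on paired t-tests, in which each subject serves as their own control. Below is a minimal sketch of the paired t statistic. The function name and the example inputs are illustrative and are not data from the study. For a p-value, `scipy.stats.ttest_rel(a, b)` computes the same statistic together with a two-sided p-value.

```python
import math

def paired_t(a, b):
    """Paired t statistic for two matched measurement lists.

    Returns (t, df). Index i in `a` and `b` must refer to the same subject
    scanned with both sequences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    d = [x - y for x, y in zip(a, b)]                      # per-subject differences
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)    # sample variance
    se = math.sqrt(var_d / n)                              # standard error of mean difference
    if se == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_d / se, n - 1

# Hypothetical scores for three subjects, not values from the study.
t, df = paired_t([3, 4, 5], [1, 2, 2])
```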

MRI Reconstruction Cardiac Retrospective Clinical Clinical Pilot Academic Lab Benchmark SOTA

Comparing large language models and text embedding models for automated classification of textual, semantic, and critical changes in radiology reports.

Lindholz M, Burdenski A, Ruppel R, Schulze-Weddige S, Baumgärtner GL, Schobert I, Haack AM, Eminovic S, Milnik A, Hamm CA, Frisch A, Penzkofer T

•papers•Jul 14 2025

Radiology reports can change during workflows, especially when residents draft preliminary versions that attending physicians finalize. We explored how large language models (LLMs) and embedding techniques can categorize these changes into textual, semantic, or clinically actionable types. We evaluated 400 adult CT reports drafted by residents against the versions finalized by attending physicians. Changes were rated on a five-point scale ranging from no changes to critical changes. We examined open-source LLMs alongside traditional metrics such as normalized word differences, Levenshtein and Jaccard similarity, and text embedding similarity. Model performance was assessed using quadratic weighted Cohen's kappa (κ), (balanced) accuracy, F1, precision, and recall. Inter-rater reliability among evaluators was excellent (κ = 0.990). Of the reports analyzed, 1.3% contained critical changes. The tested methods showed significant performance differences (P < 0.001). The Qwen3-235B-A22B model with a zero-shot prompt aligned most closely with human assessments of changes in clinical reports, achieving a κ of 0.822 (SD 0.031). The best conventional metric, word difference, had a κ of 0.732 (SD 0.048). The difference between the two was statistically significant in unadjusted post-hoc tests (P = 0.038) but not after adjustment for multiple testing (P = 0.064). Embedding models underperformed both LLMs and classical methods, with the differences reaching statistical significance in most cases. Large language models such as Qwen3-235B-A22B demonstrated moderate to strong alignment with expert evaluations of the clinical significance of changes in radiology reports. LLMs outperformed embedding methods and traditional string- and word-based approaches, with statistically significant differences in most instances. This demonstrates their potential as tools to support peer review.
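Two of the conventional baselines (Levenshtein and Jaccard similarity) and the evaluation metric (quadratic weighted Cohen's kappa) are simple enough to sketch directly. Three choices here are assumptions:

- word-level tokenization for Jaccard;
- mapping the five-point scale to labels 0-4;
- omitting the study's normalized word difference, since its exact definition is not described.

`sklearn.metrics.cohen_kappa_score(y1, y2, weights="quadratic")` is an off-the-shelf equivalent for the kappa.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]

def jaccard_words(a, b):
    """Jaccard similarity of the two texts' lower-cased word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0

def quadratic_weighted_kappa(y1, y2, n_classes):
    """Cohen's kappa with quadratic disagreement weights for labels 0..n_classes-1."""
    n = len(y1)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(y1, y2):
        observed[a][b] += 1
    hist1 = [sum(row) for row in observed]                       # rater 1 marginals
    hist2 = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2              # quadratic weight
            num += w * observed[i][j]
            den += w * hist1[i] * hist2[j] / n                   # chance-expected count
    return 1.0 - num / den if den else float("nan")
```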

CT Classification Retrospective Clinical In Silico Academic Lab Benchmark SOTA

A Lightweight and Robust Framework for Real-Time Colorectal Polyp Detection Using LOF-Based Preprocessing and YOLO-v11n

Saadat Behzadi, Danial Sharifrazi, Bita Mesbahzadeh, Javad Hassannataj Joloudari, Roohallah Alizadehsani

•preprint•Jul 14 2025

Objectives: Timely and accurate detection of colorectal polyps plays a crucial role in diagnosing and preventing colorectal cancer, a major cause of mortality worldwide. This study introduces a new, lightweight, and efficient framework for polyp detection that combines the Local Outlier Factor (LOF) algorithm for filtering noisy data with the YOLO-v11n deep learning model. Study design: An experimental study leveraging deep learning and outlier removal techniques across multiple public datasets. Methods: The proposed approach was tested on five diverse, publicly available datasets: CVC-ColonDB, CVC-ClinicDB, Kvasir-SEG, ETIS, and EndoScene. Since these datasets originally lacked bounding box annotations, we converted their segmentation masks into suitable detection labels. To enhance the robustness and generalizability of our model, we applied 5-fold cross-validation and removed anomalous samples using the LOF method configured with 30 neighbors and a contamination ratio of 5%. The cleaned data were then fed into YOLO-v11n, a fast and resource-efficient object detection architecture optimized for real-time applications. We trained the model using a combination of modern augmentation strategies to improve detection accuracy under diverse conditions. Results: Our approach significantly improves polyp localization performance, achieving a precision of 95.83%, recall of 91.85%, F1-score of 93.48%, mAP@0.5 of 96.48%, and mAP@0.5:0.95 of 77.75%. Compared with previous YOLO-based methods, our model demonstrates enhanced accuracy and efficiency. Conclusions: These results suggest that the proposed method is well suited for real-time colonoscopy support in clinical settings. Overall, the study underscores how crucial data preprocessing and model efficiency are when designing effective AI systems for medical imaging.
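The LOF filtering step (30 neighbors, 5% contamination) maps naturally onto scikit-learn's `LocalOutlierFactor(n_neighbors=30, contamination=0.05)`, whose `fit_predict` labels outliers as -1. The abstract does not say which feature space LOF operated in (raw pixels, image embeddings, or box statistics). The dependency-free sketch below therefore takes generic feature vectors. It illustrates the LOF idea; it is not the authors' pipeline, and the function names are hypothetical.

```python
import math

def lof_scores(points, k):
    """Local Outlier Factor per point; scores well above 1 suggest outliers."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and len(points) - 1")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself
    neighbours = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
                  for i in range(n)]
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]   # distance to k-th neighbour

    def lrd(i):
        # local reachability density: inverse of the mean reachability distance
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        return 1.0 / max(sum(reach) / k, 1e-12)               # guard against duplicates

    dens = [lrd(i) for i in range(n)]
    return [sum(dens[j] for j in neighbours[i]) / (k * dens[i]) for i in range(n)]

def lof_filter(points, k=30, contamination=0.05):
    """Drop the `contamination` fraction of points with the highest LOF (needs len > k)."""
    scores = lof_scores(points, k)
    n_out = int(round(contamination * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_out])
    return [p for i, p in enumerate(points) if i not in flagged], sorted(flagged)
```

Ranking by score and cutting at a fixed fraction approximates how a contamination ratio is applied. scikit-learn instead derives a score threshold, so borderline samples may differ.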

OCT Detection Abdominal Methodology In Silico Academic Lab Benchmark SOTA

STF: A Spherical Transformer for Versatile Cortical Surfaces Applications.

Cheng J, Zhao F, Wu Z, Yuan X, Wang L, Gilmore JH, Lin W, Zhang X, Li G

•papers•Jul 14 2025

Inspired by the remarkable success of attention mechanisms in various applications, there is a growing need to adapt the Transformer architecture from conventional Euclidean domains to non-Euclidean spaces commonly encountered in medical imaging. Structures such as brain cortical surfaces, represented by triangular meshes, exhibit spherical topology and present unique challenges. To address this, we propose the Spherical Transformer (STF), a versatile backbone that leverages self-attention for analyzing cortical surface data. Our approach involves mapping cortical surfaces onto a sphere, dividing them into overlapping patches, and tokenizing both patches and vertices. By performing self-attention at patch and vertex levels, the model simultaneously captures global dependencies and preserves fine-grained contextual information within each patch. Overlapping regions between neighboring patches naturally enable efficient cross-patch information sharing. To handle longitudinal cortical surface data, we introduce the spatiotemporal self-attention mechanism, which jointly captures spatial context and temporal developmental patterns within a single layer. This innovation enhances the representational power of the model, making it well-suited for dynamic surface data. We evaluate the Spherical Transformer on key tasks, including cognition prediction at the surface level and two vertex-level tasks: cortical surface parcellation and cortical property map prediction. Across these applications, our model consistently outperforms state-of-the-art methods, demonstrating its ability to effectively model global dependencies and preserve detailed spatial information. The results highlight its potential as a general-purpose framework for cortical surface analysis.
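The idea of attending jointly over space and time in a single layer can be illustrated with a toy sketch. Tokens from all time points and patches are flattened into one sequence before a single scaled dot-product attention pass. This is not STF's implementation. It omits learned query/key/value projections, multiple heads, overlapping spherical patches, and vertex-level tokens. The identity projections are an assumption made for brevity.

```python
import math

def softmax(xs):
    m = max(xs)                                  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def joint_spatiotemporal_attention(tokens):
    """tokens[t][p] is a feature vector for patch p at time point t.

    All T*P tokens are flattened into one sequence, so a single attention
    pass lets every patch attend to every patch at every time point.
    Identity query/key/value projections are used (toy assumption).
    """
    T, P = len(tokens), len(tokens[0])
    seq = [tokens[t][p] for t in range(T) for p in range(P)]   # flatten (time, patch)
    d = len(seq[0])
    out, weights = [], []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(w[j] * seq[j][c] for j in range(len(seq))) for c in range(d)])
    # reshape the flat output back to [T][P][d]
    return [out[t * P:(t + 1) * P] for t in range(T)], weights
```

A factorized design would instead alternate spatial-only and temporal-only attention. The joint version is costlier, at (T*P)^2 scores per layer, but it mixes both axes at once.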

MRI Classification Neurological Methodology In Silico Benchmark SOTA

Predicting the molecular subtypes of 2021 WHO grade 4 glioma by a multiparametric MRI-based machine learning model.

Xu W, Li Y, Zhang J, Zhang Z, Shen P, Wang X, Yang G, Du J, Zhang H, Tan Y

•papers•Jul 14 2025

Accurately distinguishing the different molecular subtypes of 2021 World Health Organization (WHO) grade 4 Central Nervous System (CNS) gliomas is highly relevant for prognostic stratification and personalized treatment. This study aimed to develop and validate a machine learning (ML) model using multiparametric MRI for two preoperative tasks. Task 1 was to differentiate astrocytoma, CNS WHO grade 4, from glioblastoma (GBM), isocitrate dehydrogenase-wild-type (IDH-wt) (WHO 2021) (astrocytoma grade 4 vs. GBM). Task 2 was to stratify astrocytoma, CNS WHO grade 4, by distinguishing astrocytoma, IDH-mutant (IDH-mut), CNS WHO grade 4, from astrocytoma, IDH-wt, CNS WHO grade 4 (IDH-mut grade 4 vs. IDH-wt grade 4). The study also aimed to evaluate the model's prognostic value. We retrospectively analyzed 320 glioma patients from three hospitals (split 7:3 into training and testing sets) and 99 patients from The Cancer Genome Atlas (TCGA) database for external validation. Radiomic features were extracted from tumor and edema on contrast-enhanced T1-weighted imaging (CE-T1WI) and T2 fluid-attenuated inversion recovery (T2-FLAIR). Extreme gradient boosting (XGBoost) was used to construct the ML, clinical, and combined models. Model performance was evaluated with receiver operating characteristic (ROC) curves, decision curves, and calibration curves. Stability was evaluated using six additional classifiers. Kaplan-Meier (KM) survival analysis and the log-rank test assessed the model's prognostic value. In both tasks, the combined model (AUC = 0.907, 0.852, and 0.830 for Task 1; AUC = 0.899, 0.895, and 0.792 for Task 2) and the optimal ML model (AUC = 0.902, 0.854, and 0.832 for Task 1; AUC = 0.904, 0.899, and 0.783 for Task 2) significantly outperformed the clinical model (AUC = 0.671, 0.656, and 0.543 for Task 1; AUC = 0.619, 0.605, and 0.400 for Task 2) in the training, testing, and validation sets, respectively.
Survival analysis showed that the combined model performed similarly to molecular subtype in both tasks (p = 0.964 and p = 0.746). The multiparametric MRI ML model effectively distinguished astrocytoma, CNS WHO grade 4, from GBM, IDH-wt (WHO 2021), and differentiated astrocytoma, IDH-mut, from astrocytoma, IDH-wt, CNS WHO grade 4. Additionally, the model provided reliable survival stratification for glioma patients across different molecular subtypes.
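The prognostic analysis relies on Kaplan-Meier curves compared with the log-rank test. Below is a minimal sketch of the Kaplan-Meier estimator; the log-rank test is not shown. We assume the estimator would be computed separately for each predicted subtype group; that is a guess about the workflow, not a detail from the abstract.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    Returns a list of (time, S(time)) steps at each distinct event time.
    events[i] is 1 for an observed event and 0 for a censored follow-up.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # group all subjects that share this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk   # conditional survival past time t
            curve.append((t, survival))
        at_risk -= removed                       # events and censorings leave the risk set
    return curve
```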

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Multimodal Deep Learning Model Based on Ultrasound and Cytological Images Predicts Risk Stratification of cN0 Papillary Thyroid Carcinoma.

He F, Chen S, Liu X, Yang X, Qin X

•papers•Jul 14 2025

Accurately assessing the risk stratification of cN0 papillary thyroid carcinoma (PTC) preoperatively aids in making treatment decisions. We integrated preoperative ultrasound and cytological images of patients to develop and validate a multimodal deep learning (DL) model for non-invasive assessment of N0 PTC risk stratification before surgery. In this retrospective multicenter group study, we developed a comprehensive DL model based on ultrasound and cytological images. The model was trained and validated on 890 PTC patients undergoing thyroidectomy and lymph node dissection across five medical centers. The testing group included 107 patients from one medical center. We analyzed the model's performance, including the area under the receiver operating characteristic curve, accuracy, sensitivity, and specificity. The combined DL model demonstrated strong performance, with an area under the curve (AUC) of 0.922 (0.866-0.979) in the internal validation group and an AUC of 0.845 (0.794-0.895) in the testing group. The diagnostic performance of the combined DL model surpassed that of clinical models. Image region heatmaps assisted in interpreting the diagnosis of risk stratification. The multimodal DL model based on ultrasound and cytological images can accurately determine the risk stratification of N0 PTC and guide treatment decisions.
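The AUCs reported above summarize how well the model ranks higher-risk patients above lower-risk ones. Below is a minimal sketch of the rank (Mann-Whitney) form of ROC AUC. It returns only the point estimate; the abstract does not say how the reported intervals were derived. `sklearn.metrics.roc_auc_score` gives the same value.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via its rank interpretation.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

The pairwise loop is O(positives × negatives). A sort-based rank computation scales better for large cohorts, but it yields the same number.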

Ultrasound Classification Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

An improved U-NET3+ with transformer and adaptive attention map for lung segmentation.

Joseph Raj V, Christopher P

•papers•Jul 13 2025

Accurate segmentation of lung regions from CT scan images is critical for diagnosing and monitoring respiratory diseases. This study introduces a novel hybrid architecture, Adaptive Attention U-NetAA, which combines the strengths of U-Net3+ and Transformer-based attention mechanisms for high-precision lung segmentation. The U-Net3+ module segments the lung region by leveraging a deep convolutional network with nested skip connections, ensuring rich multi-scale feature extraction. A key innovation is an adaptive attention mechanism within the Transformer module, which dynamically adjusts the focus on critical regions in the image based on local and global contextual relationships. This mechanism addresses variations in lung morphology, image artifacts, and low-contrast regions, leading to improved segmentation accuracy. The combined convolutional and attention-based architecture enhances robustness and precision. Experimental results on benchmark CT datasets demonstrate that the proposed model achieves an IoU of 0.984, a Dice coefficient of 0.989, an MIoU of 0.972, and an HD95 of 1.22 mm, surpassing state-of-the-art methods. These results establish U-NetAA as a superior tool for clinical lung segmentation, with enhanced accuracy, sensitivity, and generalization capability.
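IoU and Dice measure the same overlap on different scales. For a single mask pair, Dice = 2 * IoU / (1 + IoU). Below is a minimal sketch for 2D binary masks stored as nested lists; a 3D volume would simply be flattened one more level. HD95 is omitted because it requires boundary extraction.

```python
def dice_and_iou(pred, truth):
    """Dice coefficient and IoU for two equal-shape 2D binary masks (nested lists of 0/1)."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in truth for v in row]
    if len(flat_p) != len(flat_t):
        raise ValueError("masks must have the same shape")
    inter = sum(1 for p, t in zip(flat_p, flat_t) if p and t)   # overlapping foreground
    size_p = sum(1 for p in flat_p if p)
    size_t = sum(1 for t in flat_t if t)
    if size_p + size_t == 0:
        return 1.0, 1.0          # both masks empty: treated as perfect agreement (a convention)
    union = size_p + size_t - inter
    return 2 * inter / (size_p + size_t), inter / union
```

Papers usually average per-image scores. As a result, reported mean Dice and mean IoU need not satisfy the single-pair identity exactly.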

CT Segmentation Chest Methodology In Silico Benchmark SOTA

Early breast cancer detection via infrared thermography using a CNN enhanced with particle swarm optimization.

Alzahrani RM, Sikkandar MY, Begum SS, Babetat AFS, Alhashim M, Alduraywish A, Prakash NB, Ng EYK

•papers•Jul 13 2025

Breast cancer remains the leading cause of cancer-related mortality among women worldwide, with an estimated incidence exceeding 500,000 new cases annually. Timely diagnosis is vital for improving therapeutic outcomes and survival. Although conventional diagnostic tools such as mammography are widely used and generally effective, they are often invasive, costly, and less effective in patients with dense breast tissue. Infrared thermography, by contrast, offers a non-invasive and economical alternative. However, its clinical adoption has been limited, largely because of difficulties in accurately interpreting thermal images and suboptimal tuning of machine learning algorithms. To overcome these limitations, this study proposes an automated classification framework that employs convolutional neural networks (CNNs) to distinguish between malignant and benign thermographic breast images. An Enhanced Particle Swarm Optimization (EPSO) algorithm is integrated to automatically fine-tune CNN hyperparameters, thereby minimizing manual effort and enhancing computational efficiency. The methodology also incorporates advanced image preprocessing techniques to bolster classification performance. These include Mamdani fuzzy logic-based edge detection, Contrast-Limited Adaptive Histogram Equalization (CLAHE) for contrast enhancement, and median filtering for noise suppression. The proposed model achieves a classification accuracy of 98.8%, significantly outperforming conventional CNN implementations in both computational speed and predictive accuracy. These findings suggest that the developed system holds substantial potential for early, reliable, and cost-effective breast cancer screening in real-world clinical environments.
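The abstract does not describe the EPSO modifications. The sketch below is therefore standard global-best particle swarm optimization, not the authors' enhanced variant, and the function name is hypothetical. In the paper's setting, the objective `f` would presumably wrap a CNN training run that returns a validation loss. Integer hyperparameters such as filter counts would need rounding. Both points are assumptions.

```python
import random

def pso_minimize(f, bounds, n_particles=30, n_iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best particle swarm optimization (not the paper's EPSO).

    f      : objective to minimize over a position vector (hypothetical stand-in
             for "validation loss of a CNN trained with these hyperparameters")
    bounds : list of (low, high) pairs, one per hyperparameter dimension
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                    # each particle's personal best
    best_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]        # swarm-wide best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])   # pull to personal best
                             + c2 * r2 * (g_pos[d] - pos[i][d]))        # pull to global best
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)     # clamp to bounds
            val = f(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

Each objective evaluation is a full training run in practice. Swarm size and iteration count therefore dominate cost, which is presumably what an "enhanced" PSO variant tries to reduce.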

OCT Classification Breast Methodology In Silico Academic Lab Benchmark SOTA
