KonfAI: A Modular and Fully Configurable Framework for Deep Learning in Medical Imaging

Valentin Boussot, Jean-Louis Dillenseger

arXiv preprint · Aug 13, 2025
KonfAI is a modular, extensible, and fully configurable deep learning framework specifically designed for medical imaging tasks. It enables users to define complete training, inference, and evaluation workflows through structured YAML configuration files, without modifying the underlying code. This declarative approach enhances reproducibility, transparency, and experimental traceability while reducing development time. Beyond the capabilities of standard pipelines, KonfAI provides native abstractions for advanced strategies including patch-based learning, test-time augmentation, model ensembling, and direct access to intermediate feature representations for deep supervision. It also supports complex multi-model training setups such as generative adversarial architectures. Thanks to its modular and extensible architecture, KonfAI can easily accommodate custom models, loss functions, and data processing components. The framework has been successfully applied to segmentation, registration, and image synthesis tasks, and has contributed to top-ranking results in several international medical imaging challenges. KonfAI is open source and available at https://github.com/vboussot/KonfAI.
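
A minimal sketch of the declarative idea behind such a framework: a YAML document names components, and a small registry maps those names to constructors. The configuration keys and registry entries below are invented for illustration and do not reflect KonfAI's actual schema.

# Hypothetical illustration of a declarative, YAML-driven training setup.
# Keys and registry names are invented; strings stand in for real objects.
import yaml  # pip install pyyaml

CONFIG = """
task: segmentation
model:
  name: unet3d
  channels: [32, 64, 128]
training:
  epochs: 100
  loss: dice
  patch_size: [96, 96, 96]
"""

MODEL_REGISTRY = {"unet3d": lambda channels: f"UNet3D(channels={channels})"}
LOSS_REGISTRY = {"dice": lambda: "DiceLoss()"}

def build_workflow(cfg_text: str) -> dict:
    """Turn a YAML description into concrete components via registries."""
    cfg = yaml.safe_load(cfg_text)
    model = MODEL_REGISTRY[cfg["model"]["name"]](cfg["model"]["channels"])
    loss = LOSS_REGISTRY[cfg["training"]["loss"]]()
    return {"model": model, "loss": loss, "epochs": cfg["training"]["epochs"]}

print(build_workflow(CONFIG))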

A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation

Haibo Jin, Haoxuan Che, Sunan He, Hao Chen

arXiv preprint · Aug 13, 2025
Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) clinical efficacy remains unsatisfactory, especially for describing lesion attributes; 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address these challenges, we focus on a trustworthy RRG model, which not only generates accurate descriptions of abnormalities but also provides the basis for its predictions. To this end, we propose a framework named chain of diagnosis (CoD), which maintains a chain of diagnostic steps for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with the QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses with generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving radiologists' working efficiency. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) an evaluation tool for assessing the accuracy of reports in describing lesion location and severity; 3) extensive experiments demonstrating the effectiveness of CoD, which consistently outperforms both specialist and generalist models on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.
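
As a rough, hedged illustration of the grounding idea (not CoD's learned module), each generated report sentence can be matched to its nearest QA diagnosis with a simple similarity measure; the sentences and diagnoses below are made up.

# Toy "diagnosis grounding": match each generated sentence to its closest
# QA diagnosis by TF-IDF cosine similarity. The real module is learned end
# to end; this only illustrates the matching step.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

qa_diagnoses = ["small left pleural effusion", "no focal consolidation"]
report_sentences = ["There is a small effusion at the left base.",
                    "The lungs are clear without consolidation."]

vectorizer = TfidfVectorizer().fit(qa_diagnoses + report_sentences)
sim = cosine_similarity(vectorizer.transform(report_sentences),
                        vectorizer.transform(qa_diagnoses))

for sentence, row in zip(report_sentences, sim):
    best = row.argmax()
    print(f"{sentence!r} -> grounded to {qa_diagnoses[best]!r} (sim={row[best]:.2f})")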

A stacking ensemble framework integrating radiomics and deep learning for prognostic prediction in head and neck cancer.

Wang B, Liu J, Zhang X, Lin J, Li S, Wang Z, Cao Z, Wen D, Liu T, Ramli HRH, Harith HH, Hasan WZW, Dong X

PubMed paper · Aug 13, 2025
Radiomics models frequently face challenges related to reproducibility and robustness. To address these issues, we propose a multimodal, multi-model fusion framework utilizing stacking ensemble learning for prognostic prediction in head and neck cancer (HNC). This approach seeks to improve the accuracy and reliability of survival predictions. A total of 806 cases from nine centers were collected; 143 cases from two centers were assigned as the external validation cohort, while the remaining 663 were stratified and randomly split into training (n = 530) and internal validation (n = 133) sets. Radiomics features were extracted according to IBSI standards, and deep learning features were obtained using a 3D DenseNet-121 model. Following feature selection, the selected features were input into Cox, SVM, RSF, DeepCox, and DeepSurv models. A stacking fusion strategy was employed to develop the prognostic model. Model performance was evaluated using Kaplan-Meier survival curves and time-dependent ROC curves. On the external validation set, the model using combined PET and CT radiomics features achieved superior performance compared to single-modality models, with the RSF model obtaining the highest concordance index (C-index) of 0.7302. When using deep features extracted by 3D DenseNet-121, the PET + CT-based models demonstrated significantly improved prognostic accuracy, with DeepSurv and DeepCox achieving C-indices of 0.9217 and 0.9208, respectively. In stacking models, the PET + CT model using only radiomics features reached a C-index of 0.7324, while the deep feature-based stacking model achieved 0.9319. The best performance was obtained by the multi-feature fusion model, which integrated both radiomics and deep learning features from PET and CT, yielding a C-index of 0.9345. Kaplan-Meier survival analysis further confirmed the fusion model's ability to distinguish between high-risk and low-risk groups. The stacking-based ensemble model demonstrates superior performance compared to individual machine learning models, markedly improving the robustness of prognostic predictions.
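
For context on the headline metric, the concordance index (C-index) scores how often a model ranks pairs of patients' survival in the correct order. A hedged sketch using lifelines, with synthetic data and a naive averaged stand-in for the stacked score (the study's actual meta-learner combines Cox, SVM, RSF, DeepCox, and DeepSurv outputs):

# Sketch of C-index evaluation for risk scores from two base models and a
# simple averaged "stacked" score. All data and scores here are synthetic.
import numpy as np
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 100
times = rng.exponential(scale=24.0, size=n)           # follow-up time (months)
events = rng.integers(0, 2, size=n)                   # 1 = event observed
risk_pet = rng.normal(size=n)                         # base model 1 risk score
risk_ct = rng.normal(size=n)                          # base model 2 risk score
risk_stacked = 0.5 * risk_pet + 0.5 * risk_ct         # naive stand-in for stacking

for name, risk in [("PET", risk_pet), ("CT", risk_ct), ("stacked", risk_stacked)]:
    # lifelines expects higher scores to mean longer survival, so negate risk.
    print(name, concordance_index(times, -risk, events))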

ES-UNet: efficient 3D medical image segmentation with enhanced skip connections in 3D UNet.

Park M, Oh S, Park J, Jeong T, Yu S

PubMed paper · Aug 13, 2025
Deep learning has significantly advanced medical image analysis, particularly in semantic segmentation, which is essential for clinical decisions. However, existing 3D segmentation models, like the traditional 3D UNet, face challenges in balancing computational efficiency and accuracy when processing volumetric medical data. This study aims to develop an improved architecture for 3D medical image segmentation with enhanced learning strategies to improve accuracy and address challenges related to limited training data. We propose ES-UNet, a 3D segmentation architecture that achieves superior segmentation performance while offering competitive efficiency across multiple computational metrics, including memory usage, inference time, and parameter count. The model builds upon the full-scale skip connection design of UNet3+ by integrating channel attention modules into each encoder-to-decoder path and incorporating full-scale deep supervision to enhance multi-resolution feature learning. We further introduce Region Specific Scaling (RSS), a data augmentation method that adaptively applies geometric transformations to annotated regions, and a Dynamically Weighted Dice (DWD) loss to improve the balance between precision and recall. The model was evaluated on the MICCAI HECKTOR dataset, and additional validation was conducted on selected tasks from the Medical Segmentation Decathlon (MSD). On the HECKTOR dataset, ES-UNet achieved a Dice Similarity Coefficient (DSC) of 76.87%, outperforming baseline models including 3D UNet, 3D UNet 3+, nnUNet, and Swin UNETR. Ablation studies showed that RSS and DWD contributed up to 1.22% and 1.06% improvement in DSC, respectively. A sensitivity analysis demonstrated that the chosen scaling range in RSS offered a favorable trade-off between deformation and anatomical plausibility. Cross-dataset evaluation on MSD Heart and Spleen tasks also indicated strong generalization. Computational analysis revealed that ES-UNet achieves superior segmentation performance with moderate computational demands. Specifically, the enhanced skip connection design with lightweight channel attention modules integrated throughout the network architecture enables this favorable balance between high segmentation accuracy and computational efficiency. ES-UNet integrates architectural and algorithmic improvements to achieve robust 3D medical image segmentation. While the framework incorporates established components, its core contributions lie in the optimized skip connection strategy and supporting techniques like RSS and DWD. Future work will explore adaptive scaling strategies and broader validation across diverse imaging modalities.
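
The abstract does not give the exact Dynamically Weighted Dice (DWD) formulation; as an illustration only, a Tversky-style weighted soft-Dice loss in PyTorch exposes the precision/recall trade-off through two weights (fp_weight and fn_weight are hypothetical parameter names, not the paper's):

# Illustrative weighted soft-Dice loss. With fp_weight = fn_weight = 0.5 this
# reduces to the standard soft Dice; shifting the weights trades off false
# positives against false negatives. ES-UNet's actual DWD loss may differ.
import torch

def weighted_dice_loss(logits, target, fp_weight=0.5, fn_weight=0.5, eps=1e-6):
    """logits, target: (N, 1, D, H, W); target is a binary mask."""
    prob = torch.sigmoid(logits)
    tp = (prob * target).sum(dim=(1, 2, 3, 4))
    fp = (prob * (1 - target)).sum(dim=(1, 2, 3, 4))
    fn = ((1 - prob) * target).sum(dim=(1, 2, 3, 4))
    score = (tp + eps) / (tp + fp_weight * fp + fn_weight * fn + eps)
    return 1.0 - score.mean()

logits = torch.randn(2, 1, 16, 32, 32)
target = (torch.rand(2, 1, 16, 32, 32) > 0.7).float()
print(weighted_dice_loss(logits, target, fp_weight=0.3, fn_weight=0.7))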

An optimized multi-task contrastive learning framework for HIFU lesion detection and segmentation.

Zavar M, Ghaffari HR, Tabatabaee H

PubMed paper · Aug 13, 2025
Accurate detection and segmentation of lesions induced by High-Intensity Focused Ultrasound (HIFU) in medical imaging remain significant challenges in automated disease diagnosis. Traditional methods heavily rely on labeled data, which is often scarce, expensive, and time-consuming to obtain. Moreover, existing approaches frequently struggle with variations in medical data and the limited availability of annotated datasets, leading to suboptimal performance. To address these challenges, this paper introduces an innovative framework called the Optimized Multi-Task Contrastive Learning Framework (OMCLF), which leverages self-supervised learning (SSL) and genetic algorithms (GA) to enhance HIFU lesion detection and segmentation. OMCLF integrates classification and segmentation into a unified model, utilizing a shared backbone to extract common features. The framework systematically optimizes feature representations, hyperparameters, and data augmentation strategies tailored for medical imaging, ensuring that critical information, such as lesion details, is preserved. By employing a genetic algorithm, OMCLF explores and optimizes augmentation techniques suitable for medical data, avoiding distortions that could compromise diagnostic accuracy. Experimental results demonstrate that OMCLF outperforms single-task methods in both classification and segmentation tasks while significantly reducing dependency on labeled data. Specifically, OMCLF achieves an accuracy of 93.3% in lesion detection and a Dice score of 92.5% in segmentation, surpassing state-of-the-art methods such as SimCLR and MoCo. The proposed approach achieves superior accuracy in identifying and delineating HIFU-induced lesions, marking a substantial advancement in medical image interpretation and automated diagnosis. OMCLF represents a significant step forward in the evolutionary optimization of self-supervised learning, with potential applications across various medical imaging domains.
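
A toy sketch of the genetic-algorithm component: individuals encode augmentation hyperparameters, and the fittest configurations are kept and mutated. The encoding, operators, and fitness function below are placeholders, not OMCLF's actual implementation:

# Toy genetic algorithm over augmentation hyperparameters. Each individual
# encodes (rotation_deg, crop_scale, blur_sigma); the fitness function is a
# stand-in for validation performance of the self-supervised model.
import random

random.seed(0)

def random_individual():
    return {"rotation_deg": random.uniform(0, 30),
            "crop_scale": random.uniform(0.6, 1.0),
            "blur_sigma": random.uniform(0.0, 2.0)}

def fitness(ind):
    # Placeholder: in practice, train/evaluate the SSL model with these
    # augmentations and return a validation metric (e.g., Dice or accuracy).
    return -abs(ind["rotation_deg"] - 10) - abs(ind["crop_scale"] - 0.8) - ind["blur_sigma"]

def mutate(ind):
    child = dict(ind)
    key = random.choice(list(child))
    child[key] *= random.uniform(0.8, 1.2)
    return child

population = [random_individual() for _ in range(20)]
for generation in range(10):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                                   # elitist selection
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

print(max(population, key=fitness))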

Brown adipose tissue machine learning nnU-Net V2 network using TriDFusion (3DF).

Lafontaine D, Chahwan S, Barraza G, Ucpinar BA, Kayal G, Gómez-Banoy N, Cohen P, Humm JL, Schöder H

PubMed paper · Aug 13, 2025
Recent advances in machine learning have revolutionized medical imaging. Currently, identifying brown adipose tissue (BAT) relies on manual identification and segmentation of Fluorine-18 fluorodeoxyglucose positron emission tomography/computed tomography (¹⁸F-FDG PET/CT) scans. However, the process is time-consuming, especially for studies involving a large number of cases, and is subject to bias due to observer dependency. The introduction of machine learning algorithms, such as the PET/CT algorithm implemented in the TriDFusion (3DF) Image Viewer, represents a significant advancement in BAT detection. In the context of cancer care, artificial intelligence (AI)-driven BAT detection holds immense promise for rapid and automatic differentiation between malignant lesions and non-malignant BAT confounds. By leveraging machine learning to discern intricate patterns in imaging data, this study aims to advance the automation of BAT recognition and provide precise quantitative assessment of radiographic features. We used a semi-automatic, threshold-based 3DF workflow to segment 317 PET/CT scans containing BAT. To minimize manual edits, we defined exclusion zones via machine-learning-based CT organ segmentation and used those organ masks to assign each volume of interest (VOI) to its anatomical site. Three physicians then reviewed and corrected all segmentations using the 3DF contour panel. The final, edited masks were used to train an nnU-Net V2 model, which we subsequently applied to 118 independent PET/CT scans. Across all anatomical sites, physicians judged the network's automated segmentations to be approximately 90% accurate. Although nnU-Net V2 effectively identified BAT from PET/CT scans, training an AI model capable of perfect BAT segmentation remains a challenge due to factors such as PET/CT misregistration and the absence of visible BAT activity across contiguous slices.
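
A hedged sketch of the threshold-based step in such a workflow: candidate BAT voxels are selected by combining a PET SUV threshold with the CT Hounsfield-unit range typical of adipose tissue. The cutoffs below are commonly cited literature values, not necessarily those used in this study:

# Toy threshold-based BAT candidate mask from co-registered PET SUV and CT
# HU volumes. SUV >= 1.5 and -190 <= HU <= -30 are commonly cited criteria;
# the study's actual 3DF workflow and cutoffs may differ.
import numpy as np

rng = np.random.default_rng(0)
suv = rng.exponential(scale=1.0, size=(64, 64, 64))        # synthetic PET SUV
hu = rng.normal(loc=-60, scale=120, size=(64, 64, 64))     # synthetic CT HU

candidate_bat = (suv >= 1.5) & (hu >= -190) & (hu <= -30)

# Exclusion zones (e.g., organ masks from a CT segmentation model) would be
# subtracted here before assigning each connected component to a site.
print("candidate voxels:", int(candidate_bat.sum()))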

MedAtlas: Evaluating LLMs for Multi-Round, Multi-Task Medical Reasoning Across Diverse Imaging Modalities and Clinical Text

Ronghao Xu, Zhen Huang, Yangbo Wei, Xiaoqian Zhou, Zikang Xu, Ting Liu, Zihang Jiang, S. Kevin Zhou

arXiv preprint · Aug 13, 2025
Artificial intelligence has demonstrated significant potential in clinical decision-making; however, developing models capable of adapting to diverse real-world scenarios and performing complex diagnostic reasoning remains a major challenge. Existing medical multi-modal benchmarks are typically limited to single-image, single-turn tasks, lacking multi-modal medical image integration and failing to capture the longitudinal and multi-modal interactive nature inherent to clinical practice. To address this gap, we introduce MedAtlas, a novel benchmark framework designed to evaluate large language models on realistic medical reasoning tasks. MedAtlas is characterized by four key features: multi-turn dialogue, multi-modal medical image interaction, multi-task integration, and high clinical fidelity. It supports four core tasks: open-ended multi-turn question answering, closed-ended multi-turn question answering, multi-image joint reasoning, and comprehensive disease diagnosis. Each case is derived from real diagnostic workflows and incorporates temporal interactions between textual medical histories and multiple imaging modalities, including CT, MRI, PET, ultrasound, and X-ray, requiring models to perform deep integrative reasoning across images and clinical texts. MedAtlas provides expert-annotated gold standards for all tasks. Furthermore, we propose two novel evaluation metrics: Round Chain Accuracy and Error Propagation Resistance. Benchmark results with existing multi-modal models reveal substantial performance gaps in multi-stage clinical reasoning. MedAtlas establishes a challenging evaluation platform to advance the development of robust and trustworthy medical AI.

Applications of artificial intelligence in liver cancer: A scoping review.

Chierici A, Lareyre F, Iannelli A, Salucki B, Goffart S, Guzzi L, Poggi E, Delingette H, Raffort J

PubMed paper · Aug 13, 2025
This review explores the application of Artificial Intelligence (AI) in managing primary liver cancer, focusing on recent advancements. AI, particularly machine learning (ML) and deep learning (DL), shows potential in improving screening, diagnosis, treatment planning, efficacy assessment, prognosis prediction, and follow-up, all crucial given the high mortality of liver cancer. A systematic search was conducted in the PubMed, Scopus, Embase, and Web of Science databases, focusing on original research published until June 2024 on AI's clinical applications in liver cancer. Studies that were not relevant or lacked clinical evaluation were excluded. Out of 13,122 screened articles, 62 were selected for full review. The studies highlight significant improvements in detecting hepatocellular carcinoma and intrahepatic cholangiocarcinoma through AI. DL models show high sensitivity and specificity, particularly in early detection. In diagnosis, AI models using CT and MRI data improve precision in distinguishing benign from malignant lesions through multimodal data integration. Recent AI models outperform earlier non-neural-network versions, though a gap remains between development and clinical implementation. Many models lack thorough clinical applicability assessments and external validation. AI integration in primary liver cancer management is promising but requires rigorous development and validation practices to fully enhance clinical outcomes.

Multi-organ AI Endophenotypes Chart the Heterogeneity of Pan-disease in the Brain, Eye, and Heart

The MULTI Consortium, Boquet-Pujadas, A., Anagnostakis, F., Yang, Z., Tian, Y. E., Duggan, M., Erus, G., Srinivasan, D., Joynes, C., Bai, W., Patel, P., Walker, K. A., Zalesky, A., Davatzikos, C., Wen, J.

medRxiv preprint · Aug 13, 2025
Disease heterogeneity and commonality pose significant challenges to precision medicine, as traditional approaches frequently focus on single disease entities and overlook shared mechanisms across conditions [1]. Inspired by pan-cancer [2] and multi-organ research [3], we introduce the concept of "pan-disease" to investigate the heterogeneity and shared etiology in brain, eye, and heart diseases. Leveraging individual-level data from 129,340 participants, as well as summary-level data from the MULTI consortium, we applied a weakly-supervised deep learning model (Surreal-GAN [4,5]) to multi-organ imaging, genetic, proteomic, and RNA-seq data, identifying 11 AI-derived biomarkers - called Multi-organ AI Endophenotypes (MAEs) - for the brain (Brain 1-6), eye (Eye 1-3), and heart (Heart 1-2), respectively. We found Brain 3 to be a risk factor for Alzheimer's disease (AD) progression and mortality, whereas Brain 5 was protective against AD progression. Crucially, in data from an anti-amyloid AD drug trial (solanezumab [6]), heterogeneity in cognitive decline trajectories was observed across treatment groups. At week 240, patients with lower Brain 1-3 expression had slower cognitive decline, whereas patients with higher expression had faster cognitive decline. A multi-layer causal pathway pinpointed Brain 1 as a mediational endophenotype [7] linking the FLRT2 protein to migraine, exemplifying novel therapeutic targets and pathways. Additionally, genes associated with Eye 1 and Eye 3 were enriched in cancer drug-related gene sets with causal links to specific cancer types and proteins. Finally, Heart 1 and Heart 2 had the highest mortality risk and unique medication history profiles, with Heart 1 showing favorable responses to antihypertensive medications and Heart 2 to digoxin treatment. The 11 MAEs provide novel AI-based dimensional representations for precision medicine and highlight the potential of AI-driven patient stratification for disease risk monitoring, clinical trials, and drug discovery.

Graph Neural Networks for Realistic Bleeding Prediction in Surgical Simulators.

Kakdas YC, De S, Demirel D

PubMed paper · Aug 12, 2025
This study presents a novel graph neural network approach for predicting the risk of internal bleeding from vessel maps derived from patient CT and MRI scans, aimed at enhancing the realism of surgical simulators for emergency scenarios such as trauma, where rapid detection of internal bleeding can be lifesaving. First, medical images are segmented and converted into graph representations of the vasculature, where nodes represent vessel branching points with spatial coordinates and edges encode vessel features such as length and radius. Because no existing dataset directly labels bleeding risks, we calculate the bleeding probability for each vessel node using a physics-based heuristic: peripheral vascular resistance computed via the Hagen-Poiseuille equation. A graph attention network is then trained to regress these probabilities, effectively learning to predict hemorrhage risk from the graph-structured imaging data. The model is trained using tenfold cross-validation on a combined dataset of 1708 vessel graphs extracted from four public image datasets (MSD, KiTS, AbdomenCT, CT-ORG), with optimization via the Adam optimizer, mean squared error loss, early stopping, and L2 regularization. Our model achieves a mean R-squared of 0.86, reaching up to 0.9188 in optimal configurations, and low mean training and validation losses of 0.0069 and 0.0074, respectively, in predicting bleeding risk, with higher performance on well-connected vascular graphs. Finally, we integrate the trained model into an immersive virtual reality environment to simulate intra-abdominal bleeding scenarios for surgical training. The model demonstrates robust predictive performance despite the inherent sparsity of real-life datasets.
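
For reference, the Hagen-Poiseuille relation gives a vessel segment's resistance as R = 8μL/(πr⁴) for viscosity μ, length L, and radius r. Below is a small sketch of turning per-edge geometry into a normalized per-node "bleeding risk"; the example graph, viscosity constant, and the mapping from resistance to probability are illustrative, not the paper's exact labeling rule:

# Toy per-node bleeding-risk heuristic from vessel geometry via
# Hagen-Poiseuille resistance R = 8*mu*L / (pi*r**4). The mapping from
# accumulated resistance to a 0-1 probability below is an invented scaling.
import math

MU_BLOOD = 3.5e-3  # dynamic viscosity of blood (Pa*s), approximate

edges = [  # (node_u, node_v, length_m, radius_m) -- synthetic example graph
    (0, 1, 0.020, 0.0015),
    (1, 2, 0.015, 0.0008),
    (1, 3, 0.030, 0.0005),
]

def resistance(length, radius):
    return 8.0 * MU_BLOOD * length / (math.pi * radius**4)

node_resistance = {}
for u, v, length, radius in edges:
    r = resistance(length, radius)
    for node in (u, v):
        node_resistance[node] = node_resistance.get(node, 0.0) + r

max_r = max(node_resistance.values())
bleeding_prob = {n: r / max_r for n, r in node_resistance.items()}  # crude 0-1 scaling
print(bleeding_prob)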