Latest Papers on Radiology AI. Tags: In Silico

A Modified VGG19-Based Framework for Accurate and Interpretable Real-Time Bone Fracture Detection

Md. Ehsanul Haque, Abrar Fahim, Shamik Dey, Syoda Anamika Jahan, S. M. Jahidul Islam, Sakib Rokoni, Md Sakib Morshed

•preprint•Jul 31 2025

Early and accurate detection of bone fractures is paramount to initiating treatment promptly and avoiding delays that affect patient outcomes. Interpreting X-ray images is time-consuming and error-prone, especially where radiology expertise is scarce. Moreover, current deep learning approaches often suffer from misclassifications and lack the interpretable explanations needed for clinical use. To overcome these challenges, we propose an automated bone fracture detection framework built on a modified VGG-19 model. It incorporates preprocessing techniques, including Contrast Limited Adaptive Histogram Equalization (CLAHE), Otsu's thresholding, and Canny edge detection, among others, to enhance image clarity and facilitate feature extraction. We also use Grad-CAM, an explainable AI method that generates visual heatmaps of the model's decision-making process, so that clinicians can understand how the model reaches its decisions. This encourages trust and supports further clinical validation. The framework is deployed in a real-time web application, where healthcare professionals can upload X-ray images and receive diagnostic feedback within 0.5 seconds. Our modified VGG-19 model attains 99.78% classification accuracy and an AUC of 1.00. The framework provides a reliable, fast, and interpretable solution for bone fracture detection that supports more efficient diagnosis and better patient care.
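
Of the preprocessing steps named in the abstract, Otsu's thresholding is the easiest to illustrate without an imaging library. The following is a minimal sketch in plain Python, not the authors' pipeline: the function name `otsu_threshold` and the toy pixel list are assumptions made for illustration. It picks the 8-bit grey level that maximises the between-class variance of the two resulting pixel groups.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximises between-class variance.

    `pixels` is a flat list of integers in 0..255 (hypothetical input format).
    Pixels <= threshold form the background class, the rest the foreground.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += hist[level]
        sum_bg += level * hist[level]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already background
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


# Toy bimodal "image": 50 dark pixels and 50 bright pixels.
toy_pixels = [10] * 50 + [200] * 50
threshold = otsu_threshold(toy_pixels)
mask = [p > threshold for p in toy_pixels]  # True marks foreground
```

In practice an imaging library would apply this to a 2D array after contrast enhancement; the sketch only shows the selection rule itself.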

X-Ray Detection Musculoskeletal Methodology In Silico GenAI

CX-Mind: A Pioneering Multimodal Large Language Model for Interleaved Reasoning in Chest X-ray via Curriculum-Guided Reinforcement Learning

Wenjie Li, Yujie Zhang, Haoran Sun, Yueqi Li, Fanrui Zhang, Mengzhe Xu, Victoria Borja Clausich, Sade Mellin, Renhao Yang, Chenrun Wang, Jethro Zih-Shuo Wang, Shiyi Yao, Gen Li, Yidong Xu, Hanyu Wang, Yilin Huang, Angela Lin Wang, Chen Shi, Yin Zhang, Jianan Guo, Luqi Yang, Renxuan Li, Yang Xu, Jiawei Liu, Yao Zhang, Lei Liu, Carlos Gutiérrez SanRomán, Lei Wang

•preprint•Jul 31 2025

Chest X-ray (CXR) imaging is one of the most widely used diagnostic modalities in clinical practice, encompassing a broad spectrum of diagnostic tasks. Recent advancements have seen the extensive application of reasoning-based multimodal large language models (MLLMs) in medical imaging to enhance diagnostic efficiency and interpretability. However, existing multimodal models predominantly rely on "one-time" diagnostic approaches, lacking verifiable supervision of the reasoning process. This leads to challenges in multi-task CXR diagnosis, including lengthy reasoning, sparse rewards, and frequent hallucinations. To address these issues, we propose CX-Mind, the first generative model to achieve interleaved "think-answer" reasoning for CXR tasks, driven by curriculum-based reinforcement learning and verifiable process rewards (CuRL-VPR). Specifically, we constructed an instruction-tuning dataset, CX-Set, comprising 708,473 images and 2,619,148 samples, and generated 42,828 high-quality interleaved reasoning data points supervised by clinical reports. Optimization was conducted in two stages under the Group Relative Policy Optimization framework: initially stabilizing basic reasoning with closed-domain tasks, followed by transfer to open-domain diagnostics, incorporating rule-based conditional process rewards to bypass the need for pretrained reward models. Extensive experimental results demonstrate that CX-Mind significantly outperforms existing medical and general-domain MLLMs in visual understanding, text generation, and spatiotemporal alignment, achieving an average performance improvement of 25.1% over comparable CXR-specific models. On a real-world clinical dataset (Rui-CXR), CX-Mind achieves a mean recall@1 across 14 diseases that substantially surpasses the second-best results, with multi-center expert evaluations further confirming its clinical utility across multiple dimensions.
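
Group Relative Policy Optimization scores each sampled response against the other responses drawn for the same prompt instead of using a learned value model. The sketch below shows only that normalisation step. The function name and the 0/1 rule-based rewards are hypothetical, and the abstract does not state how CX-Mind defines or normalises its process rewards.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards):
    """Standardise rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / group standard deviation.
    A group with identical rewards carries no learning signal, so it
    returns zeros rather than dividing by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# Hypothetical rule-based rewards for four sampled answers:
# 1.0 if the final answer matches the reference label, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses above the group average receive positive advantages and are reinforced; those below are discouraged.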

X-Ray LLM Radiology Report Chest Methodology In Silico Academic Lab Benchmark SOTA Open Dataset

MitoStructSeg: mitochondrial structural complexity resolution via adaptive learning for cross-sample morphometric profiling

Wang, X., Wan, X., Cai, B., Jia, Z., Chen, Y., Guo, S., Liu, Z., Zhang, F., Hu, B.

•preprint•Jul 30 2025

Mitochondrial morphology and structural changes are closely associated with metabolic dysfunction and disease progression. However, the structural complexity of mitochondria presents a major challenge for accurate segmentation and analysis. Most existing methods focus on delineating entire mitochondria but lack the capability to resolve fine internal features, particularly cristae. In this study, we introduce MitoStructSeg, a deep learning-based framework for mitochondrial structure segmentation and quantitative analysis. The core of MitoStructSeg is AMM-Seg, a novel model that integrates domain adaptation to improve cross-sample generalization, dual-channel feature fusion to enhance structural detail extraction, and continuity learning to preserve spatial coherence. This architecture enables accurate segmentation of both mitochondrial membranes and intricately folded cristae. MitoStructSeg further incorporates a quantitative analysis module that extracts key morphological metrics, including surface area, volume, and cristae density, allowing comprehensive and scalable assessment of mitochondrial morphology. The effectiveness of our approach has been validated on both human myocardial tissue and mouse kidney tissue, demonstrating its robustness in accurately segmenting mitochondria with diverse morphologies. In addition, we provide an open source, user-friendly tool to ensure practical usability.

OCT Segmentation Methodology In Silico Academic Lab Open Code

Role of Artificial Intelligence in Surgical Training by Assessing GPT-4 and GPT-4o on the Japan Surgical Board Examination With Text-Only and Image-Accompanied Questions: Performance Evaluation Study.

Maruyama H, Toyama Y, Takanami K, Takase K, Kamei T

•papers•Jul 30 2025

Artificial intelligence and large language models (LLMs), particularly GPT-4 and GPT-4o, have demonstrated high correct-answer rates in medical examinations. GPT-4o has enhanced diagnostic capabilities, advanced image processing, and updated knowledge. Japanese surgeons face critical challenges, including a declining workforce, regional health care disparities, and work-hour-related challenges. Although LLMs could be beneficial in surgical education, no studies have yet assessed GPT-4o's surgical knowledge or its performance in the field of surgery. This study aims to evaluate the potential of GPT-4 and GPT-4o in surgical education by using them to take the Japan Surgical Board Examination (JSBE), which includes both textual questions and medical images, such as surgical and computed tomography scans, to comprehensively assess their surgical knowledge. We used 297 multiple-choice questions from the 2021-2023 JSBEs. The questions were in Japanese, and 104 of them included images. First, the GPT-4 and GPT-4o responses to only the textual questions were collected via OpenAI's application programming interface to evaluate their correct-answer rate. Subsequently, the correct-answer rate of their responses to questions that included images was assessed by inputting both text and images. The overall correct-answer rates of GPT-4o and GPT-4 for the text-only questions were 78% (231/297) and 55% (163/297), respectively, with GPT-4o outperforming GPT-4 by 23 percentage points (P<.01). By contrast, there was no significant improvement in the correct-answer rate for questions that included images compared with the results for the text-only questions. GPT-4o outperformed GPT-4 on the JSBE. However, the results of the LLMs were lower than those of the examinees. Despite the capabilities of LLMs, image recognition remains a challenge for them, and their clinical application requires caution owing to the potential inaccuracy of their results.

CT Classification Abdominal Methodology In Silico Big Tech Benchmark SOTA

Optimizing Thyroid Nodule Management With Artificial Intelligence: Multicenter Retrospective Study on Reducing Unnecessary Fine Needle Aspirations.

Ni JH, Liu YY, Chen C, Shi YL, Zhao X, Li XL, Ye BB, Hu JL, Mou LC, Sun LP, Fu HJ, Zhu XX, Zhang YF, Guo L, Xu HX

•papers•Jul 30 2025

Most artificial intelligence (AI) models for thyroid nodules are designed to screen for malignancy to guide further interventions; however, these models have not yet been fully implemented in clinical practice. This study aimed to evaluate AI in real clinical settings for identifying potentially benign thyroid nodules initially deemed to be at risk for malignancy by radiologists, reducing unnecessary fine needle aspiration (FNA) and optimizing management. We retrospectively collected a validation cohort of thyroid nodules that had undergone FNA. These nodules were initially assessed as "suspicious for malignancy" by radiologists based on ultrasound features, following standard clinical practice, which prompted further FNA procedures. Ultrasound images of these nodules were re-evaluated using a deep learning-based AI system, and its diagnostic performance was assessed in terms of correct identification of benign nodules and erroneous identification of malignant nodules as benign. Performance metrics such as sensitivity, specificity, and the area under the receiver operating characteristic curve were calculated. In addition, a separate comparison cohort was retrospectively assembled to compare the AI system's ability to correctly identify benign thyroid nodules with that of radiologists. The validation cohort comprised 4572 thyroid nodules (benign: n=3134, 68.5%; malignant: n=1438, 31.5%). AI correctly identified 2719 benign nodules (86.8% of benign nodules) and reduced unnecessary FNAs from 68.5% (3134/4572) to 9.1% (415/4572). However, 123 malignant nodules (8.6% of malignant cases) were mistakenly identified as benign, with the majority of these being of low or intermediate suspicion. In the comparison cohort, AI successfully identified 81.4% (96/118) of benign nodules. It outperformed junior and senior radiologists, who identified only 40% and 55%, respectively.
The area under the curve (AUC) for the AI model was 0.88 (95% CI 0.85-0.91), demonstrating a superior AUC compared with that of the junior radiologists (AUC=0.43, 95% CI 0.36-0.50; P=.002) and senior radiologists (AUC=0.63, 95% CI 0.55-0.70; P=.003). Compared with radiologists, AI can better serve as a "goalkeeper" in reducing unnecessary FNAs by identifying benign nodules that are initially assessed as malignant by radiologists. However, active surveillance is still necessary for all these nodules since a very small number of low-aggressiveness malignant nodules may be mistakenly identified.
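
The validation-cohort percentages quoted in the abstract follow directly from the reported counts. The short check below reproduces them. The variable names are illustrative, and every number comes from the abstract itself.

```python
# Counts reported for the validation cohort.
benign, malignant = 3134, 1438
total = benign + malignant

ai_benign_correct = 2719    # benign nodules the AI correctly flagged as benign
ai_malignant_missed = 123   # malignant nodules the AI called benign

# Benign nodules the AI still left as suspicious (FNAs that remain unnecessary).
remaining_unnecessary = benign - ai_benign_correct


def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as quoted in the abstract."""
    return round(100 * numerator / denominator, 1)


print(total)                                  # cohort size
print(pct(benign, total))                     # unnecessary FNAs before AI
print(pct(ai_benign_correct, benign))         # benign nodules correctly identified
print(pct(remaining_unnecessary, total))      # unnecessary FNAs after AI
print(pct(ai_malignant_missed, malignant))    # malignant nodules missed
```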

Ultrasound Classification Abdominal Retrospective Clinical In Silico Academic Lab

Optimizing Federated Learning Configurations for MRI Prostate Segmentation and Cancer Detection: A Simulation Study.

Moradi A, Zerka F, Bosma JS, Sunoqrot MRS, Abrahamsen BS, Yakar D, Geerdink J, Huisman H, Bathen TF, Elschot M

•papers•Jul 30 2025

"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content. Purpose To develop and optimize a federated learning (FL) framework across multiple clients for biparametric MRI prostate segmentation and clinically significant prostate cancer (csPCa) detection. Materials and Methods A retrospective study was conducted using Flower FL to train a nnU-Net-based architecture for MRI prostate segmentation and csPCa detection, using data collected from January 2010 to August 2021. Model development included training and optimizing local epochs, federated rounds, and aggregation strategies for FL-based prostate segmentation on T2-weighted MRIs (four clients, 1294 patients) and csPCa detection using biparametric MRIs (three clients, 1440 patients). Performance was evaluated on independent test sets using the Dice score for segmentation and the Prostate Imaging: Cancer Artificial Intelligence (PI-CAI) score, defined as the average of the area under the receiver operating characteristic curve and average precision, for csPCa detection. P values for performance differences were calculated using permutation testing. Results The FL configurations were independently optimized for both tasks, showing improved performance at 1 epoch 300 rounds using FedMedian for prostate segmentation and 5 epochs 200 rounds using FedAdagrad, for csPCa detection. Compared with the average performance of the clients, the optimized FL model significantly improved performance in prostate segmentation (Dice score increase from 0.73 ± 0.06 to 0.88 ± 0.03; P ≤ .01) and csPCa detection (PI-CAI score increase from 0.63 ± 0.07 to 0.74 ± 0.06; P ≤ .01) on the independent test set. 
The optimized FL model showed higher lesion detection performance compared with the FL-baseline model (PI-CAI score increase from 0.72 ± 0.06 to 0.74 ± 0.06; P ≤ .01), but no evidence of a difference was observed for prostate segmentation (Dice scores, 0.87 ± 0.03 vs 0.88 ± 0.03; P > .05). Conclusion FL enhanced the performance and generalizability of MRI prostate segmentation and csPCa detection compared with local models, and optimizing its configuration further improved lesion detection performance. ©RSNA, 2025.
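
FedMedian, the aggregation strategy the abstract reports as working best for segmentation, replaces the usual weighted average of client updates with a coordinate-wise median. The sketch below shows the idea on flattened parameter lists. It does not use the Flower API, and the function name and client values are hypothetical.

```python
from statistics import median


def fed_median(client_weights):
    """Aggregate client parameters by taking the median of each coordinate.

    `client_weights` is a list with one flat parameter list per client;
    all lists are assumed to have the same length.
    """
    return [median(values) for values in zip(*client_weights)]


# Three hypothetical clients; the third has an outlying first parameter.
clients = [
    [0.1, 1.0],
    [0.2, 1.2],
    [5.0, 0.8],
]
global_weights = fed_median(clients)
```

A plain mean of the first coordinate would be pulled towards the outlier, whereas the median ignores it, which is the usual motivation for median-based aggregation when client data differ.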

MRI Segmentation Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Structural MRI-based Computer-aided Diagnosis Models for Alzheimer Disease: Insights into Misclassifications and Diagnostic Limitations.

Kang X, Lin J, Zhao K, Yan S, Chen P, Wang D, Yao H, Zhou B, Yu C, Wang P, Liao Z, Chen Y, Zhang X, Han Y, Lu J, Liu Y

•papers•Jul 30 2025

"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. Purpose To examine common patterns among different computer-aided diagnosis (CAD) models for Alzheimer's disease (AD) using structural MRI data and to characterize the clinical and imaging features associated with their misclassifications. Materials and Methods This retrospective study utilized 3258 baseline structural MRIs from five multisite datasets and two multidisease datasets collected between September 2005 and December 2019. The 3D Nested Hierarchical Transformer (3DNesT) model and other CAD techniques were utilized for AD classification using 10-fold cross-validation and cross-dataset validation. Subgroup analysis of CAD-misclassified individuals compared clinical/neuroimaging biomarkers using independent t tests with Bonferroni correction. Results This study included 1391 patients with AD (mean age, 72.1 ± 9.2 years, 757 female), 205 with other neurodegenerative diseases (mean age, 64.9 ± 9.9 years, 117 male), and 1662 healthy controls (mean age, 70.6 ± 7.6 years, 935 female). The 3DNesT model achieved 90.1 ± 2.3% cross-validation accuracy and 82.2%, 90.1%, and 91.6% in three external datasets. Further analysis suggested that the false-negative (FN) subgroup (n = 223) exhibited minimal atrophy and better cognitive performance than the true-positive (TP) subgroup (MMSE, FN, 21.4 ± 4.4; TP, 19.7 ± 5.7; PFWE < 0.001), despite displaying similar levels of amyloid beta (FN, 705.9 ± 353.9; TP, 665.7 ± 305.8; PFWE = 0.47) and tau (FN, 352.4 ± 166.8; TP, 371.0 ± 141.8; PFWE = 0.47) burden.
Conclusion The FN subgroup exhibited atypical structural MRI patterns and clinical measures, fundamentally limiting the diagnostic performance of CAD models based solely on structural MRI. ©RSNA, 2025.
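
The subgroup comparisons in this abstract use Bonferroni correction, which multiplies each raw p-value by the number of tests and caps the result at 1. The sketch below is a minimal illustration. The raw p-values are hypothetical, since the abstract reports only corrected values and does not state how many comparisons were made.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values (each p times the number of tests, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three biomarker comparisons.
raw = [0.0001, 0.2, 0.6]
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
```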

MRI Classification Neurological Retrospective Clinical In Silico

Wall Shear Stress Estimation in Abdominal Aortic Aneurysms: Towards Generalisable Neural Surrogate Models

Patryk Rygiel, Julian Suk, Christoph Brune, Kak Khee Yeung, Jelmer M. Wolterink

•preprint•Jul 30 2025

Abdominal aortic aneurysms (AAAs) are pathologic dilatations of the abdominal aorta posing a high fatality risk upon rupture. Studying AAA progression and rupture risk often involves in-silico blood flow modelling with computational fluid dynamics (CFD) and extraction of hemodynamic factors like time-averaged wall shear stress (TAWSS) or oscillatory shear index (OSI). However, CFD simulations are known to be computationally demanding. Hence, in recent years, geometric deep learning methods, operating directly on 3D shapes, have been proposed as compelling surrogates, estimating hemodynamic parameters in just a few seconds. In this work, we propose a geometric deep learning approach to estimating hemodynamics in AAA patients, and study its generalisability to common factors of real-world variation. We propose an E(3)-equivariant deep learning model utilising novel robust geometrical descriptors and projective geometric algebra. Our model is trained to estimate transient WSS using a dataset of CT scans of 100 AAA patients, from which lumen geometries are extracted and reference CFD simulations with varying boundary conditions are obtained. Results show that the model generalizes well within the distribution, as well as to the external test set. Moreover, the model can accurately estimate hemodynamics across geometry remodelling and changes in boundary conditions. Furthermore, we find that a trained model can be applied to different artery tree topologies, where new and unseen branches are added during inference. Finally, we find that the model is to a large extent agnostic to mesh resolution. These results show the accuracy and generalisation of the proposed model, and highlight its potential to contribute to hemodynamic parameter estimation in clinical practice.
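
The two hemodynamic factors named in the abstract have standard definitions: TAWSS is the time average of the wall shear stress magnitude, and OSI is 0.5 × (1 − |time-averaged WSS vector| / TAWSS). The sketch below evaluates both at a single wall point from uniformly sampled WSS vectors. The function name and sample values are assumptions, and it stands in for neither the CFD pipeline nor the surrogate model described above.

```python
import math


def tawss_and_osi(wss_vectors):
    """Compute (TAWSS, OSI) at one wall point.

    `wss_vectors` holds (x, y, z) wall shear stress vectors sampled at
    uniform time steps over one cardiac cycle, so time integrals reduce
    to plain averages.
    """
    n = len(wss_vectors)
    # TAWSS: average of the vector magnitudes.
    tawss = sum(math.sqrt(x * x + y * y + z * z) for x, y, z in wss_vectors) / n
    # Magnitude of the averaged vector (small when the flow direction reverses).
    mean_vector = [sum(v[i] for v in wss_vectors) / n for i in range(3)]
    magnitude_of_mean = math.sqrt(sum(c * c for c in mean_vector))
    osi = 0.5 * (1.0 - magnitude_of_mean / tawss) if tawss > 0 else 0.0
    return tawss, osi


steady = tawss_and_osi([(2.0, 0.0, 0.0), (2.0, 0.0, 0.0)])      # unidirectional flow
reversing = tawss_and_osi([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])  # fully oscillating flow
```

OSI ranges from 0 for purely unidirectional shear to 0.5 for shear that fully reverses over the cycle.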

CT Registration Vascular Methodology In Silico Academic Lab Benchmark SOTA

Advancing Fetal Ultrasound Image Quality Assessment in Low-Resource Settings

Dongli He, Hu Wang, Mohammad Yaqub

•preprint•Jul 30 2025

Accurate fetal biometric measurements, such as abdominal circumference, play a vital role in prenatal care. However, obtaining high-quality ultrasound images for these measurements heavily depends on the expertise of sonographers, posing a significant challenge in low-income countries due to the scarcity of trained personnel. To address this issue, we leverage FetalCLIP, a vision-language model pretrained on a curated dataset of over 210,000 fetal ultrasound image-caption pairs, to perform automated fetal ultrasound image quality assessment (IQA) on blind-sweep ultrasound data. We introduce FetalCLIP$_{CLS}$, an IQA model adapted from FetalCLIP using Low-Rank Adaptation (LoRA), and evaluate it on the ACOUSLIC-AI dataset against six CNN and Transformer baselines. FetalCLIP$_{CLS}$ achieves the highest F1 score of 0.757. Moreover, we show that an adapted segmentation model, when repurposed for classification, further improves performance, achieving an F1 score of 0.771. Our work demonstrates how parameter-efficient fine-tuning of fetal ultrasound foundation models can enable task-specific adaptations, advancing prenatal care in resource-limited settings. The experimental code is available at: https://github.com/donglihe-hub/FetalCLIP-IQA.
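
Low-Rank Adaptation keeps the pretrained weight matrix W frozen and learns a low-rank update, so the effective weight is W + (alpha / r) · B · A, where r is the rank. The sketch below computes that merged weight with plain lists. The matrices, rank, and alpha are toy values, because the abstract does not give the LoRA configuration used for the adapted FetalCLIP model.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A.

    w: frozen pretrained weight (out x in)
    a: trainable down-projection (r x in)
    b: trainable up-projection (out x r), conventionally initialised to zero
    """
    scale = alpha / len(a)
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]                                          # rank r = 1
at_init = lora_weight(w, a, [[0.0], [0.0]], alpha=2.0)    # B = 0, so W is unchanged
after_training = lora_weight(w, a, [[1.0], [0.0]], alpha=2.0)
```

Only A and B are trained, which for a rank far smaller than the layer width is a small fraction of the parameters in W.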

Ultrasound Classification Abdominal Methodology In Silico Academic Lab Open Code

Optimizing Federated Learning Configurations for MRI Prostate Segmentation and Cancer Detection: A Simulation Study

Ashkan Moradi, Fadila Zerka, Joeran S. Bosma, Mohammed R. S. Sunoqrot, Bendik S. Abrahamsen, Derya Yakar, Jeroen Geerdink, Henkjan Huisman, Tone Frost Bathen, Mattijs Elschot

•preprint•Jul 30 2025

Purpose: To develop and optimize a federated learning (FL) framework across multiple clients for biparametric MRI prostate segmentation and clinically significant prostate cancer (csPCa) detection. Materials and Methods: A retrospective study was conducted using Flower FL to train a nnU-Net-based architecture for MRI prostate segmentation and csPCa detection, using data collected from January 2010 to August 2021. Model development included training and optimizing local epochs, federated rounds, and aggregation strategies for FL-based prostate segmentation on T2-weighted MRIs (four clients, 1294 patients) and csPCa detection using biparametric MRIs (three clients, 1440 patients). Performance was evaluated on independent test sets using the Dice score for segmentation and the Prostate Imaging: Cancer Artificial Intelligence (PI-CAI) score, defined as the average of the area under the receiver operating characteristic curve and average precision, for csPCa detection. P-values for performance differences were calculated using permutation testing. Results: The FL configurations were independently optimized for both tasks, showing improved performance at 1 epoch 300 rounds using FedMedian for prostate segmentation and 5 epochs 200 rounds using FedAdagrad, for csPCa detection. Compared with the average performance of the clients, the optimized FL model significantly improved performance in prostate segmentation and csPCa detection on the independent test set. The optimized FL model showed higher lesion detection performance compared to the FL-baseline model, but no evidence of a difference was observed for prostate segmentation. Conclusions: FL enhanced the performance and generalizability of MRI prostate segmentation and csPCa detection compared with local models, and optimizing its configuration further improved lesion detection performance.
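
The abstract defines the PI-CAI score as the average of the area under the ROC curve and average precision. The sketch below computes both on one toy list of labels and scores and averages them. Applying both metrics to the same list is a simplification made here, any lesion-level matching in the real metric is not modelled, and all names and values are hypothetical.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive, ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


def picai_style_score(labels, scores):
    """Average of AUROC and average precision, as the abstract defines the PI-CAI score."""
    return (auroc(labels, scores) + average_precision(labels, scores)) / 2


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
score = picai_style_score(labels, scores)
```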

MRI Segmentation Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA
