Latest Papers on Radiology AI. Tags: In Silico, Order: Best Match, Limit: 10.

A Mixed-attention Network for Automated Interventricular Septum Segmentation in Bright-blood Myocardial T2* MRI Relaxometry in Thalassemia.

Wu X, Wang H, Chen Z, Sun S, Lian Z, Zhang X, Peng P, Feng Y

•papers•May 30 2025

This study develops a deep-learning method for automatic segmentation of the interventricular septum (IS) in MR images to measure myocardial T2* and estimate cardiac iron deposition in patients with thalassemia. This retrospective study used multiple-gradient-echo cardiac MR scans from 419 thalassemia patients to develop and evaluate the segmentation network. The network was trained on 1.5 T images from Center 1 and evaluated on 3.0 T unseen images from Center 1, all data from Center 2, and the CHMMOTv1 dataset. Model performance was assessed using five metrics, and T2* values were obtained by fitting the network output. Bland-Altman analysis, coefficient of variation (CoV), and regression analysis were used to evaluate the consistency between automatic and manual methods. MA-BBIsegNet achieved a Dice of 0.90 on the internal test set, 0.85 on the external test set, and 0.81 on the CHMMOTv1 dataset. Bland-Altman analysis showed mean differences of 0.08 (95% LoA: -2.79 ∼ 2.63) ms (internal), 0.29 (95% LoA: -4.12 ∼ 3.54) ms (external) and 0.19 (95% LoA: -3.50 ∼ 3.88) ms (CHMMOTv1), with CoV of 8.9%, 6.8%, and 9.3%. Regression analysis yielded r values of 0.98 for the internal and CHMMOTv1 datasets, and 0.99 for the external dataset (p < 0.05). The IS segmentation network based on multiple-gradient-echo bright-blood images yielded T2* values that were in strong agreement with manual measurements, highlighting its potential for the efficient, non-invasive monitoring of myocardial iron deposition in patients with thalassemia.

MRI Segmentation Cardiac Retrospective Clinical In Silico Academic Lab
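
The abstract above reports Dice overlap for the segmentation and Bland-Altman limits of agreement (LoA) between automatic and manual T2* values. The following is a minimal sketch of how those two quantities are commonly computed. It is not the authors' code, and the masks and T2* values are hypothetical toy data.

```python
from statistics import mean, stdev


def dice(pred, truth):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def bland_altman(auto, manual):
    """Return (mean difference, lower LoA, upper LoA) for paired measurements."""
    diffs = [a - m for a, m in zip(auto, manual)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # 1.96 x sample SD of the differences
    return bias, bias - spread, bias + spread


# Hypothetical toy data, not values from the study.
pred_mask = [0, 1, 1, 1, 0, 0]
true_mask = [0, 1, 1, 0, 0, 1]
auto_t2 = [20.1, 15.4, 30.2, 8.9]     # ms, automatic
manual_t2 = [19.8, 16.0, 29.5, 9.3]   # ms, manual

print("Dice:", dice(pred_mask, true_mask))
print("Bias and 95% LoA:", bland_altman(auto_t2, manual_t2))
```

A narrow LoA interval centred near zero indicates that the automatic values can stand in for the manual ones.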

Combining structural equation modeling analysis with machine learning for early malignancy detection in Bethesda Category III thyroid nodules.

Kasap ZA, Kurt B, Güner A, Özsağır E, Ercin ME

•papers•May 30 2025

Atypia of Undetermined Significance (AUS), classified as Category III in the Bethesda Thyroid Cytopathology Reporting System, presents significant diagnostic challenges for clinicians. This study aims to develop a clinical decision support system that integrates structural equation modeling (SEM) and machine learning to predict malignancy in AUS thyroid nodules. The model integrates preoperative clinical data, ultrasonography (USG) findings, and cytopathological and morphometric variables. This retrospective cohort study was conducted between 2011 and 2019 at Karadeniz Technical University (KTU) Farabi Hospital. The dataset included 56 variables derived from 204 thyroid nodules diagnosed via ultrasound-guided fine-needle aspiration biopsy (FNAB) in 183 patients over 18 years. Logistic regression (LR) and SEM were used to identify risk factors for early thyroid cancer detection. Subsequently, machine learning algorithms, including Support Vector Machines (SVM), Naive Bayes (NB), and Decision Trees (DT), were used to construct decision support models. After feature selection with SEM, the SVM model achieved the highest performance, with an accuracy of 82%, a specificity of 97%, and an AUC value of 84%. Additional models were developed for different scenarios, and their performance metrics were compared. Accurate preoperative prediction of malignancy in thyroid nodules is crucial for avoiding unnecessary surgeries. The proposed model supports more informed clinical decision-making by effectively identifying benign cases, thereby reducing surgical risk and improving patient care.

Ultrasound Classification Abdominal Retrospective Clinical In Silico Academic Lab

Edge Computing for Physics-Driven AI in Computational MRI: A Feasibility Study

Yaşar Utku Alçalar, Yu Cao, Mehmet Akçakaya

•preprint•May 30 2025

Physics-driven artificial intelligence (PD-AI) reconstruction methods have emerged as the state-of-the-art for accelerating MRI scans, enabling higher spatial and temporal resolutions. However, the high resolution of these scans generates massive data volumes, leading to challenges in transmission, storage, and real-time processing. This is particularly pronounced in functional MRI, where hundreds of volumetric acquisitions further exacerbate these demands. Edge computing with FPGAs presents a promising solution for enabling PD-AI reconstruction near the MRI sensors, reducing data transfer and storage bottlenecks. However, this requires optimization of PD-AI models for hardware efficiency through quantization and bypassing traditional FFT-based approaches, which can be a limitation due to their computational demands. In this work, we propose a novel PD-AI computational MRI approach optimized for FPGA-based edge computing devices, leveraging 8-bit complex data quantization and eliminating redundant FFT/IFFT operations. Our results show that this strategy improves computational efficiency while maintaining reconstruction quality comparable to conventional PD-AI methods, and outperforms standard clinical methods. Our approach presents an opportunity for high-resolution MRI reconstruction on resource-constrained devices, highlighting its potential for real-world deployment.

MRI Reconstruction Methodology In Silico Academic Lab Reproducibility
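
The abstract above mentions 8-bit complex data quantization without giving the scheme. As a hedged illustration only, the sketch below uses symmetric per-tensor quantization, storing the real and imaginary parts as separate signed 8-bit codes with one shared scale. The function names and sample values are hypothetical, and the authors' FPGA implementation may differ.

```python
def quantize_complex_int8(values):
    """Symmetric per-tensor quantization of complex samples to int8 pairs.

    Assumption: one shared scale for real and imaginary parts.
    """
    peak = max(max(abs(v.real), abs(v.imag)) for v in values)
    scale = peak / 127 if peak > 0 else 1.0
    codes = [(round(v.real / scale), round(v.imag / scale)) for v in values]
    return codes, scale


def dequantize_complex_int8(codes, scale):
    """Map int8 (real, imag) code pairs back to complex floats."""
    return [complex(re * scale, im * scale) for re, im in codes]


# Hypothetical k-space-like samples.
samples = [0.5 + 0.25j, -1.0 + 0.1j, 0.0 - 0.75j]
codes, scale = quantize_complex_int8(samples)
restored = dequantize_complex_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(samples, restored))

print("codes:", codes)
print("max reconstruction error:", max_error)
```

Each component is off by at most half a quantization step, so the error shrinks in proportion to the peak magnitude of the tensor.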

Evaluation of uncertainty estimation methods in medical image segmentation: Exploring the usage of uncertainty in clinical deployment.

Li S, Yuan M, Dai X, Zhang C

•papers•May 30 2025

Uncertainty estimation methods are essential for the application of artificial intelligence (AI) models in medical image segmentation, particularly in addressing reliability and feasibility challenges in clinical deployment. Despite their significance, the adoption of uncertainty estimation methods in clinical practice remains limited due to the lack of a comprehensive evaluation framework tailored to their clinical usage. To address this gap, a simulation of uncertainty-assisted clinical workflows is conducted, highlighting the roles of uncertainty in model selection, sample screening, and risk visualization. Furthermore, uncertainty evaluation is extended to pixel, sample, and model levels to enable a more thorough assessment. At the pixel level, the Uncertainty Confusion Metric (UCM) is proposed, utilizing density curves to improve robustness against variability in uncertainty distributions and to assess the ability of pixel uncertainty to identify potential errors. At the sample level, the Expected Segmentation Calibration Error (ESCE) is introduced to provide more accurate calibration aligned with Dice, enabling more effective identification of low-quality samples. At the model level, the Harmonic Dice (HDice) metric is developed to integrate uncertainty and accuracy, mitigating the influence of dataset biases and offering a more robust evaluation of model performance on unseen data. Using this systematic evaluation framework, five mainstream uncertainty estimation methods are compared on organ and tumor datasets, providing new insights into their clinical applicability. Extensive experimental analyses validated the practicality and effectiveness of the proposed metrics. This study offers clear guidance for selecting appropriate uncertainty estimation methods in clinical settings, facilitating their integration into clinical workflows and ultimately improving diagnostic efficiency and patient outcomes.

Segmentation Abdominal Methodology In Silico Academic Lab
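
The abstract above describes using uncertainty for sample screening but does not define UCM, ESCE, or HDice, so none of them is reproduced here. The sketch below only illustrates the general idea with a common stand-in: pixel-wise predictive entropy averaged per sample and compared against a threshold. The threshold and probability maps are hypothetical.

```python
import math


def binary_entropy(p):
    """Predictive entropy in bits of a foreground probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def sample_uncertainty(prob_map):
    """Mean pixel entropy over a flat list of foreground probabilities."""
    return sum(binary_entropy(p) for p in prob_map) / len(prob_map)


def screen_samples(prob_maps, threshold):
    """Return indices of samples whose mean uncertainty exceeds the threshold."""
    return [i for i, pm in enumerate(prob_maps)
            if sample_uncertainty(pm) > threshold]


# Hypothetical predictions: one confident sample, one ambiguous sample.
confident = [0.01, 0.99, 0.98, 0.02]
ambiguous = [0.5, 0.6, 0.45, 0.55]
flagged = screen_samples([confident, ambiguous], threshold=0.5)

print("samples flagged for manual review:", flagged)
```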

Comparing the Effects of Persistence Barcodes Aggregation and Feature Concatenation on Medical Imaging

Dashti A. Ali, Richard K. G. Do, William R. Jarnagin, Aras T. Asaad, Amber L. Simpson

•preprint•May 29 2025

In medical image analysis, feature engineering plays an important role in the design and performance of machine learning models. Persistent homology (PH), from the field of topological data analysis (TDA), demonstrates robustness and stability to data perturbations and addresses the limitation from traditional feature extraction approaches where a small change in input results in a large change in feature representation. Using PH, we store persistent topological and geometrical features in the form of the persistence barcode whereby large bars represent global topological features and small bars encapsulate geometrical information of the data. When multiple barcodes are computed from 2D or 3D medical images, two approaches can be used to construct the final topological feature vector in each dimension: aggregating persistence barcodes followed by featurization or concatenating topological feature vectors derived from each barcode. In this study, we conduct a comprehensive analysis across diverse medical imaging datasets to compare the effects of the two aforementioned approaches on the performance of classification models. The results of this analysis indicate that feature concatenation preserves detailed topological information from individual barcodes, yields better classification performance and is therefore a preferred approach when conducting similar experiments.

Mixed Modality Classification Methodology In Silico Academic Lab
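
The two strategies compared in the abstract above differ in where featurization happens. The sketch below shows the difference on toy barcodes. The featurization (bar count, mean persistence, maximum persistence) is an assumption chosen for brevity, not the one used in the paper.

```python
from statistics import mean


def featurize(barcode):
    """Map a barcode, a list of (birth, death) bars, to summary statistics."""
    lengths = [death - birth for birth, death in barcode]
    if not lengths:
        return [0.0, 0.0, 0.0]
    return [float(len(lengths)), mean(lengths), max(lengths)]


def aggregate_then_featurize(barcodes):
    """Pool all bars into one barcode, then featurize once."""
    merged = [bar for barcode in barcodes for bar in barcode]
    return featurize(merged)


def featurize_then_concatenate(barcodes):
    """Featurize each barcode separately and join the vectors."""
    return [x for barcode in barcodes for x in featurize(barcode)]


# Hypothetical barcodes, e.g. from two slices or patches of one image.
barcodes = [
    [(0.0, 4.0), (1.0, 2.0)],
    [(0.0, 1.0), (0.5, 1.0), (2.0, 2.5)],
]
aggregated = aggregate_then_featurize(barcodes)
concatenated = featurize_then_concatenate(barcodes)

print("aggregated:", aggregated)      # one vector of length 3
print("concatenated:", concatenated)  # one vector of length 3 per barcode
```

Aggregation yields a fixed-length vector regardless of how many barcodes there are, while concatenation keeps per-barcode detail at the cost of a longer vector.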

DeepChest: Dynamic Gradient-Free Task Weighting for Effective Multi-Task Learning in Chest X-ray Classification

Youssef Mohamed, Noran Mohamed, Khaled Abouhashad, Feilong Tang, Sara Atito, Shoaib Jameel, Imran Razzak, Ahmed B. Zaky

•preprint•May 29 2025

While Multi-Task Learning (MTL) offers inherent advantages in complex domains such as medical imaging by enabling shared representation learning, effectively balancing task contributions remains a significant challenge. This paper addresses this critical issue by introducing DeepChest, a novel, computationally efficient and effective dynamic task-weighting framework specifically designed for multi-label chest X-ray (CXR) classification. Unlike existing heuristic or gradient-based methods that often incur substantial overhead, DeepChest leverages a performance-driven weighting mechanism based on effective analysis of task-specific loss trends. Given a network architecture (e.g., ResNet18), our model-agnostic approach adaptively adjusts task importance without requiring gradient access, thereby significantly reducing memory usage and achieving a threefold increase in training speed. It can be easily applied to improve various state-of-the-art methods. Extensive experiments on a large-scale CXR dataset demonstrate that DeepChest not only outperforms state-of-the-art MTL methods by 7% in overall accuracy but also yields substantial reductions in individual task losses, indicating improved generalization and effective mitigation of negative transfer. The efficiency and performance gains of DeepChest pave the way for more practical and robust deployment of deep learning in critical medical diagnostic applications. The code is publicly available at https://github.com/youssefkhalil320/DeepChest-MTL

X-Ray Classification Chest Methodology In Silico Open Code
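
The abstract above does not spell out DeepChest's weighting rule. To illustrate what gradient-free, loss-trend-based task weighting can look like, the sketch below implements Dynamic Weight Averaging (DWA), a different, previously published method that sets task weights from the ratio of consecutive per-task losses. The loss values are hypothetical; see the linked repository for the actual DeepChest mechanism.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Dynamic Weight Averaging: weights from per-task loss ratios.

    A ratio near 1 means the task has stopped improving, so it gets
    more weight. No gradients are needed, only scalar losses.
    """
    ratios = [l1 / l2 for l1, l2 in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    k = len(ratios)
    return [k * e / total for e in exps]  # weights sum to the task count


# Hypothetical per-task losses from two consecutive epochs.
losses_epoch_1 = [0.80, 0.60, 0.90]
losses_epoch_2 = [0.40, 0.57, 0.90]
weights = dwa_weights(losses_epoch_2, losses_epoch_1)

print("task weights:", weights)
```

Here the first task improved fastest and receives the smallest weight, while the stalled third task receives the largest.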

Menopausal hormone therapy and the female brain: Leveraging neuroimaging and prescription registry data from the UK Biobank cohort.

Barth C, Galea LAM, Jacobs EG, Lee BH, Westlye LT, de Lange AG

•papers•May 29 2025

Menopausal hormone therapy (MHT) is generally thought to be neuroprotective, yet results have been inconsistent. Here, we present a comprehensive study of MHT use and brain characteristics in females from the UK Biobank. 19,846 females with magnetic resonance imaging data were included. Detailed MHT prescription data from primary care records was available for 538. We tested for associations between the brain measures (i.e. gray/white matter brain age, hippocampal volumes, white matter hyperintensity volumes) and MHT user status, age at first and last use, duration of use, formulation, route of administration, dosage, type, and active ingredient. We further tested for the effects of a history of hysterectomy ± bilateral oophorectomy among MHT users and examined associations by APOE ε4 status. Current MHT users, not past users, showed older gray and white matter brain age, with a difference of up to 9 mo, and smaller hippocampal volumes compared to never-users. Longer duration of use and older age at last use post-menopause was associated with older gray and white matter brain age, larger white matter hyperintensity volume, and smaller hippocampal volumes. MHT users with a history of hysterectomy ± bilateral oophorectomy showed younger gray matter brain age relative to MHT users without such history. We found no associations by APOE ε4 status and with other MHT variables. Our results indicate that population-level associations between MHT use and female brain health might vary depending on duration of use and past surgical history.
The authors received funding from the Research Council of Norway (LTW: 223273, 249795, 273345, 298646, 300768), the South-Eastern Norway Regional Health Authority (CB: 2023037, 2022103; LTW: 2018076, 2019101), the European Research Council under the European Union's Horizon 2020 research and innovation program (LTW: 802998), the Swiss National Science Foundation (AMGdL: PZ00P3_193658), the Canadian Institutes for Health Research (LAMG: PJT-173554), the Treliving Family Chair in Women's Mental Health at the Centre for Addiction and Mental Health (LAMG), womenmind at the Centre for Addiction and Mental Health (LAMG, BHL), the Ann S. Bowers Women's Brain Health Initiative (EGJ), and the National Institutes of Health (EGJ: AG063843).

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab

Image Aesthetic Reasoning: A New Benchmark for Medical Image Screening with MLLMs

Zheng Sun, Yi Wei, Long Yu

•preprint•May 29 2025

Multimodal Large Language Models (MLLMs) are widely applied across many domains, such as multimodal understanding and generation. With the development of diffusion models (DM) and unified MLLMs, the performance of image generation has improved significantly. However, image screening is rarely studied, and MLLM performance on it is unsatisfactory because of the lack of data and the weak image aesthetic reasoning ability of MLLMs. In this work, we propose a complete solution to address these problems in terms of data and methodology. For data, we collect a comprehensive medical image screening dataset with 1500+ samples; each sample consists of a medical image, four generated images, and a multiple-choice answer. The dataset evaluates the aesthetic reasoning ability under four aspects: (1) Appearance Deformation, (2) Principles of Physical Lighting and Shadow, (3) Placement Layout, (4) Extension Rationality. For methodology, we utilize long chains of thought (CoT) and Group Relative Policy Optimization with Dynamic Proportional Accuracy reward, called DPA-GRPO, to enhance the image aesthetic reasoning ability of MLLMs. Our experimental results reveal that even state-of-the-art closed-source MLLMs, such as GPT-4o and Qwen-VL-Max, exhibit performance akin to random guessing in image aesthetic reasoning. In contrast, by leveraging the reinforcement learning approach, we are able to surpass the score of both large-scale models and leading closed-source models using a much smaller model. We hope our attempt on medical image screening will serve as a regular configuration in image aesthetic reasoning in the future.

Classification Dataset Release In Silico Academic Lab Benchmark SOTA Open Dataset
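
Group Relative Policy Optimization, named in the abstract above, scores each sampled response relative to the other responses drawn for the same prompt. The sketch below shows only that group-relative advantage step. The 0/1 correctness reward is a hypothetical placeholder, not the paper's Dynamic Proportional Accuracy reward, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for six sampled answers to one multiple-choice
# question: 1.0 if the chosen option is correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)

print("advantages:", advantages)
```

Correct answers get positive advantages and incorrect ones negative, and a group in which every answer earns the same reward contributes no learning signal.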

Can Large Language Models Challenge CNNs in Medical Image Analysis?

Shibbir Ahmed, Shahnewaz Karim Sakib, Anindya Bijoy Das

•preprint•May 29 2025

This study presents a multimodal AI framework designed for precisely classifying medical diagnostic images. Utilizing publicly available datasets, the proposed system compares the strengths of convolutional neural networks (CNNs) and different large language models (LLMs). This in-depth comparative analysis highlights key differences in diagnostic performance, execution efficiency, and environmental impacts. Model evaluation was based on accuracy, F1-score, average execution time, average energy consumption, and estimated CO2 emission. The findings indicate that although CNN-based models can outperform various multimodal techniques that incorporate both images and contextual information, applying additional filtering on top of LLMs can lead to substantial performance gains. These findings highlight the transformative potential of multimodal AI systems to enhance the reliability, efficiency, and scalability of medical diagnostics in clinical settings.

Mixed Modality Classification Methodology In Silico GenAI
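
The abstract above lists energy consumption and estimated CO2 emission among its evaluation criteria without stating how they were derived. A common back-of-the-envelope approach multiplies the energy used by the carbon intensity of the electricity grid, as sketched below. The power draw, runtime, and grid intensity are hypothetical inputs, not figures from the study.

```python
def estimate_co2_grams(avg_power_watts, seconds, grid_g_per_kwh):
    """Estimate emissions as energy in kWh times grid carbon intensity."""
    energy_kwh = avg_power_watts * seconds / 3_600_000  # joules to kWh
    return energy_kwh * grid_g_per_kwh


# Hypothetical run: 250 W average draw for 2 hours on a 400 gCO2/kWh grid.
grams = estimate_co2_grams(avg_power_watts=250, seconds=7200, grid_g_per_kwh=400)
print("estimated emission (g CO2):", grams)
```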

Dharma: A novel machine learning framework for pediatric appendicitis--diagnosis, severity assessment and evidence-based clinical decision support.

Thapa, A., Pahari, S., Timilsina, S., Chapagain, B.

•preprint•May 29 2025

Background: Acute appendicitis remains a challenging diagnosis in pediatric populations, with high rates of misdiagnosis and negative appendectomies despite advances in imaging modalities. Current diagnostic tools, including clinical scoring systems like Alvarado and Pediatric Appendicitis Score (PAS), lack sufficient sensitivity and specificity, while reliance on CT scans raises concerns about radiation exposure, contrast hazards and sedation in children. Moreover, no established tool effectively predicts progression from uncomplicated to complicated appendicitis, creating a critical gap in clinical decision-making. Objective: To develop and evaluate a machine learning model that integrates clinical, laboratory, and radiological findings for accurate diagnosis and complication prediction in pediatric appendicitis and to deploy this model as an interpretable web-based tool for clinical decision support. Methods: We analyzed data from 780 pediatric patients (ages 0-18) with suspected appendicitis admitted to Children's Hospital St. Hedwig, Regensburg, between 2016 and 2021. For severity prediction, our dataset was augmented with 430 additional cases from published literature and only the confirmed cases of acute appendicitis (n=602) were used. After feature selection using statistical methods and recursive feature elimination, we developed a Random Forest model named Dharma, optimized through hyperparameter tuning and cross-validation. Model performance was evaluated on independent test sets and compared with conventional diagnostic tools. Results: Dharma demonstrated superior diagnostic performance with an AUC-ROC of 0.96 (±0.02 SD) in cross-validation and 0.97-0.98 on independent test sets. At an optimal threshold of 64%, the model achieved specificity of 88%-98%, sensitivity of 89%-95%, and positive predictive value of 93%-99%.
For complication prediction, Dharma attained a sensitivity of 93% (±0.05 SD) in cross-validation and 96% on the test set, with a negative predictive value of 98%. The model maintained strong performance even in cases where the appendix could not be visualized on ultrasonography (AUC-ROC 0.95, sensitivity 89%, specificity 87% at the threshold of 30%). Conclusion: Dharma is a novel, interpretable machine learning-based clinical decision support tool designed to address the diagnostic challenges of pediatric appendicitis by integrating easily obtainable clinical, laboratory, and radiological data into a unified, real-time predictive framework. Unlike traditional scoring systems and imaging modalities, which may lack specificity or raise safety concerns in children, Dharma demonstrates high accuracy in diagnosing appendicitis and predicting progression from uncomplicated to complicated cases, potentially reducing unnecessary surgeries and CT scans. Its robust performance, even with incomplete imaging data, underscores its utility in resource-limited settings. Delivered through an intuitive, transparent, and interpretable web application, Dharma supports frontline providers--particularly in low- and middle-income settings--in making timely, evidence-based decisions, streamlining patient referrals, and improving clinical outcomes. By bridging critical gaps in current diagnostic and prognostic tools, Dharma offers a practical and accessible 21st-century solution tailored to real-world pediatric surgical care across diverse healthcare contexts. Furthermore, the underlying framework and concepts of Dharma may be adaptable to other clinical challenges beyond pediatric appendicitis, providing a foundation for broader applications of machine learning in healthcare. Author Summary: Accurate diagnosis of pediatric appendicitis remains challenging, with current clinical scores and imaging tests limited by sensitivity, specificity, predictive values, and safety concerns.
We developed Dharma, an interpretable machine learning model that integrates clinical, laboratory, and radiological data to assist in diagnosing appendicitis and predicting its severity in children. Evaluated on a large dataset supplemented by published cases, Dharma demonstrated strong diagnostic and prognostic performance, including in cases with incomplete imaging--making it potentially especially useful in resource-limited settings for early decision-making and streamlined referrals. Available as a web-based tool, it provides real-time support to healthcare providers in making evidence-based decisions that could reduce negative appendectomies while avoiding hazards associated with advanced imaging modalities such as sedation, contrast, or radiation exposure. Furthermore, the open-access concepts and framework underlying Dharma have the potential to address diverse healthcare challenges beyond pediatric appendicitis.

Ultrasound Classification Abdominal Retrospective Clinical In Silico Academic Lab GenAI
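
The abstract above reports sensitivity, specificity, and predictive values at a fixed probability threshold (64% for diagnosis). The sketch below shows how such figures follow from predicted probabilities and true labels once a threshold is chosen. The scores and labels are hypothetical toy data, and the function is not part of the Dharma tool.

```python
def metrics_at_threshold(scores, labels, threshold):
    """Confusion-matrix metrics after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical model outputs and ground truth (1 = appendicitis).
scores = [0.9, 0.7, 0.65, 0.3, 0.2, 0.8, 0.5]
labels = [1, 1, 0, 0, 0, 1, 1]
result = metrics_at_threshold(scores, labels, threshold=0.64)

print(result)
```

Lowering the threshold trades specificity for sensitivity, which matches the lower 30% threshold the abstract reports for cases where the appendix could not be visualized.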
