Latest Papers on Radiology AI. Tags: Classification

Comparing the Effects of Persistence Barcodes Aggregation and Feature Concatenation on Medical Imaging

Dashti A. Ali, Richard K. G. Do, William R. Jarnagin, Aras T. Asaad, Amber L. Simpson

•preprint•May 29 2025

In medical image analysis, feature engineering plays an important role in the design and performance of machine learning models. Persistent homology (PH), from the field of topological data analysis (TDA), demonstrates robustness and stability to data perturbations and addresses the limitation from traditional feature extraction approaches where a small change in input results in a large change in feature representation. Using PH, we store persistent topological and geometrical features in the form of the persistence barcode whereby large bars represent global topological features and small bars encapsulate geometrical information of the data. When multiple barcodes are computed from 2D or 3D medical images, two approaches can be used to construct the final topological feature vector in each dimension: aggregating persistence barcodes followed by featurization or concatenating topological feature vectors derived from each barcode. In this study, we conduct a comprehensive analysis across diverse medical imaging datasets to compare the effects of the two aforementioned approaches on the performance of classification models. The results of this analysis indicate that feature concatenation preserves detailed topological information from individual barcodes, yields better classification performance and is therefore a preferred approach when conducting similar experiments.
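The two construction strategies named in this abstract can be illustrated with a minimal sketch. It assumes each barcode is an array of (birth, death) pairs and uses a hypothetical four-number featurization (bar count, mean, max, and total lifetime); the paper's actual featurization methods are not given here, so `featurize` is only a stand-in.

```python
import numpy as np

def featurize(barcode):
    """Hypothetical featurization: summary statistics of bar lifetimes."""
    bars = np.asarray(barcode, dtype=float).reshape(-1, 2)
    if bars.shape[0] == 0:
        return np.zeros(4)
    lifetimes = bars[:, 1] - bars[:, 0]
    return np.array([len(lifetimes), lifetimes.mean(), lifetimes.max(), lifetimes.sum()])

def aggregate_then_featurize(barcodes):
    """Approach 1: pool all bars into one barcode, then featurize once."""
    merged = np.concatenate(
        [np.asarray(b, dtype=float).reshape(-1, 2) for b in barcodes], axis=0
    )
    return featurize(merged)

def featurize_then_concatenate(barcodes):
    """Approach 2: featurize each barcode separately, then concatenate."""
    return np.concatenate([featurize(b) for b in barcodes])

# Toy barcodes in one homology dimension (made-up values).
barcode_a = [[0.0, 1.0], [0.0, 3.0]]
barcode_b = [[1.0, 2.0]]

aggregated = aggregate_then_featurize([barcode_a, barcode_b])
concatenated = featurize_then_concatenate([barcode_a, barcode_b])
print(aggregated.shape, concatenated.shape)
```

In this sketch the aggregated vector keeps a fixed length regardless of how many barcodes are pooled, while the concatenated vector grows with the number of barcodes and retains per-barcode statistics, which is consistent with the abstract's remark that concatenation preserves information from individual barcodes.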

Mixed Modality Classification Methodology In Silico Academic Lab

Image Aesthetic Reasoning: A New Benchmark for Medical Image Screening with MLLMs

Zheng Sun, Yi Wei, Long Yu

•preprint•May 29 2025

Multimodal Large Language Models (MLLMs) are widely applied across many domains, such as multimodal understanding and generation. With the development of diffusion models (DM) and unified MLLMs, the performance of image generation has improved significantly. However, the study of image screening is rare, and its performance with MLLMs is unsatisfactory due to the lack of data and the weak image aesthetic reasoning ability of MLLMs. In this work, we propose a complete solution to address these problems in terms of data and methodology. For data, we collect a comprehensive medical image screening dataset with 1500+ samples; each sample consists of a medical image, four generated images, and a multiple-choice answer. The dataset evaluates the aesthetic reasoning ability under four aspects: (1) Appearance Deformation, (2) Principles of Physical Lighting and Shadow, (3) Placement Layout, (4) Extension Rationality. For methodology, we utilize long chains of thought (CoT) and Group Relative Policy Optimization with Dynamic Proportional Accuracy reward, called DPA-GRPO, to enhance the image aesthetic reasoning ability of MLLMs. Our experimental results reveal that even state-of-the-art closed-source MLLMs, such as GPT-4o and Qwen-VL-Max, exhibit performance akin to random guessing in image aesthetic reasoning. In contrast, by leveraging the reinforcement learning approach, we are able to surpass the score of both large-scale models and leading closed-source models using a much smaller model. We hope our attempt on medical image screening will serve as a regular configuration in image aesthetic reasoning in the future.

Classification Dataset Release In Silico Academic Lab Benchmark SOTA Open Dataset

Can Large Language Models Challenge CNNs in Medical Image Analysis?

Shibbir Ahmed, Shahnewaz Karim Sakib, Anindya Bijoy Das

•preprint•May 29 2025

This study presents a multimodal AI framework designed for precisely classifying medical diagnostic images. Utilizing publicly available datasets, the proposed system compares the strengths of convolutional neural networks (CNNs) and different large language models (LLMs). This in-depth comparative analysis highlights key differences in diagnostic performance, execution efficiency, and environmental impacts. Model evaluation was based on accuracy, F1-score, average execution time, average energy consumption, and estimated $CO_2$ emission. The findings indicate that although CNN-based models can outperform various multimodal techniques that incorporate both images and contextual information, applying additional filtering on top of LLMs can lead to substantial performance gains. These findings highlight the transformative potential of multimodal AI systems to enhance the reliability, efficiency, and scalability of medical diagnostics in clinical settings.

Mixed Modality Classification Methodology In Silico GenAI

Estimating Head Motion in Structural MRI Using a Deep Neural Network Trained on Synthetic Artifacts

Charles Bricout, Samira Ebrahimi Kahou, Sylvain Bouix

•preprint•May 29 2025

Motion-related artifacts are inevitable in Magnetic Resonance Imaging (MRI) and can bias automated neuroanatomical metrics such as cortical thickness. Manual review cannot objectively quantify motion in anatomical scans, and existing automated approaches often require specialized hardware or rely on unbalanced noisy training data. Here, we train a 3D convolutional neural network to estimate motion severity using only synthetically corrupted volumes. We validate our method with one held-out site from our training cohort and with 14 fully independent datasets, including one with manual ratings, achieving a representative $R^2 = 0.65$ versus manual labels and significant thickness-motion correlations in 12/15 datasets. Furthermore, our predicted motion correlates with subject age in line with prior studies. Our approach generalizes across scanner brands and protocols, enabling objective, scalable motion assessment in structural MRI studies without prospective motion correction.

MRI Classification Neurological Methodology In Silico Academic Lab

Dharma: A novel machine learning framework for pediatric appendicitis--diagnosis, severity assessment and evidence-based clinical decision support.

Thapa, A., Pahari, S., Timilsina, S., Chapagain, B.

•preprint•May 29 2025

Background: Acute appendicitis remains a challenging diagnosis in pediatric populations, with high rates of misdiagnosis and negative appendectomies despite advances in imaging modalities. Current diagnostic tools, including clinical scoring systems like Alvarado and Pediatric Appendicitis Score (PAS), lack sufficient sensitivity and specificity, while reliance on CT scans raises concerns about radiation exposure, contrast hazards and sedation in children. Moreover, no established tool effectively predicts progression from uncomplicated to complicated appendicitis, creating a critical gap in clinical decision-making. Objective: To develop and evaluate a machine learning model that integrates clinical, laboratory, and radiological findings for accurate diagnosis and complication prediction in pediatric appendicitis and to deploy this model as an interpretable web-based tool for clinical decision support. Methods: We analyzed data from 780 pediatric patients (ages 0-18) with suspected appendicitis admitted to Children's Hospital St. Hedwig, Regensburg, between 2016 and 2021. For severity prediction, our dataset was augmented with 430 additional cases from published literature and only the confirmed cases of acute appendicitis (n=602) were used. After feature selection using statistical methods and recursive feature elimination, we developed a Random Forest model named Dharma, optimized through hyperparameter tuning and cross-validation. Model performance was evaluated on independent test sets and compared with conventional diagnostic tools. Results: Dharma demonstrated superior diagnostic performance with an AUC-ROC of 0.96 (±0.02 SD) in cross-validation and 0.97-0.98 on independent test sets. At an optimal threshold of 64%, the model achieved specificity of 88%-98%, sensitivity of 89%-95%, and positive predictive value of 93%-99%.
For complication prediction, Dharma attained a sensitivity of 93% (±0.05 SD) in cross-validation and 96% on the test set, with a negative predictive value of 98%. The model maintained strong performance even in cases where the appendix could not be visualized on ultrasonography (AUC-ROC 0.95, sensitivity 89%, specificity 87% at the threshold of 30%). Conclusion: Dharma is a novel, interpretable machine learning based clinical decision support tool designed to address the diagnostic challenges of pediatric appendicitis by integrating easily obtainable clinical, laboratory, and radiological data into a unified, real-time predictive framework. Unlike traditional scoring systems and imaging modalities, which may lack specificity or raise safety concerns in children, Dharma demonstrates high accuracy in diagnosing appendicitis and predicting progression from uncomplicated to complicated cases, potentially reducing unnecessary surgeries and CT scans. Its robust performance, even with incomplete imaging data, underscores its utility in resource-limited settings. Delivered through an intuitive, transparent, and interpretable web application, Dharma supports frontline providers--particularly in low- and middle-income settings--in making timely, evidence-based decisions, streamlining patient referrals, and improving clinical outcomes. By bridging critical gaps in current diagnostic and prognostic tools, Dharma offers a practical and accessible 21st-century solution tailored to real-world pediatric surgical care across diverse healthcare contexts. Furthermore, the underlying framework and concepts of Dharma may be adaptable to other clinical challenges beyond pediatric appendicitis, providing a foundation for broader applications of machine learning in healthcare. Author Summary: Accurate diagnosis of pediatric appendicitis remains challenging, with current clinical scores and imaging tests limited by sensitivity, specificity, predictive values, and safety concerns.
We developed Dharma, an interpretable machine learning model that integrates clinical, laboratory, and radiological data to assist in diagnosing appendicitis and predicting its severity in children. Evaluated on a large dataset supplemented by published cases, Dharma demonstrated strong diagnostic and prognostic performance, including in cases with incomplete imaging--making it potentially especially useful in resource-limited settings for early decision-making and streamlined referrals. Available as a web-based tool, it provides real-time support to healthcare providers in making evidence-based decisions that could reduce negative appendectomies while avoiding hazards associated with advanced imaging modalities such as sedation, contrast, or radiation exposure. Furthermore, the open-access concepts and framework underlying Dharma have the potential to address diverse healthcare challenges beyond pediatric appendicitis.
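The threshold-dependent figures quoted in this abstract (sensitivity, specificity, positive and negative predictive value) all follow from a confusion matrix at a chosen probability cutoff. The sketch below shows that arithmetic on made-up toy data; `threshold_metrics` and the sample values are illustrative assumptions and are not taken from the Dharma model or its dataset.

```python
import numpy as np

def threshold_metrics(y_true, y_prob, threshold):
    """Binarize predicted probabilities at `threshold` and compute four metrics."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_prob, dtype=float) >= threshold

    tp = int(np.sum(y_pred & y_true))    # diseased, flagged
    tn = int(np.sum(~y_pred & ~y_true))  # healthy, not flagged
    fp = int(np.sum(y_pred & ~y_true))   # healthy, flagged
    fn = int(np.sum(~y_pred & y_true))   # diseased, missed

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Toy labels and predicted probabilities (hypothetical values).
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.5, 0.6, 0.2, 0.1]

at_64 = threshold_metrics(labels, probs, 0.64)
at_30 = threshold_metrics(labels, probs, 0.30)
print(at_64)
print(at_30)
```

Lowering the cutoff in this toy example raises sensitivity and lowers specificity, which is the usual trade-off behind reporting separate thresholds such as the 64% and 30% values mentioned above.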

Ultrasound Classification Abdominal Retrospective Clinical In Silico Academic Lab GenAI

The use of imaging in the diagnosis and treatment of thromboembolic pulmonary hypertension.

Szewczuk K, Dzikowska-Diduch O, Gołębiowski M

•papers•May 29 2025

Chronic thromboembolic pulmonary hypertension (CTEPH) is a potentially life-threatening condition, classified as group 4 pulmonary hypertension (PH), caused by stenosis or occlusion of the pulmonary arteries due to unresolved thromboembolic material. The prognosis for untreated CTEPH patients is poor because it leads to elevated pulmonary artery pressure and right heart failure. Early and accurate diagnosis of CTEPH is crucial because it remains the only form of PH that is potentially curable. However, diagnosing CTEPH is often challenging and frequently delayed or misdiagnosed. This review discusses the current role of multimodal imaging in diagnosing CTEPH, guiding clinical decision-making, and monitoring post-treatment outcomes. The characteristic findings, strengths, and limitations of various imaging modalities, such as computed tomography, ventilation-perfusion lung scintigraphy, digital subtraction pulmonary angiography, and magnetic resonance imaging, are evaluated. Additionally, the role of artificial intelligence in improving the diagnosis and treatment outcomes of CTEPH is explored. Optimal patient assessment and therapeutic decision-making should ideally be conducted in specialized centers by a multidisciplinary team, utilizing data from imaging, pulmonary hemodynamics, and patient comorbidities.

Mixed Modality Classification Chest Review Concept Academic Lab

Menopausal hormone therapy and the female brain: Leveraging neuroimaging and prescription registry data from the UK Biobank cohort.

Barth C, Galea LAM, Jacobs EG, Lee BH, Westlye LT, de Lange AG

•papers•May 29 2025

Menopausal hormone therapy (MHT) is generally thought to be neuroprotective, yet results have been inconsistent. Here, we present a comprehensive study of MHT use and brain characteristics in females from the UK Biobank. A total of 19,846 females with magnetic resonance imaging data were included. Detailed MHT prescription data from primary care records were available for 538. We tested for associations between the brain measures (i.e. gray/white matter brain age, hippocampal volumes, white matter hyperintensity volumes) and MHT user status, age at first and last use, duration of use, formulation, route of administration, dosage, type, and active ingredient. We further tested for the effects of a history of hysterectomy ± bilateral oophorectomy among MHT users and examined associations by APOE ε4 status. Current MHT users, but not past users, showed older gray and white matter brain age, with a difference of up to 9 months, and smaller hippocampal volumes compared to never-users. Longer duration of use and older age at last use post-menopause were associated with older gray and white matter brain age, larger white matter hyperintensity volume, and smaller hippocampal volumes. MHT users with a history of hysterectomy ± bilateral oophorectomy showed younger gray matter brain age relative to MHT users without such history. We found no associations by APOE ε4 status or with other MHT variables. Our results indicate that population-level associations between MHT use and female brain health might vary depending on duration of use and past surgical history.
The authors received funding from the Research Council of Norway (LTW: 223273, 249795, 273345, 298646, 300768), the South-Eastern Norway Regional Health Authority (CB: 2023037, 2022103; LTW: 2018076, 2019101), the European Research Council under the European Union's Horizon 2020 research and innovation program (LTW: 802998), the Swiss National Science Foundation (AMGdL: PZ00P3_193658), the Canadian Institutes for Health Research (LAMG: PJT-173554), the Treliving Family Chair in Women's Mental Health at the Centre for Addiction and Mental Health (LAMG), womenmind at the Centre for Addiction and Mental Health (LAMG, BHL), the Ann S. Bowers Women's Brain Health Initiative (EGJ), and the National Institutes of Health (EGJ: AG063843).

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab

Predicting abnormal fetal growth using deep learning.

Mikołaj KW, Christensen AN, Taksøe-Vester CA, Feragen A, Petersen OB, Lin M, Nielsen M, Svendsen MBS, Tolsgaard MG

•papers•May 29 2025

Ultrasound assessment of fetal size and growth is the mainstay of monitoring fetal well-being during pregnancy, as being small for gestational age (SGA) or large for gestational age (LGA) poses significant risks for both the fetus and the mother. This study aimed to enhance the prediction accuracy of abnormal fetal growth. We developed a deep learning model, trained on a dataset of 433,096 ultrasound images derived from 94,538 examinations conducted on 65,752 patients. The deep learning model performed significantly better in detecting both SGA (58% vs 70%) and LGA (41% vs 55%) compared with the current clinical standard, the Hadlock formula, p < 0.001. Additionally, the model estimates were significantly less biased across all demographic and technical variables compared to the Hadlock formula. Incorporating key anatomical features such as cortical structures, liver texture, and skin thickness was likely to be responsible for the improved prediction accuracy observed.

Ultrasound Classification Abdominal Retrospective Clinical In Silico Academic Lab

DeepChest: Dynamic Gradient-Free Task Weighting for Effective Multi-Task Learning in Chest X-ray Classification

Youssef Mohamed, Noran Mohamed, Khaled Abouhashad, Feilong Tang, Sara Atito, Shoaib Jameel, Imran Razzak, Ahmed B. Zaky

•preprint•May 29 2025

While Multi-Task Learning (MTL) offers inherent advantages in complex domains such as medical imaging by enabling shared representation learning, effectively balancing task contributions remains a significant challenge. This paper addresses this critical issue by introducing DeepChest, a novel, computationally efficient and effective dynamic task-weighting framework specifically designed for multi-label chest X-ray (CXR) classification. Unlike existing heuristic or gradient-based methods that often incur substantial overhead, DeepChest leverages a performance-driven weighting mechanism based on effective analysis of task-specific loss trends. Given a network architecture (e.g., ResNet18), our model-agnostic approach adaptively adjusts task importance without requiring gradient access, thereby significantly reducing memory usage and achieving a threefold increase in training speed. It can be easily applied to improve various state-of-the-art methods. Extensive experiments on a large-scale CXR dataset demonstrate that DeepChest not only outperforms state-of-the-art MTL methods by 7% in overall accuracy but also yields substantial reductions in individual task losses, indicating improved generalization and effective mitigation of negative transfer. The efficiency and performance gains of DeepChest pave the way for more practical and robust deployment of deep learning in critical medical diagnostic applications. The code is publicly available at https://github.com/youssefkhalil320/DeepChest-MTL
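To make the idea of gradient-free weighting from loss trends concrete, here is a minimal sketch. It is not DeepChest's rule, which the abstract does not specify; it is a generic ratio-and-softmax scheme in the spirit of Dynamic Weight Average, and the function name `loss_trend_weights` and the `temperature` parameter are assumptions made for illustration.

```python
import numpy as np

def loss_trend_weights(prev_losses, curr_losses, temperature=2.0):
    """Weight tasks by how slowly their loss is falling, using loss values only.

    Illustrative rule (not the published DeepChest method): a task whose
    current loss is high relative to its previous loss gets a larger weight.
    Weights are softmax-normalized and scaled to sum to the number of tasks.
    """
    prev = np.asarray(prev_losses, dtype=float)
    curr = np.asarray(curr_losses, dtype=float)

    # Ratio near 1 means little progress; ratio well below 1 means fast progress.
    trend = curr / np.maximum(prev, 1e-12)

    logits = trend / temperature
    logits = logits - logits.max()  # stabilize the exponentials
    exp_logits = np.exp(logits)
    return len(curr) * exp_logits / exp_logits.sum()

# Three hypothetical tasks: the first improves quickly, the third has stalled.
weights = loss_trend_weights([1.0, 1.0, 1.0], [0.5, 0.9, 1.0])
print(weights)

# The weighted training objective would then be sum(weights[k] * task_loss[k]).
```

Because the rule reads only scalar loss values from previous epochs, it needs no per-task gradients, which is the property the abstract credits for the reduced memory use and faster training.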

X-Ray Classification Chest Methodology In Silico Open Code

