Page 35 of 55546 results

Can Large Language Models Challenge CNNs in Medical Image Analysis?

Shibbir Ahmed, Shahnewaz Karim Sakib, Anindya Bijoy Das

arXiv preprint, May 29 2025
This study presents a multimodal AI framework designed for precisely classifying medical diagnostic images. Utilizing publicly available datasets, the proposed system compares the strengths of convolutional neural networks (CNNs) and different large language models (LLMs). This in-depth comparative analysis highlights key differences in diagnostic performance, execution efficiency, and environmental impacts. Model evaluation was based on accuracy, F1-score, average execution time, average energy consumption, and estimated CO2 emissions. The findings indicate that although CNN-based models can outperform various multimodal techniques that incorporate both images and contextual information, applying additional filtering on top of LLMs can lead to substantial performance gains. These findings highlight the transformative potential of multimodal AI systems to enhance the reliability, efficiency, and scalability of medical diagnostics in clinical settings.
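The estimated CO2 emission metric above is typically derived from measured energy consumption. As a minimal sketch, assuming a grid carbon-intensity figure (the 475 gCO2/kWh default below is an illustrative placeholder, not a value from the study):

```python
# Illustrative sketch: converting measured energy use to an estimated
# CO2 emission. The default carbon intensity is an assumed placeholder.

def estimate_co2_grams(energy_kwh: float, grid_gco2_per_kwh: float = 475.0) -> float:
    """Convert energy consumption (kWh) to estimated CO2 emissions (grams)."""
    return energy_kwh * grid_gco2_per_kwh

# Example: a model run consuming 0.02 kWh on an average grid
print(estimate_co2_grams(0.02))  # 9.5 g CO2
```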

Dharma: A novel machine learning framework for pediatric appendicitis--diagnosis, severity assessment and evidence-based clinical decision support.

Thapa, A., Pahari, S., Timilsina, S., Chapagain, B.

medRxiv preprint, May 29 2025
Background: Acute appendicitis remains a challenging diagnosis in pediatric populations, with high rates of misdiagnosis and negative appendectomies despite advances in imaging modalities. Current diagnostic tools, including clinical scoring systems like Alvarado and the Pediatric Appendicitis Score (PAS), lack sufficient sensitivity and specificity, while reliance on CT scans raises concerns about radiation exposure, contrast hazards and sedation in children. Moreover, no established tool effectively predicts progression from uncomplicated to complicated appendicitis, creating a critical gap in clinical decision-making. Objective: To develop and evaluate a machine learning model that integrates clinical, laboratory, and radiological findings for accurate diagnosis and complication prediction in pediatric appendicitis, and to deploy this model as an interpretable web-based tool for clinical decision support. Methods: We analyzed data from 780 pediatric patients (ages 0-18) with suspected appendicitis admitted to Children's Hospital St. Hedwig, Regensburg, between 2016 and 2021. For severity prediction, our dataset was augmented with 430 additional cases from published literature, and only the confirmed cases of acute appendicitis (n=602) were used. After feature selection using statistical methods and recursive feature elimination, we developed a Random Forest model named Dharma, optimized through hyperparameter tuning and cross-validation. Model performance was evaluated on independent test sets and compared with conventional diagnostic tools. Results: Dharma demonstrated superior diagnostic performance with an AUC-ROC of 0.96 (±0.02 SD) in cross-validation and 0.97-0.98 on independent test sets. At an optimal threshold of 64%, the model achieved specificity of 88%-98%, sensitivity of 89%-95%, and positive predictive value of 93%-99%.
For complication prediction, Dharma attained a sensitivity of 93% (±0.05 SD) in cross-validation and 96% on the test set, with a negative predictive value of 98%. The model maintained strong performance even in cases where the appendix could not be visualized on ultrasonography (AUC-ROC 0.95, sensitivity 89%, specificity 87% at a threshold of 30%). Conclusion: Dharma is a novel, interpretable machine learning-based clinical decision support tool designed to address the diagnostic challenges of pediatric appendicitis by integrating easily obtainable clinical, laboratory, and radiological data into a unified, real-time predictive framework. Unlike traditional scoring systems and imaging modalities, which may lack specificity or raise safety concerns in children, Dharma demonstrates high accuracy in diagnosing appendicitis and predicting progression from uncomplicated to complicated cases, potentially reducing unnecessary surgeries and CT scans. Its robust performance, even with incomplete imaging data, underscores its utility in resource-limited settings. Delivered through an intuitive, transparent, and interpretable web application, Dharma supports frontline providers--particularly in low- and middle-income settings--in making timely, evidence-based decisions, streamlining patient referrals, and improving clinical outcomes. By bridging critical gaps in current diagnostic and prognostic tools, Dharma offers a practical and accessible 21st-century solution tailored to real-world pediatric surgical care across diverse healthcare contexts. Furthermore, the underlying framework and concepts of Dharma may be adaptable to other clinical challenges beyond pediatric appendicitis, providing a foundation for broader applications of machine learning in healthcare. Author Summary: Accurate diagnosis of pediatric appendicitis remains challenging, with current clinical scores and imaging tests limited by sensitivity, specificity, predictive values, and safety concerns.
We developed Dharma, an interpretable machine learning model that integrates clinical, laboratory, and radiological data to assist in diagnosing appendicitis and predicting its severity in children. Evaluated on a large dataset supplemented by published cases, Dharma demonstrated strong diagnostic and prognostic performance, including in cases with incomplete imaging--making it potentially especially useful in resource-limited settings for early decision-making and streamlined referrals. Available as a web-based tool, it provides real-time support to healthcare providers in making evidence-based decisions that could reduce negative appendectomies while avoiding hazards associated with advanced imaging modalities such as sedation, contrast, or radiation exposure. Furthermore, the open-access concepts and framework underlying Dharma have the potential to address diverse healthcare challenges beyond pediatric appendicitis.
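The threshold-based operating points reported above (specificity, sensitivity, positive predictive value at a chosen cutoff) can be sketched as follows, on synthetic data rather than the Dharma cohort:

```python
# Minimal sketch of threshold-based evaluation: given predicted
# probabilities and true labels, compute sensitivity, specificity, and
# positive predictive value at a decision threshold (e.g. 0.64).
# The toy data below are synthetic, not from the study.

def confusion_metrics(y_true, y_prob, threshold=0.64):
    tp = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p >= threshold)
    fn = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p < threshold)
    tn = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p < threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p >= threshold)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    return sensitivity, specificity, ppv

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.7, 0.3, 0.2, 0.8, 0.1, 0.65, 0.4]
print(confusion_metrics(y_true, y_prob))  # (0.75, 0.75, 0.75)
```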

Deep Learning-Based Breast Cancer Detection in Mammography: A Multi-Center Validation Study in Thai Population

Isarun Chamveha, Supphanut Chaiyungyuen, Sasinun Worakriangkrai, Nattawadee Prasawang, Warasinee Chaisangmongkon, Pornpim Korpraphong, Voraparee Suvannarerg, Shanigarn Thiravit, Chalermdej Kannawat, Kewalin Rungsinaporn, Suwara Issaragrisil, Payia Chadbunchachai, Pattiya Gatechumpol, Chawiporn Muktabhant, Patarachai Sereerat

arXiv preprint, May 29 2025
This study presents a deep learning system for breast cancer detection in mammography, developed using a modified EfficientNetV2 architecture with enhanced attention mechanisms. The model was trained on mammograms from a major Thai medical center and validated on three distinct datasets: an in-domain test set (9,421 cases), a biopsy-confirmed set (883 cases), and an out-of-domain generalizability set (761 cases) collected from two different hospitals. For cancer detection, the model achieved AUROCs of 0.89, 0.96, and 0.94 on the respective datasets. The system's lesion localization capability, evaluated using metrics including Lesion Localization Fraction (LLF) and Non-Lesion Localization Fraction (NLF), demonstrated robust performance in identifying suspicious regions. Clinical validation through concordance tests showed strong agreement with radiologists: 83.5% classification and 84.0% localization concordance for biopsy-confirmed cases, and 78.1% classification and 79.6% localization concordance for out-of-domain cases. Expert radiologists' acceptance rate also averaged 96.7% for biopsy-confirmed cases and 89.3% for out-of-domain cases. The system achieved a System Usability Scale score of 74.17 for the source hospital and 69.20 for the validation hospitals, indicating good clinical acceptance. These results demonstrate the model's effectiveness in assisting mammogram interpretation, with the potential to enhance breast cancer screening workflows in clinical practice.
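The AUROC figures above follow directly from the metric's probabilistic definition: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch on synthetic scores:

```python
# Sketch of AUROC computed from its rank-statistic definition
# (Mann-Whitney form): P(score_pos > score_neg), with ties counted as 0.5.
# Scores below are synthetic.

def auroc(y_true, y_score):
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auroc([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.8]))  # 0.75
```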

Comparing the Effects of Persistence Barcodes Aggregation and Feature Concatenation on Medical Imaging

Dashti A. Ali, Richard K. G. Do, William R. Jarnagin, Aras T. Asaad, Amber L. Simpson

arXiv preprint, May 29 2025
In medical image analysis, feature engineering plays an important role in the design and performance of machine learning models. Persistent homology (PH), from the field of topological data analysis (TDA), demonstrates robustness and stability to data perturbations and addresses a limitation of traditional feature extraction approaches, where a small change in input can produce a large change in the feature representation. Using PH, we store persistent topological and geometrical features in the form of the persistence barcode, whereby large bars represent global topological features and small bars encapsulate geometrical information of the data. When multiple barcodes are computed from 2D or 3D medical images, two approaches can be used to construct the final topological feature vector in each dimension: aggregating persistence barcodes followed by featurization, or concatenating topological feature vectors derived from each barcode. In this study, we conduct a comprehensive analysis across diverse medical imaging datasets to compare the effects of the two aforementioned approaches on the performance of classification models. The results of this analysis indicate that feature concatenation preserves detailed topological information from individual barcodes, yields better classification performance, and is therefore the preferred approach when conducting similar experiments.
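The two strategies being compared can be sketched on toy barcodes. Here each barcode is a list of (birth, death) bars, and the featurization (sum and maximum of bar persistences) is a simplified stand-in for the paper's featurization, chosen only to make the structural difference visible:

```python
# Two ways to build a feature vector from several persistence barcodes.
# Barcodes are lists of (birth, death) pairs; persistence = death - birth.

def featurize(barcode):
    persistences = [d - b for b, d in barcode]
    return [sum(persistences), max(persistences)]

def aggregate_then_featurize(barcodes):
    # Approach 1: merge all bars into one barcode, then featurize once.
    merged = [bar for bc in barcodes for bar in bc]
    return featurize(merged)

def featurize_then_concatenate(barcodes):
    # Approach 2: featurize each barcode, then concatenate the vectors,
    # preserving per-barcode detail at the cost of a longer vector.
    return [x for bc in barcodes for x in featurize(bc)]

barcodes = [[(0, 4), (1, 3)], [(0, 2)]]
print(aggregate_then_featurize(barcodes))    # [8, 4] -- 2 features
print(featurize_then_concatenate(barcodes))  # [6, 4, 2, 2] -- 4 features
```

Aggregation yields a fixed-length vector regardless of how many barcodes were computed, while concatenation keeps each barcode's statistics separate, which is the property the study links to better classification performance.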

Super-temporal-resolution Photoacoustic Imaging with Dynamic Reconstruction through Implicit Neural Representation in Sparse-view

Youshen Xiao, Yiling Shi, Ruixi Sun, Hongjiang Wei, Fei Gao, Yuyao Zhang

arXiv preprint, May 29 2025
Dynamic Photoacoustic Computed Tomography (PACT) is an important imaging technique for monitoring physiological processes, capable of providing high-contrast images of optical absorption at much greater depths than traditional optical imaging methods. However, practical instrumentation and geometric constraints limit the number of acoustic sensors available around the imaging target, leading to sparsity in sensor data. Traditional photoacoustic (PA) image reconstruction methods, when directly applied to sparse PA data, produce severe artifacts. Additionally, these traditional methods do not consider the inter-frame relationships in dynamic imaging. Temporal resolution is crucial for dynamic photoacoustic imaging, yet it is fundamentally limited by the low repetition rate (e.g., 20 Hz) and high cost of high-power laser technology. Recently, Implicit Neural Representation (INR) has emerged as a powerful deep learning tool for solving inverse problems with sparse data, by characterizing signal properties as continuous functions of their coordinates in an unsupervised manner. In this work, we propose an INR-based method to improve dynamic photoacoustic image reconstruction from sparse views and enhance temporal resolution, using only spatiotemporal coordinates as input. Specifically, the proposed INR represents dynamic photoacoustic images as implicit functions and encodes them into a neural network. The weights of the network are learned solely from the acquired sparse sensor data, without the need for external training datasets or prior images. Benefiting from the strong implicit continuity regularization provided by INR, as well as explicit regularization for low-rank and sparsity, our proposed method outperforms traditional reconstruction methods under two different sparsity conditions, effectively suppressing artifacts and ensuring image quality.
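The core INR idea is that the dynamic image sequence is a continuous function of spatiotemporal coordinates (x, y, t). A common ingredient of such networks, assumed here for illustration since the abstract does not specify the encoding, is mapping each coordinate through sinusoids of increasing frequency before the MLP:

```python
import math

# Sketch of a sinusoidal (positional/Fourier) encoding of spatiotemporal
# coordinates, a typical front end for implicit neural representations.
# The exact encoding used in the paper is not stated; this is illustrative.

def positional_encoding(coord, num_freqs=4):
    """Map a scalar coordinate in [0, 1] to 2*num_freqs sinusoidal features."""
    feats = []
    for k in range(num_freqs):
        feats.append(math.sin((2 ** k) * math.pi * coord))
        feats.append(math.cos((2 ** k) * math.pi * coord))
    return feats

# Encode one spatiotemporal query point; a small MLP would then map this
# feature vector to a pixel intensity at (x, y) and time t.
xyt = (0.25, 0.5, 0.1)
features = [f for c in xyt for f in positional_encoding(c)]
print(len(features))  # 24 features for a 3-coordinate input
```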

ROC Analysis of Biomarker Combinations in Fragile X Syndrome-Specific Clinical Trials: Evaluating Treatment Efficacy via Exploratory Biomarkers

Norris, J. E., Berry-Kravis, E. M., Harnett, M. D., Reines, S. A., Reese, M., Auger, E. K., Outterson, A., Furman, J., Gurney, M. E., Ethridge, L. E.

medRxiv preprint, May 29 2025
Fragile X Syndrome (FXS) is a rare neurodevelopmental disorder caused by a trinucleotide repeat expansion in the 5' untranslated region of the FMR1 gene. FXS is characterized by intellectual disability, anxiety, sensory hypersensitivity, and difficulties with executive function. A recent phase 2 placebo-controlled clinical trial assessing BPN14770, a first-in-class phosphodiesterase 4D allosteric inhibitor, in 30 adult males (age 18-41 years) with FXS demonstrated cognitive improvements on the NIH Toolbox Cognitive Battery in domains related to language, as well as caregiver reports of improvement in both daily functioning and language. However, individual physiological measures from electroencephalography (EEG) demonstrated only marginal significance for trial efficacy. A secondary analysis of resting-state EEG data collected as part of the phase 2 clinical trial evaluating BPN14770 was conducted using a machine learning classification algorithm to classify trial conditions (i.e., baseline, drug, placebo) via linear EEG variable combinations. The algorithm identified a composite of peak alpha frequencies (PAF) across multiple brain regions as a potential biomarker demonstrating BPN14770 efficacy. Increased PAF from baseline was associated with drug but not placebo. Given the relationship between PAF and cognitive function among typically developed adults and those with intellectual disability, as well as previously reported reductions in alpha frequency and power in FXS, PAF represents a potential physiological measure of BPN14770 efficacy.
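Peak alpha frequency itself is a simple quantity: the frequency of maximal power within the alpha band of a resting-state power spectrum. A minimal sketch, assuming the canonical 8-13 Hz band (the paper's exact band edges are not given in the abstract):

```python
# Sketch of peak alpha frequency (PAF): the frequency of maximal spectral
# power within the alpha band. Spectrum values below are synthetic.

def peak_alpha_frequency(freqs, powers, band=(8.0, 13.0)):
    alpha = [(f, p) for f, p in zip(freqs, powers) if band[0] <= f <= band[1]]
    return max(alpha, key=lambda fp: fp[1])[0]

freqs  = [6.0, 8.0, 9.0, 10.0, 11.0, 13.0, 20.0]
powers = [2.0, 3.0, 5.0,  9.0,  6.0,  1.0,  0.5]
print(peak_alpha_frequency(freqs, powers))  # 10.0 Hz
```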

Chest Disease Detection In X-Ray Images Using Deep Learning Classification Method

Alanna Hazlett, Naomi Ohashi, Timothy Rodriguez, Sodiq Adewole

arXiv preprint, May 28 2025
In this work, we investigate the performance of multiple classification models for classifying chest X-ray images into four categories: COVID-19, pneumonia, tuberculosis (TB), and normal cases. We leveraged transfer learning techniques with state-of-the-art pre-trained Convolutional Neural Network (CNN) models, fine-tuning these architectures on labeled medical X-ray images. The initial results are promising, with high accuracy and strong performance on key classification metrics such as precision, recall, and F1 score. We applied Gradient-weighted Class Activation Mapping (Grad-CAM) for model interpretability, providing visual explanations for classification decisions and improving trust and transparency in clinical applications.
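The Grad-CAM step reduces to a simple computation once the gradient-derived channel weights are in hand: a weighted sum of convolutional feature maps, passed through a ReLU. A toy sketch with 2x2 feature maps standing in for real network activations:

```python
# Conceptual sketch of the Grad-CAM combination step: the class activation
# map is a ReLU of the gradient-weighted sum of convolutional feature maps.
# Feature maps and weights below are toy values, not from a trained network.

def grad_cam(feature_maps, weights):
    h = len(feature_maps[0])
    w = len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, alpha in zip(feature_maps, weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU: keep only regions that positively support the target class.
    return [[max(0.0, v) for v in row] for row in cam]

maps = [[[1.0, 0.0], [0.0, 2.0]],
        [[0.0, 1.0], [1.0, 0.0]]]
weights = [0.5, -1.0]
print(grad_cam(maps, weights))  # [[0.5, 0.0], [0.0, 1.0]]
```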

Comparative Analysis of Machine Learning Models for Lung Cancer Mutation Detection and Staging Using 3D CT Scans

Yiheng Li, Francisco Carrillo-Perez, Mohammed Alawad, Olivier Gevaert

arXiv preprint, May 28 2025
Lung cancer is the leading cause of cancer mortality worldwide, and non-invasive methods for detecting key mutations and staging are essential for improving patient outcomes. Here, we compare the performance of two machine learning models - FMCIB+XGBoost, a supervised model with domain-specific pretraining, and Dinov2+ABMIL, a self-supervised model with attention-based multiple-instance learning - on 3D lung nodule data from the Stanford Radiogenomics and Lung-CT-PT-Dx cohorts. In the task of KRAS and EGFR mutation detection, FMCIB+XGBoost consistently outperformed Dinov2+ABMIL, achieving accuracies of 0.846 and 0.883 for KRAS and EGFR mutations, respectively. In cancer staging, Dinov2+ABMIL demonstrated competitive generalization, achieving an accuracy of 0.797 for T-stage prediction in the Lung-CT-PT-Dx cohort, suggesting SSL's adaptability across diverse datasets. Our results emphasize the clinical utility of supervised models in mutation detection and highlight the potential of SSL to improve staging generalization, while identifying areas for enhancement in mutation sensitivity.
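The attention-based multiple-instance learning (ABMIL) head in the Dinov2+ABMIL model pools per-nodule instance embeddings into a single bag representation via learned attention. A minimal sketch with toy scores (in ABMIL the scores come from a small learned network, not hand-set values):

```python
import math

# Sketch of attention-based MIL pooling: per-instance scores are
# softmax-normalized and used to weight instance embeddings into one
# bag embedding. Embeddings and scores below are toy values.

def attention_pool(embeddings, scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]          # softmax over instances
    dim = len(embeddings[0])
    bag = [sum(w * emb[d] for w, emb in zip(weights, embeddings))
           for d in range(dim)]                  # weighted average
    return weights, bag

embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.0, 0.0]
weights, bag = attention_pool(embeddings, scores)
print(round(weights[0], 3))  # the high-scoring instance dominates the bag
```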

Deep Learning-Based BMD Estimation from Radiographs with Conformal Uncertainty Quantification

Long Hui, Wai Lok Yeung

arXiv preprint, May 28 2025
Limited DXA access hinders osteoporosis screening. This proof-of-concept study proposes using widely available knee X-rays for opportunistic Bone Mineral Density (BMD) estimation via deep learning, emphasizing robust uncertainty quantification essential for clinical use. An EfficientNet model was trained on the OAI dataset to predict BMD from bilateral knee radiographs. Two Test-Time Augmentation (TTA) methods were compared: traditional averaging and a multi-sample approach. Crucially, Split Conformal Prediction was implemented to provide statistically rigorous, patient-specific prediction intervals with guaranteed coverage. Results showed a Pearson correlation of 0.68 (traditional TTA). While traditional TTA yielded better point predictions, the multi-sample approach produced slightly tighter confidence intervals (90%, 95%, 99%) while maintaining coverage. The framework appropriately expressed higher uncertainty for challenging cases. Although anatomical mismatch between knee X-rays and standard DXA limits immediate clinical use, this method establishes a foundation for trustworthy AI-assisted BMD screening using routine radiographs, potentially improving early osteoporosis detection.
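Split conformal prediction, the mechanism behind the guaranteed-coverage intervals above, is short enough to sketch directly: absolute residuals on a held-out calibration set define an interval half-width via a conformal quantile. Calibration values below are synthetic, not OAI data:

```python
import math

# Minimal sketch of split conformal prediction for regression: the
# (1 - alpha) conformal quantile of calibration residuals gives a
# half-width with guaranteed marginal coverage on exchangeable data.

def conformal_halfwidth(cal_true, cal_pred, alpha=0.1):
    residuals = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(residuals)
    # Conformal quantile index: ceil((n + 1) * (1 - alpha)), clipped to n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return residuals[k - 1]

def predict_interval(point_pred, halfwidth):
    return (point_pred - halfwidth, point_pred + halfwidth)

cal_true = [0.80, 0.95, 1.10, 0.70, 0.88, 1.02, 0.91, 0.78, 0.99, 1.05]
cal_pred = [0.82, 0.90, 1.05, 0.75, 0.86, 1.00, 0.95, 0.80, 0.97, 1.00]
q = conformal_halfwidth(cal_true, cal_pred, alpha=0.1)  # 90% intervals
print(predict_interval(0.90, q))
```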

Cascaded 3D Diffusion Models for Whole-body 3D 18-F FDG PET/CT synthesis from Demographics

Siyeop Yoon, Sifan Song, Pengfei Jin, Matthew Tivnan, Yujin Oh, Sekeun Kim, Dufan Wu, Xiang Li, Quanzheng Li

arXiv preprint, May 28 2025
We propose a cascaded 3D diffusion model framework to synthesize high-fidelity 3D PET/CT volumes directly from demographic variables, addressing the growing need for realistic digital twins in oncologic imaging, virtual trials, and AI-driven data augmentation. Unlike deterministic phantoms, which rely on predefined anatomical and metabolic templates, our method employs a two-stage generative process. An initial score-based diffusion model synthesizes low-resolution PET/CT volumes from demographic variables alone, providing global anatomical structures and approximate metabolic activity. This is followed by a super-resolution residual diffusion model that refines spatial resolution. Our framework was trained on 18-F FDG PET/CT scans from the AutoPET dataset and evaluated using organ-wise volume and standardized uptake value (SUV) distributions, comparing synthetic and real data between demographic subgroups. The organ-wise comparison demonstrated strong concordance between synthetic and real images. In particular, most deviations in metabolic uptake values remained within 3-5% of the ground truth in subgroup analysis. These findings highlight the potential of cascaded 3D diffusion models to generate anatomically and metabolically accurate PET/CT images, offering a robust alternative to traditional phantoms and enabling scalable, population-informed synthetic imaging for clinical and research applications.
