Page 150 of 170 (1699 results)

MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks

Yinghao Zhu, Ziyi He, Haoran Hu, Xiaochen Zheng, Xichen Zhang, Zixiang Wang, Junyi Gao, Liantao Ma, Lequan Yu

arXiv preprint, May 18 2025
The rapid advancement of Large Language Models (LLMs) has stimulated interest in multi-agent collaboration for addressing complex medical tasks. However, the practical advantages of multi-agent collaboration approaches remain insufficiently understood. Existing evaluations often lack generalizability, failing to cover diverse tasks reflective of real-world clinical practice, and frequently omit rigorous comparisons against both single-LLM-based and established conventional methods. To address this critical gap, we introduce MedAgentBoard, a comprehensive benchmark for the systematic evaluation of multi-agent collaboration, single-LLM, and conventional approaches. MedAgentBoard encompasses four diverse medical task categories: (1) medical (visual) question answering, (2) lay summary generation, (3) structured Electronic Health Record (EHR) predictive modeling, and (4) clinical workflow automation, across text, medical images, and structured EHR data. Our extensive experiments reveal a nuanced landscape: while multi-agent collaboration demonstrates benefits in specific scenarios, such as enhancing task completeness in clinical workflow automation, it does not consistently outperform advanced single LLMs (e.g., in textual medical QA) or, critically, specialized conventional methods that generally maintain better performance in tasks like medical VQA and EHR-based prediction. MedAgentBoard offers a vital resource and actionable insights, emphasizing the necessity of a task-specific, evidence-based approach to selecting and developing AI solutions in medicine. It underscores that the inherent complexity and overhead of multi-agent collaboration must be carefully weighed against tangible performance gains. All code, datasets, detailed prompts, and experimental results are open-sourced at https://medagentboard.netlify.app/.

Harnessing Artificial Intelligence for Accurate Diagnosis and Radiomics Analysis of Combined Pulmonary Fibrosis and Emphysema: Insights from a Multicenter Cohort Study

Zhang, S., Wang, H., Tang, H., Li, X., Wu, N.-W., Lang, Q., Li, B., Zhu, H., Chen, X., Chen, K., Xie, B., Zhou, A., Mo, C.

medRxiv preprint, May 18 2025
Combined Pulmonary Fibrosis and Emphysema (CPFE), formally recognized as a distinct pulmonary syndrome in 2022, is characterized by unique clinical features and pathogenesis that may lead to respiratory failure and death. However, the diagnosis of CPFE presents significant challenges that hinder effective treatment. Here, we assembled three-dimensional (3D) reconstruction data of the chest High-Resolution Computed Tomography (HRCT) of patients from multiple hospitals across different provinces in China, including Xiangya Hospital, West China Hospital, and Fujian Provincial Hospital. Using this dataset, we developed CPFENet, a deep learning-based diagnostic model for CPFE. It accurately differentiates CPFE from COPD, with performance comparable to that of professional radiologists. Additionally, we developed a CPFE score based on radiomic analysis of 3D CT images to quantify disease characteristics. Notably, female patients demonstrated significantly higher CPFE scores than males, suggesting potential sex-specific differences in CPFE. Overall, our study establishes the first diagnostic framework for CPFE, providing a diagnostic model and clinical indicators that enable accurate classification and characterization of the syndrome.

Deep learning feature-based model for predicting lymphovascular invasion in urothelial carcinoma of bladder using CT images.

Xiao B, Lv Y, Peng C, Wei Z, Xv Q, Lv F, Jiang Q, Liu H, Li F, Xv Y, He Q, Xiao M

PubMed paper, May 18 2025
Lymphovascular invasion significantly impacts the prognosis of urothelial carcinoma of the bladder. Traditional lymphovascular invasion detection methods are time-consuming and costly. This study aims to develop a deep learning-based model to preoperatively predict lymphovascular invasion status in urothelial carcinoma of the bladder using CT images. Data and CT images of 577 patients across four medical centers were retrospectively collected. The largest tumor slices from the transverse, coronal, and sagittal planes were selected and used to train CNN models (InceptionV3, DenseNet121, ResNet18, ResNet34, ResNet50, and VGG11). Deep learning features were extracted and visualized using Grad-CAM. Principal Component Analysis reduced the features to 64. Using the extracted features, Decision Tree, XGBoost, and LightGBM models were trained with 5-fold cross-validation and ensembled in a stacking model. Clinical risk factors were identified through logistic regression analyses and combined with DL scores to enhance lymphovascular invasion prediction accuracy. The ResNet50-based model achieved an AUC of 0.818 in the validation set and 0.708 in the testing set. The combined model showed an AUC of 0.794 in the validation set and 0.767 in the testing set, demonstrating robust performance across diverse data. We developed a robust radiomics model based on deep learning features from CT images to preoperatively predict lymphovascular invasion status in urothelial carcinoma of the bladder. This model offers a non-invasive, cost-effective tool to assist clinicians in personalized treatment planning.
We developed a deep learning feature-based stacking model to predict lymphovascular invasion in patients with urothelial carcinoma of the bladder using CT. The maximum cross sections in three planes of the CT image were used to train the CNN models. We compared six CNN networks, including ResNet50.

ChatGPT-4-Driven Liver Ultrasound Radiomics Analysis: Advantages and Drawbacks Compared to Traditional Techniques.

Sultan L, Venkatakrishna SSB, Anupindi S, Andronikou S, Acord M, Otero H, Darge K, Sehgal C, Holmes J

PubMed paper, May 18 2025
Artificial intelligence (AI) is transforming medical imaging, with large language models such as ChatGPT-4 emerging as potential tools for automated image interpretation. While AI-driven radiomics has shown promise in diagnostic imaging, the efficacy of ChatGPT-4 in liver ultrasound analysis remains largely unexamined. This study evaluates the capability of ChatGPT-4 in liver ultrasound radiomics, specifically its ability to differentiate fibrosis, steatosis, and normal liver tissue, compared to conventional image analysis software. Seventy grayscale ultrasound images from a preclinical liver disease model, including fibrosis (n=31), fatty liver (n=18), and normal liver (n=21), were analyzed. ChatGPT-4 extracted texture features, which were compared to those obtained using Interactive Data Language (IDL), a traditional image analysis software. One-way ANOVA was used to identify statistically significant features differentiating liver conditions, and logistic regression models were employed to assess diagnostic performance. ChatGPT-4 extracted nine key textural features (echo intensity, heterogeneity, skewness, kurtosis, contrast, homogeneity, dissimilarity, angular second moment, and entropy), all of which differed significantly across liver conditions (p < 0.05). Among individual features, echo intensity achieved the highest F1-score (0.85). When the features were combined, ChatGPT-4 attained 76% accuracy and 83% sensitivity in classifying liver disease. ROC analysis demonstrated strong discriminatory performance, with AUC values of 0.75 for fibrosis, 0.87 for normal liver, and 0.97 for steatosis. Compared to the IDL image analysis software, ChatGPT-4 exhibited slightly lower sensitivity (0.83 vs. 0.89) but showed moderate correlation (R = 0.68, p < 0.0001) with IDL-derived features. However, it significantly outperformed IDL in processing efficiency, reducing analysis time by 40%, highlighting its potential for high-throughput radiomic analysis.
Despite slightly lower sensitivity than IDL, ChatGPT-4 demonstrated high feasibility for ultrasound radiomics, offering faster processing, high-throughput analysis, and automated multi-image evaluation. These findings support its potential integration into AI-driven imaging workflows, with further refinements needed to enhance feature reproducibility and diagnostic accuracy.

A self-supervised multimodal deep learning approach to differentiate post-radiotherapy progression from pseudoprogression in glioblastoma.

Gomaa A, Huang Y, Stephan P, Breininger K, Frey B, Dörfler A, Schnell O, Delev D, Coras R, Donaubauer AJ, Schmitter C, Stritzelberger J, Semrau S, Maier A, Bayer S, Schönecker S, Heiland DH, Hau P, Gaipl US, Bert C, Fietkau R, Schmidt MA, Putz F

PubMed paper, May 17 2025
Accurate differentiation of pseudoprogression (PsP) from True Progression (TP) following radiotherapy (RT) in glioblastoma patients is crucial for optimal treatment planning. However, this task remains challenging due to the overlapping imaging characteristics of PsP and TP. This study therefore proposes a multimodal deep-learning approach utilizing complementary information from routine anatomical MR images, clinical parameters, and RT treatment planning information for improved predictive accuracy. The approach utilizes a self-supervised Vision Transformer (ViT) to encode multi-sequence MR brain volumes to effectively capture both global and local context from the high dimensional input. The encoder is trained in a self-supervised upstream task on unlabeled glioma MRI datasets from the open BraTS2021, UPenn-GBM, and UCSF-PDGM datasets (n = 2317 MRI studies) to generate compact, clinically relevant representations from FLAIR and T1 post-contrast sequences. These encoded MR inputs are then integrated with clinical data and RT treatment planning information through guided cross-modal attention, improving progression classification accuracy. This work was developed using two datasets from different centers: the Burdenko Glioblastoma Progression Dataset (n = 59) for training and validation, and the GlioCMV progression dataset from the University Hospital Erlangen (UKER) (n = 20) for testing. The proposed method achieved competitive performance, with an AUC of 75.3%, outperforming the current state-of-the-art data-driven approaches. Importantly, the proposed approach relies solely on readily available anatomical MRI sequences, clinical data, and RT treatment planning information, enhancing its clinical feasibility. The proposed approach addresses the challenge of limited data availability for PsP and TP differentiation and could allow for improved clinical decision-making and optimized treatment plans for glioblastoma patients.

Computational modeling of breast tissue mechanics and machine learning in cancer diagnostics: enhancing precision in risk prediction and therapeutic strategies.

Ashi L, Taurin S

PubMed paper, May 17 2025
Breast cancer remains a significant global health issue. Despite advances in detection and treatment, its complexity is driven by genetic, environmental, and structural factors. Computational methods like Finite Element Modeling (FEM) have transformed our understanding of breast cancer risk and progression. The focus here is on advanced computational approaches in breast cancer research, with an emphasis on FEM's role in simulating breast tissue mechanics and enhancing precision in therapies such as radiofrequency ablation (RFA). Machine learning (ML), particularly Convolutional Neural Networks (CNNs), has revolutionized the analysis of imaging modalities like mammograms and MRIs, improving diagnostic accuracy and early detection. AI applications in analyzing histopathological images have advanced tumor classification and grading, offering consistency and reducing inter-observer variability. Explainability tools like Grad-CAM, SHAP, and LIME enhance the transparency of AI-driven models, facilitating their integration into clinical workflows. Integrating FEM and ML represents a paradigm shift in breast cancer management. FEM offers precise modeling of tissue mechanics, while ML excels in predictive analytics and image analysis. Despite challenges such as data variability and limited standardization, synergizing these approaches promises adaptive, personalized care. These computational methods have the potential to redefine diagnostics, optimize treatment, and improve patient outcomes.

Feasibility of improving vocal fold pathology image classification with synthetic images generated by DDPM-based GenAI: a pilot study.

Khazrak I, Zainaee S, M Rezaee M, Ghasemi M, C Green R

PubMed paper, May 17 2025
Voice disorders (VD) are often linked to vocal fold structural pathologies (VFSP). Laryngeal imaging plays a vital role in assessing VFSPs and VD in clinical and research settings, but challenges like scarce and imbalanced datasets can limit the generalizability of findings. Denoising Diffusion Probabilistic Models (DDPMs), a subtype of Generative AI, have gained attention for their ability to generate high-quality and realistic synthetic images to address these challenges. This study explores the feasibility of improving VFSP image classification by generating synthetic images using DDPMs. A total of 404 laryngoscopic images depicting VF without and with VFSP were included. DDPMs were used to generate synthetic images to augment the original dataset. Two convolutional neural network architectures, VGG16 and ResNet50, were applied for model training. The models were initially trained only on the original dataset. Then, they were trained on the augmented datasets. Evaluation metrics were analyzed to assess the performance of the models for both binary classification (with/without VFSPs) and multi-class classification (seven specific VFSPs). Realistic and high-quality synthetic images were generated for dataset augmentation. The models initially failed to converge when trained only on the original dataset, but they successfully converged and achieved low loss and high accuracy when trained on the augmented datasets. The best performance for both binary and multi-class classification was obtained when the models were trained on an augmented dataset. Generating realistic images of VFSP using DDPMs is feasible, can enhance the classification of VFSPs by an AI model, and may support VD screening and diagnosis.

ML-Driven Alzheimer's disease prediction: A deep ensemble modeling approach.

Jumaili MLF, Sonuç E

PubMed paper, May 17 2025
Alzheimer's disease (AD) is a progressive neurological disorder characterized by cognitive decline due to brain cell death, typically manifesting later in life. Early and accurate detection is critical for effective disease management and treatment. This study proposes an ensemble learning framework that combines five deep learning architectures (VGG16, VGG19, ResNet50, InceptionV3, and EfficientNetB7) to improve the accuracy of AD diagnosis. We use a comprehensive dataset of 3,714 MRI brain scans collected from specialized clinics in Iraq, categorized into three classes: NonDemented (834 images), MildDemented (1,824 images), and VeryDemented (1,056 images). The proposed voting ensemble model achieves a diagnostic accuracy of 99.32% on our dataset. The effectiveness of the model is further validated on two external datasets: OASIS (achieving 86.6% accuracy) and ADNI (achieving 99.5% accuracy), demonstrating competitive performance compared to existing approaches. Moreover, the proposed model exhibits high precision and recall across all stages of dementia, providing a reliable and robust tool for early AD detection. This study highlights the effectiveness of ensemble learning in AD diagnosis and shows promise for clinical applications.
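
The abstract does not say whether the voting ensemble uses hard (majority) or soft (probability-averaged) voting. As a minimal sketch, assuming hard voting, the snippet below combines one predicted label per architecture for a single scan. The `majority_vote` helper, the tie-breaking rule, and the per-model outputs are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most models.

    Ties are broken in favour of the label that appears first in
    `predictions` (an arbitrary choice made for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical per-model outputs for one MRI scan (not data from the paper).
votes = {
    "VGG16": "MildDemented",
    "VGG19": "MildDemented",
    "ResNet50": "VeryDemented",
    "InceptionV3": "MildDemented",
    "EfficientNetB7": "NonDemented",
}

ensemble_label = majority_vote(list(votes.values()))
print(ensemble_label)
```

With an odd number of models and three classes, ties are still possible (for example 2-2-1), which is why the sketch fixes an explicit tie-breaking rule.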

Prediction of cervical spondylotic myelopathy from a plain radiograph using deep learning with convolutional neural networks.

Tachi H, Kokabu T, Suzuki H, Ishikawa Y, Yabu A, Yanagihashi Y, Hyakumachi T, Shimizu T, Endo T, Ohnishi T, Ukeba D, Sudo H, Yamada K, Iwasaki N

PubMed paper, May 17 2025
This study aimed to develop deep learning algorithms (DLAs) utilising convolutional neural networks (CNNs) to classify cervical spondylotic myelopathy (CSM) and cervical spondylotic radiculopathy (CSR) from plain cervical spine radiographs. Data from 300 patients (150 with CSM and 150 with CSR) were used for internal validation (IV) using a five-fold cross-validation strategy. Additionally, 100 patients (50 with CSM and 50 with CSR) were included in the external validation (EV). Two DLAs were trained using CNNs on plain radiographs from C3-C6: one for the binary classification of CSM and CSR, and one for the prediction of the spinal canal area rate measured on magnetic resonance imaging. Model performance was evaluated on external data using metrics such as area under the curve (AUC), accuracy, and likelihood ratios. For the binary classification, the AUC ranged from 0.84 to 0.96, with accuracy between 78% and 95% during IV. In the EV, the AUC and accuracy were 0.96 and 90%, respectively. For the spinal canal area rate, correlation coefficients during five-fold cross-validation ranged from 0.57 to 0.64, with a mean correlation of 0.61 observed in the EV. DLAs developed with CNNs demonstrated promising accuracy for classifying CSM and CSR from plain radiographs. These algorithms have the potential to assist non-specialists in identifying patients who require further evaluation or referral to spine specialists, thereby reducing delays in the diagnosis and treatment of CSM.
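
The abstract lists likelihood ratios among its evaluation metrics but does not report them. As a rough illustration of the standard definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity, the sketch below computes both. The `likelihood_ratios` helper and the input values are hypothetical, and treating CSM as the positive class is an assumption.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    A zero denominator yields infinity.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    lr_pos = sensitivity / (1.0 - specificity) if specificity < 1.0 else math.inf
    lr_neg = (1.0 - sensitivity) / specificity if specificity > 0.0 else math.inf
    return lr_pos, lr_neg


# Hypothetical values, not results from the paper.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.9, specificity=0.8)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

An LR+ well above 1 means a positive prediction raises the odds of the condition. An LR- close to 0 means a negative prediction lowers them substantially.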

Development of a deep-learning algorithm for etiological classification of subarachnoid hemorrhage using non-contrast CT scans.

Chen L, Wang X, Li Y, Bao Y, Wang S, Zhao X, Yuan M, Kang J, Sun S

PubMed paper, May 17 2025
This study aims to develop a deep learning algorithm for differentiating aneurysmal subarachnoid hemorrhage (aSAH) from non-aneurysmal subarachnoid hemorrhage (naSAH) using non-contrast computed tomography (NCCT) scans. This retrospective study included 618 patients diagnosed with SAH. The dataset was divided into a training and internal validation cohort (533 cases: aSAH = 305, naSAH = 228) and an external test cohort (85 cases: aSAH = 55, naSAH = 30). Hemorrhage regions were automatically segmented using a U-Net++ architecture. A ResNet-based deep learning model was trained to classify the etiology of SAH. The model achieved robust performance in distinguishing aSAH from naSAH. In the internal validation cohort, it yielded an average sensitivity of 0.898, specificity of 0.877, accuracy of 0.889, Matthews correlation coefficient (MCC) of 0.777, and an area under the curve (AUC) of 0.948 (95% CI: 0.929-0.967). In the external test cohort, the model demonstrated an average sensitivity of 0.891, specificity of 0.880, accuracy of 0.887, MCC of 0.761, and AUC of 0.914 (95% CI: 0.889-0.940), outperforming junior radiologists (average accuracy: 0.836; MCC: 0.660). The study presents a deep learning architecture capable of accurately identifying SAH etiology from NCCT scans. The model's high diagnostic performance highlights its potential to support rapid and precise clinical decision-making in emergency settings.
Question: Differentiating aneurysmal from naSAH is crucial for timely treatment, yet existing imaging modalities are not universally accessible or convenient for rapid diagnosis.
Findings: A ResNet-variant-based deep learning model utilizing non-contrast CT scans demonstrated high accuracy in classifying SAH etiology and enhanced junior radiologists' diagnostic performance.
Clinical relevance: AI-driven analysis of non-contrast CT scans provides a fast, cost-effective, and non-invasive solution for preoperative SAH diagnosis. This approach facilitates early identification of patients needing aneurysm surgery while minimizing unnecessary angiography in non-aneurysmal cases, enhancing clinical workflow efficiency.
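
The metrics reported above (sensitivity, specificity, accuracy, MCC) all derive from a binary confusion matrix. The sketch below shows those standard formulas, treating aSAH as the positive class. The `confusion_metrics` helper and the counts are hypothetical: the counts only match the external cohort sizes (55 aSAH, 30 naSAH) and are not the paper's results, which are reported as averages.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, accuracy and MCC from counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    MCC is set to 0.0 when its denominator is zero, a common convention.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for an 85-case cohort (55 positive, 30 negative).
metrics = confusion_metrics(tp=49, fp=4, tn=26, fn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are imbalanced, as they are in both cohorts here.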