Predicting hemorrhagic transformation in acute ischemic stroke: a systematic review, meta-analysis, and methodological quality assessment of CT/MRI-based deep learning and radiomics models.

Salimi M, Vadipour P, Bahadori AR, Houshi S, Mirshamsi A, Fatemian H

PubMed · Jun 1, 2025
Acute ischemic stroke (AIS) is a major cause of mortality and morbidity, with hemorrhagic transformation (HT) as a severe complication. Accurate prediction of HT is essential for optimizing treatment strategies. This review assesses the accuracy and clinical utility of deep learning (DL) and radiomics models for predicting HT from imaging in AIS patients. A literature search was conducted across five databases (PubMed, Scopus, Web of Science, Embase, IEEE) up to January 23, 2025. Studies involving DL or radiomics-based machine learning models for predicting HT in AIS patients were included. Data from training, validation, and clinical-combined models were extracted and analyzed separately. Pooled sensitivity, specificity, and AUC were calculated with a random-effects bivariate model. Study quality was assessed with the Methodological Radiomics Score (METRICS) and the QUADAS-2 tool. Sixteen studies comprising 3,083 participants were included in the meta-analysis. The pooled AUC for training cohorts was 0.87, sensitivity 0.80, and specificity 0.85. For validation cohorts, AUC was 0.87, sensitivity 0.81, and specificity 0.86. Clinical-combined models showed an AUC of 0.93, sensitivity 0.84, and specificity 0.89. Moderate to severe heterogeneity was noted and addressed. Deep learning models outperformed radiomics models, while clinical-combined models outperformed deep learning-only and radiomics-only models. The average METRICS score was 62.85%. No publication bias was detected. DL and radiomics models show great potential for predicting HT in AIS patients. However, addressing methodological issues, such as inconsistent reference standards and limited external validation, is essential for the clinical implementation of these models.
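
As a rough illustration of the pooling step described in this abstract, the sketch below applies a DerSimonian-Laird random-effects model to logit-transformed per-study sensitivities. The study itself used a bivariate random-effects model, and the counts here are hypothetical; this is a simplified univariate sketch, not the authors' analysis.

```python
import numpy as np

def pool_logit_random_effects(events, totals):
    """DerSimonian-Laird random-effects pooling of a proportion on the logit
    scale (e.g., per-study sensitivity = TP / (TP + FN))."""
    events = np.asarray(events, dtype=float)
    totals = np.asarray(totals, dtype=float)
    # Continuity correction for zero cells, then logit transform.
    p = (events + 0.5) / (totals + 1.0)
    y = np.log(p / (1 - p))                                   # per-study logit proportion
    v = 1.0 / (events + 0.5) + 1.0 / (totals - events + 0.5)  # within-study variance
    w = 1.0 / v
    y_fixed = np.sum(w * y) / np.sum(w)
    # DerSimonian-Laird estimate of between-study variance (tau^2).
    q = np.sum(w * (y - y_fixed) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    w_re = 1.0 / (v + tau2)
    y_re = np.sum(w_re * y) / np.sum(w_re)
    return 1.0 / (1.0 + np.exp(-y_re))                        # back-transform to a proportion

# Hypothetical per-study counts: true positives and (TP + FN) totals.
tp = [40, 55, 32, 61]
diseased = [50, 70, 40, 75]
print(f"Pooled sensitivity ~ {pool_logit_random_effects(tp, diseased):.3f}")
```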

Structural and metabolic topological alterations associated with butylphthalide treatment in mild cognitive impairment: Data from a randomized, double-blind, placebo-controlled trial.

Han X, Gong S, Gong J, Wang P, Li R, Chen R, Xu C, Sun W, Li S, Chen Y, Yang Y, Luan H, Wen B, Guo J, Lv S, Wei C

PubMed · Jun 1, 2025
Effective intervention for mild cognitive impairment (MCI) is key to preventing dementia. As a neuroprotective agent, butylphthalide has the potential to treat MCI due to Alzheimer disease (AD). However, the pharmacological mechanism of butylphthalide from a brain-network perspective is not clear. We therefore aimed to investigate the multimodal brain network changes associated with butylphthalide treatment in MCI due to AD. A total of 270 patients with MCI due to AD received either butylphthalide or placebo at a 1:1 ratio for 1 year. Effective treatment was defined as a decrease of more than 2.5 points on the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-cog). Brain networks were constructed using T1-weighted magnetic resonance imaging and fluorodeoxyglucose positron emission tomography. A support vector machine was applied to develop predictive models. Both treatment (drug vs. placebo)-by-time and efficacy (effective vs. ineffective)-by-time interactions were detected for several overlapping structural network metrics. Simple-effects analyses revealed significantly increased global efficiency of the structural network with both butylphthalide treatment and effective treatment. Among the overlapping metrics, increased degree centrality of the left paracentral lobule was significantly related to poorer cognitive improvement. The predictive model based on baseline multimodal network metrics exhibited high accuracy (88.93%) in predicting butylphthalide's efficacy. Butylphthalide may restore abnormal organization in the structural networks of patients with MCI due to AD, and baseline network metrics could serve as predictive markers of its therapeutic efficacy. This study was registered in the Chinese Clinical Trial Registry (Registration Number: ChiCTR1800018362; Registration Date: 2018-09-13).
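
A minimal sketch of the kind of predictive model described here, a support vector machine over baseline network metrics, is shown below. The feature matrix and labels are synthetic placeholders, not the trial's data, and the hyperparameters are illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical feature matrix: one row per patient, columns are baseline
# structural/metabolic network metrics (e.g., global efficiency, nodal degree).
rng = np.random.default_rng(0)
X = rng.normal(size=(270, 12))          # 270 patients, 12 network metrics
y = rng.integers(0, 2, size=270)        # 1 = effective response, 0 = ineffective

# Standardize features, then fit an RBF-kernel SVM; estimate accuracy by cross-validation.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"Cross-validated accuracy: {scores.mean():.3f}")
```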

Deep learning driven interpretable and informed decision making model for brain tumour prediction using explainable AI.

Adnan KM, Ghazal TM, Saleem M, Farooq MS, Yeun CY, Ahmad M, Lee SW

PubMed · Jun 1, 2025
Brain tumours are highly complex, particularly when it comes to their initial and accurate diagnosis, as this determines patient prognosis. Conventional methods rely on MRI and CT scans and employ generic machine learning techniques, which are heavily dependent on feature extraction and require human intervention. These methods may fail in complex cases and do not produce human-interpretable results, making it difficult for clinicians to trust the model's predictions. Such limitations prolong the diagnostic process and can negatively impact the quality of treatment. The advent of deep learning has made it a powerful tool for complex image-analysis tasks, such as detecting brain tumours, by learning advanced patterns from images. However, deep learning models are often considered "black box" systems, where the reasoning behind predictions remains unclear. To address this issue, the present study applies Explainable AI (XAI) alongside deep learning for accurate and interpretable brain tumour prediction. XAI enhances model interpretability by identifying key features such as tumour size, location, and texture, which are crucial for clinicians. This helps build their confidence in the model and enables them to make better-informed decisions. In this research, a deep learning model integrated with XAI is proposed to develop an interpretable framework for brain tumour prediction. The model is trained on an extensive dataset comprising imaging and clinical data and achieves a high AUC while leveraging XAI for model explainability and feature selection. The study findings indicate that this approach improves predictive performance, achieving an accuracy of 92.98% and a miss rate of 7.02%. Additionally, interpretability tools such as LIME and Grad-CAM provide clinicians with a clearer understanding of the decision-making process, supporting diagnosis and treatment. This model represents a significant advancement in brain tumour prediction, with the potential to enhance patient outcomes and contribute to the field of neuro-oncology.
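
For readers unfamiliar with Grad-CAM, the sketch below shows the core computation on a generic ResNet backbone. The model, weights, and input are placeholders, not the authors' architecture or preprocessing; in practice the backbone would be a trained tumour classifier.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Randomly initialized ResNet as a stand-in; swap in a trained classifier in practice.
model = models.resnet18(weights=None).eval()
target_layer = model.layer4[-1]          # last convolutional block

activations, gradients = {}, {}
def fwd_hook(_, __, output):
    activations["value"] = output.detach()
def bwd_hook(_, grad_in, grad_out):
    gradients["value"] = grad_out[0].detach()

target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)          # placeholder for a preprocessed MRI slice
logits = model(x)
logits[0, logits.argmax()].backward()    # backprop the top-class score

# Grad-CAM: weight each feature map by its average gradient, sum, then ReLU.
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize heatmap to [0, 1]
print(cam.shape)                          # (1, 1, 224, 224) saliency map over the input
```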

A Large Language Model to Detect Negated Expressions in Radiology Reports.

Su Y, Babore YB, Kahn CE

PubMed · Jun 1, 2025
Natural language processing (NLP) is crucial for accurately extracting information from unstructured text to provide insights for clinical decision-making, quality improvement, and medical research. This study compared the performance of a rule-based NLP system and a medical-domain transformer-based model in detecting negated concepts in radiology reports. Using a corpus of 984 de-identified radiology reports from a large U.S.-based academic health system (1,000 consecutive reports, excluding 16 duplicates), the investigators compared the rule-based medspaCy system and the Clinical Assertion and Negation Classification Bidirectional Encoder Representations from Transformers (CAN-BERT) system in detecting negated expressions of terms from RadLex, the Unified Medical Language System Metathesaurus, and the Radiology Gamuts Ontology. Power analysis determined a sample size of 382 terms to achieve α = 0.05 and a power of 0.8 for McNemar's test; based on an estimate of 15% negated terms, 2,800 randomly selected terms were annotated manually as negated or not negated. Precision, recall, and F1 of the two models were compared using McNemar's test. Of the 2,800 terms, 387 (13.8%) were negated. For negation detection, medspaCy attained a recall of 0.795, precision of 0.356, and F1 of 0.492. CAN-BERT achieved a recall of 0.785, precision of 0.768, and F1 of 0.777. Although recall was not significantly different, CAN-BERT had significantly better precision (χ² = 304.64; p < 0.001). The transformer-based CAN-BERT model detected negated terms in radiology reports with high precision and recall; its precision significantly exceeded that of the rule-based medspaCy system. Use of this system will improve data extraction from textual reports to support information retrieval, AI model training, and discovery of causal relationships.
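
A minimal example of the rule-based approach, assuming medspaCy's default ConText pipeline; the target terms and report text below are illustrative, not the study's lexicons or corpus.

```python
import medspacy
from medspacy.ner import TargetRule

# Build the default medspaCy pipeline, which includes the ConText component
# used for negation/uncertainty detection.
nlp = medspacy.load()
target_matcher = nlp.get_pipe("medspacy_target_matcher")
target_matcher.add([TargetRule("pneumothorax", "FINDING"),
                    TargetRule("pleural effusion", "FINDING")])

doc = nlp("No evidence of pneumothorax. Small right pleural effusion is present.")
for ent in doc.ents:
    # ConText sets the is_negated extension on each matched entity.
    print(ent.text, "-> negated" if ent._.is_negated else "-> affirmed")
```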

Deep Conformal Supervision: Leveraging Intermediate Features for Robust Uncertainty Quantification.

Vahdani AM, Faghani S

PubMed · Jun 1, 2025
Trustworthiness is crucial for artificial intelligence (AI) models in clinical settings, and a fundamental aspect of trustworthy AI is uncertainty quantification (UQ). Conformal prediction, a robust UQ framework, has been receiving increasing attention as a valuable tool for improving model trustworthiness. An area of active research is the calculation of the non-conformity score for conformal prediction. We propose deep conformal supervision (DCS), which leverages the intermediate outputs of deep supervision for non-conformity score calculation, via weighted averaging based on the inverse of the mean calibration error of each stage. We benchmarked our method on two publicly available medical image classification datasets: a pneumonia chest radiography dataset and a preprocessed version of the 2019 RSNA Intracranial Hemorrhage dataset. Our method achieved mean coverage errors of 16e-4 (CI: 1e-4, 41e-4) and 5e-4 (CI: 1e-4, 10e-4), compared to baseline mean coverage errors of 28e-4 (CI: 2e-4, 64e-4) and 21e-4 (CI: 8e-4, 3e-4) on the two datasets, respectively (p < 0.001 on both datasets). Based on our findings, the baseline conformal prediction results already exhibit small coverage errors. However, our method shows a significant improvement in coverage error, particularly noticeable in scenarios involving smaller datasets or smaller acceptable error levels, which are crucial when developing UQ frameworks for healthcare AI applications.
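
The weighting idea can be sketched as follows, assuming per-stage softmax outputs on a calibration set: each supervision stage is weighted by the inverse of its calibration error, and the weighted non-conformity scores feed a standard split-conformal quantile. The data are synthetic and the details are a simplified reading of the abstract, not the authors' implementation.

```python
import numpy as np

def stage_calibration_error(probs, labels, n_bins=10):
    """Expected calibration error of one supervision stage's softmax outputs."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

rng = np.random.default_rng(0)
n_cal, n_classes, n_stages = 500, 3, 4
labels = rng.integers(0, n_classes, n_cal)
# Hypothetical per-stage softmax outputs on the calibration set.
stage_probs = [rng.dirichlet(np.ones(n_classes), n_cal) for _ in range(n_stages)]

# Weight each stage by the inverse of its calibration error, then average its
# non-conformity scores (1 - probability assigned to the true class).
weights = np.array([1.0 / (stage_calibration_error(p, labels) + 1e-8) for p in stage_probs])
weights /= weights.sum()
scores = sum(w * (1.0 - p[np.arange(n_cal), labels]) for w, p in zip(weights, stage_probs))

alpha = 0.1                                  # target 90% coverage
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)
print(f"Conformal threshold: {q:.3f}")       # at test time, include class c if 1 - p_c <= q
```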

Multi-modal large language models in radiology: principles, applications, and potential.

Shen Y, Xu Y, Ma J, Rui W, Zhao C, Heacock L, Huang C

PubMed · Jun 1, 2025
Large language models (LLMs) and multi-modal large language models (MLLMs) represent the cutting edge of artificial intelligence. This review provides a comprehensive overview of their capabilities and potential impact on radiology. Unlike most existing literature reviews, which focus solely on LLMs, this work examines both LLMs and MLLMs, highlighting their potential to support radiology workflows such as report generation, image interpretation, electronic health record (EHR) summarization, differential diagnosis generation, and patient education. By streamlining these tasks, LLMs and MLLMs could reduce radiologist workload, improve diagnostic accuracy, support interdisciplinary collaboration, and ultimately enhance patient care. We also discuss key limitations, such as the limited capacity of current MLLMs to interpret 3D medical images and to integrate information from both image and text data, as well as the lack of effective evaluation methods. Ongoing efforts to address these challenges are introduced.

Preliminary study on detection and diagnosis of focal liver lesions based on a deep learning model using multimodal PET/CT images.

Luo Y, Yang Q, Hu J, Qin X, Jiang S, Liu Y

PubMed · Jun 1, 2025
To develop and validate a deep learning model using multimodal PET/CT imaging for detecting and classifying focal liver lesions (FLL). This study included 185 patients who underwent ¹⁸F-FDG PET/CT imaging at our institution from March 2022 to February 2023. We analyzed serological and imaging data. Liver lesions were segmented on PET and CT, serving as the "reference standard". Deep learning models were trained using PET and CT images to generate predicted segmentations and classify lesion nature. Model performance was evaluated by comparing the predicted segmentations with the reference segmentations, using metrics such as Dice, precision, recall, F1-score, ROC, and AUC, and was compared with physician diagnoses. The study finally included 150 patients: 46 with benign liver nodules, 51 with malignant liver nodules, and 53 with no FLLs. Significant differences were observed among groups for age, AST, ALP, GGT, AFP, CA19-9, and CEA. On the validation set, the Dice coefficient of the model was 0.740. For the normal group, recall was 0.918, precision 0.904, F1-score 0.909, and AUC 0.976. For the benign group, recall was 0.869, precision 0.862, F1-score 0.863, and AUC 0.928. For the malignant group, recall was 0.858, precision 0.914, F1-score 0.883, and AUC 0.979. The model's overall diagnostic performance was between that of junior and senior physicians. This deep learning model demonstrated high sensitivity in detecting FLLs and effectively differentiated between benign and malignant lesions.
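
For reference, the Dice coefficient used to compare predicted and reference segmentations can be computed as in the sketch below; the 3D masks here are synthetic placeholders, not the study's data.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice overlap between a predicted and a reference binary lesion mask."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Hypothetical 3D lesion masks (predicted segmentation vs. reference standard).
rng = np.random.default_rng(0)
reference = rng.random((64, 64, 32)) > 0.95
noise = rng.random(reference.shape) > 0.99        # flip ~1% of voxels to simulate errors
predicted = np.logical_xor(reference, noise)
print(f"Dice ~ {dice_coefficient(predicted, reference):.3f}")
```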

Comparative diagnostic accuracy of ChatGPT-4 and machine learning in differentiating spinal tuberculosis and spinal tumors.

Hu X, Xu D, Zhang H, Tang M, Gao Q

PubMed · Jun 1, 2025
In clinical practice, distinguishing between spinal tuberculosis (STB) and spinal tumors (ST) poses a significant diagnostic challenge. The application of AI-driven large language models (LLMs) shows great potential for improving the accuracy of this differential diagnosis. To evaluate the performance of various machine learning models and ChatGPT-4 in distinguishing between STB and ST. A retrospective cohort study. A total of 143 STB cases and 153 ST cases admitted to Xiangya Hospital, Central South University, from January 2016 to June 2023 were collected. This study incorporates basic patient information, standard laboratory results, serum tumor markers, and comprehensive imaging records, including magnetic resonance imaging (MRI) and computed tomography (CT), for individuals diagnosed with STB and ST. Machine learning techniques and ChatGPT-4 were applied separately to distinguish STB from ST. Six distinct machine learning models, along with ChatGPT-4, were evaluated for their differential diagnostic effectiveness. Among the six machine learning models, the Gradient Boosting Machine (GBM) algorithm demonstrated the highest differential diagnostic efficiency. In the training cohort, the GBM model achieved a sensitivity of 98.84% and a specificity of 100.00% in distinguishing STB from ST. In the testing cohort, its sensitivity was 98.25% and specificity 91.80%. ChatGPT-4 exhibited a sensitivity of 70.37% and a specificity of 90.65% for differential diagnosis. In single-question cases, ChatGPT-4's sensitivity and specificity were 71.67% and 92.55%, respectively, while in re-questioning cases they were 44.44% and 76.92%. The GBM model demonstrates significant value in the differential diagnosis of STB and ST, whereas the diagnostic performance of ChatGPT-4 remains suboptimal.
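
A minimal sketch of a GBM classifier of the type described, with sensitivity and specificity derived from the confusion matrix; the features, labels, and hyperparameters are synthetic stand-ins for the study's clinical, laboratory, and imaging variables.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical tabular features: demographics, laboratory results, tumor markers,
# and imaging-derived variables; label 0 = STB, label 1 = ST.
rng = np.random.default_rng(0)
X = rng.normal(size=(296, 20))
y = rng.integers(0, 2, size=296)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, max_depth=3)
gbm.fit(X_train, y_train)

# Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
tn, fp, fn, tp = confusion_matrix(y_test, gbm.predict(X_test)).ravel()
print(f"Sensitivity: {tp / (tp + fn):.3f}, Specificity: {tn / (tn + fp):.3f}")
```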

"Advances in biomarker discovery and diagnostics for alzheimer's disease".

Bhatia V, Chandel A, Minhas Y, Kushawaha SK

PubMed · Jun 1, 2025
Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by intracellular neurofibrillary tangles of tau protein and extracellular β-amyloid plaques. Early and accurate diagnosis is crucial for effective treatment and management. The purpose of this review is to investigate new technologies that improve diagnostic accuracy while examining the current diagnostic criteria for AD, such as clinical evaluations, cognitive testing, and biomarker-based techniques. A thorough review of the literature was conducted to assess both conventional and contemporary diagnostic methods. Multimodal strategies integrating clinical, imaging, and biochemical evaluations were emphasised. The promise of current developments in biomarker discovery, including mass spectrometry and artificial intelligence, was also examined. Current diagnostic approaches include cerebrospinal fluid (CSF) biomarkers, imaging tools (MRI, PET), cognitive tests, and new blood-based markers. Integrating these technologies into multimodal diagnostic procedures enhances diagnostic accuracy and distinguishes dementia from other conditions. New technologies that hold promise for improving biomarker identification and diagnostic reliability include mass spectrometry and artificial intelligence. Advancements in AD diagnostics underscore the need for accessible, minimally invasive, and cost-effective techniques to facilitate early detection and intervention. The integration of novel technologies with traditional methods may significantly enhance the accuracy and feasibility of AD diagnosis.

Influence of prior probability information on large language model performance in radiological diagnosis.

Fukushima T, Kurokawa R, Hagiwara A, Sonoda Y, Asari Y, Kurokawa M, Kanzawa J, Gonoi W, Abe O

PubMed · Jun 1, 2025
Large language models (LLMs) show promise in radiological diagnosis, but their performance may be affected by the context in which cases are presented. Our purpose was to investigate how providing information about prior probabilities influences the diagnostic performance of an LLM on radiological quiz cases. We analyzed 322 consecutive cases from Radiology's "Diagnosis Please" quiz using Claude 3.5 Sonnet under three conditions: without context (Condition 1), explicitly framed as quiz cases (Condition 2), and framed as primary care cases (Condition 3). Diagnostic accuracy was compared using McNemar's test. The overall accuracy rate improved significantly in Condition 2 compared to Condition 1 (70.2% vs. 64.9%, p = 0.029). Conversely, the accuracy rate decreased significantly in Condition 3 compared to Condition 1 (59.9% vs. 64.9%, p = 0.027). Providing information that may influence prior probabilities significantly affects the diagnostic performance of the LLM on radiological cases. This suggests that LLMs may incorporate Bayesian-like principles and adjust the weighting of their diagnostic responses based on prior information, highlighting the potential to optimize LLM performance in clinical settings by providing relevant contextual information.
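
The paired comparison of accuracy across conditions can be illustrated with McNemar's test as below; the per-case correctness vectors are synthetic placeholders, not the study's results.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical per-case correctness (1 = correct diagnosis) under two prompting
# conditions for the same 322 quiz cases.
rng = np.random.default_rng(0)
cond1 = rng.integers(0, 2, 322)          # e.g., no context provided
cond2 = rng.integers(0, 2, 322)          # e.g., case framed as a quiz

# 2x2 table of concordant/discordant pairs for the paired comparison.
table = np.array([
    [np.sum((cond1 == 1) & (cond2 == 1)), np.sum((cond1 == 1) & (cond2 == 0))],
    [np.sum((cond1 == 0) & (cond2 == 1)), np.sum((cond1 == 0) & (cond2 == 0))],
])
result = mcnemar(table, exact=False, correction=True)
print(f"McNemar chi2 = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```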