Page 248 of 3853843 results

Evaluating Explainability: A Framework for Systematic Assessment and Reporting of Explainable AI Features

Miguel A. Lago, Ghada Zamzmi, Brandon Eich, Jana G. Delfino

arXiv preprint · Jun 16, 2025
Explainability features are intended to provide insight into the internal mechanisms of an AI device, but evaluation techniques for assessing the quality of the explanations they provide are lacking. We propose a framework to assess and report explainable AI features. Our evaluation framework for AI explainability is based on four criteria: 1) Consistency quantifies the variability of explanations across similar inputs, 2) Plausibility estimates how close the explanation is to the ground truth, 3) Fidelity assesses the alignment between the explanation and the model's internal mechanisms, and 4) Usefulness evaluates the explanation's impact on task performance. Finally, we developed a scorecard for AI explainability methods that serves as a complete description and evaluation to accompany this type of algorithm. We describe these four criteria and give examples of how they can be evaluated. As a case study, we use Ablation-CAM and Eigen-CAM to illustrate the evaluation of explanation heatmaps for the detection of breast lesions in synthetic mammograms. The first three criteria are evaluated for clinically relevant scenarios. Our proposed framework establishes criteria through which the quality of explanations provided by AI models can be evaluated. We intend for our framework to spark a dialogue regarding the value provided by explainability features and to help improve the development and evaluation of AI-based medical devices.
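The Consistency criterion, for example, can be estimated by re-running the explainer on slightly perturbed copies of an input and measuring how similar the resulting heatmaps are. A minimal sketch, using a toy linear "explainer" and synthetic data rather than Ablation-CAM or Eigen-CAM (all names and parameters here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def consistency_score(explain_fn, x, n_perturb=10, noise=0.01, seed=0):
    """Estimate the Consistency criterion: how stable an explanation
    heatmap is under small input perturbations (1.0 = identical maps).
    `explain_fn` maps an input array to a saliency heatmap."""
    rng = np.random.default_rng(seed)
    base = explain_fn(x).ravel()
    sims = []
    for _ in range(n_perturb):
        x_p = x + noise * rng.standard_normal(x.shape)
        h = explain_fn(x_p).ravel()
        # cosine similarity between flattened heatmaps
        sims.append(np.dot(base, h) /
                    (np.linalg.norm(base) * np.linalg.norm(h) + 1e-12))
    return float(np.mean(sims))

# Toy "explainer": elementwise saliency from a fixed positive weight map.
w = np.abs(np.random.default_rng(1).standard_normal((8, 8)))
score = consistency_score(lambda img: w * img, np.ones((8, 8)))
```

A real evaluation would replace the lambda with a CAM-style explainer and use clinically meaningful perturbations (e.g., small translations or noise at scanner-realistic levels).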

Integration of MRI radiomics and germline genetics to predict the IDH mutation status of gliomas.

Nakase T, Henderson GA, Barba T, Bareja R, Guerra G, Zhao Q, Francis SS, Gevaert O, Kachuri L

PubMed · Jun 16, 2025
The molecular profiling of gliomas for isocitrate dehydrogenase (IDH) mutations currently relies on resected tumor samples, highlighting the need for non-invasive, preoperative biomarkers. We investigated the integration of glioma polygenic risk scores (PRS) and radiographic features for prediction of IDH mutation status. We used 256 radiomic features, a glioma PRS, and demographic information from 158 glioma cases as inputs to elastic net and neural network models. The integration of the glioma PRS with radiomics increased the area under the receiver operating characteristic curve (AUC) for distinguishing IDH-wildtype vs. IDH-mutant glioma from 0.83 to 0.88 (P<sub>ΔAUC</sub> = 6.9 × 10<sup>-5</sup>) in the elastic net model and from 0.91 to 0.92 (P<sub>ΔAUC</sub> = 0.32) in the neural network model. Incorporating age at diagnosis and sex further improved the classifiers (elastic net: AUC = 0.93; neural network: AUC = 0.93). Patients predicted to have IDH-mutant vs. IDH-wildtype tumors had significantly lower mortality risk (hazard ratio (HR) = 0.18, 95% CI: 0.08-0.40, P = 2.1 × 10<sup>-5</sup>), comparable to prognostic trajectories for biopsy-confirmed IDH status. The augmentation of imaging-based classifiers with genetic risk profiles may help delineate molecular subtypes and improve the timely, non-invasive clinical assessment of glioma patients.
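The core comparison (whether appending a PRS column to radiomic features raises the AUC of an elastic net classifier) can be sketched on synthetic data. The feature counts mirror the abstract, but the data, effect sizes, and hyperparameters below are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 158 cases, 256 radiomic features plus one PRS column.
rng = np.random.default_rng(0)
n = 158
radiomics = rng.standard_normal((n, 256))
prs = rng.standard_normal(n)
# Simulated IDH status: weak radiomic signal, strong PRS signal.
logit = radiomics[:, :5].sum(axis=1) + 2.0 * prs
y = (logit + rng.standard_normal(n) > 0).astype(int)

def auc_for(X):
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0, stratify=y)
    clf = LogisticRegression(penalty="elasticnet", solver="saga",
                             l1_ratio=0.5, C=1.0, max_iter=5000).fit(Xtr, ytr)
    return roc_auc_score(yte, clf.predict_proba(Xte)[:, 1])

auc_radiomics = auc_for(radiomics)
auc_combined = auc_for(np.column_stack([radiomics, prs]))
```

With real data one would also compare the two AUCs formally (the abstract reports a P<sub>ΔAUC</sub>, e.g. via a DeLong-type test) rather than eyeballing the difference.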

A multimodal deep learning model for detecting endoscopic images of near-infrared fluorescence capsules.

Wang J, Zhou C, Wang W, Zhang H, Zhang A, Cui D

PubMed · Jun 15, 2025
Early screening for gastrointestinal (GI) diseases is critical for preventing cancer development. With the rapid advancement of deep learning technology, artificial intelligence (AI) has become increasingly prominent in the early detection of GI diseases. Capsule endoscopy is a non-invasive medical imaging technique used to examine the gastrointestinal tract. In our previous work, we developed a near-infrared fluorescence capsule endoscope (NIRF-CE) capable of exciting and capturing near-infrared (NIR) fluorescence images to identify subtle mucosal microlesions and submucosal abnormalities while simultaneously capturing conventional white-light images to detect lesions with significant morphological changes. However, limitations such as low camera resolution and poor lighting within the gastrointestinal tract may lead to misdiagnosis and other medical errors, and manually reviewing and interpreting large volumes of capsule endoscopy images is time-consuming and prone to error. Deep learning models have shown potential for automatically detecting abnormalities in NIRF-CE images. This study focuses on an improved deep learning model called Retinex-Attention-YOLO (RAY), which is based on single-modality image data and built on the YOLO series of object detection models. RAY enhances the accuracy and efficiency of anomaly detection, especially under low-light conditions. To further improve detection performance, we also propose a multimodal deep learning model, Multimodal-Retinex-Attention-YOLO (MRAY), which combines white-light and fluorescence image data. The dataset used in this study consists of images of pig stomachs captured by our NIRF-CE system, simulating the human GI tract. The capsule is used with a targeted fluorescent probe that accumulates at lesion sites and releases a fluorescent signal when abnormalities are present, so that a bright spot in the image indicates a lesion.
The MRAY model achieved an impressive precision of 96.3%, outperforming similar object detection models. To further validate the model's performance, ablation experiments were conducted, and comparisons were made with publicly available datasets. MRAY shows great promise for the automated detection of GI cancers, ulcers, inflammations, and other medical conditions in clinical practice.
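RAY's Retinex stage is not specified in detail here; a classic single-scale Retinex decomposition, which such a low-light front end might resemble, looks like the sketch below. The sigma value and the toy image are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img, sigma=15.0, eps=1e-6):
    """Classic single-scale Retinex: reflectance = log(I) - log(blur(I)).
    Enhances detail in dim regions by removing the smooth illumination
    component; the paper's exact Retinex variant may differ."""
    img = img.astype(np.float64) + eps
    return np.log(img) - np.log(gaussian_filter(img, sigma) + eps)

# A dim planar-gradient image: the Retinex output is largely decoupled
# from the smooth brightness trend, leaving only local structure.
dim = np.linspace(0.01, 0.1, 64 * 64).reshape(64, 64)
out = single_scale_retinex(dim)
```

In a detection pipeline such an enhanced image would then be fed to the YOLO-style detector in place of (or alongside) the raw low-light frame.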

GM-LDM: Latent Diffusion Model for Brain Biomarker Identification through Functional Data-Driven Gray Matter Synthesis

Hu Xu, Yang Jingling, Jia Sihan, Bi Yuda, Calhoun Vince

arXiv preprint · Jun 15, 2025
Generative models based on deep learning have shown significant potential in medical imaging, particularly for modality transformation and multimodal fusion in MRI-based brain imaging. This study introduces GM-LDM, a novel framework that leverages the latent diffusion model (LDM) to enhance the efficiency and precision of MRI generation tasks. GM-LDM integrates a 3D autoencoder pre-trained on the large-scale ABCD MRI dataset, with a KL divergence loss enforcing statistical consistency of the latent space. We employ a Vision Transformer (ViT)-based encoder-decoder as the denoising network to optimize generation quality. The framework flexibly incorporates conditional data, such as functional network connectivity (FNC) data, enabling personalized brain imaging, biomarker identification, and functional-to-structural information translation for brain diseases like schizophrenia.
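In typical latent-diffusion autoencoders, the KL divergence loss mentioned above is the closed-form KL between a diagonal Gaussian posterior and a standard normal prior. A generic sketch of that term (not necessarily the paper's exact loss):

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) per sample, the latent
    regularizer commonly used in VAE-style autoencoders for LDMs."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

# When the posterior equals the prior (mu=0, logvar=0), the KL is zero;
# shifting the mean to 1 in each of 4 dimensions gives 0.5 * 4 = 2.
kl_zero = kl_to_standard_normal(np.zeros((2, 4)), np.zeros((2, 4)))
kl_pos = kl_to_standard_normal(np.ones((2, 4)), np.zeros((2, 4)))
```

In training this term is added (usually with a small weight) to the reconstruction loss so the latent space stays close to the prior the diffusion model samples from.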

Boundary-Aware Vision Transformer for Angiography Vascular Network Segmentation

Nabil Hezil, Suraj Singh, Vita Vlasova, Oleg Rogov, Ahmed Bouridane, Rifat Hamoudi

arXiv preprint · Jun 15, 2025
Accurate segmentation of vascular structures in coronary angiography remains a core challenge in medical image analysis due to the complexity of elongated, thin, and low-contrast vessels. Classical convolutional neural networks (CNNs) often fail to preserve topological continuity, while recent Vision Transformer (ViT)-based models, although strong at global context modeling, lack precise boundary awareness. In this work, we introduce BAVT, a Boundary-Aware Vision Transformer: a ViT-based architecture enhanced with an edge-aware loss that explicitly guides the segmentation toward fine-grained vascular boundaries. Unlike hybrid transformer-CNN models, BAVT retains a minimal, scalable structure that is fully compatible with large-scale vision foundation model (VFM) pretraining. We validate our approach on the DCA-1 coronary angiography dataset, where BAVT achieves superior performance across medical image segmentation metrics, outperforming both CNN and hybrid baselines. These results demonstrate the effectiveness of combining plain ViT encoders with boundary-aware supervision for clinical-grade vascular segmentation.
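One plausible form of an edge-aware segmentation loss is pixelwise binary cross-entropy upweighted on ground-truth boundaries extracted with a Sobel filter. The weighting scheme below is an illustrative assumption, not necessarily BAVT's exact formulation:

```python
import numpy as np
from scipy.ndimage import sobel

def edge_aware_bce(pred, target, edge_weight=4.0, eps=1e-7):
    """Pixelwise BCE with extra weight on ground-truth boundary pixels
    (located with a Sobel gradient of the target mask)."""
    gx = sobel(target.astype(float), 0)
    gy = sobel(target.astype(float), 1)
    edges = (np.hypot(gx, gy) > 0).astype(float)
    w = 1.0 + edge_weight * edges              # boundary pixels count more
    p = np.clip(pred, eps, 1 - eps)
    bce = -(target * np.log(p) + (1 - target) * np.log(1 - p))
    return float((w * bce).sum() / w.sum())

# A square "vessel" mask; a confident correct prediction scores lower
# than an uninformative constant 0.5 prediction.
target = np.zeros((16, 16)); target[4:12, 4:12] = 1.0
loss_good = edge_aware_bce(np.where(target > 0, 0.9, 0.1), target)
loss_bad = edge_aware_bce(np.full((16, 16), 0.5), target)
```

Because boundary pixels dominate the weighted average, a model that is sloppy only near vessel edges is penalized more than plain BCE would penalize it.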

Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence

Maximilian Ferle, Jonas Ader, Thomas Wiemers, Nora Grieb, Adrian Lindenmeyer, Hans-Jonas Meyer, Thomas Neumuth, Markus Kreuz, Kristin Reiche, Maximilian Merz

arXiv preprint · Jun 15, 2025
Risk stratification is a key tool in clinical decision-making, yet current approaches often fail to translate sophisticated survival analysis into actionable clinical criteria. We present a novel method for unsupervised machine learning that directly optimizes for survival heterogeneity across patient clusters through a differentiable adaptation of the multivariate logrank statistic. Unlike most existing methods that rely on proxy metrics, our approach represents novel methodology for training any neural network architecture on any data modality to identify prognostically distinct patient groups. We thoroughly evaluate the method in simulation experiments and demonstrate its utility in practice by applying it to two distinct cancer types: analyzing laboratory parameters from multiple myeloma patients and computed tomography images from non-small cell lung cancer patients, identifying prognostically distinct patient subgroups with significantly different survival outcomes in both cases. Post-hoc explainability analyses uncover clinically meaningful features determining the group assignments, which align well with established risk factors and thus lend strong weight to the method's utility. This pan-cancer, model-agnostic approach represents a valuable advancement in clinical risk stratification, enabling the discovery of novel prognostic signatures across diverse data types while providing interpretable results that promise to complement treatment personalization and clinical decision-making in oncology and beyond.
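The quantity being relaxed is the classical logrank statistic; the hard-assignment, two-group version that a differentiable adaptation would start from can be computed directly. A generic sketch (not the paper's code, and using toy survival data):

```python
import numpy as np

def logrank_statistic(times, events, group):
    """Two-group logrank chi-square statistic from observed-minus-expected
    events at each death time. The paper relaxes the hard 0/1 group
    memberships into soft, differentiable assignments."""
    order = np.argsort(times)
    times, events, group = times[order], events[order], group[order]
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(times[events]):
        at_risk = times >= t
        d = float(np.sum(events[times == t]))          # deaths at time t
        n = float(at_risk.sum())                        # total at risk
        n1 = float((at_risk & (group == 1)).sum())      # at risk in group 1
        d1 = float(np.sum(events[(times == t) & (group == 1)]))
        o_minus_e += d1 - d * n1 / n                    # observed - expected
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e**2 / var

# Group 1 dies much earlier than group 0, so the statistic is large
# (well above the 3.84 chi-square cutoff at alpha = 0.05, df = 1).
t = np.array([1., 2., 3., 4., 10., 11., 12., 13.])
e = np.ones(8, dtype=bool)
g = np.array([1, 1, 1, 1, 0, 0, 0, 0])
stat = logrank_statistic(t, e, g)
```

Maximizing a smooth version of this statistic over soft cluster assignments is what lets a neural network be trained directly for survival separation instead of a proxy objective.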

FairICP: identifying biases and increasing transparency at the point of care in post-implementation clinical decision support using inductive conformal prediction.

Sun X, Nakashima M, Nguyen C, Chen PH, Tang WHW, Kwon D, Chen D

PubMed · Jun 15, 2025
Fairness concerns stemming from known and unknown biases in healthcare practices have raised questions about the trustworthiness of Artificial Intelligence (AI)-driven Clinical Decision Support Systems (CDSS). Studies have shown unforeseen performance disparities in subpopulations when such systems are applied in clinical settings that differ from those seen during training. Existing unfairness mitigation strategies often struggle with scalability and accessibility, and their pursuit of group-level parity in prediction performance does not effectively translate into fairness at the point of care. This study introduces FairICP, a flexible and cost-effective post-implementation framework based on Inductive Conformal Prediction (ICP), to provide users with actionable knowledge of model uncertainty due to subpopulation-level biases at the point of care. FairICP applies ICP to identify the model's scope of competence through group-specific calibration, ensuring equitable prediction reliability by retaining only predictions that fall within the trusted competence boundaries. We evaluated FairICP against four benchmarks on three medical imaging modalities, acquired from both private and large public datasets: (1) cardiac magnetic resonance imaging (MRI), (2) chest X-ray, and (3) dermatology imaging. Frameworks were assessed on prediction performance enhancement and unfairness mitigation capabilities. Compared to the baseline, FairICP improved prediction accuracy by 7.2% and reduced the accuracy gap between the privileged and unprivileged subpopulations by 2.2% on average across all three datasets. Our work provides a robust solution to promote trust and transparency in AI-CDSS, fostering equality and equity in healthcare for diverse patient populations. Such post-processing methods are critical to enabling a robust framework for AI-CDSS implementation and monitoring in healthcare settings.
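Group-specific conformal calibration can be sketched as computing a separate nonconformity threshold per subgroup at the usual finite-sample quantile. The procedure below is a generic ICP sketch in the spirit of FairICP, with synthetic scores, not the paper's exact method:

```python
import numpy as np

def group_calibrated_thresholds(cal_scores, cal_groups, alpha=0.1):
    """Inductive conformal prediction with per-group calibration: each
    subgroup gets its own nonconformity threshold, so coverage is
    equalized across groups instead of only on average."""
    thr = {}
    for grp in np.unique(cal_groups):
        s = np.sort(cal_scores[cal_groups == grp])
        n = len(s)
        # finite-sample conformal quantile: ceil((n + 1)(1 - alpha)) / n
        k = min(int(np.ceil((n + 1) * (1 - alpha))), n) - 1
        thr[grp] = s[k]
    return thr

rng = np.random.default_rng(0)
# Group 1's nonconformity scores run systematically larger, so its
# calibrated threshold comes out larger too.
scores = np.concatenate([rng.random(200), 2 * rng.random(200)])
groups = np.repeat([0, 1], 200)
thr = group_calibrated_thresholds(scores, groups)
```

At deployment, a new prediction for a patient in group g would be flagged as outside the model's competence whenever its nonconformity score exceeds `thr[g]`.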

A computed tomography angiography-based radiomics model for prognostic prediction of endovascular abdominal aortic repair.

Huang S, Liu D, Deng K, Shu C, Wu Y, Zhou Z

PubMed · Jun 15, 2025
This study aims to develop a radiomics machine learning (ML) model that uses preoperative computed tomography angiography (CTA) data to predict the prognosis of endovascular aneurysm repair (EVAR) in abdominal aortic aneurysm (AAA) patients. In this retrospective study, 164 AAA patients underwent EVAR and were categorized into shrinkage (good prognosis) or stable (poor prognosis) groups based on post-EVAR sac regression. Radiomics features (RFs) were extracted from preoperative AAA and perivascular adipose tissue (PVAT) images for model creation. Patients were split into 80% training and 20% test sets, and a support vector machine model was constructed for prediction. Accuracy was evaluated via the area under the receiver operating characteristic curve (AUC). Demographics and comorbidities showed no significant differences between the shrinkage and stable groups. The model containing 5 AAA RFs (original_firstorder_InterquartileRange, log-sigma-3-0-mm-3D_glrlm_GrayLevelNonUniformityNormalized, log-sigma-3-0-mm-3D_glrlm_RunPercentage, log-sigma-4-0-mm-3D_glrlm_ShortRunLowGrayLevelEmphasis, wavelet-LLH_glcm_SumEntropy) had AUCs of 0.86 (training) and 0.77 (test). The model containing 7 PVAT RFs (log-sigma-3-0-mm-3D_firstorder_InterquartileRange, log-sigma-3-0-mm-3D_glcm_Correlation, wavelet-LHL_firstorder_Energy, wavelet-LHL_firstorder_TotalEnergy, wavelet-LHH_firstorder_Mean, wavelet-LHH_glcm_Idmn, wavelet-LHH_glszm_GrayLevelNonUniformityNormalized) had AUCs of 0.76 (training) and 0.78 (test). Combining AAA and PVAT RFs yielded the highest accuracy, with AUCs of 0.93 (training) and 0.87 (test). The radiomics-based CTA model predicts aneurysm sac regression post-EVAR in AAA patients. PVAT RFs from preoperative CTA images were closely related to AAA prognosis after EVAR, enhancing accuracy when combined with AAA RFs.
This preliminary study explores a predictive model designed to assist clinicians in optimizing therapeutic strategies during clinical decision-making processes.
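The feature-combination step (cross-validated AUC for AAA features alone versus AAA plus PVAT features in an SVM) can be sketched on synthetic data. The feature counts echo the abstract, but the data and signal structure are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for 164 patients: a 5-feature "AAA" block and a
# 7-feature "PVAT" block, each carrying part of the outcome signal.
rng = np.random.default_rng(42)
n = 164
aaa = rng.standard_normal((n, 5))
pvat = rng.standard_normal((n, 7))
y = (aaa[:, 0] + pvat[:, 0] + 0.8 * rng.standard_normal(n) > 0).astype(int)

def cv_auc(X):
    # RBF SVM scored by ROC AUC via the decision function, 5-fold CV.
    return cross_val_score(SVC(kernel="rbf"), X, y, cv=5,
                           scoring="roc_auc").mean()

auc_aaa = cv_auc(aaa)
auc_both = cv_auc(np.column_stack([aaa, pvat]))
```

Because each block contributes independent signal here, the combined model recovers more of the outcome than either block alone, mirroring the AAA + PVAT improvement the study reports.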

Altered resting-state brain activity in patients with major depressive disorder and bipolar disorder: A regional homogeneity analysis.

Han W, Su Y, Wang X, Yang T, Zhao G, Mao R, Zhu N, Zhou R, Wang X, Wang Y, Peng D, Wang Z, Fang Y, Chen J, Sun P

PubMed · Jun 15, 2025
Major Depressive Disorder (MDD) and Bipolar Disorder (BD) exhibit overlapping depressive symptoms, complicating their differentiation in clinical practice. Traditional neuroimaging studies have focused on specific regions of interest, but few have employed whole-brain analyses such as regional homogeneity (ReHo). This study aims to differentiate MDD from BD by identifying key brain regions with abnormal ReHo and using machine learning to improve diagnostic accuracy. A total of 63 BD patients, 65 MDD patients, and 70 healthy controls were recruited from the Shanghai Mental Health Center. Resting-state functional MRI (rs-fMRI) was used to analyze ReHo across the brain. We applied Support Vector Machine (SVM) and SVM-Recursive Feature Elimination (SVM-RFE) models to identify critical brain regions that could serve as biomarkers for distinguishing BD from MDD; SVM-RFE recursively removes non-informative features, enhancing the model's ability to accurately classify patients. Correlations between ReHo values and clinical scores were also evaluated. ReHo analysis revealed significant differences in several brain regions. Compared to healthy controls, both BD and MDD patients exhibited reduced ReHo in the superior parietal gyrus. Additionally, MDD patients showed decreased ReHo values in the right lenticular nucleus/putamen (PUT.R), right angular gyrus (ANG.R), and left superior occipital gyrus (SOG.L). Compared to the MDD group, BD patients exhibited increased ReHo values in the left inferior occipital gyrus (IOG.L). In BD patients only, the reduction in ReHo values in the right superior parietal gyrus and the right angular gyrus was positively correlated with Hamilton Depression Scale (HAMD) scores.
SVM-RFE identified the IOG.L, SOG.L, and PUT.R as the most critical features, achieving an area under the curve (AUC) of 0.872, with high sensitivity and specificity in distinguishing BD from MDD. This study demonstrates that BD and MDD patients exhibit distinct patterns of regional brain activity, particularly in the occipital and parietal regions. The combination of ReHo analysis and SVM-RFE provides a powerful approach for identifying potential biomarkers, with the left inferior occipital gyrus, left superior occipital gyrus, and right putamen emerging as key differentiating regions. These findings offer valuable insights for improving the diagnostic accuracy between BD and MDD, contributing to more targeted treatment strategies.
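SVM-RFE as used here recursively drops the feature with the smallest absolute weight in a linear SVM until the desired number remains; with scikit-learn this is a few lines. The synthetic "ReHo" data and region count below are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

# Synthetic stand-in: ReHo values for 8 brain regions in two diagnostic
# groups; only the first 3 regions carry group information.
rng = np.random.default_rng(0)
n = 128
X = rng.standard_normal((n, 8))
y = (X[:, 0] - X[:, 1] + X[:, 2] + 0.7 * rng.standard_normal(n) > 0).astype(int)

# SVM-RFE: repeatedly fit a linear SVM and eliminate the feature with
# the smallest |weight| until 3 features are left.
selector = RFE(SVC(kernel="linear"), n_features_to_select=3, step=1).fit(X, y)
selected = np.flatnonzero(selector.support_)
```

In the study the surviving "features" are brain regions (IOG.L, SOG.L, PUT.R), and the final SVM on those regions is what yields the reported AUC of 0.872.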

Biological age prediction in schizophrenia using brain MRI, gut microbiome and blood data.

Han R, Wang W, Liao J, Peng R, Liang L, Li W, Feng S, Huang Y, Fong LM, Zhou J, Li X, Ning Y, Wu F, Wu K

PubMed · Jun 15, 2025
Biological age prediction from various types of biological data has been widely explored. However, any single data type may offer limited insight into the pathological processes of aging and disease. Here we evaluated the performance of machine learning models for biological age prediction using integrated features from multiple biological data sources (brain MRI, gut microbiome, and blood data) in 140 healthy controls and 43 patients with schizophrenia. Our results revealed that models using multiple data types achieved higher predictive accuracy than those using brain MRI alone. Feature interpretability analysis of the optimal model showed that the frontal lobe, the temporal lobe, and the fornix contributed substantially to biological age prediction. Notably, patients with schizophrenia exhibited a pronounced increase in the predicted biological age gap (BAG) compared to healthy controls. Moreover, the BAG in the schizophrenia group was negatively correlated with MCCB scores and positively correlated with PANSS scores. These findings underscore the potential of BAG as a valuable biomarker for assessing cognitive decline and symptom severity in neuropsychiatric disorders.
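The BAG computation follows a standard recipe: fit an age-prediction model on healthy controls, then take predicted minus chronological age for everyone. A minimal sketch on synthetic multimodal features, where the accelerated-aging effect size and feature model are assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic stand-in: 6 features (imagine MRI / microbiome / blood
# summaries) that scale with age; the last 40 subjects are "patients"
# whose biology looks 8 years older than their chronological age.
rng = np.random.default_rng(0)
n = 180
age = rng.uniform(20, 70, n)
is_patient = np.zeros(n, dtype=bool)
is_patient[-40:] = True
effective_age = age + 8.0 * is_patient            # accelerated aging
X = np.outer(effective_age, np.ones(6)) + 3.0 * rng.standard_normal((n, 6))

# Train the age model on controls only, then compute the biological
# age gap BAG = predicted age - chronological age for everyone.
model = Ridge().fit(X[~is_patient], age[~is_patient])
bag = model.predict(X) - age
```

The control-trained model keeps the control BAG near zero by construction, so a positive mean BAG in the patient group is interpretable as accelerated biological aging.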