Evaluating Explainability: A Framework for Systematic Assessment and Reporting of Explainable AI Features

Miguel A. Lago, Ghada Zamzmi, Brandon Eich, Jana G. Delfino

arXiv preprint · Jun 16, 2025
Explainability features are intended to provide insight into the internal mechanisms of an AI device, but there is a lack of evaluation techniques for assessing the quality of the explanations provided. We propose a framework to assess and report explainable AI features. Our evaluation framework for AI explainability is based on four criteria: 1) Consistency quantifies the variability of explanations across similar inputs, 2) Plausibility estimates how close the explanation is to the ground truth, 3) Fidelity assesses the alignment between the explanation and the model's internal mechanisms, and 4) Usefulness evaluates the explanation's impact on task performance. Finally, we developed a scorecard for AI explainability methods that serves as a complete description and evaluation to accompany this type of algorithm. We describe these four criteria and give examples of how they can be evaluated. As a case study, we use Ablation-CAM and Eigen-CAM to illustrate the evaluation of explanation heatmaps for the detection of breast lesions in synthetic mammograms. The first three criteria are evaluated for clinically relevant scenarios. Our proposed framework establishes criteria through which the quality of explanations provided by AI models can be evaluated. We intend for our framework to spark a dialogue regarding the value provided by explainability features and to help improve the development and evaluation of AI-based medical devices.
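The Consistency criterion could be estimated, for instance, by perturbing the input slightly and measuring how much the explanation heatmap changes. A minimal sketch, assuming a generic `explain_fn` (a hypothetical name for any heatmap-producing method such as Ablation-CAM); the paper does not prescribe this exact recipe:

```python
import numpy as np

def explanation_consistency(explain_fn, image, n_perturbations=10, noise_std=0.01, seed=0):
    """Estimate a consistency score: how stable an explanation heatmap is
    under small input perturbations (1.0 = perfectly stable).
    `explain_fn` maps an image to a heatmap of the same spatial shape."""
    rng = np.random.default_rng(seed)
    base = explain_fn(image).ravel()
    dists = []
    for _ in range(n_perturbations):
        noisy = image + rng.normal(0.0, noise_std, size=image.shape)
        heat = explain_fn(noisy).ravel()
        # Cosine distance between the baseline and perturbed heatmaps
        cos = np.dot(base, heat) / (np.linalg.norm(base) * np.linalg.norm(heat) + 1e-12)
        dists.append(1.0 - cos)
    return 1.0 - float(np.mean(dists))
```

An explanation method whose heatmaps barely move under imperceptible noise would score near 1.0; an unstable one would score lower.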

MAMBO: High-Resolution Generative Approach for Mammography Images

Milica Škipina, Nikola Jovišić, Nicola Dall'Asen, Vanja Švenda, Anil Osman Tur, Slobodan Ilić, Elisa Ricci, Dubravko Ćulibrk

arXiv preprint · Jun 10, 2025
Mammography is the gold standard for the detection and diagnosis of breast cancer. This procedure can be significantly enhanced with Artificial Intelligence (AI)-based software, which assists radiologists in identifying abnormalities. However, training AI systems requires large and diverse datasets, which are often difficult to obtain due to privacy and ethical constraints. To address this issue, this paper introduces the MAMmography ensemBle mOdel (MAMBO), a novel patch-based diffusion approach designed to generate full-resolution mammograms. Diffusion models have shown breakthrough results in realistic image generation, yet few studies have focused on mammograms, and none have successfully generated the high-resolution outputs required to capture the fine-grained features of small lesions. To achieve this, MAMBO integrates separate diffusion models to capture both local and global (image-level) contexts. The contextual information is then fed into the final patch-based model, significantly aiding the noise removal process. This design enables MAMBO to generate highly realistic mammograms of up to 3840x3840 pixels. Importantly, the approach can be used to enhance the training of classification models and can be extended to anomaly detection. Experiments, including both numerical evaluation and radiologist validation, assess MAMBO's capabilities in image generation, super-resolution, and anomaly detection, highlighting its potential to enhance mammography analysis for more accurate diagnoses and earlier lesion detection.

Comparative analysis of semantic-segmentation models for screen film mammograms.

Rani J, Singh J, Virmani J

PubMed paper · Jun 5, 2025
Accurate segmentation of mammographic masses is very important, as the shape characteristics of these masses play a significant role in helping radiologists diagnose benign and malignant cases. Recently, various deep learning segmentation algorithms have become popular for segmentation tasks. In the present work, a rigorous performance analysis of ten semantic-segmentation models was carried out on 518 images taken from the DDSM (Digital Database for Screening Mammography) dataset, with 208 mass images in the BI-RADS 3 class, 150 in BI-RADS 4, and 160 in BI-RADS 5. These models are (1) simple convolution series models, namely VGG16/VGG19; (2) simple convolution DAG (directed acyclic graph) models, namely U-Net; (3) dilated convolution DAG models, namely ResNet18/ResNet50/ShuffleNet/XceptionNet/InceptionV2/MobileNetV2; and (4) a hybrid model, i.e. hybrid U-Net. On the basis of exhaustive experimentation, it was observed that the dilated convolution DAG models ResNet50, ShuffleNet, and MobileNetV2 outperform the other network models, yielding cumulative JI (Jaccard index) and F1 score values of 0.87 and 0.92, 0.85 and 0.91, and 0.84 and 0.90, respectively. The segmented images obtained by the best-performing models were subjectively analyzed by a participating radiologist in terms of (a) size, (b) margins, and (c) shape characteristics. From the objective and subjective analysis it was concluded that ResNet50 is the optimal model for segmenting difficult-to-delineate breast masses with dense backgrounds and masses where both masses and micro-calcifications are simultaneously present. The results of the study indicate that the ResNet50 model can be used in a routine clinical environment for segmentation of mammographic masses.
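The two reported metrics are directly related (F1 = 2·JI/(1+JI), so a JI of 0.87 implies an F1 near 0.93). A minimal sketch computing both from binary segmentation masks:

```python
import numpy as np

def jaccard_and_f1(pred, gt):
    """Jaccard index (JI, a.k.a. IoU) and F1 score (a.k.a. Dice) between
    binary masks, the two segmentation metrics reported in the study."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    ji = inter / union if union else 1.0   # empty masks count as a perfect match
    f1 = 2 * inter / total if total else 1.0
    return float(ji), float(f1)
```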

A Novel Deep Learning Framework for Nipple Segmentation in Digital Mammography.

Rogozinski M, Hurtado J, Sierra-Franco CA, R Hall Barbosa C, Raposo A

PubMed paper · Jun 3, 2025
This study introduces a novel methodology to enhance nipple segmentation in digital mammography, a critical component for accurate medical analysis and computer-aided detection systems. The nipple is a key anatomical landmark for multi-view and multi-modality breast image registration, where accurate localization is vital for ensuring image quality and enabling precise registration of anomalies across different mammographic views. The proposed approach significantly outperforms baseline methods, particularly in challenging cases where previous techniques failed. It achieved successful detection across all cases and reached a mean Intersection over Union (mIoU) of 0.63 in instances where the baseline failed entirely. Additionally, it yielded nearly a tenfold improvement in Hausdorff distance and consistent gains in overlap-based metrics, with the mIoU increasing from 0.7408 to 0.8011 in the craniocaudal (CC) view and from 0.7488 to 0.7767 in the mediolateral oblique (MLO) view. Furthermore, its generalizability suggests the potential for application to other breast imaging modalities and related domains facing challenges such as class imbalance and high variability in object characteristics.
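The Hausdorff distance in which the paper reports a near-tenfold improvement measures the worst-case boundary disagreement between two point sets. A minimal brute-force sketch (a real pipeline would typically use an optimized routine such as `scipy.spatial.distance.directed_hausdorff`):

```python
import numpy as np

def hausdorff_distance(pts_a, pts_b):
    """Symmetric Hausdorff distance between two (n, 2) arrays of boundary
    points, e.g. predicted vs. ground-truth nipple contour pixels."""
    # Pairwise Euclidean distances, shape (len(pts_a), len(pts_b))
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    # Max over the two directed distances
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```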

Validation of a Dynamic Risk Prediction Model Incorporating Prior Mammograms in a Diverse Population.

Jiang S, Bennett DL, Colditz GA

PubMed paper · Jun 2, 2025
For breast cancer risk prediction to be clinically useful, it must be accurate and applicable to diverse groups of women across multiple settings. To examine whether a dynamic risk prediction model incorporating prior mammograms, previously validated in Black and White women, could predict future risk of breast cancer across a racially and ethnically diverse population in a population-based screening program. This prognostic study included women aged 40 to 74 years with 1 or more screening mammograms drawn from the British Columbia Breast Screening Program from January 1, 2013, to December 31, 2019, with follow-up via linkage to the British Columbia Cancer Registry through June 2023. This provincial, organized screening program offers screening mammography with full field digital mammography (FFDM) every 2 years. Data were analyzed from May to August 2024. FFDM-based, artificial intelligence-generated mammogram risk score (MRS), including up to 4 years of prior mammograms. The primary outcomes were 5-year risk of breast cancer (measured with the area under the receiver operating characteristic curve [AUROC]) and absolute risk of breast cancer calibrated to the US Surveillance, Epidemiology, and End Results incidence rates. Among 206 929 women (mean [SD] age, 56.1 [9.7] years; of 118 093 with data on race, there were 34 266 East Asian; 1946 Indigenous; 6116 South Asian; and 66 742 White women), there were 4168 pathology-confirmed incident breast cancers diagnosed through June 2023. Mean (SD) follow-up time was 5.3 (3.0) years. Using up to 4 years of prior mammogram images in addition to the most current mammogram, a 5-year AUROC of 0.78 (95% CI, 0.77-0.80) was obtained based on analysis of images alone. Performance was consistent across subgroups defined by race and ethnicity in East Asian (AUROC, 0.77; 95% CI, 0.75-0.79), Indigenous (AUROC, 0.77; 95% CI, 0.71-0.83), and South Asian (AUROC, 0.75; 95% CI, 0.71-0.79) women.
Stratification by age gave a 5-year AUROC of 0.76 (95% CI, 0.74-0.78) for women aged 50 years or younger and 0.80 (95% CI, 0.78-0.82) for women older than 50 years. There were 18 839 participants (9.0%) with a 5-year risk greater than 3%, and the positive predictive value was 4.9% with an incidence of 11.8 per 1000 person-years. A dynamic MRS generated from both current and prior mammograms showed robust performance across diverse racial and ethnic populations in a province-wide screening program starting from age 40 years, reflecting improved accuracy for racially and ethnically diverse populations.
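The AUROC values reported above are, equivalently, the probability that a randomly chosen cancer case receives a higher risk score than a randomly chosen non-case. A minimal sketch of that rank formulation (an O(n·m) illustration, not how the study computed it):

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the Mann-Whitney U formulation: the fraction of
    (positive, negative) pairs in which the positive scores higher,
    with ties counting half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

For large cohorts such as the 206 929 women here, a sort-based implementation (e.g. `sklearn.metrics.roc_auc_score`) would be used instead of pairwise comparison.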

Utilizing Pseudo Color Image to Improve the Performance of Deep Transfer Learning-Based Computer-Aided Diagnosis Schemes in Breast Mass Classification.

Jones MA, Zhang K, Faiz R, Islam W, Jo J, Zheng B, Qiu Y

PubMed paper · Jun 1, 2025
The purpose of this study is to investigate the impact of using morphological information in classifying suspicious breast lesions. The widespread use of deep transfer learning can significantly improve the performance of mammogram-based CADx schemes. However, digital mammograms are grayscale images, while deep learning models are typically optimized using natural images containing three channels. Thus, grayscale mammograms must be converted into three-channel images for input to deep transfer models. This study aims to develop a novel pseudo color image generation method which utilizes the mass contour information to enhance classification performance. Accordingly, a total of 830 cases were retrospectively collected, comprising 310 benign and 520 malignant cases. For each case, four regions of interest (ROIs) were collected from the grayscale images captured for both the CC and MLO views of the two breasts. Meanwhile, seven pseudo color image sets were generated as input to the deep learning models, created through combinations of the original grayscale image, a histogram-equalized image, a bilaterally filtered image, and a segmented mass. The output features from four identical pre-trained deep learning models were concatenated and then processed by a support vector machine-based classifier to generate the final benign/malignant labels. The performance of each image set was evaluated and compared. The results demonstrate that the pseudo color sets containing the manually segmented mass performed significantly better than all other pseudo color sets, achieving an AUC (area under the ROC curve) of up to 0.889 ± 0.012 and an overall accuracy of up to 0.816 ± 0.020. At the same time, the performance improvement also depends on the accuracy of the mass segmentation.
The results of this study support our hypothesis that adding accurately segmented mass contours can provide complementary information, thereby enhancing the performance of the deep transfer model in classifying suspicious breast lesions.
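One plausible three-channel construction in the spirit of this study stacks the original grayscale image, a histogram-equalized version, and the segmented-mass region. The exact channel recipe used by the authors may differ (their sets also include a bilaterally filtered image); this is an illustrative sketch:

```python
import numpy as np

def equalize_hist(gray):
    """Plain-NumPy histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min() + 1e-12)
    return (cdf[gray] * 255).astype(np.uint8)

def pseudo_color(gray, mass_mask):
    """Build a three-channel pseudo color image for a pretrained
    (ImageNet-style) backbone: original / equalized / segmented mass."""
    eq = equalize_hist(gray)
    mass = np.where(mass_mask > 0, gray, 0).astype(np.uint8)  # keep pixels inside the mass contour
    return np.dstack([gray, eq, mass])
```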

AI image analysis as the basis for risk-stratified screening.

Strand F

PubMed paper · Jun 1, 2025
Artificial intelligence (AI) has emerged as a transformative tool in breast cancer screening, with two distinct applications: computer-aided cancer detection (CAD) and risk prediction. While AI CAD systems are slowly finding their way into clinical practice to assist radiologists or make independent reads, this review focuses on AI risk models, which aim to predict a patient's likelihood of being diagnosed with breast cancer within a few years after a negative screening. Unlike AI CAD systems, AI risk models have mainly been explored in research settings without widespread clinical adoption. This review synthesizes advances in AI-driven risk prediction models, from traditional imaging biomarkers to cutting-edge deep learning methodologies and multimodal approaches. Contributions by leading researchers are explored with critical appraisal of their methods and findings. Ethical, practical, and clinical challenges in implementing AI models are also discussed, with an emphasis on real-world applications. The review concludes by proposing future directions to optimize the adoption of AI tools in breast cancer screening and improve equity and outcomes for diverse populations.

Enhancing detection of previously missed non-palpable breast carcinomas through artificial intelligence.

Mansour S, Kamal R, Hussein SA, Emara M, Kassab Y, Taha SN, Gomaa MMM

PubMed paper · Jun 1, 2025
To investigate the impact of artificial intelligence (AI) reading of digital mammograms on increasing the chance of detecting missed breast cancer, by studying the AI-flagged early morphology indicators overlooked by the radiologist and correlating them with the pathology types of the missed cancers. Mammograms performed in 2020-2023 and presenting breast carcinomas (n = 1998) were analyzed in concordance with the prior year's mammograms (2019-2022), which had been assessed as negative or benign. The current mammograms were reviewed for the descriptors asymmetry, distortion, mass, and microcalcifications. The AI flagged abnormalities by overlaying a color hue and a percentage score for the degree of suspicion of malignancy. Prior mammograms with AI markings comprised 54% (n = 555), and in the current mammograms, AI targeted 904 (88%) carcinomas. The descriptor "asymmetry" was the most common presentation of missed breast carcinoma (64.1%) in the prior mammograms, and the highest AI detection rate was for "distortion" (100%), followed by "grouped microcalcifications" (80%). AI performance in predicting malignancy in previously assigned negative or benign mammograms showed a sensitivity of 73.4%, a specificity of 89%, and an accuracy of 78.4%. Reading mammograms with AI significantly enhances the detection of early cancerous changes, particularly in dense breast tissue. The AI's detection rate does not correlate with specific pathological types of breast cancer, highlighting its broad utility. Subtle mammographic changes in postmenopausal women, not corroborated by ultrasound but marked by AI, warrant further evaluation with advanced digital mammography applications and close-interval follow-up with AI-read mammograms to minimize the potential for missed breast carcinoma.
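The reported sensitivity, specificity, and accuracy follow the standard confusion-matrix definitions; a minimal sketch (the counts below are illustrative, not the study's):

```python
def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix
    counts, the metrics reported for the AI second look at previously
    negative or benign mammograms."""
    sensitivity = tp / (tp + fn)            # fraction of cancers flagged
    specificity = tn / (tn + fp)            # fraction of non-cancers passed
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy
```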

Review and reflections on live AI mammographic screen reading in a large UK NHS breast screening unit.

Puri S, Bagnall M, Erdelyi G

PubMed paper · Jun 1, 2025
The Radiology team from a large Breast Screening Unit in the UK, with a screening population of over 135,000, took part in a service evaluation project using artificial intelligence (AI) for reading breast screening mammograms. The aims were to evaluate the clinical benefit AI may provide when implemented as a silent reader in a double-reading breast screening programme, and to evaluate the feasibility and operational impact of deploying AI into the breast screening programme. The service was one of 14 breast screening sites in the UK to take part in this project, and we present our local experience with AI in breast screening. A commercially available AI platform was deployed and worked in real time as a 'silent third reader' so as not to impact standard workflows and patient care. All cases flagged by AI but not recalled by standard double reading (positive discordant cases) were reviewed, along with all cases recalled by human readers but not flagged by AI (negative discordant cases). 9,547 cases were included in the evaluation. 1,135 positive discordant cases were reviewed; one woman was recalled from these reviews, and she was not found to have cancer on further assessment in the breast assessment clinic. 139 negative discordant cases were reviewed, and eight cancer cases (8.79% of total cancers detected in this period) recalled by human readers were not detected by AI. No additional cancers were detected by AI during the study. The performance of AI was inferior to that of human readers in our unit. Having missed a significant number of cancers, it is not reliable or safe enough to be used in clinical practice. AI is not currently of sufficient accuracy to be considered for the NHS Breast Screening Programme.

Enhancing radiomics features via a large language model for classifying benign and malignant breast tumors in mammography.

Ra S, Kim J, Na I, Ko ES, Park H

PubMed paper · Jun 1, 2025
Radiomics is widely used to assist in clinical decision-making, disease diagnosis, and treatment planning for various target organs, including the breast. Recent advances in large language models (LLMs) have helped enhance radiomics analysis. Herein, we sought to improve radiomics analysis by incorporating LLM-learned clinical knowledge to classify benign and malignant tumors in breast mammography. We extracted radiomics features from the mammograms based on the region of interest and retained the features related to the target task. Using prompt engineering, we devised an input sequence that reflected the selected features and the target task. The input sequence was fed to the chosen LLM (a LLaMA variant), which was fine-tuned using low-rank adaptation to enhance the radiomics features. The resulting model was then evaluated on two mammogram datasets (VinDr-Mammo and INbreast) against conventional baselines. The enhanced radiomics-based method performed better than baselines using conventional radiomics features on both datasets, achieving accuracies of 0.671 on VinDr-Mammo and 0.839 on INbreast. Conventional radiomics models require retraining from scratch on an unseen dataset with a new set of features. In contrast, the model developed in this study effectively reused the features common to the training and unseen datasets by explicitly linking feature names with feature values, enabling extensible learning across datasets. Our method also outperformed the baseline in this retraining setting on an unseen dataset. Our method, one of the first to incorporate an LLM into radiomics, has the potential to improve radiomics analysis.
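The key idea of explicitly linking feature names with feature values in the prompt can be sketched as follows. The feature names and prompt template here are hypothetical illustrations, not the authors' exact format:

```python
def radiomics_prompt(features, task="Classify the breast tumor as benign or malignant."):
    """Build an LLM input sequence that pairs each radiomics feature
    name with its value, so features shared between datasets can be
    reused without retraining from scratch."""
    lines = [f"{name}: {value:.4f}" for name, value in features.items()]
    return task + "\nRadiomics features:\n" + "\n".join(lines)

# Example with two illustrative (PyRadiomics-style) feature names
prompt = radiomics_prompt({"shape_Sphericity": 0.8123, "glcm_Contrast": 12.5})
```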