
Robust evaluation of tissue-specific radiomic features for classifying breast tissue density grades.

Dong V, Mankowski W, Silva Filho TM, McCarthy AM, Kontos D, Maidment ADA, Barufaldi B

PubMed · Nov 1, 2025
Breast cancer risk depends on an accurate assessment of breast density due to lesion masking. Although governed by standardized guidelines, radiologist assessment of breast density is still highly variable. Automated breast density assessment tools leverage deep learning but are limited by model robustness and interpretability. We assessed the robustness of a feature selection methodology (RFE-SHAP) for classifying breast density grades using tissue-specific radiomic features extracted from raw central projections of digital breast tomosynthesis screenings (n_I = 651, n_II = 100). RFE-SHAP leverages traditional and explainable AI methods to identify highly predictive and influential features. A simple logistic regression (LR) classifier was used to assess classification performance, and unsupervised clustering was employed to investigate the intrinsic separability of density grade classes. LR classifiers yielded cross-validated areas under the receiver operating characteristic curve (AUCs) per density grade of [A: 0.909 ± 0.032, B: 0.858 ± 0.027, C: 0.927 ± 0.013, D: 0.890 ± 0.089] and an AUC of 0.936 ± 0.016 for classifying patients as nondense or dense. In external validation, we observed per-density-grade AUCs of [A: 0.880, B: 0.779, C: 0.878, D: 0.673] and a nondense/dense AUC of 0.823. Unsupervised clustering highlighted the ability of these features to characterize different density grades. Our RFE-SHAP feature selection methodology for classifying breast tissue density generalized well to validation datasets after accounting for natural class imbalance, and the identified radiomic features properly captured the progression of density grades. Our results potentiate future research into correlating selected radiomic features with clinical descriptors of breast tissue density.
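The abstract does not ship code, but the two-stage idea (recursive feature elimination, then SHAP-based influence ranking) can be sketched. The snippet below is a minimal illustration assuming scikit-learn and the shap package, with hypothetical random data standing in for the radiomic features; the paper's exact pipeline (cross-validation, four-grade labels) may differ.

```python
# A minimal RFE-plus-SHAP sketch, assuming scikit-learn and the shap package.
# X and y are hypothetical placeholders, not the study's radiomic data.
import numpy as np
import shap
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((200, 50))        # placeholder radiomic feature matrix
y = rng.integers(0, 2, 200)      # placeholder labels: 0 = nondense, 1 = dense

# Step 1: recursive feature elimination with a simple linear model.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=15)
rfe.fit(X, y)
X_sel = X[:, rfe.support_]

# Step 2: rank the surviving features by mean absolute SHAP value.
clf = LogisticRegression(max_iter=1000).fit(X_sel, y)
explainer = shap.LinearExplainer(clf, X_sel)
shap_values = explainer.shap_values(X_sel)
influence = np.abs(shap_values).mean(axis=0)
print("most influential feature indices:", np.argsort(influence)[::-1][:5])
```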

Comparing percent breast density assessments of an AI-based method with expert reader estimates: inter-observer variability.

Romanov S, Howell S, Harkness E, Gareth Evans D, Astley S, Fergie M

PubMed · Nov 1, 2025
Breast density estimation is an important part of breast cancer risk assessment, as mammographic density is associated with risk. However, density assessed by multiple experts can be subject to high inter-observer variability, so automated methods are increasingly used. We investigated inter-reader variability and risk prediction for expert assessors and a deep learning approach. Case-control-matched screening data from a cohort of 1328 women were used to compare two expert readers with each other, and a single reader with a deep learning model, Manchester artificial intelligence - visual analog scale (MAI-VAS). Bland-Altman analysis was used to assess variability, and the matched concordance index to assess risk. Although the mean differences for the two experiments were alike, the limits of agreement between MAI-VAS and a single reader were substantially narrower, at +SD (standard deviation) 21 (95% CI: 19.65, 21.69) and -SD 22 (95% CI: -22.71, -20.68), than those between two expert readers, at +SD 31 (95% CI: 29.23, 32.08) and -SD 29 (95% CI: -29.94, -27.09). In addition, breast cancer risk discrimination for the deep learning method and density readings from a single expert was similar, with matched concordance indices of 0.628 (95% CI: 0.598, 0.658) and 0.624 (95% CI: 0.595, 0.654), respectively. The automated method had inter-view agreement similar to that of experts and maintained consistency across density quartiles. The artificial intelligence breast density assessment tool MAI-VAS has better inter-observer agreement with a randomly selected expert reader than that between two expert readers. Deep learning-based density methods provide consistent density scores without compromising breast cancer risk discrimination.
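The limits of agreement reported above follow the standard Bland-Altman construction (mean difference ± 1.96 SD of the differences). A minimal NumPy sketch, with hypothetical paired percent-density readings in place of the study data:

```python
# Minimal Bland-Altman sketch with NumPy; reader_a and reader_b are
# hypothetical paired percent-density scores, not the study's data.
import numpy as np

reader_a = np.array([12.0, 35.5, 48.0, 22.3, 60.1])
reader_b = np.array([15.2, 30.9, 52.4, 25.0, 55.8])

diff = reader_a - reader_b
mean_diff = diff.mean()                  # bias
sd_diff = diff.std(ddof=1)               # SD of the differences
loa_upper = mean_diff + 1.96 * sd_diff   # upper limit of agreement
loa_lower = mean_diff - 1.96 * sd_diff   # lower limit of agreement
print(f"bias={mean_diff:.2f}, LoA=({loa_lower:.2f}, {loa_upper:.2f})")
```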

The evaluation of artificial intelligence in mammography-based breast cancer screening: Is breast-level analysis enough?

Taib AG, Partridge GJW, Yao L, Darker I, Chen Y

PubMed · Jun 25, 2025
To assess whether the diagnostic performance of a commercial artificial intelligence (AI) algorithm for mammography differs between breast-level and lesion-level interpretations, and to compare performance to a large population of specialised human readers. We retrospectively analysed 1200 mammograms from the NHS breast cancer screening programme using a commercial AI algorithm and assessments from 1258 trained human readers from the Personal Performance in Mammographic Screening (PERFORMS) external quality assurance programme. For breasts containing pathologically confirmed malignancies, breast-level and lesion-level analyses were performed. The latter considered the locations of marked regions of interest for AI and humans, and the highest score per lesion was recorded. For non-malignant breasts, a breast-level analysis recorded the highest score per breast. Area under the curve (AUC), sensitivity, and specificity were calculated at the developer's recommended threshold for recall. The study was designed to detect a medium-sized effect (odds ratio 3.5 or 0.29) for sensitivity. The test set contained 882 non-malignant (73%) and 318 malignant breasts (27%), with 328 cancer lesions. The AI AUC was 0.942 at breast level and 0.929 at lesion level (difference -0.013, p < 0.01). The mean human AUC was 0.878 at breast level and 0.851 at lesion level (difference -0.027, p < 0.01). AI outperformed human readers at both breast and lesion level (p < 0.01 for both) according to the AUC. AI's diagnostic performance decreased significantly at the lesion level, indicating reduced accuracy in localising malignancies; however, its overall performance exceeded that of human readers. Question: AI often recalls mammography cases not recalled by humans; to understand why, we as humans must consider the regions of interest it has marked as cancerous. Findings: Evaluations of AI typically occur at the breast level, but performance decreases when AI is evaluated at the lesion level; this also occurs for humans. Clinical relevance: To improve human-AI collaboration, AI should be assessed at the lesion level; poor accuracy here may lead to automation bias and unnecessary patient procedures.
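The breast-level analysis described here reduces to aggregating per-region scores with a max before computing the AUC. A minimal sketch assuming scikit-learn, with hypothetical per-lesion AI scores and breast labels; the lesion-level analysis would additionally require matching marked regions of interest to ground-truth lesion locations before taking the highest score per lesion.

```python
# Minimal sketch of breast-level aggregation of lesion scores, assuming
# scikit-learn; the scores, breast IDs, and labels below are hypothetical.
from sklearn.metrics import roc_auc_score

lesion_scores = {"b1": [0.91, 0.40], "b2": [0.12], "b3": [0.55, 0.78, 0.30]}
breast_labels = {"b1": 1, "b2": 0, "b3": 1}   # 1 = malignant breast

# Breast-level analysis: the highest lesion score represents the breast.
ids = sorted(lesion_scores)
y_true = [breast_labels[i] for i in ids]
y_score = [max(lesion_scores[i]) for i in ids]
print("breast-level AUC:", roc_auc_score(y_true, y_score))
```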

Artificial intelligence-based tumor size measurement on mammography: agreement with pathology and comparison with human readers' assessments across multiple imaging modalities.

Kwon MR, Kim SH, Park GE, Mun HS, Kang BJ, Kim YT, Yoon I

PubMed · Jun 20, 2025
To evaluate the agreement between artificial intelligence (AI)-based tumor size measurements of breast cancer and final pathology, and to compare these results with those of other imaging modalities. This retrospective study included 925 women (mean age, 55.3 years ± 11.6) with 936 breast cancers, who underwent digital mammography, breast ultrasound, and magnetic resonance imaging before breast cancer surgery. AI-based tumor size measurement was performed on post-processed mammographic images, outlining areas with AI abnormality scores of 10%, 50%, and 90%. Absolute agreement between AI-based tumor sizes, imaging modalities, and histopathology was assessed using intraclass correlation coefficient (ICC) analysis. Concordant and discordant cases between AI measurements and histopathologic examinations were compared. Tumor size with an abnormality score of 50% showed the highest agreement with histopathologic examination (ICC = 0.54, 95% confidence interval [CI]: 0.49-0.59), comparable to the agreement of mammography (ICC = 0.54, 95% CI: 0.48-0.60, p = 0.40). For ductal carcinoma in situ and human epidermal growth factor receptor 2-positive cancers, AI showed higher agreement than mammography (ICC = 0.76, 95% CI: 0.67-0.84 and ICC = 0.73, 95% CI: 0.52-0.85, respectively). Overall, 52.0% (487/936) of cases were discordant; these were more commonly observed in younger patients with dense breasts, multifocal malignancies, lower abnormality scores, and different imaging characteristics. AI-based tumor size measurements with abnormality scores of 50% showed moderate agreement with histopathology but demonstrated size discordance in more than half of the cases. While comparable to mammography, these limitations emphasize the need for further refinement and research.
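One common way to compute absolute-agreement ICCs of this kind in Python is the pingouin package. A minimal sketch, with hypothetical AI and pathology size measurements in place of the study data:

```python
# Minimal ICC sketch using the pingouin package; the sizes below are
# hypothetical stand-ins for AI-measured and pathologic tumor sizes (mm).
import pandas as pd
import pingouin as pg

df = pd.DataFrame({
    "tumor":  [1, 1, 2, 2, 3, 3, 4, 4],
    "method": ["AI", "pathology"] * 4,
    "size":   [14.0, 12.5, 22.1, 25.0, 8.3, 9.0, 30.2, 28.7],
})
icc = pg.intraclass_corr(data=df, targets="tumor", raters="method",
                         ratings="size")
# ICC2 (two-way random effects, single rater) reflects absolute agreement.
print(icc[icc["Type"] == "ICC2"][["Type", "ICC", "CI95%"]])
```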

Detection of breast cancer using fractional discrete sinc transform based on empirical Fourier decomposition.

Azmy MM

PubMed · Jun 20, 2025
Breast cancer is the most common cause of death among women worldwide, and early detection is important for saving patients' lives. Ultrasound and mammography are the most common noninvasive methods for detecting breast cancer, and computer techniques are used to help physicians diagnose it. In most previous studies, the classification parameter rates were not high enough to achieve a correct diagnosis. In this study, new approaches were applied to detect breast cancer images from three databases. The programming software used to extract features from the images was MATLAB R2022a. Novel approaches were obtained using new fractional transforms, deduced from the fractional Fourier transform and from novel discrete transforms derived from the discrete sine and cosine transforms. The steps of the approaches are as follows. First, the fractional transforms were applied to the breast images. Then, the empirical Fourier decomposition (EFD) was obtained. The mean, variance, kurtosis, and skewness were subsequently calculated. Finally, an RNN-BiLSTM (recurrent neural network-bidirectional long short-term memory) was used for the classification phase. The proposed approaches were compared to obtain the highest accuracy rate during the classification phase based on the different fractional transforms. The highest accuracy rate was obtained when the fractional discrete sinc transform of approach 4 was applied: the area under the receiver operating characteristic curve (AUC) was 1, and the accuracy, sensitivity, specificity, precision, G-mean, and F-measure rates were all 100%. If traditional machine learning methods, such as support vector machines (SVMs) and artificial neural networks (ANNs), were used instead, the classification parameter rates would be lower. The fourth approach therefore used RNN-BiLSTM to exploit the features of the breast images most effectively. This approach can be programmed on a computer to help physicians correctly classify breast images.
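The fractional transforms and the EFD are bespoke to this paper, but the statistical-feature step is standard. A minimal SciPy sketch of that step alone, assuming a hypothetical array coeffs holds the decomposed transform coefficients:

```python
# Minimal sketch of the statistical-feature step only; the fractional
# transforms and EFD themselves are bespoke to the paper and not shown.
# `coeffs` is a hypothetical array of decomposed transform coefficients.
import numpy as np
from scipy.stats import kurtosis, skew

coeffs = np.random.randn(1024)   # placeholder for one EFD component

features = np.array([
    coeffs.mean(),       # mean
    coeffs.var(),        # variance
    kurtosis(coeffs),    # kurtosis (Fisher definition by default)
    skew(coeffs),        # skewness
])
# One such 4-vector per image/component would feed the RNN-BiLSTM classifier.
print(features)
```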

Artificial Intelligence Language Models to Translate Professional Radiology Mammography Reports Into Plain Language - Impact on Interpretability and Perception by Patients.

Pisarcik D, Kissling M, Heimer J, Farkas M, Leo C, Kubik-Huch RA, Euler A

PubMed · Jun 19, 2025
This study aimed to evaluate, using a survey, the interpretability and patient perception of AI-translated mammography and sonography reports, focusing on comprehensibility, follow-up recommendations, and conveyed empathy. In this observational study, three fictional mammography and sonography reports with BI-RADS categories 3, 4, and 5 were created. These reports were repeatedly translated into plain language by three different large language models (LLMs: ChatGPT-4, ChatGPT-4o, Google Gemini). In a first step, the best of these repeatedly translated reports for each BI-RADS category and LLM was selected by two experts in breast imaging, considering factual correctness, completeness, and quality. In a second step, female participants compared and rated the translated reports regarding comprehensibility, follow-up recommendations, conveyed empathy, and the additional value of each report, using a survey with Likert scales. Statistical analysis included cumulative link mixed models and the Plackett-Luce model for ranking preferences. Forty women participated in the survey. GPT-4 and GPT-4o were rated significantly higher than Gemini across all categories (P<.001). Participants >50 years of age rated the reports significantly higher than participants 18-29 years of age (P<.05). Higher education predicted lower ratings (P=.02), no prior mammography increased scores (P=.03), and AI experience had no effect (P=.88). Ranking analysis showed GPT-4o as the most preferred (P=.48), followed by GPT-4 (P=.37), with Gemini ranked last (P=.15). Patient preference differed among AI-translated radiology reports. Compared with a traditional report using radiological language, AI-translated reports add value for patients and enhance comprehensibility and empathy, and therefore hold the potential to improve patient communication in breast imaging.
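The Plackett-Luce model used for the ranking analysis has a simple closed-form likelihood: the probability of an observed ordering is the product, position by position, of each item's worth divided by the summed worths of the items not yet ranked. A minimal sketch, treating the preference shares reported above (0.48/0.37/0.15) as normalized worths purely for illustration:

```python
# Minimal Plackett-Luce sketch: likelihood of one observed ranking given
# per-item worths. The worths reuse the reported preference shares for
# illustration only; this is not the study's fitting code.
worth = {"GPT-4o": 0.48, "GPT-4": 0.37, "Gemini": 0.15}
ranking = ["GPT-4o", "GPT-4", "Gemini"]   # best to worst

def plackett_luce_likelihood(ranking, worth):
    remaining = list(ranking)
    prob = 1.0
    for item in ranking:
        prob *= worth[item] / sum(worth[r] for r in remaining)
        remaining.remove(item)
    return prob

print(plackett_luce_likelihood(ranking, worth))  # P(this exact ordering)
```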

Appropriateness of acute breast symptom recommendations provided by ChatGPT.

Byrd C, Kingsbury C, Niell B, Funaro K, Bhatt A, Weinfurtner RJ, Ataya D

PubMed · Jun 16, 2025
We evaluated the accuracy of ChatGPT-3.5's responses to common questions regarding acute breast symptoms and explored whether using lay language, as opposed to medical language, affected the accuracy of the responses. Questions were formulated addressing acute breast conditions, informed by the American College of Radiology (ACR) Appropriateness Criteria (AC) and our clinical experience at a tertiary referral breast center. Of these, seven addressed the most common acute breast symptoms, nine addressed pregnancy-associated breast symptoms, and four addressed specific management and imaging recommendations for a palpable breast abnormality. Questions were submitted three times to ChatGPT-3.5, and all responses were assessed by five fellowship-trained breast radiologists. Evaluation criteria included clinical judgment and adherence to ACR guidelines, with responses scored as: 1) "appropriate," 2) "inappropriate" if any response contained inappropriate information, or 3) "unreliable" if responses were inconsistent. A majority vote determined the appropriateness for each question. ChatGPT-3.5 generated appropriate responses for 7/7 (100%) questions regarding common acute breast symptoms when phrased both colloquially and in standard medical terminology. In contrast, it generated appropriate responses for only 3/9 (33%) questions about pregnancy-associated breast symptoms and 3/4 (75%) questions about management and imaging recommendations for a palpable breast abnormality. ChatGPT-3.5 can provide appropriate information on the management of acute breast symptoms whether prompted with standard medical terminology or lay phrasing. However, physician oversight remains critical given the presence of inappropriate recommendations for pregnancy-associated breast symptoms and the management of palpable abnormalities.
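The majority-vote step is straightforward to reproduce. A minimal standard-library sketch, with hypothetical ratings from five radiologists scoring one response:

```python
# Minimal majority-vote sketch; the ratings are hypothetical examples of
# five radiologists scoring one ChatGPT response, not the study's data.
from collections import Counter

ratings = ["appropriate", "appropriate", "inappropriate",
           "appropriate", "unreliable"]

verdict, votes = Counter(ratings).most_common(1)[0]
print(f"{verdict} ({votes}/{len(ratings)} votes)")  # appropriate (3/5 votes)
```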

Comparative analysis of semantic-segmentation models for screen film mammograms.

Rani J, Singh J, Virmani J

PubMed · Jun 5, 2025
Accurate segmentation of mammographic masses is very important, as the shape characteristics of these masses play a significant role in the radiologist's diagnosis of benign and malignant cases. Recently, various deep learning segmentation algorithms have become popular for segmentation tasks. In the present work, a rigorous performance analysis of ten semantic-segmentation models was performed on 518 images taken from the DDSM (Digital Database for Screening Mammography) dataset, comprising 208 mass images in the BI-RADS 3 class, 150 in BI-RADS 4, and 160 in BI-RADS 5. These models are (1) simple convolution series models, namely VGG16/VGG19; (2) simple convolution DAG (directed acyclic graph) models, namely U-Net; (3) dilated convolution DAG models, namely ResNet18/ResNet50/ShuffleNet/XceptionNet/InceptionV2/MobileNetV2; and (4) a hybrid model, i.e., hybrid U-Net. On the basis of exhaustive experimentation, it was observed that the dilated convolution DAG models ResNet50, ShuffleNet, and MobileNetV2 outperform the other network models, yielding cumulative JI (Jaccard index) and F1 score values of 0.87 and 0.92, 0.85 and 0.91, and 0.84 and 0.90, respectively. The segmented images obtained by the best-performing models were subjectively analyzed by the participating radiologist in terms of (a) size, (b) margins, and (c) shape characteristics. From the objective and subjective analyses, it was concluded that ResNet50 is the optimal model for segmentation of difficult-to-delineate breast masses with dense backgrounds, and of masses where both masses and micro-calcifications are simultaneously present. The results of the study indicate that the ResNet50 model can be used in a routine clinical environment for segmentation of mammographic masses.
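The JI and F1 (Dice) scores used for this comparison follow directly from the overlap of predicted and ground-truth masks. A minimal NumPy sketch with hypothetical binary masks:

```python
# Minimal sketch of Jaccard index and F1 (Dice) for binary masks; the masks
# are hypothetical placeholders for a predicted and ground-truth mass region.
import numpy as np

pred = np.zeros((64, 64), dtype=bool); pred[10:40, 10:40] = True
gt   = np.zeros((64, 64), dtype=bool); gt[15:45, 15:45] = True

intersection = np.logical_and(pred, gt).sum()
union        = np.logical_or(pred, gt).sum()
ji   = intersection / union                        # Jaccard index (JI)
dice = 2 * intersection / (pred.sum() + gt.sum())  # F1 / Dice score
print(f"JI={ji:.2f}, F1={dice:.2f}")
```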

A Novel Deep Learning Framework for Nipple Segmentation in Digital Mammography.

Rogozinski M, Hurtado J, Sierra-Franco CA, R Hall Barbosa C, Raposo A

PubMed · Jun 3, 2025
This study introduces a novel methodology to enhance nipple segmentation in digital mammography, a critical component for accurate medical analysis and computer-aided detection systems. The nipple is a key anatomical landmark for multi-view and multi-modality breast image registration, where accurate localization is vital for ensuring image quality and enabling precise registration of anomalies across different mammographic views. The proposed approach significantly outperforms baseline methods, particularly in challenging cases where previous techniques failed. It achieved successful detection across all cases and reached a mean Intersection over Union (mIoU) of 0.63 in instances where the baseline failed entirely. Additionally, it yielded nearly a tenfold improvement in Hausdorff distance and consistent gains in overlap-based metrics, with the mIoU increasing from 0.7408 to 0.8011 in the craniocaudal (CC) view and from 0.7488 to 0.7767 in the mediolateral oblique (MLO) view. Furthermore, its generalizability suggests the potential for application to other breast imaging modalities and related domains facing challenges such as class imbalance and high variability in object characteristics.
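The Hausdorff distance reported here can be computed with SciPy. A minimal sketch over hypothetical predicted and ground-truth contour points, not the paper's data:

```python
# Minimal Hausdorff-distance sketch with SciPy; the point sets below are
# hypothetical nipple-contour coordinates for illustration only.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

pred_pts = np.array([[10.0, 12.0], [11.0, 13.5], [12.0, 15.0]])
gt_pts   = np.array([[10.5, 12.2], [11.5, 14.0], [13.0, 15.5]])

# Symmetric Hausdorff distance = max of the two directed distances.
d = max(directed_hausdorff(pred_pts, gt_pts)[0],
        directed_hausdorff(gt_pts, pred_pts)[0])
print(f"Hausdorff distance: {d:.2f}")
```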