
Analysis of intra- and inter-observer variability in 4D liver ultrasound landmark labeling.

Wulff D, Ernst F

PubMed · Sep 1 2025
Four-dimensional (4D) ultrasound imaging is widely used in clinics for diagnostics and therapy guidance. Accurate target tracking in 4D ultrasound is crucial for autonomous therapy guidance systems, such as radiotherapy, where precise tumor localization ensures effective treatment. Supervised deep learning approaches rely on reliable ground truth, making accurate labels essential. We investigate the reliability of expert-labeled ground truth data by evaluating intra- and inter-observer variability in landmark labeling for 4D ultrasound imaging in the liver. Eight 4D liver ultrasound sequences were labeled by eight expert observers, each labeling eight landmarks three times. Intra- and inter-observer variability was quantified, and an observer survey and motion analysis were conducted to determine factors influencing labeling accuracy, such as ultrasound artifacts and motion amplitude. The mean intra-observer variability ranged from 1.58 mm ± 0.90 mm to 2.05 mm ± 1.22 mm depending on the observer. The inter-observer variability for the two observer groups was 2.68 mm ± 1.69 mm and 3.06 mm ± 1.74 mm. The observer survey and motion analysis revealed that ultrasound artifacts significantly affected labeling accuracy due to limited landmark visibility, whereas motion amplitude had no measurable effect. Our measured mean landmark motion was 11.56 mm ± 5.86 mm. We highlight variability in expert-labeled ground truth data for 4D ultrasound imaging and identify ultrasound artifacts as a major source of labeling inaccuracies. These findings underscore the importance of addressing observer variability and artifact-related challenges to improve the reliability of ground truth data for evaluating target tracking algorithms in 4D ultrasound applications.
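
The abstract does not spell out the estimator behind these numbers; below is a minimal sketch of one plausible formulation, assuming intra-observer variability is the mean ± SD of Euclidean distances between each repetition and that observer's mean landmark position, and inter-observer variability the distances between observer means and the group mean. All array shapes and data are illustrative.

```python
# Sketch: intra-/inter-observer variability as mean ± SD of Euclidean
# distances between repeated 3D landmark annotations (illustrative data).
import numpy as np

# labels[o, r, k] = 3D position (mm) of landmark k by observer o, repetition r
labels = np.random.rand(8, 3, 8, 3) * 20.0  # 8 observers, 3 repeats, 8 landmarks

# Intra-observer: distance of each repetition from that observer's mean label
per_observer_mean = labels.mean(axis=1, keepdims=True)          # (8, 1, 8, 3)
intra_d = np.linalg.norm(labels - per_observer_mean, axis=-1)   # (8, 3, 8)
print(f"intra-observer: {intra_d.mean():.2f} mm ± {intra_d.std():.2f} mm")

# Inter-observer: distance of each observer's mean label from the group mean
obs_means = labels.mean(axis=1)                                 # (8, 8, 3)
group_mean = obs_means.mean(axis=0, keepdims=True)              # (1, 8, 3)
inter_d = np.linalg.norm(obs_means - group_mean, axis=-1)       # (8, 8)
print(f"inter-observer: {inter_d.mean():.2f} mm ± {inter_d.std():.2f} mm")
```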

Enhancing Diagnostic Accuracy of Fresh Vertebral Compression Fractures With Deep Learning Models.

Li KY, Ye HB, Zhang YL, Huang JW, Li HL, Tian NF

PubMed · Aug 15 2025
Retrospective study. The study aimed to develop and validate a deep learning model based on X-ray images to accurately diagnose fresh thoracolumbar vertebral compression fractures. In clinical practice, diagnosing fresh vertebral compression fractures often requires MRI. However, because MRI resources are scarce and carry high time and economic costs, some patients may not receive timely diagnosis and treatment. A deep learning model combined with X-rays for diagnostic assistance could potentially serve as an alternative to MRI. In this study, X-ray images of suspected thoracolumbar vertebral compression fractures were collected from a municipal shared database between December 2012 and February 2024. Deep learning models were constructed using the EfficientNet, MobileNet, and MnasNet frameworks. We conducted a preliminary evaluation of the deep learning models using the validation set. Diagnostic performance was evaluated using metrics such as the area under the curve (AUC), accuracy, sensitivity, specificity, F1 score, precision, and the ROC curve. Finally, the deep learning models were compared on the control set against evaluations by two spine surgeons of different experience levels. This study included a total of 3025 lateral X-ray images from 2224 patients. The data set was divided into a training set of 2388 cases, a validation set of 482 cases, and a control set of 155 cases. In the validation set, the three DL models achieved accuracies of 83.0%, 82.4%, and 82.2%, respectively, with AUC values of 0.861, 0.852, and 0.865. In the control set, the three DL models achieved accuracies of 78.1%, 78.1%, and 80.7%, respectively, all higher than those of the spine surgeons and significantly higher than that of the junior spine surgeon. This study developed deep learning models that detect fresh vertebral compression fractures with high accuracy.
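
For reference, the reported metrics follow from a standard confusion-matrix computation; a short sketch with scikit-learn on illustrative labels and scores, not the study's data:

```python
# Sketch: accuracy, sensitivity, specificity, F1, and AUC from binary
# predictions, as typically computed for studies like this one.
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 1 = fresh fracture on MRI (reference)
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.3, 0.1, 0.8, 0.6])
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy:", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("F1:", f1_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_score))
```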

Performance Evaluation of Deep Learning for the Detection and Segmentation of Thyroid Nodules: Systematic Review and Meta-Analysis.

Ni J, You Y, Wu X, Chen X, Wang J, Li Y

PubMed · Aug 14 2025
Thyroid cancer is one of the most common endocrine malignancies. Its incidence has steadily increased in recent years. Distinguishing between benign and malignant thyroid nodules (TNs) is challenging due to their overlapping imaging features. The rapid advancement of artificial intelligence (AI) in medical image analysis, particularly deep learning (DL) algorithms, has provided novel solutions for automated TN detection. However, existing studies exhibit substantial heterogeneity in diagnostic performance, and no systematic evidence-based research has comprehensively assessed the diagnostic performance of DL models in this field. This study aimed to conduct a systematic review and meta-analysis to appraise the performance of DL algorithms in diagnosing TN malignancy, identify key factors influencing their diagnostic efficacy, and compare their accuracy with that of clinicians in image-based diagnosis. We systematically searched multiple databases, including PubMed, Cochrane, Embase, Web of Science, and IEEE, and identified 41 eligible studies for systematic review and meta-analysis. Based on task type, studies were categorized into segmentation (n=14) and detection (n=27) tasks. The pooled sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were calculated for each group. Subgroup analyses were performed to examine the impact of transfer learning and to compare model performance against clinicians. For segmentation tasks, the pooled sensitivity, specificity, and AUC were 82% (95% CI 79%-84%), 95% (95% CI 92%-96%), and 0.91 (95% CI 0.89-0.94), respectively. For detection tasks, the pooled sensitivity, specificity, and AUC were 91% (95% CI 89%-93%), 89% (95% CI 86%-91%), and 0.96 (95% CI 0.93-0.97), respectively. Some studies demonstrated that DL models could achieve diagnostic performance comparable with, or even exceeding, that of clinicians in certain scenarios. The application of transfer learning contributed to improved model performance. DL algorithms exhibit promising diagnostic accuracy in TN imaging, highlighting their potential as auxiliary diagnostic tools. However, current studies are limited by suboptimal methodological design, inconsistent image quality across datasets, and insufficient external validation, which may introduce bias. Future research should enhance methodological standardization, improve model interpretability, and promote transparent reporting to facilitate the sustainable clinical translation of DL-based solutions.
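
The abstract does not state the pooling model; diagnostic meta-analyses often use a bivariate model, but a simpler, common alternative is DerSimonian-Laird random-effects pooling on the logit scale, sketched here for sensitivity with invented per-study counts:

```python
# Sketch: DerSimonian-Laird random-effects pooling of per-study
# sensitivities on the logit scale (illustrative counts, not the review's).
import numpy as np

tp = np.array([80, 45, 120])   # per-study true positives
fn = np.array([10, 8, 20])     # per-study false negatives

sens = tp / (tp + fn)
logit = np.log(sens / (1 - sens))
var = 1 / tp + 1 / fn                      # approx. variance of a logit proportion

w = 1 / var                                # fixed-effect weights
fe = np.sum(w * logit) / np.sum(w)
q = np.sum(w * (logit - fe) ** 2)          # Cochran's Q heterogeneity statistic
tau2 = max(0.0, (q - (len(sens) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_re = 1 / (var + tau2)                    # random-effects weights
pooled_logit = np.sum(w_re * logit) / np.sum(w_re)
pooled_sens = 1 / (1 + np.exp(-pooled_logit))
print(f"pooled sensitivity: {pooled_sens:.3f}")
```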

Development and validation of deep learning model for detection of obstructive coronary artery disease in patients with acute chest pain: a multi-center study.

Kim JY, Park J, Lee KH, Lee JW, Park J, Kim PK, Han K, Baek SE, Im DJ, Choi BW, Hur J

PubMed · Aug 14 2025
This study aimed to develop and validate a deep learning (DL) model to detect obstructive coronary artery disease (CAD, ≥ 50% stenosis) on coronary CT angiography (CCTA) among patients presenting to the emergency department (ED) with acute chest pain. The training dataset included 378 patients with acute chest pain who underwent CCTA (10,060 curved multiplanar reconstruction [MPR] images) at a single-center ED between January 2015 and December 2022. The external validation dataset included 298 patients from 3 ED centers between January 2021 and December 2022. A DL model based on You Only Look Once v4 (YOLOv4), which requires manual preprocessing for curved MPR extraction, was developed using 15 manually preprocessed MPR images per major coronary artery. Model performance was evaluated per artery and per patient. The training dataset included 378 patients (mean age 61.3 ± 12.2 years, 58.2% men); the external dataset included 298 patients (mean age 58.3 ± 13.8 years, 54.6% men). Obstructive CAD prevalence in the external dataset was 27.5% (82/298). The DL model achieved per-artery sensitivity, specificity, positive predictive value, negative predictive value (NPV), and area under the curve (AUC) of 92.7%, 89.9%, 62.6%, 98.5%, and 0.919, respectively, and per-patient values of 93.3%, 80.7%, 67.7%, 96.6%, and 0.871, respectively. The DL model demonstrated high sensitivity and NPV for identifying obstructive CAD in patients with acute chest pain undergoing CCTA, indicating its potential utility in aiding ED physicians in CAD detection.
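
How per-artery calls become per-patient calls is not detailed in the abstract; one natural rule, assumed here purely for illustration, is that a patient is positive if any major artery is predicted obstructive:

```python
# Sketch: rolling per-artery predictions up to a per-patient label.
# The aggregation rule and field names are assumptions, not the study's.
from collections import defaultdict

# (patient_id, artery, predicted_obstructive) triples from the detector
artery_preds = [
    ("p1", "LAD", True), ("p1", "LCX", False), ("p1", "RCA", False),
    ("p2", "LAD", False), ("p2", "LCX", False), ("p2", "RCA", False),
]

per_patient = defaultdict(bool)
for pid, artery, positive in artery_preds:
    per_patient[pid] |= positive   # any positive artery flags the patient

print(dict(per_patient))           # {'p1': True, 'p2': False}
```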

MammosighTR: Nationwide Breast Cancer Screening Mammogram Dataset with BI-RADS Annotations for Artificial Intelligence Applications.

Koç U, Beşler MS, Sezer EA, Karakaş E, Özkaya YA, Evrimler Ş, Yalçın A, Kızıloğlu A, Kesimal U, Oruç M, Çankaya İ, Koç Keleş D, Merd N, Özkan E, Çevik Nİ, Gökhan MB, Boyraz Hayat B, Özer M, Tokur O, Işık F, Tezcan A, Battal F, Yüzkat M, Sebik NB, Karademir F, Topuz Y, Sezer Ö, Varlı S, Ülgü MM, Akdoğan E, Birinci Ş

PubMed · Aug 13 2025
<i>"Just Accepted" papers have undergone full peer review and have been accepted for publication in <i>Radiology: Artificial Intelligence</i>. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content</i>. The MammosighTR dataset, derived from Türkiye's national breast cancer screening mammography program, provides BI-RADS-labeled mammograms with detailed annotations on breast composition and lesion quadrant location, which may be useful for developing and testing AI models in breast cancer detection. ©RSNA, 2025.

PPEA: Personalized positioning and exposure assistant based on multi-task shared pose estimation transformer.

Zhao J, Liu J, Yang C, Tang H, Chen Y, Zhang Y

PubMed · Aug 13 2025
Hand and foot digital radiography (DR) is an indispensable tool in medical imaging, and varying diagnostic requirements necessitate different hand and foot positionings. Accurate positioning is crucial for obtaining diagnostically valuable images, and adjusting exposure parameters such as the exposure area to patient conditions helps minimize the likelihood of image retakes. We propose a personalized positioning and exposure assistant capable of automatically recognizing hand and foot positionings and recommending appropriate exposure parameters. The assistant comprises three modules: (1) the Progressive Iterative Hand-Foot Tracker (PIHFT), which iteratively locates hands or feet in RGB images, providing the foundation for accurate pose estimation; (2) the Multi-Task Shared Pose Estimation Transformer (MTSPET), a Transformer-based model whose hand and foot estimation branches have similar architectures and share a common backbone (MTSPET outperformed MediaPipe in the hand pose estimation task and successfully transferred this capability to the foot pose estimation task); and (3) the Domain Expertise-embedded Positioning and Exposure Assistant (DEPEA), which combines the keypoint coordinates of hands and feet with specific positioning and exposure parameter requirements, checking patient positioning and inferring exposure areas and Regions of Interest (ROIs) for Digital Automatic Exposure Control (DAEC). Additionally, two datasets were collected and used to train MTSPET. A preliminary clinical trial showed strong agreement between PPEA's outputs and manual annotations, indicating the system's effectiveness in typical clinical scenarios. These contributions lay the foundation for personalized, patient-specific imaging strategies, ultimately enhancing diagnostic outcomes and minimizing the risk of errors in clinical settings.
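
The abstract gives no formulas for the DEPEA module; the sketch below shows one way such logic could be built from estimated keypoints, with the template check, tolerance, and ROI margin all invented for illustration:

```python
# Sketch: deriving an exposure ROI and a pass/fail positioning check from
# 2D keypoints. Keypoint layout, tolerance, and margin are assumptions.
import numpy as np

def exposure_roi(keypoints_px: np.ndarray, margin: int = 30):
    """Bounding box around detected hand/foot keypoints plus a safety margin."""
    x0, y0 = keypoints_px.min(axis=0) - margin
    x1, y1 = keypoints_px.max(axis=0) + margin
    return int(x0), int(y0), int(x1), int(y1)

def positioning_ok(keypoints_px: np.ndarray, template_px: np.ndarray,
                   tol_px: float = 25.0) -> bool:
    """Accept the pose if keypoints lie close to a protocol template
    after removing translation (a deliberately simple check)."""
    a = keypoints_px - keypoints_px.mean(axis=0)
    b = template_px - template_px.mean(axis=0)
    return float(np.linalg.norm(a - b, axis=1).mean()) <= tol_px

kps = np.array([[100, 120], [140, 118], [180, 125], [160, 200]], dtype=float)
print(exposure_roi(kps))
print(positioning_ok(kps, kps + 3.0))   # pure translation -> still acceptable
```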

Automatic detection of arterial input function for brain DCE-MRI in multi-site cohorts.

Saca L, Gaggar R, Pappas I, Benzinger T, Reiman EM, Shiroishi MS, Joe EB, Ringman JM, Yassine HN, Schneider LS, Chui HC, Nation DA, Zlokovic BV, Toga AW, Chakhoyan A, Barnes S

PubMed · Aug 13 2025
Arterial input function (AIF) extraction is a crucial step in quantitative pharmacokinetic modeling of DCE-MRI. This work proposes a robust deep learning model that can precisely extract an AIF from DCE-MRI images. A diverse dataset of human brain DCE-MRI images from 289 participants, totaling 384 scans, from five different institutions with extracted gadolinium-based contrast agent curves from large penetrating arteries, and with most data collected for blood-brain barrier (BBB) permeability measurement, was retrospectively analyzed. A 3D UNet model was implemented and trained on manually drawn AIF regions. The testing cohort was compared using the proposed AIF quality metric, AIFitness, and Ktrans values from a standard DCE pipeline. This UNet was then applied to a separate dataset of 326 participants with a total of 421 DCE-MRI images with analyzed AIF quality and Ktrans values. The resulting 3D UNet model achieved an average AIFitness score of 93.9 compared with 99.7 for manually selected AIFs, and white matter Ktrans values were 0.45 × 10^-3/min and 0.45 × 10^-3/min, respectively. The intraclass correlation between automated and manual Ktrans values was 0.89. The separate replication dataset yielded an AIFitness score of 97.0 and a white matter Ktrans of 0.44 × 10^-3/min. Findings suggest that a 3D UNet model with additional convolutional neural network kernels and a modified Huber loss function achieves superior performance for identifying AIF curves from DCE-MRI in a diverse multi-center cohort. AIFitness scores and DCE-MRI-derived metrics, such as Ktrans maps, showed no significant differences in gray and white matter between manually drawn and automated AIFs.
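
The "modified" Huber loss is not specified in the abstract; for orientation, this is the standard Huber loss it presumably builds on, written out in PyTorch with illustrative tensor shapes:

```python
# Sketch: the standard Huber loss (quadratic for small residuals, linear
# for large ones). The paper's modification is unspecified in the abstract.
import torch

def huber_loss(pred: torch.Tensor, target: torch.Tensor, delta: float = 1.0):
    err = pred - target
    abs_err = err.abs()
    quadratic = 0.5 * err ** 2                  # small residuals
    linear = delta * (abs_err - 0.5 * delta)    # large residuals, linear tail
    return torch.where(abs_err <= delta, quadratic, linear).mean()

pred = torch.randn(4, 1, 32, 32, 32)    # e.g., a 3D UNet output volume
target = torch.randn(4, 1, 32, 32, 32)
print(huber_loss(pred, target).item())
```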

A Deep Learning-Based Automatic Recognition Model for Polycystic Ovary Ultrasound Images.

Zhao B, Wen L, Huang Y, Fu Y, Zhou S, Liu J, Liu M, Li Y

PubMed · Aug 11 2025
Polycystic ovary syndrome (PCOS) has a significant impact on endocrine metabolism, reproductive function, and mental health in women of reproductive age. Ultrasound remains an essential diagnostic tool for PCOS, particularly in individuals presenting with oligomenorrhea or ovulatory dysfunction accompanied by polycystic ovaries, as well as hyperandrogenism associated with polycystic ovaries. However, the accuracy of ultrasound in identifying polycystic ovarian morphology remains variable. To develop a deep learning model capable of rapidly and accurately identifying PCOS using ovarian ultrasound images. Prospective diagnostic accuracy study. This prospective study included data from 1,751 women with suspected PCOS who presented at two hospitals affiliated with Central South University, with clinical and ultrasound information collected and archived. Patients from center 1 were randomly divided into a training set and an internal validation set in a 7:3 ratio, while patients from center 2 served as the external validation set. Using the YOLOv11 deep learning framework, an automated recognition model for ovarian ultrasound images in PCOS cases was constructed, and its diagnostic performance was evaluated. Ultrasound images from 933 patients (781 from center 1 and 152 from center 2) were analyzed. The mean average precision of the YOLOv11 model in detecting the target ovary was 95.7%, 97.6%, and 97.8% for the training, internal validation, and external validation sets, respectively. For diagnostic classification, the model achieved an F1 score of 95.0% in the training set and 96.9% in both validation sets. The area under the curve values were 0.953, 0.973, and 0.967 for the training, internal validation, and external validation sets, respectively. The model also evaluated a single ovary significantly faster than clinicians (doctor, 5.0 seconds; model, 0.1 seconds; p < 0.01). The YOLOv11-based automatic recognition model for PCOS ovarian ultrasound images exhibits strong target detection and diagnostic performance. This approach can streamline the follicle counting process in conventional ultrasound and enhance the efficiency and generalizability of ultrasound-based PCOS assessment.
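
As a pointer to how a YOLO11 detector of this kind is typically trained and run, here is the generic Ultralytics API; the weights checkpoint, dataset YAML, and image path are placeholders, not the authors' artifacts:

```python
# Sketch: generic Ultralytics YOLO11 fine-tuning and inference.
# "pcos_ovary.yaml" and "ovary_ultrasound.png" are hypothetical names.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")                       # pretrained backbone
model.train(data="pcos_ovary.yaml", epochs=100)  # hypothetical dataset config

results = model("ovary_ultrasound.png")          # single-image inference
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)           # class, confidence, bbox
```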

Post-deployment Monitoring of AI Performance in Intracranial Hemorrhage Detection by ChatGPT.

Rohren E, Ahmadzade M, Colella S, Kottler N, Krishnan S, Poff J, Rastogi N, Wiggins W, Yee J, Zuluaga C, Ramis P, Ghasemi-Rad M

PubMed · Aug 11 2025
To evaluate the post-deployment performance of an artificial intelligence (AI) system (Aidoc) for intracranial hemorrhage (ICH) detection and to assess the utility of ChatGPT-4 Turbo for automated AI monitoring. This retrospective study evaluated 332,809 head CT examinations from 37 radiology practices across the United States (December 2023-May 2024). Of these, 13,569 cases were flagged as positive for ICH by the Aidoc AI system. A HIPAA (Health Insurance Portability and Accountability Act)-compliant version of ChatGPT-4 Turbo was used to extract data from radiology reports. Ground truth was established through radiologists' review of 200 randomly selected cases. Performance metrics were calculated for ChatGPT, Aidoc, and the radiologists. ChatGPT-4 Turbo demonstrated high diagnostic accuracy in identifying ICH from radiology reports, with a positive predictive value of 1 and a negative predictive value of 0.988 (AUC: 0.996). Aidoc's false positive classifications were influenced by scanner manufacturer, midline shift, mass effect, artifacts, and neurologic symptoms. Multivariate analysis identified Philips scanners (OR: 6.97, p=0.003) and artifacts (OR: 3.79, p=0.029) as significant contributors to false positives, while midline shift (OR: 0.08, p=0.021) and mass effect (OR: 0.18, p=0.021) were associated with a reduced false positive rate. Aidoc-assisted radiologists achieved a sensitivity of 0.936 and a specificity of 1. This study underscores the importance of continuous performance monitoring for AI systems in clinical practice. The integration of LLMs offers a scalable solution for evaluating AI performance, ensuring reliable deployment and enhancing diagnostic workflows.
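
The paper's exact prompt is not given; below is a sketch of the kind of report-level label extraction described, using the OpenAI Python SDK. The model name, prompt, and report text are illustrative, and production use would require a HIPAA-compliant deployment:

```python
# Sketch: prompting an LLM to extract a binary ICH label from a report.
# Prompt and report are invented; assumes OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

report = "Findings: Acute right frontal intraparenchymal hemorrhage ..."
resp = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system",
         "content": "Answer strictly YES or NO: does this head CT report "
                    "describe an acute intracranial hemorrhage?"},
        {"role": "user", "content": report},
    ],
    temperature=0,
)
print(resp.choices[0].message.content)  # -> "YES" / "NO"
```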

Spinal-QDCNN: advanced feature extraction for brain tumor detection using MRI images.

T L, J JJ, Rani VV, Saini ML

PubMed · Aug 9 2025
Brain tumors arise from the abnormal development of cells in the brain. They adversely affect human health, and early diagnosis is required to improve patient survival. Various brain tumor detection models have therefore been developed. However, existing methods often suffer from limited accuracy and inefficient learning architectures, and traditional approaches cannot effectively detect small, subtle changes in brain cells. To overcome these limitations, a SpinalNet-Quantum Dilated Convolutional Neural Network (Spinal-QDCNN) model is proposed for detecting brain tumors in MRI images. The Spinal-QDCNN combines QDCNN and SpinalNet. First, the input brain image is pre-processed using RoI extraction. Image enhancement is then performed with a thresholding transformation, followed by segmentation using Projective Adversarial Networks (PAN). Augmentation operations such as random erasing, flipping, and resizing are applied next. Feature extraction follows, yielding statistical features (mean, average contrast, kurtosis, and skewness), Gabor wavelet features, and Discrete Wavelet Transform (DWT) features with the Gradient Binary Pattern (GBP); detection is finally performed by the Spinal-QDCNN. The proposed method attained a maximum accuracy of 86.356%, sensitivity of 87.37%, and specificity of 88.357%.
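
The named hand-crafted features are all standard; here is a sketch computing them with NumPy, SciPy, and PyWavelets on a placeholder image (the Gabor and GBP steps are omitted for brevity):

```python
# Sketch: mean, average contrast (intensity std), skewness, kurtosis, and a
# single-level 2D DWT on an illustrative grayscale ROI, not the paper's code.
import numpy as np
import pywt
from scipy.stats import kurtosis, skew

img = np.random.rand(128, 128)            # placeholder for a preprocessed MRI ROI

flat = img.ravel()
features = {
    "mean": flat.mean(),
    "avg_contrast": flat.std(),           # average contrast as intensity std
    "skewness": skew(flat),
    "kurtosis": kurtosis(flat),
}

# Single-level 2D DWT: approximation + horizontal/vertical/diagonal details
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")
features["dwt_detail_energy"] = float(cH.var() + cV.var() + cD.var())
print(features)
```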