Page 41 of 45441 results

Breast Arterial Calcifications on Mammography: A Review of the Literature.

Rossi J, Cho L, Newell MS, Venta LA, Montgomery GH, Destounis SV, Moy L, Brem RF, Parghi C, Margolies LR

pubmed · May 17, 2025
Identifying systemic disease with medical imaging studies may improve population health outcomes. Although the pathogenesis of peripheral arterial calcification and coronary artery calcification differ, breast arterial calcification (BAC) on mammography is associated with cardiovascular disease (CVD), a leading cause of death in women. While professional society guidelines on the reporting or management of BAC have not yet been established, and assessment and quantification methods are not yet standardized, the value of reporting BAC is being considered internationally as a possible indicator of subclinical CVD. Furthermore, artificial intelligence (AI) models are being developed to identify and quantify BAC on mammography, as well as to predict the risk of CVD. This review outlines studies evaluating the association of BAC and CVD, introduces the role of preventative cardiology in clinical management, discusses reasons to consider reporting BAC, acknowledges current knowledge gaps and barriers to assessing and reporting calcifications, and provides examples of how AI can be utilized to measure BAC and contribute to cardiovascular risk assessment. Ultimately, reporting BAC on mammography might facilitate earlier mitigation of cardiovascular risk factors in asymptomatic women.

MedSG-Bench: A Benchmark for Medical Image Sequences Grounding

Jingkun Yue, Siqi Zhang, Zinan Jia, Huihuan Xu, Zongbo Han, Xiaohong Liu, Guangyu Wang

arxiv preprint · May 17, 2025
Visual grounding is essential for precise perception and reasoning in multimodal large language models (MLLMs), especially in medical imaging domains. While existing medical visual grounding benchmarks primarily focus on single-image scenarios, real-world clinical applications often involve sequential images, where accurate lesion localization across different modalities and temporal tracking of disease progression (e.g., pre- vs. post-treatment comparison) require fine-grained cross-image semantic alignment and context-aware reasoning. To remedy the underrepresentation of image sequences in existing medical visual grounding benchmarks, we propose MedSG-Bench, the first benchmark tailored for Medical Image Sequences Grounding. It comprises eight VQA-style tasks organized into two grounding paradigms: 1) Image Difference Grounding, which focuses on detecting change regions across images, and 2) Image Consistency Grounding, which emphasizes detection of consistent or shared semantics across sequential images. MedSG-Bench covers 76 public datasets, 10 medical imaging modalities, and a wide spectrum of anatomical structures and diseases, totaling 9,630 question-answer pairs. We benchmark both general-purpose MLLMs (e.g., Qwen2.5-VL) and medical-domain specialized MLLMs (e.g., HuatuoGPT-vision), observing that even advanced models exhibit substantial limitations in medical sequential grounding tasks. To advance this field, we construct MedSG-188K, a large-scale instruction-tuning dataset tailored for sequential visual grounding, and further develop MedSeq-Grounder, an MLLM designed to facilitate future research on fine-grained understanding across medical sequential images. The benchmark, dataset, and model are available at https://huggingface.co/MedSG-Bench
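Grounding answers in both paradigms ultimately reduce to box localization, which is conventionally scored by intersection-over-union (IoU). A minimal sketch of that scoring (the `(x1, y1, x2, y2)` box format is a common convention, not a detail confirmed by the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2). Grounding predictions are typically counted
    correct when IoU exceeds a threshold such as 0.5."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)  # zero if no overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two partially overlapping boxes: intersection 25, union 175.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```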

Comparative analysis of deep learning methods for breast ultrasound lesion detection and classification.

Vallez N, Mateos-Aparicio-Ruiz I, Rienda MA, Deniz O, Bueno G

pubmed · May 16, 2025
Breast ultrasound (BUS) computer-aided diagnosis (CAD) systems aim to perform two major steps: detecting lesions and classifying them as benign or malignant. However, the impact of combining both steps has not been previously addressed. Moreover, the specific method employed can influence the final outcome of the system. In this work, a comparison of the effects of using object detection, semantic segmentation, and instance segmentation to detect lesions in BUS images was conducted. To this end, four approaches were examined: a) multi-class object detection, b) one-class object detection followed by localized region classification, c) multi-class segmentation, and d) one-class segmentation followed by segmented region classification. Additionally, a novel dataset for BUS segmentation, called BUS-UCLM, has been gathered, annotated, and shared publicly. The evaluation of the proposed methods was carried out with this new dataset and four publicly available datasets: BUSI, OASBUD, RODTOOK, and UDIAT. Among the four approaches compared, multi-class detection and multi-class segmentation achieved the best results when instance segmentation CNNs were used. The best detection results were obtained with a multi-class Mask R-CNN, with a COCO AP50 of 72.9%. In the multi-class segmentation scenario, Poolformer achieved the best results, with a Dice score of 77.7%. The analysis of detection and segmentation models in BUS highlights several key challenges, emphasizing the complexity of accurately identifying and segmenting lesions. Among the methods evaluated, instance segmentation proved the most effective for BUS images, offering superior performance in delineating individual lesions.
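As a reference for the Dice score used to rank the segmentation models, a minimal sketch; the set-of-pixels representation is chosen for clarity (real pipelines typically operate on arrays):

```python
def dice_score(pred, truth):
    """Dice coefficient 2*|A ∩ B| / (|A| + |B|), in [0, 1], where the
    masks are given as sets of pixel coordinates."""
    if not pred and not truth:
        return 1.0  # both masks empty: perfect agreement by convention
    inter = len(pred & truth)
    return 2 * inter / (len(pred) + len(truth))

# Toy example: predicted lesion mask vs. ground-truth mask,
# 3 shared pixels out of 4 each -> Dice = 6/8 = 0.75.
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 0), (1, 1), (2, 1)}
print(dice_score(pred, truth))
```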

A deep learning-based approach to automated rib fracture detection and CWIS classification.

Marting V, Borren N, van Diepen MR, van Lieshout EMM, Wijffels MME, van Walsum T

pubmed · May 16, 2025
Trauma-induced rib fractures are a common injury. The number and characteristics of these fractures influence whether a patient is treated nonoperatively or surgically. Rib fractures are typically diagnosed using CT scans, yet 19.2-26.8% of fractures are still missed during assessment. Another challenge in managing rib fractures is the interobserver variability in their classification. The purpose of this study was to develop and assess an automated method that detects rib fractures in CT scans and classifies them according to the Chest Wall Injury Society (CWIS) classification. In total, 198 CT scans were collected, of which 170 were used for training and internal validation, and 28 for external validation. Fractures and their classifications were manually annotated in each of the scans. A detection and classification network was trained for each of the three components of the CWIS classification. In addition, a rib number labeling network was trained to obtain the rib number of each fracture. Experiments were performed to assess the method's performance. On the internal test set, the method achieved a detection sensitivity of 80%, a precision of 87%, and an F1-score of 83%, with a mean of 1.11 false positives per scan (FPPS). Classification sensitivity varied, with the lowest being 25% for complex fractures and the highest being 97% for posterior fractures. The correct rib number was assigned to 94% of the detected fractures. The custom-trained nnU-Net correctly labeled 95.5% of all ribs and 98.4% of fractured ribs in 30 patients. The detection and classification performance on the external validation dataset was slightly better, with a fracture detection sensitivity of 84%, precision of 85%, F1-score of 84%, FPPS of 0.96, and 95% of the fractures assigned the correct rib number. The method developed is able to accurately detect and classify rib fractures in CT scans, although there is room for improvement in the rare and underrepresented classes in the training set.
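As a reference for the reported detection metrics, a minimal sketch of how sensitivity, precision, F1-score, and FPPS derive from raw counts; the counts below are illustrative, not the study's data:

```python
def detection_metrics(tp, fp, fn, n_scans):
    """Sensitivity (recall), precision, F1-score, and false positives
    per scan (FPPS) from aggregate detection counts."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    fpps = fp / n_scans
    return sensitivity, precision, f1, fpps

# Toy counts shaped like the internal-test-set results (80% / 87% / 83%);
# the scan count is arbitrary.
sens, prec, f1, fpps = detection_metrics(tp=80, fp=12, fn=20, n_scans=28)
print(f"sensitivity={sens:.2f} precision={prec:.2f} f1={f1:.2f} fpps={fpps:.2f}")
```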

Artificial intelligence-guided distal radius fracture detection on plain radiographs in comparison with human raters.

Ramadanov N, John P, Hable R, Schreyer AG, Shabo S, Prill R, Salzmann M

pubmed · May 16, 2025
The aim of this study was to compare the performance of artificial intelligence (AI) in detecting distal radius fractures (DRFs) on plain radiographs with the performance of human raters. We retrospectively analysed all wrist radiographs taken in our hospital since the introduction of AI-guided fracture detection, from 11 September 2023 to 10 September 2024. The ground truth was defined by the radiological report of a board-certified radiologist based solely on conventional radiographs. The following parameters were calculated: true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), accuracy (%), Cohen's kappa coefficient, F1 score, sensitivity (%), specificity (%), and Youden index (J statistic). In total, 1145 plain radiographs of the wrist were taken between 11 September 2023 and 10 September 2024. The mean age of the included patients was 46.6 years (± 27.3), ranging from 2 to 99 years, and 59.0% were female. According to the ground truth, of the 556 anteroposterior (AP) radiographs, 225 cases (40.5%) had a DRF, and of the 589 lateral view radiographs, 240 cases (40.7%) had a DRF. The AI system showed the following results on AP radiographs: accuracy 95.90%; Cohen's kappa 0.913; F1 score 0.947; sensitivity 92.02%; specificity 98.45%; Youden index 90.47. The orthopedic surgeon achieved a sensitivity of 91.5%, specificity of 97.8%, an overall accuracy of 95.1%, F1 score of 0.943, and Cohen's kappa of 0.901. These results were comparable to those of the AI model. AI-guided detection of DRFs demonstrated diagnostic performance nearly identical to that of an experienced orthopedic surgeon across all key metrics. The marginal differences observed in sensitivity and specificity suggest that AI can reliably support clinical fracture assessment based solely on conventional radiographs.
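For readers less familiar with the rarer of these metrics, a short sketch of how Cohen's kappa and the Youden index fall out of a 2x2 confusion matrix; the counts below are reverse-engineered to approximate the reported AP-view figures and are not the study's actual data:

```python
def kappa_and_youden(tp, tn, fp, fn):
    """Cohen's kappa and Youden's J from a 2x2 confusion matrix."""
    n = tp + tn + fp + fn
    po = (tp + tn) / n  # observed agreement (accuracy)
    # chance agreement: product of marginal positive rates plus
    # product of marginal negative rates
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    # Youden's J = sensitivity + specificity - 1
    youden = tp / (tp + fn) + tn / (tn + fp) - 1
    return kappa, youden

# Illustrative counts for 556 AP radiographs with 225 fractures.
kappa, j = kappa_and_youden(tp=207, tn=326, fp=5, fn=18)
print(f"kappa={kappa:.3f}  Youden J={j:.3f}")
```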

Automated Real-time Assessment of Intracranial Hemorrhage Detection AI Using an Ensembled Monitoring Model (EMM)

Zhongnan Fang, Andrew Johnston, Lina Cheuy, Hye Sun Na, Magdalini Paschali, Camila Gonzalez, Bonnie A. Armstrong, Arogya Koirala, Derrick Laurel, Andrew Walker Campion, Michael Iv, Akshay S. Chaudhari, David B. Larson

arxiv preprint · May 16, 2025
Artificial intelligence (AI) tools for radiology are commonly unmonitored once deployed. The lack of real-time case-by-case assessments of AI prediction confidence requires users to independently distinguish between trustworthy and unreliable AI predictions, which increases cognitive burden, reduces productivity, and potentially leads to misdiagnoses. To address these challenges, we introduce Ensembled Monitoring Model (EMM), a framework inspired by clinical consensus practices using multiple expert reviews. Designed specifically for black-box commercial AI products, EMM operates independently without requiring access to internal AI components or intermediate outputs, while still providing robust confidence measurements. Using intracranial hemorrhage detection as our test case on a large, diverse dataset of 2919 studies, we demonstrate that EMM successfully categorizes confidence in the AI-generated prediction, suggesting different actions and helping improve the overall performance of AI tools to ultimately reduce cognitive burden. Importantly, we provide key technical considerations and best practices for successfully translating EMM into clinical settings.
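The core idea is that agreement among independent monitor models can serve as a confidence signal for a black-box product's prediction. A minimal sketch of that idea (the tier thresholds, labels, and interface here are illustrative assumptions, not the published EMM design):

```python
def ensemble_confidence(product_pred, monitor_preds):
    """Map agreement between a black-box product's prediction and a
    set of independent monitor-model predictions to a confidence tier.
    Thresholds are illustrative, not tuned values from the paper."""
    agreement = sum(p == product_pred for p in monitor_preds) / len(monitor_preds)
    if agreement >= 0.8:
        return "high"    # likely trustworthy: accept as-is
    if agreement >= 0.5:
        return "medium"  # flag for human review
    return "low"         # likely unreliable: prioritize re-read

# 4 of 5 monitors agree with the product's positive call.
print(ensemble_confidence(1, [1, 1, 1, 0, 1]))
```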

Computer-aided assessment for enlarged fetal heart with deep learning model.

Nurmaini S, Sapitri AI, Roseno MT, Rachmatullah MN, Mirani P, Bernolian N, Darmawahyuni A, Tutuko B, Firdaus F, Islami A, Arum AW, Bastian R

pubmed · May 16, 2025
Enlarged fetal heart conditions may indicate congenital heart diseases or other complications, making early detection through prenatal ultrasound essential. However, manual assessments by sonographers are often subjective, time-consuming, and inconsistent. This paper proposes a deep learning approach using the You Only Look Once (YOLO) architecture to automate fetal heart enlargement assessment. On a set of ultrasound videos, YOLOv8 with a CBAM module demonstrated superior performance compared to YOLOv11 with self-attention. Incorporating ResNeXtBlock, a residual block that uses cardinality (parallel grouped transformations), further enhanced accuracy and prediction consistency. The model exhibits strong capability in detecting fetal heart enlargement, offering a reliable computer-aided tool for sonographers during prenatal screenings. Further validation is required to confirm its clinical applicability. By improving early and accurate detection, this approach has the potential to enhance prenatal care, facilitate timely interventions, and contribute to better neonatal health outcomes.

How early can we detect diabetic retinopathy? A narrative review of imaging tools for structural assessment of the retina.

Vaughan M, Denmead P, Tay N, Rajendram R, Michaelides M, Patterson E

pubmed · May 16, 2025
Despite current screening models, enhanced imaging modalities, and treatment regimens, diabetic retinopathy (DR) remains one of the leading causes of vision loss in working-age adults. DR can result in irreversible structural and functional retinal damage, leading to visual impairment and reduced quality of life. Given potentially irreversible photoreceptor damage, diagnosis and treatment at the earliest stages will provide the best opportunity to avoid visual disturbances or retinopathy progression. Herein, we review the current structural imaging methods used for DR assessment and their capability to detect DR in the earliest stages of disease. Imaging tools such as fundus photography, optical coherence tomography, fundus fluorescein angiography, optical coherence tomography angiography, and adaptive optics-assisted imaging will be reviewed. Finally, we describe the future of DR screening programmes and the introduction of artificial intelligence as an innovative approach to detecting subtle changes in the diabetic retina. Clinical trial registration number: N/A.

Impact of test set composition on AI performance in pediatric wrist fracture detection in X-rays.

Till T, Scherkl M, Stranger N, Singer G, Hankel S, Flucher C, Hržić F, Štajduhar I, Tschauner S

pubmed · May 16, 2025
To evaluate how different test set sampling strategies (random selection and balanced sampling) affect the performance of artificial intelligence (AI) models in pediatric wrist fracture detection using radiographs, aiming to highlight the need for standardization in test set design. This retrospective study utilized the open-source GRAZPEDWRI-DX dataset of pediatric wrist radiographs (6091 patients). Two test sets, each containing 4588 images, were constructed: one using a balanced approach based on case difficulty, projection type, and fracture presence, and the other using random selection. EfficientNet and YOLOv11 models were trained and validated on 18,762 radiographs and tested on both sets. Binary classification and object detection tasks were evaluated using metrics such as precision, recall, F1 score, AP50, and AP50-95. Statistical comparisons between test sets were performed using nonparametric tests. Performance metrics decreased significantly on the balanced test set with its more challenging cases. For example, the precision of the YOLOv11 models decreased from 0.95 on the random set to 0.83 on the balanced set. Similar trends were observed for recall, accuracy, and F1 score, indicating that models trained on easy-to-recognize cases performed poorly on more complex ones. These results were consistent across all model variants tested. AI models for pediatric wrist fracture detection exhibit reduced performance when tested on balanced datasets containing more difficult cases, compared to randomly selected cases. This highlights the importance of constructing representative, standardized test sets that account for clinical complexity to ensure robust AI performance in real-world settings. Question: Do test set sampling strategies based on case complexity influence deep learning models' performance in fracture detection? Findings: AI performance in pediatric wrist fracture detection drops significantly when models are tested on balanced datasets with more challenging cases rather than on randomly selected cases. Clinical relevance: Without standardized, validated AI test datasets that reflect clinical complexity, performance metrics may be overestimated, limiting the utility of AI in real-world settings.
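The balanced test-set construction can be viewed as stratified sampling over difficulty, projection, and fracture presence. A minimal sketch under that assumption (the stratum keys and counts are illustrative, not the study's exact protocol):

```python
import random
from collections import defaultdict

def balanced_test_split(cases, key, n_per_stratum, seed=0):
    """Draw the same number of cases from every stratum so the test
    set is not dominated by easy, common examples."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[key(case)].append(case)
    sampled = []
    for members in strata.values():
        rng.shuffle(members)
        sampled.extend(members[:n_per_stratum])
    return sampled

# Toy cohort: strata defined by (difficulty, fracture presence); the
# variables mirror those named in the abstract, but the counts are made up.
cohort = (
    [{"difficulty": "easy", "fracture": True}] * 50
    + [{"difficulty": "easy", "fracture": False}] * 40
    + [{"difficulty": "hard", "fracture": True}] * 10
    + [{"difficulty": "hard", "fracture": False}] * 8
)
test_set = balanced_test_split(
    cohort, key=lambda c: (c["difficulty"], c["fracture"]), n_per_stratum=8
)
print(len(test_set))  # 8 cases from each of the 4 strata
```

Hard cases make up 16 of the 32 sampled images here, versus roughly 17% of the raw cohort, which is exactly the shift that exposed the performance drop reported above.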

From error to prevention of wrong-level spine surgery: a review.

Javadnia P, Gohari H, Salimi N, Alimohammadi E

pubmed · May 15, 2025
Wrong-level spine surgery remains a significant concern in spine surgery, leading to devastating consequences for patients and healthcare systems alike. This comprehensive review aims to analyze the existing literature on wrong-level spine surgery in spine procedures, identifying key factors that contribute to these errors and exploring advanced strategies and technologies designed to prevent them. A systematic literature search was conducted across multiple databases, including PubMed, Scopus, EMBASE, and CINAHL. The selection criteria focused on preclinical and clinical studies that specifically addressed wrong site and wrong level surgeries in the context of spine surgery. The findings reveal a range of contributing factors to wrong-level spine surgeries, including communication failures, inadequate preoperative planning, and insufficient surgical protocols. The review emphasizes the critical role of innovative technologies-such as artificial intelligence, advanced imaging techniques, and surgical navigation systems-alongside established safety protocols like digital checklists and simulation training in enhancing surgical accuracy and preventing errors. In conclusion, integrating advanced technologies and systematic safety protocols is instrumental in reducing the incidence of wrong-level spine surgeries. This review underscores the importance of continuous education and the adoption of innovative solutions to foster a culture of safety and improve surgical outcomes. By addressing the multifaceted challenges associated with these errors, the field can work towards minimizing their occurrence and enhancing patient care.