
Dolphin v1.0 Technical Report

Taohan Weng, Chi Zhang, Chaoran Yan, Siya Liu, Xiaoyang Liu, Yalun Wu, Boyang Wang, Boyan Wang, Jiren Ren, Kaiwen Yan, Jinze Yu, Kaibing Hu, Henan Liu, Haoyun Zheng, Zhenyu Liu, Duo Zhang, Xiaoqing Guo, Anjie Le, Hongcheng Guo

arXiv preprint · Sep 30, 2025
Ultrasound is crucial in modern medicine but faces challenges such as operator dependence, image noise, and real-time scanning, hindering AI integration. While large multimodal models excel in other medical imaging areas, they struggle with ultrasound's complexities. To address this, we introduce Dolphin v1.0 (V1) and its reasoning-augmented version, Dolphin R1, the first large-scale multimodal ultrasound foundation models unifying diverse clinical tasks in a single vision-language framework. To tackle ultrasound variability and noise, we curated a 2-million-scale multimodal dataset combining textbook knowledge, public data, synthetic samples, and general corpora. This ensures robust perception, generalization, and clinical adaptability. The Dolphin series employs a three-stage training strategy: domain-specialized pretraining, instruction-driven alignment, and reinforcement-based refinement. Dolphin v1.0 delivers reliable performance in classification, detection, regression, and report generation. Dolphin R1 enhances diagnostic inference, reasoning transparency, and interpretability through reinforcement learning with ultrasound-specific rewards. Evaluated on U2-Bench across eight ultrasound tasks, Dolphin R1 achieves a U2-score of 0.5835, over twice that of the second-best model (0.2968), setting a new state of the art. Dolphin v1.0 also performs competitively, validating the unified framework. Comparisons show that reasoning-enhanced training significantly improves diagnostic accuracy, consistency, and interpretability, highlighting its importance for high-stakes medical AI.

Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation

Longzhen Yang, Zhangkai Ni, Ying Wen, Yihang Liu, Lianghua He, Heng Tao Shen

arXiv preprint · Sep 30, 2025
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images, anchored in explicit visual evidence to improve interpretability and facilitate integration into clinical workflows. However, existing methods often rely on separately trained detection modules that require extensive expert annotations, introducing high labeling costs and limiting generalizability due to pathology distribution bias across datasets. To address these challenges, we propose Self-Supervised Anatomical Consistency Learning (SS-ACL) -- a novel and annotation-free framework that aligns generated reports with corresponding anatomical regions using simple textual prompts. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy, organizing entities by spatial location. It recursively reconstructs fine-grained anatomical regions to enforce intra-sample spatial alignment, inherently guiding attention maps toward visually relevant areas prompted by text. To further enhance inter-sample semantic alignment for abnormality recognition, SS-ACL introduces a region-level contrastive learning based on anatomical consistency. These aligned embeddings serve as priors for report generation, enabling attention maps to provide interpretable visual evidence. Extensive experiments demonstrate that SS-ACL, without relying on expert annotations, (i) generates accurate and visually grounded reports -- outperforming state-of-the-art methods by 10% in lexical accuracy and 25% in clinical efficacy, and (ii) achieves competitive performance on various downstream visual tasks, surpassing current leading visual foundation models by 8% in zero-shot visual grounding.
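
For readers unfamiliar with region-level contrastive objectives, a minimal sketch of an InfoNCE-style loss over paired anatomical-region embeddings is shown below; the function name, pairing scheme, and temperature are illustrative assumptions, not the SS-ACL implementation.

```python
import torch
import torch.nn.functional as F

def region_contrastive_loss(region_emb_a: torch.Tensor,
                            region_emb_b: torch.Tensor,
                            temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss over anatomical region embeddings.

    region_emb_a, region_emb_b: (N, D) tensors holding embeddings of the same
    N anatomical regions from two samples (or two views); row i of each tensor
    is assumed to describe the same region, so it is treated as the positive
    pair and all other rows as negatives.
    """
    a = F.normalize(region_emb_a, dim=-1)
    b = F.normalize(region_emb_b, dim=-1)
    logits = a @ b.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Symmetric cross-entropy: each region must retrieve its counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example: 12 anatomical regions with 256-dimensional embeddings.
loss = region_contrastive_loss(torch.randn(12, 256), torch.randn(12, 256))
```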

Automating prostate volume acquisition using abdominal ultrasound scans for prostate-specific antigen density calculations.

Bennett RD, Barrett T, Sushentsev N, Sanmugalingam N, Lee KL, Gnanapragasam VJ, Tse ZTH

PubMed · Sep 30, 2025
Proposed methods for prostate cancer screening are currently prohibitively expensive (owing to the high costs of imaging equipment such as magnetic resonance imaging and traditional ultrasound systems), inadequate in their detection rates, require highly trained specialists, and/or are invasive, causing patient discomfort. These limitations make population-wide screening for prostate cancer challenging. Machine learning techniques applied to abdominal ultrasound scanning may help alleviate some of these disadvantages. Abdominal ultrasound scans are comparatively low cost and cause minimal patient discomfort, and machine learning can be applied to mitigate the high operator-dependent variability of ultrasound scanning. In this study, a state-of-the-art machine learning model was compared with an expert radiologist and trainee radiology registrars of varying experience when estimating prostate volume from abdominal ultrasound images, a crucial step in detecting prostate cancer using prostate-specific antigen density. The machine learning model calculated prostatic volume by marking out the dimensions of the prolate ellipsoid formula from two orthogonal images of the prostate acquired with abdominal ultrasound scans (which could be conducted by operators with minimal experience in a primary care setting). While both the algorithm and the registrars showed high correlation with the expert ([Formula: see text]), the model outperformed the trainees in both accuracy (lowest average volume error of [Formula: see text]) and consistency (lowest IQR of [Formula: see text] and lowest average volume standard deviation of [Formula: see text]). The results are promising for the future development of an automated prostate cancer screening workflow using machine learning and abdominal ultrasound scans.
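
The prolate ellipsoid formula underlying the volume estimate is standard, V = (π/6) × length × width × height, and PSA density is simply serum PSA divided by that volume. A minimal sketch of the arithmetic (function names and example values are illustrative, not taken from the study):

```python
import math

def prostate_volume_ml(length_cm: float, width_cm: float, height_cm: float) -> float:
    """Prolate ellipsoid volume: V = (pi / 6) * L * W * H.

    The three diameters come from two orthogonal ultrasound planes
    (e.g. length and height from the sagittal view, width from the axial view);
    1 cm^3 is treated as 1 mL.
    """
    return math.pi / 6.0 * length_cm * width_cm * height_cm

def psa_density(psa_ng_ml: float, volume_ml: float) -> float:
    """PSA density = serum PSA (ng/mL) / prostate volume (mL)."""
    return psa_ng_ml / volume_ml

vol = prostate_volume_ml(4.5, 5.0, 3.8)       # roughly 44.8 mL
print(round(psa_density(4.0, vol), 3))        # roughly 0.089 ng/mL per mL
```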

Optimizing retinal images based carotid atherosclerosis prediction with explainable foundation models.

Lee H, Kim J, Kwak S, Rehman A, Park SM, Chang J

PubMed · Sep 30, 2025
Carotid atherosclerosis is a key predictor of cardiovascular disease (CVD), necessitating early detection. While foundation models (FMs) show promise in medical imaging, the optimal selection and fine-tuning strategies for classifying carotid atherosclerosis from retinal images remain unclear. Using data from 39,620 individuals, we evaluated four vision FMs with three fine-tuning methods, assessing predictive performance, clinical utility via survival analysis for future CVD mortality, and explainability via Grad-CAM with vessel segmentation. DINOv2 with low-rank adaptation showed the best overall performance (area under the receiver operating characteristic curve = 0.71; sensitivity = 0.87; specificity = 0.44), prognostic relevance (hazard ratio = 2.20, P-trend < 0.05), and vascular alignment. While further external validation in broader clinical contexts is necessary to improve the model's generalizability, these findings support the feasibility of opportunistic atherosclerosis and CVD screening using retinal imaging and highlight the importance of a multi-dimensional evaluation framework for optimal FM selection in medical artificial intelligence.
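
As a point of reference, low-rank adaptation freezes the pretrained weights and learns a small rank-r update ΔW = B·A for selected linear layers. The PyTorch sketch below illustrates the general mechanism under assumed rank and scaling values; it is not the authors' fine-tuning code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update: W + (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the learned low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.t() @ self.lora_b.t())

# Example: adapt one projection layer of a frozen ViT-style backbone.
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
out = layer(torch.randn(4, 197, 768))         # only lora_a / lora_b receive gradients
```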

Attention-enhanced hybrid U-Net for prostate cancer grading and explainability.

Zaheer AN, Farhan M, Min G, Alotaibi FA, Alnfiai MM

PubMed · Sep 30, 2025
Prostate cancer remains a leading cause of mortality, necessitating precise histopathological segmentation for accurate Gleason grade assessment. However, existing deep learning-based segmentation models lack contextual awareness and explainability, leading to inconsistent performance across heterogeneous tissue structures. Conventional U-Net architectures and CNN-based approaches struggle to capture long-range dependencies and fine-grained histopathological patterns, resulting in suboptimal boundary delineation and limited generalizability. To address these limitations, we propose a transformer-attention hybrid U-Net (TAH U-Net), integrating hybrid CNN-transformer encoding, attention-guided skip connections, and a multi-stage guided loss mechanism for enhanced segmentation accuracy and model interpretability. The ResNet50-based convolutional layers efficiently capture local spatial features, while Vision Transformer (ViT) blocks model global contextual dependencies, improving segmentation consistency. Attention mechanisms are incorporated into skip connections and decoder pathways, refining feature propagation by suppressing irrelevant tissue noise while enhancing diagnostically significant regions. A novel hierarchical guided loss function optimizes segmentation masks at multiple decoder stages, improving boundary refinement and gradient stability. Additionally, Explainable AI (XAI) techniques such as LIME, Occlusion Sensitivity, and Partial Dependence Analysis (PDP) validate the model's decision-making transparency, ensuring clinical reliability. The experimental evaluation on the SICAPv2 dataset demonstrates state-of-the-art performance, surpassing traditional U-Net architectures with a 4.6% increase in Dice score and a 5.1% gain in IoU, along with notable improvements in Precision (+4.2%) and Recall (+3.8%). This research significantly advances AI-driven prostate cancer diagnostics by providing an interpretable and highly accurate segmentation framework, enhancing clinical trust in histopathology-based grading within medical imaging and computational pathology.
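
Attention-guided skip connections of this kind typically gate the encoder features with a signal derived from the decoder so that irrelevant tissue responses are suppressed before fusion. The following PyTorch sketch shows one common additive attention gate; the module name, channel sizes, and same-resolution assumption are illustrative rather than the TAH U-Net implementation.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate: encoder skip features are re-weighted by a mask
    computed jointly from the skip (x) and the decoder gating signal (g)."""

    def __init__(self, channels_x: int, channels_g: int, channels_mid: int):
        super().__init__()
        self.theta_x = nn.Conv2d(channels_x, channels_mid, kernel_size=1)
        self.phi_g = nn.Conv2d(channels_g, channels_mid, kernel_size=1)
        self.psi = nn.Conv2d(channels_mid, 1, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # x: skip connection (B, Cx, H, W); g: decoder features at the same resolution.
        attn = self.sigmoid(self.psi(self.relu(self.theta_x(x) + self.phi_g(g))))
        return x * attn                        # suppress regions with low attention

gate = AttentionGate(channels_x=256, channels_g=256, channels_mid=128)
skip = gate(torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32))
```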

Recent technological advances in video capsule endoscopy: a comprehensive review.

Kim M, Jang HJ

PubMed · Sep 29, 2025
Video capsule endoscopy (VCE) originally revolutionized gastrointestinal imaging by providing a noninvasive method for evaluating small bowel diseases. Recent technological innovations, including enhanced imaging systems, artificial intelligence (AI), and improved localization, have significantly improved VCE's diagnostic accuracy, efficiency, and clinical utility. This review aims to summarize and evaluate recent technological advances in VCE, focusing on system comparisons, image enhancement, localization technologies, and AI-assisted lesion detection.

Global mapping of artificial intelligence applications in breast cancer from 1988-2024: a machine learning approach.

Nguyen THT, Jeon S, Yoon J, Park B

PubMed · Sep 29, 2025
Artificial intelligence (AI) has become increasingly integral to various aspects of breast cancer care, including screening, diagnosis, and treatment. This study aimed to critically examine the application of AI throughout the breast cancer care continuum to elucidate key research developments, emerging trends, and prevalent patterns. English articles and reviews published between 1988 and 2024 were retrieved from the Web of Science database, focusing on studies that applied AI in breast cancer research. Collaboration among countries was analyzed using co-authorship networks and co-occurrence mapping. Additionally, clustering analysis using Latent Dirichlet Allocation (LDA) was conducted for topic modeling, whereas linear regression was employed to assess trends in research outputs over time. A total of 8,711 publications were included in the analysis. The United States has led the research in applying AI to the breast cancer care continuum, followed by China and India. Recent publications have increasingly focused on the utilization of deep learning and machine learning (ML) algorithms for automated breast cancer detection in mammography and histopathology. Moreover, the integration of multi-omics data and molecular profiling with AI has emerged as a significant trend. However, research on the applications of robotic and ML technologies in surgical oncology and postoperative care remains limited. Overall, the volume of research addressing AI for early detection, diagnosis, and classification of breast cancer has markedly increased over the past five years. The rapid expansion of AI-related research on breast cancer underscores its potential impact. However, significant challenges remain. Ongoing rigorous investigations are essential to ensure that AI technologies yield evidence-based benefits across diverse patient populations, thereby avoiding the inadvertent exacerbation of existing healthcare disparities.
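
For context, an LDA topic model over abstracts combined with a linear fit to yearly publication counts can be sketched in a few lines of scikit-learn; the toy abstracts, topic count, and yearly totals below are invented for illustration and are not the study's data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LinearRegression

# Toy stand-ins for publication abstracts.
abstracts = [
    "deep learning mammography screening",
    "machine learning histopathology classification",
    "multi-omics molecular profiling integration",
    "robotic surgery postoperative care",
]
counts = CountVectorizer(stop_words="english").fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)            # per-document topic mixture

# Linear trend in yearly publication counts (toy numbers).
years = np.array([[2020], [2021], [2022], [2023], [2024]])
n_papers = np.array([400, 650, 900, 1300, 1800])
trend = LinearRegression().fit(years, n_papers)
print(trend.coef_[0])                         # estimated papers added per year
```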

Development of a High-Performance Ultrasound Prediction Model for the Diagnosis of Endometrial Cancer: An Interpretable XGBoost Algorithm Utilizing SHAP Analysis.

Lai H, Wu Q, Weng Z, Lyu G, Yang W, Ye F

PubMed · Sep 29, 2025
To develop and validate an ultrasonography-based machine learning (ML) model for predicting malignant endometrial and cavitary lesions. This retrospective study was conducted on patients with pathologically confirmed results following transvaginal or transrectal ultrasound from 2021 to 2023. Endometrial ultrasound features were characterized using the International Endometrial Tumor Analysis (IETA) terminology. The dataset was randomly divided (7:3) into training and validation sets. LASSO (least absolute shrinkage and selection operator) regression was applied for feature selection, and an extreme gradient boosting (XGBoost) model was developed. Performance was assessed via receiver operating characteristic (ROC) analysis, calibration, decision curve analysis, sensitivity, specificity, and accuracy. Among 1080 patients, 6 had a non-measurable endometrium; of the remaining 1074 cases, 641 were premenopausal and 433 postmenopausal. On the test set, the area under the curve (AUC) for the premenopausal group was 0.845 (0.781-0.909), with relatively low sensitivity (0.588, 0.442-0.722) and relatively high specificity (0.923, 0.863-0.959); the AUC for the postmenopausal group was 0.968 (0.944-0.992), with both sensitivity (0.895, 0.778-0.956) and specificity (0.931, 0.839-0.974) relatively high. SHapley Additive exPlanations (SHAP) analysis identified key predictors: endometrial-myometrial junction, endometrial thickness, endometrial echogenicity, color Doppler flow score, and vascular pattern in premenopausal women; and endometrial thickness, endometrial-myometrial junction, endometrial echogenicity, and color Doppler flow score in postmenopausal women. The XGBoost-based model exhibited excellent predictive performance, particularly in postmenopausal patients. SHAP analysis further enhances interpretability by identifying key ultrasonographic predictors of malignancy.
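
The described pipeline (LASSO feature selection, XGBoost classification, SHAP interpretation) can be sketched as follows on synthetic data; the hyperparameters are illustrative, and using LassoCV directly on the binary label is a simplification of the study's feature-selection step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LassoCV
from xgboost import XGBClassifier
import shap

# Synthetic stand-in for IETA ultrasound features (binary outcome: malignant vs. benign).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 1) LASSO: keep features with non-zero coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(X_train, y_train)
selected = np.flatnonzero(lasso.coef_)

# 2) XGBoost on the selected features.
model = XGBClassifier(n_estimators=300, max_depth=3, learning_rate=0.05,
                      eval_metric="logloss")
model.fit(X_train[:, selected], y_train)

# 3) SHAP values to rank the predictors.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test[:, selected])
print(np.abs(shap_values).mean(axis=0))       # mean |SHAP| per selected feature
```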

Evaluation of Context-Aware Prompting Techniques for Classification of Tumor Response Categories in Radiology Reports Using Large Language Model.

Park J, Sim WS, Yu JY, Park YR, Lee YH

PubMed · Sep 29, 2025
Radiology reports are essential for medical decision-making, providing crucial data for diagnosing diseases, devising treatment plans, and monitoring disease progression. While large language models (LLMs) have shown promise in processing free-text reports, research on effective prompting techniques for radiologic applications remains limited. To evaluate the effectiveness of LLM-driven classification of tumor response categories (TRCs) from radiology reports, and to optimize the model by comparing four prompt engineering techniques for this classification task in clinical applications, we included 3062 whole-spine contrast-enhanced magnetic resonance imaging (MRI) radiology reports for prompt engineering and validation. TRCs were labeled by two radiologists based on criteria modified from the Response Evaluation Criteria in Solid Tumors (RECIST) guidelines. The Llama3 instruct model was used to classify TRCs through four different prompts: General, In-Context Learning (ICL), Chain-of-Thought (CoT), and ICL with CoT. AUROC, accuracy, precision, recall, and F1-score were calculated for each prompt and model size (8B, 70B) on the test report dataset. The average AUROC for the ICL prompt (0.96 internally, 0.93 externally) and the ICL with CoT prompt (0.97 internally, 0.94 externally) outperformed the other prompts. Errors increased with prompt complexity, including 0.8% incomplete-sentence errors and 11.3% probability-classification inconsistencies. This study demonstrates that context-aware LLM prompts substantially improved the efficiency and effectiveness of classifying TRCs from radiology reports, despite potential intrinsic hallucinations. While further improvements are required for real-world application, our findings suggest that context-aware prompts have significant potential for segmenting complex radiology reports and enhancing oncology clinical workflows.
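
A rough sketch of how an ICL-with-CoT prompt for TRC classification might be assembled is given below; the few-shot reports, reasoning text, and category names are invented for illustration and do not reproduce the study's actual prompts.

```python
FEW_SHOT_EXAMPLES = [
    {
        "report": "Interval decrease in size of the T3 metastatic lesion; no new lesions.",
        "reasoning": "The target lesion is smaller and nothing new has appeared, "
                     "which corresponds to a partial response.",
        "label": "Partial Response",
    },
    {
        "report": "New enhancing lesion at L2 in addition to stable known metastases.",
        "reasoning": "Any new lesion indicates progression regardless of stability elsewhere.",
        "label": "Progressive Disease",
    },
]

def build_icl_cot_prompt(report: str) -> str:
    """Compose a prompt with in-context examples plus a chain-of-thought instruction."""
    lines = ["Classify the tumor response category of the whole-spine MRI report.",
             "Allowed categories: Complete Response, Partial Response, "
             "Stable Disease, Progressive Disease.", ""]
    for ex in FEW_SHOT_EXAMPLES:
        lines += [f"Report: {ex['report']}",
                  f"Reasoning: {ex['reasoning']}",
                  f"Category: {ex['label']}", ""]
    lines += [f"Report: {report}",
              "Think step by step, then give the category on the final line."]
    return "\n".join(lines)

print(build_icl_cot_prompt("Stable appearance of the treated T7 lesion; no new disease."))
```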