
Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation

Longzhen Yang, Zhangkai Ni, Ying Wen, Yihang Liu, Lianghua He, Heng Tao Shen

arXiv preprint · Sep 30, 2025
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images, anchored in explicit visual evidence to improve interpretability and facilitate integration into clinical workflows. However, existing methods often rely on separately trained detection modules that require extensive expert annotations, introducing high labeling costs and limiting generalizability due to pathology distribution bias across datasets. To address these challenges, we propose Self-Supervised Anatomical Consistency Learning (SS-ACL) -- a novel and annotation-free framework that aligns generated reports with corresponding anatomical regions using simple textual prompts. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy, organizing entities by spatial location. It recursively reconstructs fine-grained anatomical regions to enforce intra-sample spatial alignment, inherently guiding attention maps toward visually relevant areas prompted by text. To further enhance inter-sample semantic alignment for abnormality recognition, SS-ACL introduces region-level contrastive learning based on anatomical consistency. These aligned embeddings serve as priors for report generation, enabling attention maps to provide interpretable visual evidence. Extensive experiments demonstrate that SS-ACL, without relying on expert annotations, (i) generates accurate and visually grounded reports -- outperforming state-of-the-art methods by 10% in lexical accuracy and 25% in clinical efficacy, and (ii) achieves competitive performance on various downstream visual tasks, surpassing current leading visual foundation models by 8% in zero-shot visual grounding.
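As a minimal illustration of the region-level contrastive idea described above, the sketch below applies an InfoNCE-style loss to paired image- and text-derived embeddings of the same anatomical regions. The function name, embedding shapes, and temperature are illustrative assumptions, not the SS-ACL authors' implementation.

```python
# Minimal sketch of a region-level InfoNCE-style contrastive loss over
# anatomical-region embeddings. Shapes, temperature, and pairing logic are
# illustrative assumptions, not the SS-ACL authors' implementation.
import torch
import torch.nn.functional as F

def region_contrastive_loss(img_regions, txt_regions, temperature=0.07):
    """img_regions, txt_regions: (N_regions, D) embeddings of the same
    anatomical regions from the image and text branches."""
    img = F.normalize(img_regions, dim=-1)
    txt = F.normalize(txt_regions, dim=-1)
    logits = img @ txt.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Matching regions (the diagonal) are positives; all others are negatives.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)

# Example: 16 anatomical regions with 256-dim embeddings.
loss = region_contrastive_loss(torch.randn(16, 256), torch.randn(16, 256))
```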

Centiloid values from deep learning-based CT parcellation: a valid alternative to FreeSurfer.

Yoon YJ, Seo S, Lee S, Lim H, Choo K, Kim D, Han H, So M, Kang H, Kang S, Kim D, Lee YG, Shin D, Jeon TJ, Yun M

PubMed · Sep 30, 2025
Amyloid PET/CT is essential for quantifying amyloid-beta (Aβ) deposition in Alzheimer's disease (AD), with the Centiloid (CL) scale standardizing measurements across imaging centers. However, MRI-based CL pipelines face challenges: high cost, contraindications, and patient burden. To address these challenges, we developed a deep learning-based CT parcellation pipeline calibrated to the standard CL scale using CT images from PET/CT scans and evaluated its performance relative to standard pipelines. A total of 306 participants (23 young controls [YCs] and 283 patients) underwent 18F-florbetaben (FBB) PET/CT and MRI. Based on visual assessment, 207 patients were classified as Aβ-positive and 76 as Aβ-negative. PET images were processed using the CT parcellation pipeline and compared to FreeSurfer (FS) and standard pipelines. Agreement was assessed via regression analyses. Effect size, variance, and ROC analyses were used to compare pipelines and determine the optimal CL threshold relative to visual Aβ assessment. The CT parcellation showed high concordance with FS and provided reliable CL quantification (R² = 0.99). Both pipelines demonstrated similar variance in YCs and effect sizes between YCs and ADCI. ROC analyses confirmed comparable accuracy and similar CL thresholds, supporting CT parcellation as a viable MRI-free alternative. Our findings indicate that the CT parcellation pipeline achieves a level of accuracy similar to FS in CL quantification, demonstrating its reliability as an MRI-free alternative. In PET/CT, CT and PET are acquired sequentially within the same session on a shared bed and headrest, which helps maintain consistent positioning and adequate spatial alignment, reducing registration errors and supporting more reliable and precise quantification.
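A brief sketch of the two analyses named in the abstract, regression-based agreement and ROC-based CL thresholding, assuming paired CL values and visual Aβ labels are already available; the variable names and the Youden's J criterion for the cutoff are assumptions.

```python
# Sketch of the agreement and threshold analyses described above: linear
# regression of CT-parcellation CL against FreeSurfer CL, and an ROC-derived
# CL cutoff versus visual Abeta reads. Variable names are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, roc_curve, roc_auc_score

def agreement_and_threshold(cl_ct, cl_fs, abeta_positive):
    """cl_ct, cl_fs: Centiloid values per participant; abeta_positive: 0/1 visual reads."""
    cl_ct = np.asarray(cl_ct).reshape(-1, 1)
    cl_fs = np.asarray(cl_fs)
    reg = LinearRegression().fit(cl_ct, cl_fs)
    r2 = r2_score(cl_fs, reg.predict(cl_ct))

    # ROC of CT-based CL against visual assessment; Youden's J picks the cutoff.
    fpr, tpr, thresholds = roc_curve(abeta_positive, cl_ct.ravel())
    best = np.argmax(tpr - fpr)
    return {"R2": r2,
            "AUC": roc_auc_score(abeta_positive, cl_ct.ravel()),
            "CL_threshold": thresholds[best]}
```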

Deep transfer learning-based feature fusion model with Bonobo optimization algorithm for enhanced brain tumor segmentation and classification through biomedical imaging.

Gurunathan P, Srinivasan PS, S R

PubMed · Sep 30, 2025
Brain tumours (BTs) are among the most aggressive cancers and are associated with very short life expectancy, so early detection and prompt treatment are central to improving patients' quality of life. Biomedical imaging permits non-invasive evaluation of disease and supports outcome prediction and therapeutic planning. Techniques such as computed tomography (CT) and magnetic resonance imaging (MRI) are routinely employed for evaluating brain cancer. However, detecting, segmenting, and extracting tumour regions from biomedical images is a tiresome, time-consuming task performed by clinical specialists, and its outcome depends heavily on their experience, so computer-aided technologies are essential to overcome these limitations. Recently, artificial intelligence (AI) models have proven highly effective at improving medical image diagnosis. This paper proposes an Enhanced Brain Tumour Segmentation through Biomedical Imaging and Feature Model Fusion with Bonobo Optimizer (EBTS-BIFMFBO) model, which aims to enhance BT segmentation and classification using advanced models. The EBTS-BIFMFBO technique first applies bilateral filter (BF)-based noise elimination and CLAHE-based contrast enhancement. It then performs segmentation with the DeepLabV3+ model to identify tumour regions for accurate diagnosis. Fusion models, namely InceptionResNetV2, MobileNet, and DenseNet201, are employed for feature extraction, and a convolutional sparse autoencoder (CSAE) performs BT classification. Finally, the hyperparameters of the CSAE are selected by the bonobo optimizer (BO). Extensive experiments on the Figshare BT dataset highlight the performance of the EBTS-BIFMFBO approach, which achieved a superior accuracy of 99.16% compared with existing models.
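The preprocessing stage (bilateral-filter denoising followed by CLAHE contrast enhancement) can be sketched in a few lines with OpenCV; the filter and CLAHE parameters below are illustrative assumptions rather than the values used in the paper.

```python
# Sketch of the preprocessing stage described above: bilateral filtering for
# noise removal followed by CLAHE contrast enhancement. Parameter values are
# illustrative assumptions.
import cv2

def preprocess_mri_slice(gray_uint8):
    """gray_uint8: single-channel 8-bit MRI slice."""
    denoised = cv2.bilateralFilter(gray_uint8, 9, 75, 75)        # edge-preserving denoising
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # local contrast enhancement
    return clahe.apply(denoised)
```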

Multi-scale self-supervised learning for deep knowledge transfer in diabetic retinopathy grading.

Almattar W, Anwar S, Al-Azani S, Khan FA

PubMed · Sep 30, 2025
Diabetic retinopathy (DR) is a leading cause of vision loss, necessitating early, accurate detection. Automated deep learning models show promise but struggle with the complexity of retinal images and limited labeled data. Due to domain differences, traditional transfer learning from datasets like ImageNet often fails in medical imaging. Self-supervised learning (SSL) offers a solution by enabling models to learn directly from medical data, but its success depends on the backbone architecture. Convolutional Neural Networks (CNNs) focus on local features, which can be limiting. To address this, we propose the Multi-scale Self-Supervised Learning (MsSSL) model, combining Vision Transformers (ViTs) for global context and CNNs with a Feature Pyramid Network (FPN) for multi-scale feature extraction. These features are refined through a Deep Learner module, improving spatial resolution and capturing high-level and fine-grained information. The MsSSL model significantly enhances DR grading, outperforming traditional methods, and underscores the value of domain-specific pretraining and advanced model integration in medical imaging.
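A minimal sketch of the multi-scale fusion idea: globally pooled FPN feature maps (local, multi-scale) concatenated with a ViT class token (global context) and projected by a small MLP. The dimensions and the concatenation-based fusion are assumptions, not the MsSSL authors' exact design.

```python
# Minimal multi-scale fusion sketch in the spirit of MsSSL: pooled FPN maps
# plus a ViT [CLS] token, fused by a concat-MLP head. Dimensions are assumptions.
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    def __init__(self, fpn_channels=(256, 256, 256, 256), vit_dim=768, out_dim=512):
        super().__init__()
        fused_in = sum(fpn_channels) + vit_dim
        self.head = nn.Sequential(nn.Linear(fused_in, out_dim), nn.GELU(),
                                  nn.Linear(out_dim, out_dim))

    def forward(self, fpn_maps, vit_cls):
        # fpn_maps: list of (B, C_i, H_i, W_i); vit_cls: (B, vit_dim)
        pooled = [f.mean(dim=(2, 3)) for f in fpn_maps]   # global-average pool each scale
        fused = torch.cat(pooled + [vit_cls], dim=1)
        return self.head(fused)

# Example with dummy tensors (batch of 2).
maps = [torch.randn(2, 256, s, s) for s in (56, 28, 14, 7)]
feat = MultiScaleFusion()(maps, torch.randn(2, 768))
```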

Optimizing retinal image-based carotid atherosclerosis prediction with explainable foundation models.

Lee H, Kim J, Kwak S, Rehman A, Park SM, Chang J

PubMed · Sep 30, 2025
Carotid atherosclerosis is a key predictor of cardiovascular disease (CVD), necessitating early detection. While foundation models (FMs) show promise in medical imaging, their optimal selection and fine-tuning strategies for classifying carotid atherosclerosis from retinal images remain unclear. Using data from 39,620 individuals, we evaluated four vision FMs with three fine-tuning methods. Models were assessed for predictive performance, for clinical utility via survival analysis of future CVD mortality, and for explainability via Grad-CAM with vessel segmentation. DINOv2 with low-rank adaptation showed the best overall performance (area under the receiver operating characteristic curve = 0.71; sensitivity = 0.87; specificity = 0.44), prognostic relevance (hazard ratio = 2.20, P-trend < 0.05), and vascular alignment. While further external validation in a broader clinical context is needed to improve the model's generalizability, these findings support the feasibility of opportunistic atherosclerosis and CVD screening using retinal imaging and highlight the importance of a multi-dimensional evaluation framework for optimal FM selection in medical artificial intelligence.
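For readers unfamiliar with low-rank adaptation, the sketch below shows a self-contained LoRA wrapper around a frozen linear layer, the kind of adapter that can be inserted into DINOv2 attention projections; the rank and scaling values are illustrative assumptions.

```python
# Self-contained sketch of low-rank adaptation (LoRA) around a frozen linear
# layer. Rank and scaling values are illustrative assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze the pretrained weights
            p.requires_grad = False
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)     # start as an identity-preserving update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_B(self.lora_A(x))

# Example: adapt a 768-dim projection.
layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))
```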

Attention-enhanced hybrid U-Net for prostate cancer grading and explainability.

Zaheer AN, Farhan M, Min G, Alotaibi FA, Alnfiai MM

PubMed · Sep 30, 2025
Prostate cancer remains a leading cause of mortality, necessitating precise histopathological segmentation for accurate Gleason grade assessment. However, existing deep learning-based segmentation models lack contextual awareness and explainability, leading to inconsistent performance across heterogeneous tissue structures. Conventional U-Net architectures and CNN-based approaches struggle to capture long-range dependencies and fine-grained histopathological patterns, resulting in suboptimal boundary delineation and limited generalizability. To address these limitations, we propose a transformer-attention hybrid U-Net (TAH U-Net), integrating hybrid CNN-transformer encoding, attention-guided skip connections, and a multi-stage guided loss mechanism for enhanced segmentation accuracy and model interpretability. The ResNet50-based convolutional layers efficiently capture local spatial features, while Vision Transformer (ViT) blocks model global contextual dependencies, improving segmentation consistency. Attention mechanisms are incorporated into skip connections and decoder pathways, refining feature propagation by suppressing irrelevant tissue noise while enhancing diagnostically significant regions. A novel hierarchical guided loss function optimizes segmentation masks at multiple decoder stages, improving boundary refinement and gradient stability. Additionally, Explainable AI (XAI) techniques such as LIME, occlusion sensitivity, and partial dependence plots (PDP) validate the model's decision-making transparency, ensuring clinical reliability. Experimental evaluation on the SICAPv2 dataset demonstrates state-of-the-art performance, surpassing traditional U-Net architectures with a 4.6% increase in Dice score, a 5.1% gain in IoU, and notable improvements in precision (+4.2%) and recall (+3.8%). This research significantly advances AI-driven prostate cancer diagnostics by providing an interpretable and highly accurate segmentation framework, enhancing clinical trust in histopathology-based grading within medical imaging and computational pathology.
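The attention-guided skip connection can be illustrated with a standard Attention U-Net-style gate, in which the decoder's gating signal re-weights encoder skip features; channel sizes are assumptions, and this is not claimed to be the TAH U-Net authors' exact module.

```python
# Sketch of an attention-gated skip connection (Attention U-Net style): the
# decoder's gating signal re-weights encoder skip features so irrelevant
# tissue responses are suppressed. Channel sizes are assumptions.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.w_gate = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, skip, gate):
        # skip: encoder features (B, skip_ch, H, W); gate: decoder features at the same resolution.
        attn = self.psi(self.relu(self.w_skip(skip) + self.w_gate(gate)))
        return skip * attn                     # re-weighted skip connection

gate = AttentionGate(skip_ch=256, gate_ch=256, inter_ch=128)
out = gate(torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32))
```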

Dolphin v1.0 Technical Report

Taohan Weng, Chi zhang, Chaoran Yan, Siya Liu, Xiaoyang Liu, Yalun Wu, Boyang Wang, Boyan Wang, Jiren Ren, Kaiwen Yan, Jinze Yu, Kaibing Hu, Henan Liu, Haoyun Zheng, Zhenyu Liu, Duo Zhang, Xiaoqing Guo, Anjie Le, Hongcheng Guo

arXiv preprint · Sep 30, 2025
Ultrasound is crucial in modern medicine but faces challenges like operator dependence, image noise, and real-time scanning, hindering AI integration. While large multimodal models excel in other medical imaging areas, they struggle with ultrasound's complexities. To address this, we introduce Dolphin v1.0 (V1) and its reasoning-augmented version, Dolphin R1, the first large-scale multimodal ultrasound foundation models unifying diverse clinical tasks in a single vision-language framework. To tackle ultrasound variability and noise, we curated a 2-million-scale multimodal dataset, combining textbook knowledge, public data, synthetic samples, and general corpora. This ensures robust perception, generalization, and clinical adaptability. The Dolphin series employs a three-stage training strategy: domain-specialized pretraining, instruction-driven alignment, and reinforcement-based refinement. Dolphin v1.0 delivers reliable performance in classification, detection, regression, and report generation. Dolphin R1 enhances diagnostic inference, reasoning transparency, and interpretability through reinforcement learning with ultrasound-specific rewards. Evaluated on U2-Bench across eight ultrasound tasks, Dolphin R1 achieves a U2-score of 0.5835, more than twice that of the second-best model (0.2968), setting a new state of the art. Dolphin v1.0 also performs competitively, validating the unified framework. Comparisons show that reasoning-enhanced training significantly improves diagnostic accuracy, consistency, and interpretability, highlighting its importance for high-stakes medical AI.
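The reinforcement-based refinement stage can be illustrated with a generic REINFORCE-style objective that weights the log-likelihood of a sampled report by a scalar reward; the reward definition, baseline, and tensor shapes are assumptions standing in for Dolphin R1's ultrasound-specific rewards.

```python
# Generic REINFORCE-style sketch of reward-based refinement for report
# generation; the reward, baseline, and shapes are assumptions, not Dolphin R1's
# actual ultrasound-specific reward design.
import torch

def reinforce_loss(token_logprobs, rewards, baseline=None):
    """token_logprobs: (B, T) log-probs of sampled report tokens.
    rewards: (B,) scalar reward per sampled report (e.g. a clinical-accuracy score)."""
    seq_logprob = token_logprobs.sum(dim=1)             # log p(report | image)
    advantage = rewards - (baseline if baseline is not None else rewards.mean())
    return -(advantage.detach() * seq_logprob).mean()   # policy-gradient objective

# Dummy example: 4 sampled reports of 32 tokens each.
loss = reinforce_loss(torch.rand(4, 32).log(), torch.rand(4))
```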

Non-contrast CT-based pulmonary embolism detection using GAN-generated synthetic contrast enhancement: Development and validation of an AI framework.

Kim YT, Bak SH, Han SS, Son Y, Park J

PubMed · Sep 30, 2025
Acute pulmonary embolism (PE) is a life-threatening condition often diagnosed using CT pulmonary angiography (CTPA). However, CTPA is contraindicated in patients with contrast allergies or at risk for contrast-induced nephropathy. This study explores an AI-driven approach to generate synthetic contrast-enhanced images from non-contrast CT scans for accurate diagnosis of acute PE without contrast agents. This retrospective study used dual-energy and standard CT datasets from two institutions. The internal dataset included 84 patients: 41 PE-negative cases for generative model training and 43 patients (30 PE-positive) for diagnostic evaluation. An external dataset of 62 patients (26 PE-positive) was used for further validation. We developed a generative adversarial network (GAN) based on U-Net, trained on paired non-contrast and contrast-enhanced images. The model was optimized using contrast-enhanced L1-loss with hyperparameter λ to improve anatomical accuracy. A ConvNeXt-based classifier trained on the RSNA dataset (N = 7,122) generated per-slice PE probabilities, which were aggregated for patient-level prediction via a Random Forest model. Diagnostic performance was assessed using five-fold cross-validation on both internal and external datasets. The GAN achieved optimal image similarity at λ = 0.5, with the lowest mean absolute error (0.0089) and highest MS-SSIM (0.9674). PE classification yielded AUCs of 0.861 and 0.836 in the internal dataset, and 0.787 and 0.680 in the external dataset, using real and synthetic images, respectively. No statistically significant differences were observed. Our findings demonstrate that synthetic contrast CT can serve as a viable alternative for PE diagnosis in patients contraindicated for CTPA, supporting safe and accessible imaging strategies.
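The slice-to-patient aggregation step can be sketched as follows: summary statistics of the ConvNeXt per-slice probabilities become features for a Random Forest at the patient level. The specific summary features are assumptions; the abstract does not state which were used.

```python
# Sketch of the per-slice-to-patient aggregation: summary statistics of the
# slice-level PE probabilities feed a Random Forest for patient-level
# prediction. The choice of summary features is an assumption.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def patient_features(slice_probs):
    """slice_probs: 1-D array of per-slice PE probabilities for one patient."""
    p = np.asarray(slice_probs)
    top3 = np.sort(p)[-3:] if p.size >= 3 else p
    return [p.max(), p.mean(), top3.mean(), (p > 0.5).mean()]

def fit_patient_classifier(per_patient_probs, labels):
    """per_patient_probs: list of per-slice probability arrays; labels: 0/1 PE status."""
    X = np.array([patient_features(p) for p in per_patient_probs])
    return RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
```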

Empowering Radiologists With ChatGPT-4o: Comparative Evaluation of Large Language Models and Radiologists in Cardiac Cases.

Cesur T, Gunes YC, Camur E, Dağli M

PubMed · Sep 30, 2025
This study evaluated the diagnostic accuracy and differential diagnostic capabilities of 12 large language models (LLMs), one cardiac radiologist, and 3 general radiologists in cardiac radiology. The impact of ChatGPT-4o assistance on radiologist performance was also investigated. We collected 80 publicly available "Cardiac Case of the Month" cases from the Society of Thoracic Radiology website. LLMs and Radiologist-III were provided with text-based information, whereas the other radiologists visually assessed the cases with and without ChatGPT-4o assistance. Diagnostic accuracy and differential diagnosis scores (DDx scores) were analyzed using the χ2, Kruskal-Wallis, Wilcoxon, McNemar, and Mann-Whitney U tests. The unassisted diagnostic accuracy of the cardiac radiologist was 72.5%, of general radiologist-I 53.8%, and of general radiologist-II 51.3%. With ChatGPT-4o, accuracy improved to 78.8%, 70.0%, and 63.8%, respectively. The improvements for general radiologists I and II were statistically significant (P ≤ 0.006). All radiologists' DDx scores improved significantly with ChatGPT-4o assistance (P ≤ 0.05). Remarkably, Radiologist-I's ChatGPT-4o-assisted diagnostic accuracy and DDx score were not significantly different from the cardiac radiologist's unassisted performance (P > 0.05). Among the LLMs, Claude 3 Opus and Claude 3.5 Sonnet had the highest accuracy (81.3%), followed by Claude 3 Sonnet (70.0%). Regarding the DDx score, Claude 3 Opus outperformed all models and Radiologist-III (P < 0.05). The accuracy of general radiologist-III significantly improved from 48.8% to 63.8% with ChatGPT-4o assistance (P < 0.001). ChatGPT-4o may enhance the diagnostic performance of general radiologists in cardiac imaging, suggesting its potential as a diagnostic support tool. Further studies are required to assess its clinical integration.
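One of the paired tests named above, McNemar's test, compares a radiologist's case-level correctness with and without assistance; the sketch below uses illustrative data, not the study's.

```python
# Sketch of one paired comparison named above: McNemar's test on case-level
# correctness with and without ChatGPT-4o assistance. Data here are
# illustrative, not the study's.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def compare_assisted(unassisted_correct, assisted_correct):
    """Both arguments: boolean arrays over the same set of cases."""
    a = np.asarray(unassisted_correct, dtype=bool)
    b = np.asarray(assisted_correct, dtype=bool)
    table = [[np.sum(a & b),  np.sum(a & ~b)],
             [np.sum(~a & b), np.sum(~a & ~b)]]
    return mcnemar(table, exact=True)          # exact test for small discordant counts

result = compare_assisted(np.random.rand(80) > 0.5, np.random.rand(80) > 0.3)
print(result.pvalue)
```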

Transformer Classification of Breast Lesions: The BreastDCEDL_AMBL Benchmark Dataset and 0.92 AUC Baseline

Naomi Fridman, Anat Goldstein

arXiv preprint · Sep 30, 2025
Breast magnetic resonance imaging is a critical tool for cancer detection and treatment planning, but its clinical utility is hindered by poor specificity, leading to high false-positive rates and unnecessary biopsies. This study introduces a transformer-based framework for automated classification of breast lesions in dynamic contrast-enhanced MRI, addressing the challenge of distinguishing benign from malignant findings. We implemented a SegFormer architecture that achieved an AUC of 0.92 for lesion-level classification, with 100% sensitivity and 67% specificity at the patient level - potentially eliminating one-third of unnecessary biopsies without missing malignancies. The model quantifies malignant pixel distribution via semantic segmentation, producing interpretable spatial predictions that support clinical decision-making. To establish reproducible benchmarks, we curated BreastDCEDL_AMBL by transforming The Cancer Imaging Archive's AMBL collection into a standardized deep learning dataset with 88 patients and 133 annotated lesions (89 benign, 44 malignant). This resource addresses a key infrastructure gap, as existing public datasets lack benign lesion annotations, limiting benign-malignant classification research. Training incorporated an expanded cohort of over 1,200 patients through integration with BreastDCEDL datasets, validating transfer learning approaches despite primary tumor-only annotations. Public release of the dataset, models, and evaluation protocols provides the first standardized benchmark for DCE-MRI lesion classification, enabling methodological advancement toward clinical deployment.
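A plausible way to turn the segmentation output into a lesion-level score, consistent with the description of quantifying malignant pixel distribution, is the mean malignant-class probability inside each lesion ROI, evaluated with ROC AUC; the exact scoring rule is an assumption, not the paper's stated one.

```python
# Sketch of converting semantic-segmentation output into lesion-level scores:
# mean malignant-class probability within each lesion ROI, evaluated with
# ROC AUC. The scoring rule is an assumption.
import numpy as np
from sklearn.metrics import roc_auc_score

def lesion_scores(malignant_prob_maps, lesion_masks):
    """malignant_prob_maps, lesion_masks: lists of 2-D arrays, one pair per lesion;
    masks are boolean ROIs delineating each annotated lesion."""
    scores = []
    for prob, mask in zip(malignant_prob_maps, lesion_masks):
        scores.append(float(prob[mask].mean()) if mask.any() else 0.0)
    return np.array(scores)

# labels: 1 = malignant, 0 = benign, one per annotated lesion.
# auc = roc_auc_score(labels, lesion_scores(prob_maps, masks))
```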