
Visual Instruction Pretraining for Domain-Specific Foundation Models

Yuxuan Li, Yicheng Zhang, Wenhao Tang, Yimian Dai, Ming-Ming Cheng, Xiang Li, Jian Yang

arXiv preprint · Sep 22, 2025
Modern computer vision is converging on a closed loop in which perception, reasoning and generation mutually reinforce each other. However, this loop remains incomplete: the top-down influence of high-level reasoning on the foundational learning of low-level perceptual features is still underexplored. This paper addresses this gap by proposing a new paradigm for pretraining foundation models in downstream domains. We introduce Visual insTruction Pretraining (ViTP), a novel approach that directly leverages reasoning to enhance perception. ViTP embeds a Vision Transformer (ViT) backbone within a Vision-Language Model and pretrains it end-to-end using a rich corpus of visual instruction data curated from target downstream domains. ViTP is powered by our proposed Visual Robustness Learning (VRL), which compels the ViT to learn robust and domain-relevant features from a sparse set of visual tokens. Extensive experiments on 16 challenging remote sensing and medical imaging benchmarks demonstrate that ViTP establishes new state-of-the-art performance across a diverse range of downstream tasks. The code is available at https://github.com/zcablii/ViTP.
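
The abstract describes Visual Robustness Learning only at a high level: the ViT must produce useful features even when only a sparse subset of its visual tokens reaches the language model. Below is a minimal sketch of that sparse-token idea; the function name, keep ratio, and random-subset strategy are illustrative assumptions, not the released ViTP/VRL code (see the linked repository for that).

```python
# Sketch of the sparse-visual-token idea: keep a random subset of ViT patch tokens
# before they reach the language decoder, so the backbone must pack robust,
# domain-relevant information into few tokens. Names and ratios are hypothetical.
import torch

def sparse_visual_tokens(patch_tokens: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    """patch_tokens: (batch, num_tokens, dim); returns a random subset of tokens per sample."""
    b, n, d = patch_tokens.shape
    k = max(1, int(n * keep_ratio))
    # Sample k distinct token indices independently for each batch element.
    idx = torch.rand(b, n, device=patch_tokens.device).argsort(dim=1)[:, :k]
    return torch.gather(patch_tokens, 1, idx.unsqueeze(-1).expand(b, k, d))

# Example: 196 ViT patch tokens reduced to 49 before the VLM's language decoder sees them.
tokens = torch.randn(2, 196, 768)
print(sparse_visual_tokens(tokens).shape)  # torch.Size([2, 49, 768])
```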

Diagnostic accuracy and consistency of ChatGPT-4o in radiology: influence of image, clinical data, and answer options on performance.

Atakır K, Işın K, Taş A, Önder H

PubMed paper · Sep 22, 2025
This study aimed to evaluate the diagnostic accuracy of Chat Generative Pre-trained Transformer (ChatGPT) version 4 Omni (ChatGPT-4o) in radiology across seven information input combinations (image, clinical data, and multiple-choice options), to assess the consistency of its outputs across repeated trials, and to compare its performance with that of human radiologists. We tested 129 distinct radiology cases under seven input conditions (varying presence of imaging, clinical context, and answer options). Each case was processed by ChatGPT-4o for seven different input combinations on three separate accounts. Diagnostic accuracy was determined by comparison with ground-truth diagnoses, and interobserver consistency was measured using Fleiss' kappa. Pairwise comparisons were performed with the Wilcoxon signed-rank test. Additionally, the same set of cases was evaluated by nine radiology residents to benchmark ChatGPT-4o's performance against human diagnostic accuracy. ChatGPT-4o's diagnostic accuracy was lowest for "image only" (19.90%) and "options only" (20.67%) conditions. The highest accuracy was observed in "image + clinical information + options" (80.88%) and "clinical information + options" (75.45%) conditions. The highest interobserver agreement was observed in the "image + clinical information + options" condition (κ = 0.733) and the lowest was in the "options only" condition (κ = 0.023), suggesting that more information improves consistency. However, post-hoc analysis showed no measurable benefit from adding imaging data once clinical data and answer options were already provided. In human comparison, ChatGPT-4o outperformed radiology residents in text-based configurations (75.45% vs. 42.89%), whereas residents showed slightly better performance in image-based tasks (64.13% vs. 61.24%). Notably, when residents were allowed to use ChatGPT-4o as a support tool, their image-based diagnostic accuracy increased from 63.04% to 74.16%. ChatGPT-4o performs well when provided with rich textual input but remains limited in purely image-based diagnoses. Its accuracy and consistency increase with multimodal input, yet adding imaging does not significantly improve performance beyond clinical context and diagnostic options alone. The model's superior performance to residents in text-based tasks underscores its potential as a diagnostic aid in structured scenarios. Furthermore, its integration as a support tool may enhance human diagnostic accuracy, particularly in image-based interpretation. Although ChatGPT-4o is not yet capable of reliably interpreting radiologic images on its own, it demonstrates strong performance in text-based diagnostic reasoning. Its integration into clinical workflows, particularly for triage, structured decision support, or educational purposes, may augment radiologists' diagnostic capacity and consistency.
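
For readers who want to reproduce the two statistics named in the abstract, Fleiss' kappa across the three ChatGPT-4o accounts and the Wilcoxon signed-rank test between input conditions, a minimal sketch using standard Python libraries is given below. The arrays are invented placeholder data, not the study's results.

```python
# Illustrative computation of Fleiss' kappa (agreement across accounts) and a paired
# Wilcoxon signed-rank test between two input conditions. Data below are made up.
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One row per case, one column per "rater" (account); entries are chosen diagnosis IDs.
ratings = np.array([[0, 0, 0],
                    [1, 1, 2],
                    [2, 2, 2],
                    [0, 1, 0]])
table, _ = aggregate_raters(ratings)          # cases x categories count table
print("Fleiss' kappa:", fleiss_kappa(table))

# Per-case correctness (1/0) under two input conditions, compared pairwise.
acc_image_only = np.array([0, 1, 0, 0, 1, 0, 1, 0])
acc_full_input = np.array([1, 1, 1, 0, 1, 1, 1, 1])
print(wilcoxon(acc_image_only, acc_full_input))
```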

Conditional Diffusion Models for CT Image Synthesis from CBCT: A Systematic Review

Alzahra Altalib, Chunhui Li, Alessandro Perelli

arXiv preprint · Sep 22, 2025
Objective: Cone-beam computed tomography (CBCT) provides a low-dose imaging alternative to conventional CT, but suffers from noise, scatter, and artifacts that degrade image quality. Synthetic CT (sCT) aims to translate CBCT to high-quality CT-like images for improved anatomical accuracy and dosimetric precision. Although deep learning approaches have shown promise, they often face limitations in generalizability and detail preservation. Conditional diffusion models (CDMs), with their iterative refinement process, offer a novel solution. This review systematically examines the use of CDMs for CBCT-to-sCT synthesis. Methods: A systematic search was conducted in Web of Science, Scopus, and Google Scholar for studies published between 2013 and 2024. Inclusion criteria targeted works employing conditional diffusion models specifically for sCT generation. Eleven relevant studies were identified and analyzed to address three questions: (1) What conditional diffusion methods are used? (2) How do they compare to conventional deep learning in accuracy? (3) What are their clinical implications? Results: CDMs incorporating anatomical priors and spatial-frequency features demonstrated improved structural preservation and noise robustness. Energy-guided and hybrid latent models enabled enhanced dosimetric accuracy and personalized image synthesis. Across studies, CDMs consistently outperformed traditional deep learning models in noise suppression and artifact reduction, especially in challenging cases like lung imaging and dual-energy CT. Conclusion: Conditional diffusion models show strong potential for generalized, accurate sCT generation from CBCT. However, clinical adoption remains limited. Future work should focus on scalability, real-time inference, and integration with multi-modal imaging to enhance clinical relevance.
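
As a rough orientation to the technique the review surveys, the sketch below shows the core of a conditional diffusion training step in which the denoiser is conditioned on the paired CBCT slice by channel-wise concatenation. The tiny network, noise schedule, and shapes are illustrative assumptions; the reviewed methods add anatomical priors, spatial-frequency features, or latent-space guidance on top of this skeleton.

```python
# Minimal conditional-diffusion training step: the model predicts the noise added to
# a CT slice while seeing the paired CBCT slice as an extra input channel.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        # 2 input channels: noisy CT slice + conditioning CBCT slice.
        self.net = nn.Sequential(nn.Conv2d(2, 32, 3, padding=1), nn.SiLU(),
                                 nn.Conv2d(32, 1, 3, padding=1))
    def forward(self, x_noisy, cbct):
        return self.net(torch.cat([x_noisy, cbct], dim=1))

def diffusion_step(model, ct, cbct, alphas_cumprod):
    t = torch.randint(0, len(alphas_cumprod), (ct.size(0),))
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(ct)
    x_noisy = a.sqrt() * ct + (1 - a).sqrt() * noise       # forward diffusion
    pred = model(x_noisy, cbct)                            # CBCT-conditioned denoiser
    return nn.functional.mse_loss(pred, noise)             # epsilon-prediction loss

model = TinyDenoiser()
alphas_cumprod = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), dim=0)
loss = diffusion_step(model, torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64), alphas_cumprod)
```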

Enhancing Instance Feature Representation: A Foundation Model-Based Multi-Instance Approach for Neonatal Retinal Screening.

Guo J, Wang K, Tan G, Li G, Zhang X, Chen J, Hu J, Liang Y, Jiang B

PubMed paper · Sep 22, 2025
Automated analysis of neonatal fundus images presents a uniquely intricate challenge in medical imaging. Existing methodologies predominantly focus on diagnosing abnormalities from individual images, often leading to inaccuracies due to the diverse and subtle nature of neonatal retinal features. Consequently, clinical standards frequently mandate the acquisition of retinal images from multiple angles to ensure the detection of minute lesions. To accommodate this, we propose leveraging multiple fundus images captured from various regions of the retina to comprehensively screen for a wide range of neonatal ocular pathologies. We employ Multiple Instance Learning (MIL) for this task, and introduce a simple yet effective learnable structure on the existing MIL method, called Learnable Dense to Global (LD2G-MIL). Different from other methods that focus on instance-to-bag feature aggregation, the proposed method focuses on generating better instance-level representations that are co-optimized with downstream MIL targets in a learnable way. Additionally, it incorporates a bag prior-based similarity loss (BP loss) mechanism, leveraging prior knowledge to enhance performance in neonatal retinal screening. To validate the efficacy of our LD2G-MIL method, we compiled the Neonatal Fundus Images (NFI) dataset, an extensive collection comprising 115,621 retinal images from 8,886 neonatal clinical episodes. Empirical evaluations on this dataset demonstrate that our approach consistently outperforms state-of-the-art (SOTA) generic and specialized methods. The code and trained models are publicly available at https://github.com/CVIU-CSU/LD2G-MIL.
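
To make the multi-instance setting concrete, the sketch below shows a generic attention-based instance-to-bag pooling layer of the kind the abstract contrasts its approach with: each fundus image of an episode is an instance and the episode is the bag. This is not the LD2G-MIL method itself; the authors' implementation is at the linked repository.

```python
# Generic attention-based MIL pooling: weight each instance embedding and sum them
# into one bag embedding. Dimensions and the encoder producing the features are
# placeholders for illustration only.
import torch
import torch.nn as nn

class AttentionMILPool(nn.Module):
    def __init__(self, dim: int = 512, hidden: int = 128):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, instance_feats: torch.Tensor) -> torch.Tensor:
        """instance_feats: (num_instances, dim) for one bag -> (dim,) bag embedding."""
        weights = torch.softmax(self.attn(instance_feats), dim=0)   # (num_instances, 1)
        return (weights * instance_feats).sum(dim=0)

# Example: 12 fundus images from one clinical episode, each encoded to 512-d.
bag = torch.randn(12, 512)
bag_embedding = AttentionMILPool()(bag)    # shape (512,)
```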

Comprehensive Assessment of Tumor Stromal Heterogeneity in Bladder Cancer by Deep Learning and Habitat Radiomics.

Du Y, Sui Y, Tao Y, Cao J, Jiang X, Yu J, Wang B, Wang Y, Li H

PubMed paper · Sep 22, 2025
Tumor stromal heterogeneity plays a pivotal role in bladder cancer progression. The tumor-stroma ratio (TSR) is a key pathological marker reflecting stromal heterogeneity. This study aimed to develop a preoperative, CT-based machine learning model for predicting TSR in bladder cancer, comparing various radiomic approaches, and evaluating their utility in prognostic assessment and immunotherapy response prediction. A total of 477 bladder urothelial carcinoma patients from two centers were retrospectively included. Tumors were segmented on preoperative contrast-enhanced CT, and radiomic features were extracted. K-means clustering was used to divide tumors into subregions. Radiomics models were constructed: a conventional model (Intra), a multi-subregion model (Habitat), and single-subregion models (HabitatH1/H2/H3). A deep transfer learning model (DeepL) based on the largest tumor cross-section was also developed. Model performance was evaluated in training, testing, and external validation cohorts, and associations with recurrence-free survival, CD8+ T cell infiltration, and immunotherapy response were analyzed. The HabitatH1 model demonstrated robust diagnostic performance with favorable calibration and clinical utility. The DeepL model surpassed all radiomics models in predictive accuracy. A nomogram combining DeepL and clinical variables effectively predicted recurrence-free survival, CD8+ T cell infiltration, and immunotherapy response. Imaging-predicted TSR showed significant associations with the tumor immune microenvironment and treatment outcomes. CT-based habitat radiomics and deep learning models enable non-invasive, quantitative assessment of TSR in bladder cancer. The DeepL model provides superior diagnostic and prognostic value, supporting personalized treatment decisions and prediction of immunotherapy response.
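
The habitat step in this pipeline, clustering tumor voxels into subregions before per-subregion radiomics, can be sketched in a few lines; the intensity-only features and the choice of three habitats below are illustrative assumptions rather than the study's configuration.

```python
# K-means habitat clustering inside a tumor mask: label each tumor voxel with a
# subregion index, from which radiomic features would then be extracted per habitat.
import numpy as np
from sklearn.cluster import KMeans

def habitat_labels(ct_volume: np.ndarray, mask: np.ndarray, n_habitats: int = 3) -> np.ndarray:
    """Returns a volume where tumor voxels carry subregion labels 1..n_habitats, else 0."""
    voxels = ct_volume[mask > 0].reshape(-1, 1)            # per-voxel intensity features
    km = KMeans(n_clusters=n_habitats, n_init=10, random_state=0).fit(voxels)
    labels = np.zeros(ct_volume.shape, dtype=np.int32)
    labels[mask > 0] = km.labels_ + 1
    return labels

# Placeholder data: a random volume with a sparse "tumor" mask.
volume = np.random.rand(64, 64, 32)
mask = (np.random.rand(64, 64, 32) > 0.9).astype(np.uint8)
habitats = habitat_labels(volume, mask)
```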

MRN: Harnessing 2D Vision Foundation Models for Diagnosing Parkinson's Disease with Limited 3D MR Data

Ding Shaodong, Liu Ziyang, Zhou Yijun, Liu Tao

arXiv preprint · Sep 22, 2025
The automatic diagnosis of Parkinson's disease is in high clinical demand due to its prevalence and the importance of targeted treatment. Current clinical practice often relies on diagnostic biomarkers in QSM and NM-MRI images. However, the lack of large, high-quality datasets makes training diagnostic models from scratch prone to overfitting. Adapting pre-trained 3D medical models is also challenging, as the diversity of medical imaging leads to mismatches in voxel spacing and modality between pre-training and fine-tuning data. In this paper, we address these challenges by leveraging 2D vision foundation models (VFMs). Specifically, we crop multiple key ROIs from NM and QSM images, process each ROI through a separate branch that compresses it into a token, and then combine these tokens into a unified patient representation for classification. Within each branch, we use 2D VFMs to encode axial slices of the 3D ROI volume and fuse them into the ROI token, guided by an auxiliary segmentation head that steers the feature extraction toward specific brain nuclei. Additionally, we introduce multi-ROI supervised contrastive learning, which improves diagnostic performance by pulling together representations of patients from the same class while pushing away those from different classes. Our approach achieved first place in the MICCAI 2025 PDCADxFoundation challenge, with an accuracy of 86.0% when trained on a dataset of only 300 labeled QSM and NM-MRI scans, outperforming the second-place method by 5.5%. These results highlight the potential of 2D VFMs for clinical analysis of 3D MR images.
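
The multi-ROI supervised contrastive objective mentioned in the abstract follows the familiar pull-together/push-apart pattern over patient representations. Below is a compact sketch of a standard supervised contrastive loss under that reading; the batch construction and temperature are assumptions, and the paper's ROI-specific details are not reproduced.

```python
# Standard supervised contrastive loss: anchors are attracted to same-class samples
# and repelled from different-class samples in the batch.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(feats: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """feats: (batch, dim) embeddings; labels: (batch,) class ids."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.T / tau                              # pairwise similarities
    self_mask = torch.eye(len(feats), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float('-inf'))          # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Average log-probability of positives per anchor that has at least one positive.
    loss = -(log_prob * pos.float()).sum(1) / pos.sum(1).clamp(min=1)
    return loss[pos.any(1)].mean()

loss = supervised_contrastive_loss(torch.randn(8, 128), torch.tensor([0, 0, 1, 1, 0, 1, 0, 1]))
```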

MRI-based habitat analysis for pathologic response prediction after neoadjuvant chemoradiotherapy in rectal cancer: a multicenter study.

Chen Q, Zhang Q, Li Z, Zhang S, Xia Y, Wang H, Lu Y, Zheng A, Shao C, Shen F

PubMed paper · Sep 22, 2025
To investigate the value of MRI-based habitat analysis in predicting pathologic response following neoadjuvant chemoradiotherapy (nCRT) in rectal cancer (RC) patients. 1021 RC patients in three hospitals were divided into the training and test sets (n = 319), the internal validation set (n = 317), and external validation sets 1 (n = 158) and 2 (n = 227). Deep learning was performed to automatically segment the entire lesion on high-resolution MRI. Simple linear iterative clustering was used to divide each tumor into subregions, from which radiomics features were extracted. The optimal number of clusters reflecting the diversity of the tumor ecosystem was determined. Finally, four models were developed: clinical, intratumoral heterogeneity (ITH)-based, radiomics, and fusion models. The performance of these models was evaluated. The impact of nCRT on disease-free survival (DFS) was further analyzed. The DeLong test revealed that the fusion model (AUCs of 0.867, 0.851, 0.852, and 0.818 in the four cohorts, respectively), the radiomics model (0.831, 0.694, 0.753, and 0.705, respectively), and the ITH model (0.790, 0.786, 0.759, and 0.722, respectively) were all superior to the clinical model (0.790, 0.605, 0.735, and 0.704, respectively). However, no significant differences were detected between the fusion and ITH models. Patients stratified using the fusion model showed significant differences in DFS between the good and poor response groups (all p < 0.05 in the four sets). The fusion model combining clinical factors, radiomics features, and ITH features may help predict pathologic response in RC cases receiving nCRT. Question: Identifying rectal cancer (RC) patients likely to benefit from neoadjuvant chemoradiotherapy (nCRT) before treatment is crucial. Findings: The fusion model shows the best performance in predicting response after neoadjuvant chemoradiotherapy. Clinical relevance: The fusion model integrates clinical characteristics, radiomics features, and intratumoral heterogeneity (ITH) features, which can be applied for the prediction of response to nCRT in RC patients, offering potential benefits in terms of personalized treatment strategies.
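
The subregion step named in the abstract, simple linear iterative clustering (SLIC) over the segmented lesion, can be illustrated with scikit-image; the slice, mask, and SLIC parameters below are placeholders rather than the study's settings.

```python
# SLIC superpixels restricted to the lesion mask: each superpixel becomes a subregion
# from which radiomics features would be computed. Data and parameters are illustrative.
import numpy as np
from skimage.segmentation import slic

image = np.random.rand(128, 128)                  # one MRI slice (placeholder data)
lesion_mask = np.zeros((128, 128), dtype=bool)
lesion_mask[40:90, 40:90] = True

# Label 0 marks pixels outside the mask; labels 1..n are lesion subregions.
subregions = slic(image, n_segments=8, compactness=0.1, mask=lesion_mask, channel_axis=None)
print("subregion labels:", np.unique(subregions))
```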

Development and Temporal Validation of a Deep Learning Model for Automatic Fetal Biometry from Ultrasound Videos.

Goetz-Fu M, Haller M, Collins T, Begusic N, Jochum F, Keeza Y, Uwineza J, Marescaux J, Weingertner AS, Sananès N, Hostettler A

PubMed paper · Sep 22, 2025
The objective was to develop an artificial intelligence (AI)-based system, using deep neural network (DNN) technology, to automatically detect standard fetal planes during video capture, measure fetal biometry parameters and estimate fetal weight. A standard plane recognition DNN was trained to classify ultrasound images into four categories: head circumference (HC), abdominal circumference (AC), femur length (FL) standard planes, or 'other'. The recognized standard plane images were subsequently processed by three fetal biometry DNNs, automatically measuring HC, AC and FL. Fetal weight was then estimated with the Hadlock 3 formula. The training dataset consisted of 16,626 images. A prospective temporal validation was then conducted using an independent set of 281 ultrasound videos of healthy fetuses. Fetal weight and biometry measurements were compared against an expert sonographer. Two less experienced sonographers were used as controls. The AI system obtained a significantly lower absolute relative measurement error than the controls in fetal weight estimation (AI vs. medium-level: p = 0.032, AI vs. beginner: p < 1e-8) as well as in AC measurements (AI vs. medium-level: p = 1.72e-04, AI vs. beginner: p < 1e-06). Average absolute relative measurement errors of AI versus expert were: 0.96% (SD 0.79%) for HC, 1.56% (SD 1.39%) for AC, 1.77% (SD 1.46%) for FL and 3.10% (SD 2.74%) for fetal weight estimation. The AI system produced similar biometry measurements and fetal weight estimation to those of the expert sonographer. It is a promising tool to enhance non-expert sonographers' performance and reproducibility in fetal biometry measurements, and to reduce inter-operator variability.
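
The abstract states that fetal weight is estimated from the measured HC, AC, and FL with the Hadlock 3 formula. The sketch below uses the commonly cited Hadlock (1985) HC/AC/FL equation, quoted as an assumption since the paper does not restate its coefficients; measurements are in centimetres and the result is in grams.

```python
# Hadlock HC/AC/FL estimated fetal weight, as commonly stated in the literature
# (coefficients quoted here as an assumption, not taken from the paper).
def hadlock3_efw(hc_cm: float, ac_cm: float, fl_cm: float) -> float:
    log10_efw = (1.326 - 0.00326 * ac_cm * fl_cm
                 + 0.0107 * hc_cm + 0.0438 * ac_cm + 0.158 * fl_cm)
    return 10 ** log10_efw

# Example: HC 32 cm, AC 30 cm, FL 6.5 cm -> roughly 2.4 kg (illustrative values).
print(round(hadlock3_efw(32.0, 30.0, 6.5)))
```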

Evaluation of Operator Variability and Validation of an AI-Assisted α-Angle Measurement System for DDH Using a Phantom Model.

Ohashi Y, Shimizu T, Koyano H, Nakamura Y, Takahashi D, Yamada K, Iwasaki N

PubMed paper · Sep 22, 2025
Ultrasound examination using the Graf method is widely applied for early detection of developmental dysplasia of the hip (DDH), but intra- and inter-operator variability remains a limitation. This study aimed to quantify operator variability in hip ultrasound assessments and to validate an AI-assisted system for automated α-angle measurement to improve reproducibility. Thirty participants with different experience levels, including trained clinicians, residents, and medical students, each performed six ultrasound scans on a standardized infant hip phantom. Examination time, iliac margin inclination, and α-angle measurements were analyzed to assess intra- and inter-operator variability. In parallel, an AI-based system was developed to automatically detect anatomical landmarks and calculate α-angles from static images and dynamic video sequences. Validation was conducted using the phantom model with a known α-angle of 70°. Clinicians achieved shorter examination times and higher reproducibility than residents and students, with manual measurements systematically underestimating the reference α-angle. Static AI produced closer estimates but with greater variability, whereas dynamic AI achieved the highest accuracy (mean 69.2°) and consistency, with narrower limits of agreement than manual measurements. These findings confirm substantial operator variability and demonstrate that AI-assisted dynamic ultrasound analysis can improve reproducibility and reliability in routine DDH screening.
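
As an illustration of how detected landmarks can be turned into a Graf α-angle, the sketch below measures the angle between a baseline along the iliac margin and the bony roof line; the landmark definitions and coordinates are invented for illustration and do not reproduce the validated system described in the abstract.

```python
# Angle between two landmark-defined lines in image coordinates: the baseline along
# the iliac margin and the bony roof line. Coordinates below are made-up examples.
import numpy as np

def angle_between(line_a: np.ndarray, line_b: np.ndarray) -> float:
    """Each line is a pair of (x, y) points; returns the acute angle in degrees."""
    va = line_a[1] - line_a[0]
    vb = line_b[1] - line_b[0]
    cos = abs(np.dot(va, vb)) / (np.linalg.norm(va) * np.linalg.norm(vb))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

baseline = np.array([[100.0, 40.0], [100.0, 200.0]])     # iliac margin (vertical)
bony_roof = np.array([[100.0, 160.0], [160.0, 140.0]])   # lower iliac limb -> bony rim
print(f"alpha angle ~ {angle_between(baseline, bony_roof):.1f} degrees")
```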