
Beyond Diagnosis: Evaluating Multimodal LLMs for Pathology Localization in Chest Radiographs

Advait Gosai, Arun Kavishwar, Stephanie L. McNamara, Soujanya Samineni, Renato Umeton, Alexander Chowdhury, William Lotter

arXiv preprint · Sep 22, 2025
Recent work has shown promising performance of frontier large language models (LLMs) and their multimodal counterparts (MLLMs) in medical quizzes and diagnostic tasks, highlighting their potential for broad clinical utility given their accessible, general-purpose nature. However, beyond diagnosis, a fundamental aspect of medical image interpretation is the ability to localize pathological findings. Evaluating localization not only has clinical and educational relevance but also provides insight into a model's spatial understanding of anatomy and disease. Here, we systematically assess two general-purpose MLLMs (GPT-4 and GPT-5) and a domain-specific model (MedGemma) in their ability to localize pathologies on chest radiographs, using a prompting pipeline that overlays a spatial grid and elicits coordinate-based predictions. Averaged across nine pathologies in the CheXlocalize dataset, GPT-5 exhibited a localization accuracy of 49.7%, followed by GPT-4 (39.1%) and MedGemma (17.7%), all lower than a task-specific CNN baseline (59.9%) and a radiologist benchmark (80.1%). Despite modest performance, error analysis revealed that GPT-5's predictions were largely in anatomically plausible regions, just not always precisely localized. GPT-4 performed well on pathologies with fixed anatomical locations, but struggled with spatially variable findings and exhibited anatomically implausible predictions more frequently. MedGemma demonstrated the lowest performance on all pathologies, showing limited capacity to generalize to this novel task. Our findings highlight both the promise and limitations of current MLLMs in medical imaging and underscore the importance of integrating them with task-specific tools for reliable use.
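
For readers who want to prototype a similar evaluation, the sketch below is not the authors' pipeline; the 9x9 grid size and the center-of-cell hit criterion are assumptions. It shows how a grid-cell prediction returned by an MLLM could be mapped back to pixel space and scored against a ground-truth segmentation mask:

```python
# Minimal sketch: map a predicted grid cell back to image coordinates and check
# whether it lands inside a ground-truth pathology mask.
import numpy as np

def grid_cell_to_pixel(cell_row, cell_col, image_shape, grid_size=9):
    """Return the pixel (y, x) at the center of a predicted grid cell."""
    h, w = image_shape
    cell_h, cell_w = h / grid_size, w / grid_size
    return int((cell_row + 0.5) * cell_h), int((cell_col + 0.5) * cell_w)

def localization_hit(pred_cell, gt_mask, grid_size=9):
    """Count a prediction as a hit if the cell center lies inside the mask."""
    y, x = grid_cell_to_pixel(*pred_cell, gt_mask.shape, grid_size)
    return bool(gt_mask[y, x])

# Toy example: a 1024x1024 mask with the finding in the upper-left quadrant.
mask = np.zeros((1024, 1024), dtype=bool)
mask[100:400, 150:450] = True
print(localization_hit((2, 2), mask))   # cell near the finding -> True
print(localization_hit((7, 7), mask))   # distant cell -> False
```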

Comprehensive Assessment of Tumor Stromal Heterogeneity in Bladder Cancer by Deep Learning and Habitat Radiomics.

Du Y, Sui Y, Tao Y, Cao J, Jiang X, Yu J, Wang B, Wang Y, Li H

PubMed paper · Sep 22, 2025
Tumor stromal heterogeneity plays a pivotal role in bladder cancer progression. The tumor-stroma ratio (TSR) is a key pathological marker reflecting stromal heterogeneity. This study aimed to develop a preoperative, CT-based machine learning model for predicting TSR in bladder cancer, comparing various radiomic approaches, and evaluating their utility in prognostic assessment and immunotherapy response prediction. A total of 477 bladder urothelial carcinoma patients from two centers were retrospectively included. Tumors were segmented on preoperative contrast-enhanced CT, and radiomic features were extracted. K-means clustering was used to divide tumors into subregions. Radiomics models were constructed: a conventional model (Intra), a multi-subregion model (Habitat), and single-subregion models (HabitatH1/H2/H3). A deep transfer learning model (DeepL) based on the largest tumor cross-section was also developed. Model performance was evaluated in training, testing, and external validation cohorts, and associations with recurrence-free survival, CD8+ T cell infiltration, and immunotherapy response were analyzed. The HabitatH1 model demonstrated robust diagnostic performance with favorable calibration and clinical utility. The DeepL model surpassed all radiomics models in predictive accuracy. A nomogram combining DeepL and clinical variables effectively predicted recurrence-free survival, CD8+ T cell infiltration, and immunotherapy response. Imaging-predicted TSR showed significant associations with the tumor immune microenvironment and treatment outcomes. CT-based habitat radiomics and deep learning models enable non-invasive, quantitative assessment of TSR in bladder cancer. The DeepL model provides superior diagnostic and prognostic value, supporting personalized treatment decisions and prediction of immunotherapy response.
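
As a rough illustration of the habitat idea (not the study's code; the per-voxel features and the choice of three habitats are assumptions), tumor voxels can be clustered into subregions with K-means using scikit-learn:

```python
# Minimal sketch: derive habitat subregions inside a segmented tumor by K-means
# clustering of simple per-voxel features, in the spirit of the Habitat models.
import numpy as np
from sklearn.cluster import KMeans

def habitat_labels(ct_volume, tumor_mask, n_habitats=3):
    """Cluster tumor voxels into habitats from intensity plus a heterogeneity proxy."""
    idx = np.argwhere(tumor_mask)
    intensities = ct_volume[tumor_mask].astype(float)
    deviation = np.abs(intensities - intensities.mean())   # crude local-heterogeneity proxy
    features = np.column_stack([intensities, deviation])
    labels = KMeans(n_clusters=n_habitats, n_init=10, random_state=0).fit_predict(features)
    habitat_map = np.zeros_like(ct_volume, dtype=np.int8)
    habitat_map[tuple(idx.T)] = labels + 1                  # 0 = background, 1..K = habitats
    return habitat_map

# Toy volume: brighter core, darker rim.
vol = np.random.normal(40, 5, (32, 32, 32))
mask = np.zeros_like(vol, dtype=bool)
mask[8:24, 8:24, 8:24] = True
vol[12:20, 12:20, 12:20] += 30
print(np.unique(habitat_labels(vol, mask)))   # e.g. [0 1 2 3]
```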

Visual Instruction Pretraining for Domain-Specific Foundation Models

Yuxuan Li, Yicheng Zhang, Wenhao Tang, Yimian Dai, Ming-Ming Cheng, Xiang Li, Jian Yang

arXiv preprint · Sep 22, 2025
Modern computer vision is converging on a closed loop in which perception, reasoning and generation mutually reinforce each other. However, this loop remains incomplete: the top-down influence of high-level reasoning on the foundational learning of low-level perceptual features remains underexplored. This paper addresses this gap by proposing a new paradigm for pretraining foundation models in downstream domains. We introduce Visual insTruction Pretraining (ViTP), a novel approach that directly leverages reasoning to enhance perception. ViTP embeds a Vision Transformer (ViT) backbone within a Vision-Language Model and pretrains it end-to-end using a rich corpus of visual instruction data curated from target downstream domains. ViTP is powered by our proposed Visual Robustness Learning (VRL), which compels the ViT to learn robust and domain-relevant features from a sparse set of visual tokens. Extensive experiments on 16 challenging remote sensing and medical imaging benchmarks demonstrate that ViTP establishes new state-of-the-art performance across a diverse range of downstream tasks. The code is available at github.com/zcablii/ViTP.
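
A minimal sketch of the underlying intuition, assuming the "sparse set of visual tokens" is realized by randomly keeping a fraction of ViT patch tokens (the actual VRL mechanism may differ), is shown below in PyTorch:

```python
# Minimal sketch: subsample a ViT's patch tokens before they reach the language
# model, so the backbone must pack robust, domain-relevant features into
# whichever tokens survive.
import torch

def keep_sparse_tokens(patch_tokens: torch.Tensor, keep_ratio: float = 0.25):
    """patch_tokens: (B, N, D) ViT outputs; returns a random (B, K, D) subset."""
    b, n, d = patch_tokens.shape
    k = max(1, int(n * keep_ratio))
    scores = torch.rand(b, n, device=patch_tokens.device)
    keep_idx = scores.topk(k, dim=1).indices                              # (B, K)
    return torch.gather(patch_tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))

tokens = torch.randn(2, 196, 768)      # e.g. a 14x14 patch grid from a ViT-B
sparse = keep_sparse_tokens(tokens)    # (2, 49, 768) passed onward to the LLM
print(sparse.shape)
```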

MRN: Harnessing 2D Vision Foundation Models for Diagnosing Parkinson's Disease with Limited 3D MR Data

Ding Shaodong, Liu Ziyang, Zhou Yijun, Liu Tao

arXiv preprint · Sep 22, 2025
The automatic diagnosis of Parkinson's disease is in high clinical demand due to its prevalence and the importance of targeted treatment. Current clinical practice often relies on diagnostic biomarkers in QSM and NM-MRI images. However, the lack of large, high-quality datasets makes training diagnostic models from scratch prone to overfitting. Adapting pre-trained 3D medical models is also challenging, as the diversity of medical imaging leads to mismatches in voxel spacing and modality between pre-training and fine-tuning data. In this paper, we address these challenges by leveraging 2D vision foundation models (VFMs). Specifically, we crop multiple key ROIs from NM and QSM images, process each ROI through separate branches to compress the ROI into a token, and then combine these tokens into a unified patient representation for classification. Within each branch, we use 2D VFMs to encode axial slices of the 3D ROI volume and fuse them into the ROI token, guided by an auxiliary segmentation head that steers the feature extraction toward specific brain nuclei. Additionally, we introduce multi-ROI supervised contrastive learning, which improves diagnostic performance by pulling together representations of patients from the same class while pushing away those from different classes. Our approach achieved first place in the MICCAI 2025 PDCADxFoundation challenge, with an accuracy of 86.0%, trained on a dataset of only 300 labeled QSM and NM-MRI scans, outperforming the second-place method by 5.5%. These results highlight the potential of 2D VFMs for clinical analysis of 3D MR images.
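
The supervised contrastive objective can be illustrated with a standard supervised contrastive loss over patient embeddings; this is a generic PyTorch sketch, not the challenge submission, and the temperature and batch layout are assumptions:

```python
# Minimal sketch: pull same-class patient embeddings together and push
# different-class embeddings apart with a supervised contrastive loss.
import torch
import torch.nn.functional as F

def sup_con_loss(embeddings, labels, tau=0.1):
    """embeddings: (B, D) patient representations; labels: (B,) class ids."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / tau                                    # (B, B) similarities / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))          # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    per_anchor = torch.where(pos, log_prob, torch.zeros_like(log_prob)).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor[pos.any(dim=1)].mean()                # anchors without positives are skipped

z = torch.randn(8, 128, requires_grad=True)
y = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
print(sup_con_loss(z, y).item())
```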

Learning Scan-Adaptive MRI Undersampling Patterns with Pre-Optimized Mask Supervision

Aryan Dhar, Siddhant Gautam, Saiprasad Ravishankar

arXiv preprint · Sep 21, 2025
Deep learning techniques have gained considerable attention for their ability to accelerate MRI data acquisition while maintaining scan quality. In this work, we present a convolutional neural network (CNN) based framework for learning undersampling patterns directly from multi-coil MRI data. Unlike prior approaches that rely on in-training mask optimization, our method is trained with precomputed scan-adaptive optimized masks as supervised labels, enabling efficient and robust scan-specific sampling. The training procedure alternates between optimizing a reconstructor and a data-driven sampling network, which generates scan-specific sampling patterns from observed low-frequency $k$-space data. Experiments on the fastMRI multi-coil knee dataset demonstrate significant improvements in sampling efficiency and image reconstruction quality, providing a robust framework for enhancing MRI acquisition through deep learning.
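
A toy version of the supervision scheme, with an assumed small CNN and fastMRI-like tensor shapes rather than the paper's actual architecture, looks like this:

```python
# Minimal sketch: a small CNN maps observed low-frequency k-space to per-line
# sampling scores, supervised against a precomputed scan-adaptive optimized mask.
import torch
import torch.nn as nn

class LineSampler(nn.Module):
    def __init__(self, n_lines: int = 368):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, n_lines)),            # pool the readout dimension away
        )
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, low_freq_kspace: torch.Tensor) -> torch.Tensor:
        """low_freq_kspace: (B, 2, H, W) real/imag channels -> (B, n_lines) line logits."""
        x = self.head(self.features(low_freq_kspace))       # (B, 1, 1, n_lines)
        return x.flatten(1)

sampler = LineSampler()
kspace = torch.randn(4, 2, 640, 368)                        # fastMRI-like knee k-space
target_mask = (torch.rand(4, 368) > 0.75).float()           # precomputed optimized mask as label
loss = nn.BCEWithLogitsLoss()(sampler(kspace), target_mask)
loss.backward()
print(loss.item())
```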

Causal Representation Learning from Multimodal Clinical Records under Non-Random Modality Missingness

Zihan Liang, Ziwen Pan, Ruoxuan Xiong

arXiv preprint · Sep 21, 2025
Clinical notes contain rich patient information, such as diagnoses or medications, making them valuable for patient representation learning. Recent advances in large language models have further improved the ability to extract meaningful representations from clinical texts. However, clinical notes are often missing. For example, in our analysis of the MIMIC-IV dataset, 24.5% of patients have no available discharge summaries. In such cases, representations can be learned from other modalities such as structured data, chest X-rays, or radiology reports. Yet the availability of these modalities is influenced by clinical decision-making and varies across patients, resulting in modality missing-not-at-random (MMNAR) patterns. We propose a causal representation learning framework that leverages observed data and informative missingness in multimodal clinical records. It consists of: (1) an MMNAR-aware modality fusion component that integrates structured data, imaging, and text while conditioning on missingness patterns to capture patient health and clinician-driven assignment; (2) a modality reconstruction component with contrastive learning to ensure semantic sufficiency in representation learning; and (3) a multitask outcome prediction model with a rectifier that corrects for residual bias from specific modality observation patterns. Comprehensive evaluations across MIMIC-IV and eICU show consistent gains over the strongest baselines, achieving up to 13.8% AUC improvement for hospital readmission and 13.1% for ICU admission.
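
One way to realize the MMNAR-aware fusion component is sketched below; the modality names, embedding sizes, and zero-imputation strategy are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch: fuse per-modality embeddings (zero-imputed when missing) with an
# embedding of the missingness pattern itself, so the model can learn from which
# modalities a clinician chose to order, not just their content.
import torch
import torch.nn as nn

class MMNARFusion(nn.Module):
    def __init__(self, dims=None, out_dim=128):
        super().__init__()
        dims = dims or {'structured': 64, 'cxr': 256, 'report': 128}   # hypothetical sizes
        self.names = list(dims)
        self.proj = nn.ModuleDict({k: nn.Linear(d, out_dim) for k, d in dims.items()})
        self.miss_embed = nn.Linear(len(dims), out_dim)                # embeds the 0/1 pattern
        self.fuse = nn.Linear(out_dim * (len(dims) + 1), out_dim)

    def forward(self, inputs, observed):
        """inputs[k]: (B, dims[k]); observed: (B, n_modalities) 0/1 indicators."""
        parts = [self.proj[k](inputs[k]) * observed[:, i:i + 1]        # mask out missing modalities
                 for i, k in enumerate(self.names)]
        parts.append(self.miss_embed(observed))
        return self.fuse(torch.cat(parts, dim=1))

fusion = MMNARFusion()
batch = {'structured': torch.randn(8, 64), 'cxr': torch.randn(8, 256), 'report': torch.randn(8, 128)}
observed = torch.tensor([[1., 1., 0.]] * 8)    # e.g. text report missing for these patients
print(fusion(batch, observed).shape)           # torch.Size([8, 128])
```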

An attention aided wavelet convolutional neural network for lung nodule characterization.

Halder A

PubMed paper · Sep 21, 2025
Lung cancer is a leading cause of cancer-related mortality worldwide, necessitating the development of accurate and efficient diagnostic methods. Early detection and accurate characterization of pulmonary nodules significantly influence patient prognosis and treatment planning and can improve the five-year survival rate. However, distinguishing benign from malignant nodules using conventional imaging techniques remains a clinical challenge due to subtle structural similarities. Therefore, to address this issue, this study proposes a novel two-pathway wavelet-based deep learning computer-aided diagnosis (CADx) framework for improved lung nodule classification using high-resolution computed tomography (HRCT) images. The proposed Wavelet-based Lung Cancer Detection Network (WaveLCDNet) is capable of characterizing lung nodule images through a hierarchical feature extraction pipeline consisting of convolutional neural network (CNN) blocks and trainable wavelet blocks for multi-resolution analysis. The introduced wavelet block can capture both spatial and frequency-domain information, preserving fine-grained texture details essential for nodule characterization. Additionally, in this work, a convolutional block attention module (CBAM)-based attention mechanism has been introduced to enhance discriminative feature learning. The extracted features from both pathways are adaptively fused and processed using a global average pooling (GAP) operation. The introduced WaveLCDNet is trained and evaluated on the publicly accessible LIDC-IDRI dataset and achieved sensitivity, specificity, and accuracy of 96.89%, 95.52%, and 96.70% for nodule characterization. In addition, the developed framework was externally validated on the Kaggle DSB2017 test dataset, achieving 95.90% accuracy with a Brier score of 0.0215 for lung nodule characterization, reinforcing its reliability across independent imaging sources and its practical value for integration into real-world diagnostic workflows. By effectively combining multi-scale convolutional filtering with wavelet-based multi-resolution analysis and attention mechanisms, the introduced framework outperforms recent state-of-the-art deep learning models and offers a promising CADx solution for enhancing early diagnosis in lung cancer screening in clinical settings.
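
For intuition about what a wavelet block contributes, the sketch below applies a fixed 2D Haar decomposition to an image patch; the paper's wavelet blocks are trainable, so this is only an approximation of the idea:

```python
# Minimal sketch: split each 2x2 block into approximation (LL) and detail
# (LH, HL, HH) sub-bands, exposing both spatial and frequency-domain structure
# that downstream CNN/attention blocks can process.
import torch

def haar_dwt(x: torch.Tensor) -> torch.Tensor:
    """x: (B, C, H, W) with even H, W -> (B, 4C, H/2, W/2) sub-bands [LL, LH, HL, HH]."""
    a = x[:, :, 0::2, 0::2]
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return torch.cat([ll, lh, hl, hh], dim=1)

ct_patch = torch.randn(1, 1, 64, 64)        # a nodule patch from an HRCT slice
bands = haar_dwt(ct_patch)
print(bands.shape)                          # torch.Size([1, 4, 32, 32])
```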

Machine learning and deep learning approaches in MRI for quantifying and staging fatty liver disease: A systematic review.

Elhaie M, Koozari A, Koohi H, Alqurain QT

PubMed paper · Sep 20, 2025
Fatty liver disease, encompassing non-alcoholic fatty liver disease (NAFLD) and alcohol-related liver disease (ALD), affects ∼25% of adults globally. Magnetic resonance imaging (MRI), particularly proton density fat fraction (PDFF), is the non-invasive gold standard for hepatic steatosis quantification, but its clinical use is limited by cost, protocol variability, and analysis time. Machine learning (ML) and deep learning (DL) techniques, including convolutional neural networks (CNNs) and generative adversarial networks (GANs), show promise in enhancing MRI-based quantification and staging. To systematically review the diagnostic accuracy, reproducibility, and clinical utility of ML and DL techniques applied to MRI for quantifying and staging hepatic steatosis in fatty liver disease. This systematic review was registered in PROSPERO (CRD420251121056) and adhered to PRISMA guidelines, searching PubMed, Cochrane Library, Scopus, IEEE Xplore, Web of Science, Google Scholar, and grey literature for studies on ML/DL applications in MRI for fatty liver disease. Eligible studies involved human participants with suspected/confirmed NAFLD, NASH, or ALD, using ML/DL (e.g., CNNs, GANs) on MRI data (e.g., PDFF, Dixon MRI). Outcomes included diagnostic accuracy (sensitivity, specificity, area under the curve (AUC)), reproducibility (intraclass correlation coefficient (ICC), Dice), and clinical utility (e.g., treatment planning). Two reviewers screened studies, extracted data, and assessed risk of bias using QUADAS-2. Narrative synthesis and meta-analysis (where feasible) were conducted. From 2550 records, 15 studies (n = 25-1038) were included, using CNNs, GANs, radiomics, and dictionary learning on PDFF, chemical shift-encoded MRI, or Dixon MRI. Diagnostic accuracy was high (AUC 0.85-0.97, r = 0.94-0.99 vs. biopsy/MRS), with reproducibility metrics robust (ICC 0.94-0.99, Dice 0.87-0.94). Efficiency improved significantly (e.g., processing <0.16 s/slice, scan time <1 min). Clinical utility included virtual biopsies, surgical planning, and treatment monitoring. Limitations included small sample sizes, single-center designs, and vendor variability. ML and DL enhance MRI-based hepatic steatosis assessment, offering high accuracy, reproducibility, and efficiency. CNNs excel in segmentation/PDFF quantification, while GANs and radiomics aid free-breathing MRI and NASH staging. Multi-center studies and standardization are needed for clinical integration.
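
Most of the reviewed models ultimately target the proton density fat fraction; a toy voxel-wise computation from separated Dixon fat and water signals (ignoring the confounder corrections, such as T1 bias and T2* decay, that real PDFF pipelines apply) is:

```python
# Minimal sketch: voxel-wise PDFF from separated fat and water images.
import numpy as np

def pdff_map(fat: np.ndarray, water: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Return PDFF in percent; roughly >= 5% is a common threshold for steatosis."""
    return 100.0 * fat / (fat + water + eps)

fat = np.abs(np.random.normal(20, 5, (64, 64)))      # toy fat image
water = np.abs(np.random.normal(80, 10, (64, 64)))   # toy water image
print(f"median PDFF: {np.median(pdff_map(fat, water)):.1f}%")
```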

Radiologist Interaction with AI-Generated Preliminary Reports: A Longitudinal Multi-Reader Study.

Hong EK, Suh CH, Nukala M, Esfahani A, Licaros A, Madan R, Hunsaker A, Hammer M

PubMed paper · Sep 20, 2025
To investigate the integration of multimodal AI-generated reports into radiology workflow over time, focusing on their impact on efficiency, acceptability, and report quality. A multicase, multireader study involved 756 publicly available chest radiographs interpreted by five radiologists using preliminary reports generated by a radiology-specific multimodal AI model, divided into seven sequential batches of 108 radiographs each. Two thoracic radiologists assessed the final reports using RADPEER criteria for agreement and a 5-point Likert scale for quality. Reading times, rate of acceptance without modification, agreement, and quality scores were measured, with statistical analyses evaluating trends across the seven sequential batches. Radiologists' reading times for chest radiographs decreased from 25.8 seconds in Batch 1 to 19.3 seconds in Batch 7 (p < .001). Acceptability increased from 54.6% to 60.2% (p < .001), with normal chest radiographs demonstrating higher acceptance rates (68.9%) than abnormal chest radiographs (52.6%; p < .001). Median agreement and quality scores remained stable for normal chest radiographs but varied significantly for abnormal chest radiographs (all p < .05). The introduction of AI-generated reports improved the efficiency of chest radiograph interpretation, and acceptability increased over time. However, agreement and quality scores showed variability, particularly in abnormal cases, emphasizing the need for oversight in the interpretation of complex chest radiographs.

Comparison of Prostate-Specific Membrane Antigen Positron Emission Tomography and Conventional Imaging Modalities in the Detection of Biochemical Recurrence of Prostate Cancer and Assessment of the Role of Artificial Intelligence: A Systematic Review and Meta-analysis.

Zhang H, Xie C, Huang C, Jiang Z, Tang Q

PubMed paper · Sep 20, 2025
We conducted a systematic review and meta-analysis to assess and compare the diagnostic performance of prostate-specific membrane antigen positron emission tomography (PSMA PET) with conventional imaging modalities in detecting biochemical recurrence of prostate cancer, and to assess the role of artificial intelligence in this context. A comprehensive search of PubMed, Embase, Web of Science, the Cochrane Library, and Scopus was conducted for studies, initially on May 7, 2025, and updated on July 28, 2025. Studies that compared PSMA PET with conventional imaging and assessed artificial intelligence for detecting biochemical recurrence of prostate cancer were considered. The QUADAS-2 technique was employed to evaluate study quality. Diagnostic accuracy and detection rates were aggregated using a bivariate random-effects model. A total of 7637 patients from 67 studies were included. PSMA PET demonstrated significantly higher overall diagnostic accuracy for biochemical recurrence of prostate cancer compared to mpMRI, CT, and AI test sets, with accuracy values of 0.89 vs. 0.71, 0.45, and 0.76 (P<0.01). For local recurrence, mpMRI outperformed PSMA PET and CT (0.93 vs. 0.84 and 0.77, P<0.01). PSMA PET was superior to mpMRI and CT in detecting lymph node metastasis (0.89 vs. 0.79 and 0.72, P<0.05). For bone metastasis, PSMA PET outperformed mpMRI, CT, and bone scan (0.95 vs. 0.85, 0.81, and 0.80, P<0.05). For visceral metastasis, PSMA PET outperformed mpMRI (0.96 vs. 0.89, P=0.23) and CT (0.96 vs. 0.78, P<0.05). Twenty-one studies involving 3113 samples were included to evaluate the performance of artificial intelligence in detecting biochemical recurrence of prostate cancer. The pooled sensitivity, specificity, DOR, and AUC of AI test sets in detecting biochemical recurrence of prostate cancer were 0.77, 0.76, 10.39, and 0.79. Heterogeneity limits the generalizability of our findings. PSMA PET outperformed mpMRI and CT in detecting overall recurrence as well as lymph node, bone, and visceral metastases, while mpMRI was more effective for local recurrence. AI exhibits potential diagnostic efficacy, but despite promising results, heterogeneity and limited validation highlight the need for further research to support routine clinical use.
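
As a quick check on how the pooled AI metrics relate, the diagnostic odds ratio implied by a single sensitivity/specificity pair can be computed directly; pooled estimates from a bivariate random-effects model need not satisfy this identity exactly, so the result only approximates the reported DOR of 10.39:

```python
# Minimal sketch: diagnostic odds ratio from a sensitivity/specificity pair.
def diagnostic_odds_ratio(sens: float, spec: float) -> float:
    return (sens * spec) / ((1 - sens) * (1 - spec))

print(round(diagnostic_odds_ratio(0.77, 0.76), 1))   # ~10.6
```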