PiPViT: Patch-based Visual Interpretable Prototypes for Retinal Image Analysis

Marzieh Oghbaie, Teresa Araújo, Hrvoje Bogunović

arXiv preprint · Jun 12, 2025
Background and Objective: Prototype-based methods improve interpretability by learning fine-grained part-prototypes; however, their visualization in the input pixel space is not always consistent with human-understandable biomarkers. In addition, well-known prototype-based approaches typically learn extremely granular prototypes that are less interpretable in medical imaging, where both the presence and extent of biomarkers and lesions are critical. Methods: To address these challenges, we propose PiPViT (Patch-based Visual Interpretable Prototypes), an inherently interpretable prototypical model for image recognition. Leveraging a vision transformer (ViT), PiPViT captures long-range dependencies among patches to learn robust, human-interpretable prototypes that approximate lesion extent using only image-level labels. Additionally, PiPViT benefits from contrastive learning and multi-resolution input processing, which enable effective localization of biomarkers across scales. Results: We evaluated PiPViT on retinal OCT image classification across four datasets, where it achieved competitive quantitative performance compared to state-of-the-art methods while delivering more meaningful explanations. Moreover, quantitative evaluation on a hold-out test set confirms that the learned prototypes are semantically and clinically relevant. We believe PiPViT can transparently explain its decisions and assist clinicians in understanding diagnostic outcomes. GitHub page: https://github.com/marziehoghbaie/PiPViT
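
The core scoring mechanism the abstract describes (ViT patch embeddings compared against learned prototypes, supervised only by image-level labels) can be illustrated with a minimal PyTorch sketch. Everything below is an illustrative assumption, not the official PiPViT code: the class name `PatchPrototypeHead`, the cosine-similarity scoring, and the max-pooling over patches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchPrototypeHead(nn.Module):
    """Illustrative patch-prototype scorer (not the official PiPViT code).

    Given ViT patch embeddings, computes cosine similarity between each
    patch and a set of learned prototype vectors, then max-pools over
    patches so an image-level label can supervise patch-level evidence.
    """

    def __init__(self, embed_dim: int, num_prototypes: int, num_classes: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, embed_dim))
        # Linear layer mapping prototype activations to class logits.
        self.classifier = nn.Linear(num_prototypes, num_classes, bias=False)

    def forward(self, patch_tokens: torch.Tensor):
        # patch_tokens: (batch, num_patches, embed_dim) from a ViT backbone.
        sims = F.normalize(patch_tokens, dim=-1) @ F.normalize(self.prototypes, dim=-1).T
        # sims: (batch, num_patches, num_prototypes); the per-patch map can be
        # reshaped to the patch grid to visualize a prototype's spatial extent.
        proto_scores, _ = sims.max(dim=1)          # strongest patch per prototype
        return self.classifier(proto_scores), sims

head = PatchPrototypeHead(embed_dim=768, num_prototypes=32, num_classes=4)
logits, sim_maps = head(torch.randn(2, 196, 768))  # e.g. a 14x14 patch grid
print(logits.shape, sim_maps.shape)  # torch.Size([2, 4]) torch.Size([2, 196, 32])
```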

Score-based Generative Diffusion Models to Synthesize Full-dose FDG Brain PET from MRI in Epilepsy Patients

Jiaqi Wu, Jiahong Ouyang, Farshad Moradi, Mohammad Mehdi Khalighi, Greg Zaharchuk

arXiv preprint · Jun 12, 2025
Fluorodeoxyglucose (FDG) PET to evaluate patients with epilepsy is one of the most common applications for simultaneous PET/MRI, given the need to image both brain structure and metabolism, but it is suboptimal due to the radiation dose in this young population. Little work has been done on synthesizing diagnostic-quality PET images from MRI data, or from MRI data with ultralow-dose PET, using advanced generative AI methods such as diffusion models, with attention to clinical evaluations tailored for the epilepsy population. Here we compared the performance of diffusion- and non-diffusion-based deep learning models for the MRI-to-PET image translation task for epilepsy imaging using simultaneous PET/MRI in 52 subjects (40 train/2 validate/10 hold-out test). We tested three different models: two score-based generative diffusion models (SGM-Karras Diffusion [SGM-KD] and SGM-variance preserving [SGM-VP]) and a Transformer-Unet. We report results on standard image processing metrics as well as clinically relevant metrics, including congruency measures (Congruence Index and Congruency Mean Absolute Error) that assess hemispheric metabolic asymmetry, a key part of the clinical analysis of these images. SGM-KD produced the best qualitative and quantitative results when synthesizing PET purely from T1w and T2 FLAIR images, with the lowest mean absolute error in whole-brain specific uptake value ratio (SUVR) and the highest intraclass correlation coefficient. When 1% low-dose PET images are included in the inputs, all models improve significantly and are interchangeable in quantitative performance and visual quality. In summary, SGMs hold great potential for pure MRI-to-PET translation, while all three model types can synthesize full-dose FDG-PET accurately using MRI and ultralow-dose PET.
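
For readers unfamiliar with score-based generative models, the sketch below shows a single Euler-Maruyama step of a reverse-time variance-preserving (VP) SDE conditioned on MRI inputs, which is the general mechanism behind samplers like SGM-VP. The noise schedule, the `score_fn` interface, and the conditioning scheme are illustrative assumptions; the paper's exact SGM-KD/SGM-VP implementations are not described in the abstract.

```python
import torch

def vp_reverse_step(x, t, dt, score_fn, mri_cond, beta_min=0.1, beta_max=20.0):
    """One Euler-Maruyama step of the reverse-time VP-SDE (illustrative).

    x:        current noisy PET estimate, shape (B, C, H, W)
    t:        current diffusion time in (0, 1]
    score_fn: trained network approximating grad_x log p_t(x | MRI)
    mri_cond: conditioning MRI channels (e.g. T1w + T2 FLAIR), injected
              however the score network expects.
    """
    beta_t = beta_min + t * (beta_max - beta_min)       # linear noise schedule
    score = score_fn(x, t, mri_cond)                    # learned score estimate
    drift = -0.5 * beta_t * x - beta_t * score          # reverse-SDE drift
    noise = torch.randn_like(x)
    # dt is negative when integrating from t=1 down toward t=0.
    return x + drift * dt + (beta_t * abs(dt)) ** 0.5 * noise

# Toy usage with a dummy score network standing in for the trained model.
dummy_score = lambda x, t, cond: -x                     # placeholder only
x = torch.randn(1, 1, 64, 64)                           # start from pure noise
cond = torch.randn(1, 2, 64, 64)                        # two MRI contrasts
for i in range(1000, 0, -1):
    x = vp_reverse_step(x, i / 1000, dt=-1.0 / 1000,
                        score_fn=dummy_score, mri_cond=cond)
```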

AI-based identification of patients who benefit from revascularization: a multicenter study

Zhang, W., Miller, R. J., Patel, K., Shanbhag, A., Liang, J., Lemley, M., Ramirez, G., Builoff, V., Yi, J., Zhou, J., Kavanagh, P., Acampa, W., Bateman, T. M., Di Carli, M. F., Dorbala, S., Einstein, A. J., Fish, M. B., Hauser, M. T., Ruddy, T., Kaufmann, P. A., Miller, E. J., Sharir, T., Martins, M., Halcox, J., Chareonthaitawee, P., Dey, D., Berman, D., Slomka, P.

medRxiv preprint · Jun 12, 2025
Background and Aims: Revascularization in stable coronary artery disease often relies on ischemia severity, but we introduce an AI-driven approach that uses clinical and imaging data to estimate individualized treatment effects and guide personalized decisions. Methods: Using a large, international registry from 13 centers, we developed an AI model to estimate individual treatment effects by simulating outcomes under alternative therapeutic strategies. The model was trained on an internal cohort constructed using 1:1 propensity score matching to emulate randomized controlled trials (RCTs), creating balanced patient pairs in which only the treatment strategy -- early revascularization (defined as any procedure within 90 days of MPI) versus medical therapy -- differed. This design allowed the model to estimate individualized treatment effects, forming the basis for counterfactual reasoning at the patient level. We then derived the AI-REVASC score, which quantifies, for each patient, the potential benefit of early revascularization. The score was validated in the held-out testing cohort using Cox regression. Results: Of 45,252 patients, 19,935 (44.1%) were female; median age was 65 (IQR: 57-73). During a median follow-up of 3.6 years (IQR: 2.7-4.9), 4,323 (9.6%) experienced MI or death. The AI model identified a group (n=1,335, 5.9%) that benefits from early revascularization, with a propensity-adjusted hazard ratio of 0.50 (95% CI: 0.25-1.00). Patients identified for early revascularization had a higher prevalence of hypertension, diabetes, and dyslipidemia, and lower LVEF. Conclusions: This study pioneers a scalable, data-driven approach that emulates randomized trials using retrospective data. The AI-REVASC score enables precision revascularization decisions where guidelines and RCTs fall short. (Graphical abstract available in the original preprint.)
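
The counterfactual framing above (estimating each patient's outcome under both strategies and scoring the difference) maps onto standard individualized-treatment-effect estimators. The sketch below uses a simple T-learner on synthetic data to illustrate the idea only; it is not the AI-REVASC model, whose architecture and feature set are not specified in the abstract.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative T-learner for individualized treatment effects (ITE); the
# actual AI-REVASC model is not described at this level in the abstract.
rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.normal(size=(n, p))            # clinical + imaging features
treat = rng.integers(0, 2, size=n)     # 1 = early revascularization
# Synthetic outcomes: treatment helps only when feature 0 is high.
risk = 1 / (1 + np.exp(-(X[:, 0] - 0.8 * treat * (X[:, 0] > 0))))
y = (rng.uniform(size=n) < risk).astype(int)   # 1 = MI or death

# Fit a separate outcome model per arm (the "T" in T-learner).
m_treated = GradientBoostingClassifier().fit(X[treat == 1], y[treat == 1])
m_control = GradientBoostingClassifier().fit(X[treat == 0], y[treat == 0])

# ITE = predicted risk under medical therapy minus risk under early
# revascularization; a large positive value flags likely benefit, which is
# the role an "AI-REVASC"-style score plays.
ite = m_control.predict_proba(X)[:, 1] - m_treated.predict_proba(X)[:, 1]
benefit_group = ite > np.quantile(ite, 0.94)   # e.g. top ~6% of patients
print(f"flagged {benefit_group.sum()} of {n} patients as likely to benefit")
```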

A machine learning approach for personalized breast radiation dosimetry in CT: Integrating radiomics and deep neural networks.

Tzanis E, Stratakis J, Damilakis J

PubMed paper · Jun 11, 2025
To develop a machine learning-based workflow for patient-specific breast radiation dosimetry in CT. Two hundred eighty-six chest CT examinations, with corresponding right and left breast contours, were retrospectively collected from the radiotherapy department at our institution to develop and validate breast segmentation U-Nets. Additionally, Monte Carlo simulations were performed for each CT scan to determine radiation doses to the breasts. The derived breast doses, along with predictors such as X-ray tube current and radiomic features, were then used to train deep neural networks (DNNs) for breast dose prediction. The breast segmentation models achieved a mean Dice similarity coefficient of 0.92, with precision and sensitivity scores above 0.90 for both breasts, indicating high segmentation accuracy. The DNNs demonstrated close alignment with ground truth values, with mean predicted doses of 5.05 ± 0.50 mGy for the right breast and 5.06 ± 0.55 mGy for the left breast, compared to ground truth values of 5.03 ± 0.57 mGy and 5.02 ± 0.61 mGy, respectively. The mean absolute percentage errors were 4.01% (range: 3.90%-4.12%) for the right breast and 4.82% (range: 4.56%-5.11%) for the left breast. The mean inference time was 30.2 ± 4.3 s. Statistical analysis showed no significant differences between predicted and actual doses (p ≥ 0.07). This study presents an automated, machine learning-based workflow for breast radiation dosimetry in CT, integrating segmentation and dose prediction models. The models and code are available at: https://github.com/eltzanis/ML-based-Breast-Radiation-Dosimetry-in-CT.
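
The dose-prediction stage described above (DNNs regressing Monte Carlo reference doses from tube current and radiomic features) could look roughly like the following PyTorch sketch. The layer sizes, feature count, and training snippet are assumptions for illustration; the authors' actual models live in the linked repository.

```python
import torch
import torch.nn as nn

# Illustrative dose-regression network; the paper's exact architecture and
# feature set are in its repository, so the sizes here are assumptions.
class BreastDoseDNN(nn.Module):
    def __init__(self, n_features: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),            # predicted breast dose in mGy
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

model = BreastDoseDNN(n_features=32)
# x = [tube current, other scan parameters, radiomic features from the
#      segmented breast], standardized; y = Monte Carlo reference dose.
x = torch.randn(16, 32)
y = torch.rand(16) * 2 + 4              # synthetic doses around 4-6 mGy
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
print(f"MSE on synthetic batch: {loss.item():.3f}")
```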

Efficacy of a large language model in classifying branch-duct intraductal papillary mucinous neoplasms.

Sato M, Yasaka K, Abe S, Kurashima J, Asari Y, Kiryu S, Abe O

PubMed paper · Jun 11, 2025
Appropriate categorization based on magnetic resonance imaging (MRI) findings is important for managing intraductal papillary mucinous neoplasms (IPMNs). In this study, a large language model (LLM) that classifies IPMNs based on MRI findings was developed, and its performance was compared with that of less experienced human readers. The medical image management and processing systems of our hospital were searched to identify MRI reports of branch-duct IPMNs (BD-IPMNs). These were assigned to the training, validation, and testing datasets in chronological order. The model was trained on the training dataset, and the best-performing model on the validation dataset was evaluated on the test dataset. Furthermore, two radiology residents (Readers 1 and 2) and an intern (Reader 3) manually sorted the reports in the test dataset. Accuracy, sensitivity, and the time required for categorization were compared between the model and the readers. The accuracy of the fine-tuned LLM on the test dataset was 0.966, which was comparable to that of Readers 1 and 2 (0.931-0.972) and significantly better than that of Reader 3 (0.907). The fine-tuned LLM had an area under the receiver operating characteristic curve of 0.982 for the classification of cyst diameter ≥ 10 mm, which was significantly superior to that of Reader 3 (0.944). Furthermore, the fine-tuned LLM (25 s) completed the test dataset faster than the readers (1,887-2,646 s). The fine-tuned LLM classified BD-IPMNs based on MRI findings with performance comparable to that of radiology residents while significantly reducing the time required.
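
The abstract does not name the underlying model or framework, so the following is only a generic sketch of report classification with a fine-tunable transformer via Hugging Face Transformers; the checkpoint name and label set are placeholders, not the study's configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Generic report-classification setup; the paper's model and labels are not
# stated in the abstract, so everything named below is a placeholder.
MODEL = "bert-base-multilingual-cased"            # stand-in checkpoint
LABELS = ["routine_follow_up", "further_workup"]  # illustrative categories

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=len(LABELS))

report = "Branch-duct IPMN in the pancreatic head, cyst diameter 12 mm."
batch = tok(report, truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
# Untrained head: output is arbitrary until the model is fine-tuned on
# labeled reports (e.g. with the Trainer API).
print(LABELS[logits.argmax(-1).item()])
```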

Identification of Atypical Scoliosis Patterns Using X-ray Images Based on Fine-Grained Techniques in Deep Learning.

Chen Y, He Z, Yang KG, Qin X, Lau AY, Liu Z, Lu N, Cheng JC, Lee WY, Chui EC, Qiu Y, Liu X, Chen X, Zhu Z

PubMed paper · Jun 11, 2025
Study Design: Retrospective diagnostic study. Objectives: To develop a fine-grained classification model based on deep learning using X-ray images, to screen for scoliosis, and further to screen for atypical scoliosis patterns associated with Chiari malformation type I (CMS). Methods: A total of 508 pairs of coronal and sagittal X-ray images from patients with CMS, adolescent idiopathic scoliosis (AIS), and normal controls (NC) were processed through construction of the ResNet-50 model, including the development of ResNet-50 Coronal, ResNet-50 Sagittal, ResNet-50 Dual, ResNet-50 Concat, and ResNet-50 Bilinear models. Evaluation metrics calculated included accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for both the scoliosis diagnosis system and the CMS diagnosis system, along with the generation of receiver operating characteristic (ROC) curves and heatmaps for CMS diagnosis. Results: The classification results for the scoliosis diagnosis system showed that the ResNet-50 Coronal model had the best overall performance. For the CMS diagnosis system, the ResNet-50 Coronal and ResNet-50 Dual models demonstrated optimal performance. Specifically, the ResNet-50 Dual model reached the diagnostic level of senior spine surgeons, and the ResNet-50 Coronal model even surpassed senior surgeons in specificity and PPV. The CMS heatmaps revealed that major classification weights were concentrated on features such as atypical curve types, significant lateral shift of scoliotic segments, longer affected segments, and severe trunk tilt. Conclusions: The fine-grained classification model based on the ResNet-50 network can accurately screen for atypical scoliosis patterns associated with CMS, highlighting the importance of radiographic features such as atypical curve types in model classification.
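
Of the model variants listed, ResNet-50 Bilinear is the least self-explanatory; a common reading is bilinear (outer-product) pooling of two feature streams, sketched below for the coronal/sagittal pair. The fusion details are an assumption based on standard bilinear-CNN practice, not the paper's published code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class BilinearTwoView(nn.Module):
    """Illustrative bilinear fusion of coronal + sagittal ResNet-50 features
    (the paper's exact 'ResNet-50 Bilinear' variant may differ)."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Two backbones truncated before global pooling -> (B, 2048, H', W').
        self.cor = nn.Sequential(*list(resnet50(weights=None).children())[:-2])
        self.sag = nn.Sequential(*list(resnet50(weights=None).children())[:-2])
        self.fc = nn.Linear(2048 * 2048, num_classes)

    def forward(self, x_cor, x_sag):
        a = self.cor(x_cor).flatten(2)               # (B, 2048, N)
        b = self.sag(x_sag).flatten(2)               # (B, 2048, N)
        # Bilinear (outer-product) pooling averaged over spatial positions.
        bl = torch.einsum("bcn,bdn->bcd", a, b) / a.shape[-1]
        bl = bl.flatten(1)
        # Signed sqrt + L2 normalization, standard for bilinear CNNs.
        bl = torch.sign(bl) * torch.sqrt(bl.abs() + 1e-8)
        bl = nn.functional.normalize(bl)
        return self.fc(bl)

model = BilinearTwoView(num_classes=3)  # CMS vs AIS vs normal control
logits = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 3])
```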

Diagnostic accuracy of machine learning-based magnetic resonance imaging models in breast cancer classification: a systematic review and meta-analysis.

Zhang J, Wu Q, Lei P, Zhu X, Li B

PubMed paper · Jun 11, 2025
This meta-analysis evaluates the diagnostic accuracy of machine learning (ML)-based magnetic resonance imaging (MRI) models in distinguishing benign from malignant breast lesions and explores factors influencing their performance. A systematic search of PubMed, Embase, Cochrane Library, Scopus, and Web of Science identified 12 eligible studies (from 3,739 records) up to August 2024. Data were extracted to calculate sensitivity, specificity, and area under the curve (AUC) using bivariate models in R 4.4.1. Study quality was assessed via QUADAS-2. Pooled sensitivity and specificity were 0.86 (95% CI: 0.82-0.90) and 0.82 (95% CI: 0.78-0.86), respectively, with an overall AUC of 0.90 (95% CI: 0.85-0.90). Diagnostic odds ratio (DOR) was 39.11 (95% CI: 25.04-53.17). Support vector machine (SVM) classifiers outperformed Naive Bayes, with higher sensitivity (0.88 vs. 0.86) and specificity (0.82 vs. 0.78). Heterogeneity was primarily attributed to MRI equipment (P = 0.037). ML-based MRI models demonstrate high diagnostic accuracy for breast cancer classification, with pooled sensitivity of 0.86 (95% CI: 0.82-0.90), specificity of 0.82 (95% CI: 0.78-0.86), and AUC of 0.90 (95% CI: 0.85-0.90). These results support their clinical utility as screening and diagnostic adjuncts, while highlighting the need for standardized protocols to improve generalizability.
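
As a quick sanity check on the reported numbers: the definitional diagnostic odds ratio computed from the pooled sensitivity/specificity pair differs from the pooled DOR of 39.11, which comes from a bivariate random-effects model across studies rather than a single 2x2 table. A one-liner makes the definitional relationship concrete:

```python
# Definitional diagnostic odds ratio from one sensitivity/specificity pair.
# Note: the review's pooled DOR (39.11) comes from a bivariate random-effects
# model across studies, so it is NOT expected to match this naive calculation.
sens, spec = 0.86, 0.82
dor = (sens / (1 - sens)) / ((1 - spec) / spec)
print(f"DOR = {dor:.1f}")   # ~28.0 for this single pair
```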

Non-enhanced CT deep learning model for differentiating lung adenocarcinoma from tuberculoma: a multicenter diagnostic study.

Zhang G, Shang L, Li S, Zhang J, Zhang Z, Zhang X, Qian R, Yang K, Li X, Liu Y, Wu Y, Pu H, Cao Y, Man Q, Kong W

PubMed paper · Jun 11, 2025
To develop and validate a deep learning model based on three-dimensional features (DL_3D) for distinguishing lung adenocarcinoma (LUAD) from tuberculoma (TBM). A total of 1160 patients were collected from three hospitals. A vision transformer network-based DL_3D model was trained, and its performance in differentiating LUAD from TBM was evaluated using validation and external test sets. The performance of the DL_3D model was compared with that of two-dimensional features (DL_2D), radiomics, and six radiologists. Diagnostic performance was assessed using area under the receiver operating characteristic curve (AUC) analysis. The study included 840 patients in the training set (mean age, 54.8 years [range, 19-86 years]; 514 men), 210 patients in the validation set (mean age, 54.3 years [range, 18-86 years]; 128 men), and 110 patients in the external test set (mean age, 54.7 years [range, 22-88 years]; 51 men). In both the validation and external test sets, DL_3D exhibited excellent diagnostic performance (AUCs, 0.895 and 0.913, respectively). In the test set, the DL_3D model showed better performance (AUC, 0.913; 95% CI: 0.854, 0.973) than DL_2D (AUC, 0.804; 95% CI: 0.722, 0.886; p < 0.001), radiomics (AUC, 0.676; 95% CI: 0.574, 0.777; p < 0.001), and the six radiologists (AUCs, 0.692 to 0.810; p value range, < 0.001-0.035). The DL_3D model outperforms expert radiologists in distinguishing LUAD from TBM. Question: Can a deep learning model differentiate LUAD from TBM on non-enhanced CT images? Findings: The DL_3D model demonstrated higher diagnostic performance than the DL_2D model, the radiomics model, and six radiologists in differentiating LUAD from TBM. Clinical relevance: The DL_3D model could accurately differentiate between LUAD and TBM, which can help clinicians make personalized treatment plans.
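
The abstract does not detail how the DL_3D model tokenizes CT volumes, but a standard way for a vision transformer to consume three-dimensional input is a Conv3d patch embedding, sketched below; the patch size and embedding dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative 3D patch embedding for a volumetric ViT (the DL_3D model's
# exact tokenizer is not specified in the abstract).
class PatchEmbed3D(nn.Module):
    def __init__(self, patch=16, in_ch=1, dim=768):
        super().__init__()
        # Non-overlapping 3D patches -> one token per patch.
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):            # x: (B, 1, D, H, W) CT volume
        t = self.proj(x)             # (B, dim, D/16, H/16, W/16)
        return t.flatten(2).transpose(1, 2)   # (B, num_tokens, dim)

tokens = PatchEmbed3D()(torch.randn(1, 1, 64, 224, 224))
print(tokens.shape)  # torch.Size([1, 784, 768]): 4*14*14 patches
```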

A fully open AI foundation model applied to chest radiography.

Ma D, Pang J, Gotway MB, Liang J

PubMed paper · Jun 11, 2025
Chest radiography frequently serves as baseline imaging for most lung diseases [1]. Deep learning has great potential for automating the interpretation of chest radiography [2]. However, existing chest radiographic deep learning models are limited in diagnostic scope, generalizability, adaptability, robustness and extensibility. To overcome these limitations, we have developed Ark+, a foundation model applied to chest radiography and pretrained by cyclically accruing and reusing the knowledge from heterogeneous expert labels in numerous datasets. Ark+ excels in diagnosing thoracic diseases. It expands the diagnostic scope and addresses potential misdiagnosis. It can adapt to evolving diagnostic needs and respond to novel diseases. It can learn rare conditions from a few samples and transfer to new diagnostic settings without training. It tolerates data biases and long-tailed distributions, and it supports federated learning to preserve privacy. All codes and pretrained models have been released, so that Ark+ is open for fine-tuning, local adaptation and improvement. It is extensible to several modalities. Thus, it is a foundation model for medical imaging. The exceptional capabilities of Ark+ stem from our insight: aggregating various datasets diversifies the patient populations and accrues knowledge from many experts to yield unprecedented performance while reducing annotation costs [3]. The development of Ark+ reveals that open models trained by accruing and reusing knowledge from heterogeneous expert annotations with a multitude of public (big or small) datasets can surpass the performance of proprietary models trained on large data. We hope that our findings will inspire more researchers to share code and datasets or federate privacy-preserving data to create open foundation models with diverse, global expertise and patient populations, thus accelerating open science and democratizing AI for medicine.

Enhancing Pulmonary Disease Prediction Using Large Language Models With Feature Summarization and Hybrid Retrieval-Augmented Generation: Multicenter Methodological Study Based on Radiology Report.

Li R, Mao S, Zhu C, Yang Y, Tan C, Li L, Mu X, Liu H, Yang Y

PubMed paper · Jun 11, 2025
The rapid advancements in natural language processing, particularly the development of large language models (LLMs), have opened new avenues for managing complex clinical text data. However, the inherent complexity and specificity of medical texts present significant challenges for the practical application of prompt engineering in diagnostic tasks. This paper explores LLMs with new prompt engineering technology to enhance model interpretability and improve pulmonary disease prediction performance relative to a traditional deep learning model. A retrospective dataset including 2965 chest CT radiology reports was constructed. The reports came from four cohorts: healthy individuals and patients with pulmonary tuberculosis, lung cancer, and pneumonia. Then, a novel prompt engineering strategy that integrates feature summarization (F-Sum), chain-of-thought (CoT) reasoning, and a hybrid retrieval-augmented generation (RAG) framework was proposed. A feature summarization approach, leveraging term frequency-inverse document frequency (TF-IDF) and K-means clustering, was used to extract and distill key radiological findings related to the three diseases. Simultaneously, the hybrid RAG framework combined dense and sparse vector representations to enhance the LLMs' comprehension of disease-related text. In total, three state-of-the-art LLMs, GLM-4-Plus, GLM-4-Air (Zhipu AI), and GPT-4o (OpenAI), were integrated with the prompt strategy to evaluate their efficiency in recognizing pneumonia, tuberculosis, and lung cancer. A traditional deep learning model, BERT (Bidirectional Encoder Representations from Transformers), was also compared to assess the superiority of LLMs. Finally, the proposed method was tested on an external validation dataset consisting of 343 chest computed tomography (CT) reports from another hospital. Compared with the BERT-based prediction model and various other prompt engineering techniques, our method with GLM-4-Plus achieved the best performance on the test dataset, attaining an F1-score of 0.89 and accuracy of 0.89. On the external validation dataset, the F1-score (0.86) and accuracy (0.92) of the proposed method with GPT-4o were the highest. Compared to the popular strategy with manually selected typical samples (few-shot) and CoT designed by doctors (F1-score=0.83 and accuracy=0.83), the proposed method, which summarized disease characteristics (F-Sum) based on the LLM and automatically generated CoT, performed better (F1-score=0.89 and accuracy=0.90). Although the BERT-based model achieved similar results on the test dataset (F1-score=0.85 and accuracy=0.88), its predictive performance decreased significantly on the external validation set (F1-score=0.48 and accuracy=0.78). These findings highlight the potential of LLMs to revolutionize pulmonary disease prediction, particularly in resource-constrained settings, by surpassing traditional models in both accuracy and flexibility. The proposed prompt engineering strategy not only improves predictive performance but also enhances the adaptability of LLMs in complex medical contexts, offering a promising tool for advancing disease diagnosis and clinical decision-making.
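
The F-Sum component is concrete enough to sketch: TF-IDF vectors of report findings are clustered with K-means, and the top-weighted terms per cluster serve as distilled disease features. The toy corpus, cluster count, and term selection below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy sketch of TF-IDF + K-means feature summarization (F-Sum); the real
# pipeline runs on thousands of CT report findings, not four sentences.
reports = [
    "cavitary lesion in the right upper lobe with tree-in-bud nodules",
    "spiculated mass in the left lower lobe with pleural retraction",
    "patchy consolidation with air bronchograms in both lower lobes",
    "miliary nodules distributed throughout both lungs",
]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(reports)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = np.array(vec.get_feature_names_out())
for c in range(km.n_clusters):
    top = np.argsort(km.cluster_centers_[c])[::-1][:4]
    print(f"cluster {c}: {', '.join(terms[top])}")  # distilled key findings
```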