CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation

Jiong Wu, Yang Xing, Boxiao Yu, Wei Shao, Kuang Gong

arXiv preprint · May 25, 2025
Most publicly available medical segmentation datasets are only partially labeled, with annotations provided for a subset of anatomical structures. When multiple datasets are combined for training, this incomplete annotation poses challenges, as it limits the model's ability to learn shared anatomical representations across datasets. Furthermore, vision-only frameworks often fail to capture complex anatomical relationships and task-specific distinctions, leading to reduced segmentation accuracy and poor generalizability to unseen datasets. In this study, we proposed a novel CLIP-DINO Prompt-Driven Segmentation Network (CDPDNet), which combined a self-supervised vision transformer with CLIP-based text embeddings and introduced task-specific text prompts to tackle these challenges. Specifically, the framework was constructed upon a convolutional neural network (CNN) and incorporated DINOv2 to extract both fine-grained and global visual features, which were then fused using a multi-head cross-attention module to overcome the limited long-range modeling capability of CNNs. In addition, CLIP-derived text embeddings were projected into the visual space to help model complex relationships among organs and tumors. To further address the partial-label challenge and enhance inter-task discriminative capability, a Text-based Task Prompt Generation (TTPG) module that generates task-specific prompts was designed to guide the segmentation. Extensive experiments on multiple medical imaging datasets demonstrated that CDPDNet consistently outperformed existing state-of-the-art segmentation methods. Code and pretrained models are available at: https://github.com/wujiong-hub/CDPDNet.git.
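
A minimal sketch (not the authors' released code) of the fusion mechanism described above: flattened CNN feature maps act as queries in multi-head cross-attention over DINOv2 patch tokens, with a CLIP text embedding projected into the same visual space and appended to the keys/values. All dimensions and wiring are illustrative assumptions; see the repository linked above for the actual implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Hypothetical fusion block: CNN queries attend to DINOv2 + CLIP-text keys."""
    def __init__(self, cnn_dim=256, dino_dim=768, clip_dim=512, num_heads=8):
        super().__init__()
        self.dino_proj = nn.Linear(dino_dim, cnn_dim)   # align DINOv2 tokens to CNN channels
        self.text_proj = nn.Linear(clip_dim, cnn_dim)   # project CLIP text embedding into visual space
        self.attn = nn.MultiheadAttention(cnn_dim, num_heads, batch_first=True)

    def forward(self, cnn_feats, dino_tokens, text_emb):
        # cnn_feats: (B, C, H, W); dino_tokens: (B, N, dino_dim); text_emb: (B, clip_dim)
        B, C, H, W = cnn_feats.shape
        queries = cnn_feats.flatten(2).transpose(1, 2)              # (B, HW, C)
        keys = self.dino_proj(dino_tokens)                          # (B, N, C)
        keys = torch.cat([keys, self.text_proj(text_emb).unsqueeze(1)], dim=1)
        fused, _ = self.attn(queries, keys, keys)                   # cross-attention
        return (queries + fused).transpose(1, 2).reshape(B, C, H, W)
```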

Integrating large language models into the radiology workflow: Impact of generating personalized report templates from summaries.

Gupta A, Hussain M, Nikhileshwar K, Rastogi A, Rangarajan K

PubMed paper · May 25, 2025
To evaluate the feasibility of using large language models (LLMs) to convert radiologist-generated report summaries into personalized report templates, and to assess the impact on scan reporting time and quality. In this retrospective study, 100 CT scans from oncology patients were randomly divided into two equal sets. Two radiologists generated conventional reports for one set and summary reports for the other, and vice versa. Three LLMs (GPT-4, Google Gemini, and Claude Opus) generated complete reports from the summaries using institution-specific generic templates. Two expert radiologists qualitatively evaluated the radiologist summaries and the LLM-generated reports with the ACR RADPEER scoring system, using the conventional radiologist reports as the reference. Reporting time for conventional versus summary-based reports was compared, and LLM-generated reports were analyzed for errors. Quantitative similarity and linguistic metrics were computed to assess how closely each model's reports aligned with the original radiologist-generated summaries. Statistical analyses were performed using Python 3.0 to identify significant differences in reporting times, error rates, and quantitative metrics. The average reporting time was significantly shorter for the summary method (6.76 min) than for the conventional method (8.95 min) (p < 0.005). Among the 100 radiologist summaries, 10 received RADPEER scores worse than 1, three of which were deemed to have clinically significant discrepancies. Only one LLM-generated report received a worse RADPEER score than its corresponding summary. Error frequencies among LLM-generated reports showed no significant differences across models, with template-related errors being the most common (χ² = 1.146, p = 0.564). Quantitative analysis indicated significant differences in similarity and linguistic metrics among the three LLMs (p < 0.05), reflecting distinct generation patterns. Summary-based scan reporting combined with LLM-generated personalized report templates can shorten reporting time while maintaining report quality, although human oversight remains necessary to catch errors in the generated reports. This approach has the potential to make the radiology workflow more efficient.
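
As a rough illustration (not the study's code) of the comparisons reported above, the sketch below applies a two-sample t-test to per-scan reporting times and a chi-squared test to a models-by-error-type contingency table. The paper does not specify its tests beyond the χ² statistic, and all values below are placeholders, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-scan reporting times (minutes); study means: 8.95 vs 6.76.
conventional = rng.normal(8.95, 1.5, 50)
summary = rng.normal(6.76, 1.2, 50)
t_stat, p_time = stats.ttest_ind(conventional, summary)

# Placeholder error counts: rows = GPT-4, Gemini, Claude Opus; cols = error types.
errors = np.array([[12, 5, 3],
                   [14, 4, 2],
                   [11, 6, 4]])
chi2, p_err, dof, _ = stats.chi2_contingency(errors)
print(f"time p={p_time:.4f}; error chi2={chi2:.3f}, p={p_err:.3f}")
```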

Evaluation of synthetic training data for 3D intraoral reconstruction of cleft patients from single images.

Lingens L, Lill Y, Nalabothu P, Benitez BK, Mueller AA, Gross M, Solenthaler B

PubMed paper · May 24, 2025
This study investigates the effectiveness of synthetic training data in predicting 2D landmarks for 3D intraoral reconstruction in cleft lip and palate patients. We take inspiration from existing landmark prediction and 3D reconstruction techniques for faces and demonstrate their potential in medical applications. We generated both real and synthetic datasets from intraoral scans and videos. A convolutional neural network was trained using a negative Gaussian log-likelihood loss function to predict 2D landmarks and their corresponding confidence scores. The predicted landmarks were then used to fit a statistical shape model to generate 3D reconstructions from individual images. We analyzed the model's performance on real patient data and explored the dataset size required to overcome the domain gap between synthetic and real images. Our approach generates satisfactory results on synthetic data and shows promise when tested on real data. The method achieves rapid 3D reconstruction from single images and can therefore provide significant value in day-to-day medical work. Our results demonstrate that synthetic training data are viable for training models to predict 2D landmarks and reconstruct 3D meshes in patients with cleft lip and palate. This approach offers an accessible, low-cost alternative to traditional methods, using smartphone technology for noninvasive, rapid, and accurate 3D reconstructions in clinical settings.
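
A minimal sketch of a negative Gaussian log-likelihood loss for 2D landmark regression with learned per-landmark confidence, as described above. It assumes the network outputs a mean (x, y) and a log-variance per landmark; this is an illustrative form, not the authors' implementation.

```python
import torch

def landmark_nll(pred_xy, pred_logvar, target_xy):
    """NLL of an isotropic 2D Gaussian per landmark (constants dropped).

    pred_xy, target_xy: (B, K, 2); pred_logvar: (B, K, 1), shared across x/y.
    Low predicted variance on a wrong landmark is penalized heavily, so the
    network learns calibrated confidence scores alongside the coordinates.
    """
    var = torch.exp(pred_logvar)
    sq_err = (pred_xy - target_xy).pow(2).sum(dim=-1, keepdim=True)
    return (0.5 * (sq_err / var) + pred_logvar).mean()
```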

Deep learning reconstruction combined with contrast-enhancement boost in dual-low dose CT pulmonary angiography: a two-center prospective trial.

Shen L, Lu J, Zhou C, Bi Z, Ye X, Zhao Z, Xu M, Zeng M, Wang M

PubMed paper · May 24, 2025
To investigate whether deep learning reconstruction (DLR) combined with the contrast-enhancement-boost (CE-boost) technique can improve the diagnostic quality of CT pulmonary angiography (CTPA) at low radiation and contrast doses, compared with routine CTPA using hybrid iterative reconstruction (HIR). This prospective two-center study included 130 patients who underwent CTPA for suspected pulmonary embolism (PE). Patients were randomly divided into two groups: the routine CTPA group, reconstructed using HIR; and the dual-low-dose CTPA group, reconstructed using HIR and DLR, each additionally combined with CE-boost to generate HIR-boost and DLR-boost images. Signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) of the pulmonary arteries were quantitatively assessed. Two experienced radiologists independently ranked the CT images on a five-point scale (5, best; 1, worst) based on overall image noise and vascular contrast. Diagnostic performance for PE detection was calculated for each dataset. Patient demographics were similar between groups. Compared with HIR images in the routine group, DLR-boost images in the dual-low-dose group received significantly better qualitative scores (p < 0.001). CT values of the pulmonary arteries were comparable between the DLR-boost and HIR images (p > 0.05), whereas the SNRs and CNRs of the pulmonary arteries in the DLR-boost images were the highest among all five datasets (p < 0.001). The AUCs of DLR, HIR-boost, and DLR-boost were 0.933, 0.924, and 0.986, respectively (all p > 0.05). DLR combined with the CE-boost technique can significantly improve the image quality of CTPA at reduced radiation and contrast doses, facilitating more accurate diagnosis of pulmonary embolism.
Question: A dual-low-dose protocol is desirable for detecting pulmonary emboli in follow-up CTPA, yet effective solutions are still lacking.
Findings: DLR-boost at reduced radiation and contrast doses demonstrated higher quantitative and qualitative image quality than hybrid iterative reconstruction in routine CTPA.
Clinical relevance: A DLR-boost-based low-radiation, low-contrast-dose CTPA protocol offers a novel strategy to further enhance image quality and diagnostic accuracy for pulmonary embolism patients.
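
For reference, the SNR and CNR figures of merit used above follow the standard ROI-based definitions; the sketch below computes them from hypothetical ROI statistics (the vessel and background mean/SD values are illustrative, not the study's measurements).

```python
def snr(roi_mean, roi_sd):
    """Signal-to-noise ratio of a vessel ROI: mean attenuation over noise (SD)."""
    return roi_mean / roi_sd

def cnr(vessel_mean, background_mean, background_sd):
    """Contrast-to-noise ratio of vessel vs. background (e.g., paraspinal muscle)."""
    return (vessel_mean - background_mean) / background_sd

# Hypothetical ROI statistics in HU: 420 +/- 18 (artery), 55 +/- 15 (muscle).
print(snr(420, 18), cnr(420, 55, 15))
```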

Evaluation of locoregional invasiveness of early lung adenocarcinoma manifesting as ground-glass nodules via [⁶⁸Ga]Ga-FAPI-46 PET/CT imaging.

Ruan D, Shi S, Guo W, Pang Y, Yu L, Cai J, Wu Z, Wu H, Sun L, Zhao L, Chen H

PubMed paper · May 24, 2025
Accurate differentiation of the histologic invasiveness of early-stage lung adenocarcinoma is crucial for determining surgical strategies. This study aimed to investigate the potential of [⁶⁸Ga]Ga-FAPI-46 PET/CT in assessing the invasiveness of early lung adenocarcinoma presenting as ground-glass nodules (GGNs) and to identify imaging features with strong predictive potential. This prospective study (NCT04588064) was conducted between July 2020 and July 2022, focusing on GGNs that were confirmed postoperatively to be either invasive adenocarcinoma (IAC), minimally invasive adenocarcinoma (MIA), or precursor glandular lesions (PGL). A total of 45 patients with 53 pulmonary GGNs were included: 19 GGNs associated with PGL or MIA (PGL-MIA) and 34 with IAC. Lung nodules were segmented using the Segment Anything Model in Medical Images (MedSAM) and the PET Tumor Segmentation Extension. Clinical characteristics, along with conventional and high-throughput radiomics features from high-resolution CT (HRCT) and PET scans, were analysed. The predictive performance of these features in differentiating PGL-MIA from IAC was assessed using 5-fold cross-validation across six machine learning algorithms. Model validation was performed on an independent external test set (n = 11). The Chi-squared, Fisher's exact, and DeLong tests were employed to compare the performance of the models. The maximum standardised uptake value (SUVmax) derived from [⁶⁸Ga]Ga-FAPI-46 PET was identified as an independent predictor of IAC. A cut-off value of 1.82 yielded a sensitivity of 94% (32/34), specificity of 84% (16/19), and an overall accuracy of 91% (48/53) in the training set, while achieving 100% (12/12) accuracy in the external test set. Radiomics-based classification further improved diagnostic performance, achieving a sensitivity of 97% (33/34), specificity of 89% (17/19), accuracy of 94% (50/53), and an area under the receiver operating characteristic curve (AUC) of 0.97 [95% CI: 0.93-1.00]. Compared with the CT-based radiomics model and the PET-based model, the combined PET/CT radiomics model did not show significant improvement in predictive performance. The key predictive feature was [⁶⁸Ga]Ga-FAPI-46 PET log-sigma-7-mm-3D_firstorder_RootMeanSquared. The SUVmax derived from [⁶⁸Ga]Ga-FAPI-46 PET/CT can effectively differentiate the invasiveness of early-stage lung adenocarcinoma manifesting as GGNs, and integrating high-throughput features from [⁶⁸Ga]Ga-FAPI-46 PET/CT images can considerably enhance classification accuracy. Trial registration: NCT04588064; URL: https://clinicaltrials.gov/study/NCT04588064.
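
As a sanity check on the reported operating point (cutoff 1.82; sensitivity 32/34, specificity 16/19), the sketch below shows how sensitivity and specificity follow from thresholding SUVmax. The arrays are toy placeholders, not the study's data.

```python
import numpy as np

def sens_spec(suvmax, labels, cutoff=1.82):
    """Sensitivity/specificity of predicting IAC (label 1) when SUVmax >= cutoff."""
    pred = suvmax >= cutoff
    tp = np.sum(pred & (labels == 1)); fn = np.sum(~pred & (labels == 1))
    tn = np.sum(~pred & (labels == 0)); fp = np.sum(pred & (labels == 0))
    return tp / (tp + fn), tn / (tn + fp)

# Toy data only; the study reports 94% sensitivity and 84% specificity.
suv = np.array([0.9, 2.4, 1.5, 3.1])
lab = np.array([0, 1, 0, 1])
print(sens_spec(suv, lab))
```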

Using machine learning models based on cardiac magnetic resonance parameters to predict the prognosis in children with myocarditis.

Hu D, Cui M, Zhang X, Wu Y, Liu Y, Zhai D, Guo W, Ju S, Fan G, Cai W

PubMed paper · May 24, 2025
To develop machine learning (ML) models incorporating explanatory cardiac magnetic resonance (CMR) parameters for predicting the prognosis of myocarditis in pediatric patients. Seventy-seven patients with clinically diagnosed pediatric myocarditis between January 2020 and December 2023 were retrospectively enrolled. All patients were examined by ultrasound, electrocardiography (ECG), and serum biomarkers on admission, and underwent a CMR scan to obtain 16 explanatory CMR parameters. All patients underwent follow-up echocardiography and CMR. Patients were divided into two groups according to the occurrence of adverse cardiac events (ACE) during follow-up: the poor prognosis group (n = 23) and the good prognosis group (n = 54). Four models were established: logistic regression (LR), random forest (RF), support vector machine classifier (SVC), and extreme gradient boosting (XGBoost). The performance of each model was evaluated by the area under the receiver operating characteristic curve (AUC). Model interpretations were generated with SHapley Additive exPlanations (SHAP). Across the four models, the three most important features were late gadolinium enhancement (LGE), left ventricular ejection fraction (LVEF), and short-axis peak global circumferential strain (SAXGCS). In addition, LGE, LVEF, SAXGCS, and long-axis peak global longitudinal strain (LAXGLS) were selected as the key predictors by all four models. Of the models built on these four interpretable CMR parameters, the LR model had the best prediction performance, with an AUC, sensitivity, and specificity of 0.893, 0.820, and 0.944, respectively. The findings indicate that the presence of LGE on CMR imaging, along with reductions in LVEF, SAXGCS, and LAXGLS, is predictive of poor prognosis in patients with acute myocarditis. ML models, particularly the LR model, demonstrate the potential to predict the prognosis of children with myocarditis. These findings provide valuable insights for cardiologists, supporting more informed clinical decision-making and potentially improving outcomes in pediatric myocarditis.
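
A minimal sketch, assuming a 77-patient feature table containing the four key predictors (LGE, LVEF, SAXGCS, LAXGLS): fit a logistic regression and explain it with SHAP, mirroring the workflow above. All data here are synthetic placeholders.

```python
import numpy as np
import shap
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Columns stand in for LGE, LVEF, SAXGCS, LAXGLS (synthetic values).
X = rng.normal(size=(77, 4))
y = (X[:, 0] - X[:, 1] + rng.normal(size=77) > 0).astype(int)  # synthetic ACE labels

model = LogisticRegression().fit(X, y)
explainer = shap.LinearExplainer(model, X)   # exact Shapley values for a linear model
shap_values = explainer.shap_values(X)       # per-patient, per-feature contributions
print(np.abs(shap_values).mean(axis=0))      # global feature importance ranking
```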

Explainable deep learning for age and gender estimation in dental CBCT scans using attention mechanisms and multi-task learning.

Pishghadam N, Esmaeilyfard R, Paknahad M

PubMed paper · May 24, 2025
Accurate and interpretable age estimation and gender classification are essential in forensic and clinical diagnostics, particularly when using high-dimensional medical imaging data such as Cone Beam Computed Tomography (CBCT). Traditional CBCT-based approaches often suffer from high computational costs and limited interpretability, reducing their applicability in forensic investigations. This study aims to develop a multi-task deep learning framework that enhances both accuracy and explainability in CBCT-based age estimation and gender classification using attention mechanisms. We propose a multi-task learning (MTL) model that simultaneously estimates age and classifies gender using panoramic slices extracted from CBCT scans. To improve interpretability, we integrate the Convolutional Block Attention Module (CBAM) and Grad-CAM visualization, highlighting relevant craniofacial regions. The dataset includes 2,426 CBCT images from individuals aged 7 to 23 years, and performance is assessed using mean absolute error (MAE) for age estimation and accuracy for gender classification. The proposed model achieves an MAE of 1.08 years for age estimation and 95.3% accuracy in gender classification, significantly outperforming conventional CBCT-based methods. CBAM enhances the model's ability to focus on clinically relevant anatomical features, while Grad-CAM provides visual explanations, improving interpretability. Additionally, using panoramic slices instead of full 3D CBCT volumes reduces computational costs without sacrificing accuracy. Our framework improves both accuracy and interpretability in forensic age estimation and gender classification from CBCT images. By incorporating explainable AI techniques, this model provides a computationally efficient and clinically interpretable tool for forensic and medical applications.
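
A minimal sketch of the multi-task setup implied above: a shared backbone feature vector feeds two heads, trained jointly with an L1 (MAE-aligned) loss for age and cross-entropy for gender. Head sizes and loss weights are illustrative assumptions; CBAM and Grad-CAM are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHead(nn.Module):
    """Two task heads over a shared backbone feature vector."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.age_head = nn.Linear(feat_dim, 1)      # regression: age in years
        self.gender_head = nn.Linear(feat_dim, 2)   # classification: 2 logits

    def forward(self, feats):
        return self.age_head(feats).squeeze(-1), self.gender_head(feats)

def mtl_loss(age_pred, gender_logits, age_true, gender_true,
             w_age=1.0, w_gender=1.0):
    # L1 loss aligns the training objective with the reported MAE metric.
    age_loss = F.l1_loss(age_pred, age_true)
    gender_loss = F.cross_entropy(gender_logits, gender_true)
    return w_age * age_loss + w_gender * gender_loss
```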

Stroke prediction in elderly patients with atrial fibrillation using machine learning combined clinical and left atrial appendage imaging phenotypic features.

Huang H, Xiong Y, Yao Y, Zeng J

PubMed paper · May 24, 2025
Atrial fibrillation (AF) is one of the primary etiologies of ischemic stroke, so it is important to delineate risk phenotypes among elderly AF patients and to investigate more effective models for predicting stroke risk. This single-center prospective cohort study collected clinical data and cardiac computed tomography angiography (CTA) images from elderly AF patients. Clinical phenotypes and left atrial appendage (LAA) radiomic phenotypes of elderly AF patients were identified through K-means clustering, and the independent associations between these phenotypes and stroke risk were then analyzed. Five machine learning algorithms (logistic regression, naive Bayes, support vector machine (SVM), random forest, and extreme gradient boosting) were used to develop predictive models for stroke risk in this cohort. Models were assessed using the area under the receiver operating characteristic curve (AUROC), Hosmer-Lemeshow tests, and decision curve analysis. A total of 419 elderly AF patients (≥ 65 years old) were included. K-means clustering identified three clinical phenotypes: Group A (cardiac enlargement/dysfunction), Group B (normal phenotype), and Group C (metabolic/coagulation abnormalities). Stroke incidence was highest in Group A (19.3%) and Group C (14.5%) versus Group B (3.3%). Similarly, the LAA radiomic phenotypes revealed elevated stroke risk in patients with enlarged LAA structure (Group B: 20.0%) and complex LAA morphology (Group C: 14.0%) compared with normal LAA (Group A: 2.9%). Among the five machine learning models, the SVM model achieved the best prediction performance (AUROC: 0.858 [95% CI: 0.830-0.887]). The SVM-based stroke-risk prediction model for elderly AF patients thus demonstrates strong predictive efficacy.
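
A minimal sketch of the two-stage workflow described above, under assumed inputs: K-means on standardized clinical/LAA features to derive phenotype clusters, then an SVM scored by AUROC. The feature matrix, outcome labels, and the choice to append the cluster label as a feature are all illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(419, 10))           # placeholder clinical + LAA radiomic features
y = rng.integers(0, 2, size=419)         # placeholder stroke outcome

Xs = StandardScaler().fit_transform(X)
phenotype = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xs)
X_aug = np.column_stack([Xs, phenotype])  # append cluster label as a feature

X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, test_size=0.3, random_state=0)
svm = SVC(probability=True, random_state=0).fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, svm.predict_proba(X_te)[:, 1]))
```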

Deep learning-based identification of vertebral fracture and osteoporosis in lateral spine radiographs and DXA vertebral fracture assessment to predict incident fracture.

Hong N, Cho SW, Lee YH, Kim CO, Kim HC, Rhee Y, Leslie WD, Cummings SR, Kim KM

PubMed paper · May 24, 2025
Deep learning (DL) identification of vertebral fractures and osteoporosis in lateral spine radiographs and DXA vertebral fracture assessment (VFA) images may improve fracture risk assessment in older adults. In 26 299 lateral spine radiographs from 9276 individuals attending a tertiary-level institution (60% train set; 20% validation set; 20% test set; VERTE-X cohort), DL models were developed to detect prevalent vertebral fracture (pVF) and osteoporosis. The pre-trained DL models from lateral spine radiographs were then fine-tuned in 30% of a DXA VFA dataset (KURE cohort), with performance evaluated in the remaining 70% test set. The area under the receiver operating characteristics curve (AUROC) for DL models to detect pVF and osteoporosis was 0.926 (95% CI 0.908-0.955) and 0.848 (95% CI 0.827-0.869) from VERTE-X spine radiographs, respectively, and 0.924 (95% CI 0.905-0.942) and 0.867 (95% CI 0.853-0.881) from KURE DXA VFA images, respectively. A total of 13.3% and 13.6% of individuals sustained an incident fracture during a median follow-up of 5.4 years and 6.4 years in the VERTE-X test set (n = 1852) and KURE test set (n = 2456), respectively. Incident fracture risk was significantly greater among individuals with DL-detected vertebral fracture (hazard ratios [HRs] 3.23 [95% CI 2.51-5.17] and 2.11 [95% CI 1.62-2.74] for the VERTE-X and KURE test sets) or DL-detected osteoporosis (HR 2.62 [95% CI 1.90-3.63] and 2.14 [95% CI 1.72-2.66]), which remained significant after adjustment for clinical risk factors and femoral neck bone mineral density. DL scores improved incident fracture discrimination and net benefit when combined with clinical risk factors. In summary, DL-detected pVF and osteoporosis in lateral spine radiographs and DXA VFA images enhanced fracture risk prediction in older adults.
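
The adjusted hazard ratios above suggest a Cox proportional hazards analysis; the sketch below shows one way to estimate an HR for a DL-detected finding adjusted for clinical covariates, using the lifelines library. Column names and data are hypothetical stand-ins for the cohort variables.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "time": rng.exponential(5.4, n),            # years of follow-up (placeholder)
    "fracture": rng.integers(0, 2, n),          # incident fracture event indicator
    "dl_vf": rng.integers(0, 2, n),             # DL-detected vertebral fracture
    "age": rng.normal(72, 6, n),                # clinical covariate
    "fn_bmd_tscore": rng.normal(-1.5, 1.0, n),  # femoral neck BMD T-score
})

# Fit Cox model; exp(coef) for dl_vf is the covariate-adjusted hazard ratio.
cph = CoxPHFitter().fit(df, duration_col="time", event_col="fracture")
print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%"]])
```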