Page 3 of 4454447 results

A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI

Arvind Murari Vepa, Yannan Yu, Jingru Gan, Anthony Cuturrufo, Weikai Li, Wei Wang, Fabien Scalzo, Yizhou Sun

arXiv preprint, Sep 30 2025
We introduce mpLLM, a prompt-conditioned hierarchical mixture-of-experts (MoE) architecture for visual question answering over multi-parametric 3D brain MRI (mpMRI). mpLLM routes across modality-level and token-level projection experts to fuse multiple interrelated 3D modalities, enabling efficient training without image-report pretraining. To address limited image-text paired supervision, mpLLM integrates a synthetic visual question answering (VQA) protocol that generates medically relevant VQA from segmentation annotations, and we collaborate with medical experts for clinical validation. mpLLM outperforms strong medical VLM baselines by 5.3% on average across multiple mpMRI datasets. Our study features three main contributions: (1) the first clinically validated VQA dataset for 3D brain mpMRI, (2) a novel multimodal LLM that handles multiple interrelated 3D modalities, and (3) strong empirical results that demonstrate the medical utility of our methodology. Ablations highlight the importance of modality-level and token-level experts and prompt-conditioned routing. We have included our source code in the supplementary materials and will release our dataset upon publication.
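The routing idea can be sketched minimally: a gate conditioned on the prompt embedding weights a set of projection experts before their outputs are fused. All dimensions, weights, and names below are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical dimensions: d-dim token embeddings, a handful of
# projection experts (the paper uses modality- and token-level experts).
d, n_experts = 32, 4

# One projection expert = one linear map (weights are random stand-ins).
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts)) / np.sqrt(d)

def route(tokens, prompt_emb):
    """Prompt-conditioned routing: the gate sees the prompt embedding,
    so the same visual tokens can reach different experts per question."""
    gates = softmax(prompt_emb @ gate_w)                     # (n_experts,)
    mixed = sum(g * (tokens @ W) for g, W in zip(gates, experts))
    return mixed, gates

tokens = rng.standard_normal((16, d))   # 16 visual tokens from one modality
prompt = rng.standard_normal(d)         # embedding of the question
out, gates = route(tokens, prompt)
print(out.shape, gates.round(3))
```

Because the gate takes the prompt embedding as input, changing the question redistributes the expert weights without touching the visual tokens.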

Optimized T<sub>1</sub>-weighted MP-RAGE MRI of the brain at 0.55 T using variable flip angle coherent gradient echo imaging and deep learning reconstruction.

Bieri O, Nickel MD, Weidensteiner C, Madörin P, Bauman G

PubMed, Sep 29 2025
To propose and evaluate an optimized MP-RAGE protocol for rapid T<sub>1</sub>-weighted imaging of the brain at 0.55 T. Incoherent and coherent steady state free precession (SSFP) RAGE kernels with constant and variable excitation angles were investigated in terms of the white matter SNR and the white matter-gray matter signal difference. Potential edge smearing from the transient signal readout was assessed based on a differential point spread function analysis. Finally, the prospects of a deep-learning reconstruction (DLR) method for accelerated MP-RAGE MRI of undersampled data were evaluated for the best performing variant. MP-RAGE imaging with a variable flip angle (vFA) SSFP-FID kernel outperformed all other investigated variants. As compared to the standard MP-RAGE sequence using a spoiled gradient echo kernel with constant flip angle, vFA SSFP-FID offered an average gain in the white matter SNR of 21% ± 2% and an average improvement for the white matter-gray matter signal difference for cortical gray matter of 47% ± 7%. The differential point spread function was narrowest for the spoiled gradient echo but slightly increased by 8% for vFA SSFP-FID. For vFA SSFP-FID, DLR offered a considerable decrease in the overall scan time from 5:17 min down to 2:46 min without noticeable image artifacts and degradations. At 0.55 T, a vFA MP-RAGE variant using an SSFP-FID kernel combined with a DLR method offers excellent prospects for rapid T<sub>1</sub>-weighted whole brain imaging in less than 3 min with nearly 1 mm (1.12 × 1.17 × 1.25 mm<sup>3</sup>) isotropic resolution.

A radiomics-based machine learning model and SHAP for predicting spread through air spaces and its prognostic implications in stage I lung adenocarcinoma: a multicenter cohort study.

Wang Y, Liu X, Zhao X, Wang Z, Li X, Sun D

PubMed, Sep 29 2025
Despite early detection via low-dose computed tomography and complete surgical resection for early-stage lung adenocarcinoma, postoperative recurrence remains high, particularly in patients with tumor spread through air spaces. A reliable preoperative prediction model is urgently needed to adjust the treatment modality. In this multicenter retrospective study, 609 patients with pathological stage I lung adenocarcinoma from 3 independent centers were enrolled. Regions of interest for the primary tumor and peritumoral areas (extended by three, six, and twelve voxel units) were manually delineated from preoperative CT imaging. Quantitative imaging features were extracted and filtered by correlation analysis and random forest ranking to yield 40 candidate features. Fifteen machine learning methods were evaluated, and a ten-fold cross-validated elastic net regression model was selected to construct the radiomics-based prediction model. A clinical model based on five key clinical variables and a combined model integrating imaging and clinical features were also developed. The radiomics model achieved accuracies of 0.801, 0.866, and 0.831 in the training set and two external test sets, with AUCs of 0.791, 0.829, and 0.807. In one external test set, the clinical model had an AUC of 0.689, significantly lower than the radiomics model (0.807, p < 0.05). The combined model achieved the highest performance, with AUCs of 0.834 in the training set and 0.894 in an external test set (p < 0.01 and p < 0.001, respectively). Interpretability analysis revealed that wavelet-transformed features dominated the model, with the highest contribution from a feature reflecting small high-intensity clusters within the tumor and the second highest from a feature representing low-intensity clusters in the six-voxel peritumoral region. Kaplan-Meier analysis demonstrated that patients with either pathologically confirmed or model-predicted spread had significantly shorter progression-free survival (p < 0.001). Our novel machine learning model, integrating imaging features from both tumor and peritumoral regions, preoperatively predicts tumor spread through air spaces in stage I lung adenocarcinoma. It outperforms traditional clinical models, highlighting the potential of quantitative imaging analysis in personalizing treatment. Future prospective studies and further optimization are warranted.
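The core modeling step, a ten-fold cross-validated elastic net on the filtered radiomic features, can be approximated with scikit-learn on synthetic data; feature counts and hyperparameters below are illustrative, not the study's.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the 40 filtered radiomic features; labels = STAS status.
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)

# Elastic-net-penalized logistic regression, scored with 10-fold CV,
# mirroring the abstract's "ten-fold cross-validated elastic net" step.
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=1.0, max_iter=5000),
)
aucs = cross_val_score(pipe, X, y, cv=10, scoring="roc_auc")
print(f"mean CV AUC: {aucs.mean():.3f}")
```

Standardizing before the penalized fit matters here: the elastic net penalty is scale-sensitive, so unscaled radiomic features would be shrunk unevenly.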

Convolutional neural network models of structural MRI for discriminating categories of cognitive impairment: a systematic review and meta-analysis.

Dong X, Li Y, Hao J, Zhou P, Yang C, Ai Y, He M, Zhang W, Hu H

PubMed, Sep 29 2025
Alzheimer's disease (AD) and mild cognitive impairment (MCI) pose significant challenges to public health and underscore the need for accurate and early diagnostic tools. Structural magnetic resonance imaging (sMRI) combined with advanced analytical techniques such as convolutional neural networks (CNNs) offers a promising avenue for the diagnosis of these conditions. This systematic review and meta-analysis aimed to evaluate the diagnostic performance of CNN algorithms applied to sMRI data in differentiating between AD, MCI, and normal cognition (NC). Following the PRISMA-DTA guidelines, a comprehensive literature search was carried out in the PubMed and Web of Science databases for studies published between 2018 and 2024. Studies were included if they employed CNNs for the diagnostic classification of sMRI data from participants with AD, MCI, or NC. The methodological quality of the included studies was assessed using the QUADAS-2 and METRICS tools. Data extraction and statistical analysis were performed to calculate pooled diagnostic accuracy metrics. A total of 21 studies comprising 16,139 participants were included in the analysis. The pooled sensitivity and specificity of CNN algorithms for differentiating AD from NC were 0.92 and 0.91, respectively. For distinguishing MCI from NC, the pooled sensitivity and specificity were 0.74 and 0.79, respectively. The algorithms also showed a moderate ability to differentiate AD from MCI, with a pooled sensitivity of 0.73 and specificity of 0.79. For progressive MCI (pMCI) versus stable MCI (sMCI) classification, the pooled sensitivity was 0.69 and the specificity 0.81. Heterogeneity across studies was significant, as indicated by meta-regression results. CNN algorithms demonstrated promising diagnostic performance in differentiating AD, MCI, and NC using sMRI data. The highest accuracy was observed in distinguishing AD from NC and the lowest in distinguishing pMCI from sMCI. These findings suggest that CNN-based radiomics has the potential to serve as a valuable tool in the diagnostic armamentarium for neurodegenerative diseases. However, the heterogeneity among studies indicates a need for further methodological refinement and validation. This systematic review was registered in PROSPERO (Registration ID: CRD42022295408).
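To give a simplified sense of how per-study proportions are pooled: fixed-effect inverse-variance weighting on the logit scale. DTA meta-analyses typically use bivariate random-effects models instead, so this is a deliberately reduced sketch, and the study values below are made up.

```python
import math

def pooled_logit(props, ns):
    """Fixed-effect inverse-variance pooling on the logit scale --
    a simplified stand-in for the bivariate models used in DTA
    meta-analyses. props = per-study sensitivities, ns = study sizes."""
    num = den = 0.0
    for p, n in zip(props, ns):
        p = min(max(p, 1e-3), 1 - 1e-3)          # continuity-safe bounds
        logit = math.log(p / (1 - p))
        var = 1.0 / (n * p * (1 - p))            # delta-method variance
        w = 1.0 / var                            # inverse-variance weight
        num += w * logit
        den += w
    pooled = num / den
    return 1.0 / (1.0 + math.exp(-pooled))       # back to a proportion

# Hypothetical per-study sensitivities for AD vs NC and study sizes.
sens = [0.94, 0.90, 0.92, 0.89]
sizes = [400, 250, 600, 320]
print(round(pooled_logit(sens, sizes), 3))
```

Pooling on the logit scale keeps the estimate inside (0, 1) and down-weights small studies, which is why the pooled value lands between the input sensitivities.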

Novel multi-task learning for Alzheimer's stage classification using hippocampal MRI segmentation, feature fusion, and nomogram modeling.

Hu W, Du Q, Wei L, Wang D, Zhang G

PubMed, Sep 29 2025
To develop and validate a comprehensive and interpretable framework for multi-class classification of Alzheimer's disease (AD) progression stages based on hippocampal MRI, integrating radiomic, deep, and clinical features. This retrospective multi-center study included 2956 patients across four AD stages (Non-Demented, Very Mild Demented, Mild Demented, Moderate Demented). T1-weighted MRI scans were processed through a standardized pipeline involving hippocampal segmentation using four models (U-Net, nnU-Net, Swin-UNet, MedT). Radiomic features (n = 215) were extracted using the SERA platform, and deep features (n = 256) were learned using an LSTM network with attention applied to hippocampal slices. Fused features were harmonized with ComBat and filtered by ICC (≥ 0.75), followed by LASSO-based feature selection. Classification was performed using five machine learning models, including Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Multilayer Perceptron (MLP), and eXtreme Gradient Boosting (XGBoost). Model interpretability was addressed using SHAP, and a nomogram and decision curve analysis (DCA) were developed. Additionally, an end-to-end 3D CNN-LSTM model and two transformer-based benchmarks (Vision Transformer, Swin Transformer) were trained for comparative evaluation. MedT achieved the best hippocampal segmentation (Dice = 92.03% external). Fused features yielded the highest classification performance with XGBoost (external accuracy = 92.8%, AUC = 94.2%). SHAP identified MMSE, hippocampal volume, and APOE ε4 as top contributors. The nomogram accurately predicted early-stage AD with clinical utility confirmed by DCA. The end-to-end model performed acceptably (AUC = 84.0%) but lagged behind the fused pipeline. Statistical tests confirmed significant performance advantages for feature fusion and MedT-based segmentation. This study demonstrates that integrating radiomics, deep learning, and clinical data from hippocampal MRI enables accurate and interpretable classification of AD stages. The proposed framework is robust, generalizable, and clinically actionable, representing a scalable solution for AD diagnostics.
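The LASSO-based selection step on the fused feature matrix can be sketched with an L1-penalized logistic regression; the binary labels, feature counts, and penalty strength below are illustrative simplifications of the four-class setup.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Stand-in for the fused radiomic (215) + deep (256) feature matrix.
X, y = make_classification(n_samples=500, n_features=471, n_informative=20,
                           random_state=0)
X = StandardScaler().fit_transform(X)

# LASSO-style selection: the L1 penalty zeroes out uninformative
# columns; the survivors feed the downstream classifiers.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05,
                           random_state=0)
selector = SelectFromModel(lasso).fit(X, y)
mask = selector.get_support()
print(f"kept {mask.sum()} of {X.shape[1]} features")
```

Tightening `C` shrinks more coefficients to exactly zero, so the kept-feature count is directly tunable, which is how such pipelines trade compactness against accuracy.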

Development of a High-Performance Ultrasound Prediction Model for the Diagnosis of Endometrial Cancer: An Interpretable XGBoost Algorithm Utilizing SHAP Analysis.

Lai H, Wu Q, Weng Z, Lyu G, Yang W, Ye F

PubMed, Sep 29 2025
To develop and validate an ultrasonography-based machine learning (ML) model for predicting malignant endometrial and cavitary lesions. This retrospective study was conducted on patients with pathologically confirmed results following transvaginal or transrectal ultrasound from 2021 to 2023. Endometrial ultrasound features were characterized using the International Endometrial Tumor Analysis (IETA) terminology. The dataset was randomly divided (7:3) into training and validation sets. LASSO (least absolute shrinkage and selection operator) regression was applied for feature selection, and an extreme gradient boosting (XGBoost) model was developed. Performance was assessed via receiver operating characteristic (ROC) analysis, calibration, decision curve analysis, sensitivity, specificity, and accuracy. Among 1080 patients, 6 had a non-measurable endometrium. Of the remaining 1074 cases, 641 were premenopausal and 433 postmenopausal. On the test set, the area under the curve (AUC) for the premenopausal group was 0.845 (0.781-0.909), with relatively low sensitivity (0.588, 0.442-0.722) and relatively high specificity (0.923, 0.863-0.959); the AUC for the postmenopausal group was 0.968 (0.944-0.992), with both sensitivity (0.895, 0.778-0.956) and specificity (0.931, 0.839-0.974) being relatively high. SHapley Additive exPlanations (SHAP) analysis identified key predictors: endometrial-myometrial junction, endometrial thickness, endometrial echogenicity, color Doppler flow score, and vascular pattern in premenopausal women; endometrial thickness, endometrial-myometrial junction, endometrial echogenicity, and color Doppler flow score in postmenopausal women. The XGBoost-based model exhibited excellent predictive performance, particularly in postmenopausal patients. SHAP analysis further enhances interpretability by identifying key ultrasonographic predictors of malignancy.

Cross-regional radiomics: a novel framework for relationship-based feature extraction with validation in Parkinson's disease motor subtyping.

Hosseini MS, Aghamiri SMR, Panahi M

PubMed, Sep 29 2025
Traditional radiomics approaches focus on single-region feature extraction, limiting their ability to capture complex inter-regional relationships crucial for understanding pathophysiological mechanisms in complex diseases. This study introduces a novel cross-regional radiomics framework that systematically extracts relationship-based features between anatomically and functionally connected brain regions. We analyzed T1-weighted magnetic resonance imaging (MRI) data from 140 early-stage Parkinson's disease patients (70 tremor-dominant, 70 postural instability gait difficulty) from the Parkinson's Progression Markers Initiative (PPMI) database across multiple imaging centers. Eight bilateral motor circuit regions (putamen, caudate nucleus, globus pallidus, substantia nigra) were segmented using standardized atlases. Two feature sets were developed: 48 traditional single-region of interest (ROI) features and 60 novel motor-circuit features capturing cross-regional ratios, asymmetry indices, volumetric relationships, and shape distributions. Six feature engineering scenarios were evaluated using center-based 5-fold cross-validation with six machine learning classifiers to ensure robust generalization across different imaging centers. Motor-circuit features demonstrated superior performance compared to single-ROI features across enhanced preprocessing scenarios. Peak performance was achieved with area under the curve (AUC) of 0.821 ± 0.117 versus 0.650 ± 0.220 for single-ROI features (p = 0.0012, Cohen's d = 0.665). Cross-regional ratios, particularly putamen-substantia nigra relationships, dominated the most discriminative features. Motor-circuit features showed superior generalization across multi-center data and better clinical utility through decision curve analysis and calibration curves. The proposed cross-regional radiomics framework significantly outperforms traditional single-region approaches for Parkinson's disease motor subtype classification. This methodology provides a foundation for advancing radiomics applications in complex diseases where inter-regional connectivity patterns are fundamental to pathophysiology.
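The relationship-based features are simple to state: asymmetry indices within bilateral region pairs and ratios across regions. A minimal sketch with made-up volumes follows; the region names match the abstract, but the exact feature definitions are assumptions.

```python
def cross_regional_features(vols):
    """Relationship-based features between paired motor-circuit ROIs.
    `vols` maps region name -> (left, right) volume in mm^3."""
    feats = {}
    for name, (left, right) in vols.items():
        # asymmetry index: 0 for perfect left/right symmetry
        feats[f"{name}_asym"] = abs(left - right) / (left + right)
    # cross-regional ratio, e.g. putamen relative to substantia nigra
    # (the relationship the abstract flags as most discriminative)
    putamen = sum(vols["putamen"])
    sn = sum(vols["substantia_nigra"])
    feats["putamen_sn_ratio"] = putamen / sn
    return feats

# Illustrative volumes only, not PPMI data.
vols = {"putamen": (4100.0, 3900.0), "substantia_nigra": (380.0, 400.0)}
f = cross_regional_features(vols)
print(f)
```

Because both feature types are ratios of volumes, they are scale-invariant, which helps them generalize across scanners at different centers.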

A machine learning approach for non-invasive PCOS diagnosis from ultrasound and clinical features.

Agirsoy M, Oehlschlaeger MA

PubMed, Sep 29 2025
This study investigates the use of machine learning (ML) algorithms to support faster and more accurate diagnosis of polycystic ovary syndrome (PCOS), with a focus on both predictive performance and clinical applicability. Multiple algorithms were evaluated, including Artificial Neural Networks (ANN), Support Vector Machines (SVM), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Extreme Gradient Boosting (XGBoost). XGBoost consistently outperformed the other models and was selected for final development and validation. To align with the Rotterdam criteria, the dataset was structured into three feature categories: clinical, biochemical, and ultrasound (USG) data. The study explored various combinations of these feature subsets to identify the most efficient diagnostic pathways. Feature selection using the chi-square-based SelectKBest method revealed the top 10 predictive features, which were further validated through XGBoost's internal feature importance, SHAP analysis, and expert clinical assessment. The final XGBoost model demonstrated robust performance across multiple feature combinations: Clinical + USG + AMH achieved AUC = 0.9947, Precision = 0.9553, F1 Score = 0.9553, and Accuracy = 0.9553; Clinical + USG achieved AUC = 0.9852, Precision = 0.9583, F1 Score = 0.9388, and Accuracy = 0.9384. The most influential features included follicle count on both ovaries, weight gain, Anti-Müllerian Hormone (AMH) levels, hair growth, menstrual irregularity, fast food consumption, pimples, and hair loss. External validation was performed using a publicly available dataset containing 320 instances and 18 diagnostic features. The XGBoost model trained on the top-ranked features achieved perfect performance on the test set (AUC = 1.0, Precision = 1.0, F1 Score = 1.0, Accuracy = 1.0), though further validation is necessary to rule out overfitting or data leakage. These findings suggest that combining clinical and ultrasound features enables highly accurate, non-invasive, and cost-effective PCOS diagnosis. This study demonstrates the potential of ML-driven tools to streamline clinical workflows, reduce reliance on invasive diagnostics, and support early intervention in women's health.
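The chi-square SelectKBest step maps directly onto scikit-learn; note that `chi2` requires non-negative inputs, hence the min-max scaling below. The data are synthetic stand-ins for the mixed clinical/biochemical/USG table.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in: 18 diagnostic features, binary PCOS label.
X, y = make_classification(n_samples=400, n_features=18, n_informative=6,
                           random_state=0)
# chi2 scores are only defined for non-negative features,
# so scale everything to [0, 1] first.
X = MinMaxScaler().fit_transform(X)

# Keep the 10 features with the highest chi-square scores,
# mirroring the abstract's "top 10 predictive features" step.
selector = SelectKBest(chi2, k=10).fit(X, y)
top10 = selector.get_support(indices=True)
print("indices of the 10 highest-scoring features:", top10)
```

A filter method like this scores each feature independently of the model, which is why the abstract cross-checks the selection against XGBoost's own importances and SHAP.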

Artificial Intelligence Deep Learning Ultrasound Discrimination of Cosmetic Fillers: A Multicenter Study.

Wortsman X, Lozano M, Rodriguez FJ, Valderrama Y, Ortiz-Orellana G, Zattar L, de Cabo F, Ducati E, Sigrist R, Fontan C, Rezende J, Gonzalez C, Schelke L, Zavariz J, Barrera P, Velthuis P

PubMed, Sep 29 2025
Despite the growing use of artificial intelligence (AI) in medicine, imaging, and dermatology, to date there is no information on the use of AI for discriminating cosmetic fillers on ultrasound (US). An international collaborative group working in dermatologic and esthetic US was formed and worked with the staff of the Department of Computer Science and AI of the Universidad de Granada to gather and process a relevant number of anonymized images. AI techniques based on deep learning (DL) with the YOLO (you only look once) architecture, together with a bounding box annotation tool, allowed experts to manually delineate regions of interest for the discrimination of common cosmetic fillers under real-world conditions. A total of 14 physicians from 6 countries participated in the AI study and compiled a final dataset comprising 1432 US images, including HA (hyaluronic acid), PMMA (polymethylmethacrylate), CaHA (calcium hydroxyapatite), and SO (silicone oil) filler cases. The model exhibits robust and consistent classification performance, with an average accuracy of 0.92 ± 0.04 across the cross-validation folds. YOLOv11 demonstrated outstanding performance in the detection of HA and SO, yielding F1 scores of 0.96 ± 0.02 and 0.94 ± 0.04, respectively. In contrast, CaHA and PMMA showed somewhat lower and less consistent performance in terms of precision and recall, with F1 scores around 0.83. AI using YOLOv11 allowed us to discriminate reliably between HA and SO across high-frequency US devices of varying complexity and across operators. Further AI DL-specific work is needed to identify CaHA and PMMA more accurately.
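The per-class F1 scores reported for each filler type come from detection counts; a minimal reference computation follows (the counts are illustrative, not the study's data).

```python
def f1(tp, fp, fn):
    """Per-class F1 from detection counts: the harmonic mean of
    precision and recall, as reported per filler type."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative counts: a well-detected class (HA-like) versus a class
# with more confusions (PMMA-like).
print(round(f1(tp=95, fp=4, fn=4), 3))
print(round(f1(tp=80, fp=16, fn=17), 3))
```

Because F1 is a harmonic mean, a class can only score high when precision and recall are both high, which is why the CaHA/PMMA confusions pull their scores to around 0.83.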

Evaluation of Context-Aware Prompting Techniques for Classification of Tumor Response Categories in Radiology Reports Using Large Language Model.

Park J, Sim WS, Yu JY, Park YR, Lee YH

PubMed, Sep 29 2025
Radiology reports are essential for medical decision-making, providing crucial data for diagnosing diseases, devising treatment plans, and monitoring disease progression. While large language models (LLMs) have shown promise in processing free-text reports, research on effective prompting techniques for radiologic applications remains limited. To evaluate the effectiveness of LLM-driven classification of tumor response categories (TRCs) from radiology reports, and to optimize the model through a comparison of four prompt engineering techniques for this classification task in clinical applications, we included 3062 whole-spine contrast-enhanced magnetic resonance imaging (MRI) radiology reports for prompt engineering and validation. TRCs were labeled by two radiologists based on criteria modified from the Response Evaluation Criteria in Solid Tumors (RECIST) guidelines. The Llama3 instruct model was used to classify TRCs through four different prompts: General, In-Context Learning (ICL), Chain-of-Thought (CoT), and ICL with CoT. AUROC, accuracy, precision, recall, and F1-score were calculated for each prompt and model size (8B, 70B) on the test report dataset. The average AUROC for the ICL (0.96 internally, 0.93 externally) and ICL with CoT prompts (0.97 internally, 0.94 externally) outperformed the other prompts. Error rates increased with prompt complexity, including 0.8% incomplete-sentence errors and 11.3% probability-classification inconsistencies. This study demonstrates that context-aware LLM prompts substantially improved the efficiency and effectiveness of classifying TRCs from radiology reports, despite potential intrinsic hallucinations. While further improvements are required for real-world application, our findings suggest that context-aware prompts have significant potential for segmenting complex radiology reports and enhancing oncology clinical workflows.
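The four prompt styles differ only in how the prompt text is assembled; a minimal sketch of ICL and CoT construction follows (the example reports and category labels are invented for illustration, not the study's prompts).

```python
# Hypothetical few-shot examples; labels follow RECIST-style categories.
EXAMPLES = [
    ("Target lesions decreased >30% in sum of diameters.",
     "Partial Response"),
    ("New metastatic lesion in the L3 vertebral body.",
     "Progressive Disease"),
]

def build_prompt(report, in_context=True, chain_of_thought=True):
    """Assemble a General, ICL, CoT, or ICL-with-CoT prompt for
    tumor-response classification, in the spirit of the four styles
    compared in the abstract."""
    parts = ["Classify the tumor response category of the radiology report."]
    if in_context:                      # ICL: prepend labeled examples
        for text, label in EXAMPLES:
            parts.append(f"Report: {text}\nCategory: {label}")
    if chain_of_thought:                # CoT: ask for stepwise reasoning
        parts.append("Think step by step: identify lesion changes, "
                     "then map them to a category before answering.")
    parts.append(f"Report: {report}\nCategory:")
    return "\n\n".join(parts)

prompt = build_prompt("Stable appearance of known lesions.")
print(prompt)
```

Toggling the two flags yields all four conditions (General = both off), which makes the comparison in the abstract a controlled ablation over prompt components.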