Assessing accuracy and legitimacy of multimodal large language models on Japan Diagnostic Radiology Board Examination

Hirano, Y., Miki, S., Yamagishi, Y., Hanaoka, S., Nakao, T., Kikuchi, T., Nakamura, Y., Nomura, Y., Yoshikawa, T., Abe, O.

medRxiv preprint · Jun 23, 2025
Purpose: To assess and compare the accuracy and legitimacy of multimodal large language models (LLMs) on the Japan Diagnostic Radiology Board Examination (JDRBE). Materials and Methods: The dataset comprised questions from JDRBE 2021, 2023, and 2024, with ground-truth answers established through consensus among multiple board-certified diagnostic radiologists. Questions without associated images and those lacking unanimous agreement on answers were excluded. Eight LLMs were evaluated: GPT-4 Turbo, GPT-4o, GPT-4.5, GPT-4.1, o3, o4-mini, Claude 3.7 Sonnet, and Gemini 2.5 Pro. Each model was evaluated under two conditions: with image input (vision) and without (text-only). Performance differences between the conditions were assessed using McNemar's exact test. Two diagnostic radiologists (with 2 and 18 years of experience) independently rated the legitimacy of responses from four models (GPT-4 Turbo, Claude 3.7 Sonnet, o3, and Gemini 2.5 Pro) on a five-point Likert scale, blinded to model identity. Legitimacy scores were analyzed using Friedman's test, followed by pairwise Wilcoxon signed-rank tests with Holm correction. Results: The dataset included 233 questions. Under the vision condition, o3 achieved the highest accuracy at 72%, followed by o4-mini (70%) and Gemini 2.5 Pro (70%). Under the text-only condition, o3 topped the list with an accuracy of 67%. The addition of image input significantly improved the accuracy of two models (Gemini 2.5 Pro and GPT-4.5), but not the others. Both o3 and Gemini 2.5 Pro received significantly higher legitimacy scores than GPT-4 Turbo and Claude 3.7 Sonnet from both raters. Conclusion: Recent multimodal LLMs, particularly o3 and Gemini 2.5 Pro, have demonstrated remarkable progress on JDRBE questions, reflecting their rapid evolution in diagnostic radiology. Secondary abstract: Eight multimodal large language models were evaluated on the Japan Diagnostic Radiology Board Examination. OpenAI's o3 and Google DeepMind's Gemini 2.5 Pro achieved high accuracy rates (72% and 70%) and received good legitimacy scores from human raters, demonstrating steady progress.
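No code accompanies the abstract, but the paired comparison it describes is a standard McNemar's exact test on per-question correctness under the two conditions. A minimal sketch in Python (the correctness arrays below are random placeholders, not the study's data):

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical per-question correctness for one model under both conditions
# (1 = correct, 0 = incorrect); the real study had 233 questions.
rng = np.random.default_rng(0)
vision = rng.integers(0, 2, size=233)
text_only = rng.integers(0, 2, size=233)

# 2x2 contingency table of paired outcomes; McNemar's test is driven by the
# discordant cells (correct under one condition but not the other).
table = np.array([
    [np.sum((vision == 1) & (text_only == 1)), np.sum((vision == 1) & (text_only == 0))],
    [np.sum((vision == 0) & (text_only == 1)), np.sum((vision == 0) & (text_only == 0))],
])
result = mcnemar(table, exact=True)
print(f"McNemar's exact test p-value: {result.pvalue:.4f}")
```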

STACT-Time: Spatio-Temporal Cross Attention for Cine Thyroid Ultrasound Time Series Classification

Irsyad Adam, Tengyue Zhang, Shrayes Raman, Zhuyu Qiu, Brandon Taraku, Hexiang Feng, Sile Wang, Ashwath Radhachandran, Shreeram Athreya, Vedrana Ivezic, Peipei Ping, Corey Arnold, William Speier

arXiv preprint · Jun 22, 2025
Thyroid cancer is among the most common cancers in the United States. Thyroid nodules are frequently detected through ultrasound (US) imaging, and some require further evaluation via fine-needle aspiration (FNA) biopsy. Despite its effectiveness, FNA often leads to unnecessary biopsies of benign nodules, causing patient discomfort and anxiety. To address this, the American College of Radiology Thyroid Imaging Reporting and Data System (TI-RADS) has been developed to reduce benign biopsies. However, such systems are limited by interobserver variability. Recent deep learning approaches have sought to improve risk stratification, but they often fail to utilize the rich temporal and spatial context provided by US cine clips, which contain dynamic global information and surrounding structural changes across various views. In this work, we propose the Spatio-Temporal Cross Attention for Cine Thyroid Ultrasound Time Series Classification (STACT-Time) model, a novel representation learning framework that integrates imaging features from US cine clips with features from segmentation masks automatically generated by a pretrained model. By leveraging self-attention and cross-attention mechanisms, our model captures the rich temporal and spatial context of US cine clips while enhancing feature representation through segmentation-guided learning. Our model improves malignancy prediction compared to state-of-the-art models, achieving a cross-validation precision of 0.91 (±0.02) and an F1 score of 0.89 (±0.02). By reducing unnecessary biopsies of benign nodules while maintaining high sensitivity for malignancy detection, our model has the potential to enhance clinical decision-making and improve patient outcomes.
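As a rough illustration of the segmentation-guided fusion the abstract describes, here is a toy PyTorch block combining temporal self-attention over cine-clip features with cross-attention to mask features; the dimensions, module layout, and pooling are assumptions, not the published STACT-Time architecture:

```python
import torch
import torch.nn as nn

class CineSegCrossAttention(nn.Module):
    """Toy fusion block: cine-clip frame embeddings attend to
    segmentation-mask embeddings (all sizes are illustrative)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, 2)  # benign vs. malignant

    def forward(self, cine_feats, seg_feats):
        # cine_feats: (B, T, dim) per-frame embeddings from the cine clip
        # seg_feats:  (B, T, dim) embeddings of auto-generated masks
        x, _ = self.self_attn(cine_feats, cine_feats, cine_feats)
        x = self.norm1(x + cine_feats)              # temporal self-attention
        y, _ = self.cross_attn(x, seg_feats, seg_feats)
        y = self.norm2(y + x)                       # segmentation-guided cross-attention
        return self.head(y.mean(dim=1))             # pool over time, classify

logits = CineSegCrossAttention()(torch.randn(4, 16, 256), torch.randn(4, 16, 256))
print(logits.shape)  # torch.Size([4, 2])
```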

Decoding Federated Learning: The FedNAM+ Conformal Revolution

Sree Bhargavi Balija, Amitash Nanda, Debashis Sahoo

arXiv preprint · Jun 22, 2025
Federated learning has significantly advanced distributed training of machine learning models across decentralized data sources. However, existing frameworks often lack comprehensive solutions that combine uncertainty quantification, interpretability, and robustness. To address this, we propose FedNAM+, a federated learning framework that integrates Neural Additive Models (NAMs) with a novel conformal prediction method to enable interpretable and reliable uncertainty estimation. Our method introduces a dynamic level adjustment technique that utilizes gradient-based sensitivity maps to identify key input features influencing predictions. This facilitates both interpretability and pixel-wise uncertainty estimates. Unlike traditional interpretability methods such as LIME and SHAP, which do not provide confidence intervals, FedNAM+ offers visual insights into prediction reliability. We validate our approach through experiments on CT scan, MNIST, and CIFAR datasets, demonstrating high prediction accuracy with minimal loss (e.g., only 0.1% on MNIST), along with transparent uncertainty measures. Visual analysis highlights variable uncertainty intervals, revealing low-confidence regions where model performance can be improved with additional data. Compared to Monte Carlo Dropout, FedNAM+ delivers efficient and global uncertainty estimates with reduced computational overhead, making it particularly suitable for federated learning scenarios. Overall, FedNAM+ provides a robust, interpretable, and computationally efficient framework that enhances trust and transparency in decentralized predictive modeling.
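For background on the conformal component, here is plain split conformal prediction for regression; FedNAM+'s dynamic level adjustment and NAM integration are not reproduced, so this is a generic sketch rather than the paper's method:

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal prediction for regression: calibrate on held-out
    residuals, then return (1 - alpha) prediction intervals."""
    residuals = np.abs(cal_true - cal_pred)            # nonconformity scores
    n = len(residuals)
    # Finite-sample-corrected quantile of the calibration residuals.
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return test_pred - q, test_pred + q

# Hypothetical calibration/test predictions from any trained model.
rng = np.random.default_rng(1)
cal_true = rng.normal(size=500)
cal_pred = cal_true + rng.normal(scale=0.3, size=500)
lo, hi = split_conformal_interval(cal_pred, cal_true, np.array([0.0, 1.5]))
print(lo, hi)
```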

The future of biomarkers for vascular contributions to cognitive impairment and dementia (VCID): proceedings of the 2025 annual workshop of the Albert Research Institute for White Matter and Cognition.

Lennon MJ, Karvelas N, Ganesh A, Whitehead S, Sorond FA, Durán Laforet V, Head E, Arfanakis K, Kolachalama VB, Liu X, Lu H, Ramirez J, Walker K, Weekman E, Wellington CL, Winston C, Barone FC, Corriveau RA

PubMed · Jun 21, 2025
Advances in biomarkers and pathophysiology of vascular contributions to cognitive impairment and dementia (VCID) are expected to bring greater mechanistic insights, more targeted treatments, and potentially disease-modifying therapies. The 2025 Annual Workshop of the Albert Research Institute for White Matter and Cognition, sponsored by the Leo and Anne Albert Charitable Trust since 2015, focused on novel biomarkers for VCID. The meeting highlighted the complexity of dementia, emphasizing that the majority of cases involve multiple brain pathologies, with vascular pathology typically present. Potential novel approaches to diagnosis of disease processes and progression that may result in VCID included measures of microglial senescence and retinal changes, as well as artificial intelligence (AI) integration of multimodal datasets. Proteomic studies identified plasma proteins associated with cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL; a rare genetic disorder affecting brain vessels) and age-related vascular pathology that suggested potential therapeutic targets. Blood-based microglial and brain-derived extracellular vesicles are promising tools for early detection of brain inflammation and other changes that have been associated with cognitive decline. Imaging measures of blood perfusion, oxygen extraction, and cerebrospinal fluid (CSF) flow were discussed as potential VCID biomarkers, in part because of correlations with classic pathological Alzheimer's disease (AD) biomarkers. MRI-visible perivascular spaces, which may be a novel imaging biomarker of sleep-driven glymphatic waste clearance dysfunction, are associated with vascular risk factors, lower cognitive function, and various brain pathologies including Alzheimer's, Parkinson's and cerebral amyloid angiopathy (CAA). People with Down syndrome are at high risk for dementia. Individuals with Down syndrome who develop dementia almost universally experience mixed brain pathologies, with AD pathology and cerebrovascular pathology being the most common. This follows the pattern in the general population where mixed pathologies are also predominant in the brains of people clinically diagnosed with dementia, including AD dementia. Intimate partner violence-related brain injury, hypertension's impact on dementia risk, and the promise of remote ischemic conditioning for treating VCID were additional themes.

Automated detection and classification of osteolytic lesions in panoramic radiographs using CNNs and vision transformers.

van Nistelrooij N, Ghanad I, Bigdeli AK, Thiem DGE, von See C, Rendenbach C, Maistreli I, Xi T, Bergé S, Heiland M, Vinayahalingam S, Gaudin R

PubMed · Jun 21, 2025
Diseases underlying osteolytic lesions in the jaws are characterized by the absorption of bone tissue and are often asymptomatic, delaying their diagnosis. Well-defined lesions (benign cyst-like lesions) and ill-defined lesions (osteomyelitis or malignancy) can be detected early in a panoramic radiograph (PR) by an experienced examiner, but most dentists lack the appropriate training. To support dentists, this study aimed to develop and evaluate deep learning models for the detection of osteolytic lesions in PRs. A dataset of 676 PRs (165 well-defined, 181 ill-defined, 330 control) was collected from the Department of Oral and Maxillofacial Surgery at Charité Berlin, Germany. The osteolytic lesions were pixel-wise segmented and labeled as well-defined or ill-defined. Four model architectures for instance segmentation (Mask R-CNN with a Swin-Tiny or ResNet-50 backbone, Mask DINO, and YOLOv5) were employed with five-fold cross-validation. Their effectiveness was evaluated with sensitivity, specificity, F1-score, and AUC, and failure cases were reviewed. Mask R-CNN with a Swin-Tiny backbone was the most effective (well-defined F1 = 0.784, AUC = 0.881; ill-defined F1 = 0.904, AUC = 0.971), and the model architectures including vision transformer components were more effective than those without. Model mistakes were observed around the maxillary sinus, at tooth extraction sites, and for radiolucent bands. Promising deep learning models were developed for the detection of osteolytic lesions in PRs, particularly those with vision transformer components (Mask R-CNN with Swin-Tiny and Mask DINO). These results underline the potential of vision transformers for enhancing the automated detection of osteolytic lesions, offering a significant improvement over traditional deep learning models.
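The reported metrics can be reproduced from per-image predictions with standard tooling. A hypothetical sketch (the inputs are simulated, not the Charité dataset):

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix

def lesion_metrics(y_true, y_score, threshold=0.5):
    """Detection metrics of the kind reported in the study: sensitivity,
    specificity, F1, and AUC (inputs here are hypothetical)."""
    y_pred = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),
    }

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=200)                            # lesion present vs. absent
y_score = np.clip(y_true * 0.6 + rng.random(200) * 0.5, 0, 1)    # mock model scores
print(lesion_metrics(y_true, y_score))
```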

Development of Radiomics-Based Risk Prediction Models for Stages of Hashimoto's Thyroiditis Using Ultrasound, Clinical, and Laboratory Factors.

Chen JH, Kang K, Wang XY, Chi JN, Gao XM, Li YX, Huang Y

PubMed · Jun 21, 2025
To develop a radiomics risk-prediction model for differentiating stages of Hashimoto's thyroiditis (HT). Data from patients with HT who underwent definitive surgical pathology between January 2018 and December 2023 were retrospectively collected and categorized into early HT (patients with positive antibodies alone or accompanied by elevated thyroid hormones) and late HT (patients with positive antibodies who were beginning to present subclinical hypothyroidism or developing hypothyroidism). Ultrasound images and five clinical and 12 laboratory indicators were obtained. Six classifiers were used to construct radiomics models. The gradient boosting decision tree (GBDT) classifier was used to screen for the best features and to explore the main risk factors for differentiating early HT. The performance of each model was evaluated by receiver operating characteristic (ROC) curve. The model was validated using one internal and two external test cohorts. A total of 785 patients were enrolled. Extreme gradient boosting (XGBoost) showed the best performance in the training cohort, with an AUC of 0.999 (0.998, 1), and AUC values of 0.993 (0.98, 1), 0.947 (0.866, 1), and 0.98 (0.939, 1) in the internal test, first external, and second external cohorts, respectively. Ultrasound radiomic features contributed 78.6% (11/14) of the model's features. The first-order feature of the transverse thyroid ultrasound image, the gray-level run length matrix (GLRLM) texture feature of the longitudinal thyroid ultrasound image, and free thyroxine contributed the most to the model. Our study developed and tested a risk-prediction model that effectively differentiated HT stages, enabling more precise and proactive management of patients with HT at an earlier stage.
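The two-stage pipeline the abstract outlines (GBDT feature screening followed by an XGBoost classifier evaluated by AUC) looks roughly like the following sketch; the feature matrix, the number of retained features, and all hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Hypothetical feature matrix standing in for the study's radiomic,
# clinical, and laboratory features; labels are early vs. late HT.
rng = np.random.default_rng(3)
X = rng.normal(size=(785, 50))
y = rng.integers(0, 2, size=785)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 1: GBDT screens features by importance (as in the abstract).
gbdt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
top = np.argsort(gbdt.feature_importances_)[::-1][:14]  # keep top 14 features

# Step 2: XGBoost classifier on the screened features, evaluated by AUC.
clf = XGBClassifier(n_estimators=200, random_state=0).fit(X_tr[:, top], y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te[:, top])[:, 1])
print(f"AUC: {auc:.3f}")
```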

OpenMAP-BrainAge: Generalizable and Interpretable Brain Age Predictor

Pengyu Kan, Craig Jones, Kenichi Oishi

arXiv preprint · Jun 21, 2025
Purpose: To develop an age prediction model that is interpretable and robust to demographic and technological variances in brain MRI scans. Materials and Methods: We propose a transformer-based architecture that leverages self-supervised pre-training on large-scale datasets. Our model processes pseudo-3D T1-weighted MRI scans from three anatomical views and incorporates brain volumetric information. By introducing a stem architecture, we reduce the conventional quadratic complexity of transformer models to linear complexity, enabling scalability for high-dimensional MRI data. We trained our model on the ADNI2 & 3 (N=1348) and OASIS3 (N=716) datasets (age range: 42-95) from North America, with an 8:1:1 split for training, validation, and testing, and then validated it on the AIBL dataset (N=768, age range: 60-92) from Australia. Results: We achieved an MAE of 3.65 years on the ADNI2 & 3 and OASIS3 test set, with strong generalizability to AIBL (MAE of 3.54 years). There was a notable increase in brain age gap (BAG) across cognitive groups, with means of 0.15 years (95% CI: [-0.22, 0.51]) in CN, 2.55 years ([2.40, 2.70]) in MCI, and 6.12 years ([5.82, 6.43]) in AD. Additionally, a significant negative correlation between BAG and cognitive scores was observed, with correlation coefficients of -0.185 (p < 0.001) for MoCA and -0.231 (p < 0.001) for MMSE. Gradient-based feature attribution highlighted the ventricles and white matter structures as key regions influenced by brain aging. Conclusion: Our model effectively fused information from different views with volumetric information to achieve state-of-the-art brain age prediction accuracy, improved generalizability, and interpretability through its association with neurodegenerative disorders.
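The brain age gap and its association with cognition reduce to simple arithmetic on predicted versus chronological age. A hypothetical sketch (simulated ages and scores; the abstract does not name the correlation type, so Spearman is assumed here):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical predicted and chronological ages plus cognitive scores.
rng = np.random.default_rng(4)
chron_age = rng.uniform(60, 92, size=300)
pred_age = chron_age + rng.normal(loc=2.0, scale=3.5, size=300)
moca = 30 - 0.2 * (pred_age - chron_age) + rng.normal(scale=2.0, size=300)

bag = pred_age - chron_age                    # brain age gap (BAG)
mae = np.mean(np.abs(pred_age - chron_age))   # mean absolute error
rho, p = spearmanr(bag, moca)                 # BAG vs. cognition association
print(f"MAE: {mae:.2f} y, mean BAG: {bag.mean():.2f} y, rho: {rho:.3f} (p={p:.3g})")
```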

Robust Radiomic Signatures of Intervertebral Disc Degeneration from MRI.

McSweeney T, Tiulpin A, Kowlagi N, Määttä J, Karppinen J, Saarakkala S

PubMed · Jun 20, 2025
A retrospective analysis. The aim of this study was to identify a robust radiomic signature from deep learning segmentations for intervertebral disc (IVD) degeneration classification. Low back pain (LBP) is the most common musculoskeletal symptom worldwide, and IVD degeneration is an important contributing factor. To improve the quantitative phenotyping of IVD degeneration from T2-weighted magnetic resonance imaging (MRI) and better understand its relationship with LBP, multiple shape and intensity features have been investigated. IVD radiomics have been less studied but could reveal sub-visual imaging characteristics of IVD degeneration. We used data from Northern Finland Birth Cohort 1966 members who underwent lumbar spine T2-weighted MRI scans at age 45-47 (n=1397). We used a deep learning model to segment the lumbar spine IVDs, extracted 737 radiomic features, and calculated the IVD height index and peak signal intensity difference. Intraclass correlation coefficients across image and mask perturbations were calculated to identify robust features. Sparse partial least squares discriminant analysis was used to train a Pfirrmann grade classification model. The radiomics model had a balanced accuracy of 76.7% (73.1-80.3%) and a Cohen's kappa of 0.70 (0.67-0.74), compared to 66.0% (62.0-69.9%) and 0.55 (0.51-0.59) for a model based on IVD height index and peak signal intensity. 2D sphericity and interquartile range emerged as radiomics-based features that were robust and highly correlated with Pfirrmann grade (Spearman's correlation coefficients of -0.72 and -0.77, respectively). Based on our findings, these radiomic signatures could serve as alternatives to the conventional indices, representing a significant advance in the automated quantitative phenotyping of IVD degeneration from standard-of-care MRI.
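Robustness filtering of the kind described (keeping features with high intraclass correlation across image and mask perturbations) can be sketched with a two-way random-effects ICC; the ICC variant, threshold, and data below are assumptions, not the study's exact protocol:

```python
import numpy as np

def icc2_1(Y):
    """Two-way random-effects ICC(2,1) for an (n_subjects, k_measurements)
    matrix, e.g. the same radiomic feature extracted under k perturbations."""
    n, k = Y.shape
    grand = Y.mean()
    ms_rows = k * np.sum((Y.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((Y.mean(axis=0) - grand) ** 2) / (k - 1)
    ss_err = np.sum((Y - Y.mean(axis=1, keepdims=True)
                     - Y.mean(axis=0, keepdims=True) + grand) ** 2)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical: one feature measured on 100 discs under 5 perturbations.
rng = np.random.default_rng(5)
subject_effect = rng.normal(size=(100, 1))
Y = subject_effect + rng.normal(scale=0.2, size=(100, 5))
print(f"ICC(2,1) = {icc2_1(Y):.3f}")  # keep features above a cutoff, e.g. 0.75
```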

Detection of breast cancer using fractional discrete sinc transform based on empirical Fourier decomposition.

Azmy MM

PubMed · Jun 20, 2025
Breast cancer is the most common cause of death among women worldwide, and its early detection is important for saving patients' lives. Ultrasound and mammography are the most common noninvasive methods for detecting breast cancer, and computer techniques are used to help physicians diagnose it. In most previous studies, the classification parameter rates were not high enough to achieve a correct diagnosis. In this study, new approaches were applied to detect breast cancer images from three databases. The programming software used to extract features from the images was MATLAB R2022a. Novel approaches were obtained using new fractional transforms, deduced from the fractional Fourier transform and novel discrete transforms; the novel discrete transforms were derived from the discrete sine and cosine transforms. The approaches proceed as follows. First, the fractional transforms were applied to the breast images. Then, the empirical Fourier decomposition (EFD) was obtained, and the mean, variance, kurtosis, and skewness were calculated. Finally, an RNN-BiLSTM (recurrent neural network with bidirectional long short-term memory) was used for the classification phase. The proposed approaches were compared to obtain the highest accuracy rate during the classification phase based on the different fractional transforms. The highest accuracy rate was obtained when the fractional discrete sinc transform of approach 4 was applied: the area under the receiver operating characteristic curve (AUC) was 1, and the accuracy, sensitivity, specificity, precision, G-mean, and F-measure rates were all 100%. With traditional machine learning methods, such as support vector machines (SVMs) and artificial neural networks (ANNs), the classification parameter rates were lower; the fourth approach, using RNN-BiLSTM, extracted the features of the breast images most effectively. This approach can be implemented on a computer to help physicians correctly classify breast images.
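The per-component statistics in the feature-extraction step are straightforward to compute; the sketch below assumes the fractional transform and EFD have already been applied (neither is reproduced here):

```python
import numpy as np
from scipy.stats import kurtosis, skew

def component_stats(components):
    """The four summary statistics the study computes per decomposition
    component: mean, variance, kurtosis, and skewness."""
    return np.array([
        [c.mean(), c.var(), kurtosis(c), skew(c)] for c in components
    ])

# Hypothetical stand-in for EFD components of a transformed breast image;
# the fractional discrete sinc transform itself is not reproduced.
rng = np.random.default_rng(6)
components = [rng.normal(size=4096) for _ in range(5)]
features = component_stats(components).ravel()  # feature vector for the classifier
print(features.shape)  # (20,)
```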

Generalizable model to predict new or progressing compression fractures in tumor-infiltrated thoracolumbar vertebrae in an all-comer population.

Flores A, Nitturi V, Kavoussi A, Feygin M, Andrade de Almeida RA, Ramirez Ferrer E, Anand A, Nouri S, Allam AK, Ricciardelli A, Reyes G, Reddy S, Rampalli I, Rhines L, Tatsui CE, North RY, Ghia A, Siewerdsen JH, Ropper AE, Alvarez-Breckenridge C

PubMed · Jun 20, 2025
Neurosurgical evaluation is required in the setting of spinal metastases at high risk of leading to a vertebral body fracture. Both irradiated and nonirradiated vertebrae are affected. Understanding fracture risk is critical in determining management, including follow-up timing and prophylactic interventions. Herein, the authors report the results of a machine learning model that predicts the development or progression of a pathological vertebral compression fracture (VCF) in metastatic tumor-infiltrated thoracolumbar vertebrae in an all-comer population. A multi-institutional all-comer cohort of patients with tumor-containing vertebral levels spanning T1 through L5 and at least 1 year of follow-up was included in the study. Clinical features of the patients, diseases, and treatments were collected. CT radiomic features of the vertebral bodies were extracted from tumor-infiltrated vertebrae that did or did not subsequently fracture or progress. Recursive feature elimination (RFE) of both radiomic and clinical features was performed. The resulting features were used to create a purely clinical model, a purely radiomic model, and a combined clinical-radiomic model. A Spine Instability Neoplastic Score (SINS) model was created for a baseline performance comparison. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity (with 95% confidence intervals) with tenfold cross-validation. Within 1 year from initial CT, 123 of 977 vertebrae developed VCF. Selected clinical features included SINS, the SINS component for < 50% vertebral body collapse, the SINS component for "none of the prior 3" (i.e., "none of the above" on the SINS component for vertebral body involvement), histology, age, and BMI. Of the 2015 radiomic features, RFE selected 19 to be used in the pure radiomic model and the combined clinical-radiomic model. The best performing model was a random forest classifier using both clinical and radiomic features, demonstrating an AUROC of 0.86 (95% CI 0.82-0.9), sensitivity of 0.78 (95% CI 0.70-0.84), and specificity of 0.80 (95% CI 0.77-0.82). This performance was significantly higher than that of the best SINS-alone model (AUROC 0.75, 95% CI 0.70-0.80) and outperformed the clinical-only model, although not in a statistically significant manner (AUROC 0.82, 95% CI 0.77-0.87). The authors developed a clinically generalizable machine learning model to predict the risk of a new or progressing VCF in an all-comer population. This model addresses limitations of prior work and was trained on the largest cohort of patients and vertebrae published to date. If validated, the model could lead to more consistent and systematic identification of high-risk vertebrae, resulting in faster, more accurate triage of patients for optimal management.
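The RFE-plus-random-forest pipeline with tenfold cross-validated AUROC might be sketched as follows; the data, estimator settings, and RFE step size are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# Hypothetical combined clinical + radiomic feature matrix; in the study,
# RFE reduced 2015 radiomic features to 19 before training.
rng = np.random.default_rng(7)
X = rng.normal(size=(977, 100))
y = (rng.random(977) < 0.126).astype(int)  # ~123/977 fractured vertebrae

rf = RandomForestClassifier(n_estimators=300, random_state=0)
selector = RFE(rf, n_features_to_select=19, step=0.1).fit(X, y)
X_sel = X[:, selector.support_]

# Tenfold cross-validated AUROC, mirroring the reported evaluation.
aucs = cross_val_score(rf, X_sel, y, cv=10, scoring="roc_auc")
print(f"AUROC: {aucs.mean():.2f} ± {aucs.std():.2f}")
```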