Page 14 of 91901 results

SAM-Med3D: A Vision Foundation Model for General-Purpose Segmentation on Volumetric Medical Images.

Wang H, Guo S, Ye J, Deng Z, Cheng J, Li T, Chen J, Su Y, Huang Z, Shen Y, Fu B, Zhang S, He J

PubMed · Jul 31, 2025
Existing volumetric medical image segmentation models are typically task-specific, excelling at specific targets but struggling to generalize across anatomical structures or modalities. This limitation restricts their broader clinical use. In this article, we introduce segment anything model (SAM)-Med3D, a vision foundation model (VFM) for general-purpose segmentation on volumetric medical images. Given only a few 3-D prompt points, SAM-Med3D can accurately segment diverse anatomical structures and lesions across various modalities. To achieve this, we gather and preprocess a large-scale 3-D medical image segmentation dataset, SA-Med3D-140K, from 70 public datasets and 8K licensed private cases from hospitals. This dataset includes 22K 3-D images and 143K corresponding masks. SAM-Med3D, a promptable segmentation model characterized by its fully learnable 3-D structure, is trained on this dataset using a two-stage procedure and exhibits impressive performance on both seen and unseen segmentation targets. We comprehensively evaluate SAM-Med3D on 16 datasets covering diverse medical scenarios, including different anatomical structures, modalities, targets, and zero-shot transferability to new/unseen tasks. The evaluation demonstrates the efficiency and efficacy of SAM-Med3D, as well as its promising application to diverse downstream tasks as a pretrained model. Our approach illustrates that substantial medical resources can be harnessed to develop a general-purpose medical AI for various potential applications. Our dataset, code, and models are available at: https://github.com/uni-medical/SAM-Med3D.

An interpretable CT-based machine learning model for predicting recurrence risk in stage II colorectal cancer.

Wu Z, Gong L, Luo J, Chen X, Yang F, Wen J, Hao Y, Wang Z, Gu R, Zhang Y, Liao H, Wen G

PubMed · Jul 31, 2025
This study aimed to develop an interpretable 3-year disease-free survival risk prediction tool to stratify patients with stage II colorectal cancer (CRC) by integrating CT images and clinicopathological factors. A total of 769 patients with pathologically confirmed stage II CRC and disease-free survival (DFS) follow-up information were recruited from three medical centers and divided into training (n = 442), test (n = 190), and validation cohorts (n = 137). CT-based tumor radiomics features were extracted, selected, and used to calculate a Radscore. A combined model was developed using an artificial neural network (ANN) algorithm, integrating the Radscore with significant clinicoradiological factors to classify patients into high- and low-risk groups. Model performance was assessed using the area under the curve (AUC), and feature contributions were quantified using the Shapley additive explanations (SHAP) algorithm. Kaplan-Meier survival analysis revealed the prognostic stratification value of the risk groups. Fourteen radiomics features and five clinicoradiological factors were selected to construct the radiomics and clinicoradiological models, respectively. The combined model demonstrated optimal performance, with AUCs of 0.811 and 0.846 in the test and validation cohorts, respectively. Kaplan-Meier curves confirmed effective patient stratification (p < 0.001) in both test and validation cohorts. A high Radscore, rough intestinal outer edge, and advanced age were identified as key prognostic risk factors using SHAP. The combined model effectively stratified patients with stage II CRC into different prognostic risk groups, aiding clinical decision-making. Integrating CT images with clinicopathological information can facilitate the identification of patients with stage II CRC who are most likely to benefit from adjuvant chemotherapy. The effectiveness of adjuvant chemotherapy for stage II colorectal cancer remains debated.
A combined model successfully identified high-risk stage II colorectal cancer patients. Shapley additive explanations enhance the interpretability of the model's predictions.
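The Radscore construction and risk dichotomization described in this abstract can be sketched in a few lines. The weights, intercept, threshold, and patient values below are illustrative assumptions, not figures from the study (which selected 14 features via its own pipeline):

```python
def radscore(features, weights, intercept=0.0):
    """Linear combination of selected radiomics features, a common
    Radscore construction; real weights come from a feature-selection
    step (e.g. LASSO), which this sketch does not perform."""
    return intercept + sum(w * f for w, f in zip(weights, features))

def risk_group(score, threshold=0.0):
    """Dichotomize patients into high-/low-risk groups by a Radscore cutoff."""
    return "high" if score > threshold else "low"

# Hypothetical patient with three selected features (toy values)
features = [1.2, -0.4, 0.9]
weights = [0.8, 0.5, -0.3]
s = radscore(features, weights, intercept=-0.2)
print(round(s, 3), risk_group(s))  # 0.29 high
```

In the study the risk groups feed a Kaplan-Meier analysis; here the cutoff of 0 simply splits scores by sign.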

Prognostication in patients with idiopathic pulmonary fibrosis using quantitative airway analysis from HRCT: a retrospective study.

Nan Y, Federico FN, Humphries S, Mackintosh JA, Grainge C, Jo HE, Goh N, Reynolds PN, Hopkins PMA, Navaratnam V, Moodley Y, Walters H, Ellis S, Keir G, Zappala C, Corte T, Glaspole I, Wells AU, Yang G, Walsh SL

PubMed · Jul 31, 2025
Predicting shorter life expectancy is crucial for prioritizing antifibrotic therapy in fibrotic lung diseases, where progression varies widely, from stability to rapid deterioration. This heterogeneity complicates treatment decisions, emphasizing the need for reliable baseline measures. This study leverages an artificial intelligence model to address heterogeneity in disease outcomes, with mortality as the ultimate measure of disease trajectory. This retrospective study included 1744 anonymised patients who underwent high-resolution CT scanning. The AI model, SABRE (Smart Airway Biomarker Recognition Engine), was developed using data from patients with various lung diseases (n=460, including lung cancer, pneumonia, emphysema, and fibrosis). Then, 1284 high-resolution CT scans with evidence of diffuse fibrotic lung disease (FLD) from the Australian IPF Registry and OSIC were used for clinical analyses. Airway branches were categorized and quantified by anatomic structure and volume, followed by multivariable analysis to explore the associations between these categories and patients' progression and mortality, adjusting for disease severity or traditional measurements. Cox regression identified SABRE-based variables as independent predictors of mortality and progression, even after adjusting for disease severity (fibrosis extent, traction bronchiectasis extent, and ILD extent), traditional measures (FVC%, DLCO%, and CPI), and previously reported deep learning algorithms for fibrosis quantification and morphological analysis. Combining SABRE with DLCO significantly improved prognostic utility, yielding an AUC of 0.852 at the first year and a C-index of 0.752. SABRE-based variables capture prognostic signals beyond those provided by traditional measurements, disease severity scores, and established AI-based methods, reflecting the progressiveness and pathogenesis of the disease.
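The C-index reported here (0.752) is Harrell's concordance index. A dependency-free sketch of how it is computed from predicted risks and censored survival data (toy numbers, not study data):

```python
def c_index(times, events, risks):
    """Harrell's concordance index: the fraction of comparable pairs in
    which the patient with the higher predicted risk has the shorter
    survival. A pair is comparable when the earlier of the two times
    corresponds to an observed event (event = 1), not a censoring."""
    concordant = tied = comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:  # patient i died first
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable

# Toy cohort: times in months; event = 1 means death observed, 0 censored
times  = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks  = [0.9, 0.7, 0.4, 0.2]
print(c_index(times, events, risks))  # 1.0: risk ordering matches survival ordering
```

A C-index of 0.5 corresponds to random ranking, 1.0 to perfect discrimination; production survival analyses typically use a library implementation rather than this O(n²) loop.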

Identification and validation of an explainable machine learning model for vascular depression diagnosis in the older adults: a multicenter cohort study.

Zhang R, Li T, Fan F, He H, Lan L, Sun D, Xu Z, Peng S, Cao J, Xu J, Peng X, Lei M, Song H, Zhang J

PubMed · Jul 31, 2025
Vascular depression (VaDep) is a prevalent affective disorder in older adults that significantly impacts functional status and quality of life. Early identification and intervention are crucial but largely insufficient in clinical practice, mostly due to inconspicuous depressive symptoms, heterogeneous imaging manifestations, and the lack of definitive peripheral biomarkers. This study aimed to develop and validate an interpretable machine learning (ML) model for VaDep to serve as a clinical support tool. This study included 602 participants from Wuhan, China, divided into 236 VaDep patients and 366 controls for training and internal validation from July 2020 to October 2023. An independent dataset of 171 participants from surrounding areas was used for external validation. We collected clinical data, neuropsychological assessments, blood test results, and MRI scans to develop and refine ML models through cross-validation. Feature reduction was implemented to simplify the models without compromising their performance, with validation achieved through internal and external datasets. The SHapley Additive exPlanations (SHAP) method was used to enhance model interpretability. The Light Gradient Boosting Machine (LGBM) model performed best among the 6 selected ML algorithms based on performance metrics. An optimized, interpretable LGBM model with 8 key features, including white matter hyperintensities score, age, vascular endothelial growth factor, interleukin-6, brain-derived neurotrophic factor, tumor necrosis factor-alpha levels, lacune counts, and serotonin level, demonstrated high diagnostic accuracy in both internal (AUROC = 0.937) and external (AUROC = 0.896) validations. The final model also matched, and marginally exceeded, clinician-level diagnostic performance. Our research established a consistent and explainable ML framework for identifying VaDep in older adults, utilizing comprehensive clinical data.
The 8 characteristics identified in the final LGBM model provide new insights for further exploration of VaDep mechanisms and emphasize the need for enhanced focus on early identification and intervention in this vulnerable group. More attention needs to be paid to the affective health of older adults.
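Shapley values, which underpin the SHAP explanations used in this study, can be computed exactly by enumerating feature coalitions when the feature count is small (8 here). A minimal sketch with a hypothetical two-feature linear risk model; in practice one would use a SHAP library's tree explainer for an LGBM model rather than brute-force enumeration:

```python
from itertools import combinations
from math import factorial

def shapley_values(model, baseline, x):
    """Exact Shapley values by enumerating feature coalitions; features
    absent from a coalition are held at their baseline value. Feasible
    only for a handful of features (cost grows as 2^n)."""
    n = len(x)
    phi = [0.0] * n
    idx = list(range(n))
    for i in idx:
        others = [j for j in idx if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                with_i = [x[j] if j in S or j == i else baseline[j] for j in idx]
                without = [x[j] if j in S else baseline[j] for j in idx]
                phi[i] += w * (model(with_i) - model(without))
    return phi

# Toy risk model over (age, WMH score); for a linear model the Shapley
# value reduces to weight * (feature - baseline)
model = lambda v: 0.02 * v[0] + 0.5 * v[1]
phi = shapley_values(model, baseline=[60, 1.0], x=[75, 3.0])
print([round(p, 3) for p in phi])  # [0.3, 1.0]
```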

Out-of-Distribution Detection in Medical Imaging via Diffusion Trajectories

Lemar Abdi, Francisco Caetano, Amaan Valiuddin, Christiaan Viviers, Hamdi Joudeh, Fons van der Sommen

arXiv preprint · Jul 31, 2025
In medical imaging, unsupervised out-of-distribution (OOD) detection offers an attractive approach for identifying pathological cases with extremely low incidence rates. In contrast to supervised methods, OOD-based approaches function without labels and are inherently robust to data imbalances. Current generative approaches often rely on likelihood estimation or reconstruction error, but these methods can be computationally expensive, unreliable, and require retraining if the inlier data changes. These limitations hinder their ability to distinguish nominal from anomalous inputs efficiently, consistently, and robustly. We propose a reconstruction-free OOD detection method that leverages the forward diffusion trajectories of a Stein score-based denoising diffusion model (SBDDM). By capturing trajectory curvature via the estimated Stein score, our approach enables accurate anomaly scoring with only five diffusion steps. A single SBDDM pre-trained on a large, semantically aligned medical dataset generalizes effectively across multiple Near-OOD and Far-OOD benchmarks, achieving state-of-the-art performance while drastically reducing computational cost during inference. Compared to existing methods, SBDDM achieves relative improvements of up to 10.43% and 18.10% for Near-OOD and Far-OOD detection, respectively, making it a practical building block for real-time, reliable computer-aided diagnosis.
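The idea of scoring anomalies from score magnitudes along a short forward trajectory can be illustrated with a 1-D toy: here the learned SBDDM is replaced by the analytic score of a standard-normal inlier model, and the five noise levels are arbitrary choices, so this is a conceptual sketch rather than the paper's method:

```python
def stein_score(x, sigma):
    """Score of a standard-normal inlier density blurred to noise level
    sigma: s(x) = -x / (1 + sigma^2). Stands in for the learned SBDDM."""
    return -x / (1.0 + sigma ** 2)

def anomaly_score(x0, sigmas):
    """Accumulate score magnitudes along a short forward trajectory
    (five steps, as in the paper); inputs far from the data manifold
    keep large scores across noise levels."""
    total = 0.0
    x = x0
    for s in sigmas:
        g = stein_score(x, s)
        total += abs(g)
        x = x + s * g  # deterministic drift toward the mode (no sampled noise)
    return total

sigmas = [0.1, 0.2, 0.4, 0.8, 1.6]  # illustrative noise schedule
print(anomaly_score(0.1, sigmas) < anomaly_score(5.0, sigmas))  # True
```

The inlier (0.1) accumulates a much smaller score than the outlier (5.0), which is the separation the anomaly score exploits.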

Consistent Point Matching

Halid Ziya Yerebakan, Gerardo Hermosillo Valadez

arXiv preprint · Jul 31, 2025
This study demonstrates that incorporating a consistency heuristic into the point-matching algorithm of Yerebakan et al. (2023) improves robustness in matching anatomical locations across pairs of medical images. We validated our approach on diverse longitudinal internal and public datasets spanning CT and MRI modalities. Notably, it surpasses state-of-the-art results on the Deep Lesion Tracking dataset. Additionally, we show that the method effectively addresses landmark localization. The algorithm operates efficiently on standard CPU hardware and allows configurable trade-offs between speed and robustness. The method enables high-precision navigation between medical images without requiring a machine learning model or training data.
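One plausible reading of a consistency heuristic is a forward-backward check: match a point from image A into image B, match back, and accept only if the round trip returns near the start. The nearest-point matcher below is a stand-in for the hierarchical matching of the underlying algorithm, so treat this as an illustration of the heuristic, not the paper's implementation:

```python
def nearest(point, candidates):
    """Index of the nearest candidate point (a stand-in for the
    hierarchical descriptor matcher used by the actual algorithm)."""
    return min(range(len(candidates)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, candidates[i])))

def consistent_match(p, pts_a, pts_b, tol=1.0):
    """Forward-backward consistency: match p from image A into image B,
    match that result back into A, and accept only if the round trip
    lands within tol of the starting point."""
    j = nearest(p, pts_b)                    # forward match A -> B
    back = pts_a[nearest(pts_b[j], pts_a)]   # backward match B -> A
    err = sum((a - b) ** 2 for a, b in zip(p, back)) ** 0.5
    return (pts_b[j], err) if err <= tol else (None, err)

# Toy 2-D landmarks in two images
pts_a = [(0, 0), (10, 10)]
pts_b = [(0.5, 0.2), (9.8, 10.1)]
match, err = consistent_match((0, 0), pts_a, pts_b)
print(match)  # (0.5, 0.2)
```

Rejecting inconsistent round trips filters out ambiguous matches, which is how such a heuristic buys robustness at modest extra cost.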

Effectiveness of Radiomics-Based Machine Learning Models in Differentiating Pancreatitis and Pancreatic Ductal Adenocarcinoma: Systematic Review and Meta-Analysis.

Zhang L, Li D, Su T, Xiao T, Zhao S

PubMed · Jul 31, 2025
Pancreatic ductal adenocarcinoma (PDAC) and mass-forming pancreatitis (MFP) share similar clinical, laboratory, and imaging features, making accurate diagnosis challenging. Nevertheless, PDAC is highly malignant with a poor prognosis, whereas MFP is an inflammatory condition typically responding well to medical or interventional therapies. Some investigators have explored radiomics-based machine learning (ML) models for distinguishing PDAC from MFP. However, systematic evidence supporting the feasibility of these models is insufficient, presenting a notable challenge for clinical application. This study aimed to review the diagnostic performance of radiomics-based ML models in differentiating PDAC from MFP, summarize the methodological quality of the included studies, and provide evidence-based guidance for optimizing radiomics-based ML models and advancing their clinical use. PubMed, Embase, Cochrane, and Web of Science were searched for relevant studies up to June 29, 2024. Eligible studies comprised English cohort, case-control, or cross-sectional designs that applied fully developed radiomics-based ML models, including traditional and deep radiomics, to differentiate PDAC from MFP, while also reporting their diagnostic performance. Studies without full text, limited to image segmentation, or with insufficient outcome metrics were excluded. Methodological quality was appraised by means of the radiomics quality score. Given the limited applicability of QUADAS-2 to radiomics-based ML studies, the risk of bias was not formally assessed. Pooled sensitivity, specificity, area under the curve of summary receiver operating characteristics (SROC), likelihood ratios, and diagnostic odds ratio were estimated through a bivariate mixed-effects model. Results were presented with forest plots, SROC curves, and Fagan's nomogram.
Subgroup analysis was performed to appraise the diagnostic performance of radiomics-based ML models across various imaging modalities, including computed tomography (CT), magnetic resonance imaging, positron emission tomography-CT, and endoscopic ultrasound. This meta-analysis included 24 studies with 14,406 cases, including 7635 PDAC cases. All studies adopted a case-control design, with 5 conducted across multiple centers. Most studies used CT as the primary imaging modality. Radiomics quality scores ranged from 5 points (14%) to 17 points (47%), with an average score of 9 (25%). The radiomics-based ML models demonstrated high diagnostic performance. Based on the independent validation sets, the pooled sensitivity, specificity, area under the curve of SROC, positive likelihood ratio, negative likelihood ratio, and diagnostic odds ratio were 0.92 (95% CI 0.91-0.94), 0.90 (95% CI 0.85-0.94), 0.94 (95% CI 0.74-0.99), 9.3 (95% CI 6.0-14.2), 0.08 (95% CI 0.07-0.11), and 110 (95% CI 62-194), respectively. Radiomics-based ML models demonstrate high diagnostic accuracy in differentiating PDAC from MFP, underscoring their potential as noninvasive tools for clinical decision-making. Nonetheless, the overall methodological quality was moderate due to limitations in external validation, standardized protocols, and reproducibility. These findings support the promise of radiomics in clinical diagnostics while highlighting the need for more rigorous, multicenter research to enhance model generalizability and clinical applicability.
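The likelihood ratios and diagnostic odds ratio reported above are simple functions of sensitivity and specificity. A quick sketch of those definitions; note the abstract's pooled values come from a bivariate mixed-effects model, so the naive point calculation from the pooled sensitivity and specificity differs slightly (LR+ of 9.2 here vs the pooled 9.3):

```python
def likelihood_ratios(sens, spec):
    """Positive and negative likelihood ratios from sensitivity and
    specificity: LR+ = sens / (1 - spec), LR- = (1 - sens) / spec."""
    return sens / (1 - spec), (1 - sens) / spec

def diagnostic_odds_ratio(sens, spec):
    """Diagnostic odds ratio: DOR = LR+ / LR-."""
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    return lr_pos / lr_neg

# Pooled sensitivity 0.92 and specificity 0.90 from the meta-analysis
lr_pos, lr_neg = likelihood_ratios(0.92, 0.90)
print(round(lr_pos, 1), round(lr_neg, 2), round(diagnostic_odds_ratio(0.92, 0.90), 1))
# 9.2 0.09 103.5
```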

External Validation of a Winning Artificial Intelligence Algorithm from the RSNA 2022 Cervical Spine Fracture Detection Challenge.

Harper JP, Lee GR, Pan I, Nguyen XV, Quails N, Prevedello LM

PubMed · Jul 31, 2025
The Radiological Society of North America has actively promoted artificial intelligence (AI) challenges since 2017. Algorithms emerging from the recent RSNA 2022 Cervical Spine Fracture Detection Challenge demonstrated state-of-the-art performance on the competition's data set, surpassing results from prior publications. However, their performance in real-world clinical practice is not known. As an initial step toward the goal of assessing feasibility of these models in clinical practice, we conducted a generalizability test using one of the leading algorithms of the competition. The deep learning algorithm was selected due to its performance, portability, and ease of use, and installed locally. One hundred examinations (50 consecutive cervical spine CT scans with at least 1 fracture present and 50 consecutive negative CT scans) from a level 1 trauma center not represented in the competition data set were processed at 6.4 seconds per examination. Ground truth was established based on the radiology report with retrospective confirmation of positive fracture cases. Sensitivity, specificity, F1 score, and area under the curve were calculated. The external validation data set comprised older patients than the competition set (58 ± 22.0 years versus 53.5 ± 21.8 years, respectively; P < .05). Sensitivity and specificity were 86% and 70% in the external validation group and 85% and 94% in the competition group, respectively. Fractures misclassified by the convolutional neural networks frequently had features of advanced degenerative disease, subtle nondisplaced fractures not easily identified on the axial plane, and malalignment. The model performed with a similar sensitivity on the test and external data set, suggesting that such a tool could be potentially generalizable as a triage tool in the emergency setting. Discordant factors such as age-associated comorbidities may affect accuracy and specificity of AI models when used in certain populations.
Further research should be encouraged to help elucidate the potential contributions and pitfalls of these algorithms in supporting clinical care.
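The external-validation metrics reported above follow directly from a binary confusion matrix. A sketch using counts consistent with the reported figures (50 fracture cases and 50 negatives at 86% sensitivity and 70% specificity implies TP=43, FN=7, TN=35, FP=15; the F1 value is derived here, not quoted from the paper):

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and F1 score from a binary confusion matrix."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sens / (precision + sens)
    return sens, spec, f1

# Counts consistent with the reported external validation cohort
sens, spec, f1 = diagnostic_metrics(tp=43, fn=7, tn=35, fp=15)
print(sens, spec, round(f1, 3))  # 0.86 0.7 0.796
```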

CX-Mind: A Pioneering Multimodal Large Language Model for Interleaved Reasoning in Chest X-ray via Curriculum-Guided Reinforcement Learning

Wenjie Li, Yujie Zhang, Haoran Sun, Yueqi Li, Fanrui Zhang, Mengzhe Xu, Victoria Borja Clausich, Sade Mellin, Renhao Yang, Chenrun Wang, Jethro Zih-Shuo Wang, Shiyi Yao, Gen Li, Yidong Xu, Hanyu Wang, Yilin Huang, Angela Lin Wang, Chen Shi, Yin Zhang, Jianan Guo, Luqi Yang, Renxuan Li, Yang Xu, Jiawei Liu, Yao Zhang, Lei Liu, Carlos Gutiérrez SanRomán, Lei Wang

arXiv preprint · Jul 31, 2025
Chest X-ray (CXR) imaging is one of the most widely used diagnostic modalities in clinical practice, encompassing a broad spectrum of diagnostic tasks. Recent advancements have seen the extensive application of reasoning-based multimodal large language models (MLLMs) in medical imaging to enhance diagnostic efficiency and interpretability. However, existing multimodal models predominantly rely on "one-time" diagnostic approaches, lacking verifiable supervision of the reasoning process. This leads to challenges in multi-task CXR diagnosis, including lengthy reasoning, sparse rewards, and frequent hallucinations. To address these issues, we propose CX-Mind, the first generative model to achieve interleaved "think-answer" reasoning for CXR tasks, driven by curriculum-based reinforcement learning and verifiable process rewards (CuRL-VPR). Specifically, we constructed an instruction-tuning dataset, CX-Set, comprising 708,473 images and 2,619,148 samples, and generated 42,828 high-quality interleaved reasoning data points supervised by clinical reports. Optimization was conducted in two stages under the Group Relative Policy Optimization framework: initially stabilizing basic reasoning with closed-domain tasks, followed by transfer to open-domain diagnostics, incorporating rule-based conditional process rewards to bypass the need for pretrained reward models. Extensive experimental results demonstrate that CX-Mind significantly outperforms existing medical and general-domain MLLMs in visual understanding, text generation, and spatiotemporal alignment, achieving an average performance improvement of 25.1% over comparable CXR-specific models. On a real-world clinical dataset (Rui-CXR), CX-Mind achieves a mean recall@1 across 14 diseases that substantially surpasses the second-best results, with multi-center expert evaluations further confirming its clinical utility across multiple dimensions.

Adaptively Distilled ControlNet: Accelerated Training and Superior Sampling for Medical Image Synthesis

Kunpeng Qiu, Zhiying Zhou, Yongxin Guo

arXiv preprint · Jul 31, 2025
Medical image annotation is constrained by privacy concerns and labor-intensive labeling, significantly limiting the performance and generalization of segmentation models. While mask-controllable diffusion models excel in synthesis, they struggle with precise lesion-mask alignment. We propose Adaptively Distilled ControlNet, a task-agnostic framework that accelerates training and optimization through dual-model distillation. Specifically, during training, a teacher model, conditioned on mask-image pairs, regularizes a mask-only student model via predicted noise alignment in parameter space, further enhanced by adaptive regularization based on lesion-background ratios. During sampling, only the student model is used, enabling privacy-preserving medical image generation. Comprehensive evaluations on two distinct medical datasets demonstrate state-of-the-art performance: TransUNet improves mDice/mIoU by 2.4%/4.2% on KiTS19, while SANet achieves 2.6%/3.5% gains on Polyps, highlighting its effectiveness and superiority. Code is available at GitHub.