
Data-Agnostic Augmentations for Unknown Variations: Out-of-Distribution Generalisation in MRI Segmentation

Puru Vaish, Felix Meister, Tobias Heimann, Christoph Brune, Jelmer M. Wolterink

arXiv preprint · May 15, 2025
Medical image segmentation models are often trained on curated datasets, leading to performance degradation when deployed in real-world clinical settings due to mismatches between training and test distributions. While data augmentation techniques are widely used to address these challenges, traditional visually consistent augmentation strategies lack the robustness needed for diverse real-world scenarios. In this work, we systematically evaluate alternative augmentation strategies, focusing on MixUp and Auxiliary Fourier Augmentation. These methods mitigate the effects of multiple variations without explicitly targeting specific sources of distribution shifts. We demonstrate how these techniques significantly improve out-of-distribution generalization and robustness to imaging variations across a wide range of transformations in cardiac cine MRI and prostate MRI segmentation. We quantitatively find that these augmentation methods enhance learned feature representations by promoting separability and compactness. Additionally, we highlight how their integration into nnU-Net training pipelines provides an easy-to-implement, effective solution for enhancing the reliability of medical segmentation models in real-world applications.
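
For readers who want the mechanics: MixUp blends pairs of training samples and their labels with a Beta-distributed weight. A minimal PyTorch sketch for segmentation batches follows; the tensor shapes and alpha value are illustrative assumptions, not the paper's actual nnU-Net integration.

```python
import torch

def mixup_batch(images, onehot_masks, alpha=0.2):
    """Blend a batch with a shuffled copy of itself (Zhang et al., 2018).

    images:       (B, C, H, W) float images
    onehot_masks: (B, K, H, W) one-hot segmentation targets
    Returns mixed images and soft (no longer one-hot) targets.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))  # random pairing within the batch
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_masks = lam * onehot_masks + (1.0 - lam) * onehot_masks[perm]
    return mixed_images, mixed_masks
```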

Ordered-subsets Multi-diffusion Model for Sparse-view CT Reconstruction

Pengfei Yu, Bin Huang, Minghui Zhang, Weiwen Wu, Shaoyu Wang, Qiegen Liu

arXiv preprint · May 15, 2025
Score-based diffusion models have shown significant promise in the field of sparse-view CT reconstruction. However, the projection dataset is large and riddled with redundancy. Consequently, applying the diffusion model to unprocessed data results in lower learning effectiveness and higher learning difficulty, frequently leading to reconstructed images that lack fine details. To address these issues, we propose the ordered-subsets multi-diffusion model (OSMM) for sparse-view CT reconstruction. The OSMM innovatively divides the CT projection data into equal subsets and employs a multi-subset diffusion model (MSDM) to learn from each subset independently. This targeted learning approach reduces complexity and enhances the reconstruction of fine details. Furthermore, integrating a one-whole diffusion model (OWDM) trained on complete sinogram data acts as a global information constraint, reducing the possibility of generating erroneous or inconsistent sinogram information. Moreover, the OSMM's unsupervised learning framework provides strong robustness and generalizability, adapting seamlessly to varying sparsity levels of CT sinograms. This ensures consistent and reliable performance across different clinical scenarios. Experimental results demonstrate that OSMM outperforms traditional diffusion models in terms of image quality and noise resilience, offering a powerful and versatile solution for advanced CT imaging in sparse-view scenarios.
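
The abstract does not spell out how the equal subsets are ordered; a plausible NumPy sketch, borrowing the angle-interleaved split from classic ordered-subsets (OS-EM) reconstruction, looks like this:

```python
import numpy as np

def ordered_subsets(sinogram, n_subsets):
    """Split a sinogram of shape (n_angles, n_detectors) into angle-interleaved
    subsets. Each subset keeps roughly uniform angular coverage, so every
    sub-model sees less redundant but still globally representative data.
    """
    return [sinogram[s::n_subsets] for s in range(n_subsets)]

# Example: 360 projection angles split into 4 subsets of 90 angles each.
subsets = ordered_subsets(np.zeros((360, 512)), n_subsets=4)
assert all(s.shape == (90, 512) for s in subsets)
```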

A fully automatic radiomics pipeline for postoperative facial nerve function prediction of vestibular schwannoma.

Song G, Li K, Wang Z, Liu W, Xue Q, Liang J, Zhou Y, Geng H, Liu D

PubMed · May 14, 2025
Vestibular schwannoma (VS) is the most prevalent intracranial schwannoma. Surgery is one of the options for the treatment of VS, with the preservation of facial nerve (FN) function being the primary objective. Therefore, postoperative FN function prediction is essential. However, achieving automation for such a method remains a challenge. In this study, we proposed a fully automatic deep learning approach based on multi-sequence magnetic resonance imaging (MRI) to predict FN function after surgery in VS patients. We first developed a segmentation network, 2.5D Trans-UNet, which combined Transformer and U-Net to optimize contour segmentation for radiomic feature extraction. Next, we built a deep learning network based on the integration of a 1D Convolutional Neural Network (1DCNN) and a Gated Recurrent Unit (GRU) to predict postoperative FN function using the extracted features. We trained and tested the 2.5D Trans-UNet segmentation network on public and private datasets, achieving accuracies of 89.51% and 90.66%, respectively, confirming the model's strong performance. Feature extraction and selection were then performed on the private dataset's segmentation results from 2.5D Trans-UNet. The selected features were used to train the 1DCNN-GRU network for classification. The results showed that our proposed fully automatic radiomics pipeline outperformed the traditional radiomics pipeline on the test set, achieving an accuracy of 88.64%, demonstrating its effectiveness in predicting postoperative FN function in VS patients. Our proposed automatic method has the potential to become a valuable decision-making tool in neurosurgery, assisting neurosurgeons in making more informed decisions regarding surgical interventions and improving the treatment of VS patients.
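
The abstract gives the 1DCNN-GRU design only at a high level; a minimal PyTorch sketch of such a classifier over a radiomic feature vector might look as follows (layer sizes and the treatment of the features as a one-channel 1-D signal are assumptions):

```python
import torch
import torch.nn as nn

class CNN_GRU(nn.Module):
    """Illustrative 1DCNN + GRU classifier over a radiomic feature vector."""

    def __init__(self, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.gru = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
        self.fc = nn.Linear(32, n_classes)

    def forward(self, x):                   # x: (B, n_features)
        x = self.conv(x.unsqueeze(1))       # (B, 16, n_features // 2)
        _, h = self.gru(x.transpose(1, 2))  # final hidden state: (1, B, 32)
        return self.fc(h.squeeze(0))        # (B, n_classes) logits

logits = CNN_GRU()(torch.randn(8, 120))  # e.g. 8 patients, 120 selected features
```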

Deep learning for cerebral vascular occlusion segmentation: A novel ConvNeXtV2 and GRN-integrated U-Net framework for diffusion-weighted imaging.

Ince S, Kunduracioglu I, Algarni A, Bayram B, Pacal I

PubMed · May 14, 2025
Cerebral vascular occlusion is a serious condition that can lead to stroke and permanent neurological damage due to insufficient oxygen and nutrients reaching brain tissue. Early diagnosis and accurate segmentation are critical for effective treatment planning. Due to its high soft tissue contrast, Magnetic Resonance Imaging (MRI) is commonly used for detecting such occlusions, as in ischemic stroke. However, challenges such as low contrast, noise, and heterogeneous lesion structures in MRI images complicate manual segmentation and often lead to misinterpretations. As a result, deep learning-based Computer-Aided Diagnosis (CAD) systems are essential for faster and more accurate diagnosis and treatment methods, although they can sometimes face challenges such as high computational costs and difficulties in segmenting small or irregular lesions. This study proposes a novel U-Net architecture enhanced with ConvNeXtV2 blocks and GRN-based Multi-Layer Perceptrons (MLP) to address these challenges in cerebral vascular occlusion segmentation. This is the first application of ConvNeXtV2 in this domain. The proposed model significantly improves segmentation accuracy, even in low-contrast regions, while maintaining high computational efficiency, which is crucial for real-world clinical applications. To reduce false positives and improve overall accuracy, small lesions (≤5 pixels) were removed in the preprocessing step with the support of expert clinicians. Experimental results on the ISLES 2022 dataset showed superior performance with an Intersection over Union (IoU) of 0.8015 and a Dice coefficient of 0.8894. Comparative analyses indicate that the proposed model achieves higher segmentation accuracy than existing U-Net variants and other methods, offering a promising solution for clinical use.
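
ConvNeXtV2's defining ingredient is Global Response Normalization (GRN) inside each block's MLP. Below is a PyTorch sketch of one ConvNeXtV2 block following the published architecture; how these blocks are wired into the U-Net encoder and decoder is specific to the paper and not reproduced here.

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global Response Normalization (ConvNeXtV2); channels-last input."""
    def __init__(self, dim):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))

    def forward(self, x):                                  # (B, H, W, C)
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)  # per-channel energy
        nx = gx / (gx.mean(dim=-1, keepdim=True) + 1e-6)   # divisive competition
        return self.gamma * (x * nx) + self.beta + x

class ConvNeXtV2Block(nn.Module):
    """Depthwise 7x7 conv -> LayerNorm -> pointwise MLP with GRN -> residual."""
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.grn = GRN(4 * dim)
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x):                                  # (B, C, H, W)
        y = self.dwconv(x).permute(0, 2, 3, 1)             # to channels-last
        y = self.pwconv2(self.grn(self.act(self.pwconv1(self.norm(y)))))
        return x + y.permute(0, 3, 1, 2)                   # back + residual
```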

AI-based metal artefact correction algorithm for radiotherapy patients with dental hardware in head and neck CT: Towards precise imaging.

Yu X, Zhong S, Zhang G, Du J, Wang G, Hu J

PubMed · May 14, 2025
To investigate the clinical efficacy of an AI-based metal artefact correction algorithm (AI-MAC) for reducing dental metal artefacts in head and neck CT, compared to conventional interpolation-based MAC. We retrospectively collected 41 patients with non-removable dental hardware who underwent non-contrast head and neck CT prior to radiotherapy. All images were reconstructed with the standard reconstruction algorithm (SRA), and were additionally processed with both conventional MAC and AI-MAC. The image quality of SRA, MAC and AI-MAC was compared by qualitative scoring on a 5-point scale, with scores ≥ 3 considered interpretable. This was followed by a quantitative evaluation, including signal-to-noise ratio (SNR) and artefact index (Idx_artefact). Organ contouring accuracy was quantified by calculating the dice similarity coefficient (DSC) and Hausdorff distance (HD) for oral cavity and teeth, using the clinically accepted contouring as reference. Moreover, the treatment planning dose distribution for oral cavity was assessed. AI-MAC yielded superior qualitative image quality and quantitative metrics, including SNR and Idx_artefact, compared to SRA and MAC. The image interpretability significantly improved from 41.46% for SRA and 56.10% for MAC to 92.68% for AI-MAC (p < 0.05). Compared to SRA and MAC, the best DSC and HD for both oral cavity and teeth were obtained on AI-MAC (all p < 0.05). No significant differences for dose distribution were found among the three image sets. AI-MAC outperforms conventional MAC in metal artefact reduction, achieving superior image quality with high image interpretability for patients with dental hardware undergoing head and neck CT. Furthermore, the use of AI-MAC improves the accuracy of organ contouring while providing consistent dose calculation against metal artefacts in radiotherapy. AI-MAC is a novel deep learning-based technique for reducing metal artefacts on CT. This in-vivo study first demonstrated its capability of reducing metal artefacts while preserving organ visualization, as compared with conventional MAC.
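
As a reference for the two contouring metrics, here is a compact NumPy/SciPy sketch of the Dice similarity coefficient and the symmetric Hausdorff distance over boolean masks (distances in voxel units; scale by voxel spacing for mm):

```python
import numpy as np
from scipy.ndimage import binary_erosion
from scipy.spatial.distance import directed_hausdorff

def dice(a, b):
    """Dice similarity coefficient of two boolean masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff(a, b):
    """Symmetric Hausdorff distance between the surfaces of two boolean masks."""
    surface = lambda m: np.argwhere(m & ~binary_erosion(m)).astype(float)
    sa, sb = surface(a), surface(b)
    return max(directed_hausdorff(sa, sb)[0], directed_hausdorff(sb, sa)[0])
```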

Comparative performance of large language models in structuring head CT radiology reports: multi-institutional validation study in Japan.

Takita H, Walston SL, Mitsuyama Y, Watanabe K, Ishimaru S, Ueda D

PubMed · May 14, 2025
To compare the diagnostic performance of three proprietary large language models (LLMs)-Claude, GPT, and Gemini-in structuring free-text Japanese radiology reports for intracranial hemorrhage and skull fractures, and to assess the impact of three different prompting approaches on model accuracy. In this retrospective study, head CT reports from the Japan Medical Imaging Database between 2018 and 2023 were collected. Two board-certified radiologists established the ground truth regarding intracranial hemorrhage and skull fractures through independent review and consensus. Each radiology report was analyzed by three LLMs using three prompting strategies-Standard, Chain of Thought, and Self Consistency prompting. Diagnostic performance (accuracy, precision, recall, and F1-score) was calculated for each LLM-prompt combination and compared using McNemar's tests with Bonferroni correction. Misclassified cases underwent qualitative error analysis. A total of 3949 head CT reports from 3949 patients (mean age 59 ± 25 years, 56.2% male) were enrolled. Across all institutions, 856 patients (21.6%) had intracranial hemorrhage and 264 patients (6.6%) had skull fractures. All nine LLM-prompt combinations achieved very high accuracy. Claude demonstrated significantly higher accuracy for intracranial hemorrhage than GPT and Gemini, and also outperformed Gemini for skull fractures (p < 0.0001). Gemini's performance improved notably with Chain of Thought prompting. Error analysis revealed common challenges including ambiguous phrases and findings unrelated to intracranial hemorrhage or skull fractures, underscoring the importance of careful prompt design. All three proprietary LLMs exhibited strong performance in structuring free-text head CT reports for intracranial hemorrhage and skull fractures. While the choice of prompting method influenced accuracy, all models demonstrated robust potential for clinical and research applications. Future work should refine the prompts and validate these approaches in prospective, multilingual settings.
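
Of the three strategies, Self-Consistency is the least self-explanatory: the model is sampled several times at non-zero temperature and the parsed labels are majority-voted. A minimal sketch, with `ask_llm` as a hypothetical callable wrapping whichever provider API is used:

```python
from collections import Counter

def self_consistency(ask_llm, prompt, n_samples=5):
    """Sample several reasoning paths and majority-vote the parsed answer.
    `ask_llm` is a hypothetical wrapper around the provider API; it must
    return a normalized label such as "hemorrhage: yes".
    """
    votes = Counter(ask_llm(prompt, temperature=0.7) for _ in range(n_samples))
    return votes.most_common(1)[0][0]
```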

Large language models for efficient whole-organ MRI score-based reports and categorization in knee osteoarthritis.

Xie Y, Hu Z, Tao H, Hu Y, Liang H, Lu X, Wang L, Li X, Chen S

PubMed · May 14, 2025
To evaluate the performance of large language models (LLMs) in automatically generating whole-organ MRI score (WORMS)-based structured MRI reports and predicting osteoarthritis (OA) severity for the knee. A total of 160 consecutive patients suspected of OA were included. Knee MRI reports were reviewed by three radiologists to establish the WORMS reference standard for 39 key features. GPT-4o and GPT-4o mini were prompted using in-context knowledge (ICK) and chain-of-thought (COT) to generate WORMS-based structured reports from original reports and to automatically predict the OA severity. Four orthopedic surgeons reviewed original and LLM-generated reports to conduct pairwise preference and difficulty tests, and their review times were recorded. GPT-4o demonstrated perfect performance in extracting the laterality of the knee (accuracy = 100%). GPT-4o outperformed GPT-4o mini in generating WORMS reports (accuracy: 93.9% vs 76.2%). GPT-4o achieved higher recall (87.3% vs 46.7%, p < 0.001), while maintaining higher precision compared to GPT-4o mini (94.2% vs 71.2%, p < 0.001). For predicting OA severity, GPT-4o outperformed GPT-4o mini across all prompt strategies (best accuracy: 98.1% vs 68.7%). Surgeons found it easier to extract information and gave more preference to LLM-generated reports over the original reports (both p < 0.001) while spending less time on each report (51.27 ± 9.41 vs 87.42 ± 20.26 s, p < 0.001). GPT-4o generated expert-level, multi-feature WORMS-based reports from original free-text knee MRI reports. GPT-4o with COT achieved high accuracy in categorizing OA severity. Surgeons reported greater preference and higher efficiency when using LLM-generated reports. The perfect performance of generating WORMS-based reports and the high efficiency and ease of use suggest that integrating LLMs into clinical workflows could greatly enhance productivity and alleviate the documentation burden faced by clinicians in knee OA. GPT-4o successfully generated WORMS-based knee MRI reports. GPT-4o with COT prompting achieved impressive accuracy in categorizing knee OA severity. Greater preference and higher efficiency were reported for LLM-generated reports.
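
The study's actual prompts are not given in the abstract; a hypothetical skeleton combining in-context knowledge (the WORMS rubric) with chain-of-thought instructions and a JSON output schema might look like this (the rubric text, feature names, and schema are all placeholders):

```python
# Hypothetical prompt skeleton; everything below is a placeholder, not the
# study's prompt.
PROMPT = """You are a musculoskeletal radiologist. Using the WORMS rubric
below (in-context knowledge), reason step by step (chain of thought), then
output JSON only.

WORMS rubric: <rubric text>

Report:
{report_text}

Output schema:
{{"laterality": "left|right",
  "cartilage_medial_femoral": "0-6",
  "medial_meniscus": "0-4"}}
"""

def build_prompt(report_text: str) -> str:
    return PROMPT.format(report_text=report_text)
```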

Synthetic Data-Enhanced Classification of Prevalent Osteoporotic Fractures Using Dual-Energy X-Ray Absorptiometry-Based Geometric and Material Parameters.

Quagliato L, Seo J, Hong J, Lee T, Chung YS

PubMed · May 14, 2025
Bone fracture risk assessment for osteoporotic patients is essential for implementing early countermeasures and preventing discomfort and hospitalization. Current methodologies, such as Fracture Risk Assessment Tool (FRAX), provide a risk assessment over a 5- to 10-year period rather than evaluating the bone's current health status. The database was collected by Ajou University Medical Center from 2017 to 2021. It included 9,260 patients, aged 55 to 99, comprising 242 femur fracture (FX) cases and 9,018 non-fracture (NFX) cases. To model the association of the bone's current health status with prevalent FXs, three prediction algorithms-extreme gradient boosting (XGB), support vector machine, and multilayer perceptron-were trained using two-dimensional dual-energy X-ray absorptiometry (2D-DXA) analysis results and subsequently benchmarked. The XGB classifier, which proved most effective, was then further refined using synthetic data generated by the adaptive synthetic oversampler to balance the FX and NFX classes and enhance boundary sharpness for better classification accuracy. The XGB model trained on raw data demonstrated good prediction capabilities, with an area under the curve (AUC) of 0.78 and an F1 score of 0.71 on test cases. The inclusion of synthetic data improved classification accuracy in terms of both specificity and sensitivity, resulting in an AUC of 0.99 and an F1 score of 0.98. The proposed methodology demonstrates that current bone health can be assessed through post-processed results from 2D-DXA analysis. Moreover, it was also shown that synthetic data can help stabilize uneven databases by balancing majority and minority classes, thereby significantly improving classification performance.
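
A runnable sketch of the oversampling step, using `imblearn`'s ADASYN and synthetic stand-in data in place of the real 2D-DXA features (hyperparameters are illustrative, not the paper's):

```python
from imblearn.over_sampling import ADASYN
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Imbalanced stand-in for the FX/NFX cohort (~3% minority class).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.97], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

# ADASYN synthesizes minority (FX) samples near the decision boundary;
# it is applied to the training split only, so the test set stays untouched.
X_bal, y_bal = ADASYN(random_state=0).fit_resample(X_tr, y_tr)

clf = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss")
clf.fit(X_bal, y_bal)
print("test accuracy:", clf.score(X_te, y_te))
```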

DCSNet: A Lightweight Knowledge Distillation-Based Model with Explainable AI for Lung Cancer Diagnosis from Histopathological Images

Sadman Sakib Alif, Nasim Anzum Promise, Fiaz Al Abid, Aniqua Nusrat Zereen

arXiv preprint · May 14, 2025
Lung cancer is a leading cause of cancer-related deaths globally, where early detection and accurate diagnosis are critical for improving survival rates. While deep learning, particularly convolutional neural networks (CNNs), has revolutionized medical image analysis by detecting subtle patterns indicative of early-stage lung cancer, its adoption faces challenges. These models are often computationally expensive and require significant resources, making them unsuitable for resource-constrained environments. Additionally, their lack of transparency hinders trust and broader adoption in sensitive fields like healthcare. Knowledge distillation addresses these challenges by transferring knowledge from large, complex models (teachers) to smaller, lightweight models (students). We propose a knowledge distillation-based approach for lung cancer detection, incorporating explainable AI (XAI) techniques to enhance model transparency. Eight CNNs, including ResNet50, EfficientNetB0, EfficientNetB3, and VGG16, are evaluated as teacher models. We developed and trained a lightweight student model, Distilled Custom Student Network (DCSNet), using ResNet50 as the teacher. This approach not only ensures high diagnostic performance in resource-constrained settings but also addresses transparency concerns, facilitating the adoption of AI-driven diagnostic tools in healthcare.
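
The distillation objective itself is standard: a weighted sum of the hard-label cross-entropy and the KL divergence between temperature-softened teacher and student distributions. A PyTorch sketch follows; the temperature and weighting values are illustrative, as the paper's settings are not stated in the abstract.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Hinton-style knowledge distillation loss."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so the soft gradient keeps its magnitude
    return alpha * hard + (1.0 - alpha) * soft
```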

Using Foundation Models as Pseudo-Label Generators for Pre-Clinical 4D Cardiac CT Segmentation

Anne-Marie Rickmann, Stephanie L. Thorn, Shawn S. Ahn, Supum Lee, Selen Uman, Taras Lysyy, Rachel Burns, Nicole Guerrera, Francis G. Spinale, Jason A. Burdick, Albert J. Sinusas, James S. Duncan

arXiv preprint · May 14, 2025
Cardiac image segmentation is an important step in many cardiac image analysis and modeling tasks such as motion tracking or simulations of cardiac mechanics. While deep learning has greatly advanced segmentation in clinical settings, there is limited work on pre-clinical imaging, notably in porcine models, which are often used due to their anatomical and physiological similarity to humans. However, differences between species create a domain shift that complicates direct model transfer from human to pig data. Recently, foundation models trained on large human datasets have shown promise for robust medical image segmentation; yet their applicability to porcine data remains largely unexplored. In this work, we investigate whether foundation models can generate sufficiently accurate pseudo-labels for pig cardiac CT and propose a simple self-training approach to iteratively refine these labels. Our method requires no manually annotated pig data, relying instead on iterative updates to improve segmentation quality. We demonstrate that this self-training process not only enhances segmentation accuracy but also smooths out temporal inconsistencies across consecutive frames. Although our results are encouraging, there remains room for improvement, for example by incorporating more sophisticated self-training strategies and by exploring additional foundation models and other cardiac imaging technologies.
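
In outline, the loop is: label the pig frames with the human-trained foundation model, fine-tune on those pseudo-labels, re-label with the adapted model, and repeat. A schematic sketch, with the inference, training, and filtering steps passed in as hypothetical callables since the abstract does not specify them:

```python
def self_train(predict_masks, finetune, filter_confident,
               foundation_model, student_model, frames, n_rounds=3):
    """Iterative pseudo-label refinement. The three callables are hypothetical
    placeholders for the paper's unspecified inference, training, and
    confidence-filtering steps.
    """
    # Round 0: the human-trained foundation model supplies the initial labels.
    pseudo = predict_masks(foundation_model, frames)
    for _ in range(n_rounds):
        student_model = finetune(student_model, frames, pseudo)
        # Re-label with the adapted model, keeping only confident predictions
        # so label noise shrinks rather than compounds between rounds.
        pseudo = filter_confident(predict_masks(student_model, frames))
    return student_model
```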