Latest Papers on Radiology AI. Sources: pubmed

Uncertainty Co-estimator for Improving Semi-Supervised Medical Image Segmentation.

Zeng X, Xiong S, Xu J, Du G, Rong Y

•papers•May 15 2025

Recently, combining the strategy of consistency regularization with uncertainty estimation has shown promising performance on semi-supervised medical image segmentation tasks. However, most existing methods estimate the uncertainty solely based on the outputs of a single neural network, which results in imprecise uncertainty estimations and eventually degrades the segmentation performance. In this paper, we propose a novel Uncertainty Co-estimator (UnCo) framework to deal with this problem. Inspired by the co-training technique, UnCo establishes two different mean-teacher modules (i.e., two pairs of teacher and student models), and estimates three types of uncertainty from the multi-source predictions generated by these models. Through combining these uncertainties, their differences will help to filter out incorrect noise in each estimate, thus allowing the final fused uncertainty maps to be more accurate. These resulting maps are then used to enhance a cross-consistency regularization imposed between the two modules. In addition, UnCo also designs an internal consistency regularization within each module, so that the student models can aggregate diverse feature information from both modules, thus promoting the semi-supervised segmentation performance. Finally, an adversarial constraint is introduced to maintain the model diversity. Experimental results on four medical image datasets indicate that UnCo can achieve new state-of-the-art performance on both 2D and 3D semi-supervised segmentation tasks. The source code will be available at https://github.com/z1010x/UnCo.
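The two building blocks named in the abstract above, a mean-teacher module and an uncertainty-guided consistency term, can be sketched generically. The snippet below is an illustration under stated assumptions, not the UnCo implementation: it uses the standard exponential-moving-average teacher update and a single entropy-based uncertainty mask rather than the three fused uncertainty types the paper describes. The function names, the threshold `max_entropy`, and all numbers are hypothetical.

```python
import math


def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Mean-teacher update: teacher <- alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a foreground probability, used as an uncertainty score."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def masked_consistency(student_probs, teacher_probs, max_entropy):
    """Mean squared difference over pixels whose teacher entropy is below max_entropy."""
    kept = [(s - t) ** 2
            for s, t in zip(student_probs, teacher_probs)
            if binary_entropy(t) < max_entropy]
    return sum(kept) / len(kept) if kept else 0.0


# Toy values, not taken from the paper.
teacher = ema_update([0.0, 0.0], [1.0, -2.0], alpha=0.5)
loss = masked_consistency([0.9, 0.9, 0.1], [0.99, 0.5, 0.01], max_entropy=0.3)
print(teacher, loss)
```

In this toy case the middle pixel (teacher probability 0.5) has the highest possible entropy and is excluded from the consistency term, which is the general idea behind filtering unreliable targets.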

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA Open Code

CLIF-Net: Intersection-guided Cross-view Fusion Network for Infection Detection from Cranial Ultrasound.

Yu M, Peterson MR, Burgoine K, Harbaugh T, Olupot-Olupot P, Gladstone M, Hagmann C, Cowan FM, Weeks A, Morton SU, Mulondo R, Mbabazi-Kabachelor E, Schiff SJ, Monga V

•papers•May 15 2025

This paper addresses the problem of detecting possible serious bacterial infection (pSBI) of infancy, i.e., a clinical presentation consistent with bacterial sepsis in newborn infants, using cranial ultrasound (cUS) images. The captured image set for each patient enables multiview imagery: coronal and sagittal, with geometric overlap. To exploit this geometric relation, we develop a new learning framework, called the intersection-guided Crossview Local- and Image-level Fusion Network (CLIF-Net). Our technique employs two distinct convolutional neural network branches to extract features from coronal and sagittal images with newly developed multi-level fusion blocks. Specifically, we leverage the spatial position of these images to locate the intersecting region. We then identify and enhance the semantic features from this region across multiple levels using cross-attention modules, facilitating the acquisition of mutually beneficial and more representative features from both views. The final enhanced features from the two views are then integrated and projected through the image-level fusion layer, outputting pSBI and non-pSBI class probabilities. We contend that our method of exploiting multi-view cUS images enables a first-of-its-kind, robust 3D representation tailored for pSBI detection. When evaluated on a dataset of 302 cUS scans from Mbale Regional Referral Hospital in Uganda, CLIF-Net demonstrates substantially enhanced performance, surpassing the prevailing state-of-the-art infection detection techniques.

Ultrasound Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Artificial intelligence algorithm improves radiologists' bone age assessment accuracy.

Chang TY, Chou TY, Jen IA, Yuh YS

•papers•May 15 2025

Artificial intelligence (AI) algorithms can provide rapid and precise radiographic bone age (BA) assessment. This study assessed the effects of an AI algorithm on the BA assessment performance of radiologists, and evaluated how automation bias could affect radiologists. In this prospective randomized crossover study, six radiologists with varying levels of experience (senior, mid-level, and junior) assessed cases from a test set of 200 standard BA radiographs. The test set was equally divided into two subsets: datasets A and B. Each radiologist assessed BA independently without AI assistance (A- B-) and with AI assistance (A+ B+). We used the mean of assessments made by two experts as the ground truth for accuracy assessment; subsequently, we calculated the mean absolute difference (MAD) between the radiologists' BA predictions and ground-truth BA and evaluated the proportion of estimates for which the MAD exceeded one year. Additionally, we compared the radiologists' performance under conditions of early AI assistance with their performance under conditions of delayed AI assistance; the radiologists were allowed to reject AI interpretations. The overall accuracy of senior, mid-level, and junior radiologists was significantly better with AI assistance than without it (MAD without vs. with AI: 0.74 vs. 0.46 years, p < 0.001; proportion of assessments for which MAD exceeded 1 year: 24.0% vs. 8.4%, p < 0.001). The proportion of improved BA predictions with AI assistance (16.8%) was significantly higher than that of less accurate predictions with AI assistance (2.3%; p < 0.001). No consistent timing effect was observed between conditions of early and delayed AI assistance. Most disagreements between radiologists and AI occurred over images for patients aged ≤8 years. Senior radiologists had more disagreements than other radiologists. The AI algorithm improved the BA assessment accuracy of radiologists with varying experience levels.
Less experienced radiologists were more prone to automation bias.
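The two accuracy measures in this abstract are simple to compute. The sketch below assumes, as the abstract states, that the ground truth is the mean of two expert readings; the "exceeds one year" proportion is computed per case on the absolute difference. All ages in the example are made up for illustration and are not study data.

```python
def ground_truth(expert_a, expert_b):
    """Reference bone age per case: mean of two expert assessments (years)."""
    return [(a + b) / 2.0 for a, b in zip(expert_a, expert_b)]


def mean_absolute_difference(predicted, reference):
    """MAD between a reader's bone ages and the reference (years)."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def proportion_exceeding(predicted, reference, limit=1.0):
    """Share of cases whose absolute difference is larger than `limit` years."""
    over = sum(1 for p, r in zip(predicted, reference) if abs(p - r) > limit)
    return over / len(reference)


# Hypothetical readings for four radiographs (years).
reference = ground_truth([10.0, 12.5, 7.0, 15.0], [10.5, 12.5, 8.0, 15.5])
reader = [10.0, 14.0, 7.5, 15.0]

print(mean_absolute_difference(reader, reference))  # 0.5
print(proportion_exceeding(reader, reference))      # 0.25
```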

X-Ray Classification Musculoskeletal Prospective Clinical Pilot Academic Lab

MIMI-ONET: Multi-Modal image augmentation via Butterfly Optimized neural network for Huntington Disease Detection.

Amudaria S, Jawhar SJ

•papers•May 15 2025

Huntington's disease (HD) is a chronic neurodegenerative ailment that causes cognitive decline, motor impairment, and psychiatric symptoms. However, existing HD detection methods struggle with limited annotated datasets, which restricts their generalization performance. This research work proposes a novel MIMI-ONET for primary detection of HD using augmented multi-modal brain MRI images. The two-dimensional stationary wavelet transform (2DSWT) decomposes the MRI images into different frequency wavelet sub-bands. These sub-bands are enhanced with Contract Stretching Adaptive Histogram Equalization (CSAHE) and Multi-scale Adaptive Retinex (MSAR) by reducing the irrelevant distortions. The proposed MIMI-ONET introduces a Hepta Generative Adversarial Network (Hepta-GAN) to generate different noise-free HD images based on hepta azimuth angles (45°, 90°, 135°, 180°, 225°, 270°, 315°). Hepta-GAN incorporates an Affine Estimation Module (AEM) to extract the multi-scale features using dilated convolutional layers for efficient HD image generation. Moreover, Hepta-GAN is normalized with the Butterfly Optimization (BO) algorithm for enhancing augmentation performance by balancing the parameters. Finally, the generated images are given to a deep neural network (DNN) for the classification of normal control (NC), Adult-Onset HD (AHD) and Juvenile HD (JHD) cases. The performance of the proposed MIMI-ONET is evaluated with precision, specificity, F1 score, recall, accuracy, PSNR and MSE. In the experiments, the proposed MIMI-ONET attains an accuracy of 98.85% and a PSNR value of 48.05 on the gathered Image-HD dataset. The proposed MIMI-ONET increases the overall accuracy by 9.96%, 1.85%, 5.91%, 13.80% and 13.5% over the 3DCNN, KNN, FCN, RNN and ML framework, respectively.

MRI Classification Neurological Methodology In Silico Academic Lab

Performance of Artificial Intelligence in Diagnosing Lumbar Spinal Stenosis: A Systematic Review and Meta-Analysis.

Yang X, Zhang Y, Li Y, Wu Z

•papers•May 15 2025

The present study followed the reporting guidelines for systematic reviews and meta-analyses. We conducted this study to review the diagnostic value of artificial intelligence (AI) for various types of lumbar spinal stenosis (LSS) and the level of stenosis, offering evidence-based support for the development of smart diagnostic tools. AI is currently being utilized for image processing in clinical practice. Some studies have explored AI techniques for identifying the severity of LSS in recent years. Nevertheless, there remains a shortage of structured data proving its effectiveness. Four databases (PubMed, Cochrane, Embase, and Web of Science) were searched until March 2024, including original studies that utilized deep learning (DL) and machine learning (ML) models to diagnose LSS. The risk of bias of included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies tool. The accuracy in the validation set was extracted for a meta-analysis. The meta-analysis was completed in R 4.4.0. A total of 48 articles were included, with an overall accuracy of 0.885 (95% CI: 0.860-0.907) for dichotomous tasks. Among them, the accuracy was 0.892 (95% CI: 0.867-0.915) for DL and 0.833 (95% CI: 0.760-0.895) for ML. The overall accuracy for LSS was 0.895 (95% CI: 0.858-0.927), with an accuracy of 0.912 (95% CI: 0.873-0.944) for DL and 0.843 (95% CI: 0.766-0.907) for ML. The overall accuracy for central canal stenosis was 0.875 (95% CI: 0.821-0.920), with an accuracy of 0.881 (95% CI: 0.829-0.925) for DL and 0.733 (95% CI: 0.541-0.877) for ML. The overall accuracy for neural foramen stenosis was 0.893 (95% CI: 0.851-0.928).
In polytomous tasks, the accuracy was 0.936 (95% CI: 0.895-0.967) for no LSS, 0.503 (95% CI: 0.391-0.614) for mild LSS, 0.512 (95% CI: 0.336-0.688) for moderate LSS, and 0.860 (95% CI: 0.733-0.954) for severe LSS. AI is highly valuable for diagnosing LSS. However, further external validation is necessary to enhance the analysis of different stenosis categories and improve the diagnostic accuracy for mild to moderate stenosis levels.

Mixed Modality Classification Musculoskeletal Meta Analysis In Silico Academic Lab

A fully automatic radiomics pipeline for postoperative facial nerve function prediction of vestibular schwannoma.

Song G, Li K, Wang Z, Liu W, Xue Q, Liang J, Zhou Y, Geng H, Liu D

•papers•May 14 2025

Vestibular schwannoma (VS) is the most prevalent intracranial schwannoma. Surgery is one of the options for the treatment of VS, with the preservation of facial nerve (FN) function being the primary objective. Therefore, postoperative FN function prediction is essential. However, achieving automation for such a method remains a challenge. In this study, we proposed a fully automatic deep learning approach based on multi-sequence magnetic resonance imaging (MRI) to predict FN function after surgery in VS patients. We first developed a segmentation network, 2.5D Trans-UNet, which combined Transformer and U-Net to optimize contour segmentation for radiomic feature extraction. Next, we built a deep learning network based on the integration of a 1D Convolutional Neural Network (1DCNN) and a Gated Recurrent Unit (GRU) to predict postoperative FN function using the extracted features. We trained and tested the 2.5D Trans-UNet segmentation network on public and private datasets, achieving accuracies of 89.51% and 90.66%, respectively, confirming the model's strong performance. Feature extraction and selection were then performed on the private dataset's segmentation results from 2.5D Trans-UNet. The selected features were used to train the 1DCNN-GRU network for classification. The results showed that our proposed fully automatic radiomics pipeline outperformed the traditional radiomics pipeline on the test set, achieving an accuracy of 88.64%, demonstrating its effectiveness in predicting the postoperative FN function in VS patients. Our proposed automatic method has the potential to become a valuable decision-making tool in neurosurgery, assisting neurosurgeons in making more informed decisions regarding surgical interventions and improving the treatment of VS patients.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab

Deep learning for cerebral vascular occlusion segmentation: A novel ConvNeXtV2 and GRN-integrated U-Net framework for diffusion-weighted imaging.

Ince S, Kunduracioglu I, Algarni A, Bayram B, Pacal I

•papers•May 14 2025

Cerebral vascular occlusion is a serious condition that can lead to stroke and permanent neurological damage due to insufficient oxygen and nutrients reaching brain tissue. Early diagnosis and accurate segmentation are critical for effective treatment planning. Due to its high soft tissue contrast, Magnetic Resonance Imaging (MRI) is commonly used for detecting these occlusions such as ischemic stroke. However, challenges such as low contrast, noise, and heterogeneous lesion structures in MRI images complicate manual segmentation and often lead to misinterpretations. As a result, deep learning-based Computer-Aided Diagnosis (CAD) systems are essential for faster and more accurate diagnosis and treatment methods, although they can sometimes face challenges such as high computational costs and difficulties in segmenting small or irregular lesions. This study proposes a novel U-Net architecture enhanced with ConvNeXtV2 blocks and GRN-based Multi-Layer Perceptrons (MLP) to address these challenges in cerebral vascular occlusion segmentation. This is the first application of ConvNeXtV2 in this domain. The proposed model significantly improves segmentation accuracy, even in low-contrast regions, while maintaining high computational efficiency, which is crucial for real-world clinical applications. To reduce false positives and improve overall accuracy, small lesions (≤5 pixels) were removed in the preprocessing step with the support of expert clinicians. Experimental results on the ISLES 2022 dataset showed superior performance with an Intersection over Union (IoU) of 0.8015 and a Dice coefficient of 0.8894. Comparative analyses indicate that the proposed model achieves higher segmentation accuracy than existing U-Net variants and other methods, offering a promising solution for clinical use.
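For readers less familiar with the two overlap scores reported above, the following sketch computes Intersection over Union and the Dice coefficient for one pair of binary masks. It is a generic illustration, not the authors' evaluation code; the flattened toy masks and the convention of returning 1.0 for two empty masks are assumptions.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two equally sized flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0
    iou = intersection / union
    dice = 2.0 * intersection / (pred_area + target_area)
    return iou, dice


# Toy 1D masks standing in for flattened lesion segmentations.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
iou, dice = iou_and_dice(pred, target)
print(iou, dice)
```

For a single mask pair the two scores are tied by Dice = 2·IoU / (1 + IoU). Scores averaged over many images need not satisfy this identity exactly.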

MRI Segmentation Neurological Retrospective Clinical In Silico Academic Lab

AI-based metal artefact correction algorithm for radiotherapy patients with dental hardware in head and neck CT: Towards precise imaging.

Yu X, Zhong S, Zhang G, Du J, Wang G, Hu J

•papers•May 14 2025

To investigate the clinical efficiency of an AI-based metal artefact correction algorithm (AI-MAC) for reducing dental metal artefacts in head and neck CT, compared to conventional interpolation-based MAC. We retrospectively collected 41 patients with non-removable dental hardware who underwent non-contrast head and neck CT prior to radiotherapy. All images were reconstructed with a standard reconstruction algorithm (SRA), and were additionally processed with both conventional MAC and AI-MAC. The image quality of SRA, MAC and AI-MAC was compared by qualitative scoring on a 5-point scale, with scores ≥ 3 considered interpretable. This was followed by a quantitative evaluation, including signal-to-noise ratio (SNR) and artefact index (Idxartefact). Organ contouring accuracy was quantified by calculating the Dice similarity coefficient (DSC) and Hausdorff distance (HD) for oral cavity and teeth, using the clinically accepted contouring as reference. Moreover, the treatment planning dose distribution for oral cavity was assessed. AI-MAC yielded superior qualitative image quality as well as quantitative metrics, including SNR and Idxartefact, to SRA and MAC. The image interpretability significantly improved from 41.46% for SRA and 56.10% for MAC to 92.68% for AI-MAC (p < 0.05). Compared to SRA and MAC, the best DSC and HD for both oral cavity and teeth were obtained on AI-MAC (all p < 0.05). No significant differences for dose distribution were found among the three image sets. AI-MAC outperforms conventional MAC in metal artefact reduction, achieving superior image quality with high image interpretability for patients with dental hardware undergoing head and neck CT. Furthermore, the use of AI-MAC improves the accuracy of organ contouring while providing consistent dose calculation against metal artefacts in radiotherapy. AI-MAC is a novel deep learning-based technique for reducing metal artefacts on CT.
This in-vivo study first demonstrated its capability of reducing metal artefacts while preserving organ visualization, as compared with conventional MAC.
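The Hausdorff distance used above for contouring accuracy measures the worst-case gap between two contours. A minimal sketch, assuming each contour is given as a list of 2D points; the coordinates are invented and the brute-force search is only suitable for small point sets. This is the plain maximum variant, and the abstract does not say whether a percentile variant was used.

```python
import math


def directed_hausdorff(source, target):
    """Largest distance from any point in `source` to its nearest point in `target`."""
    return max(min(math.dist(p, q) for q in target) for p in source)


def hausdorff_distance(contour_a, contour_b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(contour_a, contour_b),
               directed_hausdorff(contour_b, contour_a))


# Hypothetical contour points (e.g. in millimetres).
automatic = [(0.0, 0.0), (1.0, 0.0)]
reference = [(0.0, 0.0), (4.0, 0.0)]
print(hausdorff_distance(automatic, reference))  # 3.0
```

Here the reference point (4, 0) lies 3 units from the nearest automatic point, so a single outlying point sets the distance even though the rest of the contours agree.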

CT Reconstruction Neurological Retrospective Clinical In Silico Academic Lab GenAI

Comparative performance of large language models in structuring head CT radiology reports: multi-institutional validation study in Japan.

Takita H, Walston SL, Mitsuyama Y, Watanabe K, Ishimaru S, Ueda D

•papers•May 14 2025

To compare the diagnostic performance of three proprietary large language models (LLMs)-Claude, GPT, and Gemini-in structuring free-text Japanese radiology reports for intracranial hemorrhage and skull fractures, and to assess the impact of three different prompting approaches on model accuracy. In this retrospective study, head CT reports from the Japan Medical Imaging Database between 2018 and 2023 were collected. Two board-certified radiologists established the ground truth regarding intracranial hemorrhage and skull fractures through independent review and consensus. Each radiology report was analyzed by three LLMs using three prompting strategies-Standard, Chain of Thought, and Self Consistency prompting. Diagnostic performance (accuracy, precision, recall, and F1-score) was calculated for each LLM-prompt combination and compared using McNemar's tests with Bonferroni correction. Misclassified cases underwent qualitative error analysis. A total of 3949 head CT reports from 3949 patients (mean age 59 ± 25 years, 56.2% male) were enrolled. Across all institutions, 856 patients (21.6%) had intracranial hemorrhage and 264 patients (6.6%) had skull fractures. All nine LLM-prompt combinations achieved very high accuracy. Claude demonstrated significantly higher accuracy for intracranial hemorrhage than GPT and Gemini, and also outperformed Gemini for skull fractures (p < 0.0001). Gemini's performance improved notably with Chain of Thought prompting. Error analysis revealed common challenges including ambiguous phrases and findings unrelated to intracranial hemorrhage or skull fractures, underscoring the importance of careful prompt design. All three proprietary LLMs exhibited strong performance in structuring free-text head CT reports for intracranial hemorrhage and skull fractures. While the choice of prompting method influenced accuracy, all models demonstrated robust potential for clinical and research applications. 
Future work should refine the prompts and validate these approaches in prospective, multilingual settings.
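The evaluation described above rests on standard confusion-matrix metrics and paired comparisons with McNemar's test under Bonferroni correction. The sketch below illustrates both. The abstract does not say which McNemar variant was used, so the exact binomial form is an assumption here, and all counts are invented.

```python
from math import comb


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases model A got right and model B got wrong.
    c: cases model A got wrong and model B got right.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# Hypothetical counts for one model and one pairwise comparison.
print(classification_metrics(tp=8, fp=2, fn=2, tn=88))
p = mcnemar_exact(b=1, c=9)
print(p, bonferroni(p, n_comparisons=3))
```

Only the discordant pairs (`b` and `c`) enter McNemar's test, which is why it suits comparisons of models scored on the same reports.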

CT LLM Radiology Report Neurological Retrospective Clinical In Silico Academic Lab GenAI

Large language models for efficient whole-organ MRI score-based reports and categorization in knee osteoarthritis.

Xie Y, Hu Z, Tao H, Hu Y, Liang H, Lu X, Wang L, Li X, Chen S

•papers•May 14 2025

To evaluate the performance of large language models (LLMs) in automatically generating whole-organ MRI score (WORMS)-based structured MRI reports and predicting osteoarthritis (OA) severity for the knee. A total of 160 consecutive patients suspected of OA were included. Knee MRI reports were reviewed by three radiologists to establish the WORMS reference standard for 39 key features. GPT-4o and GPT-4o mini were prompted using in-context knowledge (ICK) and chain-of-thought (COT) to generate WORMS-based structured reports from original reports and to automatically predict the OA severity. Four orthopedic surgeons reviewed original and LLM-generated reports to conduct pairwise preference and difficulty tests, and their review times were recorded. GPT-4o demonstrated perfect performance in extracting the laterality of the knee (accuracy = 100%). GPT-4o outperformed GPT-4o mini in generating WORMS reports (accuracy: 93.9% vs 76.2%, respectively). GPT-4o achieved higher recall (87.3% vs 46.7%, p < 0.001), while maintaining higher precision compared to GPT-4o mini (94.2% vs 71.2%, p < 0.001). For predicting OA severity, GPT-4o outperformed GPT-4o mini across all prompt strategies (best accuracy: 98.1% vs 68.7%). Surgeons found it easier to extract information and gave more preference to LLM-generated reports over the original reports (both p < 0.001) while spending less time on each report (51.27 ± 9.41 vs 87.42 ± 20.26 s, p < 0.001). GPT-4o generated expert multi-feature, WORMS-based reports from original free-text knee MRI reports. GPT-4o with COT achieved high accuracy in categorizing OA severity. Surgeons reported greater preference and higher efficiency when using LLM-generated reports. The perfect performance of generating WORMS-based reports and the high efficiency and ease of use suggest that integrating LLMs into clinical workflows could greatly enhance productivity and alleviate the documentation burden faced by clinicians in knee OA.
GPT-4o successfully generated WORMS-based knee MRI reports. GPT-4o with COT prompting achieved impressive accuracy in categorizing knee OA severity. Greater preference and higher efficiency were reported for LLM-generated reports.

MRI LLM Radiology Report Musculoskeletal Retrospective Clinical In Silico Academic Lab GenAI
