Page 210 of 252 (2511 results)

Evaluating the Performance of Reasoning Large Language Models on Japanese Radiology Board Examination Questions.

Nakaura T, Takamure H, Kobayashi N, Shiraishi K, Yoshida N, Nagayama Y, Uetani H, Kidoh M, Funama Y, Hirai T

PubMed paper, May 17 2025
This study evaluates the performance, cost, and processing time of OpenAI's reasoning large language models (LLMs) (o1-preview, o1-mini) and their base models (GPT-4o, GPT-4o-mini) on Japanese radiology board examination questions. A total of 210 questions from the 2022-2023 official board examinations of the Japan Radiological Society were presented to each of the four LLMs. Performance was evaluated by calculating the percentage of correctly answered questions within six predefined radiology subspecialties. The total cost and processing time for each model were also recorded. The McNemar test was used to assess the statistical significance of differences in accuracy between paired model responses. The o1-preview achieved the highest accuracy (85.7%), significantly outperforming GPT-4o (73.3%, P<.001). Similarly, o1-mini (69.5%) performed significantly better than GPT-4o-mini (46.7%, P<.001). Across all radiology subspecialties, o1-preview consistently ranked highest. However, reasoning models incurred substantially higher costs (o1-preview: $17.10, o1-mini: $2.58) compared to their base counterparts (GPT-4o: $0.496, GPT-4o-mini: $0.04), and their processing times were approximately 3.7 and 1.2 times longer, respectively. Reasoning LLMs demonstrated markedly superior performance in answering radiology board exam questions compared to their base models, albeit at a substantially higher cost and increased processing time.
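The paired comparison described above can be illustrated with a small McNemar sketch. The abstract does not say whether the exact or the chi-square form of the test was used, so this example assumes the exact two-sided binomial form; the function name `mcnemar_exact` and the toy answer vectors are hypothetical and are not taken from the study.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired correct/incorrect outcomes.

    correct_a and correct_b are equal-length sequences of booleans, one entry
    per question, for two models answering the same questions.
    Returns (b, c, p_value), where b counts questions only model A answered
    correctly and c counts questions only model B answered correctly.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must answer the same questions")
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c  # only discordant pairs carry information
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split 50/50 (Binomial(n, 0.5)).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical toy data: 10 questions answered by two models.
model_a = [True, True, True, True, True, True, True, True, False, False]
model_b = [False, False, False, False, False, False, True, True, True, False]
b, c, p = mcnemar_exact(model_a, model_b)
print(b, c, p)
```

Only the discordant questions (one model right, the other wrong) enter the test, which is why it suits comparisons where every model sees the same question set.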

Breast Arterial Calcifications on Mammography: A Review of the Literature.

Rossi J, Cho L, Newell MS, Venta LA, Montgomery GH, Destounis SV, Moy L, Brem RF, Parghi C, Margolies LR

PubMed paper, May 17 2025
Identifying systemic disease with medical imaging studies may improve population health outcomes. Although the pathogenesis of peripheral arterial calcification and coronary artery calcification differ, breast arterial calcification (BAC) on mammography is associated with cardiovascular disease (CVD), a leading cause of death in women. While professional society guidelines on the reporting or management of BAC have not yet been established, and assessment and quantification methods are not yet standardized, the value of reporting BAC is being considered internationally as a possible indicator of subclinical CVD. Furthermore, artificial intelligence (AI) models are being developed to identify and quantify BAC on mammography, as well as to predict the risk of CVD. This review outlines studies evaluating the association of BAC and CVD, introduces the role of preventative cardiology in clinical management, discusses reasons to consider reporting BAC, acknowledges current knowledge gaps and barriers to assessing and reporting calcifications, and provides examples of how AI can be utilized to measure BAC and contribute to cardiovascular risk assessment. Ultimately, reporting BAC on mammography might facilitate earlier mitigation of cardiovascular risk factors in asymptomatic women.

MedVKAN: Efficient Feature Extraction with Mamba and KAN for Medical Image Segmentation

Hancan Zhu, Jinhao Chen, Guanghua He

arXiv preprint, May 17 2025
Medical image segmentation relies heavily on convolutional neural networks (CNNs) and Transformer-based models. However, CNNs are constrained by limited receptive fields, while Transformers suffer from scalability challenges due to their quadratic computational complexity. To address these limitations, recent advances have explored alternative architectures. The state-space model Mamba offers near-linear complexity while capturing long-range dependencies, and the Kolmogorov-Arnold Network (KAN) enhances nonlinear expressiveness by replacing fixed activation functions with learnable ones. Building on these strengths, we propose MedVKAN, an efficient feature extraction model integrating Mamba and KAN. Specifically, we introduce the EFC-KAN module, which enhances KAN with convolutional operations to improve local pixel interaction. We further design the VKAN module, integrating Mamba with EFC-KAN as a replacement for Transformer modules, significantly improving feature extraction. Extensive experiments on five public medical image segmentation datasets show that MedVKAN achieves state-of-the-art performance on four datasets and ranks second on the remaining one. These results validate the potential of Mamba and KAN for medical image segmentation while introducing an innovative and computationally efficient feature extraction framework. The code is available at: https://github.com/beginner-cjh/MedVKAN.

Fully Automated Evaluation of Condylar Remodeling after Orthognathic Surgery in Skeletal Class II Patients Using Deep Learning and Landmarks.

Jia W, Wu H, Mei L, Wu J, Wang M, Cui Z

PubMed paper, May 17 2025
Condylar remodeling is a key prognostic indicator in maxillofacial surgery for skeletal class II patients. This study aimed to develop and validate a fully automated method leveraging landmark-guided segmentation and registration for efficient assessment of condylar remodeling. A V-Net-based deep learning workflow was developed to automatically segment the mandible and localize anatomical landmarks from CT images. Cutting planes were computed based on the landmarks to segment the condylar and ramus volumes from the mandible mask. The stable ramus served as a reference for registering pre- and post-operative condyles using the Iterative Closest Point (ICP) algorithm. Condylar remodeling was subsequently assessed through mesh registration, heatmap visualization, and quantitative metrics of surface distance and volumetric change. Experts also rated the concordance between automated assessments and clinical diagnoses. In the test set, condylar segmentation achieved a Dice coefficient of 0.98, and landmark prediction yielded a mean absolute error of 0.26 mm. The automated evaluation process was completed in 5.22 seconds, approximately 150 times faster than manual assessments. The method accurately quantified condylar volume changes, ranging from 2.74% to 50.67% across patients. Expert ratings for all test cases averaged 9.62. This study introduced a consistent, accurate, and fully automated approach for condylar remodeling evaluation. The well-defined anatomical landmarks guided precise segmentation and registration, while deep learning supported an end-to-end automated workflow. The test results demonstrated its broad clinical applicability across various degrees of condylar remodeling and high concordance with expert assessments. By integrating anatomical landmarks and deep learning, the proposed method improves efficiency by 150 times without compromising accuracy, thereby facilitating an efficient and accurate assessment of orthognathic prognosis. 
The personalized 3D condylar remodeling models aid in visualizing sequelae, such as joint pain or skeletal relapse, and guide individualized management of TMJ disorders.
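For readers unfamiliar with the Dice coefficient reported for the condylar segmentation, a minimal sketch follows. It assumes binary masks flattened to equal-length sequences; the function name `dice_coefficient` and the toy masks are illustrative and do not come from the paper's code.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat sequences.

    Dice = 2 * |pred AND truth| / (|pred| + |truth|), ranging from 0 to 1.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Convention assumed here: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical toy masks: 6 voxels, 2 of which overlap.
predicted = [1, 1, 1, 0, 0, 0]
reference = [1, 1, 0, 1, 0, 0]
print(dice_coefficient(predicted, reference))
```

A value of 0.98, as reported on the test set, means the predicted and reference masks overlap almost completely.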

A Robust Automated Segmentation Method for White Matter Hyperintensity of Vascular-origin.

He H, Jiang J, Peng S, He C, Sun T, Fan F, Song H, Sun D, Xu Z, Wu S, Lu D, Zhang J

PubMed paper, May 17 2025
White matter hyperintensity (WMH) is a primary manifestation of small vessel disease (SVD), leading to vascular cognitive impairment and other disorders. Accurate WMH quantification is vital for diagnosis and prognosis, but current automatic segmentation methods often fall short, especially across different datasets. The aims of this study are to develop and validate a robust deep learning segmentation method for WMH of vascular-origin. In this study, we developed a transformer-based method for the automatic segmentation of vascular-origin WMH using both 3D T1 and 3D T2-FLAIR images. Our initial dataset comprised 126 participants with varying WMH burdens due to SVD, each with manually segmented WMH masks used for training and testing. External validation was performed on two independent datasets: the WMH Segmentation Challenge 2017 dataset (170 subjects) and an in-house vascular risk factor dataset (70 subjects), which included scans acquired on eight different MRI systems at field strengths of 1.5T, 3T, and 5T. This approach enabled a comprehensive assessment of the method's generalizability across diverse imaging conditions. We further compared our method against LGA, LPA, BIANCA, UBO-detector and TrUE-Net in optimized settings. Our method consistently outperformed others, achieving a median Dice coefficient of 0.78±0.09 in our primary dataset, 0.72±0.15 in the external dataset 1, and 0.72±0.14 in the external dataset 2. The relative volume errors were 0.15±0.14, 0.50±0.86, and 0.47±1.02, respectively. The true positive rates were 0.81±0.13, 0.92±0.09, and 0.92±0.12, while the false positive rates were 0.20±0.09, 0.40±0.18, and 0.40±0.19. None of the external validation datasets were used for model training; instead, they comprise previously unseen MRI scans acquired from different scanners and protocols. 
This setup closely reflects real-world clinical scenarios and further demonstrates the robustness and generalizability of our model across diverse MRI systems and acquisition settings. As such, the proposed method provides a reliable solution for WMH segmentation in large-scale cohort studies.

ML-Driven Alzheimer's disease prediction: A deep ensemble modeling approach.

Jumaili MLF, Sonuç E

PubMed paper, May 17 2025
Alzheimer's disease (AD) is a progressive neurological disorder characterized by cognitive decline due to brain cell death, typically manifesting later in life. Early and accurate detection is critical for effective disease management and treatment. This study proposes an ensemble learning framework that combines five deep learning architectures (VGG16, VGG19, ResNet50, InceptionV3, and EfficientNetB7) to improve the accuracy of AD diagnosis. We use a comprehensive dataset of 3,714 MRI brain scans collected from specialized clinics in Iraq, categorized into three classes: NonDemented (834 images), MildDemented (1,824 images), and VeryDemented (1,056 images). The proposed voting ensemble model achieves a diagnostic accuracy of 99.32% on our dataset. The effectiveness of the model is further validated on two external datasets: OASIS (achieving 86.6% accuracy) and ADNI (achieving 99.5% accuracy), demonstrating competitive performance compared to existing approaches. Moreover, the proposed model exhibits high precision and recall across all stages of dementia, providing a reliable and robust tool for early AD detection. This study highlights the effectiveness of ensemble learning in AD diagnosis and shows promise for clinical applications.
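The voting step of such an ensemble can be sketched as follows. The abstract does not specify whether hard (majority) or soft (probability-averaged) voting was used, so this example assumes hard majority voting; the function name `majority_vote`, the tie-breaking rule, and the toy predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model class labels by hard majority voting.

    predictions_per_model holds one list of labels per model, all of the
    same length. Ties are broken in favour of the earliest model in the
    list whose label is among the most-voted ones (an assumption made
    for this sketch).
    """
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")
    voted = []
    for sample_votes in zip(*predictions_per_model):
        counts = Counter(sample_votes)
        top = max(counts.values())
        winner = next(label for label in sample_votes if counts[label] == top)
        voted.append(winner)
    return voted


# Hypothetical predictions from five models on three scans.
preds = [
    ["NonDemented", "MildDemented", "VeryDemented"],
    ["NonDemented", "MildDemented", "MildDemented"],
    ["MildDemented", "MildDemented", "VeryDemented"],
    ["NonDemented", "VeryDemented", "VeryDemented"],
    ["NonDemented", "MildDemented", "NonDemented"],
]
print(majority_vote(preds))
```

With an odd number of models and three classes, ties are still possible (for example a 2-2-1 split), which is why the sketch states its tie-breaking rule explicitly.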

Evaluation of synthetic images derived from a neural network in pediatric brain magnetic resonance imaging.

Nagaraj UD, Meineke J, Sriwastwa A, Tkach JA, Leach JL, Doneva M

PubMed paper, May 17 2025
Synthetic MRI (SyMRI) is a technique used to estimate tissue properties and generate multiple MR sequence contrasts from a single acquisition. However, image quality can be suboptimal. The purpose of this study was to evaluate a neural network approach using artificial intelligence-based direct contrast synthesis (AI-DCS) of the multi-contrast weighted images to improve image quality. This prospective, IRB-approved study enrolled 50 pediatric patients undergoing clinical brain MRI. In addition to the standard of care (SOC) clinical protocol, a 2D multi-delay multi-echo (MDME) sequence was obtained. SOC 3D T1-weighted (T1W), 2D T2-weighted (T2W) and 2D T2W fluid-attenuated inversion recovery (FLAIR) images from 35 patients were used to train a neural network generating synthetic T1W, T2W, and FLAIR images. Quantitative analysis of grey matter (GM) and white matter (WM) apparent signal to noise (aSNR) and grey-white matter (GWM) apparent contrast to noise (aCNR) ratios was performed. Eight patients were evaluated. When compared to SyMRI, T1W AI-DCS had better overall image quality, reduced noise/artifacts, and better subjective SNR in 100% (16/16) of evaluations. When compared to SyMRI, T2W AI-DCS overall image quality and diagnostic confidence were better in 93.8% (15/16) and 87.5% (14/16) of evaluations, respectively. When compared to SyMRI, FLAIR AI-DCS was better in 93.8% (15/16) of evaluations in overall image quality and in 100% (16/16) of evaluations for noise/artifacts and subjective SNR. Quantitative analysis revealed higher WM aSNR compared with SyMRI (p < 0.05) for T1W, T2W and FLAIR. AI-DCS demonstrates better overall image quality than SyMRI on T1W, T2W and FLAIR.

CorBenchX: Large-Scale Chest X-Ray Error Dataset and Vision-Language Model Benchmark for Report Error Correction

Jing Zou, Qingqiu Li, Chenyu Lian, Lihao Liu, Xiaohan Yan, Shujun Wang, Jing Qin

arXiv preprint, May 17 2025
AI-driven models have shown great promise in detecting errors in radiology reports, yet the field lacks a unified benchmark for rigorous evaluation of error detection and further correction. To address this gap, we introduce CorBenchX, a comprehensive suite for automated error detection and correction in chest X-ray reports, designed to advance AI-assisted quality control in clinical practice. We first synthesize a large-scale dataset of 26,326 chest X-ray error reports by injecting clinically common errors via prompting DeepSeek-R1, with each corrupted report paired with its original text, error type, and human-readable description. Leveraging this dataset, we benchmark both open- and closed-source vision-language models (e.g., InternVL, Qwen-VL, GPT-4o, o4-mini, and Claude-3.7) for error detection and correction under zero-shot prompting. Among these models, o4-mini achieves the best performance, with 50.6% detection accuracy and correction scores of BLEU 0.853, ROUGE 0.924, BERTScore 0.981, SembScore 0.865, and CheXbertF1 0.954. These results remain below clinical-level accuracy, highlighting the challenge of precise report correction. To advance the state of the art, we propose a multi-step reinforcement learning (MSRL) framework that optimizes a multi-objective reward combining format compliance, error-type accuracy, and BLEU similarity. We apply MSRL to QwenVL2.5-7B, the top open-source model in our benchmark, achieving an improvement of 38.3% in single-error detection precision and 5.2% in single-error correction over the zero-shot baseline.
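A multi-objective reward of the kind described above might be combined as a weighted sum, as in the sketch below. Everything specific here is an assumption made for illustration: the JSON output schema (`error_type`, `corrected_report`), the weights, the function names, and the use of clipped unigram precision as a simple stand-in for BLEU. None of it is taken from the CorBenchX implementation.

```python
import json
from collections import Counter


def unigram_precision(candidate, reference):
    """Clipped unigram precision, used here as a simple stand-in for BLEU."""
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    matched = 0
    for token in cand:
        if ref_counts[token] > 0:
            matched += 1
            ref_counts[token] -= 1  # each reference token can match only once
    return matched / len(cand)


def combined_reward(output_text, true_error_type, reference_report,
                    weights=(0.2, 0.3, 0.5)):
    """Weighted sum of format, error-type, and similarity terms (hypothetical)."""
    w_format, w_type, w_sim = weights
    try:
        parsed = json.loads(output_text)
    except json.JSONDecodeError:
        return 0.0  # malformed output earns no reward at all
    if not isinstance(parsed, dict):
        return 0.0
    error_type = parsed.get("error_type")
    corrected = parsed.get("corrected_report")
    if not isinstance(error_type, str) or not isinstance(corrected, str):
        return 0.0
    type_score = 1.0 if error_type == true_error_type else 0.0
    sim_score = unigram_precision(corrected, reference_report)
    return w_format * 1.0 + w_type * type_score + w_sim * sim_score


# Hypothetical model output and reference.
output = json.dumps({"error_type": "omission",
                     "corrected_report": "no pleural effusion is seen"})
print(combined_reward(output, "omission", "no pleural effusion is seen"))
```

Gating the other terms on format compliance is one possible design choice: it prevents a policy from collecting similarity reward with outputs that downstream tooling could not parse.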

Measurement Score-Based Diffusion Model

Chicago Y. Park, Shirin Shoushtari, Hongyu An, Ulugbek S. Kamilov

arXiv preprint, May 17 2025
Diffusion models are widely used in applications ranging from image generation to inverse problems. However, training diffusion models typically requires clean ground-truth images, which are unavailable in many applications. We introduce the Measurement Score-based diffusion Model (MSM), a novel framework that learns partial measurement scores using only noisy and subsampled measurements. MSM models the distribution of full measurements as an expectation over partial scores induced by randomized subsampling. To make the MSM representation computationally efficient, we also develop a stochastic sampling algorithm that generates full images by using a randomly selected subset of partial scores at each step. We additionally propose a new posterior sampling method for solving inverse problems that reconstructs images using these partial scores. We provide a theoretical analysis that bounds the Kullback-Leibler divergence between the distributions induced by full and stochastic sampling, establishing the accuracy of the proposed algorithm. We demonstrate the effectiveness of MSM on natural images and multi-coil MRI, showing that it can generate high-quality images and solve inverse problems -- all without access to clean training data. Code is available at https://github.com/wustl-cig/MSM.