Latest Papers on Radiology AI. Tags: Benchmark SOTA

Lack of children in public medical imaging data points to growing age bias in biomedical AI

Hua, S. B. Z., Heller, N., He, P., Towbin, A. J., Chen, I., Lu, A., Erdman, L.

•preprint•Jun 7 2025

Artificial intelligence (AI) is rapidly transforming healthcare, but its benefits are not reaching all patients equally. Children remain overlooked, with only 17% of FDA-approved medical AI devices labeled for pediatric use. In this work, we demonstrate that this exclusion may stem from a fundamental data gap. Our systematic review of 181 public medical imaging datasets reveals that children represent just under 1% of available data, while the majority of machine learning imaging conference papers we surveyed utilized publicly available data for methods development. As with other systematic biases in model development, past studies have shown that pediatric representation in training data is essential for model performance in pediatric populations. We add to these findings, showing that adult-trained chest radiograph models exhibit significant age bias when applied to pediatric populations, with higher false positive rates in younger children. This work underscores the urgent need for increased pediatric representation in publicly accessible medical datasets. We provide actionable recommendations for researchers, policymakers, and data curators to address this age equity gap and ensure AI benefits patients of all ages.

1-2 sentence summary: Our analysis reveals a critical healthcare age disparity: children represent less than 1% of public medical imaging datasets. This gap in representation leads to biased predictions across medical image foundation models, with the youngest patients facing the highest risk of misdiagnosis.

X-Ray Classification Chest Review In Silico Academic Lab Ethics Policy Benchmark SOTA
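
The age bias described in the abstract above is reported as a difference in false positive rates between age groups. A minimal sketch of that kind of stratified check follows. The function name, the age-group labels, and the toy records are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Compute the false positive rate per group.

    records: iterable of (group, y_true, y_pred) with binary labels,
    where 1 means "finding present". Groups with no negative cases
    are left out of the result because their rate is undefined.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 0:  # only true negatives and false positives count
            negatives[group] += 1
            if y_pred == 1:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items()}


# Toy records (illustrative only, not data from the paper).
toy_records = [
    ("0-5 years", 0, 1), ("0-5 years", 0, 1), ("0-5 years", 0, 0),
    ("0-5 years", 1, 1),
    ("13-18 years", 0, 0), ("13-18 years", 0, 1),
    ("13-18 years", 0, 0), ("13-18 years", 0, 0),
]
rates = false_positive_rate_by_group(toy_records)
```

Comparing the per-group rates side by side is one simple way to surface the kind of gap the authors describe; a real analysis would also need confidence intervals and enough negative cases in each group.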

Detecting neurodegenerative changes in glaucoma using deep mean kurtosis-curve-corrected tractometry

Kasa, L. W., Schierding, W., Kwon, E., Holdsworth, S., Danesh-Meyer, H. V.

•preprint•Jun 6 2025

Glaucoma is increasingly recognized as a neurodegenerative condition involving both retinal and central nervous system structures. Here, we present an integrated framework that combines MK-Curve-corrected diffusion kurtosis imaging (DKI), tractometry, and deep autoencoder-based normative modeling to detect localized white matter abnormalities associated with glaucoma. Using UK Biobank diffusion MRI data, we show that the MK-Curve approach corrects anatomically implausible values and improves the reliability of DKI metrics - particularly mean (MK), radial (RK), and axial kurtosis (AK) - in regions of complex fiber architecture. Tractometry revealed reduced MK in glaucoma patients along the optic radiation, inferior longitudinal fasciculus, and inferior fronto-occipital fasciculus, but not in a non-visual control tract, supporting disease specificity. These abnormalities were spatially localized, with significant changes observed at multiple points along the tracts. MK demonstrated greater sensitivity than mean diffusivity (MD) and exhibited altered distributional features, reflecting microstructural heterogeneity not captured by standard metrics. Node-wise MK values in the right optic radiation showed weak but significant correlations with retinal OCT measures (ganglion cell layer and retinal nerve fiber layer thickness), reinforcing the biological relevance of these findings. Deep autoencoder-based modeling further enabled subject-level anomaly detection that aligned spatially with group-level changes and outperformed traditional approaches. Together, our results highlight the potential of advanced diffusion modeling and deep learning for sensitive, individualized detection of glaucomatous neurodegeneration and support their integration into future multimodal imaging pipelines in neuro-ophthalmology.

MRI Detection Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Full Conformal Adaptation of Medical Vision-Language Models

Julio Silva-Rodríguez, Leo Fillioux, Paul-Henry Cournède, Maria Vakalopoulou, Stergios Christodoulidis, Ismail Ben Ayed, Jose Dolz

•preprint•Jun 6 2025

Vision-language models (VLMs) pre-trained at large scale have shown unprecedented transferability capabilities and are being progressively integrated into medical image analysis. Although their discriminative potential has been widely explored, their reliability remains overlooked. This work investigates their behavior under the increasingly popular split conformal prediction (SCP) framework, which theoretically guarantees a given error level on output sets by leveraging a labeled calibration set. However, the zero-shot performance of VLMs is inherently limited, and common practice involves few-shot transfer learning pipelines, which cannot absorb the rigid exchangeability assumptions of SCP. To alleviate this issue, we propose full conformal adaptation, a novel setting for jointly adapting and conformalizing pre-trained foundation models, which operates transductively over each test data point using a few-shot adaptation set. Moreover, we complement this framework with SS-Text, a novel training-free linear probe solver for VLMs that alleviates the computational cost of such a transductive approach. We provide comprehensive experiments using 3 different modality-specialized medical VLMs and 9 adaptation tasks. Our framework requires exactly the same data as SCP, and provides consistent relative improvements of up to 27% on set efficiency while maintaining the same coverage guarantees.

Mixed Modality Classification Methodology In Silico Academic Lab Benchmark SOTA
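
For readers unfamiliar with the split conformal prediction (SCP) baseline mentioned in the abstract above, here is a minimal sketch of how a labeled calibration set yields prediction sets at a chosen error level. It assumes the common `1 - p(true class)` nonconformity score, which is a choice made for illustration. It does not reproduce the paper's full conformal adaptation or SS-Text, and the function names are hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Return the score threshold for split conformal prediction.

    cal_probs: per-sample class probability lists from a calibration set.
    cal_labels: the true class index of each calibration sample.
    alpha: target error level (e.g. 0.1 for 90% coverage).
    """
    # Nonconformity score (assumed here): one minus the true-class probability.
    scores = sorted(1.0 - probs[label]
                    for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Return all class indices whose score does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


# Toy calibration set with two classes (illustrative values only).
cal_probs = [[0.75, 0.25], [0.5, 0.5], [0.25, 0.75]]
cal_labels = [0, 0, 0]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
confident_set = prediction_set([0.75, 0.25], threshold)
ambiguous_set = prediction_set([0.5, 0.5], threshold)
```

Smaller prediction sets at the same coverage correspond to the "set efficiency" the abstract refers to. The guarantee relies on calibration and test points being exchangeable, which is the assumption the authors say few-shot adaptation pipelines can violate.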

Inconsistency of AI in intracranial aneurysm detection with varying dose and image reconstruction.

Goelz L, Laudani A, Genske U, Scheel M, Bohner G, Bauknecht HC, Mutze S, Hamm B, Jahnke P

•papers•Jun 6 2025

Scanner-related changes in data quality are common in medical imaging, yet monitoring their impact on diagnostic AI performance remains challenging. In this study, we performed standardized consistency testing of an FDA-cleared and CE-marked AI for triage and notification of intracranial aneurysms across changes in image data quality caused by dose and image reconstruction. Our assessment was based on repeated examinations of a head CT phantom designed for AI evaluation, replicating a patient with three intracranial aneurysms in the anterior, middle and posterior circulation. We show that the AI maintains stable performance within the medium dose range but produces inconsistent results at reduced dose and, unexpectedly, at higher dose when filtered back projection is used. Data quality standards required for AI are stricter than those for neuroradiologists, who report higher aneurysm visibility rates and experience performance degradation only at substantially lower doses, with no decline at higher doses.

CT Detection Neurological Retrospective Clinical Phantom/Animal FDA 510(k) Academic Lab Benchmark SOTA

Predicting infarct outcomes after extended time window thrombectomy in large vessel occlusion using knowledge guided deep learning.

Dai L, Yuan L, Zhang H, Sun Z, Jiang J, Li Z, Li Y, Zha Y

•papers•Jun 6 2025

Predicting the final infarct after an extended time window mechanical thrombectomy (MT) is beneficial for treatment planning in acute ischemic stroke (AIS). By introducing guidance from prior knowledge, this study aims to improve the accuracy of the deep learning model for post-MT infarct prediction using pre-MT brain perfusion data. This retrospective study collected CT perfusion data at admission for AIS patients receiving MT over 6 hours after symptom onset, from January 2020 to December 2024, across three centers. Infarct on post-MT diffusion weighted imaging served as ground truth. Five Swin transformer based models were developed for post-MT infarct segmentation using pre-MT CT perfusion parameter maps: BaselineNet served as the basic model for comparative analysis, CollateralFlowNet included a collateral circulation evaluation score, InfarctProbabilityNet incorporated infarct probability mapping, ArterialTerritoryNet was guided by artery territory mapping, and UnifiedNet combined all prior knowledge sources. Model performance was evaluated using the Dice coefficient and intersection over union (IoU). A total of 221 patients with AIS were included (65.2% women) with a median age of 73 years. Baseline ischemic core based on CT perfusion threshold achieved a Dice coefficient of 0.50 and IoU of 0.33. BaselineNet improved to a Dice coefficient of 0.69 and IoU of 0.53. Compared with BaselineNet, models incorporating medical knowledge demonstrated higher performance: CollateralFlowNet (Dice coefficient 0.72, IoU 0.56), InfarctProbabilityNet (Dice coefficient 0.74, IoU 0.58), ArterialTerritoryNet (Dice coefficient 0.75, IoU 0.60), and UnifiedNet (Dice coefficient 0.82, IoU 0.71) (all P<0.05). In this study, integrating medical knowledge into deep learning models enhanced the accuracy of infarct predictions in AIS patients undergoing extended time window MT.

CT Segmentation Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA
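
The abstract above reports segmentation quality as a Dice coefficient and an intersection over union (IoU). A minimal sketch of both metrics on flattened binary masks follows. The function name and the empty-mask convention are assumptions for illustration, and this is not the authors' evaluation code.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two equal-length binary masks.

    pred, truth: flattened sequences of 0/1 values (1 = infarct voxel).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_size + truth_size)
    iou = intersection / union
    return dice, iou


# Toy masks: one overlapping voxel out of two predicted and two true voxels.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single pair of masks the two metrics are tied by IoU = Dice / (2 - Dice), so a Dice of 0.50 corresponds to an IoU of about 0.33. That identity need not hold for values averaged over many patients.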

The Predictive Value of Multiparameter Characteristics of Coronary Computed Tomography Angiography for Coronary Stent Implantation.

Xu X, Wang Y, Yang T, Wang Z, Chu C, Sun L, Zhao Z, Li T, Yu H, Wang X, Song P

•papers•Jun 6 2025

This study aims to evaluate the predictive value of multiparameter characteristics of coronary computed tomography angiography (CCTA) plaque and the ratio of coronary artery volume to myocardial mass (V/M) in guiding percutaneous coronary stent implantation (PCI) in patients diagnosed with unstable angina. Patients who underwent CCTA and coronary angiography (CAG) within 2 months were retrospectively analyzed. According to CAG results, patients were divided into a medical therapy group (n=41) and a PCI revascularization group (n=37). The plaque characteristics and V/M were quantitatively evaluated. The parameters included minimum lumen area at stenosis (MLA), maximum area stenosis (MAS), maximum diameter stenosis (MDS), total plaque burden (TPB), plaque length, plaque volume, and each component volume within the plaque. Fractional flow reserve (FFR) and pericoronary fat attenuation index (FAI) were calculated based on CCTA. Artificial intelligence software was employed to compare the differences in each parameter between the 2 groups at both the vessel and plaque levels. The PCI group had higher MAS, MDS, TPB, FAI, noncalcified plaque volume and lipid plaque volume, and significantly lower V/M, MLA, and CT-derived fractional flow reserve (FFRCT). V/M, TPB, MLA, FFRCT, and FAI are important influencing factors of PCI. The combined model of MLA, FFRCT, and FAI had the largest area under the ROC curve (AUC=0.920), and had the best performance in predicting PCI. The integration of AI-derived multiparameter features from one-stop CCTA significantly enhances the accuracy of predicting PCI in angina pectoris patients, evaluating at the plaque, vessel, and patient levels.

CT Classification Cardiac Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Comparative analysis of convolutional neural networks and vision transformers in identifying benign and malignant breast lesions.

Wang L, Fang S, Chen X, Pan C, Meng M

•papers•Jun 6 2025

Various deep learning models have been developed and employed for medical image classification. This study conducted comprehensive experiments on 12 models, aiming to establish reliable benchmarks for research on breast dynamic contrast-enhanced magnetic resonance imaging image classification. Twelve deep learning models were systematically compared by analyzing variations in 4 key hyperparameters: optimizer (Op), learning rate, batch size (BS), and data augmentation. The evaluation criteria encompassed a comprehensive set of metrics including accuracy (Ac), loss value, precision, recall rate, F1-score, and area under the receiver operating characteristic curve. Furthermore, the training times and model parameter counts were assessed for holistic performance comparison. Adjustments in the BS within Adam Op had a minimal impact on Ac in the convolutional neural network models. However, altering the Op and learning rate while maintaining the same BS significantly affected the Ac. The ResNet152 network model exhibited the lowest Ac. Both the recall rate and area under the receiver operating characteristic curve for the ResNet152 and Vision transformer-base (ViT) models were inferior compared to the others. Data augmentation unexpectedly reduced the Ac of ResNet50, ResNet152, VGG16, VGG19, and ViT models. The VGG16 model boasted the shortest training duration, whereas the ViT model, before data augmentation, had the longest training time and smallest model weight. The ResNet152 and ViT models were not well suited for image classification tasks involving small breast dynamic contrast-enhanced magnetic resonance imaging datasets. Although data augmentation is typically beneficial, its application should be approached cautiously. These findings provide important insights to inform and refine future research in this domain.

MRI Classification Breast Methodology In Silico Academic Lab Benchmark SOTA

Query Nearby: Offset-Adjusted Mask2Former enhances small-organ segmentation

Xin Zhang, Dongdong Meng, Sheng Li

•preprint•Jun 6 2025

Medical segmentation plays an important role in clinical applications like radiation therapy and surgical guidance, but acquiring clinically acceptable results is difficult. In recent years, progress has been witnessed with the success of utilizing transformer-like models, such as combining the attention mechanism with CNNs. In particular, transformer-based segmentation models can extract global information more effectively, compensating for the drawbacks of CNN modules that focus on local features. However, utilizing transformer architectures is not easy, because training transformer-based models can be resource-demanding. Moreover, due to the distinct characteristics in the medical field, especially when encountering mid-sized and small organs with compact regions, their results often seem unsatisfactory. For example, using ViT to segment medical images directly only gives a DSC of less than 50%, which is far lower than the clinically acceptable score of 80%. In this paper, we used Mask2Former with deformable attention to reduce computation and proposed offset adjustment strategies to encourage sampling points within the same organs during attention weights computation, thereby integrating compact foreground information better. Additionally, we utilized the 4th feature map in Mask2Former to provide a coarse location of organs, and employed an FCN-based auxiliary head to help train Mask2Former more quickly using Dice loss. We show that our model achieves SOTA (State-of-the-Art) performance on the HaNSeg and SegRap2023 datasets, especially on mid-sized and small organs. Our code is available at https://github.com/earis/Offsetadjustment_Background-location_Decoder_Mask2former.

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA Open Code

Magnetic resonance imaging and the evaluation of vestibular schwannomas: a systematic review

Lee, K. S., Wijetilake, N., Connor, S., Vercauteren, T., Shapey, J.

•preprint•Jun 6 2025

Introduction: The assessment of vestibular schwannoma (VS) requires a standardized measurement approach, as growth is a key element in defining treatment strategy for VS. Volumetric measurements offer higher sensitivity and precision, but existing methods of segmentation are labour-intensive, lack standardisation and are prone to variability and subjectivity. A new core set of measurement indicators, reported consistently, will support clinical decision-making and facilitate evidence synthesis. This systematic review aimed to identify indicators used in 1) magnetic resonance imaging (MRI) acquisition and 2) measurement or 3) growth of VS. This work is expected to inform a Delphi consensus. Methods: Systematic searches of Medline, Embase and Cochrane Central were undertaken on 4th October 2024. Studies that assessed the evaluation of VS with MRI between 2014 and 2024 were included. Results: The final dataset consisted of 102 studies and 19,001 patients. Eighty-six (84.3%) studies employed post-contrast T1 as the MRI acquisition of choice for evaluating VS. Nine (8.8%) studies additionally employed heavily weighted T2 sequences such as constructive interference in steady state (CISS) and FIESTA-C. Only 45 (44.1%) studies reported the slice thickness, with the majority, 38 (84.4%), choosing a thickness of <3 mm. Fifty-eight (56.8%) studies measured volume whilst 49 (48.0%) measured the largest linear dimension; 14 (13.7%) studies used both measurements. Four studies employed semi-automated or automated segmentation processes to measure the volumes of VS. Of 68 studies investigating growth, 54 (79.4%) provided a threshold. Significant variation in volumetric growth was observed, but the threshold for significant percentage change reported by most studies was 20% (n = 18). Conclusion: Substantial variation in MRI acquisition, and in methods for evaluating measurement and growth of VS, exists across the literature. This lack of standardization is likely attributable to resource constraints and the fact that currently available volumetric segmentation methods are very labour-intensive. Following the identification of the indicators employed in the literature, this study aims to develop a Delphi consensus for the standardized measurement of VS and to support uptake of data-driven, artificial intelligence-based measuring tools.

MRI Segmentation Neurological Review Post Market Academic Lab Benchmark SOTA

GNNs surpass transformers in tumor medical image segmentation.

Xiao H, Yang G, Li Z, Yi C

•papers•Jun 5 2025

This study assesses the suitability of Transformer-based architectures for medical image segmentation and investigates the potential advantages of Graph Neural Networks (GNNs) in this domain. We analyze the limitations of the Transformer, which models medical images as sequences of image patches, limiting its flexibility in capturing complex and irregular tumor structures. To address this limitation, we propose U-GNN, a pure GNN-based U-shaped architecture designed for medical image segmentation. U-GNN retains the U-Net-inspired inductive bias while leveraging GNNs' topological modeling capabilities. The architecture consists of Vision GNN blocks stacked into a U-shaped structure. Additionally, we introduce the concept of multi-order similarity and propose a zero-computation-cost approach to incorporate higher-order similarity in graph construction. Each Vision GNN block segments the image into patch nodes, constructs multi-order similarity graphs, and aggregates node features via multi-order node information aggregation. Experimental evaluations on multi-organ and cardiac segmentation datasets demonstrate that U-GNN significantly outperforms existing CNN- and Transformer-based models. U-GNN achieves a 6% improvement in Dice Similarity Coefficient (DSC) and an 18% reduction in Hausdorff Distance (HD) compared to state-of-the-art methods. The source code will be released upon paper acceptance.

Mixed Modality Segmentation Methodology In Silico Academic Lab Open Code Benchmark SOTA
