
Development of a deep learning-based automated diagnostic system (DLADS) for classifying mammographic lesions - a first large-scale multi-institutional clinical trial in Japan.

Yamaguchi T, Koyama Y, Inoue K, Ban K, Hirokaga K, Kujiraoka Y, Okanami Y, Shinohara N, Tsunoda H, Uematsu T, Mukai H

PubMed paper · Jul 3, 2025
Western countries have recently built evidence on mammographic artificial intelligence computer-aided diagnosis (AI-CADx) systems; however, their effectiveness has not yet been sufficiently validated in Japanese women. In this study, we aimed to establish the first Japanese mammographic AI-CADx system. We retrospectively collected screening and diagnostic mammograms from 63 institutions in Japan, then randomly divided the images into training, validation, and test datasets at a balanced 8:1:1 ratio on a case level. The gold standard for annotation was mammographic findings based on pathological references. The AI-CADx system was developed using SE-ResNet modules and a sliding-window algorithm, with the cut-off concentration gradient of the heatmap image set at 15%. The system was considered accurate if it detected the presence of a malignant lesion in a breast cancer mammogram. The primary endpoint was a sensitivity and specificity above 80% for breast cancer diagnosis in the test dataset. We collected 20,638 mammograms from 11,450 Japanese women with a median age of 55 years, comprising 5019 breast cancer (24.3%), 5026 benign (24.4%), and 10,593 normal (51.3%) mammograms. In the test dataset of 2059 mammograms, the AI-CADx system achieved a sensitivity of 83.5% and a specificity of 84.7% for breast cancer diagnosis, with an AUC of 0.841 (DeLong 95% CI: 0.822-0.859). Accuracy was largely consistent regardless of breast density, mammographic findings, cancer type, and mammography vendor (AUC range: 0.639-0.906). The developed Japanese mammographic AI-CADx system thus diagnosed breast cancer with the pre-specified sensitivity and specificity. We are planning a prospective study to validate the breast cancer diagnostic performance of Japanese physicians using this AI-CADx system as a second reader. UMIN trial number UMIN000039009; registered 26 December 2019, https://www.umin.ac.jp/ctr/.
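A minimal sketch of the heatmap cut-off rule this abstract describes, assuming the CNN emits a per-pixel lesion score in [0, 1] and the mammogram is flagged when any location clears the 15% cut-off; the shapes and toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def classify_mammogram(heatmap: np.ndarray, cutoff: float = 0.15) -> bool:
    """Flag the exam as suspicious if any heatmap activation clears the cut-off.

    Assumes the CNN output is already scaled to [0, 1] (e.g., sigmoid scores).
    """
    return bool((heatmap >= cutoff).any())

# Toy example: low-level noise only vs. one simulated malignant focus.
rng = np.random.default_rng(0)
normal = rng.random((256, 256)) * 0.1
lesion = normal.copy()
lesion[100:108, 100:108] = 0.9
print(classify_mammogram(normal), classify_mammogram(lesion))  # False True
```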

Interpretable and generalizable deep learning model for preoperative assessment of microvascular invasion and outcome in hepatocellular carcinoma based on MRI: a multicenter study.

Dong X, Jia X, Zhang W, Zhang J, Xu H, Xu L, Ma C, Hu H, Luo J, Zhang J, Wang Z, Ji W, Yang D, Yang Z

PubMed paper · Jul 3, 2025
This study aimed to develop an interpretable, domain-generalizable deep learning model for microvascular invasion (MVI) assessment in hepatocellular carcinoma (HCC). Using a retrospective dataset of 546 HCC patients from five centers, we developed and validated a clinical-radiological model and deep learning models for MVI prediction. The models were developed on 263 cases from three centers, internally validated on 66 patients, and externally tested on two independent sets. An adversarial network-based deep learning (AD-DL) model was developed to learn domain-invariant features from the multiple centers within the training set. The area under the receiver operating characteristic curve (AUC) was calculated against pathological MVI status. With the best-performing model, early recurrence-free survival (ERFS) stratification was validated on the external test set by the log-rank test, and differentially expressed genes (DEGs) associated with MVI status were examined using RNA-sequencing data from The Cancer Imaging Archive. The AD-DL model demonstrated the highest diagnostic performance and generalizability, with an AUC of 0.793 in the internal test set, 0.801 in external test set 1, and 0.773 in external test set 2. The model's prediction of MVI status also correlated significantly with ERFS (p = 0.048). DEGs associated with MVI status were primarily enriched in metabolic processes, the Wnt signaling pathway, and the epithelial-mesenchymal transition process. The AD-DL model allows preoperative MVI prediction and ERFS stratification in HCC patients, with good generalizability and biological interpretability. The adversarial network-based deep learning model predicts MVI status well in HCC patients and demonstrates good generalizability. By integrating bioinformatics analysis of the model's predictions, it achieves biological interpretability, facilitating its clinical translation. Current MVI assessment models for HCC lack interpretability and generalizability. The adversarial network-based model's performance surpassed both the clinical-radiological and squeeze-and-excitation network-based models. Biological function analysis was employed to enhance the interpretability and clinical translatability of the adversarial network-based model.
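Adversarial domain-invariant training of the kind described here is commonly realized with a gradient-reversal layer; the PyTorch sketch below illustrates that generic pattern under assumed network sizes and a placeholder reversal coefficient, and is not the authors' architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

features = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU())
mvi_head = nn.Linear(128, 2)     # MVI present / absent
domain_head = nn.Linear(128, 3)  # the three training centers

params = [*features.parameters(), *mvi_head.parameters(), *domain_head.parameters()]
opt = torch.optim.Adam(params, lr=1e-4)
ce = nn.CrossEntropyLoss()

x = torch.randn(8, 1, 64, 64)       # toy stand-in for MRI patches
y_mvi = torch.randint(0, 2, (8,))   # pathological MVI labels
y_dom = torch.randint(0, 3, (8,))   # center-of-origin labels

z = features(x)
# The task loss trains the features; the reversed gradient from the domain
# discriminator pushes the same features toward center-invariance.
loss = ce(mvi_head(z), y_mvi) + ce(domain_head(GradReverse.apply(z, 1.0)), y_dom)
opt.zero_grad()
loss.backward()
opt.step()
```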

Cross-validation of an artificial intelligence tool for fracture classification and localization on conventional radiography in Dutch population.

Ruitenbeek HC, Sahil S, Kumar A, Kushawaha RK, Tanamala S, Sathyamurthy S, Agrawal R, Chattoraj S, Paramasamy J, Bos D, Fahimi R, Oei EHG, Visser JJ

PubMed paper · Jul 3, 2025
The aim of this study was to validate the effectiveness of an AI tool trained on Indian data in a Dutch medical center and to assess its ability to classify and localize fractures. Conventional radiographs acquired between January 2019 and November 2022 were analyzed using a multitask deep neural network. The tool, trained on Indian data, identified and localized fractures in 17 body parts. The reference standard was based on radiology reports from the routine clinical workflow, confirmed by an experienced musculoskeletal radiologist. The analysis included both patient-wise and fracture-wise evaluations, employing binary and Intersection over Union (IoU) metrics to assess fracture detection and localization accuracy. In total, 14,311 radiographs (median age, 48 years (range 18-98); 7265 male) were analyzed and categorized by body part: clavicle, shoulder, humerus, elbow, forearm, wrist, hand and finger, pelvis, hip, femur, knee, lower leg, ankle, and foot and toe. Of these, 4156/14,311 (29%) had fractures. The AI tool demonstrated overall patient-wise sensitivity, specificity, and AUC of 87.1% (95% CI: 86.1-88.1%), 87.1% (95% CI: 86.4-87.7%), and 0.92 (95% CI: 0.91-0.93), respectively. The fracture detection rate was 60% overall, ranging from 7% for rib fractures to 90% for clavicle fractures. This study validates a fracture detection AI tool, originally trained on Indian data, on a Western-European dataset. While classification performance is robust on real clinical data, the fracture-wise analysis reveals variability in localization accuracy, underscoring the need for refinement of fracture localization. AI may help by enabling optimal use of limited resources and personnel. This study evaluates an AI tool designed to aid fracture detection, potentially reducing reading time or optimizing radiology workflow by prioritizing fracture-positive cases. Cross-validation on a consecutive Dutch cohort confirms this AI tool's clinical robustness: it detected fractures with 87% sensitivity, 87% specificity, and 0.92 AUC, and localized 60% of fractures, highest for the clavicle (90%) and lowest for ribs (7%).
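For readers unfamiliar with the fracture-wise metric, here is a small sketch of IoU-based hit counting; the box format and the 0.5 match threshold are assumptions for illustration, not the study's protocol.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def detection_rate(preds, refs, thr=0.5):
    """Share of reference fractures matched by at least one prediction."""
    hits = sum(any(iou(p, r) >= thr for p in preds) for r in refs)
    return hits / len(refs)

refs = [(10, 10, 50, 50), (80, 80, 120, 120)]   # annotated fractures
preds = [(12, 8, 52, 48)]                       # model overlaps only the first
print(detection_rate(preds, refs))              # 0.5
```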

MedFormer: Hierarchical Medical Vision Transformer with Content-Aware Dual Sparse Selection Attention

Zunhui Xia, Hongxing Li, Libin Lan

arXiv preprint · Jul 3, 2025
Medical image recognition serves as a key way to aid in clinical diagnosis, enabling more accurate and timely identification of diseases and abnormalities. Vision transformer-based approaches have proven effective in handling various medical recognition tasks. However, these methods encounter two primary challenges. First, they are often task-specific and architecture-tailored, limiting their general applicability. Second, they usually either adopt full attention to model long-range dependencies, resulting in high computational costs, or rely on handcrafted sparse attention, potentially leading to suboptimal performance. To tackle these issues, we present MedFormer, an efficient medical vision transformer with two key ideas. First, it employs a pyramid scaling structure as a versatile backbone for various medical image recognition tasks, including image classification and dense prediction tasks such as semantic segmentation and lesion detection. This structure facilitates hierarchical feature representation while reducing the computational load of feature maps, which is highly beneficial for boosting performance. Second, it introduces a novel Dual Sparse Selection Attention (DSSA) with content awareness to improve computational efficiency and robustness against noise while maintaining high performance. As the core building technique of MedFormer, DSSA is explicitly designed to attend to the most relevant content. In addition, a detailed theoretical analysis demonstrates that MedFormer has superior generality and efficiency compared to existing medical vision transformers. Extensive experiments on a variety of imaging modality datasets consistently show that MedFormer is highly effective in enhancing performance across all three above-mentioned medical image recognition tasks. The code is available at https://github.com/XiaZunhui/MedFormer.
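The content-aware sparse attention idea can be illustrated with a generic top-k approximation, in which each query attends only to its highest-scoring keys; this is a simplification for intuition, not the paper's dual sparse selection mechanism.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=4):
    """Each query attends only to its top-k highest-scoring keys."""
    # q, k, v: (batch, seq, dim)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, S, S)
    idx = scores.topk(top_k, dim=-1).indices               # keep the k best keys
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, 0.0)                            # 0 where kept, -inf elsewhere
    attn = F.softmax(scores + mask, dim=-1)                # dropped keys get weight ~0
    return attn @ v

q = k = v = torch.randn(2, 16, 32)   # toy token sequence
out = topk_sparse_attention(q, k, v)
print(out.shape)                     # torch.Size([2, 16, 32])
```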

A deep active learning framework for mitotic figure detection with minimal manual annotation and labelling.

Liu E, Lin A, Kakodkar P, Zhao Y, Wang B, Ling C, Zhang Q

PubMed paper · Jul 3, 2025
Accurately and efficiently identifying mitotic figures (MFs) is crucial for diagnosing and grading various cancers, including glioblastoma (GBM), a highly aggressive brain tumour requiring precise and timely intervention. Traditional manual counting of MFs in whole slide images (WSIs) is labour-intensive and prone to interobserver variability. Our study introduces a deep active learning framework that addresses these challenges with minimal human intervention. We utilized a dataset of GBM WSIs from The Cancer Genome Atlas (TCGA). Our framework integrates convolutional neural networks (CNNs) with an active learning strategy. Initially, a CNN is trained on a small, annotated dataset. The framework then identifies uncertain samples from the unlabelled data pool, which are subsequently reviewed by experts. These ambiguous cases are verified and used for model retraining. This iterative process continues until the model achieves satisfactory performance. Our approach achieved 81.75% precision and 82.48% recall for MF detection, and an accuracy of 84.1% for MF subclass classification. Furthermore, it significantly reduced annotation time, to approximately 900 minutes across 66 WSIs, nearly halving the effort compared with traditional methods. Our deep active learning framework demonstrates a substantial improvement in both efficiency and accuracy for MF detection and classification in GBM WSIs. By reducing reliance on large annotated datasets, it minimizes manual effort while maintaining high performance. This methodology can be generalized to other medical imaging tasks, supporting broader applications in the healthcare domain.
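The iterative loop described above, reduced to a schematic least-confidence sampler; the scikit-learn-style model interface, the synthetic data, and the oracle standing in for expert review are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(model, labeled, unlabeled, oracle, rounds=5, batch=50):
    """Least-confidence active learning: label only the most ambiguous samples."""
    X_lab, y_lab = labeled
    for _ in range(rounds):
        model.fit(X_lab, y_lab)
        probs = model.predict_proba(unlabeled)      # (N, n_classes)
        uncertainty = 1.0 - probs.max(axis=1)       # least-confidence score
        query = np.argsort(uncertainty)[-batch:]    # most uncertain samples
        X_lab = np.vstack([X_lab, unlabeled[query]])
        y_lab = np.concatenate([y_lab, oracle(unlabeled[query])])  # expert review
        unlabeled = np.delete(unlabeled, query, axis=0)
    return model

# Toy run with a synthetic labelling rule standing in for the expert.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
y = (X[:, 0] > 0).astype(int)
model = active_learning_loop(LogisticRegression(), (X[:100], y[:100]), X[100:],
                             oracle=lambda q: (q[:, 0] > 0).astype(int))
```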

PiCME: Pipeline for Contrastive Modality Evaluation and Encoding in the MIMIC Dataset

Michal Golovanevsky, Pranav Mahableshwarkar, Carsten Eickhoff, Ritambhara Singh

arXiv preprint · Jul 3, 2025
Multimodal deep learning holds promise for improving clinical prediction by integrating diverse patient data, including text, imaging, time-series, and structured demographics. Contrastive learning facilitates this integration by producing a unified representation that can be reused across tasks, reducing the need for separate models or encoders. Although contrastive learning has seen success in vision-language domains, its use in clinical settings remains largely limited to image and text pairs. We propose the Pipeline for Contrastive Modality Evaluation and Encoding (PiCME), which systematically assesses five clinical data types from MIMIC: discharge summaries, radiology reports, chest X-rays, demographics, and time-series. We pre-train contrastive models on all 26 combinations of two to five modalities and evaluate their utility on in-hospital mortality and phenotype prediction. To address performance plateaus with more modalities, we introduce a Modality-Gated LSTM that weights each modality according to its contrastively learned importance. Our results show that contrastive models remain competitive with supervised baselines, particularly in three-modality settings. Performance declines beyond three modalities, and supervised models fail to recover this drop. The Modality-Gated LSTM mitigates it, improving AUROC from 73.19% to 76.93% and AUPRC from 51.27% to 62.26% in the five-modality setting. We also compare contrastively learned modality importance scores with attribution scores and evaluate generalization across demographic subgroups, highlighting strengths in interpretability and fairness. PiCME is the first to scale contrastive learning across all modality combinations in MIMIC, offering guidance for modality selection, training strategies, and equitable clinical prediction.
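A rough PyTorch sketch of the modality-gating idea: each modality embedding is scaled by a learned importance weight before the fused sequence enters an LSTM. The dimensions and the softmax gating form are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ModalityGatedLSTM(nn.Module):
    def __init__(self, n_modalities=5, dim=128, hidden=64, n_classes=2):
        super().__init__()
        self.gates = nn.Parameter(torch.zeros(n_modalities))  # learned importance
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, embeddings):
        # embeddings: (batch, n_modalities, dim), one vector per modality
        w = torch.softmax(self.gates, dim=0)      # importance weights sum to 1
        gated = embeddings * w.view(1, -1, 1)     # down-weight weak modalities
        _, (h, _) = self.lstm(gated)              # treat modalities as a sequence
        return self.head(h[-1])

model = ModalityGatedLSTM()
logits = model(torch.randn(4, 5, 128))   # e.g., notes, reports, X-ray, demo, vitals
print(logits.shape)                      # torch.Size([4, 2])
```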

Outcome prediction and individualized treatment effect estimation in patients with large vessel occlusion stroke

Lisa Herzog, Pascal Bühler, Ezequiel de la Rosa, Beate Sick, Susanne Wegener

arXiv preprint · Jul 3, 2025
Mechanical thrombectomy has become the standard of care in patients with stroke due to large vessel occlusion (LVO). However, only 50% of successfully treated patients show a favorable outcome. We developed and evaluated interpretable deep learning models to predict functional outcomes in terms of the modified Rankin Scale score, alongside individualized treatment effects (ITEs), using data from 449 LVO stroke patients in a randomized clinical trial. Besides clinical variables, we considered non-contrast CT (NCCT) and CT angiography (CTA) scans, which were integrated using novel foundation models to make use of advanced imaging information. Clinical variables had good predictive power for binary functional outcome prediction (AUC of 0.719 [0.666, 0.774]), which improved slightly when CTA imaging was added (AUC of 0.737 [0.687, 0.795]). Adding NCCT scans, or a combination of NCCT and CTA scans, to clinical features yielded no improvement. The most important clinical predictor of functional outcome was pre-stroke disability. While estimated ITEs were well calibrated to the average treatment effect, discriminatory ability was limited, as indicated by a C-for-benefit statistic of around 0.55 in all models. In summary, the models allowed us to jointly integrate CT imaging and clinical features while achieving state-of-the-art prediction performance and ITE estimates. Yet, further research is needed, in particular to improve ITE estimation.
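For context on the C-for-benefit statistic mentioned above, here is a toy sketch under strong simplifications (1:1 matching by sorted predicted ITE, ties counted as discordant); published formulations differ in matching and tie handling, so treat this as intuition only.

```python
import numpy as np

def c_for_benefit(ite_treated, y_treated, ite_control, y_control):
    """Toy C-for-benefit: concordance between predicted and observed pair benefit."""
    t = np.argsort(ite_treated)          # simplified 1:1 matching by sorted ITE
    c = np.argsort(ite_control)
    n = min(len(t), len(c))
    pred = (ite_treated[t[:n]] + ite_control[c[:n]]) / 2   # pair-level prediction
    obs = y_control[c[:n]] - y_treated[t[:n]]  # 1 = benefit, 0 = none, -1 = harm
    conc = disc = 0
    for i in range(n):
        for j in range(i + 1, n):
            if obs[i] == obs[j]:
                continue                  # uninformative pair of pairs
            if (pred[i] - pred[j]) * (obs[i] - obs[j]) > 0:
                conc += 1
            else:
                disc += 1
    return conc / (conc + disc)

rng = np.random.default_rng(1)
ite_t, ite_c = rng.random(40), rng.random(40)
y_t, y_c = rng.integers(0, 2, 40), rng.integers(0, 2, 40)
print(round(c_for_benefit(ite_t, y_t, ite_c, y_c), 3))  # ~0.5 for random predictions
```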

Predicting Ten-Year Clinical Outcomes in Multiple Sclerosis with Radiomics-Based Machine Learning Models.

Tranfa M, Petracca M, Cuocolo R, Ugga L, Morra VB, Carotenuto A, Elefante A, Falco F, Lanzillo R, Moccia M, Scaravilli A, Brunetti A, Cocozza S, Quarantelli M, Pontillo G

PubMed paper · Jul 3, 2025
Identifying patients with multiple sclerosis (pwMS) at higher risk of clinical progression is essential to inform clinical management. We aimed to build prognostic models using machine learning (ML) algorithms to predict long-term clinical outcomes based on a systematic mapping of volumetric, radiomic, and macrostructural disconnection features from routine brain MRI scans of pwMS. In this longitudinal monocentric study, 3T structural MRI scans of pwMS were retrospectively analyzed. Based on a ten-year clinical follow-up (average duration = 9.4±1.1 years), patients were classified according to confirmed disability progression (CDP) and cognitive impairment (CI), as assessed through the Expanded Disability Status Scale (EDSS) and the Brief International Cognitive Assessment of Multiple Sclerosis (BICAMS) battery, respectively. 3D-T1w and FLAIR images were automatically segmented to obtain volumes, disconnection scores (estimated from lesion masks and normative tractography data), and radiomic features from 116 gray matter regions defined according to the Automated Anatomical Labelling (AAL) atlas. Three ML algorithms (Extra Trees, Logistic Regression, and Support Vector Machine) were used to build models predicting long-term CDP and CI from the MRI-derived features. Feature selection was performed on the training set with a multi-step process, and models were validated with a holdout approach, randomly splitting patients into training (75%) and test (25%) sets. We studied 177 pwMS (M/F = 51/126; mean±SD age: 35.2±8.7 years). Long-term CDP and CI were observed in 71 and 55 patients, respectively. For CDP prediction, feature selection identified subsets of 13, 12, and 10 features, yielding test-set accuracies of 0.71, 0.69, and 0.67 for the Extra Trees, Logistic Regression, and Support Vector Machine classifiers, respectively. Similarly, for CI prediction, subsets of 16, 17, and 19 features were selected, with test-set accuracies of 0.69, 0.64, and 0.62, respectively. There were no significant differences in accuracy between ML models for CDP (p=0.65) or CI (p=0.31). Building on quantitative features derived from conventional MRI scans, we obtained long-term prognostic models, potentially informing patient stratification and clinical decision-making. MS, multiple sclerosis; pwMS, people with MS; HC, healthy controls; ML, machine learning; DD, disease duration; EDSS, Expanded Disability Status Scale; TLV, total lesion volume; CDP, confirmed disability progression; CI, cognitive impairment; BICAMS, Brief International Cognitive Assessment of Multiple Sclerosis.
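A schematic scikit-learn version of the evaluation protocol described above: a 75/25 holdout split, feature selection fit on the training set only, and the three classifiers compared on the test set. The feature counts, selection method, and synthetic data are placeholders, not the study's multi-step pipeline.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(177, 500))   # stand-in radiomic/volumetric features
y = rng.integers(0, 2, 177)       # e.g., confirmed disability progression

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for name, clf in [("Extra Trees", ExtraTreesClassifier(random_state=0)),
                  ("Logistic Regression", LogisticRegression(max_iter=1000)),
                  ("SVM", SVC())]:
    pipe = make_pipeline(StandardScaler(), SelectKBest(k=13), clf)
    pipe.fit(X_tr, y_tr)                       # selection fit inside the pipeline
    print(name, round(pipe.score(X_te, y_te), 2))
```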

Quantification of Optical Coherence Tomography Features in >3500 Patients with Inherited Retinal Disease Reveals Novel Genotype-Phenotype Associations

Woof, W. A., de Guimaraes, T. A. C., Al-Khuzaei, S., Daich Varela, M., Shah, M., Naik, G., Sen, S., Bagga, P., Chan, Y. W., Mendes, B. S., Lin, S., Ghoshal, B., Liefers, B., Fu, D. J., Georgiou, M., da Silva, A. S., Nguyen, Q., Liu, Y., Fujinami-Yokokawa, Y., Sumodhee, D., Furman, J., Patel, P. J., Moghul, I., Moosajee, M., Sallum, J., De Silva, S. R., Lorenz, B., Herrmann, P., Holz, F. G., Fujinami, K., Webster, A. R., Mahroo, O. A., Downes, S. M., Madhusudhan, S., Balaskas, K., Michaelides, M., Pontikos, N.

medRxiv preprint · Jul 3, 2025
Purpose: To quantify spectral-domain optical coherence tomography (SD-OCT) images cross-sectionally and longitudinally in a large cohort of molecularly characterized patients with inherited retinal disease (IRD) from the UK.
Design: Retrospective study of imaging data.
Participants: Patients with a clinically and molecularly confirmed diagnosis of IRD who underwent macular SD-OCT imaging at Moorfields Eye Hospital (MEH) between 2011 and 2019. We retrospectively identified 4,240 IRD patients from the MEH database (198 distinct IRD genes), including 69,664 SD-OCT macular volumes.
Methods: Eight features of interest were defined: retina, fovea, intraretinal cystic spaces (ICS), subretinal fluid (SRF), subretinal hyper-reflective material (SHRM), pigment epithelium detachment (PED), ellipsoid zone loss (EZ-loss), and retinal pigment epithelium loss (RPE-loss). Manual annotation of five b-scans per SD-OCT volume was performed for the retinal features by four graders following a defined grading protocol. A total of 1,749 b-scans from 360 SD-OCT volumes across 275 patients were annotated for the eight retinal features to train and test a neural-network-based segmentation model, AIRDetect-OCT, which was then applied to the entire imaging dataset.
Main Outcome Measures: Performance of AIRDetect-OCT, compared with inter-grader agreement, was evaluated using the Dice score on a held-out dataset. Feature prevalence, volume, and area were analysed cross-sectionally and longitudinally.
Results: The inter-grader Dice score for manual segmentation was ≥90% for retina, ICS, SRF, SHRM, and PED, and >77% for both EZ-loss and RPE-loss. Model-grader agreement was >80% for segmentation of retina, ICS, SRF, SHRM, and PED, and >68% for both EZ-loss and RPE-loss. Automatic segmentation was applied to 272,168 b-scans across 7,405 SD-OCT volumes from 3,534 patients encompassing 176 unique genes. Accounting for age, male patients exhibited significantly more EZ-loss (19.6 mm² vs 17.9 mm², p<2.8×10⁻⁴) and RPE-loss (7.79 mm² vs 6.15 mm², p<3.2×10⁻⁶) than female patients. RPE-loss was significantly higher in Asian patients than in other ethnicities (9.37 mm² vs 7.29 mm², p<0.03). ICS average total volume was largest in RS1 (0.47 mm³) and NR2E3 (0.25 mm³), SRF in BEST1 (0.21 mm³), and PED in EFEMP1 (0.34 mm³). BEST1 and PROM1 showed significantly different patterns of EZ-loss (p<10⁻⁴) and RPE-loss (p<0.02) when comparing the dominant with the recessive forms. Sectoral analysis revealed significantly increased EZ-loss in the inferior quadrant compared with the superior quadrant for RHO (Δ=-0.414 mm², p=0.036) and EYS (Δ=-0.908 mm², p=1.5×10⁻⁴). In ABCA4 retinopathy, more severe genotypes (group A) were associated with faster progression of EZ-loss (2.80±0.62 mm²/yr), whilst the p.(Gly1961Glu) variant (group D) was associated with slower progression (0.56±0.18 mm²/yr). There were also sex differences within groups, with males in group A experiencing significantly faster progression of RPE-loss (2.48±1.40 mm²/yr vs 0.87±0.62 mm²/yr, p=0.047), but lower rates in groups B, C, and D.
Conclusions: AIRDetect-OCT, a novel deep learning algorithm, enables large-scale OCT feature quantification in IRD patients, uncovering cross-sectional and longitudinal phenotype correlations with demographic and genotypic parameters.
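A minimal sketch of the Dice overlap used above to compare automated and manual segmentations, plus en-face area from a binary pixel mask; the pixel spacing is an assumed value for illustration.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice = 2*|A intersect B| / (|A| + |B|) for binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2 * inter / (a.sum() + b.sum() + 1e-8)

def area_mm2(mask: np.ndarray, px_area_mm2: float = 0.01) -> float:
    """Feature footprint in mm^2, e.g., EZ-loss, given an assumed pixel area."""
    return mask.sum() * px_area_mm2

pred = np.zeros((64, 64), bool); pred[10:30, 10:30] = True
ref = np.zeros((64, 64), bool); ref[12:32, 12:32] = True
print(round(dice(pred, ref), 3), area_mm2(pred))  # 0.81 4.0
```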

MvHo-IB: Multi-View Higher-Order Information Bottleneck for Brain Disorder Diagnosis

Kunyu Zhang, Qiang Li, Shujian Yu

arXiv preprint · Jul 3, 2025
Recent evidence suggests that modeling higher-order interactions (HOIs) in functional magnetic resonance imaging (fMRI) data can enhance the diagnostic accuracy of machine learning systems. However, effectively extracting and utilizing HOIs remains a significant challenge. In this work, we propose MvHo-IB, a novel multi-view learning framework that integrates both pairwise interactions and HOIs for diagnostic decision-making, while automatically compressing task-irrelevant redundant information. MvHo-IB introduces several key innovations: (1) a principled method that combines O-information from information theory with a matrix-based Rényi α-order entropy estimator to quantify and extract HOIs, (2) a purpose-built Brain3DCNN encoder to effectively utilize these interactions, and (3) a new multi-view learning information bottleneck objective to enhance representation learning. Experiments on three benchmark fMRI datasets demonstrate that MvHo-IB achieves state-of-the-art performance, significantly outperforming previous methods, including recent hypergraph-based techniques. The implementation of MvHo-IB is available at https://github.com/zky04/MvHo-IB.
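To make the named quantities concrete, here is a compact numpy sketch of matrix-based Rényi α-order entropy and the O-information assembled from it. The Gaussian kernel width, α value, and toy data are assumptions, and the paper's estimator may differ in detail.

```python
import numpy as np

def gram(x, sigma=1.0):
    """Normalized Gaussian Gram matrix of one variable's samples, shape (N, N)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    K = np.exp(-d2 / (2 * sigma ** 2))
    return K / np.trace(K)

def renyi_entropy(A, alpha=1.01):
    """Matrix-based Renyi entropy: log2(sum of eigenvalues^alpha) / (1 - alpha)."""
    lam = np.linalg.eigvalsh(A)
    lam = lam[lam > 1e-12]
    return np.log2((lam ** alpha).sum()) / (1 - alpha)

def joint(mats):
    """Joint Gram matrix via the normalized Hadamard product."""
    H = mats[0].copy()
    for A in mats[1:]:
        H = H * A
    return H / np.trace(H)

def o_information(X, alpha=1.01):
    """O-information of n variables: (n-2)*H(all) + sum_i [H(X_i) - H(all but X_i)]."""
    n = X.shape[1]
    mats = [gram(X[:, i]) for i in range(n)]
    total = (n - 2) * renyi_entropy(joint(mats), alpha)
    for i in range(n):
        rest = [mats[j] for j in range(n) if j != i]
        total += renyi_entropy(mats[i], alpha) - renyi_entropy(joint(rest), alpha)
    return total  # > 0: redundancy-dominated, < 0: synergy-dominated

rng = np.random.default_rng(0)
print(round(o_information(rng.normal(size=(100, 4))), 3))
```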