
Weng TB, Porwal G, Srinivasan D, Inglis B, Rodriguez S, Jacobs DR, Schreiner PJ, Sorond FA, Sidney S, Lewis C, Launer L, Erus G, Nasrallah IM, Bryan RN, Dula AN

pubmed · Sep 8 2025
Cerebrovascular reactivity (CVR) reflects changes in cerebral blood flow in response to an acute stimulus and indicates the brain's ability to match blood flow to demand. Functional MRI with a breath-hold task can be used to elicit this vasoactive response, but data validity hinges on subject compliance, and determining breath-hold compliance often requires external monitoring equipment. This study aimed to develop a non-invasive, data-driven quality filter for breath-hold compliance using only measurements of head motion during imaging. Prospective cohort: longitudinal data from healthy middle-aged subjects enrolled in the Coronary Artery Risk Development in Young Adults Brain MRI Study (N = 1141, 47.1% female), imaged with 3.0 Tesla gradient-echo MRI. Manual labeling of respiratory-belt data was used to determine breath-hold compliance during the MRI scan, and a model estimating the probability of non-compliance with the breath-hold task was developed from measures of head motion. The model's ability to identify scans in which the participant was not performing the breath hold was summarized using performance metrics including sensitivity, specificity, recall, and F1 score. The model was then applied to additional unmarked data to assess effects on population measures of CVR. Sensitivity analysis revealed that excluding non-compliant scans using the developed model did not affect median cerebrovascular reactivity (median [q1, q3] = 1.32 [0.96, 1.71]) compared with manual review of respiratory-belt data (1.33 [1.02, 1.74]), while reducing the interquartile range. The final model, based on a multi-layer perceptron machine learning classifier, estimated non-compliance with an accuracy of 76.9% and an F1 score of 69.5%, indicating a moderate balance between precision and recall for identifying scans in which the participant was not compliant.
The developed model provides the probability of non-compliance with a breath-hold task, which could later be used as a quality filter or included in statistical analyses. TECHNICAL EFFICACY: Stage 3.
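The compliance classifier above is summarized with sensitivity, specificity, accuracy, and F1. A minimal sketch of how these metrics are typically derived from a binary confusion matrix; the counts below are illustrative, not the study's actual data:

```python
# Hypothetical confusion-matrix counts for a non-compliance classifier;
# "positive" = scan flagged as non-compliant with the breath-hold task.

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from raw confusion counts."""
    sensitivity = tp / (tp + fn)          # recall: non-compliant scans caught
    specificity = tn / (tn + fp)          # compliant scans correctly passed
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy}

# Invented counts, chosen only to exercise the formulas.
m = binary_metrics(tp=64, fp=28, tn=210, fn=42)
```

Reporting F1 alongside accuracy, as the abstract does, guards against a classifier that looks accurate simply because compliant scans dominate the data.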

Öztürk S, Yüce M, Pamuk GG, Varlık C, Cimilli AT, Atay M

pubmed · Sep 8 2025
Established methods for bone age assessment (BAA), such as the Greulich and Pyle atlas, suffer from variability due to population differences and observer discrepancies. Although automated BAA offers speed and consistency, limited research exists on its performance across different populations using deep learning. This study examines deep learning algorithms on the Turkish population to enhance bone age models by understanding demographic influences. We analyzed reports from Bağcılar Hospital's Health Information Management System between April 2012 and September 2023 using "bone age" as a keyword. Patient images were re-evaluated by an experienced radiologist and anonymized. A total of 2,730 hand radiographs from Bağcılar Hospital (Turkish population), 12,572 from the Radiological Society of North America (RSNA), and 6,185 from the Radiological Hand Pose Estimation (RHPE) public datasets were collected, along with corresponding bone ages and gender information. A set of 546 radiographs (273 from Bağcılar, 273 from the public datasets) was randomly held out as an internal test set, stratified by bone age; the remaining data were used for training and validation. BAAs were generated using a modified InceptionV3 model on 500 × 500-pixel images, selecting the model with the lowest mean absolute error (MAE) on the validation set. Three models were trained and tested based on dataset origin: Bağcılar (Turkish), public (RSNA-RHPE), and a Combined model. On the internal test set, the Combined model's predictions fell within 6, 12, 18, and 24 months of the reference bone age at rates of 44%, 73%, 87%, and 94%, respectively. The MAE was 9.2 months on the overall internal test set, 7 months on the public test set, and 11.5 months on the Bağcılar internal test data. The Bağcılar-only model had an MAE of 12.7 months on the Bağcılar internal test data.
Despite less training data, there was no significant difference between the Combined and Bağcılar models on the Bağcılar dataset (<i>P</i> > 0.05). The public model showed an MAE of 16.5 months on the Bağcılar dataset, significantly worse than the other models (<i>P</i> < 0.05). We developed an automated BAA model that includes the Turkish population, one of the few such studies using deep learning. Despite challenges from population differences and data heterogeneity, these models can be used effectively in various clinical settings, and their accuracy can improve over time as data accumulate and publicly available datasets refine them further. By utilizing deep learning with diverse datasets from Bağcılar Hospital and publicly available sources, the model offers a reliable and efficient alternative to traditional methods that are time-consuming and variable: it minimizes assessment time, reduces variability, enhances clinical decision-making, supports standardized BAA practices, and improves patient care across healthcare settings.
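The bone-age results above are reported as an MAE in months plus the fraction of predictions within 6/12/18/24 months of the reference. A minimal sketch of both computations, with invented predictions rather than the study's data:

```python
# Toy bone-age predictions and reference values, in months (hypothetical).

def mae(preds, targets):
    """Mean absolute error between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

def within_rates(preds, targets, thresholds=(6, 12, 18, 24)):
    """Fraction of predictions whose error is under each threshold."""
    errors = [abs(p - t) for p, t in zip(preds, targets)]
    return {th: sum(e < th for e in errors) / len(errors) for th in thresholds}

preds   = [100.0, 85.0, 140.0, 60.0, 121.0]
targets = [ 95.0, 97.0, 138.0, 66.0, 120.0]
overall_mae = mae(preds, targets)        # 5.2 months on this toy set
rates = within_rates(preds, targets)
```

Pairing MAE with threshold rates, as the abstract does, shows both average error and how often predictions stay within clinically meaningful bounds.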

Koçak B, Köse F, Keleş A, Şendur A, Meşe İ, Karagülle M

pubmed · Sep 8 2025
To comprehensively assess Checklist for Artificial Intelligence in Medical Imaging (CLAIM) adherence in medical imaging artificial intelligence (AI) literature by aggregating data from previous systematic and non-systematic reviews. A systematic search of PubMed, Scopus, and Google Scholar identified reviews using the CLAIM to evaluate medical imaging AI studies. Reviews were analyzed at two levels: review level (33 reviews; 1,458 studies) and study level (421 unique studies from 15 reviews). The CLAIM adherence metrics (scores and compliance rates), baseline characteristics, factors influencing adherence, and critiques of the CLAIM were analyzed. A review-level analysis of 26 reviews (874 studies) found a weighted mean CLAIM score of 25 [standard deviation (SD): 4] and a median of 26 [interquartile range (IQR): 8; 25<sup>th</sup>-75<sup>th</sup> percentiles: 20-28]. In a separate review-level analysis involving 18 reviews (993 studies), the weighted mean CLAIM compliance was 63% (SD: 11%), with a median of 66% (IQR: 4%; 25<sup>th</sup>-75<sup>th</sup> percentiles: 63%-67%). A study-level analysis of 421 unique studies published between 1997 and 2024 found a median CLAIM score of 26 (IQR: 6; 25<sup>th</sup>-75<sup>th</sup> percentiles: 23-29) and a median compliance of 68% (IQR: 16%; 25<sup>th</sup>-75<sup>th</sup> percentiles: 59%-75%). Adherence was independently associated with the journal impact factor quartile, publication year, and specific radiology subfields. After guideline publication, CLAIM compliance improved (<i>P</i> = 0.004). Multiple readers provided an evaluation in 85% (28/33) of reviews, but only 11% (3/28) included a reliability analysis. An item-wise evaluation identified 11 underreported items (missing in ≥50% of studies). Among the 10 identified critiques, the most common were item inapplicability to diverse study types and subjective interpretations of fulfillment. 
Our two-level analysis revealed considerable reporting gaps, underreported items, factors related to adherence, and common CLAIM critiques. By combining data from systematic and non-systematic reviews of CLAIM adherence, these findings provide actionable targets to help researchers and journals improve transparency, reproducibility, and reporting quality in AI studies.
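The review-level statistics above pool per-review CLAIM scores weighted by review size. A minimal sketch of that aggregation, assuming each review contributes its mean score weighted by its number of studies; all values below are invented:

```python
# Weighted mean of per-review mean scores, weighted by studies per review.

def weighted_mean(scores, weights):
    """Weight each review's mean score by the number of studies it covers."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total

review_scores = [24.0, 27.0, 22.0]   # mean CLAIM score per review (hypothetical)
review_sizes  = [50, 120, 30]        # studies evaluated per review (hypothetical)
pooled = weighted_mean(review_scores, review_sizes)   # 25.5 on this toy set
```

Weighting by study count keeps a small review with few studies from pulling the pooled estimate as hard as a large one.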

Kantarcı M, Kızılgöz V, Terzi R, Kılıç AE, Kabalcı H, Durmaz Ö, Tokgöz N, Harman M, Sağır Kahraman A, Avanaz A, Aydın S, Elpek GÖ, Yazol M, Aydınlı B

pubmed · Sep 8 2025
This study aimed to evaluate the effectiveness of artificial intelligence (AI) in diagnosing focal nodular hyperplasia (FNH) of the liver using magnetic resonance imaging (MRI) and to compare its performance with that of radiologists. In the first phase of the study, the MRIs of 60 patients (30 with FNH and 30 with no lesions or with lesions other than FNH) were processed using a segmentation program and introduced to an AI model. After the learning process, the MRIs of 42 different patients, which the AI model had not previously encountered, were introduced to the system. In addition, a radiology resident and a radiology specialist evaluated the same patients using the same MR sequences, and sensitivity and specificity values were obtained from all three reviews. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the AI model were 0.769, 0.966, 0.909, and 0.903, respectively. Agreement between the specialist and the AI model was good, with a kappa (κ) value of 0.777. For the diagnosis of FNH, the sensitivity, specificity, PPV, and NPV of the AI model were higher than those of the radiology resident and lower than those of the radiology specialist. With additional studies focused on other specific liver lesions, AI models are expected to diagnose each liver lesion with high accuracy in the future. AI is being studied to provide assisted or automated interpretation of radiological images with accurate and reproducible imaging diagnoses.
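The specialist-versus-model agreement above is reported as Cohen's kappa. A minimal pure-Python sketch of the statistic, using invented binary labels rather than the study's reads:

```python
# Cohen's kappa: chance-corrected agreement between two raters.

def cohens_kappa(a, b):
    """Kappa for two label sequences of equal length."""
    n = len(a)
    labels = set(a) | set(b)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement under independence, from each rater's marginals.
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical FNH / not-FNH calls (1 = FNH) from two readers.
rater_a = [1, 1, 0, 1, 0, 0, 1, 0]
rater_b = [1, 0, 0, 1, 0, 1, 1, 0]
k = cohens_kappa(rater_a, rater_b)   # 0.5 on this toy set
```

Unlike raw percent agreement, kappa discounts the matches two raters would produce by chance alone, which is why it is the usual choice for reader studies.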

Abdullahi AA, Ganiz MC, Koç U, Gökhan MB, Aydın C, Özdemir AB

pubmed · Sep 8 2025
The primary objective of this research is to enhance the accuracy and efficiency of information extraction from radiology reports. In addressing this objective, the study aims to develop and evaluate a deep learning framework for named entity recognition (NER). We used a synthetic dataset of 1,056 Turkish radiology reports created and labeled by the radiologists in our research team. Due to privacy concerns, actual patient data could not be used; however, the synthetic reports closely mimic genuine reports in structure and content. We employed the four-stage DYGIE++ model for the experiments. First, we performed token encoding using four bidirectional encoder representations from transformers (BERT) models: BERTurk, BioBERTurk, PubMedBERT, and XLM-RoBERTa. Second, we introduced adaptive span enumeration, considering the word count of a sentence in Turkish. Third, we adopted span graph propagation to generate a multidirectional graph crucial for coreference resolution. Finally, we used a two-layered feed-forward neural network to classify the named entity. The experiments conducted on the labeled dataset showcase the approach's effectiveness. The study achieved an F1 score of 80.1 for the NER task, with the BioBERTurk model, which is pre-trained on Turkish Wikipedia, radiology reports, and biomedical texts, proving to be the most effective of the four BERT models used in the experiment. We show how different dataset labels affect the model's performance. The results demonstrate the model's ability to handle the intricacies of Turkish radiology reports, providing a detailed analysis of precision, recall, and F1 scores for each label. Additionally, this study compares its findings with related research in other languages. Our approach provides clinicians with more precise and comprehensive insights to improve patient care by extracting relevant information from radiology reports. 
This innovation in information extraction streamlines the diagnostic process and helps expedite patient treatment decisions.
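The 80.1 F1 reported above is the standard exact-match span-level score for NER. A minimal sketch of that evaluation, with invented spans and labels rather than the Turkish dataset's annotation scheme:

```python
# Exact-match span-level NER scoring: a prediction counts only if its
# (start, end, label) triple matches a gold span exactly.

def span_f1(gold, pred):
    """Return (precision, recall, f1) over exact span matches."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical spans from one report sentence; label names are invented.
gold = [(0, 2, "ANATOMY"), (5, 7, "FINDING"), (9, 10, "SIZE")]
pred = [(0, 2, "ANATOMY"), (5, 7, "OBSERVATION"), (9, 10, "SIZE")]
p, r, f1 = span_f1(gold, pred)
```

Note how the second span is penalized despite correct boundaries: under exact matching, a wrong label fails the span, which is one reason label-set design affects reported F1.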

Minheng Chen, Youyong Kong

arxiv preprint · Sep 8 2025
Intraoperative 2D/3D registration aligns preoperative 3D volumes with real-time 2D radiographs, enabling accurate localization of instruments and implants. A recent fully differentiable similarity learning framework approximates geodesic distances on SE(3), expanding the capture range of registration and mitigating the effects of substantial disturbances, but existing Euclidean approximations distort manifold structure and slow convergence. To address these limitations, we explore similarity learning in non-Euclidean spherical feature spaces to better capture and fit complex manifold structure. We extract feature embeddings using a CNN-Transformer encoder, project them into spherical space, and approximate their geodesic distances with Riemannian distances in the bi-invariant SO(4) space. This enables a more expressive and geometrically consistent deep similarity metric, enhancing the ability to distinguish subtle pose differences. During inference, we replace gradient descent with fully differentiable Levenberg-Marquardt optimization to accelerate convergence. Experiments on real and synthetic datasets show superior accuracy in both patient-specific and patient-agnostic scenarios.
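The core ingredient of the spherical similarity learning described above is the geodesic distance between embeddings projected onto a hypersphere. A minimal sketch of that distance (illustrative only; the paper's actual metric operates in a bi-invariant SO(4) space):

```python
import math

def normalize(x):
    """Project a feature vector onto the unit hypersphere."""
    n = math.sqrt(sum(a * a for a in x))
    return [a / n for a in x]

def spherical_geodesic(u, v):
    """Great-circle (arc-length) distance between unit vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))      # clamp to guard acos against round-off
    return math.acos(dot)

# Hypothetical 3-D embeddings; orthogonal directions sit pi/2 apart.
u = normalize([1.0, 0.0, 0.0])
v = normalize([0.0, 1.0, 0.0])
d = spherical_geodesic(u, v)
```

Unlike a Euclidean distance on the raw embeddings, the arc length respects the curvature of the sphere, which is the geometric consistency the abstract argues for.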

Kantarcı M, Aydın S, Oğul H, Kızılgöz V

pubmed · Sep 8 2025
Radiology is a field of medicine inherently intertwined with technology; obtaining images in ultrasound (US), computed tomography (CT), and magnetic resonance imaging (MRI) depends heavily on it. Although radiation dose reduction is not applicable to US and MRI, which do not use ionizing radiation, technological advancements have made it possible in CT, with ongoing studies aimed at further optimization. Advancements in each modality are steadily improving the resolution and diagnostic quality of the resulting images, and technological progress has significantly shortened acquisition times for CT and MRI. Artificial intelligence (AI), which is becoming increasingly widespread worldwide, has also been incorporated into radiology. It can produce more accurate and reproducible results in US examinations, and machine learning offers great potential for improving image quality, creating more distinct and useful images, and even developing new US imaging modalities. Furthermore, AI technologies are increasingly prevalent in CT and MRI for image evaluation, image generation, and image-quality enhancement.

Hajar Moradmand, Lei Ren

arxiv preprint · Sep 8 2025
Assessing the severity of rheumatoid arthritis (RA) using the Total Sharp/Van Der Heijde Score (TSS) is crucial, but manual scoring is often time-consuming and subjective. This study introduces an Automated Radiographic Sharp Scoring (ARTSS) framework that leverages deep learning to analyze full-hand X-ray images, aiming to reduce inter- and intra-observer variability. The approach uniquely accommodates patients with joint disappearance and variable-length image sequences. We developed ARTSS using data from 970 patients, structured into four stages: I) image pre-processing and re-orientation using ResNet50, II) hand segmentation using UNet.3, III) joint identification using YOLOv7, and IV) TSS prediction using models such as VGG16, VGG19, ResNet50, DenseNet201, EfficientNetB0, and Vision Transformer (ViT). We evaluated model performance with intersection over union (IoU), mean average precision (mAP), mean absolute error (MAE), root mean squared error (RMSE), and Huber loss. The average TSS from two radiologists served as the ground truth. Model training employed 3-fold cross-validation, with each fold consisting of 452 training and 227 validation samples; external testing included 291 unseen subjects. Our joint identification model achieved 99% accuracy, and the best-performing model, ViT, achieved a notably low Huber loss of 0.87 for TSS prediction. These results demonstrate the potential of deep learning to automate RA scoring, which can significantly enhance clinical practice. Our approach addresses the challenges of joint disappearance and variable joint numbers, offers time-saving benefits, reduces inter- and intra-reader variability, improves radiologist accuracy, and aids rheumatologists in making more informed decisions.
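The TSS prediction above is evaluated with the Huber loss, which is quadratic for small residuals and linear for large ones, so occasional large scoring errors are penalized less harshly than under squared error. A minimal sketch with an invented delta and toy scores:

```python
# Huber loss: quadratic within |r| <= delta, linear beyond it.
# delta=1.0 is a common default; the paper's choice is not stated.

def huber(residual: float, delta: float = 1.0) -> float:
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)

def mean_huber(preds, targets, delta=1.0):
    return sum(huber(p - t, delta) for p, t in zip(preds, targets)) / len(preds)

# Hypothetical predicted vs. radiologist-averaged TSS values.
loss = mean_huber([10.0, 4.0, 7.5], [9.0, 6.5, 7.5])
```

On these toy values the residuals are 1.0, -2.5, and 0.0, so the middle case falls in the linear regime while the others stay quadratic.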

Corentin Dancette, Julien Khlaut, Antoine Saporta, Helene Philippe, Elodie Ferreres, Baptiste Callard, Théo Danielou, Léo Alberge, Léo Machado, Daniel Tordjman, Julie Dupuis, Korentin Le Floch, Jean Du Terrail, Mariam Moshiri, Laurent Dercle, Tom Boeken, Jules Gregory, Maxime Ronot, François Legou, Pascal Roux, Marc Sapoval, Pierre Manceron, Paul Hérent

arxiv preprint · Sep 8 2025
AI-assisted radiological interpretation is based on predominantly narrow, single-task models. This approach is impractical for covering the vast spectrum of imaging modalities, diseases, and radiological findings. Foundation models (FMs) hold the promise of broad generalization across modalities and in low-data settings. However, this potential has remained largely unrealized in radiology. We introduce Curia, a foundation model trained on the entire cross-sectional imaging output of a major hospital over several years, which to our knowledge is the largest such corpus of real-world data, encompassing 150,000 exams (130 TB). On a newly curated 19-task external validation benchmark, Curia accurately identifies organs, detects conditions like brain hemorrhages and myocardial infarctions, and predicts outcomes in tumor staging. Curia meets or surpasses the performance of radiologists and recent foundation models, and exhibits clinically significant emergent properties in cross-modality and low-data regimes. To accelerate progress, we release our base model's weights at https://huggingface.co/raidium/curia.

Mustafa Yurdakul, Şakir Taşdemir

arxiv preprint · Sep 8 2025
Brain tumors are serious health problems that require early diagnosis due to their high mortality rates. Diagnosing tumors by examining Magnetic Resonance Imaging (MRI) images is a process that requires expertise and is prone to error. Therefore, the need for automated diagnosis systems is increasing day by day. In this context, a robust and explainable Deep Learning (DL) model for the classification of brain tumors is proposed. In this study, a publicly available Figshare dataset containing 3,064 T1-weighted contrast-enhanced brain MRI images of three tumor types was used. First, the classification performance of nine well-known CNN architectures was evaluated to determine the most effective backbone. Among these, EfficientNetV2 demonstrated the best performance and was selected as the backbone for further development. Subsequently, an attention-based MLP-Mixer architecture was integrated into EfficientNetV2 to enhance its classification capability. The performance of the final model was comprehensively compared with basic CNNs and the methods in the literature. Additionally, Grad-CAM visualization was used to interpret and validate the decision-making process of the proposed model. The proposed model's performance was evaluated using the five-fold cross-validation method. The proposed model demonstrated superior performance with 99.50% accuracy, 99.47% precision, 99.52% recall and 99.49% F1 score. The results obtained show that the model outperforms the studies in the literature. Moreover, Grad-CAM visualizations demonstrate that the model effectively focuses on relevant regions of MRI images, thus improving interpretability and clinical reliability. A robust deep learning model for clinical decision support systems has been obtained by combining EfficientNetV2 and attention-based MLP-Mixer, providing high accuracy and interpretability in brain tumor classification.
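The evaluation above relies on five-fold cross-validation: the data are split into five parts, each serving once as the validation set, and the per-fold metric is averaged. A minimal index-level sketch (toy accuracies, not the paper's folds):

```python
# Plain k-fold index generation; stratification by tumor type (which the
# paper likely needs for class balance) is omitted for brevity.

def kfold_indices(n: int, k: int = 5):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

folds = list(kfold_indices(10, 5))
fold_accuracies = [0.995, 0.992, 0.997, 0.994, 0.996]   # hypothetical per-fold scores
mean_acc = sum(fold_accuracies) / len(fold_accuracies)
```

Averaging over folds, rather than reporting a single split, reduces the chance that a headline accuracy like 99.50% reflects one lucky partition of the 3,064 images.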