Perrone M, Moore DM, Ukeba D, Martin JT

PubMed · Sep 22, 2025
Low back pain is the world's leading cause of disability, and pathology of the lumbar intervertebral discs is frequently considered a driver of pain. The geometric characteristics of intervertebral discs offer valuable insights into their mechanical behavior and pathological conditions. In this study, we present a convolutional neural network (CNN) autoencoder to extract latent features from segmented disc MRI. Additionally, we interpret these latent features and demonstrate their utility in identifying disc pathology, providing a complementary perspective to standard geometric measures. We examined 195 sagittal T1-weighted MRIs of the lumbar spine from a publicly available multi-institutional dataset. The proposed pipeline includes five main steps: (1) segmenting the MRI, (2) training the CNN autoencoder and extracting latent geometric features, (3) measuring standard geometric features, (4) predicting disc narrowing with latent and/or standard geometric features, and (5) determining the relationship between latent and standard geometric features. Our segmentation model achieved an intersection over union (IoU) of 0.82 (95% CI 0.80-0.84) and a Dice similarity coefficient (DSC) of 0.90 (95% CI 0.89-0.91). The minimum bottleneck size for which the CNN autoencoder converged was 4 × 1 after 350 epochs (IoU of 0.9984, 95% CI 0.9979-0.9989). Combining latent and geometric features improved predictions of disc narrowing compared to using either feature set alone. Latent geometric features encoded disc shape and angular orientation. This study presents a CNN autoencoder that extracts latent features from segmented lumbar disc MRI, enhancing disc narrowing prediction and feature interpretability. Future work will integrate disc voxel intensity to analyze composition.
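
A minimal sketch of the kind of convolutional autoencoder described above, assuming 64 × 64 segmented disc masks as input and a 4-dimensional bottleneck; the layer sizes and training setup are illustrative assumptions, not the authors' published architecture.

```python
import torch
import torch.nn as nn

class DiscAutoencoder(nn.Module):
    """Toy conv autoencoder with a 4-D latent, for segmented disc masks."""
    def __init__(self, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, latent_dim),                     # latent geometric features
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),    # 8 -> 16
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),  # 32 -> 64
        )

    def forward(self, x):
        z = self.encoder(x)          # latent features, later combined with standard measures
        return self.decoder(z), z

model = DiscAutoencoder()
masks = torch.rand(8, 1, 64, 64).round()          # stand-in for segmented disc masks
recon, latent = model(masks)
loss = nn.functional.binary_cross_entropy(recon, masks)
```

The latent vector returned by the encoder is the feature set that would be concatenated with standard geometric measures (e.g. disc height, area, orientation) as input to a disc-narrowing classifier.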

Xinyang Wu, Muheng Li, Xia Li, Orso Pusterla, Sairos Safai, Philippe C. Cattin, Antony J. Lomax, Ye Zhang

arXiv preprint · Sep 22, 2025
Four-dimensional MRI (4D-MRI) is a promising technique for capturing respiratory-induced motion in radiation therapy planning and delivery. Conventional 4D reconstruction methods, which typically rely on phase binning or separate template scans, struggle to capture temporal variability, complicate workflows, and impose heavy computational loads. We introduce a neural representation framework that treats respiratory motion as a smooth, continuous deformation steered by a 1D surrogate signal, completely replacing the conventional discrete sorting approach. The method fuses motion modeling with image reconstruction through two synergistic networks: a Spatial Anatomy Network (SAN) encodes a continuous 3D anatomical representation, while a Temporal Motion Network (TMN), guided by Transformer-derived respiratory signals, produces temporally consistent deformation fields. Evaluation on a free-breathing dataset of 19 volunteers demonstrates that our template- and phase-free method accurately captures both regular and irregular respiratory patterns, while preserving vessel and bronchial continuity with high anatomical fidelity. The proposed method significantly improves efficiency, reducing total processing time from approximately five hours for conventional discrete sorting methods to just 15 minutes of training, and it infers each 3D volume in under one second. The framework accurately reconstructs 3D images at any respiratory state, achieves superior performance compared to conventional methods, and demonstrates strong potential for 4D radiation therapy planning and real-time adaptive treatment.
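
The continuous-deformation idea can be caricatured with two coordinate MLPs, loosely mirroring the SAN/TMN split described above; the network widths, the scalar surrogate input, and the absence of positional encodings are simplifying assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

class SpatialAnatomyNet(nn.Module):
    """Coordinate MLP: (x, y, z) -> intensity of a reference anatomy (illustrative stand-in for SAN)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, coords):
        return self.mlp(coords)

class TemporalMotionNet(nn.Module):
    """Maps (coords, surrogate s) -> displacement, so the anatomy is warped continuously over time."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 + 1, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, 3))
    def forward(self, coords, s):
        s = s.expand(coords.shape[0], 1)
        return self.mlp(torch.cat([coords, s], dim=-1))

san, tmn = SpatialAnatomyNet(), TemporalMotionNet()
coords = torch.rand(1024, 3) * 2 - 1          # sampled voxel coordinates in [-1, 1]^3
s = torch.tensor([0.3])                       # respiratory surrogate amplitude at the query time
warped = coords + tmn(coords, s)              # deform the sampling locations
intensity = san(warped)                       # query the continuous anatomy at any respiratory state
```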

Muheng Li, Evangelia Choulilitsa, Lisa Fankhauser, Francesca Albertini, Antony Lomax, Ye Zhang

arXiv preprint · Sep 22, 2025
Accurate dose calculation on cone beam computed tomography (CBCT) images is essential for modern proton treatment planning workflows, particularly when accounting for inter-fractional anatomical changes in adaptive treatment scenarios. Traditional CBCT-based dose calculation suffers from image quality limitations, requiring complex correction workflows. This study develops and validates a deep learning approach for direct proton dose calculation from CBCT images using extended Long Short-Term Memory (xLSTM) neural networks. A retrospective dataset of 40 head-and-neck cancer patients with paired planning CT and treatment CBCT images was used to train an xLSTM-based neural network (CBCT-NN). The architecture incorporates energy token encoding and beam's-eye-view sequence modelling to capture spatial dependencies in proton dose deposition patterns. Training utilized 82,500 paired beam configurations with Monte Carlo-generated ground truth doses. Validation was performed on 5 independent patients using gamma analysis, mean percentage dose error assessment, and dose-volume histogram comparison. The CBCT-NN achieved gamma pass rates of 95.1 ± 2.7% using 2mm/2% criteria. Mean percentage dose errors were 2.6 ± 1.4% in high-dose regions (>90% of max dose) and 5.9 ± 1.9% globally. Dose-volume histogram analysis showed excellent preservation of target coverage metrics (Clinical Target Volume V95% difference: -0.6 ± 1.1%) and organ-at-risk constraints (parotid mean dose difference: -0.5 ± 1.5%). Computation time is under 3 minutes without sacrificing Monte Carlo-level accuracy. This study demonstrates the proof-of-principle of direct CBCT-based proton dose calculation using xLSTM neural networks. The approach eliminates traditional correction workflows while achieving comparable accuracy and computational efficiency suitable for adaptive protocols.
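
Since xLSTM layers are not part of standard deep learning toolkits, the sketch below substitutes a vanilla LSTM to illustrate the general idea of beam's-eye-view sequence modelling conditioned on a beam-energy token; the names, shapes, and conditioning scheme are assumptions, not the published CBCT-NN design.

```python
import torch
import torch.nn as nn

class BeamDoseLSTM(nn.Module):
    """Predict dose along a beam's-eye-view voxel sequence from CBCT values plus a beam-energy token."""
    def __init__(self, hidden=64):
        super().__init__()
        self.energy_embed = nn.Linear(1, hidden)                 # "energy token" encoding
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, cbct_line, energy):
        # cbct_line: (B, L, 1) intensities along the beam; energy: (B, 1) beam energy
        h0 = self.energy_embed(energy).unsqueeze(0)              # initialise the hidden state with the energy
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(cbct_line, (h0, c0))
        return self.head(out).squeeze(-1)                        # per-voxel dose along the beam

model = BeamDoseLSTM()
dose = model(torch.randn(4, 128, 1), torch.tensor([[70.], [100.], [150.], [120.]]))
```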

Yi Gu, Kuniaki Saito, Jiaxin Ma

arXiv preprint · Sep 22, 2025
As medical diagnoses increasingly leverage multimodal data, machine learning models are expected to effectively fuse heterogeneous information while remaining robust to missing modalities. In this work, we propose a novel multimodal learning framework that integrates enhanced modality dropout and contrastive learning to address real-world limitations such as modality imbalance and missingness. Our approach introduces learnable modality tokens to improve missingness-aware fusion of modalities and augments conventional unimodal contrastive objectives with fused multimodal representations. We validate our framework on large-scale clinical datasets for disease detection and prediction tasks, encompassing both visual and tabular modalities. Experimental results demonstrate that our method achieves state-of-the-art performance, particularly in challenging and practical scenarios where only a single modality is available. Furthermore, we show its adaptability through successful integration with a recent CT foundation model. Our findings highlight the effectiveness, efficiency, and generalizability of our approach for multimodal learning, offering a scalable, low-cost solution with significant potential for real-world clinical applications. The code is available at https://github.com/omron-sinicx/medical-modality-dropout.
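
One plausible reading of learnable modality tokens combined with modality dropout is sketched below for an image + tabular pair; the fusion layer, dropout probability, and token handling are illustrative assumptions, not the authors' released implementation (see the linked repository for their code).

```python
import torch
import torch.nn as nn

class TokenFusion(nn.Module):
    """Fuse image and tabular embeddings; a learnable token substitutes for a missing or dropped modality."""
    def __init__(self, dim=256):
        super().__init__()
        self.missing_img = nn.Parameter(torch.zeros(dim))   # learnable "image missing" token
        self.missing_tab = nn.Parameter(torch.zeros(dim))   # learnable "tabular missing" token
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, img_emb, tab_emb, p_drop=0.3, training=True):
        if training and torch.rand(1).item() < p_drop:       # modality dropout during training
            img_emb = self.missing_img.expand_as(img_emb)
        if tab_emb is None:                                   # modality truly missing at inference
            tab_emb = self.missing_tab.expand(img_emb.shape[0], -1)
        return self.fuse(torch.cat([img_emb, tab_emb], dim=-1))

fusion = TokenFusion()
fused = fusion(torch.randn(8, 256), None, training=False)    # tabular modality absent at test time
```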

Dingxin Lu, Shurui Wu, Xinyi Huang

arXiv preprint · Sep 22, 2025
With the rising global burden of chronic diseases and increasingly multimodal, heterogeneous clinical data (medical imaging, free-text recordings, wearable sensor streams, etc.), there is an urgent need for a unified multimodal AI framework that can proactively predict individual health risks. We propose VL-RiskFormer, a hierarchically stacked visual-language multimodal Transformer with a large language model (LLM) inference head embedded in its top layer. The system builds on the dual-stream architecture of existing visual-linguistic models (e.g., PaLM-E, LLaVA) with key innovations including: (i) pre-training that performs cross-modal contrastive learning and fine-grained alignment of radiological images, fundus maps, and wearable device photos with their corresponding clinical narratives, using momentum-updated encoders and debiased InfoNCE losses; (ii) a temporal fusion block that integrates irregular visit sequences into the causal Transformer decoder through adaptive time-interval position encoding; and (iii) a disease ontology map adapter that injects ICD-10 codes layer-wise into the visual and textual channels and infers comorbidity patterns via a graph attention mechanism. On the MIMIC-IV longitudinal cohort, VL-RiskFormer achieved an average AUROC of 0.90 with an expected calibration error of 2.7 percent.
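
As a hedged illustration of what adaptive time-interval position encoding for irregular visit sequences might look like, the snippet below applies a standard sinusoidal encoding to inter-visit gaps measured in days; this formulation is an assumption, not the paper's definition.

```python
import torch

def interval_position_encoding(delta_days, dim=64):
    """Sinusoidal encoding of irregular inter-visit intervals (days), one vector per visit."""
    freqs = torch.exp(-torch.arange(0, dim, 2) / dim * torch.log(torch.tensor(10000.0)))
    angles = delta_days.unsqueeze(-1) * freqs                           # (num_visits, dim/2)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)    # (num_visits, dim)

pe = interval_position_encoding(torch.tensor([0., 14., 90., 365.]))     # visits at irregular gaps
```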

Tingjun Liu, Chicago Y. Park, Yuyang Hu, Hongyu An, Ulugbek S. Kamilov

arXiv preprint · Sep 22, 2025
Diffusion-based inverse problem solvers (DIS) have recently shown outstanding performance in compressed-sensing parallel MRI reconstruction by combining diffusion priors with physical measurement models. However, they typically rely on pre-calibrated coil sensitivity maps (CSMs) and ground-truth images, often making them impractical: CSMs are difficult to estimate accurately under heavy undersampling, and ground-truth images are often unavailable. We propose the Calibration-free Measurement Score-based diffusion Model (C-MSM), a new method that eliminates these dependencies by jointly performing automatic CSM estimation and self-supervised learning of measurement scores directly from k-space data. C-MSM reconstructs images by approximating the full posterior distribution through stochastic sampling over partial measurement posterior scores, while simultaneously estimating CSMs. Experiments on the multi-coil brain fastMRI dataset show that C-MSM achieves reconstruction performance close to that of DIS with clean diffusion priors, even without access to clean training data or pre-calibrated CSMs.
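
For context, the physical measurement model that both DIS and C-MSM build on is the standard multi-coil (SENSE-type) forward operator; the sketch below implements only that forward model with made-up shapes, not the score-based sampler or the joint CSM estimation.

```python
import torch

def sense_forward(image, csms, mask):
    """Multi-coil forward model y = M F (S x): coil sensitivities, 2D FFT, k-space undersampling."""
    coil_images = csms * image                                  # (C, H, W) complex coil images
    kspace = torch.fft.fft2(coil_images, norm="ortho")
    return mask * kspace                                        # keep only the sampled k-space locations

H = W = 64
image = torch.randn(H, W, dtype=torch.complex64)                # stand-in for the unknown image
csms = torch.randn(8, H, W, dtype=torch.complex64)              # coil sensitivity maps (estimated in C-MSM)
mask = (torch.rand(1, H, W) < 0.25).to(torch.complex64)         # ~4x undersampling pattern
y = sense_forward(image, csms, mask)                            # simulated multi-coil measurements
```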

Chun Kit Wong, Anders N. Christensen, Cosmin I. Bercea, Julia A. Schnabel, Martin G. Tolsgaard, Aasa Feragen

arXiv preprint · Sep 22, 2025
Reliable out-of-distribution (OOD) detection is important for safe deployment of deep learning models in fetal ultrasound amidst heterogeneous image characteristics and clinical settings. OOD detection relies on estimating a classification model's uncertainty, which should increase for OOD samples. While existing research has largely focused on uncertainty quantification methods, this work investigates the impact of the classification task itself. Through experiments with eight uncertainty quantification methods across four classification tasks, we demonstrate that OOD detection performance significantly varies with the task, and that the best task depends on the defined ID-OOD criteria; specifically, whether the OOD sample is due to: i) an image characteristic shift or ii) an anatomical feature shift. Furthermore, we reveal that superior OOD detection does not guarantee optimal abstained prediction, underscoring the necessity to align task selection and uncertainty strategies with the specific downstream application in medical image analysis.
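
As one concrete example of the uncertainty-quantification side of this setup (not necessarily among the eight methods evaluated in the paper), Monte-Carlo dropout entropy can serve as a per-sample OOD score; the classifier and feature sizes below are placeholders.

```python
import torch
import torch.nn as nn

def mc_dropout_entropy(model, x, passes=20):
    """Predictive entropy from Monte-Carlo dropout; higher entropy flags likely OOD inputs."""
    model.train()                                   # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(passes)]).mean(0)
    return -(probs * probs.clamp_min(1e-8).log()).sum(-1)

classifier = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 4))
scores = mc_dropout_entropy(classifier, torch.randn(16, 128))   # one OOD score per sample
```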

Yuxuan Li, Yicheng Zhang, Wenhao Tang, Yimian Dai, Ming-Ming Cheng, Xiang Li, Jian Yang

arXiv preprint · Sep 22, 2025
Modern computer vision is converging on a closed loop in which perception, reasoning and generation mutually reinforce each other. However, this loop remains incomplete: the top-down influence of high-level reasoning on the foundational learning of low-level perceptual features is still underexplored. This paper addresses this gap by proposing a new paradigm for pretraining foundation models in downstream domains. We introduce Visual insTruction Pretraining (ViTP), a novel approach that directly leverages reasoning to enhance perception. ViTP embeds a Vision Transformer (ViT) backbone within a Vision-Language Model and pretrains it end-to-end using a rich corpus of visual instruction data curated from target downstream domains. ViTP is powered by our proposed Visual Robustness Learning (VRL), which compels the ViT to learn robust and domain-relevant features from a sparse set of visual tokens. Extensive experiments on 16 challenging remote sensing and medical imaging benchmarks demonstrate that ViTP establishes new state-of-the-art performance across a diverse range of downstream tasks. The code is available at https://github.com/zcablii/ViTP.
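
A rough sketch of one way to expose a model to "a sparse set of visual tokens" as described for Visual Robustness Learning; the random subsampling strategy and keep ratio are assumptions for illustration, and the official implementation lives in the linked repository.

```python
import torch

def sparse_token_subset(vit_tokens, keep_ratio=0.25):
    """Randomly retain a sparse subset of ViT patch tokens (one illustrative reading of 'sparse visual tokens')."""
    b, n, d = vit_tokens.shape
    k = max(1, int(n * keep_ratio))
    idx = torch.rand(b, n).argsort(dim=1)[:, :k]                  # independent random subset per sample
    return torch.gather(vit_tokens, 1, idx.unsqueeze(-1).expand(-1, -1, d))

tokens = torch.randn(2, 196, 768)                                 # 14x14 patch tokens from a ViT-B backbone
sparse = sparse_token_subset(tokens)                              # sparse tokens passed on to the language model
```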

Zamir MT, Khan SU, Gelbukh A, Felipe Riverón EM, Gelbukh I

PubMed · Sep 22, 2025
Artificial intelligence is increasingly being integrated into clinical diagnostics, yet its lack of transparency hinders trust and adoption among healthcare professionals. Explainable AI (XAI) has the potential to improve the interpretability and reliability of AI-based decisions in clinical practice. This study evaluates the use of XAI for interpreting radiology reports to improve healthcare practitioners' confidence in and comprehension of AI-assisted diagnostics. The study employed the Indiana University chest X-ray dataset, containing 3169 textual reports and 6471 images. Textual reports were classified as either normal or abnormal using a range of machine learning approaches, including traditional machine learning models and ensemble methods, deep learning models (LSTM), and advanced transformer-based language models (GPT-2, T5, LLaMA-2, LLaMA-3.1). For image-based classification, convolutional neural networks (CNNs) including DenseNet121 and DenseNet169 were used. Top-performing models were interpreted using the XAI methods SHAP and LIME to support clinical decision making by enhancing transparency and trust in model predictions. The LLaMA-3.1 model achieved the highest accuracy, 98%, in classifying the textual radiology reports. Statistical analysis confirmed the model's robustness: Cohen's kappa (κ = 0.981) indicated near-perfect agreement beyond chance, and both the chi-square and Fisher's exact tests revealed a highly significant association between actual and predicted labels (p < 0.0001), while McNemar's test yielded a non-significant result (p = 0.25), suggesting balanced performance across classes. For the imaging data, the highest accuracy was 84%, achieved with the DenseNet169 and DenseNet121 models. To assess explainability, LIME and SHAP were applied to the best-performing models; they consistently highlighted medical terms such as "opacity", "consolidation", and "pleural" as clear indications of abnormal findings in textual reports. The research underscores that explainability is an essential component of AI systems used in diagnostics and should inform the design and implementation of AI in the healthcare sector. Such an approach improves diagnostic accuracy and builds confidence among the health workers who will use explainable AI in clinical settings.
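
A minimal example of the LIME step on a text report classifier, using a tiny stand-in corpus and a TF-IDF + logistic regression pipeline rather than the study's LLaMA-3.1 model; it requires the lime and scikit-learn packages.

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny stand-in corpus; the study used the Indiana University chest X-ray reports.
reports = ["lungs are clear no acute disease", "right lower lobe opacity with consolidation",
           "no pleural effusion heart size normal", "pleural effusion and airspace opacity"]
labels = [0, 1, 0, 1]                                   # 0 = normal, 1 = abnormal

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(reports, labels)

explainer = LimeTextExplainer(class_names=["normal", "abnormal"])
exp = explainer.explain_instance("patchy opacity with consolidation in the left lung",
                                 clf.predict_proba, num_features=5)
print(exp.as_list())                                    # words such as "opacity" weighted toward "abnormal"
```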

Tessadori J, Galazzo IB, Storti SF, Pini L, Brusini L, Cruciani F, Sona D, Menegaz G, Murino V

PubMed · Sep 22, 2025
Alterations in brain connectivity provide early indications of neurodegenerative diseases like Alzheimer's disease (AD). Here, we present a novel framework that integrates a Hidden Markov Model (HMM) within the architecture of a convolutional neural network (CNN) to analyze dynamic functional connectivity (dFC) in resting-state functional magnetic resonance imaging (rs-fMRI). Our unsupervised approach captures recurring connectivity states in a large cohort of subjects spanning the Alzheimer's disease continuum, including healthy controls, individuals with mild cognitive impairment (MCI), and patients with clinically diagnosed AD. We propose a deep neural model with embedded HMM dynamics to identify stable recurring brain states from resting-state fMRI. These states exhibit distinct connectivity patterns and are differentially expressed across the Alzheimer's disease continuum. Our analysis shows that the fraction of time each state is active varies systematically with disease severity, highlighting dynamic network alterations that track neurodegeneration. Our findings suggest that the disruption of dynamic connectivity patterns in AD may follow a two-stage trajectory, where early shifts toward integrative network states give way to reduced connectivity organization as the disease progresses. This framework offers a promising tool for early diagnosis and monitoring of AD, and may have broader applications in the study of other neurodegenerative conditions.
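
The dynamic-connectivity-states idea can be approximated with off-the-shelf tools, although the paper embeds the HMM inside a CNN rather than fitting it separately; the window length, number of states, and toy data below are assumptions.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Toy rs-fMRI time series: 300 time points x 10 regions (stand-in data).
ts = np.random.randn(300, 10)

# Sliding-window dynamic functional connectivity: upper-triangle correlations per window.
win, step, feats = 30, 5, []
iu = np.triu_indices(10, k=1)
for start in range(0, ts.shape[0] - win, step):
    fc = np.corrcoef(ts[start:start + win].T)
    feats.append(fc[iu])
feats = np.array(feats)

# Fit an HMM over the windows; state occupancy ("fraction of time active") is the per-subject summary.
hmm = GaussianHMM(n_components=4, covariance_type="diag", n_iter=100, random_state=0)
hmm.fit(feats)
states = hmm.predict(feats)
occupancy = np.bincount(states, minlength=4) / len(states)
```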