Latest Papers on Radiology AI

MeD-3D: A Multimodal Deep Learning Framework for Precise Recurrence Prediction in Clear Cell Renal Cell Carcinoma (ccRCC)

Hasaan Maqsood, Saif Ur Rehman Khan

•preprint•Jul 10 2025

Accurate prediction of recurrence in clear cell renal cell carcinoma (ccRCC) remains a major clinical challenge due to the disease's complex molecular, pathological, and clinical heterogeneity. Traditional prognostic models, which rely on single data modalities such as radiology, histopathology, or genomics, often fail to capture the full spectrum of disease complexity, resulting in suboptimal predictive accuracy. This study aims to overcome these limitations by proposing a deep learning (DL) framework that integrates multimodal data, including CT, MRI, histopathology whole slide images (WSI), clinical data, and genomic profiles, to improve the prediction of ccRCC recurrence and enhance clinical decision-making. The proposed framework utilizes a comprehensive dataset curated from multiple publicly available sources, including TCGA, TCIA, and CPTAC. To process the diverse modalities, domain-specific models are employed: CLAM, a ResNet50-based model, is used for histopathology WSIs, while MeD-3D, a pre-trained 3D-ResNet18 model, processes CT and MRI images. For structured clinical and genomic data, a multi-layer perceptron (MLP) is used. These models are designed to extract deep feature embeddings from each modality, which are then fused through an early and late integration architecture. This fusion strategy enables the model to combine complementary information from multiple sources. Additionally, the framework is designed to handle incomplete data, a common challenge in clinical settings, by enabling inference even when certain modalities are missing.

Mixed Modality Classification Abdominal Methodology In Silico Academic Lab GenAI

Patient-specific vs Multi-Patient Vision Transformer for Markerless Tumor Motion Forecasting

Gauthier Rotsart de Hertaing, Dani Manjah, Benoit Macq

•preprint•Jul 10 2025

Background: Accurate forecasting of lung tumor motion is essential for precise dose delivery in proton therapy. While current markerless methods mostly rely on deep learning, transformer-based architectures remain unexplored in this domain, despite their proven performance in trajectory forecasting. Purpose: This work introduces a markerless forecasting approach for lung tumor motion using Vision Transformers (ViT). Two training strategies are evaluated under clinically realistic constraints: a patient-specific (PS) approach that learns individualized motion patterns, and a multi-patient (MP) model designed for generalization. The comparison explicitly accounts for the limited number of images that can be generated between planning and treatment sessions. Methods: Digitally reconstructed radiographs (DRRs) derived from planning 4DCT scans of 31 patients were used to train the MP model; a 32nd patient was held out for evaluation. PS models were trained using only the target patient's planning data. Both models used 16 DRRs per input and predicted tumor motion over a 1-second horizon. Performance was assessed using Average Displacement Error (ADE) and Final Displacement Error (FDE), on both planning (T1) and treatment (T2) data. Results: On T1 data, PS models outperformed MP models across all training set sizes, especially with larger datasets (up to 25,000 DRRs, p < 0.05). However, MP models demonstrated stronger robustness to inter-fractional anatomical variability and achieved comparable performance on T2 data without retraining. Conclusions: This is the first study to apply ViT architectures to markerless tumor motion forecasting. While PS models achieve higher precision, MP models offer robust out-of-the-box performance, well-suited for time-constrained clinical settings.

CT Detection Chest Methodology In Silico Academic Lab Breakthrough
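The two metrics named in the abstract above, Average Displacement Error (ADE) and Final Displacement Error (FDE), are commonly defined as the mean point-wise Euclidean distance over the forecast horizon and the distance at its last step. The sketch below illustrates those common definitions only; the function name, the 2D coordinates, and the toy trajectories are assumptions, not the authors' code or data.

```python
import math

def displacement_errors(predicted, actual):
    """Return (ADE, FDE) for two equal-length trajectories of coordinate tuples."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between predicted and true position at each step
    distances = [math.dist(p, a) for p, a in zip(predicted, actual)]
    ade = sum(distances) / len(distances)  # average over the whole horizon
    fde = distances[-1]                    # error at the final forecast step
    return ade, fde

# Hypothetical 3-step forecast (arbitrary units), for illustration only
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
actual = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, actual)
print(ade, fde)
```

FDE is never smaller than the last term that enters ADE, so a model whose error grows over the horizon will typically show FDE above ADE, as in this toy case.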

GH-UNet: group-wise hybrid convolution-VIT for robust medical image segmentation.

Wang S, Li G, Gao M, Zhuo L, Liu M, Ma Z, Zhao W, Fu X

•papers•Jul 10 2025

Medical image segmentation is vital for accurate diagnosis. While U-Net-based models are effective, they struggle to capture long-range dependencies in complex anatomy. We propose GH-UNet, a Group-wise Hybrid Convolution-ViT model within the U-Net framework, to address this limitation. GH-UNet integrates a hybrid convolution-Transformer encoder for both local detail and global context modeling, a Group-wise Dynamic Gating (GDG) module for adaptive feature weighting, and a cascaded decoder for multi-scale integration. Both the encoder and GDG are modular, enabling compatibility with various CNN or ViT backbones. Extensive experiments on five public and one private dataset show GH-UNet consistently achieves superior performance. On ISIC2016, it surpasses H2Former with 1.37% and 1.94% gains in DICE and IOU, respectively, using only 38% of the parameters and 49.61% of the FLOPs. The code is freely accessible via: https://github.com/xiachashuanghua/GH-UNet .

Mixed Modality Segmentation Methodology In Silico Academic Lab Open Code Benchmark SOTA
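The GH-UNet abstract reports gains in Dice and IoU. As a reminder of how these two overlap scores relate, here is a minimal sketch using their standard definitions on flattened binary masks; the function name and the toy masks are assumptions for illustration and are not taken from the linked repository.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length binary masks given as flat sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks empty: treat as perfect agreement (a convention, not a rule)
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou

# Hypothetical 4-pixel masks, for illustration only
pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
dice, iou = dice_and_iou(pred, target)
print(dice, iou)
```

The two scores are monotonically related by Dice = 2·IoU / (1 + IoU), which is why improvements in one are usually accompanied by improvements in the other.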

Breast Ultrasound Tumor Generation via Mask Generator and Text-Guided Network: A Clinically Controllable Framework with Downstream Evaluation

Haoyu Pan, Hongxin Lin, Zetian Feng, Chuxuan Lin, Junyang Mo, Chu Zhang, Zijian Wu, Yi Wang, Qingqing Zheng

•preprint•Jul 10 2025

The development of robust deep learning models for breast ultrasound (BUS) image analysis is significantly constrained by the scarcity of expert-annotated data. To address this limitation, we propose a clinically controllable generative framework for synthesizing BUS images. This framework integrates clinical descriptions with structural masks to generate tumors, enabling fine-grained control over tumor characteristics such as morphology, echogenicity, and shape. Furthermore, we design a semantic-curvature mask generator, which synthesizes structurally diverse tumor masks guided by clinical priors. During inference, synthetic tumor masks serve as input to the generative framework, producing highly personalized synthetic BUS images with tumors that reflect real-world morphological diversity. Quantitative evaluations on six public BUS datasets demonstrate the significant clinical utility of our synthetic images, showing their effectiveness in enhancing downstream breast cancer diagnosis tasks. Furthermore, visual Turing tests conducted by experienced sonographers confirm the realism of the generated images, indicating the framework's potential to support broader clinical applications.

Ultrasound Image Synthesis Breast Methodology In Silico Academic Lab GenAI

Compressive Imaging Reconstruction via Tensor Decomposed Multi-Resolution Grid Encoding

Zhenyu Jin, Yisi Luo, Xile Zhao, Deyu Meng

•preprint•Jul 10 2025

Compressive imaging (CI) reconstruction, such as snapshot compressive imaging (SCI) and compressive sensing magnetic resonance imaging (MRI), aims to recover high-dimensional images from low-dimensional compressed measurements. This process critically relies on learning an accurate representation of the underlying high-dimensional image. However, existing unsupervised representations may struggle to achieve a desired balance between representation ability and efficiency. To overcome this limitation, we propose Tensor Decomposed multi-resolution Grid encoding (GridTD), an unsupervised continuous representation framework for CI reconstruction. GridTD optimizes a lightweight neural network and the input tensor decomposition model whose parameters are learned via multi-resolution hash grid encoding. It inherently enjoys the hierarchical modeling ability of multi-resolution grid encoding and the compactness of tensor decomposition, enabling effective and efficient reconstruction of high-dimensional images. Theoretical analyses for the algorithm's Lipschitz property, generalization error bound, and fixed-point convergence reveal the intrinsic superiority of GridTD as compared with existing continuous representation models. Extensive experiments across diverse CI tasks, including video SCI, spectral SCI, and compressive dynamic MRI reconstruction, consistently demonstrate the superiority of GridTD over existing methods, positioning GridTD as a versatile and state-of-the-art CI reconstruction method.

MRI Reconstruction Methodology In Silico Academic Lab Benchmark SOTA

Attend-and-Refine: Interactive keypoint estimation and quantitative cervical vertebrae analysis for bone age assessment

Jinhee Kim, Taesung Kim, Taewoo Kim, Dong-Wook Kim, Byungduk Ahn, Yoon-Ji Kim, In-Seok Song, Jaegul Choo

•preprint•Jul 10 2025

In pediatric orthodontics, accurate estimation of growth potential is essential for developing effective treatment strategies. Our research aims to predict this potential by identifying the growth peak and analyzing cervical vertebra morphology solely through lateral cephalometric radiographs. We accomplish this by comprehensively analyzing cervical vertebral maturation (CVM) features from these radiographs. This methodology provides clinicians with a reliable and efficient tool to determine the optimal timing for orthodontic interventions, ultimately enhancing patient outcomes. A crucial aspect of this approach is the meticulous annotation of keypoints on the cervical vertebrae, a task often challenged by its labor-intensive nature. To mitigate this, we introduce the Attend-and-Refine Network (ARNet), a user-interactive, deep learning-based model designed to streamline the annotation process. ARNet features an interaction-guided recalibration network, which adaptively recalibrates image features in response to user feedback, coupled with a morphology-aware loss function that preserves the structural consistency of keypoints. This novel approach substantially reduces manual effort in keypoint identification, thereby enhancing the efficiency and accuracy of the process. Extensively validated across various datasets, ARNet demonstrates remarkable performance and exhibits wide-ranging applicability in medical imaging. In conclusion, our research offers an effective AI-assisted diagnostic tool for assessing growth potential in pediatric orthodontics, marking a significant advancement in the field.

X-Ray Detection Musculoskeletal Methodology In Silico Academic Lab Breakthrough

MRI sequence focused on pancreatic morphology evaluation: three-shot turbo spin-echo with deep learning-based reconstruction.

Kadoya Y, Mochizuki K, Asano A, Miyakawa K, Kanatani M, Saito J, Abo H

•papers•Jul 10 2025

Background: Higher-resolution magnetic resonance imaging sequences are needed for the early detection of pancreatic cancer. Purpose: To compare the quality of our novel T2-weighted, high-contrast, thin-slice imaging sequence, with an improved spatial resolution and deep learning-based reconstruction (three-shot turbo spin-echo with deep learning-based reconstruction [3S-TSE-DLR]), for imaging the pancreas with imaging using three conventional sequences (half-Fourier acquisition single-shot turbo spin-echo [HASTE], fat-suppressed 3D T1-weighted [FS-3D-T1W] imaging, and magnetic resonance cholangiopancreatography [MRCP]). Material and Methods: Pancreatic images of 50 healthy volunteers acquired with 3S-TSE-DLR, HASTE, FS-3D-T1W imaging, and MRCP were compared by two diagnostic radiologists. A 5-point scale was used for assessing motion artifacts, pancreatic margin sharpness, and the ability to identify the main pancreatic duct (MPD) on 3S-TSE-DLR, HASTE, and FS-3D-T1W imaging, respectively. The ability to identify MPD via MRCP was also evaluated. Results: Artifact scores (the higher the score, the fewer the artifacts) were significantly higher for 3S-TSE-DLR than for HASTE, and significantly lower for 3S-TSE-DLR than for FS-3D-T1W imaging, for both radiologists. Sharpness scores were significantly higher for 3S-TSE-DLR than for HASTE and FS-3D-T1W imaging, for both radiologists. The rate of identification of MPD was significantly higher for 3S-TSE-DLR than for FS-3D-T1W imaging, for both radiologists, and significantly higher for 3S-TSE-DLR than for HASTE for one radiologist. The rate of identification of MPD was not significantly different between 3S-TSE-DLR and MRCP. Conclusion: 3S-TSE-DLR provides better image sharpness than conventional sequences, can identify MPD equally as well or better than HASTE, and shows identification performance comparable to that of MRCP.

MRI Reconstruction Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Non-invasive identification of TKI-resistant NSCLC: a multi-model AI approach for predicting EGFR/TP53 co-mutations.

Li J, Xu R, Wang D, Liang Z, Li Y, Wang Q, Bi L, Qi Y, Zhou Y, Li W

•papers•Jul 10 2025

This study investigates the value of a multi-model approach based on preoperative CT scans for predicting EGFR/TP53 co-mutation status. We retrospectively included 2,171 patients with non-small cell lung cancer (NSCLC) who had pre-treatment computed tomography (CT) scans and epidermal growth factor receptor (EGFR) gene sequencing at West China Hospital between January 2013 and April 2024. A deep-learning model was built to predict EGFR / tumor protein 53 (TP53) co-occurrence status. Model performance was evaluated by area under the curve (AUC) and Kaplan-Meier analysis. We further compared the multi-dimensional model with each of three single-dimensional models, and we explored the value of combining clinical factors with machine-learning factors. Additionally, we investigated 546 patients with 56-panel next-generation sequencing and low-dose computed tomography (LDCT) to explore the biological mechanisms of radiomics. In our cohort of 2,171 patients (1,153 males, 1,018 females; median age 60 years), single-dimensional models were developed using data from 1,055 eligible patients. The multi-dimensional model utilizing a Random Forest classifier achieved superior performance, yielding the highest AUC of 0.843 for predicting EGFR/TP53 co-mutations in the test set. The multi-dimensional model demonstrates promising potential for non-invasive prediction of EGFR and TP53 co-mutations, facilitating early and informed clinical decision-making in NSCLC patients at risk of treatment resistance.

CT Classification Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Attention-based multimodal deep learning for interpretable and generalizable prediction of pathological complete response in breast cancer.

Nishizawa T, Maldjian T, Jiao Z, Duong TQ

•papers•Jul 10 2025

Accurate prediction of pathological complete response (pCR) to neoadjuvant chemotherapy has significant clinical utility in the management of breast cancer treatment. Although multimodal deep learning models have shown promise for predicting pCR from medical imaging and other clinical data, their adoption has been limited due to challenges with interpretability and generalizability across institutions. We developed a multimodal deep learning model combining post contrast-enhanced whole-breast MRI at pre- and post-treatment timepoints with non-imaging clinical features. The model integrates 3D convolutional neural networks and self-attention to capture spatial and cross-modal interactions. We utilized two public multi-institutional datasets to perform internal and external validation of the model. For model training and validation, we used data from the I-SPY 2 trial (N = 660). For external validation, we used the I-SPY 1 dataset (N = 114). Of the 660 patients in I-SPY 2, 217 patients achieved pCR (32.88%). Of the 114 patients in I-SPY 1, 29 achieved pCR (25.44%). The attention-based multimodal model yielded the best predictive performance with an AUC of 0.73 ± 0.04 on the internal data and an AUC of 0.71 ± 0.02 on the external dataset. The MRI-only model (internal AUC = 0.68 ± 0.03, external AUC = 0.70 ± 0.04) and the non-MRI clinical features-only model (internal AUC = 0.66 ± 0.08, external AUC = 0.71 ± 0.03) trailed in performance, indicating the combination of both modalities is most effective. We present a robust and interpretable deep learning framework for pCR prediction in breast cancer patients undergoing NAC. By combining imaging and clinical data with attention-based fusion, the model achieves strong predictive performance and generalizes across institutions.

MRI Classification Breast Retrospective Clinical In Silico Academic Lab Benchmark SOTA
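The pCR rates quoted in the abstract above follow directly from the reported counts. A quick arithmetic check, using only the figures stated there (the helper name is an assumption for illustration):

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

# Counts as reported in the abstract
i_spy2_rate = percent(217, 660)  # pCR rate in the I-SPY 2 cohort
i_spy1_rate = percent(29, 114)   # pCR rate in the I-SPY 1 cohort
print(i_spy2_rate, i_spy1_rate)
```

Both values match the percentages given in the text (32.88% and 25.44%), which also shows that the external cohort has a lower pCR prevalence than the training cohort.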

FF Swin-Unet: a strategy for automated segmentation and severity scoring of NAFLD.

Fan L, Lei Y, Song F, Sun X, Zhang Z

•papers•Jul 10 2025

Non-alcoholic fatty liver disease (NAFLD) is a significant risk factor for liver cancer and cardiovascular diseases, imposing substantial social and economic burdens. Computed tomography (CT) scans are crucial for diagnosing NAFLD and assessing its severity. However, current manual measurement techniques require considerable human effort and resources from radiologists, and there is a lack of standardized methods for classifying the severity of NAFLD in existing research. To address these challenges, we propose a novel method for NAFLD segmentation and automated severity scoring. The method consists of three key modules: (1) The Semi-automatization nnU-Net Module (SNM) constructs a high-quality dataset by combining manual annotations with semi-automated refinement; (2) The Focal Feature Fusion Swin-Unet Module (FSM) enhances liver and spleen segmentation through multi-scale feature fusion and Swin Transformer-based architectures; (3) The Automated Severity Scoring Module (ASSM) integrates segmentation results with radiological features to classify NAFLD severity. These modules are embedded in a Flask-RESTful API-based system, enabling users to upload abdominal CT data for automated preprocessing, segmentation, and scoring. The Focal Feature Fusion Swin-Unet (FF Swin-Unet) method significantly improves segmentation accuracy, achieving a Dice similarity coefficient (DSC) of 95.64% and a 95th percentile Hausdorff distance (HD95) of 15.94. The accuracy of the automated severity scoring is 90%. With model compression and ONNX deployment, the evaluation speed for each case is approximately 5 seconds. Compared to manual diagnosis, the system can process a large volume of data simultaneously, rapidly, and efficiently while maintaining the same level of diagnostic accuracy, significantly reducing the workload of medical professionals. 
Our research demonstrates that the proposed system has high accuracy in processing large volumes of CT data and providing automated NAFLD severity scores quickly and efficiently. This method has the potential to significantly reduce the workload of medical professionals and holds immense clinical application potential.

CT Segmentation Abdominal Methodology In Silico Academic Lab Benchmark SOTA Open Code
