3D MedDiffusion: A 3D Medical Latent Diffusion Model for Controllable and High-quality Medical Image Generation.

Wang H, Liu Z, Sun K, Wang X, Shen D, Cui Z

PubMed · Jul 2 2025
The generation of medical images presents significant challenges due to their high-resolution and three-dimensional nature. Existing methods often yield suboptimal performance in generating high-quality 3D medical images, and there is currently no universal generative framework for medical imaging. In this paper, we introduce a 3D Medical Latent Diffusion (3D MedDiffusion) model for controllable, high-quality 3D medical image generation. 3D MedDiffusion incorporates a novel, highly efficient Patch-Volume Autoencoder that compresses medical images into latent space through patch-wise encoding and recovers them back into image space through volume-wise decoding. Additionally, we design a new noise estimator to capture both local details and global structural information during the diffusion denoising process. 3D MedDiffusion can generate finely detailed, high-resolution images (up to 512×512×512) and effectively adapts to various downstream tasks, as it is trained on large-scale datasets covering CT and MRI modalities and different anatomical regions (from head to leg). Experimental results demonstrate that 3D MedDiffusion surpasses state-of-the-art methods in generative quality and exhibits strong generalizability across tasks such as sparse-view CT reconstruction, fast MRI reconstruction, and data augmentation for segmentation and classification. Source code and checkpoints are available at https://github.com/ShanghaiTech-IMPACT/3D-MedDiffusion.
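
As a rough illustration of the patch-wise-encode / volume-wise-decode idea, the PyTorch sketch below splits a volume into non-overlapping patches, encodes each with a shared encoder, reassembles the latent grid, and decodes the whole volume at once. Channel widths, patch size, and latent dimensionality are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of a Patch-Volume Autoencoder: patch-wise encoding,
# volume-wise decoding. All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class PatchVolumeAutoencoder(nn.Module):
    def __init__(self, patch=64, latent_ch=4):
        super().__init__()
        self.patch = patch
        # Shared encoder applied independently to each 3D patch (8x downsampling).
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv3d(64, latent_ch, 3, stride=2, padding=1),
        )
        # Decoder applied once to the reassembled latent volume.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(latent_ch, 64, 4, stride=2, padding=1), nn.GELU(),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.GELU(),
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, vol):                      # vol: (B, 1, D, H, W)
        p = self.patch
        B, _, D, H, W = vol.shape
        # Split the volume into non-overlapping p^3 patches and encode each.
        patches = (vol.unfold(2, p, p).unfold(3, p, p).unfold(4, p, p)
                      .permute(0, 2, 3, 4, 1, 5, 6, 7)
                      .reshape(-1, 1, p, p, p))
        lat = self.encoder(patches)              # (N, C, p/8, p/8, p/8)
        # Reassemble patch latents into one latent volume, then decode globally.
        d, h, w = D // p, H // p, W // p
        q = p // 8
        lat = (lat.reshape(B, d, h, w, -1, q, q, q)
                  .permute(0, 4, 1, 5, 2, 6, 3, 7)
                  .reshape(B, -1, d * q, h * q, w * q))
        return self.decoder(lat)

model = PatchVolumeAutoencoder()
x = torch.randn(1, 1, 128, 128, 128)
print(model(x).shape)  # torch.Size([1, 1, 128, 128, 128])
```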

Retrieval-augmented generation elevates local LLM quality in radiology contrast media consultation.

Wada A, Tanaka Y, Nishizawa M, Yamamoto A, Akashi T, Hagiwara A, Hayakawa Y, Kikuta J, Shimoji K, Sano K, Kamagata K, Nakanishi A, Aoki S

PubMed · Jul 2 2025
Large language models (LLMs) demonstrate significant potential in healthcare applications, but clinical deployment is limited by privacy concerns and insufficient medical domain training. This study investigated whether retrieval-augmented generation (RAG) can improve a locally deployable LLM for radiology contrast media consultation. In 100 synthetic iodinated contrast media consultations, we compared Llama 3.2-11B (baseline and RAG) with three cloud-based models: GPT-4o mini, Gemini 2.0 Flash, and Claude 3.5 Haiku. A blinded radiologist ranked the five replies per case, and three LLM-based judges scored accuracy, safety, structure, tone, applicability, and latency. Under controlled conditions, RAG eliminated hallucinations (0% vs 8%; χ²₍Yates₎ = 6.38, p = 0.012) and improved mean rank by 1.3 (Z = -4.82, p < 0.001), though performance gaps with cloud models persisted. The RAG-enhanced model remained faster (2.6 s vs 4.9-7.3 s), and the LLM-based judges preferred it over GPT-4o mini, though the radiologist ranked GPT-4o mini higher. RAG thus provides meaningful improvements for local clinical LLMs while maintaining the privacy benefits of on-premise deployment.
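
A minimal sketch of the RAG pattern described here, with TF-IDF retrieval standing in for whatever retriever the study used and the guideline snippets invented for illustration; the composed prompt would then be passed to the locally served LLM.

```python
# Minimal RAG sketch: retrieve relevant guideline snippets, build a grounded
# prompt. Retriever and snippets are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical guideline snippets; a real system would index institutional
# contrast media guidelines and package inserts.
docs = [
    "Iodinated contrast is contraindicated in patients with prior severe reactions.",
    "For eGFR below 30 mL/min/1.73m2, weigh the risk of contrast-induced nephropathy.",
    "Metformin should be withheld after contrast in patients with renal impairment.",
]

vectorizer = TfidfVectorizer().fit(docs)
doc_vecs = vectorizer.transform(docs)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k guideline snippets most similar to the query."""
    sims = cosine_similarity(vectorizer.transform([query]), doc_vecs)[0]
    return [docs[i] for i in sims.argsort()[::-1][:k]]

def build_prompt(question: str) -> str:
    context = "\n".join(f"- {snippet}" for snippet in retrieve(question))
    return (f"Answer using only the guideline excerpts below.\n"
            f"Guidelines:\n{context}\n\nQuestion: {question}\nAnswer:")

print(build_prompt("Can we give iodinated contrast if eGFR is 25?"))
# The prompt is then sent to the local model (e.g., a served Llama instance).
```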

A novel few-shot learning framework for supervised diffeomorphic image registration network.

Chen K, Han H, Wei J, Zhang Y

PubMed · Jul 2 2025
Image registration is a key technique in image processing and analysis. Due to its high complexity, traditional registration frameworks often fail to meet real-time demands in practice. To address this demand, several deep learning networks for registration have been proposed, both supervised and unsupervised. Unsupervised networks rely on large amounts of training data to minimize specific loss functions, but the lack of physical information constraints results in lower accuracy compared with supervised networks. Supervised networks for medical image registration, however, face two major challenges: physical mesh folding and the scarcity of labeled training data. To address these two challenges, we propose a novel few-shot learning framework for image registration. The framework contains two parts: a random diffeomorphism generator (RDG) and a supervised few-shot learning network for image registration. By randomly generating a complex vector field, the RDG produces a series of diffeomorphisms. With the diffeomorphisms generated by the RDG, only a few images (theoretically, a single image is enough) are needed to generate a series of labels for training the supervised few-shot learning network. To eliminate the physical mesh folding phenomenon, the loss function in the proposed network is only required to ensure the smoothness of the deformation; no other control for mesh folding elimination is necessary. The experimental results indicate that the proposed method demonstrates superior performance in eliminating physical mesh folding when compared to other existing learning-based methods. Our code is available at https://github.com/weijunping111/RDG-TMI.git.
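
The random-diffeomorphism-generator idea can be sketched in 2D: sample a smooth random stationary velocity field, integrate it by scaling and squaring into a (numerically) diffeomorphic displacement, and warp a single image to manufacture supervised (moving, fixed, ground-truth field) triples. Field strength, smoothing, and step count below are illustrative assumptions, not the paper's values.

```python
# Hedged 2D sketch of an RDG-style label factory for supervised registration.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def compose(disp_a, disp_b, grid):
    """Compose displacement fields: (id + a) o (id + b) = id + b + a(id + b)."""
    warped = np.stack([
        map_coordinates(disp_a[c], grid + disp_b, order=1, mode="nearest")
        for c in range(2)
    ])
    return warped + disp_b

def random_diffeomorphism(shape, sigma=6.0, strength=4.0, steps=6):
    grid = np.stack(np.meshgrid(*[np.arange(s) for s in shape], indexing="ij"))
    # Smooth random velocity field, scaled so each small step stays invertible.
    vel = gaussian_filter(np.random.randn(2, *shape), sigma=(0, sigma, sigma))
    vel *= strength / (np.abs(vel).max() + 1e-8)
    disp = vel / 2 ** steps
    for _ in range(steps):            # scaling and squaring: phi <- phi o phi
        disp = compose(disp, disp, grid)
    return grid, disp

def warp(img, grid, disp):
    return map_coordinates(img, grid + disp, order=1, mode="nearest")

# One image is enough to mint many (moving, fixed, ground-truth field) triples.
img = np.random.rand(96, 96)
grid, disp = random_diffeomorphism(img.shape)
moved = warp(img, grid, disp)         # training pair: (img, moved); label: disp
print(moved.shape, np.abs(disp).max())
```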

A novel neuroimaging-based early detection framework for Alzheimer's disease using deep learning.

Alasiry A, Shinan K, Alsadhan AA, Alhazmi HE, Alanazi F, Ashraf MU, Muhammad T

PubMed · Jul 2 2025
Alzheimer's disease (AD) is a progressive neurodegenerative disorder that significantly impacts cognitive function, posing a major global health challenge. Despite its rising prevalence, particularly in low- and middle-income countries, early diagnosis remains inadequate, with more than 55 million individuals estimated to be affected as of 2022, a figure expected to triple by 2050. Accurate early detection is critical for effective intervention. This study presents Neuroimaging-based Early Detection of Alzheimer's Disease using Deep Learning (NEDA-DL), a novel computer-aided diagnostic (CAD) framework leveraging a hybrid ResNet-50 and AlexNet architecture optimized with CUDA-based parallel processing. The proposed deep learning model processes MRI and PET neuroimaging data, utilizing depthwise separable convolutions to enhance computational efficiency. Performance evaluation using key metrics, including accuracy, sensitivity, specificity, and F1-score, demonstrates state-of-the-art classification performance, with the Softmax classifier achieving 99.87% accuracy. Comparative analyses further validate the superiority of NEDA-DL over existing methods. By integrating structural and functional neuroimaging insights, this approach enhances diagnostic precision and supports clinical decision-making in Alzheimer's disease detection.
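
One plausible way to combine ResNet-50 and AlexNet features is sketched below: extract features from each backbone, concatenate, and classify. The fusion strategy, head size, and four-class output are assumptions, not the NEDA-DL specification (whose depthwise separable convolutions and CUDA-level optimizations are omitted here).

```python
# Hedged sketch of a hybrid ResNet-50 / AlexNet classifier via feature fusion.
import torch
import torch.nn as nn
from torchvision import models

class HybridAlzheimerNet(nn.Module):
    def __init__(self, num_classes=4):              # e.g., AD staging (assumed)
        super().__init__()
        resnet = models.resnet50(weights=None)
        self.resnet_features = nn.Sequential(*list(resnet.children())[:-1])  # (B,2048,1,1)
        alexnet = models.alexnet(weights=None)
        self.alex_features = alexnet.features        # conv stack, 256 channels
        self.alex_pool = nn.AdaptiveAvgPool2d(1)     # (B,256,1,1)
        self.classifier = nn.Sequential(
            nn.Linear(2048 + 256, 512), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):                            # x: (B,3,224,224) MRI/PET slices
        f1 = self.resnet_features(x).flatten(1)
        f2 = self.alex_pool(self.alex_features(x)).flatten(1)
        return self.classifier(torch.cat([f1, f2], dim=1))

logits = HybridAlzheimerNet()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 4])
```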

Enhanced security for medical images using a new 5D hyperchaotic map and deep learning-based segmentation.

Subathra S, Thanikaiselvan V

PubMed · Jul 2 2025
Medical image encryption is important for maintaining the confidentiality of sensitive medical data and protecting patient privacy. Contemporary healthcare systems store significant patient data in text and graphic form. This research proposes a new 5D hyperchaotic system combined with a customised U-Net architecture. Chaotic maps have become an increasingly popular method for encryption because of their remarkable characteristics, including statistical randomness and sensitivity to initial conditions. The significant region is segmented from the medical images using the U-Net network, and its statistics are utilised as initial conditions to generate the new random sequence. Initially, zig-zag scrambling confuses the pixel positions of a medical image, and further permutation is applied with a new 5D hyperchaotic sequence. Two stages of diffusion, dynamic DNA flip and dynamic DNA XOR, are used to enhance the encryption algorithm's security against various attacks. The randomness of the new 5D hyperchaotic system is verified using the NIST SP800-22 statistical test, by calculating the Lyapunov exponent, and by plotting the attractor diagram of the chaotic sequence. The algorithm is validated with statistical measures such as PSNR, MSE, NPCR, UACI, entropy, and Chi-square values. Evaluation on test images yields average horizontal, vertical, and diagonal correlation coefficients of -0.0018, -0.0002, and 0.0007, respectively, a Shannon entropy of 7.9971, a Kolmogorov entropy of 2.9469, an NPCR of 99.61%, a UACI of 33.49%, Chi-square "PASS" at both the 5% (293.2478) and 1% (310.4574) significance levels, a key space of 2^500, and an average encryption time of approximately 2.93 s per 256 × 256 image on a standard desktop CPU. Performance comparisons with various encryption methods demonstrate that the proposed method ensures secure reliability against various challenges.
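
A toy sketch of the chaos-based confusion/diffusion pattern this pipeline builds on, with a 1D logistic map standing in for the paper's 5D hyperchaotic system; the U-Net-derived keying, zig-zag scrambling, and DNA operations are omitted for brevity.

```python
# Toy chaos-based cipher: chaotic permutation (confusion) + XOR keystream
# (diffusion). The logistic map is a stand-in, not the paper's 5D system.
import numpy as np

def logistic_sequence(x0: float, n: int, r: float = 3.99) -> np.ndarray:
    """Iterate the logistic map x <- r*x*(1-x); chaotic for r near 4."""
    seq = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)
        seq[i] = x
    return seq

def encrypt(img: np.ndarray, key: float = 0.3141592) -> np.ndarray:
    flat = img.flatten()
    chaos = logistic_sequence(key, 2 * flat.size)
    perm = np.argsort(chaos[:flat.size])            # confusion: pixel permutation
    keystream = (chaos[flat.size:] * 256).astype(np.uint8)
    return (flat[perm] ^ keystream).reshape(img.shape)  # diffusion: XOR

def decrypt(cipher: np.ndarray, key: float = 0.3141592) -> np.ndarray:
    flat = cipher.flatten()
    chaos = logistic_sequence(key, 2 * flat.size)
    perm = np.argsort(chaos[:flat.size])
    keystream = (chaos[flat.size:] * 256).astype(np.uint8)
    plain = np.empty_like(flat)
    plain[perm] = flat ^ keystream                  # invert XOR, then permutation
    return plain.reshape(cipher.shape)

img = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
assert np.array_equal(decrypt(encrypt(img)), img)
```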

Hybrid deep learning architecture for scalable and high-quality image compression.

Al-Khafaji M, Ramaha NTA

PubMed · Jul 2 2025
The rapid growth of medical imaging data presents challenges for efficient storage and transmission, particularly in clinical and telemedicine applications where image fidelity is crucial. This study proposes a hybrid deep learning-based image compression framework that integrates Stationary Wavelet Transform (SWT), Stacked Denoising Autoencoder (SDAE), Gray-Level Co-occurrence Matrix (GLCM), and K-means clustering. The framework enables multiresolution decomposition, texture-aware feature extraction, and adaptive region-based compression. A custom loss function that combines Mean Squared Error (MSE) and Structural Similarity Index (SSIM) ensures high perceptual quality and compression efficiency. The proposed model was evaluated across multiple benchmark medical imaging datasets and achieved a Peak Signal-to-Noise Ratio (PSNR) of up to 50.36 dB, an MS-SSIM of 0.9999, and an encoding-decoding time of 0.065 s. These results demonstrate the model's capability to outperform existing approaches while maintaining diagnostic integrity, scalability, and speed, making it suitable for real-time and resource-constrained clinical environments.
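
A small sketch of how such a codec's fidelity is typically scored, using scikit-image's PSNR and single-scale SSIM; the paper reports MS-SSIM, for which a multi-scale implementation would be substituted. The round trip through the codec is simulated here with additive noise.

```python
# Hedged sketch of PSNR/SSIM evaluation for a lossy medical-image codec.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_codec(original: np.ndarray, reconstructed: np.ndarray) -> dict:
    """Compare a decoded image against its source (both float in [0, 1])."""
    return {
        "psnr_db": peak_signal_noise_ratio(original, reconstructed, data_range=1.0),
        "ssim": structural_similarity(original, reconstructed, data_range=1.0),
    }

# Stand-in for a real encode/decode round trip through the hybrid codec.
rng = np.random.default_rng(0)
img = rng.random((256, 256))
recon = np.clip(img + rng.normal(0, 0.01, img.shape), 0, 1)
print(evaluate_codec(img, recon))
# A combined training loss in the paper's spirit might weight both terms,
# e.g. loss = alpha * MSE + (1 - alpha) * (1 - SSIM), with alpha assumed.
```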

Multi-scale fusion semantic enhancement network for medical image segmentation.

Zhang Z, Xu C, Li Z, Chen Y, Nie C

PubMed · Jul 2 2025
The application of sophisticated computer vision techniques for medical image segmentation (MIS) plays a vital role in clinical diagnosis and treatment. Although Transformer-based models are effective at capturing global context, they are often ineffective at dealing with local feature dependencies. To address this problem, we design a Multi-scale Fusion and Semantic Enhancement Network (MFSE-Net) for endoscopic image segmentation, which aims to capture global information and enhance detailed information. MFSE-Net uses a dual-encoder architecture, with PVTv2 as the primary encoder to capture global features and a CNN as the secondary encoder to capture local details. The main encoder includes the LGDA (Large-kernel Grouped Deformable Attention) module for filtering noise and enhancing the semantic extraction of the four hierarchical features. The auxiliary encoder leverages the MLCF (Multi-Layered Cross-attention Fusion) module to integrate high-level semantic data from the deep CNN with fine spatial details from the shallow layers, enhancing the precision of boundaries and positioning. On the decoder side, we introduce the PSE (Parallel Semantic Enhancement) module, which embeds the boundary and position information of the secondary encoder into the output features of the backbone network. In the multi-scale decoding process, we also add a SAM (Scale Aware Module) to recover global semantic information and compensate for the loss of boundary details. Extensive experiments show that MFSE-Net consistently outperforms state-of-the-art methods on renal tumor and polyp datasets.
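
As a sketch of the cross-attention fusion pattern the MLCF module is built around, the snippet below injects local CNN detail (keys/values) into global transformer semantics (queries); the dimensions and single-layer design are illustrative assumptions.

```python
# Hedged sketch of cross-attention fusion between global and local feature maps.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, global_feat, local_feat):
        # Both inputs: (B, C, H, W) -> token sequences (B, H*W, C).
        B, C, H, W = global_feat.shape
        q = global_feat.flatten(2).transpose(1, 2)   # queries: global semantics
        kv = local_feat.flatten(2).transpose(1, 2)   # keys/values: local detail
        fused, _ = self.attn(q, kv, kv)              # inject detail into semantics
        fused = self.norm(fused + q)                 # residual + norm
        return fused.transpose(1, 2).reshape(B, C, H, W)

fusion = CrossAttentionFusion()
out = fusion(torch.randn(2, 256, 16, 16), torch.randn(2, 256, 16, 16))
print(out.shape)  # torch.Size([2, 256, 16, 16])
```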

Artificial Intelligence-Driven Cancer Diagnostics: Enhancing Radiology and Pathology through Reproducibility, Explainability, and Multimodality.

Khosravi P, Fuchs TJ, Ho DJ

PubMed · Jul 2 2025
The integration of artificial intelligence (AI) in cancer research has significantly advanced radiology, pathology, and multimodal approaches, offering unprecedented capabilities in image analysis, diagnosis, and treatment planning. AI techniques provide standardized assistance for clinicians in diagnostic and predictive tasks that are otherwise conducted manually, with low reproducibility. These AI methods can additionally provide explainability to help clinicians make the best decisions for patient care. This review explores state-of-the-art AI methods, focusing on their application in image classification, image segmentation, multiple instance learning, generative models, and self-supervised learning. In radiology, AI enhances tumor detection, diagnosis, and treatment planning through advanced imaging modalities and real-time applications. In pathology, AI-driven image analysis improves cancer detection, biomarker discovery, and diagnostic consistency. Multimodal AI approaches can integrate data from radiology, pathology, and genomics to provide comprehensive diagnostic insights. Emerging trends, challenges, and future directions in AI-driven cancer research are discussed, emphasizing the transformative potential of these technologies in improving patient outcomes and advancing cancer care. This article is part of a special series: Driving Cancer Discoveries with Computational Research, Data Science, and Machine Learning/AI.

Dual-Modality Virtual Biopsy System Integrating MRI and MG for Noninvasively Predicting HER2 Status in Breast Cancer.

Wang Q, Zhang ZQ, Huang CC, Xue HW, Zhang H, Bo F, Guan WT, Zhou W, Bai GJ

PubMed · Jul 1 2025
Accurate determination of human epidermal growth factor receptor 2 (HER2) expression is critical for guiding targeted therapy in breast cancer. This study aimed to develop and validate a deep learning (DL)-based decision-making visual biomarker system (DM-VBS) for predicting HER2 status using radiomics and DL features derived from magnetic resonance imaging (MRI) and mammography (MG). Radiomics features were extracted from MRI, and DL features were derived from MG. Four submodels were constructed: Model I (MRI-radiomics) and Model III (mammography-DL) for distinguishing HER2-zero/low from HER2-positive cases, and Model II (MRI-radiomics) and Model IV (mammography-DL) for differentiating HER2-zero from HER2-low/positive cases. These submodels were integrated into an XGBoost model for ternary classification of HER2 status. Radiologists assessed imaging features associated with HER2 expression, and model performance was validated using two independent datasets from The Cancer Imaging Archive. A total of 550 patients were divided into training, internal validation, and external validation cohorts. Models I and III achieved an area under the curve (AUC) of 0.800-0.850 for distinguishing HER2-zero/low from HER2-positive cases, while Models II and IV demonstrated AUC values of 0.793-0.847 for differentiating HER2-zero from HER2-low/positive cases. The DM-VBS achieved average accuracies of 85.42%, 80.4%, and 89.68% for HER2-zero, -low, and -positive patients in the validation cohorts, respectively. Imaging features such as lesion size, number of lesions, enhancement type, and microcalcifications differed significantly across HER2 statuses, except between the HER2-zero and -low groups. DM-VBS can predict HER2 status and assist clinicians in making treatment decisions for breast cancer.
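
A hedged sketch of the stacking step: the four submodels' probability outputs become meta-features for an XGBoost ternary classifier. Submodel outputs and labels are simulated here; in practice they would come from the trained MRI-radiomics and mammography-DL models.

```python
# Hedged sketch: stack submodel probabilities into an XGBoost meta-classifier
# for ternary HER2 status (zero / low / positive). Data is simulated.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(42)
n = 400
# Columns: P(HER2-positive) from Models I and III, P(HER2-zero) from II and IV.
meta_features = rng.random((n, 4))
her2_status = rng.integers(0, 3, n)          # 0 = zero, 1 = low, 2 = positive

meta_clf = XGBClassifier(
    objective="multi:softprob",
    n_estimators=200, max_depth=3, learning_rate=0.1,
)
meta_clf.fit(meta_features[:300], her2_status[:300])
probs = meta_clf.predict_proba(meta_features[300:])   # (100, 3) class probabilities
print(probs[:2])
```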

A Machine Learning Model for Predicting the HER2 Positive Expression of Breast Cancer Based on Clinicopathological and Imaging Features.

Qin X, Yang W, Zhou X, Yang Y, Zhang N

PubMed · Jul 1 2025
To develop a machine learning (ML) model based on clinicopathological and imaging features to predict Human Epidermal Growth Factor Receptor 2 (HER2)-positive expression (HER2-p) in breast cancer (BC), and to compare its performance with that of a logistic regression (LR) model. A total of 2541 consecutive female patients with pathologically confirmed primary breast lesions were enrolled in this study. Based on chronological order, 2034 patients treated between January 2018 and December 2022 were designated as the retrospective development cohort, while 507 patients treated between January 2023 and May 2024 were designated as the prospective validation cohort. Within the development cohort, patients were randomly divided into a training cohort (n=1628) and a test cohort (n=406) in an 8:2 ratio. Pretreatment mammography (MG) and breast MRI data, along with clinicopathological features, were recorded. Extreme Gradient Boosting (XGBoost) in combination with an Artificial Neural Network (ANN) and multivariate LR analyses were employed to extract features associated with HER2 positivity in BC and to develop an ANN model (using XGBoost features) and an LR model, respectively. Predictive value was assessed using a receiver operating characteristic (ROC) curve. Following the application of Recursive Feature Elimination with Cross-Validation (RFE-CV) for feature dimensionality reduction, the XGBoost algorithm identified tumor size, suspicious calcifications, Ki-67 index, spiculation, and minimum apparent diffusion coefficient (minimum ADC) as key feature subsets indicative of HER2-p in BC. The constructed ANN model consistently outperformed the LR model, achieving an area under the curve (AUC) of 0.853 (95% CI: 0.837-0.872) in the train cohort, 0.821 (95% CI: 0.798-0.853) in the test cohort, and 0.809 (95% CI: 0.776-0.841) in the validation cohort. The ANN model, built using the significant feature subsets identified by the XGBoost algorithm with RFE-CV, demonstrates potential in predicting HER2-p in BC.
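
A hedged sketch of the feature-selection-then-ANN pipeline: recursive feature elimination with cross-validation ranks features by XGBoost importances, and a small MLP is trained on the surviving subset. The feature names echo those reported above, but the data, extra candidate features, and hyperparameters are simulated assumptions.

```python
# Hedged sketch: RFE-CV feature selection with XGBoost, then an MLP classifier.
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
feature_names = np.array([
    "tumor_size", "suspicious_calcifications", "ki67_index",
    "spiculation", "minimum_adc", "age", "lesion_count", "enhancement_type",
])
X = rng.random((500, len(feature_names)))
y = (X[:, 0] + X[:, 2] + rng.normal(0, 0.3, 500) > 1.0).astype(int)  # HER2-p label

# Recursively prune features, scoring each subset by cross-validated AUC.
selector = RFECV(XGBClassifier(n_estimators=100, max_depth=3),
                 step=1, cv=5, scoring="roc_auc")
selector.fit(X, y)
print("kept:", feature_names[selector.support_])

# Train the ANN on the selected feature subset only.
ann = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
ann.fit(selector.transform(X), y)
print("train accuracy:", ann.score(selector.transform(X), y))
```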