Latest Papers on Radiology AI

Edge Computing for Physics-Driven AI in Computational MRI: A Feasibility Study

Yaşar Utku Alçalar, Yu Cao, Mehmet Akçakaya

•preprint•May 30 2025

Physics-driven artificial intelligence (PD-AI) reconstruction methods have emerged as the state-of-the-art for accelerating MRI scans, enabling higher spatial and temporal resolutions. However, the high resolution of these scans generates massive data volumes, leading to challenges in transmission, storage, and real-time processing. This is particularly pronounced in functional MRI, where hundreds of volumetric acquisitions further exacerbate these demands. Edge computing with FPGAs presents a promising solution for enabling PD-AI reconstruction near the MRI sensors, reducing data transfer and storage bottlenecks. However, this requires optimization of PD-AI models for hardware efficiency through quantization and bypassing traditional FFT-based approaches, which can be a limitation due to their computational demands. In this work, we propose a novel PD-AI computational MRI approach optimized for FPGA-based edge computing devices, leveraging 8-bit complex data quantization and eliminating redundant FFT/IFFT operations. Our results show that this strategy improves computational efficiency while maintaining reconstruction quality comparable to conventional PD-AI methods, and outperforms standard clinical methods. Our approach presents an opportunity for high-resolution MRI reconstruction on resource-constrained devices, highlighting its potential for real-world deployment.
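
The abstract does not spell out its 8-bit complex quantization scheme, so the following is only a minimal sketch of one common option: symmetric quantization of the real and imaginary parts with a single shared scale. The function names and the sample values are hypothetical and are not taken from the paper.

```python
def quantize_complex_int8(samples):
    """Map complex samples to signed 8-bit (real, imag) pairs with one shared scale.

    Assumption: symmetric per-tensor scaling; the paper may use a different scheme.
    """
    peak = max(max(abs(z.real), abs(z.imag)) for z in samples)
    scale = peak / 127.0 if peak > 0 else 1.0
    codes = [
        (
            max(-127, min(127, round(z.real / scale))),
            max(-127, min(127, round(z.imag / scale))),
        )
        for z in samples
    ]
    return codes, scale


def dequantize_complex_int8(codes, scale):
    """Recover approximate complex values from the 8-bit codes."""
    return [complex(re * scale, im * scale) for re, im in codes]


# Hypothetical k-space samples, for illustration only.
kspace = [complex(0.5, -1.0), complex(-0.25, 0.75), complex(1.0, 0.0)]
codes, scale = quantize_complex_int8(kspace)
restored = dequantize_complex_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(kspace, restored))
```

Each component is rounded to the nearest code, so the per-component error stays within about half of `scale`, which is the trade-off any 8-bit scheme of this kind makes against storage and arithmetic cost.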

MRI Reconstruction Methodology In Silico Academic Lab Reproducibility

Deep Modeling and Optimization of Medical Image Classification

Yihang Wu, Muhammad Owais, Reem Kateb, Ahmad Chaddad

•preprint•May 29 2025

Deep models, such as convolutional neural networks (CNNs) and vision transformers (ViTs), demonstrate remarkable performance in image classification. However, these models require large amounts of data to fine-tune, which is impractical in the medical domain because of data privacy constraints. Furthermore, despite the promising performance of contrastive language-image pre-training (CLIP) in the natural image domain, the potential of CLIP has not been fully investigated in the medical field. To address these challenges, we consider three scenarios: 1) we introduce a novel CLIP variant using four CNNs and eight ViTs as image encoders for the classification of brain cancer and skin cancer, 2) we combine 12 deep models with two federated learning (FL) techniques to protect data privacy, and 3) we use traditional machine learning (ML) methods to improve the generalization ability of those deep models on unseen domain data. The experimental results indicate that maxvit shows the highest averaged (AVG) test metrics (AVG = 87.03%) on the HAM10000 dataset with multimodal learning, while convnext_l achieves a test F1-score of 83.98% compared with 81.33% for swin_b in the FL setting. Furthermore, the use of a support vector machine (SVM) can improve the AVG test metrics by about 2% for the Swin Transformer series on ISIC2018. Our code is available at https://github.com/AIPMLab/SkinCancerSimulation.

Mixed Modality Classification Methodology In Silico Academic Lab Open Code

Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning

Jinquan Guan, Qi Chen, Lizhou Liang, Yuhang Liu, Vu Minh Hieu Phan, Minh-Son To, Jian Chen, Yutong Xie

•preprint•May 29 2025

Artificial intelligence (AI)-based chest X-ray (CXR) interpretation assistants have demonstrated significant progress and are increasingly being applied in clinical settings. However, contemporary medical AI models often adhere to a simplistic input-to-output paradigm, directly processing an image and an instruction to generate a result, where the instructions may be integral to the model's architecture. This approach overlooks the modeling of the inherent diagnostic reasoning in chest X-ray interpretation. Such reasoning is typically sequential, where each interpretive stage considers the images, the current task, and the contextual information from previous stages. This oversight leads to several shortcomings, including misalignment with clinical scenarios, contextless reasoning, and untraceable errors. To fill this gap, we construct CXRTrek, a new multi-stage visual question answering (VQA) dataset for CXR interpretation. The dataset is designed to explicitly simulate the diagnostic reasoning process employed by radiologists in real-world clinical settings for the first time. CXRTrek covers 8 sequential diagnostic stages, comprising 428,966 samples and over 11 million question-answer (Q&A) pairs, with an average of 26.29 Q&A pairs per sample. Building on the CXRTrek dataset, we propose a new vision-language large model (VLLM), CXRTrekNet, specifically designed to incorporate the clinical reasoning flow into the VLLM framework. CXRTrekNet effectively models the dependencies between diagnostic stages and captures reasoning patterns within the radiological context. Trained on our dataset, the model consistently outperforms existing medical VLLMs on the CXRTrek benchmarks and demonstrates superior generalization across multiple tasks on five diverse external datasets. The dataset and model can be found in our repository (https://github.com/guanjinquan/CXRTrek).

X-Ray LLM Radiology Report Chest Dataset Release In Silico Academic Lab Open Dataset Open Code Benchmark SOTA GenAI

Multimodal medical image-to-image translation via variational autoencoder latent space mapping.

Liang Z, Cheng M, Ma J, Hu Y, Li S, Tian X

•papers•May 29 2025

Medical image translation has become an essential tool in modern radiotherapy, providing complementary information for target delineation and dose calculation. However, current approaches are constrained by their modality-specific nature, requiring separate model training for each pair of imaging modalities. This limitation hinders the efficient deployment of comprehensive multimodal solutions in clinical practice. The purpose of this work is to develop a unified image translation method using variational autoencoder (VAE) latent space mapping, which enables flexible conversion between different medical imaging modalities to meet clinical demands. We propose a three-stage approach to construct a unified image translation model. Initially, a VAE is trained to learn a shared latent space for various medical images. A stacked bidirectional transformer is subsequently utilized to learn the mapping between different modalities within the latent space under the guidance of the image modality. Finally, the VAE decoder is fine-tuned to improve image quality. Our internal dataset collected paired imaging data from 87 head and neck cases, with each case containing cone beam computed tomography (CBCT), computed tomography (CT), MR T1c, and MR T2w images. The effectiveness of this strategy is quantitatively evaluated on our internal dataset and a public dataset by the mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). Additionally, the dosimetry characteristics of the synthetic CT images are evaluated, and subjective quality assessments of the synthetic MR images are conducted to determine their clinical value. The VAE with the Kullback-Leibler (KL)-16 image tokenizer demonstrates superior image reconstruction ability, achieving a Fréchet inception distance (FID) of 4.84, a PSNR of 32.80 dB, and an SSIM of 92.33%.
In synthetic CT tasks, the model shows greater accuracy in intramodality translations than in cross-modality translations, as evidenced by an MAE of 21.60 ± 8.80 Hounsfield units (HU) in the CBCT-to-CT task and 45.23 ± 13.21 HU and 47.55 ± 13.88 HU in the MR T1c-to-CT and T2w-to-CT tasks, respectively. For the cross-contrast MR translation tasks, the results are very close, with mean PSNR and SSIM values of 26.33 ± 1.36 dB and 85.21% ± 2.21%, respectively, for the T1c-to-T2w translation and 26.03 ± 1.67 dB and 85.73% ± 2.66%, respectively, for the T2w-to-T1c translation. Dosimetric results indicate that all the gamma pass rates for synthetic CTs are higher than 99% for photon intensity-modulated radiation therapy (IMRT) planning. However, the subjective quality assessment scores for synthetic MR images are lower than those for real MR images. The proposed three-stage approach successfully develops a unified image translation model that can effectively handle a wide range of medical image translation tasks. This flexibility and effectiveness make it a valuable tool for clinical applications.
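
For readers unfamiliar with the MAE and PSNR figures quoted above, the sketch below shows how the two metrics are conventionally computed on flattened intensity lists. The voxel values and the 2000 HU data range are made-up illustrations, and the authors' exact evaluation settings (masking, intensity range) are not given in the abstract.

```python
import math


def mean_absolute_error(reference, estimate):
    """Average absolute voxel difference, in the same units as the inputs."""
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def psnr_db(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Hypothetical CT voxel values in Hounsfield units.
reference_hu = [0.0, 40.0, -100.0, 1000.0]
synthetic_hu = [10.0, 30.0, -80.0, 1000.0]

mae = mean_absolute_error(reference_hu, synthetic_hu)
psnr = psnr_db(reference_hu, synthetic_hu, data_range=2000.0)
```

PSNR depends on the chosen `data_range`, so values are only comparable between studies that normalize intensities the same way.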

Mixed Modality Image Synthesis Neurological Methodology In Silico Academic Lab GenAI

Deep learning reconstruction for improved image quality of ultra-high-resolution brain CT angiography: application in moyamoya disease.

Ma Y, Nakajima S, Fushimi Y, Funaki T, Otani S, Takiya M, Matsuda A, Kozawa S, Fukushima Y, Okuchi S, Sakata A, Yamamoto T, Sakamoto R, Chihara H, Mineharu Y, Arakawa Y, Nakamoto Y

•papers•May 29 2025

This study investigated vessel delineation and image quality of ultra-high-resolution (UHR) CT angiography (CTA) reconstructed using deep learning reconstruction (DLR) optimised for brain CTA (DLR-brain) in moyamoya disease (MMD), compared with DLR optimised for body CT (DLR-body) and hybrid iterative reconstruction (Hybrid-IR). This retrospective study included 50 patients with suspected or diagnosed MMD who underwent UHR brain CTA. All images were reconstructed using DLR-brain, DLR-body, and Hybrid-IR. Quantitative analysis focussed on moyamoya perforator vessels in the basal ganglia and periventricular anastomosis. For these small vessels, edge sharpness, peak CT number, vessel contrast, full width at half maximum (FWHM), and image noise were measured and compared. Qualitative analysis was performed by visual assessment to compare vessel delineation and image quality. DLR-brain significantly improved edge sharpness, peak CT number, vessel contrast, and FWHM, and significantly reduced image noise compared with DLR-body and Hybrid-IR (P < 0.05). DLR-brain significantly outperformed the other algorithms in the visual assessment (P < 0.001). DLR-brain provided superior visualisation of small intracranial vessels compared with DLR-body and Hybrid-IR in UHR brain CTA.
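
As an illustration of the FWHM measurement mentioned above, here is a minimal sketch that estimates the full width at half maximum of a 1-D intensity profile drawn across a vessel, using linear interpolation between samples. It assumes the background has already been subtracted so that the baseline is zero. The function name and the sample profile are hypothetical, and the study's actual measurement procedure is not described in the abstract.

```python
def fwhm(profile, spacing=1.0):
    """Full width at half maximum of a single-peaked profile with zero baseline."""
    peak_value = max(profile)
    peak_index = profile.index(peak_value)
    half = peak_value / 2.0

    def crossing(indices):
        # Walk away from the peak until a sample drops below half maximum,
        # then interpolate linearly between the last two samples.
        previous = peak_index
        for i in indices:
            if profile[i] < half:
                fraction = (profile[previous] - half) / (profile[previous] - profile[i])
                return previous + fraction * (i - previous)
            previous = i
        raise ValueError("profile never drops below half maximum")

    left = crossing(range(peak_index - 1, -1, -1))
    right = crossing(range(peak_index + 1, len(profile)))
    return (right - left) * spacing


# Hypothetical background-subtracted profile sampled every 0.25 mm.
vessel_profile = [0.0, 1.0, 4.0, 1.0, 0.0]
width_mm = fwhm(vessel_profile, spacing=0.25)
```

A smaller FWHM for the same vessel generally indicates less blurring, which is why it is reported alongside edge sharpness.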

CT Reconstruction Neurological Retrospective Clinical In Silico Academic Lab

ADC-MambaNet: A Lightweight U-Shaped Architecture with Mamba and Multi-Dimensional Priority Attention for Medical Image Segmentation.

Nguyen TN, Ho QH, Nguyen VQ, Pham VT, Tran TT

•papers•May 29 2025

Medical image segmentation is becoming an increasingly crucial step in assisting with disease detection and diagnosis. However, medical images often exhibit complex structures and textures, resulting in the need for highly complex methods. Particularly, when Deep Learning methods are utilized, they often require large-scale pretraining, leading to significant memory demands and increased computational costs. The well-known Convolutional Neural Networks (CNNs) have become the backbone of medical image segmentation tasks thanks to their effective feature extraction abilities. However, they often struggle to capture global context due to the limited sizes of their kernels. To address this, various Transformer-based models have been introduced to learn long-range dependencies through self-attention mechanisms. However, these architectures typically incur relatively high computational complexity. Methods: To address the aforementioned challenges, we propose a lightweight and computationally efficient model named ADC-MambaNet, which combines conventional Depthwise Convolutional layers with the Mamba algorithm, which can address the computational complexity of Transformers. In the proposed model, a new feature extractor, the Harmonious Mamba-Convolution (HMC) block, and the Multi-Dimensional Priority Attention (MDPA) block have been designed. These blocks enhance the feature extraction process, thereby improving the overall performance of the model. In particular, the mechanisms enable the model to effectively capture local and global patterns from the feature maps while keeping the computational costs low. A novel loss function called the Balanced Normalized Cross Entropy is also introduced, bringing promising performance compared to other losses.
Evaluations on five public medical image datasets: ISIC 2018 Lesion Segmentation, PH2, Data Science Bowl 2018, GlaS, and Lung X-ray demonstrate that ADC-MambaNet achieves higher evaluation scores while maintaining compact parameters and low computational complexity. Conclusion: ADC-MambaNet offers a promising solution for accurate and efficient medical image segmentation, especially in resource-limited or edge-computing environments. Implementation code will be publicly accessible at: https://github.com/nqnguyen812/mambaseg-model.

Mixed Modality Segmentation Methodology In Silico Academic Lab Open Code

Artificial intelligence-enabled opportunistic identification of immune checkpoint inhibitor-related adverse events using [¹⁸F]FDG PET/CT.

Spielvogel CP, Lazarevic A, Zisser L, Haberl D, Eseroglou C, Beer L, Hacker M, Calabretta R

•papers•May 29 2025

PET Detection Whole Body Retrospective Clinical In Silico Academic Lab

CT-denoimer: efficient contextual transformer network for low-dose CT denoising.

Zhang Y, Xu F, Zhang R, Guo Y, Wang H, Wei B, Ma F, Meng J, Liu J, Lu H, Chen Y

•papers•May 29 2025

Low-dose computed tomography (LDCT) effectively reduces radiation exposure to patients, but introduces severe noise artifacts that affect diagnostic accuracy. Recently, Transformer-based network architectures have been widely applied to LDCT image denoising, generally achieving superior results compared to traditional convolutional methods. However, these methods are often hindered by high computational costs and difficulty capturing complex local contextual features, which negatively impact denoising performance. In this work, we propose CT-Denoimer, an efficient CT Denoising Transformer network that captures both global correlations and intricate, spatially varying local contextual details in CT images, enabling the generation of high-quality images. The core of our framework is a Transformer module that consists of two key components: the Multi-Dconv head Transposed Attention (MDTA) and the Mixed Contextual Feed-forward Network (MCFN). The MDTA block captures global correlations in the image with linear computational complexity, while the MCFN block manages multi-scale local contextual information, both static and dynamic, through a series of Enhanced Contextual Transformer (eCoT) modules. In addition, we incorporate Operation-Wise Attention Layers (OWALs) to enable collaborative refinement in the proposed CT-Denoimer, enhancing its ability to handle complex and varying noise patterns in LDCT images more effectively. Extensive experimental validation on both the AAPM-Mayo public dataset and a real-world clinical dataset demonstrated the state-of-the-art performance of the proposed CT-Denoimer. It achieved a peak signal-to-noise ratio (PSNR) of 33.681 dB, a structural similarity index measure (SSIM) of 0.921, an information fidelity criterion (IFC) of 2.857 and a visual information fidelity (VIF) of 0.349. Subjective assessment by radiologists gave an average score of 4.39, confirming its clinical applicability and clear advantages over existing methods.
This study presents an innovative CT denoising Transformer network that sets a new benchmark in LDCT image denoising, excelling in both noise reduction and fine structure preservation.

CT Reconstruction Methodology In Silico Academic Lab Benchmark SOTA

Free-running isotropic three-dimensional cine magnetic resonance imaging with deep learning image reconstruction.

Erdem S, Erdem O, Stebbings S, Greil G, Hussain T, Zou Q

•papers•May 29 2025

Cardiovascular magnetic resonance (CMR) cine imaging is the gold standard for assessing ventricular volumes and function. It typically requires two-dimensional (2D) bSSFP sequences and multiple breath-holds, which can be challenging for patients with limited breath-holding capacity. Three-dimensional (3D) cardiovascular magnetic resonance angiography (MRA) usually suffers from lengthy acquisition. Free-running 3D cine imaging with deep learning (DL) reconstruction offers a potential solution by acquiring both cine and angiography simultaneously. The purpose of this study was to evaluate the efficiency and accuracy of a ferumoxytol-enhanced 3D cine imaging MR sequence combined with DL reconstruction and Heart-NAV technology in patients with congenital heart disease. This Institutional Review Board-approved prospective study compared (i) functional and volumetric measurements between 3D and 2D cine images; (ii) contrast-to-noise ratio (CNR) between DL-reconstructed and compressed sensing (CS)-reconstructed 3D cine images; and (iii) cross-sectional area (CSA) measurements between DL-reconstructed 3D cine images and the clinical 3D MRA images acquired using the bSSFP sequence. Paired t-tests were used to compare group measurements, and Bland-Altman analysis assessed agreement in CSA and volumetric data. Sixteen patients (seven males; median age 6 years) were recruited. 3D cine imaging showed slightly larger right ventricular (RV) volumes and lower RV ejection fraction (EF) compared to 2D cine, with a significant difference only in RV end-systolic volume (P = 0.02). Left ventricular (LV) volumes and EF were slightly higher, and LV mass was lower, without significant differences (P ≥ 0.05). DL-reconstructed 3D cine images showed significantly higher CNR in all pulmonary veins than CS-reconstructed 3D cine images (all P < 0.05).
Highly accelerated free-running 3D cine imaging with DL reconstruction shortens acquisition times and provides comparable volumetric measurements to 2D cine, and comparable CSA to clinical 3D MRA.
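
The Bland-Altman agreement analysis named in the abstract can be sketched as follows: the bias is the mean of the paired differences, and the 95% limits of agreement are the bias plus or minus 1.96 sample standard deviations. The volumes below are invented for illustration and are not data from the study.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(differences)
    spread = 1.96 * statistics.stdev(differences)  # sample standard deviation
    return bias, bias - spread, bias + spread


# Hypothetical ventricular volumes in mL from two acquisition methods.
volumes_3d = [52.0, 61.0, 48.0, 70.0]
volumes_2d = [50.0, 60.0, 49.0, 66.0]

bias, lower, upper = bland_altman(volumes_3d, volumes_2d)
```

Two methods are usually judged interchangeable when the limits of agreement are narrower than a clinically acceptable difference, which is a judgment the statistics alone cannot make.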

MRI Reconstruction Cardiac Prospective Clinical Pilot Academic Lab

Estimating Head Motion in Structural MRI Using a Deep Neural Network Trained on Synthetic Artifacts

Charles Bricout, Samira Ebrahimi Kahou, Sylvain Bouix

•preprint•May 29 2025

Motion-related artifacts are inevitable in Magnetic Resonance Imaging (MRI) and can bias automated neuroanatomical metrics such as cortical thickness. Manual review cannot objectively quantify motion in anatomical scans, and existing automated approaches often require specialized hardware or rely on unbalanced noisy training data. Here, we train a 3D convolutional neural network to estimate motion severity using only synthetically corrupted volumes. We validate our method with one held-out site from our training cohort and with 14 fully independent datasets, including one with manual ratings, achieving a representative $R^2 = 0.65$ versus manual labels and significant thickness-motion correlations in 12/15 datasets. Furthermore, our predicted motion correlates with subject age in line with prior studies. Our approach generalizes across scanner brands and protocols, enabling objective, scalable motion assessment in structural MRI studies without prospective motion correction.
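
The reported $R^2$ can be read as the share of variance in the manual labels explained by the predictions. The sketch below uses the standard coefficient-of-determination formula, which is an assumption here because the abstract does not say whether the authors used this definition or a squared correlation. The ratings are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical manual motion ratings and model predictions.
manual_ratings = [1.0, 2.0, 3.0, 4.0]
predicted_motion = [1.5, 2.0, 2.5, 4.0]

score = r_squared(manual_ratings, predicted_motion)
```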

MRI Classification Neurological Methodology In Silico Academic Lab
