Page 47 of 99990 results

Pyramid Hierarchical Masked Diffusion Model for Imaging Synthesis

Xiaojiao Xiao, Qinmin Vivian Hu, Guanghui Wang

arXiv preprint, Jul 22 2025
Medical image synthesis plays a crucial role in clinical workflows, addressing the common issue of missing imaging modalities due to factors such as extended scan times, scan corruption, artifacts, patient motion, and intolerance to contrast agents. The paper presents a novel image synthesis network, the Pyramid Hierarchical Masked Diffusion Model (PHMDiff), which employs a multi-scale hierarchical approach for more detailed control over synthesizing high-quality images across different resolutions and layers. Specifically, the model uses random multi-scale, high-proportion masks to speed up diffusion model training while balancing detail fidelity and overall structure. The Transformer-based diffusion process incorporates cross-granularity regularization, modeling mutual-information consistency across each granularity's latent space and thereby enhancing pixel-level perceptual accuracy. Comprehensive experiments on two challenging datasets demonstrate that PHMDiff achieves superior performance in both Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM), highlighting its ability to produce high-quality synthesized images with excellent structural integrity. Ablation studies further confirm the contribution of each component. Furthermore, PHMDiff, a multi-scale image synthesis framework operating across and within medical imaging modalities, shows significant advantages over other methods. The source code is available at https://github.com/xiaojiao929/PHMDiff
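The multi-scale, high-proportion masking idea can be illustrated with a short sketch. The patch sizes, mask ratio, and per-step random scale choice below are illustrative assumptions, not PHMDiff's actual configuration:

```python
import numpy as np

def multiscale_random_mask(h, w, scales=(4, 8, 16), mask_ratio=0.75, seed=0):
    """Build a binary mask by hiding a high proportion of patches at a
    randomly chosen patch scale (a sketch of multi-scale random masking)."""
    rng = np.random.default_rng(seed)
    p = int(rng.choice(scales))          # patch size drawn for this step
    gh, gw = h // p, w // p              # patch-grid dimensions
    n = gh * gw
    keep = max(1, int(round(n * (1 - mask_ratio))))
    idx = rng.permutation(n)[:keep]      # indices of visible patches
    grid = np.zeros(n, dtype=np.float32)
    grid[idx] = 1.0                      # 1 = visible, 0 = masked
    # upsample the patch grid back to pixel resolution
    return np.kron(grid.reshape(gh, gw), np.ones((p, p), dtype=np.float32))

mask = multiscale_random_mask(64, 64)
print(mask.shape, mask.mean())  # (64, 64), ~25% of pixels left visible
```

Training on only the visible fraction is what speeds up diffusion training; the hierarchy comes from sampling a different patch scale at each step.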

Supervised versus unsupervised GAN for pseudo-CT synthesis in brain MR-guided radiotherapy.

Kermani MZ, Tavakoli MB, Khorasani A, Abedi I, Sadeghi V, Amouheidari A

PubMed paper, Jul 22 2025
Radiotherapy is a crucial treatment for malignant brain tumors. To address the limitations of CT-based treatment planning, recent research has explored MR-only radiotherapy, requiring precise MR-to-CT synthesis. This study compares two deep learning approaches, supervised (Pix2Pix) and unsupervised (CycleGAN), for generating pseudo-CT (pCT) images from T1- and T2-weighted MR sequences. A total of 3270 paired T1- and T2-weighted MRI images were collected and registered with corresponding CT images. After preprocessing, a supervised pCT generative model was trained using the Pix2Pix framework, and an unsupervised generative network (CycleGAN) was also trained to enable a comparative assessment of pCT quality relative to the Pix2Pix model. To assess differences between pCT and reference CT images, three key metrics (SSIM, PSNR, and MAE) were used. Additionally, a dosimetric evaluation was performed on selected cases to assess clinical relevance. The average SSIM, PSNR, and MAE for Pix2Pix on T1 images were 0.964 ± 0.03, 32.812 ± 5.21, and 79.681 ± 9.52 HU, respectively. Statistical analysis revealed that Pix2Pix significantly outperformed CycleGAN in generating high-fidelity pCT images (p < 0.05). There was no notable difference in the effectiveness of T1-weighted versus T2-weighted MR images for generating pCT (p > 0.05). Dosimetric evaluation confirmed comparable dose distributions between pCT and reference CT, supporting clinical feasibility. Both supervised and unsupervised methods demonstrated the capability to generate accurate pCT images from conventional T1- and T2-weighted MR sequences. While supervised methods like Pix2Pix achieve higher accuracy, unsupervised approaches such as CycleGAN offer greater flexibility by eliminating the need for paired training data, making them suitable for applications where paired data are unavailable.
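The PSNR and MAE figures reported above are standard image-quality metrics; a minimal sketch of both on toy HU arrays, independent of the study's own implementation (the SSIM computation is omitted for brevity):

```python
import numpy as np

def psnr(ref, test, data_range=None):
    """Peak Signal-to-Noise Ratio in dB between a reference CT and a pseudo-CT."""
    ref, test = np.asarray(ref, float), np.asarray(test, float)
    if data_range is None:
        data_range = ref.max() - ref.min()
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(data_range ** 2 / mse)

def mae(ref, test):
    """Mean Absolute Error, reported in HU for CT images."""
    return float(np.mean(np.abs(np.asarray(ref, float) - np.asarray(test, float))))

ct  = np.array([[0.0, 1000.0], [2000.0, 3000.0]])   # toy HU values
pct = ct + 30.0                                      # pseudo-CT uniformly off by 30 HU
print(mae(ct, pct))             # 30.0
print(round(psnr(ct, pct), 2))  # 40.0
```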

Regularized Low-Rank Adaptation for Few-Shot Organ Segmentation

Ghassen Baklouti, Julio Silva-Rodríguez, Jose Dolz, Houda Bahig, Ismail Ben Ayed

arXiv preprint, Jul 21 2025
Parameter-efficient fine-tuning (PEFT) of pre-trained foundation models is increasingly attracting interest in medical imaging due to its effectiveness and computational efficiency. Among these methods, Low-Rank Adaptation (LoRA) is a notable approach based on the assumption that the adaptation inherently occurs in a low-dimensional subspace. While it has shown good performance, its implementation requires a fixed and unalterable rank, which might be challenging to select given the unique complexities and requirements of each medical imaging downstream task. Inspired by advancements in natural image processing, we introduce a novel approach for medical image segmentation that dynamically adjusts the intrinsic rank during adaptation. Viewing the low-rank representation of the trainable weight matrices as a singular value decomposition, we introduce an l_1 sparsity regularizer to the loss function and tackle it with a proximal optimizer. The regularizer can be viewed as a penalty on the decomposition rank; hence, minimizing it makes it possible to find task-adapted ranks automatically. Our method is evaluated in a realistic few-shot fine-tuning setting, where we compare it first to standard LoRA and then to several other PEFT methods across two distinct tasks: base organs and novel organs. Our extensive experiments demonstrate the significant performance improvements driven by our method, highlighting its efficiency and robustness against suboptimal rank initialization. Our code is publicly available: https://github.com/ghassenbaklouti/ARENA
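The proximal step for an l_1 penalty on singular values is soft-thresholding of the spectrum, which zeroes out weak ranks and thereby adapts the rank automatically. A minimal sketch of that core operation; the threshold `lam` and matrix sizes are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def prox_l1_singular_values(delta_w, lam):
    """Proximal operator of lam * ||sigma||_1 applied to a weight update:
    soft-threshold the singular values, pruning ranks below the threshold."""
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)          # soft-thresholding
    effective_rank = int(np.count_nonzero(s_shrunk))
    return u @ np.diag(s_shrunk) @ vt, effective_rank

rng = np.random.default_rng(0)
# rank-8 initialization, as in LoRA: delta_w = B @ A
b, a = rng.normal(size=(32, 8)), rng.normal(size=(8, 32))
dw, r = prox_l1_singular_values(b @ a, lam=5.0)
print(r)  # ranks whose singular value fell below lam are pruned away
```

Starting from an over-provisioned rank and letting the proximal step shrink it is what gives robustness to suboptimal rank initialization.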

Imaging-aided diagnosis and treatment based on artificial intelligence for pulmonary nodules: A review.

Gao H, Li J, Wu Y, Tang Z, He X, Zhao F, Chen Y, He X

PubMed paper, Jul 21 2025
Pulmonary nodules are critical indicators for the early detection of lung cancer; however, their diagnosis and management pose significant challenges due to the variability in nodule characteristics, reader fatigue, and limited clinical expertise, often leading to diagnostic errors. The rapid advancement of artificial intelligence (AI) presents promising solutions to address these issues. This review compares traditional rule-based methods, handcrafted feature-based machine learning, radiomics, deep learning, and hybrid models incorporating Transformers or attention mechanisms. It systematically compares their methodologies, clinical applications (diagnosis, treatment, prognosis), and dataset usage to evaluate performance, applicability, and limitations in pulmonary nodule management. AI advances have significantly improved pulmonary nodule management, with transformer-based models achieving leading accuracy in segmentation, classification, and subtyping. The fusion of multimodal imaging (CT, PET, and MRI) further enhances diagnostic precision. Additionally, AI aids treatment planning and prognosis prediction by integrating radiomics with clinical data. Despite these advances, challenges remain, including domain shift, high computational demands, limited interpretability, and variability across multi-center datasets. AI has transformative potential for the diagnosis and treatment of pulmonary nodules, and significant progress has already been made in improving the accuracy of lung cancer treatment and patient prognosis.

LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning.

Che H, Jin H, Gu Z, Lin Y, Jin C, Chen H

PubMed paper, Jul 21 2025
Large Language Models (LLMs) have demonstrated significant potential in Medical Report Generation (MRG), yet their development requires large amounts of medical image-report pairs, which are commonly scattered across multiple centers. Centralizing these data is exceptionally challenging due to privacy regulations, thereby impeding model development and broader adoption of LLM-driven MRG models. To address this challenge, we present FedMRG, the first framework that leverages Federated Learning (FL) to enable privacy-preserving, multi-center development of LLM-driven MRG models, specifically designed to overcome the critical challenge of communication-efficient LLM training under multi-modal data heterogeneity. To start with, our framework tackles the fundamental challenge of communication overhead in federated LLM tuning by employing low-rank factorization to efficiently decompose parameter updates, significantly reducing gradient transmission costs and making LLM-driven MRG feasible in bandwidth-constrained FL settings. Furthermore, we observed the dual heterogeneity in MRG under the FL scenario: varying image characteristics across medical centers, as well as diverse reporting styles and terminology preferences. To address the data heterogeneity, we further enhance FedMRG with (1) client-aware contrastive learning in the MRG encoder, coupled with diagnosis-driven prompts, which capture both globally generalizable and locally distinctive features while maintaining diagnostic accuracy; and (2) a dual-adapter mutual boosting mechanism in the MRG decoder that harmonizes generic and specialized adapters to address variations in reporting styles and terminology. Through extensive evaluation of our established FL-MRG benchmark, we demonstrate the generalizability and adaptability of FedMRG, underscoring its potential in harnessing multi-center data and generating clinically accurate reports while maintaining communication efficiency.
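The communication saving from low-rank factorization of parameter updates can be sketched with a truncated SVD: the client transmits two thin factors instead of the full update matrix. The rank, dimensions, and use of SVD below are illustrative assumptions, not FedMRG's actual decomposition:

```python
import numpy as np

def compress_update(delta, rank):
    """Truncated-SVD factorization of a weight update into two thin factors,
    so only a @ b's factors (not the full matrix) cross the network."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    a = u[:, :rank] * s[:rank]          # (d, r): left factor scaled by spectrum
    b = vt[:rank]                       # (r, d): right factor
    return a, b

d, r = 512, 8
delta = np.random.default_rng(1).normal(size=(d, d))
a, b = compress_update(delta, r)
full_cost, low_cost = delta.size, a.size + b.size
print(low_cost / full_cost)  # 2*r/d = 0.03125, i.e. ~32x fewer values sent
```

The server reconstructs the (approximate) update as `a @ b`; the transmitted payload scales as 2rd instead of d^2.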

SegDT: A Diffusion Transformer-Based Segmentation Model for Medical Imaging

Salah Eddine Bekhouche, Gaby Maroun, Fadi Dornaika, Abdenour Hadid

arXiv preprint, Jul 21 2025
Medical image segmentation is crucial for many healthcare tasks, including disease diagnosis and treatment planning. One key area is the segmentation of skin lesions, which is vital for diagnosing skin cancer and monitoring patients. In this context, this paper introduces SegDT, a new segmentation model based on diffusion transformer (DiT). SegDT is designed to work on low-cost hardware and incorporates Rectified Flow, which improves the generation quality at reduced inference steps and maintains the flexibility of standard diffusion models. Our method is evaluated on three benchmarking datasets and compared against several existing works, achieving state-of-the-art results while maintaining fast inference speeds. This makes the proposed model appealing for real-world medical applications. This work advances the performance and capabilities of deep learning models in medical image analysis, enabling faster, more accurate diagnostic tools for healthcare professionals. The code is made publicly available at https://github.com/Bekhouche/SegDT
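Rectified Flow trains near-straight probability-flow trajectories, which is why few inference steps suffice. A toy sketch of the few-step Euler sampler with a hand-built velocity field whose flow is exactly straight (this is the generic rectified-flow sampling loop, not SegDT's learned model):

```python
import numpy as np

def rectified_flow_sample(velocity, x0, steps=4):
    """Euler integration of the rectified-flow ODE dx/dt = v(x, t)
    from noise (t = 0) to data (t = 1)."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x

# Toy velocity field of the straight path x_t = (1 - t) * x0 + t * target:
# along that path v is constant, so even a 4-step Euler solve is exact.
target = np.array([2.0, -1.0])
v = lambda x, t: (target - x) / max(1.0 - t, 1e-8)

x1 = rectified_flow_sample(v, np.array([0.0, 0.0]), steps=4)
print(x1)  # [ 2. -1.]
```

For curved trajectories (standard diffusion) the same Euler solve accumulates error, which is the gap Rectified Flow closes at low step counts.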

Cascaded Multimodal Deep Learning in the Differential Diagnosis, Progression Prediction, and Staging of Alzheimer's and Frontotemporal Dementia

Guarnier, G., Reinelt, J., Molloy, E. N., Mihai, P. G., Einaliyan, P., Valk, S., Modestino, A., Ugolini, M., Mueller, K., Wu, Q., Babayan, A., Castellaro, M., Villringer, A., Scherf, N., Thierbach, K., Schroeter, M. L., Alzheimer's Disease Neuroimaging Initiative, Australian Imaging Biomarkers and Lifestyle flagship study of ageing, Frontotemporal Lobar Degeneration Neuroimaging Initiative

medRxiv preprint, Jul 21 2025
Dementia is a complex condition whose multifaceted nature poses significant challenges in the diagnosis, prognosis, and treatment of patients. Despite the availability of large open-source data fueling a wealth of promising research, effective translation of preclinical findings to clinical practice remains difficult. This barrier is largely due to the complexity of unstructured and disparate preclinical and clinical data, which traditional analytical methods struggle to handle. Novel analytical techniques involving Deep Learning (DL), however, are gaining significant traction in this regard. Here, we have investigated the potential of a cascaded multimodal DL-based system (TelDem), assessing its ability to integrate and analyze a large, heterogeneous dataset (n=7,159 patients) applied to three clinically relevant use cases. Using a Cascaded Multi-Modal Mixing Transformer (CMT), we assessed TelDem's validity and model explainability (via a Cross-Modal Fusion Norm, CMFN) in (i) differential diagnosis between healthy individuals, AD, and three sub-types of frontotemporal lobar degeneration, (ii) disease staging from healthy cognition to mild cognitive impairment (MCI) and AD, and (iii) predicting progression from MCI to AD. Our findings show that the CMT enhances diagnostic and prognostic accuracy when incorporating multimodal data compared to unimodal modeling, and that cerebrospinal fluid (CSF) biomarkers play a key role in accurate model decision making. These results reinforce the power of DL technology in tapping deeper into already existing data, thereby accelerating preclinical dementia research by utilizing clinically relevant information to disentangle complex dementia pathophysiology.

PXseg: automatic tooth segmentation, numbering and abnormal morphology detection based on CBCT and panoramic radiographs.

Wang R, Cheng F, Dai G, Zhang J, Fan C, Yu J, Li J, Jiang F

PubMed paper, Jul 21 2025
PXseg, a novel approach for tooth segmentation, numbering and abnormal morphology detection in panoramic X-ray (PX), was designed and improved by optimizing annotation and applying pre-training. Synthetic panoramic images (ctPXs), generated from multicenter cone beam computed tomography (CBCT) scans with accurate 3D labels, were used for pre-training, while conventional PXs (cPXs) with 2D labels were used for training. Visual and statistical analyses were conducted on the internal dataset to assess the segmentation and numbering performance of PXseg, compared with the same model without ctPX pre-training, while the accuracy of PXseg in detecting abnormal teeth was evaluated on an external dataset consisting of cPXs with complex dental diseases. In addition, diagnostic testing was performed to contrast diagnostic efficiency with and without PXseg's assistance. The DSC and F1-score of PXseg in tooth segmentation reached 0.882 and 0.902, increases of 4.6% and 4.0% over the model without pre-training. For tooth numbering, the F1-score of PXseg reached 0.943, an increase of 2.2%. Building on the improved segmentation, the accuracy of abnormal tooth morphology detection exceeded 0.957 and was 4.3% higher. A website was constructed to assist in PX interpretation, and diagnostic efficiency was greatly enhanced with the assistance of PXseg. The application of accurate labels in ctPX increased the pre-training weight of PXseg and improved the training effect, yielding gains in tooth segmentation, numbering and abnormal morphology detection. The rapid and accurate results provided by PXseg streamline the workflow of PX diagnosis, giving it significant potential for clinical application.
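The DSC reported above is the standard Dice overlap between binary masks; a minimal sketch on toy arrays, independent of PXseg's implementation:

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient (DSC) between two binary masks:
    2 * |intersection| / (|pred| + |gt|)."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * inter / denom

gt   = np.array([[1, 1, 0], [0, 1, 0]])   # toy ground-truth tooth mask
pred = np.array([[1, 0, 0], [0, 1, 1]])   # toy predicted mask
print(dice(pred, gt))  # 2*2 / (3+3) = 0.666...
```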

The safety and accuracy of radiation-free spinal navigation using a short, scoliosis-specific BoneMRI-protocol, compared to CT.

Lafranca PPG, Rommelspacher Y, Walter SG, Muijs SPJ, van der Velden TA, Shcherbakova YM, Castelein RM, Ito K, Seevinck PR, Schlösser TPC

PubMed paper, Jul 21 2025
Spinal navigation systems require pre- and/or intra-operative 3-D imaging, which exposes young patients to harmful radiation. We assessed a scoliosis-specific MRI protocol that provides T2-weighted MRI and AI-generated synthetic CT (sCT) scans through deep learning algorithms. This study aims to compare MRI-based synthetic-CT spinal navigation to CT for safety and accuracy of pedicle screw planning and placement at thoracic and lumbar levels. The spines of 5 cadavers were scanned with thin-slice CT and the scoliosis-specific MRI protocol (to create sCT). Preoperatively, screw trajectories were planned on both CT and sCT. Subsequently, four spine surgeons performed surface-matched, navigated placement of 2.5 mm k-wires in all pedicles from T3 to L5. Randomization for CT/sCT, surgeon and side was performed (1:1 ratio). On postoperative CT scans, virtual screws were simulated over the k-wires. Maximum angulation, distance between planned and postoperative screw positions, and medial breach rate (Gertzbein-Robbins classification) were assessed. In total, 140 k-wires were inserted; 3 were excluded. There were no pedicle breaches > 2 mm. Of the sCT-guided screws, 59 were grade A and 10 grade B. Of the CT-guided screws, 47 were grade A and 21 grade B (p = 0.022). The average distance (± SD) between intraoperative and postoperative screw positions was 2.3 ± 1.5 mm for sCT-guided screws and 2.4 ± 1.8 mm for CT (p = 0.78); the average maximum angulation (± SD) was 3.8 ± 2.5° for sCT and 3.9 ± 2.9° for CT (p = 0.75). MRI-based, AI-generated synthetic-CT spinal navigation allows safe and accurate planning and placement of thoracic and lumbar pedicle screws in a cadaveric model, without significant differences in distance and angulation between planned and postoperative screw positions compared to CT.
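The accuracy metrics used in the study reduce to a vector angle (maximum angulation) and a Euclidean distance between planned and achieved screw positions; a minimal sketch with made-up coordinates:

```python
import numpy as np

def angulation_deg(planned, placed):
    """Angle in degrees between the planned and postoperative screw axes."""
    a, b = np.asarray(planned, float), np.asarray(placed, float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def tip_distance_mm(planned_tip, placed_tip):
    """Euclidean distance (mm) between planned and achieved screw positions."""
    return float(np.linalg.norm(np.asarray(planned_tip, float)
                                - np.asarray(placed_tip, float)))

print(round(angulation_deg([0, 0, 1], [0, 1, 1]), 1))  # 45.0
print(tip_distance_mm([10, 20, 30], [10, 22, 30]))     # 2.0
```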

Facilitators and Barriers to Implementing AI in Routine Medical Imaging: Systematic Review and Qualitative Analysis.

Wenderott K, Krups J, Weigl M, Wooldridge AR

PubMed paper, Jul 21 2025
Artificial intelligence (AI) is rapidly advancing in health care, particularly in medical imaging, offering potential for improved efficiency and reduced workload. However, there is little systematic evidence on process factors for successful AI technology implementation into clinical workflows. This study aimed to systematically assess and synthesize the facilitators and barriers to AI implementation reported in studies evaluating AI solutions in routine medical imaging. We conducted a systematic review of 6 medical databases. Using a qualitative content analysis, we extracted the reported facilitators and barriers, outcomes, and moderators in the implementation process of AI. Two reviewers analyzed and categorized the data separately. We then used epistemic network analysis to explore their relationships across different stages of AI implementation. Our search yielded 13,756 records. After screening, we included 38 original studies in our final review. We identified 12 key dimensions and 37 subthemes that influence the implementation of AI in health care workflows. Key dimensions included evaluation of AI use and fit into workflow, with frequency depending considerably on the stage of the implementation process. In total, 20 themes were mentioned as both facilitators and barriers to AI implementation. Studies often focused predominantly on performance metrics over the experiences or outcomes of clinicians. This systematic review provides a thorough synthesis of facilitators and barriers to successful AI implementation in medical imaging. Our study highlights the usefulness of AI technologies in clinical care and the fit of their integration into routine clinical workflows. Most studies did not directly report facilitators and barriers to AI implementation, underscoring the importance of comprehensive reporting to foster knowledge sharing. Our findings reveal a predominant focus on technological aspects of AI adoption in clinical work, highlighting the need for holistic, human-centric consideration to fully leverage the potential of AI in health care. PROSPERO CRD42022303439; https://www.crd.york.ac.uk/PROSPERO/view/CRD42022303439. RR2-10.2196/40485.
