Sort by:
Page 139 of 3993984 results

Pyramid Hierarchical Masked Diffusion Model for Imaging Synthesis

Xiaojiao Xiao, Qinmin Vivian Hu, Guanghui Wang

arxiv logopreprintJul 22 2025
Medical image synthesis plays a crucial role in clinical workflows, addressing the common issue of missing imaging modalities due to factors such as extended scan times, scan corruption, artifacts, patient motion, and intolerance to contrast agents. The paper presents a novel image synthesis network, the Pyramid Hierarchical Masked Diffusion Model (PHMDiff), which employs a multi-scale hierarchical approach for more detailed control over synthesizing high-quality images across different resolutions and layers. Specifically, this model utilizes randomly multi-scale high-proportion masks to speed up diffusion model training, and balances detail fidelity and overall structure. The integration of a Transformer-based Diffusion model process incorporates cross-granularity regularization, modeling the mutual information consistency across each granularity's latent spaces, thereby enhancing pixel-level perceptual accuracy. Comprehensive experiments on two challenging datasets demonstrate that PHMDiff achieves superior performance in both the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM), highlighting its capability to produce high-quality synthesized images with excellent structural integrity. Ablation studies further confirm the contributions of each component. Furthermore, the PHMDiff model, a multi-scale image synthesis framework across and within medical imaging modalities, shows significant advantages over other methods. The source code is available at https://github.com/xiaojiao929/PHMDiff

MLRU++: Multiscale Lightweight Residual UNETR++ with Attention for Efficient 3D Medical Image Segmentation

Nand Kumar Yadav, Rodrigue Rizk, William CW Chen, KC Santosh

arxiv logopreprintJul 22 2025
Accurate and efficient medical image segmentation is crucial but challenging due to anatomical variability and high computational demands on volumetric data. Recent hybrid CNN-Transformer architectures achieve state-of-the-art results but add significant complexity. In this paper, we propose MLRU++, a Multiscale Lightweight Residual UNETR++ architecture designed to balance segmentation accuracy and computational efficiency. It introduces two key innovations: a Lightweight Channel and Bottleneck Attention Module (LCBAM) that enhances contextual feature encoding with minimal overhead, and a Multiscale Bottleneck Block (M2B) in the decoder that captures fine-grained details via multi-resolution feature aggregation. Experiments on four publicly available benchmark datasets (Synapse, BTCV, ACDC, and Decathlon Lung) demonstrate that MLRU++ achieves state-of-the-art performance, with average Dice scores of 87.57% (Synapse), 93.00% (ACDC), and 81.12% (Lung). Compared to existing leading models, MLRU++ improves Dice scores by 5.38% and 2.12% on Synapse and ACDC, respectively, while significantly reducing parameter count and computational cost. Ablation studies evaluating LCBAM and M2B further confirm the effectiveness of the proposed architectural components. Results suggest that MLRU++ offers a practical and high-performing solution for 3D medical image segmentation tasks. Source code is available at: https://github.com/1027865/MLRUPP

Semantic Segmentation for Preoperative Planning in Transcatheter Aortic Valve Replacement

Cedric Zöllner, Simon Reiß, Alexander Jaus, Amroalalaa Sholi, Ralf Sodian, Rainer Stiefelhagen

arxiv logopreprintJul 22 2025
When preoperative planning for surgeries is conducted on the basis of medical images, artificial intelligence methods can support medical doctors during assessment. In this work, we consider medical guidelines for preoperative planning of the transcatheter aortic valve replacement (TAVR) and identify tasks, that may be supported via semantic segmentation models by making relevant anatomical structures measurable in computed tomography scans. We first derive fine-grained TAVR-relevant pseudo-labels from coarse-grained anatomical information, in order to train segmentation models and quantify how well they are able to find these structures in the scans. Furthermore, we propose an adaptation to the loss function in training these segmentation models and through this achieve a +1.27% Dice increase in performance. Our fine-grained TAVR-relevant pseudo-labels and the computed tomography scans we build upon are available at https://doi.org/10.5281/zenodo.16274176.

A Benchmark Framework for the Right Atrium Cavity Segmentation From LGE-MRIs.

Bai J, Zhu J, Chen Z, Yang Z, Lu Y, Li L, Li Q, Wang W, Zhang H, Wang K, Gan J, Zhao J, Lu H, Li S, Huang J, Chen X, Zhang X, Xu X, Li L, Tian Y, Campello VM, Lekadir K

pubmed logopapersJul 22 2025
The right atrium (RA) is critical for cardiac hemodynamics but is often overlooked in clinical diagnostics. This study presents a benchmark framework for RA cavity segmentation from late gadolinium-enhanced magnetic resonance imaging (LGE-MRIs), leveraging a two-stage strategy and a novel 3D deep learning network, RASnet. The architecture addresses challenges in class imbalance and anatomical variability by incorporating multi-path input, multi-scale feature fusion modules, Vision Transformers, context interaction mechanisms, and deep supervision. Evaluated on datasets comprising 354 LGE-MRIs, RASnet achieves SOTA performance with a Dice score of 92.19% on a primary dataset and demonstrates robust generalizability on an independent dataset. The proposed framework establishes a benchmark for RA cavity segmentation, enabling accurate and efficient analysis for cardiac imaging applications. Open-source code (https://github.com/zjinw/RAS) and data (https://zenodo.org/records/15524472) are provided to facilitate further research and clinical adoption.

Training Language Models for Estimating Priority Levels in Ultrasound Examination Waitlists: Algorithm Development and Validation.

Masayoshi K, Hashimoto M, Toda N, Mori H, Kobayashi G, Haque H, So M, Jinzaki M

pubmed logopapersJul 22 2025
Ultrasound examinations, while valuable, are time-consuming and often limited in availability. Consequently, many hospitals implement reservation systems; however, these systems typically lack prioritization for examination purposes. Hence, our hospital uses a waitlist system that prioritizes examination requests based on their clinical value when slots become available due to cancellations. This system, however, requires a manual review of examination purposes, which are recorded in free-form text. We hypothesized that artificial intelligence language models could preliminarily estimate the priority of requests before manual reviews. This study aimed to investigate potential challenges associated with using language models for estimating the priority of medical examination requests and to evaluate the performance of language models in processing Japanese medical texts. We retrospectively collected ultrasound examination requests from the waitlist system at Keio University Hospital, spanning January 2020 to March 2023. Each request comprised an examination purpose documented by the requesting physician and a 6-tier priority level assigned by a radiologist during the clinical workflow. We fine-tuned JMedRoBERTa, Luke, OpenCalm, and LLaMA2 under two conditions: (1) tuning only the final layer and (2) tuning all layers using either standard backpropagation or low-rank adaptation. We had 2335 and 204 requests in the training and test datasets post cleaning. When only the final layers were tuned, JMedRoBERTa outperformed the other models (Kendall coefficient=0.225). With full fine-tuning, JMedRoBERTa continued to perform best (Kendall coefficient=0.254), though with reduced margins compared with the other models. The radiologist's retrospective re-evaluation yielded a Kendall coefficient of 0.221. Language models can estimate the priority of examination requests with accuracy comparable with that of human radiologists. The fine-tuning results indicate that general-purpose language models can be adapted to domain-specific texts (ie, Japanese medical texts) with sufficient fine-tuning. Further research is required to address priority rank ambiguity, expand the dataset across multiple institutions, and explore more recent language models with potentially higher performance or better suitability for this task.

A Hybrid CNN-VSSM model for Multi-View, Multi-Task Mammography Analysis: Robust Diagnosis with Attention-Based Fusion

Yalda Zafari, Roaa Elalfy, Mohamed Mabrok, Somaya Al-Maadeed, Tamer Khattab, Essam A. Rashed

arxiv logopreprintJul 22 2025
Early and accurate interpretation of screening mammograms is essential for effective breast cancer detection, yet it remains a complex challenge due to subtle imaging findings and diagnostic ambiguity. Many existing AI approaches fall short by focusing on single view inputs or single-task outputs, limiting their clinical utility. To address these limitations, we propose a novel multi-view, multitask hybrid deep learning framework that processes all four standard mammography views and jointly predicts diagnostic labels and BI-RADS scores for each breast. Our architecture integrates a hybrid CNN VSSM backbone, combining convolutional encoders for rich local feature extraction with Visual State Space Models (VSSMs) to capture global contextual dependencies. To improve robustness and interpretability, we incorporate a gated attention-based fusion module that dynamically weights information across views, effectively handling cases with missing data. We conduct extensive experiments across diagnostic tasks of varying complexity, benchmarking our proposed hybrid models against baseline CNN architectures and VSSM models in both single task and multi task learning settings. Across all tasks, the hybrid models consistently outperform the baselines. In the binary BI-RADS 1 vs. 5 classification task, the shared hybrid model achieves an AUC of 0.9967 and an F1 score of 0.9830. For the more challenging ternary classification, it attains an F1 score of 0.7790, while in the five-class BI-RADS task, the best F1 score reaches 0.4904. These results highlight the effectiveness of the proposed hybrid framework and underscore both the potential and limitations of multitask learning for improving diagnostic performance and enabling clinically meaningful mammography analysis.

CLIF-Net: Intersection-guided Cross-view Fusion Network for Infection Detection from Cranial Ultrasound

Yu, M., Peterson, M. R., Burgoine, K., Harbaugh, T., Olupot-Olupot, P., Gladstone, M., Hagmann, C., Cowan, F. M., Weeks, A., Morton, S. U., Mulondo, R., Mbabazi-Kabachelor, E., Schiff, S. J., Monga, V.

medrxiv logopreprintJul 22 2025
This paper addresses the problem of detecting possible serious bacterial infection (pSBI) of infancy, i.e. a clinical presentation consistent with bacterial sepsis in newborn infants using cranial ultrasound (cUS) images. The captured image set for each patient enables multiview imagery: coronal and sagittal, with geometric overlap. To exploit this geometric relation, we develop a new learning framework, called the intersection-guided Crossview Local-and Image-level Fusion Network (CLIF-Net). Our technique employs two distinct convolutional neural network branches to extract features from coronal and sagittal images with newly developed multi-level fusion blocks. Specifically, we leverage the spatial position of these images to locate the intersecting region. We then identify and enhance the semantic features from this region across multiple levels using cross-attention modules, facilitating the acquisition of mutually beneficial and more representative features from both views. The final enhanced features from the two views are then integrated and projected through the image-level fusion layer, outputting pSBI and non-pSBI class probabilities. We contend that our method of exploiting multi-view cUS images enables a first of its kind, robust 3D representation tailored for pSBI detection. When evaluated on a dataset of 302 cUS scans from Mbale Regional Referral Hospital in Uganda, CLIF-Net demonstrates substantially enhanced performance, surpassing the prevailing state-of-the-art infection detection techniques.

MLRU++: Multiscale Lightweight Residual UNETR++ with Attention for Efficient 3D Medical Image Segmentation

Nand Kumar Yadav, Rodrigue Rizk, William CW Chen, KC

arxiv logopreprintJul 22 2025
Accurate and efficient medical image segmentation is crucial but challenging due to anatomical variability and high computational demands on volumetric data. Recent hybrid CNN-Transformer architectures achieve state-of-the-art results but add significant complexity. In this paper, we propose MLRU++, a Multiscale Lightweight Residual UNETR++ architecture designed to balance segmentation accuracy and computational efficiency. It introduces two key innovations: a Lightweight Channel and Bottleneck Attention Module (LCBAM) that enhances contextual feature encoding with minimal overhead, and a Multiscale Bottleneck Block (M2B) in the decoder that captures fine-grained details via multi-resolution feature aggregation. Experiments on four publicly available benchmark datasets (Synapse, BTCV, ACDC, and Decathlon Lung) demonstrate that MLRU++ achieves state-of-the-art performance, with average Dice scores of 87.57% (Synapse), 93.00% (ACDC), and 81.12% (Lung). Compared to existing leading models, MLRU++ improves Dice scores by 5.38% and 2.12% on Synapse and ACDC, respectively, while significantly reducing parameter count and computational cost. Ablation studies evaluating LCBAM and M2B further confirm the effectiveness of the proposed architectural components. Results suggest that MLRU++ offers a practical and high-performing solution for 3D medical image segmentation tasks. Source code is available at: https://github.com/1027865/MLRUPP

MAN-GAN: a mask-adaptive normalization based generative adversarial networks for liver multi-phase CT image generation.

Zhao W, Chen W, Fan L, Shang Y, Wang Y, Situ W, Li W, Liu T, Yuan Y, Liu J

pubmed logopapersJul 22 2025
Liver multiphase enhanced computed tomography (MPECT) is vital in clinical practice, but its utility is limited by various factors. We aimed to develop a deep learning network capable of automatically generating MPECT images from standard non-contrast CT scans. Dataset 1 included 374 patients and was divided into three parts: a training set, a validation set and a test set. Dataset 2 included 144 patients with one specific liver disease and was used as an internal test dataset. We further collected another dataset comprising 83 patients for external validation. Then, we propose a Mask-Adaptive Normalization-based Generative Adversarial Network with Cycle-Consistency Loss (MAN-GAN) to achieve non-contrast CT to MPECT translation. To assess the efficiency of MAN-GAN, we conducted a comparative analysis with state-of-the-art methods commonly employed in diverse medical image synthesis tasks. Moreover, two subjective radiologist evaluation studies were performed to verify the clinical usefulness of the generated images. MAN-GAN outperformed the baseline network and other state-of-the-art methods in all generations of the three phases. These results were verified in internal and external datasets. According to radiological evaluation, the image quality of generated three phase images are all above average. Moreover, the similarities between real images and generated images in all three phases are satisfactory. MAN-GAN demonstrates the feasibility of liver MPECT image translation based on non-contrast images and achieves state-of-the-art performance via the subtraction strategy. It has great potential for solving the dilemma of liver CT contrast canning and aiding further liver interaction clinical scenarios.

Role of Brain Age Gap as a Mediator in the Relationship Between Cognitive Impairment Risk Factors and Cognition.

Tan WY, Huang X, Huang J, Robert C, Cui J, Chen CPLH, Hilal S

pubmed logopapersJul 22 2025
Cerebrovascular disease (CeVD) and cognitive impairment risk factors contribute to cognitive decline, but the role of brain age gap (BAG) in mediating this relationship remains unclear, especially in Southeast Asian populations. This study investigated the influence of cognitive impairment risk factors on cognition and examined how BAG mediates this relationship, particularly in individuals with varying CeVD burden. This cross-sectional study analyzed Singaporean community and memory clinic participants. Cognitive impairment risk factors were assessed using the Cognitive Impairment Scoring System (CISS), encompassing 11 sociodemographic and vascular factors. Cognition was assessed through a neuropsychological battery, evaluating global cognition and 6 cognitive domains: executive function, attention, memory, language, visuomotor speed, and visuoconstruction. Brain age was derived from structural MRI features using ensemble machine learning model. Propensity score matching balanced risk profiles between model training and the remaining sample. Structural equation modeling examined the mediation effect of BAG on CISS-cognition relationship, stratified by CeVD burden (high: CeVD+, low: CeVD-). The study included 1,437 individuals without dementia, with 646 in the matched sample (mean age 66.4 ± 6.0 years, 47% female, 60% with no cognitive impairment). Higher CISS was consistently associated with poorer cognitive performance across all domains, with the strongest negative associations in visuomotor speed (β = -2.70, <i>p</i> < 0.001) and visuoconstruction (β = -3.02, <i>p</i> < 0.001). Among the CeVD+ group, BAG significantly mediated the relationship between CISS and global cognition (proportion mediated: 19.95%, <i>p</i> = 0.01), with the strongest mediation effects in executive function (34.1%, <i>p</i> = 0.03) and language (26.6%, <i>p</i> = 0.008). BAG also mediated the relationship between CISS and memory (21.1%) and visuoconstruction (14.4%) in the CeVD+ group, but these effects diminished after statistical adjustments. Our findings suggest that BAG is a key intermediary linking cognitive impairment risk factors to cognitive function, particularly in individuals with high CeVD burden. This mediation effect is domain-specific, with executive function, language, and visuoconstruction being the most vulnerable to accelerated brain aging. Limitations of this study include the cross-sectional design, limiting causal inference, and the focus on Southeast Asian populations, limiting generalizability. Future longitudinal studies should verify these relationships and explore additional factors not captured in our model.
Page 139 of 3993984 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.