Image Aesthetic Reasoning: A New Benchmark for Medical Image Screening with MLLMs

Zheng Sun, Yi Wei, Long Yu

arXiv preprint · May 29, 2025
Multimodal Large Language Models (MLLMs) have broad applications across many domains, such as multimodal understanding and generation. With the development of diffusion models (DMs) and unified MLLMs, image generation performance has improved significantly; however, image screening remains understudied, and MLLM performance on it is unsatisfactory due to the lack of data and the weak image aesthetic reasoning ability of MLLMs. In this work, we propose a complete solution to address these problems in terms of both data and methodology. For data, we collect a comprehensive medical image screening dataset with 1500+ samples, where each sample consists of a medical image, four generated images, and a multiple-choice answer. The dataset evaluates aesthetic reasoning ability under four aspects: (1) Appearance Deformation, (2) Principles of Physical Lighting and Shadow, (3) Placement Layout, and (4) Extension Rationality. For methodology, we use long chains of thought (CoT) and Group Relative Policy Optimization with a Dynamic Proportional Accuracy reward, called DPA-GRPO, to enhance the image aesthetic reasoning ability of MLLMs. Our experimental results reveal that even state-of-the-art closed-source MLLMs, such as GPT-4o and Qwen-VL-Max, perform close to random guessing on image aesthetic reasoning. In contrast, by leveraging this reinforcement learning approach, we are able to surpass the scores of both large-scale and leading closed-source models using a much smaller model. We hope our work on medical image screening will serve as a standard configuration for image aesthetic reasoning in the future.
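
To make the group-relative reward idea concrete, here is a minimal sketch of a GRPO-style advantage computation over multiple-choice accuracy rewards. The exact-match reward and within-group normalization are generic illustrations; the paper's Dynamic Proportional Accuracy reward is not described here, so this is not the DPA-GRPO formulation.

```python
# Hypothetical sketch of a group-relative advantage computation in the spirit of
# GRPO: rewards for a group of sampled answers to the same question are
# normalized within the group. The simple exact-match reward below is an
# illustrative placeholder, not the paper's DPA reward.
from typing import List
import statistics

def accuracy_reward(predicted: str, gold: str) -> float:
    """1.0 for a correct multiple-choice answer, 0.0 otherwise."""
    return 1.0 if predicted.strip().upper() == gold.strip().upper() else 0.0

def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Normalize rewards within a sampled group (zero mean, unit variance)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Example: four sampled answers to one screening question whose gold answer is "B".
samples = ["A", "B", "B", "D"]
rewards = [accuracy_reward(s, "B") for s in samples]
print(group_relative_advantages(rewards))
```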

Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning

Jinquan Guan, Qi Chen, Lizhou Liang, Yuhang Liu, Vu Minh Hieu Phan, Minh-Son To, Jian Chen, Yutong Xie

arXiv preprint · May 29, 2025
Artificial intelligence (AI)-based chest X-ray (CXR) interpretation assistants have demonstrated significant progress and are increasingly being applied in clinical settings. However, contemporary medical AI models often adhere to a simplistic input-to-output paradigm, directly processing an image and an instruction to generate a result, where the instructions may be integral to the model's architecture. This approach overlooks the modeling of the inherent diagnostic reasoning in chest X-ray interpretation. Such reasoning is typically sequential, where each interpretive stage considers the images, the current task, and the contextual information from previous stages. This oversight leads to several shortcomings, including misalignment with clinical scenarios, contextless reasoning, and untraceable errors. To fill this gap, we construct CXRTrek, a new multi-stage visual question answering (VQA) dataset for CXR interpretation, the first designed to explicitly simulate the diagnostic reasoning process employed by radiologists in real-world clinical settings. CXRTrek covers 8 sequential diagnostic stages, comprising 428,966 samples and over 11 million question-answer (Q&A) pairs, with an average of 26.29 Q&A pairs per sample. Building on the CXRTrek dataset, we propose a new vision-language large model (VLLM), CXRTrekNet, specifically designed to incorporate the clinical reasoning flow into the VLLM framework. CXRTrekNet effectively models the dependencies between diagnostic stages and captures reasoning patterns within the radiological context. Trained on our dataset, the model consistently outperforms existing medical VLLMs on the CXRTrek benchmarks and demonstrates superior generalization across multiple tasks on five diverse external datasets. The dataset and model can be found in our repository (https://github.com/guanjinquan/CXRTrek).
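
As a rough illustration of the multi-stage setup, the hypothetical sketch below represents one sample as a sequence of stages and builds a prompt in which each new question is conditioned on earlier-stage answers. Field names and stage titles are illustrative assumptions, not the CXRTrek schema.

```python
# A minimal, hypothetical sketch of how a multi-stage CXR VQA sample might be
# represented, with each stage conditioned on the answers of earlier stages.
from dataclasses import dataclass, field
from typing import List

@dataclass
class QAPair:
    question: str
    answer: str

@dataclass
class Stage:
    name: str
    qa_pairs: List[QAPair] = field(default_factory=list)

def build_prompt(image_id: str, stages: List[Stage], current_question: str) -> str:
    """Concatenate prior-stage Q&A as context for the current question."""
    context = "\n".join(
        f"[{s.name}] Q: {qa.question} A: {qa.answer}"
        for s in stages for qa in s.qa_pairs
    )
    return f"Image: {image_id}\n{context}\nQ: {current_question}\nA:"

stages = [Stage("Image quality", [QAPair("Is the view PA or AP?", "PA")])]
print(build_prompt("cxr_0001", stages, "Are there focal opacities?"))
```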

CT-denoimer: efficient contextual transformer network for low-dose CT denoising.

Zhang Y, Xu F, Zhang R, Guo Y, Wang H, Wei B, Ma F, Meng J, Liu J, Lu H, Chen Y

PubMed paper · May 29, 2025
Low-dose computed tomography (LDCT) effectively reduces radiation exposure to patients, but introduces severe noise artifacts that affect diagnostic accuracy. Recently, Transformer-based network architectures have been widely applied to LDCT image denoising, generally achieving superior results compared to traditional convolutional methods. However, these methods are often hindered by high computational costs and struggle to capture complex local contextual features, which negatively impacts denoising performance. In this work, we propose CT-Denoimer, an efficient CT Denoising Transformer network that captures both global correlations and intricate, spatially varying local contextual details in CT images, enabling the generation of high-quality images. The core of our framework is a Transformer module that consists of two key components: the Multi-Dconv head Transposed Attention (MDTA) and the Mixed Contextual Feed-forward Network (MCFN). The MDTA block captures global correlations in the image with linear computational complexity, while the MCFN block manages multi-scale local contextual information, both static and dynamic, through a series of Enhanced Contextual Transformer (eCoT) modules. In addition, we incorporate Operation-Wise Attention Layers (OWALs) to enable collaborative refinement in the proposed CT-Denoimer, enhancing its ability to more effectively handle complex and varying noise patterns in LDCT images. Extensive experimental validation on both the AAPM-Mayo public dataset and a real-world clinical dataset demonstrated the state-of-the-art performance of the proposed CT-Denoimer. It achieved a peak signal-to-noise ratio (PSNR) of 33.681 dB, a structural similarity index measure (SSIM) of 0.921, an information fidelity criterion (IFC) of 2.857 and a visual information fidelity (VIF) of 0.349. Subjective assessment by radiologists gave an average score of 4.39, confirming its clinical applicability and clear advantages over existing methods. This study presents an innovative CT denoising Transformer network that sets a new benchmark in LDCT image denoising, excelling in both noise reduction and fine structure preservation.
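
The linear-complexity claim for MDTA follows from computing attention across channels rather than across pixels. The PyTorch sketch below is a generic transposed (channel-wise) attention block in that spirit, written as an illustration rather than the CT-Denoimer implementation.

```python
# A minimal PyTorch sketch of "transposed" (channel-wise) self-attention, the
# idea behind MDTA-style blocks: attention is computed between channels rather
# than spatial positions, so cost grows linearly with the number of pixels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransposedAttention(nn.Module):
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.dwconv = nn.Conv2d(channels * 3, channels * 3, kernel_size=3,
                                padding=1, groups=channels * 3)  # depth-wise conv
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.dwconv(self.qkv(x)).chunk(3, dim=1)
        # reshape to (batch, heads, channels_per_head, pixels)
        q = q.reshape(b, self.heads, c // self.heads, h * w)
        k = k.reshape(b, self.heads, c // self.heads, h * w)
        v = v.reshape(b, self.heads, c // self.heads, h * w)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)  # channel-by-channel map
        out = (attn @ v).reshape(b, c, h, w)
        return self.proj(out)

x = torch.randn(1, 32, 64, 64)
print(TransposedAttention(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```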

Deep Separable Spatiotemporal Learning for Fast Dynamic Cardiac MRI.

Wang Z, Xiao M, Zhou Y, Wang C, Wu N, Li Y, Gong Y, Chang S, Chen Y, Zhu L, Zhou J, Cai C, Wang H, Jiang X, Guo D, Yang G, Qu X

PubMed paper · May 28, 2025
Dynamic magnetic resonance imaging (MRI) plays an indispensable role in cardiac diagnosis. To enable fast imaging, the k-space data can be undersampled but the image reconstruction poses a great challenge of high-dimensional processing. This challenge necessitates extensive training data in deep learning reconstruction methods. In this work, we propose a novel and efficient approach, leveraging a dimension-reduced separable learning scheme that can perform exceptionally well even with highly limited training data. We design this new approach by incorporating spatiotemporal priors into the development of a Deep Separable Spatiotemporal Learning network (DeepSSL), which unrolls an iteration process of a 2D spatiotemporal reconstruction model with both temporal low-rankness and spatial sparsity. Intermediate outputs can also be visualized to provide insights into the network behavior and enhance interpretability. Extensive results on cardiac cine datasets demonstrate that the proposed DeepSSL surpasses state-of-the-art methods both visually and quantitatively, while reducing the demand for training cases by up to 75%. Additionally, its preliminary adaptability to unseen cardiac patients has been verified through a blind reader study conducted by experienced radiologists and cardiologists. Furthermore, DeepSSL enhances the accuracy of the downstream task of cardiac segmentation and exhibits robustness in prospectively undersampled real-time cardiac MRI. DeepSSL is efficient under highly limited training data and adaptive to patients and prospective undersampling. This approach holds promise in addressing the escalating demand for high-dimensional data reconstruction in MRI applications.
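
For readers unfamiliar with the priors being unrolled, the NumPy sketch below shows one generic iteration combining a temporal low-rank step (singular value thresholding on the Casorati matrix) with a spatial sparsity step (soft thresholding). It illustrates the two regularizers under simple assumptions, not DeepSSL's learned update rule or parameters.

```python
# Generic low-rank + sparse proximal steps of the kind unrolled in
# spatiotemporal reconstruction networks; data and thresholds are synthetic.
import numpy as np

def svt(X: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: shrink singular values by tau (low-rank prior)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(X: np.ndarray, lam: float) -> np.ndarray:
    """Element-wise soft thresholding (sparsity prior)."""
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)

# Casorati matrix: rows = spatial locations, columns = time frames.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 20))
L = svt(X, tau=0.5)        # temporally low-rank component
S = soft(X - L, lam=0.1)   # spatially sparse residual
print(L.shape, S.shape)
```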

Large Scale MRI Collection and Segmentation of Cirrhotic Liver.

Jha D, Susladkar OK, Gorade V, Keles E, Antalek M, Seyithanoglu D, Cebeci T, Aktas HE, Kartal GD, Kaymakoglu S, Erturk SM, Velichko Y, Ladner DP, Borhani AA, Medetalibeyoglu A, Durak G, Bagci U

PubMed paper · May 28, 2025
Liver cirrhosis represents the end stage of chronic liver disease, characterized by extensive fibrosis and nodular regeneration that significantly increases mortality risk. While magnetic resonance imaging (MRI) offers a non-invasive assessment, accurately segmenting cirrhotic livers presents substantial challenges due to morphological alterations and heterogeneous signal characteristics. Deep learning approaches show promise for automating these tasks, but progress has been limited by the absence of large-scale, annotated datasets. Here, we present CirrMRI600+, the first comprehensive dataset comprising 628 high-resolution abdominal MRI scans (310 T1-weighted and 318 T2-weighted sequences, totaling nearly 40,000 annotated slices) with expert-validated segmentation labels for cirrhotic livers. The dataset includes demographic information, clinical parameters, and histopathological validation where available. Additionally, we provide benchmark results from 11 state-of-the-art deep learning experiments to establish performance standards. CirrMRI600+ enables the development and validation of advanced computational methods for cirrhotic liver analysis, potentially accelerating progress toward automated cirrhosis visual staging and personalized treatment planning.

Integrating SEResNet101 and SE-VGG19 for advanced cervical lesion detection: a step forward in precision oncology.

Ye Y, Chen Y, Pan J, Li P, Ni F, He H

PubMed paper · May 28, 2025
Cervical cancer remains a significant global health issue, with accurate differentiation between low-grade (LSIL) and high-grade squamous intraepithelial lesions (HSIL) crucial for effective screening and management. Current methods, such as Pap smears and HPV testing, often fall short in sensitivity and specificity. Deep learning models hold the potential to enhance the accuracy of cervical cancer screening but require thorough evaluation to ascertain their practical utility. This study compares the performance of two advanced deep learning models, SEResNet101 and SE-VGG19, in classifying cervical lesions using a dataset of 3,305 high-quality colposcopy images. We assessed the models based on their accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). The SEResNet101 model demonstrated superior performance over SE-VGG19 across all evaluated metrics. Specifically, SEResNet101 achieved a sensitivity of 95%, a specificity of 97%, and an AUC of 0.98, compared to 89% sensitivity, 93% specificity, and an AUC of 0.94 for SE-VGG19. These findings suggest that SEResNet101 could significantly reduce both over- and under-treatment rates by enhancing diagnostic precision. Our results indicate that SEResNet101 offers a promising enhancement over existing screening methods, integrating advanced deep learning algorithms to significantly improve the precision of cervical lesion classification. This study advocates for the inclusion of SEResNet101 in clinical workflows to enhance cervical cancer screening protocols, thereby improving patient outcomes. Future work should focus on multicentric trials to validate these findings and facilitate widespread clinical adoption.
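
For reference, the metrics reported above (sensitivity, specificity, AUC) can be computed as in the short scikit-learn sketch below; the labels and scores are synthetic placeholders, not the study's data.

```python
# Computing sensitivity, specificity, and AUC for a binary LSIL-vs-HSIL
# classifier; inputs are synthetic for illustration.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                 # 1 = HSIL, 0 = LSIL
y_score = np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.2, 0.6, 0.3])  # model probabilities
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
auc = roc_auc_score(y_true, y_score)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} auc={auc:.2f}")
```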

An orchestration learning framework for ultrasound imaging: Prompt-Guided Hyper-Perception and Attention-Matching Downstream Synchronization.

Lin Z, Li S, Wang S, Gao Z, Sun Y, Lam CT, Hu X, Yang X, Ni D, Tan T

PubMed paper · May 27, 2025
Ultrasound imaging is pivotal in clinical diagnostics due to its affordability, portability, safety, real-time capability, and non-invasive nature. It is widely utilized for examining various organs, such as the breast, thyroid, ovary, and heart. However, the manual interpretation and annotation of ultrasound images are time-consuming and prone to variability among physicians. While single-task artificial intelligence (AI) solutions have been explored, they are not ideal for scaling AI applications in medical imaging. Foundation models, although a trending solution, often struggle with real-world medical datasets due to factors such as noise, variability, and an inability to flexibly align prior knowledge with task adaptation. To address these limitations, we propose an orchestration learning framework named PerceptGuide for general-purpose ultrasound classification and segmentation. Our framework incorporates a novel orchestration mechanism based on prompted hyper-perception, which adapts to the diverse inductive biases required by different ultrasound datasets. Unlike self-supervised pre-trained models, which require extensive fine-tuning, our approach leverages supervised pre-training to directly capture task-relevant features, providing a stronger foundation for multi-task and multi-organ ultrasound imaging. To support this research, we compiled a large-scale Multi-task, Multi-organ public ultrasound dataset (M²-US), featuring images from 9 organs and 16 datasets, encompassing both classification and segmentation tasks. Our approach employs four specific prompts (Object, Task, Input, and Position) to guide the model, ensuring task-specific adaptability. Additionally, a downstream synchronization training stage is introduced to fine-tune the model on new data, significantly improving generalization capabilities and enabling real-world applications. Experimental results demonstrate the robustness and versatility of our framework in handling multi-task and multi-organ ultrasound image processing, outperforming both specialist models and existing general AI solutions. Compared to specialist models, our method improves segmentation from 82.26% to 86.45% and classification from 71.30% to 79.08%, while also significantly reducing model parameters.
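
A hypothetical sketch of the prompt mechanism: four learned prompt embeddings (Object, Task, Input, Position) are looked up and prepended to the image tokens before they enter a shared backbone. The names, sizes, and token layout below are assumptions for illustration, not the PerceptGuide code.

```python
# Hypothetical prompt bank that prepends four prompt tokens to image tokens.
import torch
import torch.nn as nn

class PromptBank(nn.Module):
    def __init__(self, n_objects: int, n_tasks: int, dim: int = 256):
        super().__init__()
        self.object_prompt = nn.Embedding(n_objects, dim)      # e.g. organ identity
        self.task_prompt = nn.Embedding(n_tasks, dim)          # classification / segmentation
        self.input_prompt = nn.Parameter(torch.zeros(1, dim))  # dataset-level token
        self.position_prompt = nn.Parameter(torch.zeros(1, dim))

    def forward(self, image_tokens, object_id, task_id):
        b = image_tokens.shape[0]
        prompts = torch.stack([
            self.object_prompt(object_id),
            self.task_prompt(task_id),
            self.input_prompt.expand(b, -1),
            self.position_prompt.expand(b, -1),
        ], dim=1)                                   # (batch, 4, dim)
        return torch.cat([prompts, image_tokens], dim=1)

tokens = torch.randn(2, 196, 256)                   # (batch, patches, dim)
bank = PromptBank(n_objects=9, n_tasks=2)
out = bank(tokens, torch.tensor([0, 3]), torch.tensor([1, 0]))
print(out.shape)                                    # torch.Size([2, 200, 256])
```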

Development of a No-Reference CT Image Quality Assessment Method Using RadImageNet Pre-trained Deep Learning Models.

Ohashi K, Nagatani Y, Yamazaki A, Yoshigoe M, Iwai K, Uemura R, Shimomura M, Tanimura K, Ishida T

PubMed paper · May 27, 2025
Accurate assessment of computed tomography (CT) image quality is crucial for ensuring diagnostic accuracy, optimizing imaging protocols, and preventing excessive radiation exposure. In clinical settings, where high-quality reference images are often unavailable, developing no-reference image quality assessment (NR-IQA) methods is essential. Recently, CT-NR-IQA methods using deep learning have been widely studied; however, significant challenges remain in handling multiple degradation factors and accurately reflecting real-world degradations. To address these issues, we propose a novel CT-NR-IQA method. Our approach utilizes a dataset that combines two degradation factors (noise and blur) to train convolutional neural network (CNN) models capable of handling multiple degradation factors. Additionally, we leveraged RadImageNet pre-trained models (ResNet50, DenseNet121, InceptionV3, and InceptionResNetV2), allowing the models to learn deep features from large-scale real clinical images, thus enhancing adaptability to real-world degradations without relying on artificially degraded images. The models' performances were evaluated by measuring the correlation between the subjective scores and predicted image quality scores for both artificially degraded and real clinical image datasets. The results demonstrated positive correlations between the subjective and predicted scores for both datasets. In particular, ResNet50 showed the best performance, with a correlation coefficient of 0.910 for the artificially degraded images and 0.831 for the real clinical images. These findings indicate that the proposed method could serve as a potential surrogate for subjective assessment in CT-NR-IQA.
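
The general recipe can be sketched as follows, under assumptions: attach a single-output regression head to a ResNet50 backbone (the actual RadImageNet weights would be loaded in place of the random initialization shown) and evaluate by correlating predicted quality scores with observer scores; Spearman rank correlation is used here as one reasonable choice, and all scores are synthetic.

```python
# Backbone + quality-score regression head, evaluated by rank correlation
# against observer scores. Weight loading and real data are omitted.
import torch
import torch.nn as nn
from torchvision.models import resnet50
from scipy.stats import spearmanr

model = resnet50(weights=None)                  # swap in RadImageNet weights here
model.fc = nn.Linear(model.fc.in_features, 1)   # no-reference quality score head
model.eval()

with torch.no_grad():
    predicted = model(torch.randn(4, 3, 224, 224)).squeeze(1)

subjective = torch.tensor([3.1, 2.4, 4.0, 1.8])  # placeholder observer scores
rho, _ = spearmanr(predicted.numpy(), subjective.numpy())
print(f"Spearman correlation: {rho:.3f}")
```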

Interpretable Machine Learning Models for Differentiating Glioblastoma From Solitary Brain Metastasis Using Radiomics.

Xia X, Wu W, Tan Q, Gou Q

PubMed paper · May 27, 2025
To develop and validate interpretable machine learning models for differentiating glioblastoma (GB) from solitary brain metastasis (SBM) using radiomics features from contrast-enhanced T1-weighted MRI (CE-T1WI), and to compare the impact of low-order and high-order features on model performance. A cohort of 434 patients with histopathologically confirmed GB (226 patients) and SBM (208 patients) was retrospectively analyzed. Radiomic features were derived from CE-T1WI, with feature selection conducted through minimum redundancy maximum relevance and least absolute shrinkage and selection operator regression. Machine learning models, including GradientBoost and LightGBM (LGBM), were trained using low-order and high-order features. The performance of the models was assessed through receiver operating characteristic analysis and computation of the area under the curve (AUC), along with other indicators, including accuracy, specificity, and sensitivity. SHapley Additive exPlanations (SHAP) analysis was used to measure the influence of each feature on the model's predictions. The performances of the various machine learning models on the training and validation datasets differed notably. For the training group, the LGBM, CatBoost, multilayer perceptron (MLP), and GradientBoost models achieved the highest AUC scores, all exceeding 0.9, demonstrating strong discriminative power. The LGBM model exhibited the best stability, with a minimal AUC difference of only 0.005 between the training and test sets, suggesting strong generalizability. Among the validation group results, the GradientBoost classifier achieved the maximum AUC of 0.927, closely followed by random forest at 0.925. GradientBoost also demonstrated high sensitivity (0.911) and negative predictive value (NPV, 0.889), effectively identifying true positives. The LGBM model showed the highest test accuracy (86.2%) and performed excellently in terms of sensitivity (0.911), NPV (0.895), and positive predictive value (PPV, 0.837). The models utilizing high-order features outperformed those based on low-order features on all metrics. SHAP analysis further enhances model interpretability, providing insights into feature importance and contributions to classification decisions. Machine learning techniques based on radiomics can effectively distinguish GB from SBM, with gradient boosting tree-based models such as LGBM demonstrating superior performance. High-order features significantly improve model accuracy and robustness. SHAP analysis enhances the interpretability and transparency of models for distinguishing brain tumors, providing intuitive visualization of the contribution of radiomic features to classification.
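
A compact sketch of a radiomics pipeline in this style: LASSO-based feature selection, a LightGBM classifier, and SHAP attribution. The data are random, and the mRMR step and the study's tuning are omitted, so this is an outline of the workflow rather than the authors' code.

```python
# LASSO feature selection -> LightGBM classification -> SHAP attribution,
# on synthetic data standing in for radiomic features.
import numpy as np
from sklearn.linear_model import LassoCV
from lightgbm import LGBMClassifier
import shap

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 100))            # 200 cases x 100 radiomic features
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # synthetic labels: 1 = GB, 0 = SBM

lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)         # features with non-zero weights
if selected.size == 0:                         # fall back if LASSO drops everything
    selected = np.arange(X.shape[1])
X_sel = X[:, selected]

clf = LGBMClassifier(n_estimators=200).fit(X_sel, y)
explainer = shap.TreeExplainer(clf)            # per-feature contribution values
shap_values = explainer.shap_values(X_sel)
print(selected.size, np.shape(shap_values))
```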

Automatic identification of Parkinsonism using clinical multi-contrast brain MRI: a large self-supervised vision foundation model strategy.

Suo X, Chen M, Chen L, Luo C, Kemp GJ, Lui S, Sun H

PubMed paper · May 27, 2025
Valid non-invasive biomarkers for Parkinson's disease (PD) and Parkinson-plus syndrome (PPS) are urgently needed. Based on our recent self-supervised vision foundation model, the Shift Window UNET TRansformer (Swin UNETR), which uses clinical multi-contrast whole-brain MRI, we aimed to develop an efficient and practical model ('SwinClassifier') for discriminating PD from PPS using routine clinical MRI scans. We used 75,861 clinical head MRI scans, including T1-weighted, T2-weighted and fluid-attenuated inversion recovery imaging, as a pre-training dataset to develop a foundation model, using self-supervised learning with a cross-contrast context recovery task. Clinical head MRI scans from n = 1992 participants with PD and n = 1989 participants with PPS were then used as a downstream PD vs PPS classification dataset. We assessed SwinClassifier's performance using confusion matrices, compared to a self-supervised vanilla Vision Transformer (ViT) autoencoder ('ViTClassifier') and to two convolutional neural networks (DenseNet121 and ResNet50) trained from scratch. SwinClassifier showed very good performance (F1 score 0.83, 95% confidence interval [CI] [0.79-0.87], AUC 0.89) in PD vs PPS discrimination on independent test datasets (n = 173 participants with PD and n = 165 participants with PPS). This self-supervised classifier with pretrained weights outperformed the ViTClassifier and the convolutional classifiers trained from scratch (F1 score 0.77-0.82, AUC 0.83-0.85). Occlusion sensitivity mapping in the correctly classified cases (n = 160 PD and n = 114 PPS) highlighted the brain regions guiding discrimination, mainly in sensorimotor and midline structures including the cerebellum, brain stem, ventricles and basal ganglia. Our self-supervised digital model based on routine clinical head MRI discriminated PD from PPS with good accuracy and sensitivity. With incremental improvements, the approach may be diagnostically useful in early disease. Funded by the National Key Research and Development Program of China.
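
Occlusion sensitivity mapping itself is straightforward to sketch: slide a zeroed-out patch over the input and record how much the predicted class probability drops. The toy 2D model below stands in for SwinClassifier; patch size, stride, and the classifier are illustrative assumptions.

```python
# Minimal occlusion sensitivity map for an image classifier.
import torch
import torch.nn as nn

def occlusion_sensitivity(model, image, target_class, patch=16, stride=16):
    """Return a map of probability drops when each patch is zeroed out."""
    model.eval()
    with torch.no_grad():
        base = torch.softmax(model(image), dim=1)[0, target_class]
        _, _, h, w = image.shape
        heat = torch.zeros(h // stride, w // stride)
        for i in range(0, h - patch + 1, stride):
            for j in range(0, w - patch + 1, stride):
                occluded = image.clone()
                occluded[..., i:i + patch, j:j + patch] = 0
                prob = torch.softmax(model(occluded), dim=1)[0, target_class]
                heat[i // stride, j // stride] = base - prob
    return heat

model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 2))   # toy stand-in classifier
image = torch.randn(1, 1, 64, 64)
print(occlusion_sensitivity(model, image, target_class=0).shape)  # torch.Size([4, 4])
```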