
Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning

Yijun Yang, Zhao-Yang Wang, Qiuping Liu, Shuwen Sun, Kang Wang, Rama Chellappa, Zongwei Zhou, Alan Yuille, Lei Zhu, Yu-Dong Zhang, Jieneng Chen

arXiv preprint · Jun 2, 2025
Providing effective treatment and making informed clinical decisions are essential goals of modern medicine and clinical care. We are interested in simulating disease dynamics for clinical decision-making, leveraging recent advances in large generative models. To this end, we introduce the Medical World Model (MeWM), the first world model in medicine that visually predicts future disease states based on clinical decisions. MeWM comprises (i) vision-language models serving as policy models and (ii) tumor generative models serving as dynamics models. The policy model generates action plans, such as clinical treatments, while the dynamics model simulates tumor progression or regression under given treatment conditions. Building on this, we propose an inverse dynamics model that applies survival analysis to the simulated post-treatment tumor, enabling evaluation of treatment efficacy and selection of the optimal clinical action plan. As a result, MeWM simulates disease dynamics by synthesizing post-treatment tumors, achieving state-of-the-art specificity in Turing tests evaluated by radiologists. Its inverse dynamics model also outperforms medical-specialized GPTs in optimizing individualized treatment protocols across all metrics. Notably, MeWM improves clinical decision-making for interventional physicians, boosting the F1-score for selecting the optimal TACE protocol by 13% and paving the way for future integration of medical world models as second readers.
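
The paper's loop of proposing plans, simulating outcomes, and scoring them with survival analysis can be summarized in a short sketch. Everything here is illustrative: the policy, dynamics, and survival interfaces are hypothetical stand-ins for the paper's components, not released code.

```python
# Hypothetical sketch of MeWM's simulate-then-score loop; the policy,
# dynamics, and survival objects are illustrative stand-ins.
def select_treatment(pre_image, context, policy, dynamics, survival):
    """Return the candidate plan whose simulated outcome scores best."""
    best_plan, best_score = None, float("-inf")
    for plan in policy.propose_plans(pre_image, context):
        # Dynamics model: synthesize the post-treatment tumor state.
        post_image = dynamics.simulate(pre_image, plan)
        # Inverse dynamics model: survival analysis on the simulated outcome.
        score = survival.expected_benefit(post_image, context)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan, best_score
```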

Beyond Pixel Agreement: Large Language Models as Clinical Guardrails for Reliable Medical Image Segmentation

Jiaxi Sheng, Leyi Yu, Haoyue Li, Yifan Gao, Xin Gao

arXiv preprint · Jun 2, 2025
Evaluating AI-generated medical image segmentations for clinical acceptability poses a significant challenge, as traditional pixel-agreement metrics often fail to capture true diagnostic utility. This paper introduces the Hierarchical Clinical Reasoner (HCR), a novel framework that leverages Large Language Models (LLMs) as clinical guardrails for reliable, zero-shot quality assessment. HCR employs a structured, multi-stage prompting strategy that guides LLMs through a detailed reasoning process, encompassing knowledge recall, visual feature analysis, anatomical inference, and clinical synthesis, to evaluate segmentations. We evaluated HCR on a diverse dataset spanning six medical imaging tasks. Our results show that HCR, using models like Gemini 2.5 Flash, achieved a classification accuracy of 78.12%, performing comparably to, and in some instances exceeding, dedicated vision models such as ResNet50 (72.92% accuracy) that were specifically trained for this task. The HCR framework not only provides accurate quality classifications but also generates interpretable, step-by-step reasoning for its assessments. This work demonstrates the potential of LLMs, when appropriately guided, to serve as sophisticated evaluators, offering a pathway toward more trustworthy and clinically aligned quality control for AI in medical imaging.
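
As a rough illustration of the multi-stage prompting idea, the sketch below chains four reasoning stages, feeding each answer back as context for the next. The stage wording and the generic `llm` callable are assumptions, not the paper's actual prompts.

```python
# Illustrative staged-reasoning chain in the spirit of HCR; stage wording
# and the `llm` text-in/text-out callable are assumptions, not the paper's.
STAGES = [
    "Recall the anatomical knowledge relevant to this {task} segmentation.",
    "Describe the visual features of the predicted mask: {mask_summary}.",
    "Infer whether the mask is anatomically plausible given the recalled knowledge.",
    "Synthesize a clinical acceptability verdict: ACCEPT or REJECT, with reasons.",
]

def assess_segmentation(llm, task, mask_summary):
    """Run the staged chain, feeding each answer into the next stage's context."""
    context = ""
    for template in STAGES:
        prompt = context + template.format(task=task, mask_summary=mask_summary)
        answer = llm(prompt)  # any text-in/text-out LLM client
        context += f"{prompt}\n{answer}\n"
    return "ACCEPT" in answer, context  # verdict plus full reasoning trace
```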

Exploring SLC25A42 as a Radiogenomic Marker from the Perioperative Stage to Chemotherapy in Hepatitis-Related Hepatocellular Carcinoma.

Dou L, Jiang J, Yao H, Zhang B, Wang X

PubMed paper · Jun 2, 2025
Background: The molecular mechanisms driving hepatocellular carcinoma (HCC) and predicting its chemotherapy sensitivity remain unclear; identification of key biomarkers is therefore essential for early diagnosis and treatment of HCC. Method: We collected and processed computed tomography (CT) and clinical data from 116 patients with autoimmune hepatitis (AIH) and HCC treated at our hospital's Liver Cancer Center. We then identified and extracted important imaging features and correlated them with mitochondria-related genes using machine learning techniques, including multi-head attention networks, lasso regression, principal component analysis (PCA), and support vector machines (SVM). These genes were integrated into radiomics signature models to explore their role in disease progression. We further correlated these results with clinical variables to screen for driver genes and to evaluate the ability of key genes to predict chemotherapy sensitivity in liver cancer (LC) patients. Finally, qPCR was used to validate the expression of the candidate gene in patient samples. Results: Attention networks identified disease regions in medical images with 97% accuracy and an AUC of 94%. We extracted 942 imaging features and identified, via lasso regression, five key features that accurately differentiate AIH from HCC. Transcriptome analysis revealed 132 upregulated and 101 downregulated genes in AIH, with 45 significant genes identified by XGBoost. In the HCC analysis, PCA and random forest highlighted 11 key features. Among mitochondrial genes, SLC25A42 correlated positively with imaging features of normal tissue but negatively with those of cancerous tissue and was identified as a driver gene. Low SLC25A42 expression was associated with chemotherapy sensitivity in HCC patients. Conclusions: Machine learning modeling combined with genomic profiling provides a promising approach to identifying the driver gene SLC25A42 in LC, which may help improve diagnostic accuracy and predict chemotherapy sensitivity for this disease.
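
For readers unfamiliar with the lasso step, a minimal sketch of selecting a sparse feature subset from a radiomics matrix might look as follows; the data loading and feature names are placeholders, not the study's pipeline.

```python
# Minimal lasso feature-selection sketch, assuming a radiomics matrix X
# (n_patients x 942 features) and binary labels y (AIH=0, HCC=1).
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

def select_radiomics_features(X, y, feature_names):
    """Standardize features, fit cross-validated lasso, keep nonzero weights."""
    Xs = StandardScaler().fit_transform(X)
    lasso = LassoCV(cv=5, random_state=0).fit(Xs, y)
    keep = np.flatnonzero(lasso.coef_)  # indices of surviving features
    return [feature_names[i] for i in keep], lasso.coef_[keep]
```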

ViTU-Net: A hybrid deep learning model with patch-based LSB approach for medical image watermarking and authentication using a hybrid metaheuristic algorithm.

Nanammal V, Rajalakshmi S, Remya V, Ranjith S

PubMed paper · Jun 2, 2025
In modern healthcare, telemedicine, health records, and AI-driven diagnostics depend on medical image watermarking to secure chest X-rays for pneumonia diagnosis, ensuring data integrity, confidentiality, and authenticity. A 2024 study found that over 70% of healthcare institutions had faced medical image data breaches, yet current methods falter in imperceptibility, robustness against attacks, and deployment efficiency. ViTU-Net integrates cutting-edge techniques to address these multifaceted challenges in medical image security and analysis. The model's core component, the Vision Transformer (ViT) encoder, efficiently captures global dependencies and spatial information, while the U-Net decoder enhances image reconstruction; both components leverage the Adaptive Hierarchical Spatial Attention (AHSA) module for improved spatial processing. A patch-based LSB mechanism embeds reversible fragile watermarks within each patch of the segmented non-diagnostic region of non-interest (RONI), guided dynamically by adaptive masks derived from the attention mechanism; this minimizes the impact on diagnostic accuracy while making optimal use of spatial information. The hybrid metaheuristic optimization algorithm, TuniBee Fusion, dynamically optimizes watermarking parameters, striking a balance between exploration and exploitation and thereby enhancing watermarking efficiency and robustness. Advanced cryptographic techniques, including SHA-512 hashing and AES encryption, fortify the model's security, ensuring the authenticity and confidentiality of watermarked medical images. A PSNR of 60.7 dB, an NCC of 0.9999, and an SSIM of 1.00 underscore its effectiveness in preserving image quality, security, and diagnostic accuracy, and robustness analysis against a spectrum of attacks validates ViTU-Net's resilience in real-world scenarios.
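
A bare-bones sketch of the two security primitives named here, patch-wise LSB embedding and a SHA-512 fragile payload, is shown below; the patching scheme, RONI masking, and the ViT/U-Net attention guidance of the full model are not reproduced.

```python
# Sketch of LSB watermarking in a non-diagnostic patch; the patch selection
# and mask source are assumptions, not the paper's attention-guided pipeline.
import hashlib
import numpy as np

def embed_lsb(patch: np.ndarray, payload: bytes) -> np.ndarray:
    """Write payload bits into the least significant bit of a uint8 patch."""
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    flat = patch.flatten()  # copy; the input patch is left unmodified
    if bits.size > flat.size:
        raise ValueError("payload larger than patch capacity")
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits
    return flat.reshape(patch.shape)

def make_payload(diagnostic_region: np.ndarray) -> bytes:
    """Fragile watermark: a SHA-512 digest of the diagnostic region lets a
    verifier detect any post-embedding tampering."""
    return hashlib.sha512(diagnostic_region.tobytes()).digest()
```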

Synthetic Ultrasound Image Generation for Breast Cancer Diagnosis Using cVAE-WGAN Models: An Approach Based on Generative Artificial Intelligence

Mondillo, G., Masino, M., Colosimo, S., Perrotta, A., Frattolillo, V., Abbate, F. G.

medRxiv preprint · Jun 2, 2025
The scarcity and imbalance of medical image datasets hinder the development of robust computer-aided diagnosis (CAD) systems for breast cancer. This study explores the application of advanced generative models, based on generative artificial intelligence (GenAI), for the synthesis of digital breast ultrasound images. Using a hybrid Conditional Variational Autoencoder-Wasserstein Generative Adversarial Network (cVAE-WGAN) architecture, we developed a system to generate high-quality synthetic images conditioned on the class (malignant vs. normal/benign). These synthetic images, generated from the low-resolution BreastMNIST dataset and filtered for quality, were systematically integrated with real training data at different mixing ratios (W). The performance of a CNN classifier trained on these mixed datasets was evaluated against a baseline model trained only on real data balanced with SMOTE. The optimal integration (mixing weight W=0.25) produced a significant performance increase on the real test set: +8.17% in macro-average F1-score and +4.58% in accuracy compared to using real data alone. Analysis confirmed the originality of the generated samples. This approach offers a promising solution for overcoming data limitations in image-based breast cancer diagnostics, potentially improving the capabilities of CAD systems.
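
The mixing step itself is simple; below is a sketch under the assumption that synthetic images are drawn uniformly so that they make up a fraction W of the final training set (the paper's exact sampling protocol may differ).

```python
# Sketch of the real/synthetic mixing step; W=0.25 mirrors the reported
# optimum, but the sampling procedure here is an assumption.
import numpy as np

def mix_training_set(real_x, real_y, synth_x, synth_y, w=0.25, seed=0):
    """Build a shuffled training set whose synthetic fraction is w."""
    rng = np.random.default_rng(seed)
    # Solve n_synth / (n_real + n_synth) = w for the synthetic count.
    n_synth = int(w / (1.0 - w) * len(real_x))
    idx = rng.choice(len(synth_x), size=min(n_synth, len(synth_x)), replace=False)
    x = np.concatenate([real_x, synth_x[idx]])
    y = np.concatenate([real_y, synth_y[idx]])
    perm = rng.permutation(len(x))
    return x[perm], y[perm]
```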

Fine-tuned large language model for extracting newly identified acute brain infarcts based on computed tomography or magnetic resonance imaging reports.

Fujita N, Yasaka K, Kiryu S, Abe O

PubMed paper · Jun 2, 2025
This study aimed to develop an automated early warning system using a large language model (LLM) to identify acute to subacute brain infarction from free-text computed tomography (CT) or magnetic resonance imaging (MRI) radiology reports. In this retrospective study, 5,573, 1,883, and 834 patients were included in the training (mean age, 67.5 ± 17.2 years; 2,831 males), validation (mean age, 61.5 ± 18.3 years; 994 males), and test (mean age, 66.5 ± 16.1 years; 488 males) datasets. An LLM (Japanese Bidirectional Encoder Representations from Transformers model) was fine-tuned to classify the CT and MRI reports into three groups (group 0, newly identified acute to subacute infarction; group 1, known acute to subacute infarction or old infarction; group 2, without infarction). The training and validation processes were repeated 15 times, and the best-performing model on the validation dataset was selected to further evaluate its performance on the test dataset. The best fine-tuned model exhibited sensitivities of 0.891, 0.905, and 0.959 for groups 0, 1, and 2, respectively, in the test dataset. The macrosensitivity (the average of sensitivity for all groups) and accuracy were 0.918 and 0.923, respectively. The model's performance in extracting newly identified acute brain infarcts was high, with an area under the receiver operating characteristic curve of 0.979 (95% confidence interval, 0.956-1.000). The average prediction time was 0.115 ± 0.037 s per patient. A fine-tuned LLM could extract newly identified acute to subacute brain infarcts based on CT or MRI findings with high performance.
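
An inference-time sketch of such a three-group report classifier using the Hugging Face API is below; the checkpoint name is an assumption (any Japanese BERT fine-tuned with num_labels=3 would play this role), not the authors' released weights.

```python
# Sketch of three-group report classification; the base checkpoint is an
# assumed stand-in and must be fine-tuned on labeled reports before use.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = {0: "newly identified acute/subacute infarction",
          1: "known acute/subacute or old infarction",
          2: "no infarction"}

tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
model = AutoModelForSequenceClassification.from_pretrained(
    "cl-tohoku/bert-base-japanese", num_labels=3)

def classify_report(text: str) -> str:
    """Tokenize a free-text radiology report and return the predicted group."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]
```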

Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models.

Lian C, Zhou HY, Liang D, Qin J, Wang L

PubMed paper · Jun 2, 2025
Medical vision-language alignment through cross-modal contrastive learning shows promising performance in image-text matching tasks, such as retrieval and zero-shot classification. However, conventional cross-modal contrastive learning (CLIP-based) methods suffer from suboptimal visual representation capabilities, which also limits their effectiveness in vision-language alignment. In contrast, although the models pretrained via multimodal masked modeling struggle with direct cross-modal matching, they excel in visual representation. To address this contradiction, we propose ALTA (ALign Through Adapting), an efficient medical vision-language alignment method that utilizes only about 8% of the trainable parameters and less than 1/5 of the computational consumption required for masked record modeling. ALTA achieves superior performance in vision-language matching tasks like retrieval and zero-shot classification by adapting the pretrained vision model from masked record modeling. Additionally, we integrate temporal-multiview radiograph inputs to enhance the information consistency between radiographs and their corresponding descriptions in reports, further improving the vision-language alignment. Experimental evaluations show that ALTA outperforms the best-performing counterpart by over 4 absolute percentage points in text-to-image accuracy and approximately 6 absolute percentage points in image-to-text retrieval accuracy. The adaptation of vision-language models during efficient alignment also promotes better vision and language understanding. Code is publicly available at https://github.com/DopamineLcy/ALTA.
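
A minimal sketch of the parameter-efficient idea follows: small residual bottleneck adapters trained inside a frozen, masked-pretrained vision encoder. Layer placement and sizes here are assumptions; the authors' actual implementation is at the linked repository.

```python
# Bottleneck-adapter sketch in the spirit of ALTA; placement and dimensions
# are assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small residual bottleneck inserted into a frozen transformer block."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the pretrained representation path.
        return x + self.up(self.act(self.down(x)))

def freeze_backbone_train_adapters(model: nn.Module):
    """Only adapter weights receive gradients; the masked-pretrained encoder
    stays frozen, keeping the trainable-parameter fraction small."""
    for name, p in model.named_parameters():
        p.requires_grad = "adapter" in name
```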

Efficiency and Quality of Generative AI-Assisted Radiograph Reporting.

Huang J, Wittbrodt MT, Teague CN, Karl E, Galal G, Thompson M, Chapa A, Chiu ML, Herynk B, Linchangco R, Serhal A, Heller JA, Abboud SF, Etemadi M

PubMed paper · Jun 2, 2025
Diagnostic imaging interpretation involves distilling multimodal clinical information into text form, a task well-suited to augmentation by generative artificial intelligence (AI). However, to our knowledge, impacts of AI-based draft radiological reporting remain unstudied in clinical settings. To prospectively evaluate the association of radiologist use of a workflow-integrated generative model capable of providing draft radiological reports for plain radiographs across a tertiary health care system with documentation efficiency, the clinical accuracy and textual quality of final radiologist reports, and the model's potential for detecting unexpected, clinically significant pneumothorax. This prospective cohort study was conducted from November 15, 2023, to April 24, 2024, at a tertiary care academic health system. The association between use of the generative model and radiologist documentation efficiency was evaluated for radiographs documented with model assistance compared with a baseline set of radiographs without model use, matched by study type (chest or nonchest). Peer review was performed on model-assisted interpretations. Flagging of pneumothorax requiring intervention was performed on radiographs prospectively. The primary outcomes were association of use of the generative model with radiologist documentation efficiency, assessed by difference in documentation time with and without model use using a linear mixed-effects model; for peer review of model-assisted reports, the difference in Likert-scale ratings using a cumulative-link mixed model; and for flagging pneumothorax requiring intervention, sensitivity and specificity. A total of 23,960 radiographs (11,980 each with and without model use) were used to analyze documentation efficiency. Interpretations with model assistance (mean [SE], 159.8 [27.0] seconds) were faster than the baseline set of those without (mean [SE], 189.2 [36.2] seconds) (P = .02), representing a 15.5% documentation efficiency increase. Peer review of 800 studies showed no difference in clinical accuracy (χ² = 0.68; P = .41) or textual quality (χ² = 3.62; P = .06) between model-assisted interpretations and nonmodel interpretations. Moreover, the model flagged studies containing a clinically significant, unexpected pneumothorax with a sensitivity of 72.7% and specificity of 99.9% among 97,651 studies screened. In this prospective cohort study of clinical use of a generative model for draft radiological reporting, model use was associated with improved radiologist documentation efficiency while maintaining clinical quality and demonstrated potential to detect studies containing a pneumothorax requiring immediate intervention. This study suggests the potential for radiologist and generative AI collaboration to improve clinical care delivery.
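
As a sanity check, the reported 15.5% efficiency gain follows directly from the two mean documentation times:

```python
# Quick check of the reported efficiency figure from the abstract's means.
baseline, assisted = 189.2, 159.8        # mean seconds per report
gain = (baseline - assisted) / baseline  # (189.2 - 159.8) / 189.2
print(f"{gain:.1%}")                     # -> 15.5%
```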

Multi-modal large language models in radiology: principles, applications, and potential.

Shen Y, Xu Y, Ma J, Rui W, Zhao C, Heacock L, Huang C

PubMed paper · Jun 1, 2025
Large language models (LLMs) and multi-modal large language models (MLLMs) represent the cutting-edge in artificial intelligence. This review provides a comprehensive overview of their capabilities and potential impact on radiology. Unlike most existing literature reviews focusing solely on LLMs, this work examines both LLMs and MLLMs, highlighting their potential to support radiology workflows such as report generation, image interpretation, EHR summarization, differential diagnosis generation, and patient education. By streamlining these tasks, LLMs and MLLMs could reduce radiologist workload, improve diagnostic accuracy, support interdisciplinary collaboration, and ultimately enhance patient care. We also discuss key limitations, such as the limited capacity of current MLLMs to interpret 3D medical images and to integrate information from both image and text data, as well as the lack of effective evaluation methods. Ongoing efforts to address these challenges are introduced.

Deep Learning in Digital Breast Tomosynthesis: Current Status, Challenges, and Future Trends.

Wang R, Chen F, Chen H, Lin C, Shuai J, Wu Y, Ma L, Hu X, Wu M, Wang J, Zhao Q, Shuai J, Pan J

PubMed paper · Jun 1, 2025
The high-resolution three-dimensional (3D) images generated with digital breast tomosynthesis (DBT) in the screening of breast cancer offer new possibilities for early disease diagnosis. Early detection is especially important as the incidence of breast cancer increases. However, DBT also presents challenges in terms of poorer results for dense breasts, increased false positive rates, slightly higher radiation doses, and increased reading times. Deep learning (DL) has been shown to effectively increase the processing efficiency and diagnostic accuracy of DBT images. This article reviews the application and outlook of DL in DBT-based breast cancer screening. First, the fundamentals and challenges of DBT technology are introduced. The applications of DL in DBT are then grouped into three categories: diagnostic classification of breast diseases, lesion segmentation and detection, and medical image generation. Additionally, the current public databases for mammography are summarized in detail. Finally, this paper analyzes the main challenges in the application of DL techniques in DBT, such as the lack of public datasets and model training issues, and proposes possible directions for future research, including large language models, multisource domain transfer, and data augmentation, to encourage innovative applications of DL in medical imaging.