Applications of Small Language Models in Medical Imaging Classification with a Focus on Prompt Strategies

Yiting Wang, Ziwei Wang, Jiachen Zhong, Di Zhu, Weiyi Li

arXiv preprint · Aug 18, 2025
Large language models (LLMs) have shown remarkable capabilities in natural language processing and multi-modal understanding. However, their high computational cost, limited accessibility, and data privacy concerns hinder their adoption in resource-constrained healthcare environments. This study investigates the performance of small language models (SLMs) in a medical imaging classification task, comparing different models and prompt designs to identify the optimal combination for accuracy and usability. Using the NIH Chest X-ray dataset, we evaluate multiple SLMs on the task of classifying chest X-ray positions (anteroposterior [AP] vs. posteroanterior [PA]) under three prompt strategies: baseline instruction, incremental summary prompts, and correction-based reflective prompts. Our results show that certain SLMs achieve competitive accuracy with well-crafted prompts, suggesting that prompt engineering can substantially enhance SLM performance in healthcare applications without requiring deep AI expertise from end users.
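The three prompt strategies are only named in the abstract; the sketch below illustrates what each might look like in practice. The template wording, the build_prompt helper, and the findings field are assumptions, not the study's actual prompts.

```python
# Illustrative templates for the three prompt strategies named in the abstract.
# The exact wording used in the study is not given, so these are assumptions.

BASELINE = (
    "Classify the chest X-ray view position as AP (anteroposterior) or "
    "PA (posteroanterior). Answer with exactly one token: AP or PA.\n\n"
    "Image findings: {findings}"
)

INCREMENTAL_SUMMARY = (
    "Step 1: Summarize the visible anatomical cues (scapula position, "
    "apparent heart size, clavicle angle).\n"
    "Step 2: Based only on your summary, decide AP or PA.\n\n"
    "Image findings: {findings}"
)

CORRECTION_REFLECTIVE = (
    "Give a provisional answer (AP or PA) with one sentence of reasoning. "
    "Then critique your own reasoning and state a final, possibly corrected, "
    "answer on the last line as 'FINAL: AP' or 'FINAL: PA'.\n\n"
    "Image findings: {findings}"
)

def build_prompt(strategy: str, findings: str) -> str:
    """Fill a prompt template for a single chest X-ray sample."""
    templates = {
        "baseline": BASELINE,
        "incremental": INCREMENTAL_SUMMARY,
        "reflective": CORRECTION_REFLECTIVE,
    }
    return templates[strategy].format(findings=findings)

if __name__ == "__main__":
    print(build_prompt("reflective", "scapulae projected over the lung fields"))
```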

3D Cardiac Anatomy Generation Using Mesh Latent Diffusion Models

Jolanta Mozyrska, Marcel Beetz, Luke Melas-Kyriazi, Vicente Grau, Abhirup Banerjee, Alfonso Bueno-Orovio

arXiv preprint · Aug 18, 2025
Diffusion models have recently gained immense interest for their generative capabilities, specifically the high quality and diversity of the synthesized data. However, examples of their applications in 3D medical imaging are still scarce, especially in cardiology. Generating diverse realistic cardiac anatomies is crucial for applications such as in silico trials, electromechanical computer simulations, or data augmentations for machine learning models. In this work, we investigate the application of Latent Diffusion Models (LDMs) for generating 3D meshes of human cardiac anatomies. To this end, we propose a novel LDM architecture -- MeshLDM. We apply the proposed model on a dataset of 3D meshes of left ventricular cardiac anatomies from patients with acute myocardial infarction and evaluate its performance in terms of both qualitative and quantitative clinical and 3D mesh reconstruction metrics. The proposed MeshLDM successfully captures characteristics of the cardiac shapes at end-diastolic (relaxation) and end-systolic (contraction) cardiac phases, generating meshes with a 2.4% difference in population mean compared to the gold standard.
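As a rough illustration of the latent-diffusion idea behind MeshLDM, the sketch below runs one DDPM-style noise-prediction training step on autoencoded mesh vertices. The mesh size, encoder/decoder shapes, and noise schedule are placeholder assumptions; the paper's actual architecture is not specified here.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a fixed-topology LV mesh with V vertices (3D points),
# compressed to a latent vector before diffusion.
V, LATENT, T = 1024, 128, 1000

encoder = nn.Sequential(nn.Flatten(), nn.Linear(V * 3, LATENT))
decoder = nn.Sequential(nn.Linear(LATENT, V * 3), nn.Unflatten(1, (V, 3)))  # used at sampling time
denoiser = nn.Sequential(nn.Linear(LATENT + 1, 256), nn.SiLU(), nn.Linear(256, LATENT))

betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(meshes: torch.Tensor) -> torch.Tensor:
    """One DDPM training step in mesh latent space (standard noise-prediction loss)."""
    z = encoder(meshes)                                  # (B, LATENT)
    t = torch.randint(0, T, (z.size(0),))
    noise = torch.randn_like(z)
    ab = alphas_bar[t].unsqueeze(1)
    z_t = ab.sqrt() * z + (1 - ab).sqrt() * noise        # forward noising
    t_emb = (t.float() / T).unsqueeze(1)                 # crude timestep embedding
    pred = denoiser(torch.cat([z_t, t_emb], dim=1))
    return nn.functional.mse_loss(pred, noise)

batch = torch.randn(8, V, 3)        # stand-in for registered LV meshes
print(diffusion_loss(batch).item())
```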

Multimodal large language models for medical image diagnosis: Challenges and opportunities.

Zhang A, Zhao E, Wang R, Zhang X, Wang J, Chen E

PubMed · Aug 18, 2025
The integration of artificial intelligence (AI) into radiology has significantly improved diagnostic accuracy and workflow efficiency. Multimodal large language models (MLLMs), which combine natural language processing (NLP) and computer vision techniques, hold the potential to further revolutionize medical image analysis. Despite these advances, the widespread clinical adoption of MLLMs remains limited by challenges such as data quality, interpretability, ethical and regulatory compliance (including adherence to frameworks like the General Data Protection Regulation, GDPR), computational demands, and generalizability across diverse patient populations. Addressing these interconnected challenges presents opportunities to enhance MLLM performance and reliability. Priorities for future research include improving model transparency, safeguarding data privacy through federated learning, optimizing multimodal fusion strategies, and establishing standardized evaluation frameworks. By overcoming these barriers, MLLMs can become essential tools in radiology, supporting clinical decision-making and improving patient outcomes.
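Since the abstract singles out federated learning as a privacy-preserving direction, here is a minimal FedAvg-style sketch, assuming a toy linear model and synthetic per-site data; it is not drawn from the paper.

```python
import numpy as np

# Minimal federated averaging (FedAvg) sketch: each hospital updates a shared
# linear model on local data; only weights, not images or reports, leave the site.

def local_step(w, X, y, lr=0.01):
    """One gradient step of least-squares regression on a site's private data."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

rng = np.random.default_rng(0)
sites = [(rng.normal(size=(50, 5)), rng.normal(size=50)) for _ in range(3)]
w_global = np.zeros(5)

for _ in range(10):
    local_ws = [local_step(w_global, X, y) for X, y in sites]
    # Server aggregates: sample-size-weighted average of the site models.
    sizes = np.array([len(y) for _, y in sites])
    w_global = np.average(local_ws, axis=0, weights=sizes)

print(w_global)
```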

Multiphysics modelling enhanced by imaging and artificial intelligence for personalised cancer nanomedicine: Foundations for clinical digital twins.

Kashkooli FM, Bhandari A, Gu B, Kolios MC, Kohandel M, Zhan W

PubMed · Aug 18, 2025
Nano-sized drug delivery systems have emerged as a more effective and versatile means of improving cancer treatment. However, drug delivery to cancer involves intricate interactions between physiological and physicochemical processes across various temporal and spatial scales. Relying solely on experimental methods to develop and clinically translate nano-sized drug delivery systems is economically unfeasible. Multiphysics models, acting as open systems, offer a viable approach by allowing control over the individual and combined effects of various influencing factors on drug delivery outcomes. This provides an effective pathway for developing, optimising, and applying nano-sized drug delivery systems. These models are specifically designed to uncover the underlying mechanisms of drug delivery and to optimise effective delivery strategies. This review outlines the diverse applications of multiphysics simulations in advancing nano-sized drug delivery systems for cancer treatment. The methods to develop these models and the integration of emerging technologies (i.e., medical imaging and artificial intelligence) are also addressed, working towards digital twins for personalised clinical translation of cancer nanomedicine. Multiphysics modelling tools are expected to become a powerful technology, expanding the scope of nano-sized drug delivery systems, thereby greatly enhancing cancer treatment outcomes and offering promising prospects for more effective patient care.
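As a concrete flavour of the single-physics building blocks such models couple together, here is a toy 1D diffusion-elimination solver for drug transport in tissue; the equation, parameters, and boundary conditions are illustrative assumptions, not taken from the review.

```python
import numpy as np

# Toy building block: 1D drug diffusion with first-order elimination in tissue,
# solved by explicit finite differences. Real multiphysics models couple many
# such equations (flow, transport, binding) across scales; values here are made up.

D, k_elim = 1e-9, 1e-4          # diffusivity (m^2/s), elimination rate (1/s)
L, N = 1e-3, 101                # 1 mm tissue depth, grid points
dx = L / (N - 1)
dt = 0.4 * dx**2 / D            # stable explicit time step

c = np.zeros(N)
c[0] = 1.0                      # normalized drug concentration at the vessel wall

for _ in range(20000):
    lap = (c[2:] - 2 * c[1:-1] + c[:-2]) / dx**2
    c[1:-1] += dt * (D * lap - k_elim * c[1:-1])
    c[0], c[-1] = 1.0, c[-2]    # fixed source; zero-flux at the far boundary

print(f"concentration at mid-depth: {c[N // 2]:.3f}")
```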

X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning

Chee Ng, Liliang Sun, Shaoqing Tang

arXiv preprint · Aug 17, 2025
Chest X-ray imaging is crucial for diagnosing pulmonary and cardiac diseases, yet its interpretation demands extensive clinical experience and suffers from inter-observer variability. While deep learning models offer high diagnostic accuracy, their black-box nature hinders clinical adoption in high-stakes medical settings. To address this, we propose X-Ray-CoT (Chest X-Ray Chain-of-Thought), a novel framework leveraging Large Vision-Language Models (LVLMs) for intelligent chest X-ray diagnosis and interpretable report generation. X-Ray-CoT simulates human radiologists' "chain-of-thought" by first extracting multi-modal features and visual concepts, then employing an LLM-based component with a structured Chain-of-Thought prompting strategy to reason and produce detailed natural language diagnostic reports. Evaluated on the CORDA dataset, X-Ray-CoT achieves competitive quantitative performance, with a Balanced Accuracy of 80.52% and F1 score of 78.65% for disease diagnosis, slightly surpassing existing black-box models. Crucially, it uniquely generates high-quality, explainable reports, as validated by preliminary human evaluations. Our ablation studies confirm the integral role of each proposed component, highlighting the necessity of multi-modal fusion and CoT reasoning for robust and transparent medical AI. This work represents a significant step towards trustworthy and clinically actionable AI systems in medical imaging.
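The abstract does not give the prompt text, so the following is a hedged sketch of what a structured chain-of-thought report prompt could look like; the concept names and section labels are invented for illustration.

```python
# Illustrative structured chain-of-thought prompt in the spirit of X-Ray-CoT;
# the paper's exact concept vocabulary and prompt wording are not given.

COT_TEMPLATE = """You are drafting a chest X-ray report.
Detected visual concepts: {concepts}

Reason step by step:
1. Findings: describe each concept's location and severity.
2. Differential: list diagnoses consistent with the findings.
3. Impression: state the most likely diagnosis and confidence.

Return sections labeled FINDINGS, DIFFERENTIAL, IMPRESSION."""

def make_report_prompt(concepts: list[str]) -> str:
    """Assemble the report-generation prompt from extracted visual concepts."""
    return COT_TEMPLATE.format(concepts=", ".join(concepts))

print(make_report_prompt(["right lower lobe opacity", "blunted costophrenic angle"]))
```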

Impact of Clinical Image Quality on Efficient Foundation Model Finetuning

Yucheng Tang, Pawel Rajwa, Alexander Ng, Yipei Wang, Wen Yan, Natasha Thorley, Aqua Asif, Clare Allen, Louise Dickinson, Francesco Giganti, Shonit Punwani, Daniel C. Alexander, Veeru Kasivisvanathan, Yipeng Hu

arXiv preprint · Aug 16, 2025
Foundation models in medical imaging have shown promising label efficiency, achieving high downstream performance with only a fraction of annotated data. Here, we evaluate this in prostate multiparametric MRI using ProFound, a domain-specific vision foundation model pretrained on large-scale prostate MRI datasets. We investigate how variable image quality affects label-efficient finetuning by measuring the generalisability of finetuned models. Experiments systematically vary high-/low-quality image ratios in finetuning and evaluation sets. Our findings indicate that image quality distribution and its finetune-and-test mismatch significantly affect model performance. In particular: a) Varying the ratio of high- to low-quality images between finetuning and test sets leads to notable differences in downstream performance; and b) The presence of sufficient high-quality images in the finetuning set is critical for maintaining strong performance, whilst the importance of matched finetuning and testing distributions varies between downstream tasks, such as automated radiology reporting and prostate cancer detection. When quality ratios are consistent, finetuning needs far less labeled data than training from scratch, but label efficiency depends on image quality distribution. Without enough high-quality finetuning data, pretrained models may fail to outperform those trained without pretraining. This highlights the importance of assessing and aligning quality distributions between finetuning and deployment, and the need for quality standards in finetuning data for specific downstream tasks. Using ProFound, we show the value of quantifying image quality in both finetuning and deployment to fully realise the data and compute efficiency benefits of foundation models.
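A minimal sketch of the study's core manipulation, varying the high-/low-quality ratio independently in finetuning and test splits, might look like the following; the quality flag, pool, and split sizes are hypothetical stand-ins.

```python
import random

# Construct finetuning and test splits with controlled high-/low-quality ratios.
# Dataset fields and the quality flag are hypothetical placeholders.

def sample_split(pool, high_ratio, n, rng):
    """Draw n cases with a target fraction of high-quality images."""
    high = [c for c in pool if c["high_quality"]]
    low = [c for c in pool if not c["high_quality"]]
    n_high = round(n * high_ratio)
    return rng.sample(high, n_high) + rng.sample(low, n - n_high)

rng = random.Random(0)
pool = [{"id": i, "high_quality": i % 3 != 0} for i in range(600)]

# Vary the ratio independently in finetuning and test sets, as in the study design.
for ft_ratio in (0.25, 0.5, 0.75):
    for test_ratio in (0.25, 0.75):
        ft = sample_split(pool, ft_ratio, 200, rng)
        held_out = [c for c in pool if c not in ft]     # keep splits disjoint
        test = sample_split(held_out, test_ratio, 100, rng)
        print(f"finetune {ft_ratio:.0%} high / test {test_ratio:.0%} high: "
              f"{len(ft)} train, {len(test)} test")
```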

Q-FSRU: Quantum-Augmented Frequency-Spectral Fusion for Medical Visual Question Answering

Rakesh Thakur, Yusra Tariq

arXiv preprint · Aug 16, 2025
Solving tough clinical questions that require both image and text understanding is still a major challenge in healthcare AI. In this work, we propose Q-FSRU, a new model that combines Frequency Spectrum Representation and Fusion (FSRU) with a method called Quantum Retrieval-Augmented Generation (Quantum RAG) for medical Visual Question Answering (VQA). The model takes in features from medical images and related text, then shifts them into the frequency domain using Fast Fourier Transform (FFT). This helps it focus on more meaningful data and filter out noise or less useful information. To improve accuracy and ensure that answers are based on real knowledge, we add a quantum-inspired retrieval system. It fetches useful medical facts from external sources using quantum-based similarity techniques. These details are then merged with the frequency-based features for stronger reasoning. We evaluated our model using the VQA-RAD dataset, which includes real radiology images and questions. The results showed that Q-FSRU outperforms earlier models, especially on complex cases needing image-text reasoning. The mix of frequency and quantum information improves both performance and explainability. Overall, this approach offers a promising way to build smart, clear, and helpful AI tools for doctors.
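The frequency-domain filtering step can be sketched directly: FFT the fused features, keep the dominant spectral components, and invert. The cutoff fraction and feature dimensions below are illustrative, not the paper's values.

```python
import numpy as np

# Sketch of the frequency-domain filtering idea behind Q-FSRU: transform fused
# image-text features with an FFT, keep the strongest components, and invert.

def spectral_filter(features: np.ndarray, keep: float = 0.25) -> np.ndarray:
    """Zero out all but the highest-magnitude `keep` fraction of frequencies."""
    spec = np.fft.fft(features, axis=-1)
    mag = np.abs(spec)
    thresh = np.quantile(mag, 1.0 - keep, axis=-1, keepdims=True)
    spec_filtered = np.where(mag >= thresh, spec, 0.0)
    return np.fft.ifft(spec_filtered, axis=-1).real

fused = np.random.default_rng(0).normal(size=(4, 256))   # stand-in fused features
clean = spectral_filter(fused)
print(clean.shape, f"energy kept: {np.linalg.norm(clean) / np.linalg.norm(fused):.2f}")
```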

Developing biomarkers and methods of risk stratification: Consensus statements from the International Kidney Cancer Symposium North America 2024 Think Tank.

Shapiro DD, Abel EJ, Albiges L, Battle D, Berg SA, Campbell MT, Cella D, Coleman K, Garmezy B, Geynisman DM, Hall T, Henske EP, Jonasch E, Karam JA, La Rosa S, Leibovich BC, Maranchie JK, Master VA, Maughan BL, McGregor BA, Msaouel P, Pal SK, Perez J, Plimack ER, Psutka SP, Riaz IB, Rini BI, Shuch B, Simon MC, Singer EA, Smith A, Staehler M, Tang C, Tannir NM, Vaishampayan U, Voss MH, Zakharia Y, Zhang Q, Zhang T, Carlo MI

PubMed · Aug 16, 2025
Accurate prognostication and personalized treatment selection remain major challenges in kidney cancer. This consensus initiative aimed to provide actionable expert guidance on the development and clinical integration of prognostic and predictive biomarkers and risk stratification tools to improve patient care and guide future research. A modified Delphi method was employed to develop consensus statements among a multidisciplinary panel of experts in urologic oncology, medical oncology, radiation oncology, pathology, molecular biology, radiology, outcomes research, biostatistics, industry, and patient advocacy. Over 3 rounds, including an in-person meeting, 20 initial statements were evaluated, refined, and voted on. Consensus was defined a priori as a median Likert score ≥8. Nineteen final consensus statements were endorsed. These span key domains including biomarker prioritization (favoring prognostic biomarkers), rigorous methodology for subgroup and predictive analyses, the development of multi-institutional prospective registries, incorporation of biomarkers in trial design, and improvements in data/biospecimen access. The panel also identified high-priority biomarker types (e.g., AI-based image analysis, ctDNA) for future research. This is the first consensus statement specifically focused on biomarker and risk model development for kidney cancer using a structured Delphi process. The recommendations emphasize the need for rigorous methodology, collaborative infrastructure, prospective data collection, and a focus on clinically translatable biomarkers. The resulting framework is intended to guide researchers, cooperative groups, and stakeholders in advancing personalized care for patients with kidney cancer.
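The consensus rule itself is simple enough to state in code; the sketch below applies the median Likert ≥8 criterion to made-up panel votes.

```python
import statistics

# The abstract's a priori consensus rule: a statement is endorsed when the
# median Likert rating is >= 8. The statements and votes below are made up.

def reaches_consensus(votes: list[int], threshold: int = 8) -> bool:
    return statistics.median(votes) >= threshold

panel_votes = {
    "prioritize prognostic biomarkers": [9, 8, 8, 9, 7, 8, 9],
    "require subgroup pre-registration": [7, 8, 6, 7, 8, 7, 9],
}
for statement, votes in panel_votes.items():
    print(statement, "->", "endorsed" if reaches_consensus(votes) else "revise")
```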

VariMix: A variety-guided data mixing framework for explainable medical image classifications.

Xiong X, Sun Y, Liu X, Ke W, Lam CT, Gao Q, Tong T, Li S, Tan T

PubMed · Aug 16, 2025
Modern deep neural networks are highly over-parameterized, necessitating the use of data augmentation techniques to prevent overfitting and enhance generalization. Generative adversarial networks (GANs) are popular for synthesizing visually realistic images. However, these synthetic images often lack diversity and may have ambiguous class labels. Recent data mixing strategies address some of these issues by mixing image labels based on salient regions. Since the main diagnostic information is not always contained within the salient regions, we aim to address the resulting label mismatches in medical image classifications. We propose a variety-guided data mixing framework (VariMix), which exploits an absolute difference map (ADM) to address the label mismatch problems of mixed medical images. VariMix generates the ADM using an image-to-image (I2I) GAN across multiple classes and allows for bidirectional mixing operations between the training samples. The proposed VariMix achieves the highest accuracy of 99.30% and 94.60% with a SwinT V2 classifier on a Chest X-ray (CXR) dataset and a Retinal dataset, respectively. It also achieves the highest accuracy of 87.73%, 99.28%, 95.13%, and 95.81% with a ConvNeXt classifier on a Breast Ultrasound (US) dataset, a CXR dataset, a Retinal dataset, and a Maternal-Fetal US dataset, respectively. Furthermore, the medical expert evaluation of generated images shows the great potential of our proposed I2I GAN in improving the accuracy of medical image classifications. Extensive experiments demonstrate the superiority of VariMix compared with the existing GAN- and Mixup-based methods on four public datasets using Swin Transformer V2 and ConvNeXt architectures. In addition, by projecting the source image to the hyperplanes of the classifiers, the proposed I2I GAN can generate hyperplane difference maps between the source image and the hyperplane image, demonstrating its ability to interpret medical image classifications. The source code is available at https://github.com/yXiangXiong/VariMix.
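A hedged sketch of the ADM-guided mixing idea, with a placeholder standing in for the trained I2I GAN and an assumed mixing rule (per-pixel weights from the normalized difference map, label weights from the mean mixed area):

```python
import numpy as np

# Sketch of ADM-guided mixing as described in the abstract: an I2I GAN translates
# a source image toward a target class, the absolute difference map highlights
# class-relevant regions, and that map guides image/label mixing. The GAN itself
# and the exact mixing rule are stand-ins, not the paper's implementation.

def fake_i2i_gan(img: np.ndarray) -> np.ndarray:
    """Placeholder for the trained I2I generator (returns a perturbed image)."""
    return np.clip(img + 0.3 * np.random.default_rng(1).normal(size=img.shape), 0, 1)

def varimix_pair(src, tgt, src_label, tgt_label):
    translated = fake_i2i_gan(src)
    adm = np.abs(translated - src)                 # absolute difference map
    w = adm / (adm.max() + 1e-8)                   # per-pixel mixing weight in [0, 1]
    mixed = (1 - w) * src + w * tgt
    lam = float(w.mean())                          # label weight follows mixed area
    mixed_label = (1 - lam) * src_label + lam * tgt_label
    return mixed, mixed_label

rng = np.random.default_rng(0)
src, tgt = rng.random((64, 64)), rng.random((64, 64))
img, label = varimix_pair(src, tgt, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(img.shape, label)
```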

Aphasia severity prediction using a multi-modal machine learning approach.

Hu X, Varkanitsa M, Kropp E, Betke M, Ishwar P, Kiran S

PubMed · Aug 15, 2025
The present study examined an integrated multimodal neuroimaging approach (T1-weighted structural imaging, diffusion tensor imaging (DTI), and resting-state fMRI (rs-fMRI)) to predict aphasia severity, measured by the Western Aphasia Battery-Revised Aphasia Quotient (WAB-R AQ), in 76 individuals with post-stroke aphasia. We employed Support Vector Regression (SVR) and Random Forest (RF) models with supervised feature selection and a stacked feature prediction approach. The SVR model outperformed RF, achieving an average root mean square error (RMSE) of 16.38±5.57, Pearson's correlation coefficient (r) of 0.70±0.13, and mean absolute error (MAE) of 12.67±3.27, compared to RF's RMSE of 18.41±4.34, r of 0.66±0.15, and MAE of 14.64±3.04. Resting-state neural activity and structural integrity emerged as crucial predictors of aphasia severity, appearing in the top 20% of predictor combinations for both SVR and RF. Finally, the feature selection method revealed that functional connectivity in both hemispheres and between homologous language areas is critical for predicting language outcomes in patients with aphasia. The statistically significant difference in performance between the model using only a single modality and the optimal multi-modal SVR/RF model (which included both resting-state connectivity and structural information) underscores that aphasia severity is influenced by factors beyond lesion location and volume. These findings suggest that integrating multiple neuroimaging modalities enhances the prediction of language outcomes in aphasia beyond lesion characteristics alone, offering insights that could inform personalized rehabilitation strategies.
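The regression-and-metrics pipeline implied by the abstract is standard; a sketch with synthetic stand-ins for the multimodal features and WAB-R AQ scores might look like this:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Sketch of the evaluation pipeline: SVR on pooled multimodal features, scored
# with RMSE, Pearson's r, and MAE. Synthetic data stands in for the structural,
# DTI, and rs-fMRI features and the WAB-R AQ scores (0-100).

rng = np.random.default_rng(0)
X = rng.normal(size=(76, 40))                 # 76 patients, pooled multimodal features
y = 50 + X[:, :5].sum(axis=1) * 4 + rng.normal(scale=8, size=76)   # mock AQ scores

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
pred = cross_val_predict(model, X, y, cv=5)   # out-of-fold predictions

rmse = mean_squared_error(y, pred) ** 0.5
r, _ = pearsonr(y, pred)
mae = mean_absolute_error(y, pred)
print(f"RMSE={rmse:.2f}  r={r:.2f}  MAE={mae:.2f}")
```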