Page 7 of 58574 results

A Multi-Agent System for Complex Reasoning in Radiology Visual Question Answering

Ziruo Yi, Jinyu Liu, Ting Xiao, Mark V. Albert

arXiv preprint, Aug 4 2025
Radiology visual question answering (RVQA) provides precise answers to questions about chest X-ray images, alleviating radiologists' workload. While recent methods based on multimodal large language models (MLLMs) and retrieval-augmented generation (RAG) have shown promising progress in RVQA, they still face challenges in factual accuracy, hallucinations, and cross-modal misalignment. We introduce a multi-agent system (MAS) designed to support complex reasoning in RVQA, with specialized agents for context understanding, multimodal reasoning, and answer validation. We evaluate our system on a challenging RVQA set curated via model disagreement filtering, comprising consistently hard cases across multiple MLLMs. Extensive experiments demonstrate the superiority and effectiveness of our system over strong MLLM baselines, with a case study illustrating its reliability and interpretability. This work highlights the potential of multi-agent approaches to support explainable and trustworthy clinical AI applications that require complex reasoning.
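
The abstract describes the evaluation set as curated through model disagreement filtering that keeps cases several MLLMs consistently get wrong. A minimal sketch of one way such a subset could be selected is shown below. It assumes that a question counts as "hard" when at most a chosen fraction of the models answer it correctly. The dictionary layout, the `filter_hard_cases` name, the threshold and the toy answers are all hypothetical, and the authors' actual criterion may differ.

```python
def filter_hard_cases(gold, predictions, max_correct_frac=0.0):
    """Return IDs of questions that few of the models answer correctly.

    gold: question ID -> reference answer
    predictions: model name -> (question ID -> predicted answer)
    A question is kept when the fraction of models answering it correctly
    is at most max_correct_frac (a hypothetical filtering criterion).
    """
    hard = []
    for qid, answer in gold.items():
        answers = [preds.get(qid) for preds in predictions.values()]
        frac_correct = sum(a == answer for a in answers) / len(answers)
        if frac_correct <= max_correct_frac:
            hard.append(qid)
    return hard


# Toy data: three hypothetical models answering three questions.
gold = {"q1": "yes", "q2": "left", "q3": "no"}
predictions = {
    "model_a": {"q1": "yes", "q2": "right", "q3": "yes"},
    "model_b": {"q1": "yes", "q2": "bilateral", "q3": "no"},
    "model_c": {"q1": "no", "q2": "right", "q3": "yes"},
}
print(filter_hard_cases(gold, predictions))                         # ['q2']
print(filter_hard_cases(gold, predictions, max_correct_frac=0.34))  # ['q2', 'q3']
```

Raising `max_correct_frac` trades a stricter "every model fails" subset for a larger one in which most models fail.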

Development and Validation of an Explainable MRI-Based Habitat Radiomics Model for Predicting p53-Abnormal Endometrial Cancer: A Multicentre Feasibility Study.

Jin W, Zhang H, Ning Y, Chen X, Zhang G, Li H, Zhang H

PubMed, Aug 4 2025
We developed an MRI-based habitat radiomics model (HRM) to predict p53-abnormal (p53abn) molecular subtypes of endometrial cancer (EC). Patients with pathologically confirmed EC were retrospectively enrolled from three hospitals and categorized into a training cohort (n = 270), test cohort 1 (n = 70), and test cohort 2 (n = 154). The tumour was divided into habitat sub-regions by applying the K-means algorithm to diffusion-weighted imaging (DWI) and contrast-enhanced (CE) images. Radiomics features were extracted from T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), DWI, and CE images. Three machine learning classifiers (logistic regression, support vector machines, and random forests) were applied to develop predictive models for p53abn EC. Model performance was validated using receiver operating characteristic (ROC) curves, and the model with the best predictive performance was selected as the HRM. A whole-region radiomics model (WRM) was also constructed, and a clinical model (CM) with five clinical features was developed. The SHapley Additive exPlanations (SHAP) method was used to explain the outputs of the models. DeLong's test was used to compare model performance across the cohorts. A total of 1920 habitat radiomics features were considered. Eight features were selected for the HRM, ten for the WRM, and three clinical features for the CM. The HRM achieved the highest AUC: 0.855 (training), 0.769 (test1), and 0.766 (test2). The AUCs of the WRM were 0.707 (training), 0.703 (test1), and 0.738 (test2). The AUCs of the CM were 0.709 (training), 0.641 (test1), and 0.665 (test2). The MRI-based HRM successfully predicted p53abn EC. The results indicate that habitat analysis combined with machine learning, radiomics, and SHAP can effectively predict p53abn EC, providing clinicians with intuitive insight into, and interpretability of, the impact of risk factors in the model.
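
The habitat step above clusters tumour voxels with K-means using DWI and CE information. The sketch below shows plain K-means (Lloyd's algorithm) over per-voxel (DWI, CE) intensity pairs. Several choices here are assumptions and are not stated in the abstract: z-scoring each feature, using deterministic farthest-point seeding, setting k = 2, and the made-up voxel values. It also assumes co-registered images flattened to voxel lists, and the authors may have clustered at the patient or population level with different preprocessing.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardise each feature column (e.g. DWI and CE signal) to zero mean, unit variance."""
    cols = list(zip(*rows))
    stats = [(statistics.fmean(c), statistics.pstdev(c) or 1.0) for c in cols]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def farthest_point_init(points, k):
    """Deterministic seeding: first point, then repeatedly the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid re-estimation."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = tuple(statistics.fmean(dim) for dim in zip(*members))
    return labels, centroids


# Toy tumour voxels as (DWI signal, CE signal) pairs; values are made up.
voxels = [(800, 120), (820, 110), (810, 125), (1500, 40), (1520, 35), (1490, 45)]
habitats, _ = kmeans(zscore_columns(voxels), k=2)
print(habitats)  # [0, 0, 0, 1, 1, 1]
```

Each label marks a habitat sub-region. Radiomics features would then be extracted per habitat rather than from the whole tumour.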

Evaluating acute image ordering for real-world patient cases via language model alignment with radiological guidelines.

Yao MS, Chae A, Saraiya P, Kahn CE, Witschey WR, Gee JC, Sagreiya H, Bastani O

PubMed, Aug 4 2025
Diagnostic imaging studies are increasingly important in the management of acutely presenting patients. However, ordering appropriate imaging studies in the emergency department is a challenging task with a high degree of variability among healthcare providers. To address this issue, recent work has investigated whether generative AI and large language models can be leveraged to recommend diagnostic imaging studies in accordance with evidence-based medical guidelines. However, it remains challenging to ensure that these tools can provide recommendations that correctly align with medical guidelines, especially given the limited diagnostic information available in acute care settings. In this study, we introduce a framework that uses language models to recommend imaging studies for patient cases in alignment with the American College of Radiology's Appropriateness Criteria, a set of evidence-based guidelines. To power our experiments, we introduce RadCases, a dataset of over 1500 annotated case summaries reflecting common patient presentations, and apply our framework to enable state-of-the-art language models to reason about appropriate imaging choices. Using our framework, state-of-the-art language models achieve accuracy comparable to clinicians in ordering imaging studies. Furthermore, we demonstrate that our language model-based pipeline can be used as an intelligent assistant by clinicians to support image ordering workflows and improve the accuracy of acute image ordering according to the American College of Radiology's Appropriateness Criteria. Our work demonstrates and validates a strategy to leverage AI-based software to improve trustworthy clinical decision-making in alignment with expert evidence-based guidelines.
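
To make "accuracy according to the Appropriateness Criteria" concrete, here is one simple scoring rule, written as a sketch. It assumes each case maps to a single guideline variant and that a recommendation counts as correct only when the study is rated "usually appropriate" for that variant. The variant and study names are placeholders and are not real ACR ratings. `guideline_accuracy` is a hypothetical helper, and the paper's scoring may well differ.

```python
USUALLY_APPROPRIATE = "usually appropriate"


def guideline_accuracy(case_variants, recommendations, criteria):
    """Fraction of cases whose recommended study is rated 'usually appropriate'
    for the guideline variant the case maps to.

    case_variants: case ID -> guideline variant ID
    recommendations: case ID -> recommended imaging study
    criteria: variant ID -> (imaging study -> appropriateness rating)
    """
    hits = 0
    for case_id, variant in case_variants.items():
        ratings = criteria.get(variant, {})
        if ratings.get(recommendations.get(case_id)) == USUALLY_APPROPRIATE:
            hits += 1
    return hits / len(case_variants)


# Placeholder guideline table; NOT actual ACR Appropriateness Criteria ratings.
criteria = {
    "variant_1": {"study_A": "usually appropriate", "study_B": "may be appropriate"},
    "variant_2": {"study_C": "usually appropriate", "study_A": "usually not appropriate"},
}
case_variants = {"case_1": "variant_1", "case_2": "variant_2", "case_3": "variant_2"}
recommendations = {"case_1": "study_A", "case_2": "study_A", "case_3": "study_C"}
print(guideline_accuracy(case_variants, recommendations, criteria))  # 2 of 3 cases correct
```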

Accurate and Interpretable Postmenstrual Age Prediction via Multimodal Large Language Model

Qifan Chen, Jin Cui, Cindy Duan, Yushuo Han, Yifei Shi

arXiv preprint, Aug 4 2025
Accurate estimation of postmenstrual age (PMA) at scan is crucial for assessing neonatal development and health. While deep learning models have achieved high accuracy in predicting PMA from brain MRI, they often function as black boxes, offering limited transparency and interpretability in clinical decision support. In this work, we address the dual challenge of accuracy and interpretability by adapting a multimodal large language model (MLLM) to perform both precise PMA prediction and clinically relevant explanation generation. We introduce a parameter-efficient fine-tuning (PEFT) strategy using instruction tuning and Low-Rank Adaptation (LoRA) applied to the Qwen2.5-VL-7B model. The model is trained on four 2D cortical surface projection maps derived from neonatal MRI scans. By employing distinct prompts for training and inference, our approach enables the MLLM to handle a regression task during training and generate clinically relevant explanations during inference. The fine-tuned model achieves a low prediction error with a 95 percent confidence interval of 0.78 to 1.52 weeks, while producing interpretable outputs grounded in developmental features, marking a significant step toward transparent and trustworthy AI systems in perinatal neuroscience.
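
Low-Rank Adaptation (LoRA) keeps a pretrained weight matrix W frozen and trains only a low-rank update B·A, scaled by alpha / r. The pure-Python sketch below shows that forward pass and the parameter saving for a single matrix. The toy matrices, the rank and the 4096-dimensional example are illustrative assumptions. The abstract does not report the rank, alpha or the layers adapted in Qwen2.5-VL-7B.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha):
    """LoRA forward pass: h = W x + (alpha / r) * B (A x).

    W: frozen pretrained weight, d_out x d_in
    A: trainable down-projection, r x d_in
    B: trainable up-projection, d_out x r (initialised to zeros in LoRA)
    """
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [h + (alpha / r) * d for h, d in zip(base, delta)]


def trainable_params(d_in, d_out, r):
    """Parameters updated by full fine-tuning vs. a rank-r LoRA adapter on one weight matrix."""
    return d_in * d_out, r * (d_in + d_out)


W = [[1, 0, 2], [0, 1, -1]]  # toy 2 x 3 frozen weight
A = [[1, 1, 1]]              # rank r = 1
x = [1, 2, 3]
print(lora_forward(W, A, [[0.0], [0.0]], x, alpha=2))   # [7.0, -1.0]: zero B leaves W x unchanged
print(lora_forward(W, A, [[0.5], [-1.0]], x, alpha=2))  # [13.0, -13.0]
print(trainable_params(4096, 4096, 8))                  # (16777216, 65536)
```

Because B starts at zero, fine-tuning begins exactly at the pretrained model. Only r·(d_in + d_out) values per adapted matrix are updated.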

S-RRG-Bench: Structured Radiology Report Generation with Fine-Grained Evaluation Framework

Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, Luping Zhou

arXiv preprint, Aug 4 2025
Radiology report generation (RRG) for diagnostic images, such as chest X-rays, plays a pivotal role in both clinical practice and AI. Traditional free-text reports suffer from redundancy and inconsistent language, complicating the extraction of critical clinical details. Structured radiology report generation (S-RRG) offers a promising solution by organizing information into standardized, concise formats. However, existing approaches often rely on classification or visual question answering (VQA) pipelines that require predefined label sets and produce only fragmented outputs. Template-based approaches, which generate reports by replacing keywords within fixed sentence patterns, further compromise expressiveness and often omit clinically important details. In this work, we present a novel approach to S-RRG that includes dataset construction, model training, and the introduction of a new evaluation framework. We first create a robust chest X-ray dataset (MIMIC-STRUC) that includes disease names, severity levels, probabilities, and anatomical locations, ensuring that the dataset is both clinically relevant and well-structured. We train an LLM-based model to generate standardized, high-quality reports. To assess the generated reports, we propose a specialized evaluation metric (S-Score) that not only measures disease prediction accuracy but also evaluates the precision of disease-specific details, thus offering a clinically meaningful metric for report quality that focuses on elements critical to clinical decision-making and demonstrates a stronger alignment with human assessments. Our approach highlights the effectiveness of structured reports and the importance of a tailored evaluation metric for S-RRG, providing a more clinically relevant measure of report quality.
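
The structured records described above carry a disease name, severity, probability and anatomical location. The sketch below represents them as a small dataclass and scores a prediction with a toy metric: disease-level F1, plus the fraction of matching attributes on diseases found in both reports. This is NOT the S-Score, whose exact definition is not given in the abstract. `Finding`, `toy_structured_score` and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    disease: str
    severity: str
    probability: str
    location: str


ATTRIBUTES = ("severity", "probability", "location")


def toy_structured_score(pred, ref):
    """Return (disease-level F1, fraction of attributes matching on shared diseases)."""
    pred_by = {f.disease: f for f in pred}
    ref_by = {f.disease: f for f in ref}
    shared = pred_by.keys() & ref_by.keys()
    precision = len(shared) / len(pred_by) if pred_by else 0.0
    recall = len(shared) / len(ref_by) if ref_by else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    if not shared:
        return f1, 0.0
    matches = sum(getattr(pred_by[d], a) == getattr(ref_by[d], a) for d in shared for a in ATTRIBUTES)
    return f1, matches / (len(shared) * len(ATTRIBUTES))


ref = [
    Finding("pleural effusion", "mild", "likely", "left"),
    Finding("atelectasis", "mild", "possible", "left lower lobe"),
]
pred = [
    Finding("pleural effusion", "mild", "likely", "right"),
    Finding("cardiomegaly", "mild", "likely", "heart"),
]
print(toy_structured_score(pred, ref))  # F1 0.5; 2 of 3 attributes match (wrong side)
```

Separating "was the disease found" from "were its details right" is the distinction the abstract draws between plain disease prediction accuracy and disease-specific detail.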

Adapting foundation models for rapid clinical response: intracerebral hemorrhage segmentation in emergency settings.

Gerbasi A, Mazzacane F, Ferrari F, Del Bello B, Cavallini A, Bellazzi R, Quaglini S

PubMed, Aug 3 2025
Intracerebral hemorrhage (ICH) is a medical emergency that demands rapid and accurate diagnosis for optimal patient management. Segmentation of hemorrhagic lesions on CT scans is a necessary first step for acquiring quantitative imaging data, which are becoming increasingly useful in the clinical setting. However, traditional manual segmentation is time-consuming and prone to inter-rater variability, creating a need for automated solutions. This study introduces a novel approach combining advanced deep learning models to segment extensive and morphologically variable ICH lesions in non-contrast CT scans. We propose a two-step methodology that begins with a user-defined loose bounding box around the lesion, followed by a fine-tuned YOLOv8-S object detection model to generate precise, slice-specific bounding boxes. These bounding boxes are then used to prompt the Medical Segment Anything Model for accurate lesion segmentation. Our pipeline achieves high segmentation accuracy with minimal supervision, demonstrating strong potential as a practical alternative to task-specific models. We evaluated the model on a dataset of 252 CT scans, demonstrating high segmentation accuracy and robustness. Finally, the resulting segmentation tool is integrated into a user-friendly web application prototype, offering clinicians a simple interface for lesion identification and radiomic quantification.
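
The abstract does not say how per-slice detector boxes are reconciled with the user's loose box before they prompt the segmentation model. The sketch below assumes one plausible rule. On each slice, keep the highest-confidence detection whose centre lies inside the loose box, and clip it to that box. Boxes are (x0, y0, x1, y1) in pixels. The `select_slice_prompts` helper, the 0.25 confidence floor and the toy detections are hypothetical, and the actual YOLOv8 and segmentation-model calls are omitted.

```python
def clip_box(box, bound):
    """Intersect a box with the bounding region."""
    x0, y0, x1, y1 = box
    bx0, by0, bx1, by1 = bound
    return (max(x0, bx0), max(y0, by0), min(x1, bx1), min(y1, by1))


def center_inside(box, bound):
    """True if the box centre falls within the bounding region."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return bound[0] <= cx <= bound[2] and bound[1] <= cy <= bound[3]


def select_slice_prompts(user_box, detections, min_conf=0.25):
    """Pick one box prompt per slice.

    detections: slice index -> list of (box, confidence) from a detector.
    Returns slice index -> clipped box; slices without a valid detection are skipped.
    """
    prompts = {}
    for z, candidates in detections.items():
        valid = [(b, c) for b, c in candidates if c >= min_conf and center_inside(b, user_box)]
        if valid:
            best_box, _ = max(valid, key=lambda bc: bc[1])
            prompts[z] = clip_box(best_box, user_box)
    return prompts


user_box = (100, 100, 300, 300)  # loose box drawn by the user
detections = {
    10: [((120, 130, 220, 240), 0.91), ((400, 400, 450, 450), 0.95)],  # second box lies outside
    11: [((90, 110, 210, 230), 0.80)],                                  # gets clipped
    12: [((150, 150, 200, 200), 0.10)],                                 # below confidence floor
}
print(select_slice_prompts(user_box, detections))
# {10: (120, 130, 220, 240), 11: (100, 110, 210, 230)}
```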

Advances in renal cancer: diagnosis, treatment, and emerging technologies.

Saida T, Iima M, Ito R, Ueda D, Nishioka K, Kurokawa R, Kawamura M, Hirata K, Honda M, Takumi K, Ide S, Sugawara S, Watabe T, Sakata A, Yanagawa M, Sofue K, Oda S, Naganawa S

PubMed, Aug 2 2025
This review provides a comprehensive overview of current practices and recent advancements in the diagnosis and treatment of renal cancer. It introduces updates in histological classification and explains the imaging characteristics of each tumour based on these changes. The review highlights state-of-the-art imaging modalities, including magnetic resonance imaging, computed tomography, positron emission tomography, and ultrasound, emphasising their crucial role in tumour characterisation and optimising treatment planning. Emerging technologies, such as radiomics and artificial intelligence, are also discussed for their transformative impact on enhancing diagnostic precision, prognostic prediction, and personalised patient management. Furthermore, the review explores current treatment options, including minimally invasive techniques such as cryoablation, radiofrequency ablation, and stereotactic body radiation therapy, as well as systemic therapies such as immune checkpoint inhibitors and targeted therapies.

Multimodal Attention-Aware Fusion for Diagnosing Distal Myopathy: Evaluating Model Interpretability and Clinician Trust

Mohsen Abbaspour Onari, Lucie Charlotte Magister, Yaoxin Wu, Amalia Lupi, Dario Creazzo, Mattia Tordin, Luigi Di Donatantonio, Emilio Quaia, Chao Zhang, Isel Grau, Marco S. Nobile, Yingqian Zhang, Pietro Liò

arXiv preprint, Aug 2 2025
Distal myopathy represents a genetically heterogeneous group of skeletal muscle disorders with broad clinical manifestations, posing diagnostic challenges in radiology. To address this, we propose a novel multimodal attention-aware fusion architecture that combines features extracted from two distinct deep learning models, one capturing global contextual information and the other focusing on local details, representing complementary aspects of the input data. Uniquely, our approach integrates these features through an attention gate mechanism, enhancing both predictive performance and interpretability. Our method achieves high classification accuracy on the BUSI benchmark and a proprietary distal myopathy dataset, while also generating clinically relevant saliency maps that support transparent decision-making in medical diagnosis. We rigorously evaluated interpretability through (1) functionally grounded metrics (coherence scoring against reference masks and incremental deletion analysis) and (2) application-grounded validation with seven expert radiologists. While our fusion strategy boosts predictive performance relative to single-stream and alternative fusion strategies, both quantitative and qualitative evaluations reveal persistent gaps in anatomical specificity and clinical usefulness of the interpretability. These findings highlight the need for richer, context-aware interpretability methods and human-in-the-loop feedback to meet clinicians' expectations in real-world diagnostic settings.
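
As an illustration of gating two feature streams, the sketch below assumes a simple per-dimension sigmoid gate computed from the concatenated global and local features. Each output dimension is then a convex mix of the two branches. The weights `W` and `b` stand in for learned parameters, and the feature vectors are toy values. The paper's attention gate may be structured differently, for example as a spatial gate over feature maps.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(global_feat, local_feat, W, b):
    """Fuse two equal-length feature vectors with a per-dimension sigmoid gate.

    gate  = sigmoid(W [global; local] + b)
    fused = gate * global + (1 - gate) * local
    W (d rows of length 2d) and b (length d) stand in for learned parameters.
    """
    concat = list(global_feat) + list(local_feat)
    gates = [sigmoid(sum(w * v for w, v in zip(row, concat)) + bias) for row, bias in zip(W, b)]
    fused = [g * gf + (1 - g) * lf for g, gf, lf in zip(gates, global_feat, local_feat)]
    return fused, gates


global_feat = [1.0, 0.0]  # toy output of a global-context branch
local_feat = [0.0, 1.0]   # toy output of a local-detail branch
W = [[0.0] * 4, [0.0] * 4]
print(gated_fusion(global_feat, local_feat, W, [0.0, 0.0]))    # ([0.5, 0.5], [0.5, 0.5])
print(gated_fusion(global_feat, local_feat, W, [10.0, -10.0]))  # dim 0 follows global, dim 1 follows local
```

The gate values double as a readout of which branch dominates each dimension. That readout is one reason gating is often associated with interpretability.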

M4CXR: Exploring Multitask Potentials of Multimodal Large Language Models for Chest X-Ray Interpretation.

Park J, Kim S, Yoon B, Hyun J, Choi K

PubMed, Aug 1 2025
The rapid evolution of artificial intelligence, especially in large language models (LLMs), has significantly impacted various domains, including healthcare. In chest X-ray (CXR) analysis, previous studies have employed LLMs, but with limitations: either underutilizing the LLMs' capability for multitask learning or lacking clinical accuracy. This article presents M4CXR, a multimodal LLM designed to enhance CXR interpretation. The model is trained on a visual instruction-following dataset that integrates various task-specific datasets in a conversational format. As a result, the model supports multiple tasks such as medical report generation (MRG), visual grounding, and visual question answering (VQA). M4CXR achieves state-of-the-art clinical accuracy in MRG by employing a chain-of-thought (CoT) prompting strategy, in which it identifies findings in CXR images and subsequently generates corresponding reports. The model is adaptable to various MRG scenarios depending on the available inputs, such as single-image, multi-image, and multi-study contexts. In addition to MRG, M4CXR performs visual grounding at a level comparable to specialized models and demonstrates outstanding performance in VQA. Both quantitative and qualitative assessments reveal M4CXR's versatility in MRG, visual grounding, and VQA, while consistently maintaining clinical accuracy.
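
The chain-of-thought strategy described above first asks the model for findings and then conditions the report on them. A minimal sketch of that two-call control flow follows. The `generate(images, prompt)` callable is a hypothetical wrapper around whatever MLLM is used, and the prompt wording is invented, not M4CXR's actual template. A stub model is included so the flow can run offline.

```python
def cot_report(generate, images):
    """Two-stage prompting: elicit findings first, then condition the report on them.

    generate(images, prompt) -> str is a hypothetical MLLM wrapper.
    """
    findings = generate(images, "List the radiological findings in these chest X-ray images.")
    report = generate(
        images,
        f"Findings: {findings}\nWrite a radiology report consistent with these findings.",
    )
    return findings, report


# Stub model standing in for a real MLLM, recording the prompts it receives.
calls = []


def fake_generate(images, prompt):
    calls.append(prompt)
    if prompt.startswith("List"):
        return "small left pleural effusion"
    return "IMPRESSION: Small left pleural effusion."


findings, report = cot_report(fake_generate, images=["frontal.png"])
print(findings)  # small left pleural effusion
print(report)    # IMPRESSION: Small left pleural effusion.
```

Passing the same image list to both calls fits the single-image, multi-image and multi-study settings alike. Only the contents of `images` change.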

Natural language processing and LLMs in liver imaging: a practical review of clinical applications.

López-Úbeda P, Martín-Noguerol T, Luna A

PubMed, Aug 1 2025
Liver diseases pose a significant global health challenge due to their silent progression and high mortality. Proper interpretation of radiology reports is essential for the evaluation and management of these conditions but is limited by variability in reporting styles and the complexity of unstructured medical language. In this context, Natural Language Processing (NLP) techniques and Large Language Models (LLMs) have emerged as promising tools to extract relevant clinical information from unstructured liver radiology reports. This work reviews, from a practical point of view, the current state of NLP and LLM applications for liver disease classification, clinical feature extraction, diagnostic support, and staging from reports. It also discusses existing limitations, such as the need for high-quality annotated data, lack of explainability, and challenges in clinical integration. With responsible and validated implementation, these technologies have the potential to transform liver clinical management by enabling faster and more accurate diagnoses and optimizing radiology workflows, ultimately improving patient care in liver diseases.
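
For a feel of the structured-field extraction discussed above, the sketch below is a rule-based baseline, not an LLM. It pulls LI-RADS categories (LR-NC, LR-1 to LR-5, LR-M, LR-TIV) from free-text liver reports with a regular expression. The `extract_lirads` name and the sample report are hypothetical. A real pipeline would also need to handle negation, hedging and per-observation linkage, which is where the review's NLP and LLM methods come in.

```python
import re

# Matches "LR-5", "lr-m", "LI-RADS 4", "LI-RADS category 3", etc.
LIRADS_PATTERN = re.compile(
    r"\b(?:LR-|LI-RADS\s*(?:category\s*)?)(NC|TIV|M|[1-5])\b",
    re.IGNORECASE,
)


def extract_lirads(report):
    """Return LI-RADS categories in order of appearance, normalised to 'LR-x'."""
    return ["LR-" + m.group(1).upper() for m in LIRADS_PATTERN.finditer(report)]


report = (
    "Segment 7 observation measuring 2.1 cm, LR-5. Segment 4 observation, LR-3. "
    "Impression: findings consistent with LI-RADS 5."
)
print(extract_lirads(report))  # ['LR-5', 'LR-3', 'LR-5']
```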