Latest Papers on Radiology AI. Tags: Mixed Modality

Developing biomarkers and methods of risk stratification: Consensus statements from the International Kidney Cancer Symposium North America 2024 Think Tank.

Shapiro DD, Abel EJ, Albiges L, Battle D, Berg SA, Campbell MT, Cella D, Coleman K, Garmezy B, Geynisman DM, Hall T, Henske EP, Jonasch E, Karam JA, La Rosa S, Leibovich BC, Maranchie JK, Master VA, Maughan BL, McGregor BA, Msaouel P, Pal SK, Perez J, Plimack ER, Psutka SP, Riaz IB, Rini BI, Shuch B, Simon MC, Singer EA, Smith A, Staehler M, Tang C, Tannir NM, Vaishampayan U, Voss MH, Zakharia Y, Zhang Q, Zhang T, Carlo MI

•papers•Aug 16 2025

Accurate prognostication and personalized treatment selection remain major challenges in kidney cancer. This consensus initiative aimed to provide actionable expert guidance on the development and clinical integration of prognostic and predictive biomarkers and risk stratification tools to improve patient care and guide future research. A modified Delphi method was employed to develop consensus statements among a multidisciplinary panel of experts in urologic oncology, medical oncology, radiation oncology, pathology, molecular biology, radiology, outcomes research, biostatistics, industry, and patient advocacy. Over 3 rounds, including an in-person meeting 20 initial statements were evaluated, refined, and voted on. Consensus was defined a priori as a median Likert score ≥8. Nineteen final consensus statements were endorsed. These span key domains including biomarker prioritization (favoring prognostic biomarkers), rigorous methodology for subgroup and predictive analyses, the development of multi-institutional prospective registries, incorporation of biomarkers in trial design, and improvements in data/biospecimen access. The panel also identified high-priority biomarker types (e.g., AI-based image analysis, ctDNA) for future research. This is the first consensus statement specifically focused on biomarker and risk model development for kidney cancer using a structured Delphi process. The recommendations emphasize the need for rigorous methodology, collaborative infrastructure, prospective data collection, and focus on clinically translatable biomarkers. The resulting framework is intended to guide researchers, cooperative groups, and stakeholders in advancing personalized care for patients with kidney cancer.

Mixed Modality Classification Abdominal Review Concept Consortium Policy GenAI

Artificial intelligence across the cancer care continuum.

Riaz IB, Khan MA, Osterman TJ

•papers•Aug 15 2025

Artificial intelligence (AI) holds significant potential to enhance various aspects of oncology, spanning the cancer care continuum. This review provides an overview of current and emerging AI applications, from risk assessment and early detection to treatment and supportive care. AI-driven tools are being developed to integrate diverse data sources, including multi-omics and electronic health records, to improve cancer risk stratification and personalize prevention strategies. In screening and diagnosis, AI algorithms show promise in augmenting the accuracy and efficiency of medical image analysis and histopathology interpretation. AI also offers opportunities to refine treatment planning, optimize radiation therapy, and personalize systemic therapy selection. Furthermore, AI is explored for its potential to improve survivorship care by tailoring interventions and to enhance end-of-life care through improved symptom management and prognostic modeling. Beyond care delivery, AI augments clinical workflows, streamlines the dissemination of up-to-date evidence, and captures critical patient-reported outcomes for clinical decision support and outcomes assessment. However, the successful integration of AI into clinical practice requires addressing key challenges, including rigorous validation of algorithms, ensuring data privacy and security, and mitigating potential biases. Effective implementation necessitates interdisciplinary collaboration and comprehensive education for health care professionals. The synergistic interaction between AI and clinical expertise is crucial for realizing the potential of AI to contribute to personalized and effective cancer care. This review highlights the current state of AI in oncology and underscores the importance of responsible development and implementation.

Mixed Modality Classification Review In Silico Academic Lab Ethics

A Case Study on Colposcopy-Based Cervical Cancer Staging Reveals an Alarming Lack of Data Sharing Hindering the Adoption of Machine Learning in Clinical Practice

Schulz, M., Leha, A.

•preprint•Aug 15 2025

BackgroundThe inbuilt ability to adapt existing models to new applications has been one of the key drivers of the success of deep learning models. Thereby, sharing trained models is crucial for their adaptation to different populations and domains. Not sharing models prohibits validation and potentially following translation into clinical practice, and hinders scientific progress. In this paper we examine the current state of data and model sharing in the medical field using cervical cancer staging on colposcopy images as a case example. MethodsWe conducted a comprehensive literature search in PubMed to identify studies employing machine learning techniques in the analysis of colposcopy images. For studies where raw data was not directly accessible, we systematically inquired about accessing the pre-trained model weights and/or raw colposcopy image data by contacting the authors using various channels. ResultsWe included 46 studies and one publicly available dataset in our study. We retrieved data of the latter and inquired about data access for the 46 studies by contacting a total of 92 authors. We received 15 responses related to 14 studies (30%). The remaining 32 studies remained unresponsive (70%). Of the 15 responses received, two responses redirected our inquiry to other authors, two responses were initially pending, and 11 declined data sharing. Despite our follow-up efforts on all responses received, none of the inquiries led to actual data sharing (0%). The only available data source remained the publicly available dataset. ConclusionsDespite the long-standing demands for reproducible research and efforts to incentivize data sharing, such as the requirement of data availability statements, our case study reveals a persistent lack of data sharing culture. Reasons identified in this case study include a lack of resources to provide the data, data privacy concerns, ongoing trial registrations and low response rates to inquiries. Potential routes for improvement could include comprehensive data availability statements required by journals, data preparation and deposition in a repository as part of the publication process, an automatic maximal embargo time after which data will become openly accessible and data sharing rules set by funders.

Mixed Modality Classification Abdominal Review In Silico Academic Lab Reproducibility

UniDCF: A Foundation Model for Comprehensive Dentocraniofacial Hard Tissue Reconstruction

Chunxia Ren, Ning Zhu, Yue Lai, Gui Chen, Ruijie Wang, Yangyi Hu, Suyao Liu, Shuwen Mao, Hong Su, Yu Zhang, Li Xiao

•preprint•Aug 15 2025

Dentocraniofacial hard tissue defects profoundly affect patients' physiological functions, facial aesthetics, and psychological well-being, posing significant challenges for precise reconstruction. Current deep learning models are limited to single-tissue scenarios and modality-specific imaging inputs, resulting in poor generalizability and trade-offs between anatomical fidelity, computational efficiency, and cross-tissue adaptability. Here we introduce UniDCF, a unified framework capable of reconstructing multiple dentocraniofacial hard tissues through multimodal fusion encoding of point clouds and multi-view images. By leveraging the complementary strengths of each modality and incorporating a score-based denoising module to refine surface smoothness, UniDCF overcomes the limitations of prior single-modality approaches. We curated the largest multimodal dataset, comprising intraoral scans, CBCT, and CT from 6,609 patients, resulting in 54,555 annotated instances. Evaluations demonstrate that UniDCF outperforms existing state-of-the-art methods in terms of geometric precision, structural completeness, and spatial accuracy. Clinical simulations indicate UniDCF reduces reconstruction design time by 99% and achieves clinician-rated acceptability exceeding 94%. Overall, UniDCF enables rapid, automated, and high-fidelity reconstruction, supporting personalized and precise restorative treatments, streamlining clinical workflows, and enhancing patient outcomes.

Mixed Modality Reconstruction Methodology In Silico Academic Lab Benchmark SOTA Open Dataset

Benchmarking GPT-5 for Zero-Shot Multimodal Medical Reasoning in Radiology and Radiation Oncology

Mingzhe Hu, Zach Eidex, Shansong Wang, Mojtaba Safari, Qiang Li, Xiaofeng Yang

•preprint•Aug 15 2025

Radiology, radiation oncology, and medical physics require decision-making that integrates medical images, textual reports, and quantitative data under high-stakes conditions. With the introduction of GPT-5, it is critical to assess whether recent advances in large multimodal models translate into measurable gains in these safety-critical domains. We present a targeted zero-shot evaluation of GPT-5 and its smaller variants (GPT-5-mini, GPT-5-nano) against GPT-4o across three representative tasks. We present a targeted zero-shot evaluation of GPT-5 and its smaller variants (GPT-5-mini, GPT-5-nano) against GPT-4o across three representative tasks: (1) VQA-RAD, a benchmark for visual question answering in radiology; (2) SLAKE, a semantically annotated, multilingual VQA dataset testing cross-modal grounding; and (3) a curated Medical Physics Board Examination-style dataset of 150 multiple-choice questions spanning treatment planning, dosimetry, imaging, and quality assurance. Across all datasets, GPT-5 achieved the highest accuracy, with substantial gains over GPT-4o up to +20.00% in challenging anatomical regions such as the chest-mediastinal, +13.60% in lung-focused questions, and +11.44% in brain-tissue interpretation. On the board-style physics questions, GPT-5 attained 90.7% accuracy (136/150), exceeding the estimated human passing threshold, while GPT-4o trailed at 78.0%. These results demonstrate that GPT-5 delivers consistent and often pronounced performance improvements over GPT-4o in both image-grounded reasoning and domain-specific numerical problem-solving, highlighting its potential to augment expert workflows in medical imaging and therapeutic physics.

Mixed Modality Classification Methodology In Silico Academic Lab Benchmark SOTA GenAI

Aphasia severity prediction using a multi-modal machine learning approach.

Hu X, Varkanitsa M, Kropp E, Betke M, Ishwar P, Kiran S

•papers•Aug 15 2025

The present study examined an integrated multiple neuroimaging modality (T1 structural, Diffusion Tensor Imaging (DTI), and resting-state FMRI (rsFMRI)) to predict aphasia severity using Western Aphasia Battery-Revised Aphasia Quotient (WAB-R AQ) in 76 individuals with post-stroke aphasia. We employed Support Vector Regression (SVR) and Random Forest (RF) models with supervised feature selection and a stacked feature prediction approach. The SVR model outperformed RF, achieving an average root mean square error (RMSE) of 16.38±5.57, Pearson's correlation coefficient (r) of 0.70±0.13, and mean absolute error (MAE) of 12.67±3.27, compared to RF's RMSE of 18.41±4.34, r of 0.66±0.15, and MAE of 14.64±3.04. Resting-state neural activity and structural integrity emerged as crucial predictors of aphasia severity, appearing in the top 20% of predictor combinations for both SVR and RF. Finally, the feature selection method revealed that functional connectivity in both hemispheres and between homologous language areas is critical for predicting language outcomes in patients with aphasia. The statistically significant difference in performance between the model using only single modality and the optimal multi-modal SVR/RF model (which included both resting-state connectivity and structural information) underscores that aphasia severity is influenced by factors beyond lesion location and volume. These findings suggest that integrating multiple neuroimaging modalities enhances the prediction of language outcomes in aphasia beyond lesion characteristics alone, offering insights that could inform personalized rehabilitation strategies.

Mixed Modality Registration Neurological Retrospective Clinical In Silico Academic Lab GenAI

AI-Driven Integrated System for Burn Depth Prediction With Electronic Medical Records: Algorithm Development and Validation.

Rahman MM, Masry ME, Gnyawali SC, Xue Y, Gordillo G, Wachs JP

•papers•Aug 15 2025

Burn injuries represent a significant clinical challenge due to the complexity of accurately assessing burn depth, which directly influences the course of treatment and patient outcomes. Traditional diagnostic methods primarily rely on visual inspection by experienced burn surgeons. Studies report diagnostic accuracies of around 76% for experts, dropping to nearly 50% for less experienced clinicians. Such inaccuracies can result in suboptimal clinical decisions-delaying vital surgical interventions in severe cases or initiating unnecessary treatments for superficial burns. This diagnostic variability not only compromises patient care but also strains health care resources and increases the likelihood of adverse outcomes. Hence, a more consistent and precise approach to burn classification is urgently needed. The objective is to determine whether a multimodal integrated artificial intelligence (AI) system for accurate classification of burn depth can preserve diagnostic accuracy and provide an important resource when used as part of the electronic medical record (EMR). This study used a novel multimodal AI system, integrating digital photographs and ultrasound tissue Doppler imaging (TDI) data to accurately assess burn depth. These imaging modalities were accessed and processed through an EMR system, enabling real-time data retrieval and AI-assisted evaluation. TDI was instrumental in evaluating the biomechanical properties of subcutaneous tissues, using color-coded images to identify burn-induced changes in tissue stiffness and elasticity. The collected imaging data were uploaded to the EMR system (DrChrono), where they were processed by a vision-language model built on GPT-4 architecture. This model received expert-formulated prompts describing how to interpret both digital and TDI images, guiding the AI in making explainable classifications. This study evaluated whether a multimodal AI classifier, designed to identify first-, second-, and third-degree burns, could be effectively applied to imaging data stored within an EMR system. The classifier achieved an overall accuracy of 84.38%, significantly surpassing human performance benchmarks typically cited in the literature. This highlights the potential of the AI model to serve as a robust clinical decision support tool, especially in settings lacking highly specialized expertise. In addition to accuracy, the classifier demonstrated strong performance across multiple evaluation metrics. The classifier's ability to distinguish between burn severities was further validated by the area under the receiver operating characteristic: 0.97 for first-degree, 0.96 for second-degree, and a perfect 1.00 for third-degree burns, each with narrow 95% CIs. The storage of multimodal imaging data within the EMR, along with the ability for post hoc analysis by AI algorithms, offers significant advancements in burn care, enabling real-time burn depth prediction on currently available data. Using digital photos for superficial burns, easily diagnosed through physical examinations, reduces reliance on TDI, while TDI helps distinguish deep second- and third-degree burns, enhancing diagnostic efficiency.

Mixed Modality Classification Retrospective Clinical In Silico Academic Lab Benchmark SOTA GenAI

From dictation to diagnosis: enhancing radiology reporting with integrated speech recognition in multimodal large language models.

Gertz RJ, Beste NC, Dratsch T, Lennartz S, Bremm J, Iuga AI, Bunck AC, Laukamp KR, Schönfeld M, Kottlors J

•papers•Aug 15 2025

This study evaluates the efficiency, accuracy, and cost-effectiveness of radiology reporting using audio multimodal large language models (LLMs) compared to conventional reporting with speech recognition software. We hypothesized that providing minimal audio input would enable a multimodal LLM to generate complete radiological reports. 480 reports from 80 retrospective multimodal imaging studies were reported by two board-certified radiologists using three workflows: conventional workflow (C-WF) with speech recognition software to generate findings and impressions separately and LLM-based workflow (LLM-WF) using the state-of-the-art LLMs GPT-4o and Claude Sonnet 3.5. Outcome measures included reporting time, corrections and personnel cost per report. Two radiologists assessed formal structure and report quality. Statistical analysis used ANOVA and Tukey's post hoc tests (p < 0.05). LLM-WF significantly reduced reporting time (GPT-4o/Sonnet 3.5: 38.9 s ± 22.7 s vs. C-WF: 88.0 s ± 60.9 s, p < 0.01), required fewer corrections (GPT-4o: 1.0 ± 1.1, Sonnet 3.5: 0.9 ± 1.0 vs. C-WF: 2.4 ± 2.5, p < 0.01), and lowered costs (GPT-4o: $2.3 ± $1.4, Sonnet 3.5: $2.4 ± $1.4 vs. C-WF: $3.0 ± $2.1, p < 0.01). Reports generated with Sonnet 3.5 were rated highest in quality, while GPT-4o and conventional reports showed no difference. Multimodal LLMs can generate high-quality radiology reports based solely on minimal audio input, with greater speed, fewer corrections, and reduced costs compared to conventional speech-based workflows. However, future implementation may involve licensing costs, and generalizability to broader clinical contexts warrants further evaluation. Question Comparing time, accuracy, cost, and report quality of reporting using audio input functionality of GPT-4o and Claude Sonnet 3.5 to conventional reporting with speech recognition. Findings Large language models enable radiological reporting via minimal audio input, reducing turnaround time and costs without quality loss compared to conventional reporting with speech recognition. Clinical relevance Large language model-based reporting from minimal audio input has the potential to improve efficiency and report quality, supporting more streamlined workflows in clinical radiology.

Mixed Modality LLM Radiology Report Retrospective Clinical In Silico Academic Lab GenAI

Recommendations for the use of functional medical imaging in the management of cancer of the cervix in New Zealand: a rapid review.

Feng S, Mdletshe S

•papers•Aug 15 2025

We aimed to review the role of functional imaging in cervical cancer to underscore its significance in the diagnosis and management of cervical cancer and in improving patient outcomes. This rapid literature review targeting the clinical guidelines for functional imaging in cervical cancer sourced literature from 2017 to 2023 using PubMed, Google Scholar, MEDLINE and Scopus. Keywords such as cervical cancer, cervical neoplasms, functional imaging, stag*, treatment response, monitor* and New Zealand or NZ were used with Boolean operators to maximise results. Emphasis was on English full research studies pertinent to New Zealand. The study quality of the reviewed articles was assessed using the Joanna Briggs Institute critical appraisal checklists. The search yielded a total of 21 papers after all duplicates and yields that did not meet the inclusion criteria were excluded. Only one paper was found to incorporate the New Zealand context. The papers reviewed yielded results that demonstrate the important role of functional imaging in cervical cancer diagnosis, staging and treatment response monitoring. Techniques such as dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI), diffusion-weighted magnetic resonance imaging (DW-MRI), computed tomography perfusion (CTP) and positron emission tomography computed tomography (PET/CT) provide deep insights into tumour behaviour, facilitating personalised care. Integration of artificial intelligence in image analysis promises increased accuracy of these modalities. Functional imaging could play a significant role in a unified approach in New Zealand to improve patient outcomes for cervical cancer management. Therefore, this study advocates for New Zealand's medical sector to harness functional imaging's potential in cervical cancer management.

Mixed Modality Classification Abdominal Review Concept Policy

Automating the Referral of Bone Metastases Patients With and Without the Use of Large Language Models.

Sangwon KL, Han X, Becker A, Zhang Y, Ni R, Zhang J, Alber DA, Alyakin A, Nakatsuka M, Fabbri N, Aphinyanaphongs Y, Yang JT, Chachoua A, Kondziolka D, Laufer I, Oermann EK

•papers•Aug 15 2025

Bone metastases, affecting more than 4.8% of patients with cancer annually, and particularly spinal metastases require urgent intervention to prevent neurological complications. However, the current process of manually reviewing radiological reports leads to potential delays in specialist referrals. We hypothesized that natural language processing (NLP) review of routine radiology reports could automate the referral process for timely multidisciplinary care of spinal metastases. We assessed 3 NLP models-a rule-based regular expression (RegEx) model, GPT-4, and a specialized Bidirectional Encoder Representations from Transformers (BERT) model (NYUTron)-for automated detection and referral of bone metastases. Study inclusion criteria targeted patients with active cancer diagnoses who underwent advanced imaging (computed tomography, MRI, or positron emission tomography) without previous specialist referral. We defined 2 separate tasks: task of identifying clinically significant bone metastatic terms (lexical detection), and identifying cases needing a specialist follow-up (clinical referral). Models were developed using 3754 hand-labeled advanced imaging studies in 2 phases: phase 1 focused on spine metastases, and phase 2 generalized to bone metastases. Standard McRae's line performance metrics were evaluated and compared across all stages and tasks. In the lexical detection, a simple RegEx achieved the highest performance (sensitivity 98.4%, specificity 97.6%, F1 = 0.965), followed by NYUTron (sensitivity 96.8%, specificity 89.9%, and F1 = 0.787). For the clinical referral task, RegEx also demonstrated superior performance (sensitivity 92.3%, specificity 87.5%, and F1 = 0.936), followed by a fine-tuned NYUTron model (sensitivity 90.0%, specificity 66.7%, and F1 = 0.750). An NLP-based automated referral system can accurately identify patients with bone metastases requiring specialist evaluation. A simple RegEx model excels in syntax-based identification and expert-informed rule generation for efficient referral patient recommendation in comparison with advanced NLP models. This system could significantly reduce missed follow-ups and enhance timely intervention for patients with bone metastases.

Mixed Modality Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab GenAI

Filter Papers

Tags

Developing biomarkers and methods of risk stratification: Consensus statements from the International Kidney Cancer Symposium North America 2024 Think Tank.

Artificial intelligence across the cancer care continuum.

A Case Study on Colposcopy-Based Cervical Cancer Staging Reveals an Alarming Lack of Data Sharing Hindering the Adoption of Machine Learning in Clinical Practice

UniDCF: A Foundation Model for Comprehensive Dentocraniofacial Hard Tissue Reconstruction

Benchmarking GPT-5 for Zero-Shot Multimodal Medical Reasoning in Radiology and Radiation Oncology

Aphasia severity prediction using a multi-modal machine learning approach.

AI-Driven Integrated System for Burn Depth Prediction With Electronic Medical Records: Algorithm Development and Validation.

From dictation to diagnosis: enhancing radiology reporting with integrated speech recognition in multimodal large language models.

Recommendations for the use of functional medical imaging in the management of cancer of the cervix in New Zealand: a rapid review.

Automating the Referral of Bone Metastases Patients With and Without the Use of Large Language Models.

Ready to Sharpen Your Edge?