Page 4 of 81804 results

Q-FSRU: Quantum-Augmented Frequency-Spectral Fusion for Medical Visual Question Answering

Rakesh Thakur, Yusra Tariq

arXiv preprint · Aug 16 2025
Solving tough clinical questions that require both image and text understanding is still a major challenge in healthcare AI. In this work, we propose Q-FSRU, a new model that combines Frequency Spectrum Representation and Fusion (FSRU) with a method called Quantum Retrieval-Augmented Generation (Quantum RAG) for medical Visual Question Answering (VQA). The model takes in features from medical images and related text, then shifts them into the frequency domain using Fast Fourier Transform (FFT). This helps it focus on more meaningful data and filter out noise or less useful information. To improve accuracy and ensure that answers are based on real knowledge, we add a quantum-inspired retrieval system. It fetches useful medical facts from external sources using quantum-based similarity techniques. These details are then merged with the frequency-based features for stronger reasoning. We evaluated our model using the VQA-RAD dataset, which includes real radiology images and questions. The results showed that Q-FSRU outperforms earlier models, especially on complex cases needing image-text reasoning. The mix of frequency and quantum information improves both performance and explainability. Overall, this approach offers a promising way to build smart, clear, and helpful AI tools for doctors.
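The frequency-domain filtering step described above can be sketched roughly as follows. This is a minimal numpy illustration of the general idea (FFT, keep the strongest spectral components, inverse FFT), not the authors' FSRU module; the `keep_ratio` parameter and the magnitude-based threshold are assumptions for illustration.

```python
import numpy as np

def frequency_filter(features: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Shift a feature vector into the frequency domain, keep only the
    highest-magnitude spectral components, and transform back."""
    spectrum = np.fft.fft(features)
    magnitudes = np.abs(spectrum)
    # Keep roughly the top `keep_ratio` fraction of components by magnitude.
    k = max(1, int(len(spectrum) * keep_ratio))
    threshold = np.sort(magnitudes)[-k]
    filtered = np.where(magnitudes >= threshold, spectrum, 0)
    # The inverse FFT returns to the feature domain; residual imaginary
    # parts are numerical noise and are discarded.
    return np.real(np.fft.ifft(filtered))

# Toy example: a sinusoidal "feature" with additive noise.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.1 * rng.normal(size=64)
denoised = frequency_filter(signal, keep_ratio=0.1)
print(denoised.shape)
```

Keeping only dominant frequency components acts as a crude denoiser, which is the intuition behind emphasizing "more meaningful data" in the spectral domain.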

A prognostic model integrating radiomics and deep learning based on CT for survival prediction in laryngeal squamous cell carcinoma.

Jiang H, Xie K, Chen X, Ning Y, Yu Q, Lv F, Liu R, Zhou Y, Xia S, Peng J

PubMed · Aug 16 2025
Accurate prognostic prediction is crucial for patients with laryngeal squamous cell carcinoma (LSCC) to guide personalized treatment strategies. This study aimed to develop a comprehensive prognostic model leveraging clinical factors alongside radiomics and deep learning (DL) based on CT imaging to predict recurrence-free survival (RFS) in LSCC patients. We retrospectively enrolled 349 patients with LSCC from Center 1 (training set: n = 189; internal testing set: n = 82) and Center 2 (external testing set: n = 78). A combined model was developed using Cox regression analysis to predict RFS in LSCC patients by integrating independent clinical risk factors, radiomics score (RS), and deep learning score (DLS). Meanwhile, separate clinical, radiomics, and DL models were also constructed for comparison. Furthermore, the combined model was represented visually through a nomogram to provide personalized estimation of RFS, with its risk stratification capability evaluated using Kaplan-Meier analysis. The combined model achieved a higher C-index than did the clinical model, radiomics model, and DL model in the internal testing (0.810 vs. 0.634, 0.679, and 0.727, respectively) and external testing sets (0.742 vs. 0.602, 0.617, and 0.729, respectively). Additionally, following risk stratification via nomogram, patients in the low-risk group showed significantly higher survival probabilities compared to those in the high-risk group in the internal testing set [hazard ratio (HR) = 0.157, 95% confidence interval (CI): 0.063-0.392, p < 0.001] and external testing set (HR = 0.312, 95% CI: 0.137-0.711, p = 0.003). The proposed combined model demonstrated a reliable and accurate ability to predict RFS in patients with LSCC, potentially assisting in risk stratification.
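The C-index reported above (Harrell's concordance index) measures how often the model ranks patient risk consistently with observed outcomes. A minimal sketch of its computation, with illustrative data not taken from the study:

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: the fraction of usable patient pairs in which the
    patient with the shorter observed survival time has the higher predicted
    risk. `events` is 1 if the event (e.g. recurrence) was observed, 0 if
    the observation was censored."""
    concordant, permissible = 0.0, 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable only if patient i's event occurred before
            # patient j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                permissible += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties count as half-concordant
    return concordant / permissible

# Illustrative cohort: earlier recurrence should pair with higher risk.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.2, 0.1]
print(concordance_index(times, events, risk))  # → 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the combined model's 0.810 and 0.742 in context.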

TN5000: An Ultrasound Image Dataset for Thyroid Nodule Detection and Classification.

Zhang H, Liu Q, Han X, Niu L, Sun W

PubMed · Aug 16 2025
Accurate diagnosis of thyroid nodules using ultrasonography is a highly valuable but challenging task. With the emergence of artificial intelligence, deep learning-based methods can assist radiologists, but their performance depends heavily on the quantity and quality of training data, and current ultrasound image datasets for thyroid nodules either directly use TI-RADS assessments as labels or are not publicly available. To address these issues, we propose an open-access ultrasound image dataset for thyroid nodule detection and classification, TN5000, which comprises 5,000 B-mode ultrasound images of thyroid nodules, together with complete annotations and biopsy confirmations by expert radiologists. We also analyze the statistical characteristics of the proposed dataset and recommend baseline methods for the detection and classification of thyroid nodules as benchmarks, along with their evaluation results. To the best of our knowledge, TN5000 is the largest open-access, professionally labeled ultrasound image dataset of thyroid nodules, and the first designed for both thyroid nodule detection and classification. Such annotated images can help to analyze the intrinsic properties of thyroid nodules and to determine the necessity of FNA biopsy, both of which are crucial in ultrasound diagnosis.

Comprehensive analysis of [18F]MFBG biodistribution normal patterns and variability in pediatric patients with neuroblastoma.

Wang P, Chen X, Yan X, Yan J, Yang S, Mao J, Li F, Su X

PubMed · Aug 15 2025
[18F]-meta-fluorobenzylguanidine ([18F]MFBG) PET/CT is a promising imaging modality for neural crest-derived tumors, particularly neuroblastoma. Accurate interpretation necessitates an understanding of normal biodistribution and variations in physiological uptake. This study aimed to systematically characterize the physiological distribution and variability of [18F]MFBG uptake in pediatric patients to enhance clinical interpretation and differentiate normal from pathological uptake. We retrospectively analyzed [18F]MFBG PET/CT scans from 169 pediatric neuroblastoma patients, including 20 in confirmed remission, for detailed biodistribution analysis. Organ uptake was quantified using both manual segmentation and deep learning (DL)-based automatic segmentation methods. Patterns of physiological uptake variants were categorized and illustrated using representative cases. [18F]MFBG demonstrated consistent physiological uptake in the salivary glands (SUVmax 9.8 ± 3.3), myocardium (7.1 ± 1.7), and adrenal glands (4.6 ± 0.9), with low activity in bone (0.6 ± 0.2) and muscle (0.8 ± 0.2). DL-based analysis confirmed uniform, mild uptake across vertebral and peripheral skeletal structures (SUVmean 0.47 ± 0.08). Three physiological liver uptake patterns were identified: uniform (43%), left-lobe predominant (31%), and marginal (26%). Asymmetric uptake in the pancreatic head, transient brown adipose tissue activity, gallbladder excretion, and symmetric epiphyseal uptake were also recorded. These variants were not associated with structural abnormalities or clinical recurrence and showed distinct patterns from pathological lesions. This study establishes a reference for normal [18F]MFBG biodistribution and physiological variants in children. Understanding these patterns is essential for accurate image interpretation and the avoidance of diagnostic pitfalls in pediatric neuroblastoma patients.
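The SUV figures above follow the standard body-weight normalization used in PET quantification. A minimal sketch of that formula — the function name and example values are illustrative, not taken from the study:

```python
def suv_bw(activity_kbq_per_ml: float, injected_dose_mbq: float,
           body_weight_kg: float) -> float:
    """Body-weight-normalized standardized uptake value (SUVbw):
    tissue activity concentration divided by (injected dose / body weight).
    With kBq/mL, MBq, and kg, the unit conversion factors cancel
    (assuming 1 g of tissue ≈ 1 mL)."""
    return activity_kbq_per_ml * body_weight_kg / injected_dose_mbq

# Hypothetical example: 10 kBq/mL measured in a region of interest,
# 100 MBq injected, 20 kg patient.
print(suv_bw(10.0, 100.0, 20.0))  # → 2.0
```

In practice the measured activity is also decay-corrected to injection time, which is omitted here for brevity.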

From dictation to diagnosis: enhancing radiology reporting with integrated speech recognition in multimodal large language models.

Gertz RJ, Beste NC, Dratsch T, Lennartz S, Bremm J, Iuga AI, Bunck AC, Laukamp KR, Schönfeld M, Kottlors J

PubMed · Aug 15 2025
This study evaluates the efficiency, accuracy, and cost-effectiveness of radiology reporting using audio multimodal large language models (LLMs) compared to conventional reporting with speech recognition software. We hypothesized that minimal audio input would enable a multimodal LLM to generate complete radiological reports. 480 reports from 80 retrospective multimodal imaging studies were reported by two board-certified radiologists using three workflows: a conventional workflow (C-WF) with speech recognition software to generate findings and impressions separately, and an LLM-based workflow (LLM-WF) using the state-of-the-art LLMs GPT-4o and Claude Sonnet 3.5. Outcome measures included reporting time, corrections, and personnel cost per report. Two radiologists assessed formal structure and report quality. Statistical analysis used ANOVA and Tukey's post hoc tests (p < 0.05). LLM-WF significantly reduced reporting time (GPT-4o/Sonnet 3.5: 38.9 s ± 22.7 s vs. C-WF: 88.0 s ± 60.9 s, p < 0.01), required fewer corrections (GPT-4o: 1.0 ± 1.1, Sonnet 3.5: 0.9 ± 1.0 vs. C-WF: 2.4 ± 2.5, p < 0.01), and lowered costs (GPT-4o: $2.3 ± $1.4, Sonnet 3.5: $2.4 ± $1.4 vs. C-WF: $3.0 ± $2.1, p < 0.01). Reports generated with Sonnet 3.5 were rated highest in quality, while GPT-4o and conventional reports showed no difference. Multimodal LLMs can generate high-quality radiology reports from minimal audio input alone, with greater speed, fewer corrections, and reduced costs compared to conventional speech-based workflows. However, future implementation may involve licensing costs, and generalizability to broader clinical contexts warrants further evaluation.
Question: How do the time, accuracy, cost, and report quality of reporting via the audio input functionality of GPT-4o and Claude Sonnet 3.5 compare with conventional reporting using speech recognition?
Findings: Large language models enable radiological reporting from minimal audio input, reducing turnaround time and costs without quality loss compared to conventional reporting with speech recognition.
Clinical relevance: Large language model-based reporting from minimal audio input has the potential to improve efficiency and report quality, supporting more streamlined workflows in clinical radiology.

UniDCF: A Foundation Model for Comprehensive Dentocraniofacial Hard Tissue Reconstruction

Chunxia Ren, Ning Zhu, Yue Lai, Gui Chen, Ruijie Wang, Yangyi Hu, Suyao Liu, Shuwen Mao, Hong Su, Yu Zhang, Li Xiao

arXiv preprint · Aug 15 2025
Dentocraniofacial hard tissue defects profoundly affect patients' physiological functions, facial aesthetics, and psychological well-being, posing significant challenges for precise reconstruction. Current deep learning models are limited to single-tissue scenarios and modality-specific imaging inputs, resulting in poor generalizability and trade-offs between anatomical fidelity, computational efficiency, and cross-tissue adaptability. Here we introduce UniDCF, a unified framework capable of reconstructing multiple dentocraniofacial hard tissues through multimodal fusion encoding of point clouds and multi-view images. By leveraging the complementary strengths of each modality and incorporating a score-based denoising module to refine surface smoothness, UniDCF overcomes the limitations of prior single-modality approaches. We curated the largest multimodal dataset, comprising intraoral scans, CBCT, and CT from 6,609 patients, resulting in 54,555 annotated instances. Evaluations demonstrate that UniDCF outperforms existing state-of-the-art methods in terms of geometric precision, structural completeness, and spatial accuracy. Clinical simulations indicate UniDCF reduces reconstruction design time by 99% and achieves clinician-rated acceptability exceeding 94%. Overall, UniDCF enables rapid, automated, and high-fidelity reconstruction, supporting personalized and precise restorative treatments, streamlining clinical workflows, and enhancing patient outcomes.

FusionFM: Fusing Eye-specific Foundational Models for Optimized Ophthalmic Diagnosis

Ke Zou, Jocelyn Hui Lin Goh, Yukun Zhou, Tian Lin, Samantha Min Er Yew, Sahana Srinivasan, Meng Wang, Rui Santos, Gabor M. Somfai, Huazhu Fu, Haoyu Chen, Pearse A. Keane, Ching-Yu Cheng, Yih Chung Tham

arXiv preprint · Aug 15 2025
Foundation models (FMs) have shown great promise in medical image analysis by improving generalization across diverse downstream tasks. In ophthalmology, several FMs have recently emerged, but fundamental questions remain open: Which FM performs best? Are they equally good across different tasks? What if we combine all FMs together? To our knowledge, this is the first study to systematically evaluate both single and fused ophthalmic FMs. To address these questions, we propose FusionFM, a comprehensive evaluation suite, along with two fusion approaches to integrate different ophthalmic FMs. Our framework covers both ophthalmic disease detection (glaucoma, diabetic retinopathy, and age-related macular degeneration) and systemic disease prediction (diabetes and hypertension) based on retinal imaging. We benchmarked four state-of-the-art FMs (RETFound, VisionFM, RetiZero, and DINORET) using standardized datasets from multiple countries and evaluated their performance using AUC and F1 metrics. Our results show that DINORET and RetiZero achieve superior performance in both ophthalmic and systemic disease tasks, with RetiZero exhibiting stronger generalization on external datasets. Regarding fusion strategies, the gating-based approach provides modest improvements in predicting glaucoma, AMD, and hypertension. Despite these advances, predicting systemic diseases, especially hypertension in external cohorts, remains challenging. These findings provide an evidence-based evaluation of ophthalmic FMs, highlight the benefits of model fusion, and point to strategies for enhancing their clinical applicability.
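The gating idea mentioned above can be illustrated with a toy sketch: each foundation model's predicted probability is weighted by a softmax over gate parameters. This is a minimal numpy illustration with fixed, hypothetical gate weights; in a real fusion setup the gate would be learned from data.

```python
import numpy as np

def gated_fusion(probs: np.ndarray, gate_logits: np.ndarray) -> np.ndarray:
    """Combine per-model predicted probabilities with a gating weight.
    probs: (n_models, n_samples) disease probabilities from each FM.
    gate_logits: (n_models,) scalar gate parameters (fixed here for
    illustration; normally learned)."""
    weights = np.exp(gate_logits) / np.sum(np.exp(gate_logits))  # softmax
    return weights @ probs  # (n_samples,) fused probabilities

# Two hypothetical models scoring three patients.
probs = np.array([[0.9, 0.2, 0.6],
                  [0.8, 0.3, 0.7]])
gate = np.array([1.0, 1.0])  # equal logits → simple averaging
print(gated_fusion(probs, gate))  # ≈ [0.85, 0.25, 0.65]
```

With unequal gate logits the fusion tilts toward the more trusted model, which is how a learned gate can yield the modest task-specific gains reported.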

Efficient Image-to-Image Schrödinger Bridge for CT Field of View Extension

Zhenhao Li, Long Yang, Xiaojie Yin, Haijun Yu, Jiazhou Wang, Hongbin Han, Weigang Hu, Yixing Huang

arXiv preprint · Aug 15 2025
Computed tomography (CT) is a cornerstone imaging modality for non-invasive, high-resolution visualization of internal anatomical structures. However, when the scanned object exceeds the scanner's field of view (FOV), projection data are truncated, resulting in incomplete reconstructions and pronounced artifacts near FOV boundaries. Conventional reconstruction algorithms struggle to recover accurate anatomy from such data, limiting clinical reliability. Deep learning approaches have been explored for FOV extension, with diffusion generative models representing the latest advances in image synthesis. Yet, conventional diffusion models are computationally demanding and slow at inference due to their iterative sampling process. To address these limitations, we propose an efficient CT FOV extension framework based on the image-to-image Schrödinger Bridge (I²SB) diffusion model. Unlike traditional diffusion models that synthesize images from pure Gaussian noise, I²SB learns a direct stochastic mapping between paired limited-FOV and extended-FOV images. This direct correspondence yields a more interpretable and traceable generative process, enhancing anatomical consistency and structural fidelity in reconstructions. I²SB achieves superior quantitative performance, with root-mean-square error (RMSE) values of 49.8 HU on simulated noisy data and 152.0 HU on real data, outperforming state-of-the-art diffusion models such as conditional denoising diffusion probabilistic models (cDDPM) and patch-based diffusion methods. Moreover, its one-step inference enables reconstruction in just 0.19 s per 2D slice, over a 700-fold speedup compared to cDDPM (135 s) and surpassing diffusionGAN (0.58 s), the second fastest. This combination of accuracy and efficiency makes I²SB highly suitable for real-time or clinical deployment.

Artificial intelligence across the cancer care continuum.

Riaz IB, Khan MA, Osterman TJ

PubMed · Aug 15 2025
Artificial intelligence (AI) holds significant potential to enhance various aspects of oncology, spanning the cancer care continuum. This review provides an overview of current and emerging AI applications, from risk assessment and early detection to treatment and supportive care. AI-driven tools are being developed to integrate diverse data sources, including multi-omics and electronic health records, to improve cancer risk stratification and personalize prevention strategies. In screening and diagnosis, AI algorithms show promise in augmenting the accuracy and efficiency of medical image analysis and histopathology interpretation. AI also offers opportunities to refine treatment planning, optimize radiation therapy, and personalize systemic therapy selection. Furthermore, AI is explored for its potential to improve survivorship care by tailoring interventions and to enhance end-of-life care through improved symptom management and prognostic modeling. Beyond care delivery, AI augments clinical workflows, streamlines the dissemination of up-to-date evidence, and captures critical patient-reported outcomes for clinical decision support and outcomes assessment. However, the successful integration of AI into clinical practice requires addressing key challenges, including rigorous validation of algorithms, ensuring data privacy and security, and mitigating potential biases. Effective implementation necessitates interdisciplinary collaboration and comprehensive education for health care professionals. The synergistic interaction between AI and clinical expertise is crucial for realizing the potential of AI to contribute to personalized and effective cancer care. This review highlights the current state of AI in oncology and underscores the importance of responsible development and implementation.

Benchmarking GPT-5 for Zero-Shot Multimodal Medical Reasoning in Radiology and Radiation Oncology

Mingzhe Hu, Zach Eidex, Shansong Wang, Mojtaba Safari, Qiang Li, Xiaofeng Yang

arXiv preprint · Aug 15 2025
Radiology, radiation oncology, and medical physics require decision-making that integrates medical images, textual reports, and quantitative data under high-stakes conditions. With the introduction of GPT-5, it is critical to assess whether recent advances in large multimodal models translate into measurable gains in these safety-critical domains. We present a targeted zero-shot evaluation of GPT-5 and its smaller variants (GPT-5-mini, GPT-5-nano) against GPT-4o across three representative tasks: (1) VQA-RAD, a benchmark for visual question answering in radiology; (2) SLAKE, a semantically annotated, multilingual VQA dataset testing cross-modal grounding; and (3) a curated Medical Physics Board Examination-style dataset of 150 multiple-choice questions spanning treatment planning, dosimetry, imaging, and quality assurance. Across all datasets, GPT-5 achieved the highest accuracy, with substantial gains over GPT-4o: up to +20.00% in challenging anatomical regions such as the chest-mediastinal area, +13.60% in lung-focused questions, and +11.44% in brain-tissue interpretation. On the board-style physics questions, GPT-5 attained 90.7% accuracy (136/150), exceeding the estimated human passing threshold, while GPT-4o trailed at 78.0%. These results demonstrate that GPT-5 delivers consistent and often pronounced performance improvements over GPT-4o in both image-grounded reasoning and domain-specific numerical problem-solving, highlighting its potential to augment expert workflows in medical imaging and therapeutic physics.
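The headline 90.7% figure can be sanity-checked with simple arithmetic; a minimal, hypothetical scorer for exam-style zero-shot evaluation (not the authors' evaluation code):

```python
def exam_accuracy(n_correct: int, n_total: int) -> float:
    """Percentage score on a multiple-choice exam."""
    return 100.0 * n_correct / n_total

# Reproducing the reported board-style result: 136 of 150 correct.
print(round(exam_accuracy(136, 150), 1))  # → 90.7
```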
