RadioRAG: Online Retrieval-augmented Generation for Radiology Question Answering.

Tayebi Arasteh S, Lotfinia M, Bressem K, Siepmann R, Adams L, Ferber D, Kuhl C, Kather JN, Nebelung S, Truhn D

PubMed | Jun 18 2025
<i>"Just Accepted" papers have undergone full peer review and have been accepted for publication in <i>Radiology: Artificial Intelligence</i>. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.</i> Purpose To evaluate diagnostic accuracy of various large language models (LLMs) when answering radiology-specific questions with and without access to additional online, up-to-date information via retrieval-augmented generation (RAG). Materials and Methods The authors developed Radiology RAG (RadioRAG), an end-to-end framework that retrieves data from authoritative radiologic online sources in real-time. RAG incorporates information retrieval from external sources to supplement the initial prompt, grounding the model's response in relevant information. Using 80 questions from the RSNA Case Collection across radiologic subspecialties and 24 additional expert-curated questions with reference standard answers, LLMs (GPT-3.5-turbo, GPT-4, Mistral-7B, Mixtral-8 × 7B, and Llama3 [8B and 70B]) were prompted with and without RadioRAG in a zero-shot inference scenario (temperature ≤ 0.1, top- <i>P</i> = 1). RadioRAG retrieved context-specific information from www.radiopaedia.org. Accuracy of LLMs with and without RadioRAG in answering questions from each dataset was assessed. Statistical analyses were performed using bootstrapping while preserving pairing. Additional assessments included comparison of model with human performance and comparison of time required for conventional versus RadioRAG-powered question answering. Results RadioRAG improved accuracy for some LLMs, including GPT-3.5-turbo [74% (59/80) versus 66% (53/80), FDR = 0.03] and Mixtral-8 × 7B [76% (61/80) versus 65% (52/80), FDR = 0.02] on the RSNA-RadioQA dataset, with similar trends in the ExtendedQA dataset. Accuracy exceeded (FDR ≤ 0.007) that of a human expert (63%, (50/80)) for these LLMs, while not for Mistral-7B-instruct-v0.2, Llama3-8B, and Llama3-70B (FDR ≥ 0.21). RadioRAG reduced hallucinations for all LLMs (rates from 6-25%). RadioRAG increased estimated response time fourfold. Conclusion RadioRAG shows potential to improve LLM accuracy and factuality in radiology question answering by integrating real-time domain-specific data. ©RSNA, 2025.

Dual-scan self-learning denoising for application in ultralow-field MRI.

Zhang Y, He W, Wu J, Xu Z

PubMed | Jun 18 2025
This study develops a self-learning method to denoise MR images for use in ultralow field (ULF) applications. We propose the use of a self-learning neural network for denoising 3D MRI obtained from two acquisitions (dual scan), which are utilized as training pairs. Building on the self-learning method Noise2Noise, we propose an effective data augmentation method and an integrated learning strategy for enhancing model performance. Experimental results demonstrate that (1) the proposed model produces exceptional denoising results and outperforms the traditional Noise2Noise method both subjectively and objectively; (2) magnitude images are effectively denoised compared with several state-of-the-art methods on synthetic and real ULF data; and (3) the proposed method yields better results on phase images and quantitative imaging applications than other denoisers, owing to the self-learning framework. Theoretical and experimental implementations show that the proposed self-learning model achieves improved performance on magnitude image denoising with synthetic and real-world data at ULF. Additionally, we test our method on calculated phase and quantification images, demonstrating its superior performance over several competing methods.
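The dual-scan idea can be sketched in a few lines: two independently noisy acquisitions of the same volume serve as input and target, so no clean reference is ever needed. A minimal PyTorch sketch under that assumption (the tiny network and synthetic data are illustrative, not the paper's architecture):

```python
# Noise2Noise-style dual-scan training sketch (illustrative, not the
# paper's model): scan1 -> network -> compare against scan2.
import torch
import torch.nn as nn

denoiser = nn.Sequential(            # toy 3D denoiser, not the real architecture
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

clean = torch.rand(4, 1, 16, 16, 16)           # stand-in volumes
scan1 = clean + 0.1 * torch.randn_like(clean)  # acquisition 1
scan2 = clean + 0.1 * torch.randn_like(clean)  # acquisition 2 (independent noise)

for step in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(denoiser(scan1), scan2)  # noisy target
    loss.backward()
    opt.step()
```

Because the noise in the two scans is independent and zero-mean, the MSE-optimal prediction converges toward the clean signal, which is the core Noise2Noise argument.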

RECISTSurv: Hybrid Multi-task Transformer for Hepatocellular Carcinoma Response and Survival Evaluation.

Jiao R, Liu Q, Zhang Y, Pu B, Xue B, Cheng Y, Yang K, Liu X, Qu J, Jin C, Zhang Y, Wang Y, Zhang YD

PubMed | Jun 18 2025
Transarterial chemoembolization (TACE) is a widely applied alternative treatment for patients with hepatocellular carcinoma who are not eligible for liver resection or transplantation. However, clinical outcomes after TACE are highly heterogeneous, and there remains an urgent need for effective and efficient strategies to accurately assess tumor response and predict long-term outcomes using longitudinal, multi-center datasets. To address this challenge, we introduce RECISTSurv, a novel response-driven Transformer model that integrates multi-task learning with a response-driven co-attention mechanism to simultaneously perform liver and tumor segmentation, predict tumor response to TACE, and estimate overall survival from longitudinal computed tomography (CT) imaging. The proposed response-driven co-attention layer models the interactions between pre-TACE and post-TACE features guided by the treatment-response embedding. This design enables the model to capture complex relationships between imaging features, treatment response, and survival outcomes, thereby enhancing both prediction accuracy and interpretability. In a multi-center validation study, RECISTSurv-predicted prognosis demonstrated superior precision over state-of-the-art methods, with C-indexes ranging from 0.595 to 0.780. Furthermore, when integrated with multi-modal data, RECISTSurv emerged as an independent prognostic factor in all three validation cohorts, with hazard ratios (HRs) ranging from 1.693 to 20.7 (P = 0.001-0.042). Our results highlight the potential of RECISTSurv as a powerful tool for personalized treatment planning and outcome prediction in hepatocellular carcinoma patients undergoing TACE. The experimental code is publicly available at https://github.com/rushier/RECISTSurv.
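One way to read the response-driven co-attention idea is as cross-attention between pre- and post-treatment feature tokens whose queries are conditioned on a response embedding. The sketch below is an assumption about that design (the ResponseCoAttention class and all its parameters are hypothetical; see the paper's repository for the actual layer):

```python
# Hypothetical response-driven co-attention sketch (not the RECISTSurv code):
# pre-TACE tokens attend to post-TACE tokens, with queries shifted by a
# learned embedding of the treatment response.
import torch
import torch.nn as nn

class ResponseCoAttention(nn.Module):
    def __init__(self, dim: int = 64, n_heads: int = 4, n_responses: int = 4):
        super().__init__()
        self.response_emb = nn.Embedding(n_responses, dim)  # e.g. RECIST classes
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, pre_feats, post_feats, response_idx):
        # Condition the queries on the treatment-response embedding.
        q = pre_feats + self.response_emb(response_idx).unsqueeze(1)
        fused, _ = self.attn(q, post_feats, post_feats)
        return fused  # shared features for segmentation / survival heads

layer = ResponseCoAttention()
pre = torch.randn(2, 32, 64)    # (batch, tokens, dim) pre-TACE features
post = torch.randn(2, 32, 64)   # post-TACE features
out = layer(pre, post, torch.tensor([0, 2]))  # per-patient response class
print(out.shape)  # torch.Size([2, 32, 64])
```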

Quality control system for patient positioning and filling in meta-information for chest X-ray examinations.

Borisov AA, Semenov SS, Kirpichev YS, Arzamasov KM, Omelyanskaya OV, Vladzymyrskyy AV, Vasilev YA

PubMed | Jun 18 2025
During radiography, irregularities occur that decrease the diagnostic value of the images obtained. The purpose of this work was to develop a system for automated quality assurance of patient positioning in chest radiographs, with detection of suboptimal contrast, brightness, and metadata errors. The quality assurance system was trained and tested using more than 69,000 X-rays of the chest and other anatomical areas from the Unified Radiological Information Service (URIS) and several open datasets. Our dataset included studies regardless of a patient's gender and race, with the sole exclusion criterion being age below 18 years. A training dataset of radiographs labeled by expert radiologists was used to train an ensemble of modified deep convolutional neural network architectures (ResNet152V2 and VGG19) to identify various quality deficiencies. Model performance was assessed using area under the receiver operating characteristic curve (ROC-AUC), precision, recall, F1-score, and accuracy. Seven neural network models were trained to classify radiographs by the following quality deficiencies: failure to capture the target anatomic region, chest rotation, suboptimal brightness, incorrect anatomical area, projection errors, and improper photometric interpretation. All metrics for each model exceed 95%, indicating high predictive value. All models were combined into a unified system for evaluating radiograph quality, with a processing time of approximately 3 s per image. The system supports multiple use cases: integration into automated radiographic workstations, external quality assurance for radiology departments, acquisition quality audits for municipal health systems, and routing of studies to diagnostic AI models.
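As a rough sketch of how such a two-backbone ensemble can be combined at inference (torchvision stand-ins for the ResNet152V2/VGG19 backbones named above; the probability-averaging scheme is an assumption, not the authors' published code):

```python
# Hypothetical two-backbone QC ensemble: average the per-deficiency
# probabilities from a ResNet and a VGG classifier head.
import torch
import torch.nn as nn
from torchvision.models import resnet152, vgg19

N_DEFECTS = 7  # one output per quality-deficiency classifier in the paper

resnet = resnet152(weights=None)
resnet.fc = nn.Linear(resnet.fc.in_features, N_DEFECTS)

vgg = vgg19(weights=None)
vgg.classifier[6] = nn.Linear(vgg.classifier[6].in_features, N_DEFECTS)

resnet.eval()
vgg.eval()

@torch.no_grad()
def qc_scores(image: torch.Tensor) -> torch.Tensor:
    """Return ensemble probabilities for each quality deficiency."""
    probs = torch.sigmoid(resnet(image)) + torch.sigmoid(vgg(image))
    return probs / 2

x = torch.randn(1, 3, 224, 224)  # a preprocessed radiograph
print(qc_scores(x))
```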

Sex, stature, and age estimation from skull using computed tomography images: Current status, challenges, and future perspectives.

Du Z, Navic P, Mahakkanukrauh P

PubMed | Jun 18 2025
The skull has long been recognized and utilized in forensic investigations, with analyses evolving from basic to complex alongside modern technologies. Advances in radiology and technology have enhanced the ability to analyze biological identifiers (sex, stature, and age at death) from the skull. Computed tomography imaging helps practitioners improve the accuracy and reliability of forensic analyses. Recently, artificial intelligence has increasingly been applied in digital forensic investigations to estimate sex, stature, and age from computed tomography images. The integration of artificial intelligence represents a significant shift toward multidisciplinary collaboration, offering the potential for more accurate and reliable identification, along with academic advancements. However, it is not yet fully developed for routine forensic work, as it remains largely in the research and development phase. Additionally, the limitations of artificial intelligence systems, such as the lack of transparency in algorithms, accountability for errors, and the potential for discrimination, must still be carefully considered. Based on scientific publications from the past decade, this article provides an overview of the application of computed tomography imaging in estimating sex, stature, and age from the skull and addresses future directions for further improvement.

Identification, characterisation and outcomes of pre-atrial fibrillation in heart failure with reduced ejection fraction.

Helbitz A, Nadarajah R, Mu L, Larvin H, Ismail H, Wahab A, Thompson P, Harrison P, Harris M, Joseph T, Plein S, Petrie M, Metra M, Wu J, Swoboda P, Gale CP

PubMed | Jun 18 2025
Atrial fibrillation (AF) in heart failure with reduced ejection fraction (HFrEF) has prognostic implications. Using a machine learning algorithm (FIND-AF), we aimed to explore clinical events and the cardiac magnetic resonance (CMR) characteristics of the pre-AF phenotype in HFrEF. We studied a cohort of individuals aged ≥18 years with HFrEF and without AF from the MATCH 1 and MATCH 2 studies (2018-2024), stratified by FIND-AF score. All participants underwent CMR, analysed with cvi42 software for volumetric and T1/T2 measurements. The primary outcome was time to a composite of major adverse cardiovascular events (MACE) comprising heart failure hospitalisation, myocardial infarction, stroke and all-cause mortality. Secondary outcomes included the association between CMR findings and FIND-AF score. Of 385 patients [mean age 61.7 (12.6) years, 39.0% women] with a median follow-up of 2.5 years, the primary outcome occurred in 58 (30.2%) patients in the high FIND-AF risk group and 23 (11.9%) in the low FIND-AF risk group (hazard ratio 3.25, 95% CI 2.00-5.28, P < 0.001). A higher FIND-AF score was associated with higher indexed left ventricular mass (β = 4.7, 95% CI 0.5-8.9), indexed left atrial volume (β = 5.9, 95% CI 2.2-9.6), indexed left ventricular end-diastolic volume (β = 9.55, 95% CI 1.37-17.74, P = 0.022), native T1 signal (β = 18.0, 95% CI 7.0-29.1) and extracellular volume (β = 1.6, 95% CI 0.6-2.5). A pre-AF HFrEF subgroup with distinct CMR characteristics and poor prognosis may be identified, potentially guiding interventions to reduce clinical events.
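The primary-outcome comparison is a standard time-to-event analysis. A minimal sketch of how such a hazard ratio is estimated with a Cox model (entirely synthetic data and the lifelines library; this illustrates the method, not the study's analysis code):

```python
# Sketch of a Cox model comparing high- vs low-risk FIND-AF groups
# on simulated follow-up data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 385
high_risk = rng.integers(0, 2, n)
# Simulate shorter event times for the high-risk group.
time = rng.exponential(scale=np.where(high_risk == 1, 3.0, 9.0), size=n)
event = (time < 2.5).astype(int)          # event observed within follow-up
df = pd.DataFrame({
    "high_find_af": high_risk,
    "duration": np.minimum(time, 2.5),    # censor at 2.5 years
    "event": event,
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
print(cph.hazard_ratios_)  # HR for high vs low FIND-AF risk
```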

Deep Learning-Based Adrenal Gland Volumetry for the Prediction of Diabetes.

Ku EJ, Yoon SH, Park SS, Yoon JW, Kim JH

PubMed | Jun 18 2025
The long-term association between adrenal gland volume (AGV) and type 2 diabetes (T2D) remains unclear. We aimed to determine the association between deep learning-based AGV and both current glycemic status and incident T2D. In this observational study, adults who underwent abdominopelvic computed tomography (CT) for health checkups (2011-2012) and had no adrenal nodules were included. AGV was measured from CT images using a three-dimensional nnU-Net deep learning algorithm. We assessed the association between AGV and T2D using cross-sectional and longitudinal designs. We used 500 CT scans (median age, 52.3 years; 253 men) for model development and the Multi-Atlas Labeling Beyond the Cranial Vault dataset for external testing. The clinical cohort included a total of 9,708 adults (median age, 52.0 years; 5,769 men). The deep learning model demonstrated a Dice coefficient of 0.71 ± 0.11 for adrenal segmentation and a mean volume difference of 0.6 ± 0.9 mL on the external dataset. Participants with T2D at baseline had a larger AGV than those without (7.3 cm³ vs. 6.7 cm³ in men and 6.3 cm³ vs. 5.5 cm³ in women, both P < 0.05). The optimal AGV cutoff values for predicting T2D were 7.2 cm³ in men and 5.5 cm³ in women. Over a median 7.0-year follow-up, T2D developed in 938 participants. Cumulative T2D risk was higher with high AGV than with low AGV (adjusted hazard ratio, 1.27; 95% confidence interval, 1.11 to 1.46). AGV, measured using deep learning algorithms, is associated with current glycemic status and can significantly predict the development of T2D.
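Converting a segmentation mask into an organ volume is straightforward once voxel spacing is known. A small sketch of that step (array names and spacing values are hypothetical, not the study's pipeline):

```python
# Sketch: derive adrenal gland volume (mL) from a binary segmentation
# mask and CT voxel spacing, then apply a sex-specific cutoff.
import numpy as np

def gland_volume_ml(mask: np.ndarray, spacing_mm: tuple) -> float:
    """Volume = voxel count x voxel volume; 1 mL = 1000 mm^3."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return mask.sum() * voxel_mm3 / 1000.0

mask = np.zeros((64, 64, 64), dtype=np.uint8)   # stand-in nnU-Net output
mask[20:30, 20:30, 20:30] = 1
volume = gland_volume_ml(mask, spacing_mm=(1.0, 0.8, 0.8))

CUTOFF_ML = {"men": 7.2, "women": 5.5}          # cutoffs reported above
print(volume, volume > CUTOFF_ML["men"])
```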

Federated Learning for MRI-based BrainAGE: a multicenter study on post-stroke functional outcome prediction

Vincent Roca, Marc Tommasi, Paul Andrey, Aurélien Bellet, Markus D. Schirmer, Hilde Henon, Laurent Puy, Julien Ramon, Grégory Kuchcinski, Martin Bretzner, Renaud Lopes

arXiv preprint | Jun 18 2025
Objective: Brain-predicted age difference (BrainAGE) is a neuroimaging biomarker reflecting brain health. However, training robust BrainAGE models requires large datasets, often restricted by privacy concerns. This study evaluates the performance of federated learning (FL) for BrainAGE estimation in ischemic stroke patients treated with mechanical thrombectomy, and investigates its association with clinical phenotypes and functional outcomes. Methods: We used FLAIR brain images from 1674 stroke patients across 16 hospital centers. We implemented standard machine learning and deep learning models for BrainAGE estimation under three data management strategies: centralized learning (pooled data), FL (local training at each site), and single-site learning. We reported prediction errors and examined associations between BrainAGE and vascular risk factors (e.g., diabetes mellitus, hypertension, smoking), as well as functional outcomes at three months post-stroke. Logistic regression evaluated BrainAGE's predictive value for these outcomes, adjusting for age, sex, vascular risk factors, stroke severity, time between MRI and arterial puncture, prior intravenous thrombolysis, and recanalisation outcome. Results: While centralized learning yielded the most accurate predictions, FL consistently outperformed single-site models. BrainAGE was significantly higher in patients with diabetes mellitus across all models. Comparisons between patients with good and poor functional outcomes, and multivariate predictions of these outcomes, showed the significance of the association between BrainAGE and post-stroke recovery. Conclusion: FL enables accurate age predictions without data centralization. The strong association between BrainAGE, vascular risk factors, and post-stroke recovery highlights its potential for prognostic modeling in stroke care.
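Federated averaging, the standard FL baseline, can be sketched compactly: each site trains locally and only model weights are shared and averaged, never images. The sketch below is a generic FedAvg illustration under that assumption (toy model, synthetic site data), not the study's implementation:

```python
# Generic FedAvg sketch (illustrative, not the study's code): average
# locally trained model weights instead of pooling patient data.
import copy
import torch
import torch.nn as nn

def local_update(model, data, target, epochs=1, lr=1e-3):
    """One site's local training pass on its private data."""
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(data), target).backward()
        opt.step()
    return model.state_dict()

def fed_avg(global_model, site_states):
    """Average the per-site weights into the global model."""
    avg = copy.deepcopy(site_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in site_states]).mean(dim=0)
    global_model.load_state_dict(avg)
    return global_model

model = nn.Linear(10, 1)  # stand-in for a BrainAGE regressor
sites = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(16)]
for round_ in range(5):
    states = [local_update(model, x, y) for x, y in sites]
    model = fed_avg(model, states)
```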

Automated MRI Tumor Segmentation using hybrid U-Net with Transformer and Efficient Attention

Syed Haider Ali, Asrar Ahmad, Muhammad Ali, Asifullah Khan, Muhammad Shahban, Nadeem Shaukat

arXiv preprint | Jun 18 2025
Cancer is an abnormal growth with potential to invade locally and metastasize to distant organs. Accurate auto-segmentation of the tumor and surrounding normal tissues is required for radiotherapy treatment plan optimization. Recent AI-based segmentation models are generally trained on large public datasets, which lack the heterogeneity of local patient populations. While these studies advance AI-based medical image segmentation, research on local datasets is necessary to develop and integrate AI tumor segmentation models directly into hospital software for efficient and accurate oncology treatment planning and execution. This study enhances tumor segmentation using computationally efficient hybrid UNet-Transformer models on magnetic resonance imaging (MRI) datasets acquired from a local hospital under strict privacy protection. We developed a robust data pipeline for seamless DICOM extraction and preprocessing, followed by extensive image augmentation to ensure model generalization across diverse clinical settings, resulting in a total dataset of 6080 images for training. Our novel architecture integrates UNet-based convolutional neural networks with a transformer bottleneck and complementary attention modules, including efficient attention, Squeeze-and-Excitation (SE) blocks, Convolutional Block Attention Module (CBAM), and ResNeXt blocks. To accelerate convergence and reduce computational demands, we used a maximum batch size of 8 and initialized the encoder with pretrained ImageNet weights, training the model on dual NVIDIA T4 GPUs via checkpointing to overcome Kaggle's runtime limits. Quantitative evaluation on the local MRI dataset yielded a Dice similarity coefficient of 0.764 and an Intersection over Union (IoU) of 0.736, demonstrating competitive performance despite limited data and underscoring the importance of site-specific model development for clinical deployment.
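Of the attention modules listed, the Squeeze-and-Excitation block is the simplest to show concretely. A minimal PyTorch sketch of the standard SE design (not the authors' exact module):

```python
# Standard Squeeze-and-Excitation (SE) block sketch: global-average
# "squeeze", two-layer "excitation", then channel-wise rescaling.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze to (B, C), then excite
        return x * w.view(b, c, 1, 1)     # reweight each channel

feats = torch.randn(2, 64, 32, 32)        # e.g. a U-Net encoder feature map
print(SEBlock(64)(feats).shape)           # torch.Size([2, 64, 32, 32])
```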

NERO: Explainable Out-of-Distribution Detection with Neuron-level Relevance

Anju Chhetri, Jari Korhonen, Prashnna Gyawali, Binod Bhattarai

arXiv preprint | Jun 18 2025
Ensuring reliability is paramount in deep learning, particularly within the domain of medical imaging, where diagnostic decisions often hinge on model outputs. The capacity to separate out-of-distribution (OOD) samples has proven to be a valuable indicator of a model's reliability in research. In medical imaging, this is especially critical, as identifying OOD inputs can help flag potential anomalies that might otherwise go undetected. While many OOD detection methods rely on feature or logit space representations, recent works suggest these approaches may not fully capture OOD diversity. To address this, we propose a novel OOD scoring mechanism, called NERO, that leverages neuron-level relevance at the feature layer. Specifically, we cluster neuron-level relevance for each in-distribution (ID) class to form representative centroids and introduce a relevance distance metric to quantify a new sample's deviation from these centroids, enhancing OOD separability. Additionally, we refine performance by incorporating scaled relevance in the bias term and combining feature norms. Our framework also enables explainable OOD detection. We validate its effectiveness across multiple deep learning architectures on the gastrointestinal imaging benchmarks Kvasir and GastroVision, achieving improvements over state-of-the-art OOD detection methods.
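The centroid-and-distance scoring described here can be sketched generically: cluster per-class relevance vectors from in-distribution data, then score a new sample by its distance to the nearest class centroid. This is a simplified reading of the method, not the authors' code; the relevance extraction itself (neuron-level relevance propagation) is stubbed out with plain feature vectors:

```python
# Simplified NERO-style scoring sketch: per-class centroids of
# relevance vectors; OOD score = distance to the nearest centroid.
import numpy as np

def fit_centroids(relevance: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """One centroid per in-distribution class."""
    return np.stack([relevance[labels == c].mean(axis=0)
                     for c in np.unique(labels)])

def ood_score(sample: np.ndarray, centroids: np.ndarray) -> float:
    """Distance to the nearest class centroid; larger = more OOD."""
    return float(np.linalg.norm(centroids - sample, axis=1).min())

rng = np.random.default_rng(0)
id_relevance = rng.normal(size=(200, 128))   # stand-in relevance vectors
id_labels = rng.integers(0, 5, size=200)
centroids = fit_centroids(id_relevance, id_labels)

print(ood_score(rng.normal(size=128), centroids))           # ID-like sample
print(ood_score(rng.normal(loc=5.0, size=128), centroids))  # shifted, OOD-like
```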