Latest Papers on Radiology AI. Tags: GenAI

Clinically Interpretable Deep Learning via Sparse BagNets for Epiretinal Membrane and Related Pathology Detection

Ofosu Mensah, S., Neubauer, J., Ayhan, M. S., Djoumessi Donteu, K. R., Koch, L. M., Uzel, M. M., Gelisken, F., Berens, P.

•preprint•Jun 6 2025

Epiretinal membrane (ERM) is a vitreoretinal interface disease that, if not properly addressed, can lead to vision impairment and negatively affect quality of life. For ERM detection and treatment planning, Optical Coherence Tomography (OCT) has become the primary imaging modality, offering non-invasive, high-resolution cross-sectional imaging of the retina. Deep learning models have also achieved good ERM detection performance on OCT images. Nevertheless, most deep learning models cannot be easily understood by clinicians, which limits their acceptance in clinical practice. Post-hoc explanation methods have been utilised to support the uptake of models, albeit with partial success. In this study, we trained a sparse BagNet model, an inherently interpretable deep learning model, to detect ERM in OCT images. It performed on par with a comparable black-box model and generalised well to external data. In a multitask setting, it also accurately predicted other changes related to the ERM pathophysiology. Through a user study with ophthalmologists, we showed that the visual explanations readily provided by the sparse BagNet model for its decisions are well-aligned with clinical expertise. We propose potential directions for clinical implementation of the sparse BagNet model to guide clinical decisions in practice.

OCT Classification Methodology In Silico Academic Lab GenAI

Dual-stage AI system for Pathologist-Free Tumor Detection and subtyping in Oral Squamous Cell Carcinoma

Chaudhary, N., Muddemanavar, P., Singh, D. K., Rai, A., Mishra, D., SV, S., Augustine, J., Chandra, A., Chaurasia, A., Ahmad, T.

•preprint•Jun 6 2025

Background: Accurate histological grading of oral squamous cell carcinoma (OSCC) is critical for prognosis and treatment planning. Current methods lack automation for OSCC detection, subtyping, and differentiation from high-risk pre-malignant conditions like oral submucous fibrosis (OSMF). Further, whole-slide image (WSI) analysis is time-consuming and variable, limiting consistency. We present a clinically relevant deep learning framework that leverages weakly supervised learning and attention-based multiple instance learning (MIL) to enable automated OSCC grading and early prediction of malignant transformation from OSMF. Methods: We conducted a multi-institutional retrospective cohort study using a curated dataset of 1,925 whole-slide images (WSIs), including 1,586 OSCC cases stratified into well-, moderately-, and poorly-differentiated subtypes (WD, MD, and PD), 128 normal controls, and 211 OSMF and OSMF with OSCC cases. We developed a two-stage deep learning pipeline named OralPatho. In stage one, an attention-based MIL model was trained to perform binary classification (normal vs OSCC). In stage two, a gated attention mechanism with top-K patch selection was employed to classify the OSCC subtypes. Model performance was assessed using stratified 3-fold cross-validation and external validation on an independent dataset. Findings: The binary classifier demonstrated robust performance with a mean F1-score exceeding 0.93 across all validation folds. The multiclass model achieved consistent macro-F1 scores of 0.72, 0.70, and 0.68, along with AUCs of 0.79 for WD, 0.71 for MD, and 0.61 for PD OSCC subtypes. Model generalizability was validated using an independent external dataset. Attention maps reliably highlighted clinically relevant histological features, supporting the system's interpretability and diagnostic alignment with expert pathological assessment. Interpretation: This study demonstrates the feasibility of attention-based, weakly supervised learning for accurate OSCC grading from whole-slide images. OralPatho combines high diagnostic performance with real-time interpretability, making it a scalable solution for both advanced pathology labs and resource-limited settings.
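
To make the stage-two idea more concrete, the following is a minimal sketch of gated attention pooling with top-K patch selection, written with the Python standard library only. It is not the OralPatho implementation: the scoring form `w . (tanh(V h) * sigmoid(U h))` is the commonly used gated-attention formulation and is assumed here, as are the function names, the choice to apply softmax only over the selected patches, and the toy embeddings and weights.

```python
import math

def matvec(matrix, vector):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def gated_attention_scores(patches, V, U, w):
    """Unnormalised score per patch: w . (tanh(V h) * sigmoid(U h))."""
    scores = []
    for h in patches:
        tanh_branch = [math.tanh(z) for z in matvec(V, h)]
        gate_branch = [1.0 / (1.0 + math.exp(-z)) for z in matvec(U, h)]
        scores.append(sum(wi * t * g for wi, t, g in zip(w, tanh_branch, gate_branch)))
    return scores

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_attention_pool(patches, V, U, w, k):
    """Keep the k highest-scoring patches and return their attention-weighted mean."""
    scores = gated_attention_scores(patches, V, U, w)
    top = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    dim = len(patches[0])
    pooled = [sum(wt * patches[i][d] for wt, i in zip(weights, top)) for d in range(dim)]
    return pooled, top, weights

# Toy bag: four 2-d patch embeddings (hypothetical values, not from the study).
patches = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [-1.0, -1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned matrices V and U
slide_embedding, top_indices, weights = top_k_attention_pool(
    patches, identity, identity, [1.0, 1.0], k=2
)
```

In a real pipeline the pooled slide embedding would feed a subtype classifier, and the per-patch weights are what an attention map would visualise.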

OCT Classification Retrospective Clinical In Silico Academic Lab GenAI

Quasi-supervised MR-CT image conversion based on unpaired data.

Zhu R, Ruan Y, Li M, Qian W, Yao Y, Teng Y

•papers•Jun 6 2025

In radiotherapy planning, acquiring both magnetic resonance (MR) and computed tomography (CT) images is crucial for comprehensive evaluation and treatment. However, simultaneous acquisition of MR and CT images is time-consuming, economically expensive, and involves ionizing radiation, which poses health risks to patients. The objective of this study is to generate CT images from radiation-free MR images using a novel quasi-supervised learning framework. In this work, we propose a quasi-supervised framework to explore the underlying relationship between unpaired MR and CT images. Normalized mutual information (NMI) is employed as a similarity metric to evaluate the correspondence between MR and CT scans. To establish optimal pairings, we compute an NMI matrix across the training set and apply the Hungarian algorithm for global matching. The resulting MR-CT pairs, along with their NMI scores, are treated as prior knowledge and integrated into the training process to guide the MR-to-CT image translation model. Experimental results indicate that the proposed method significantly outperforms existing unsupervised image synthesis methods in terms of both image quality and consistency of image features during the MR to CT image conversion process. The generated CT images show a higher degree of accuracy and fidelity to the original MR images, ensuring better preservation of anatomical details and structural integrity. This study proposes a quasi-supervised framework that converts unpaired MR and CT images into structurally consistent pseudo-pairs, providing informative priors to enhance cross-modality image synthesis. This strategy not only improves the accuracy and reliability of MR-CT conversion, but also reduces reliance on costly and scarce paired datasets. The proposed framework offers a practical and scalable solution for real-world medical imaging applications, where paired annotations are often unavailable.
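
The pairing step described above (an NMI matrix followed by Hungarian matching) can be sketched as follows. This is an illustrative stand-in, not the authors' code. The abstract does not state which NMI normalisation was used, so `2*I(A;B) / (H(A) + H(B))` is an assumption; the "images" are tiny hypothetical lists of quantised intensities; and the assignment routine is a compact standard-library implementation of the Hungarian algorithm (in practice `scipy.optimize.linear_sum_assignment` would be the usual choice).

```python
import math
from collections import Counter

def entropy(counts, n):
    """Shannon entropy (nats) of a Counter holding n observations."""
    return -sum(c / n * math.log(c / n) for c in counts.values())

def nmi(a, b):
    """Assumed normalisation: 2 * I(A;B) / (H(A) + H(B)) on quantised intensities."""
    n = len(a)
    ha, hb = entropy(Counter(a), n), entropy(Counter(b), n)
    hab = entropy(Counter(zip(a, b)), n)
    return 2.0 * (ha + hb - hab) / (ha + hb) if ha + hb > 0 else 0.0

def hungarian(cost):
    """Minimum-cost assignment for a square matrix; returns the column chosen per row."""
    n = len(cost)
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)   # row and column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)     # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:  # walk back along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment

# Hypothetical unpaired sets: each CT is an intensity-relabelled copy of one MR, shuffled.
mr = [[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1], [0, 0, 0, 1, 1, 1]]
ct = [[5 - x for x in mr[2]], [5 - x for x in mr[0]], [5 - x for x in mr[1]]]

scores = [[nmi(m, c) for c in ct] for m in mr]            # NMI matrix
pairs = hungarian([[-s for s in row] for row in scores])  # negate to maximise total NMI
```

Here `pairs[i]` is the index of the CT matched to `mr[i]`, and `scores[i][pairs[i]]` is the NMI value that could be carried into training as a prior.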

Mixed Modality Image Synthesis Methodology In Silico Academic Lab GenAI

Role of Large Language Models for Suggesting Nerve Involvement in Upper Limbs MRI Reports with Muscle Denervation Signs.

Martín-Noguerol T, López-Úbeda P, Luna A, Gómez-Río M, Górriz JM

•papers•Jun 5 2025

Determining the involvement of specific peripheral nerves (PNs) in the upper limb associated with signs of muscle denervation can be challenging. This study aims to develop, compare, and validate various large language models (LLMs) to automatically identify and establish potential relationships between denervated muscles and their corresponding PNs. We collected 300 retrospective MRI reports in Spanish from upper limb examinations conducted between 2018 and 2024 that showed signs of muscle denervation. An expert radiologist manually annotated these reports based on the affected peripheral nerves (median, ulnar, radial, axillary, and suprascapular). BERT, DistilBERT, mBART, RoBERTa, and Medical-ELECTRA models were fine-tuned and evaluated on the reports. Additionally, an automatic voting system was implemented to consolidate predictions through majority voting. The voting system achieved the highest F1 scores for the median, ulnar, and radial nerves, with scores of 0.88, 1.00, and 0.90, respectively. Medical-ELECTRA also performed well, achieving F1 scores above 0.82 for the axillary and suprascapular nerves. In contrast, mBART demonstrated lower performance, particularly with an F1 score of 0.38 for the median nerve. Our voting system generally outperforms the individually tested LLMs in determining the specific PN likely associated with muscle denervation patterns detected in upper limb MRI reports. This system can thereby assist radiologists by suggesting the implicated PN when generating their radiology reports.
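
As an illustration of how predictions from several fine-tuned models might be consolidated, here is a minimal majority-voting sketch. The abstract only states that majority voting was used. Treating the task as multi-label (one yes/no decision per nerve), requiring a strict majority, and the example predictions below are all assumptions made for this sketch.

```python
from collections import Counter

NERVES = ("median", "ulnar", "radial", "axillary", "suprascapular")

def majority_vote(model_predictions):
    """Return the nerves flagged by a strict majority of models for one report.

    model_predictions maps a model name to the set of nerves it flagged.
    """
    n_models = len(model_predictions)
    votes = Counter(
        nerve for flagged in model_predictions.values() for nerve in flagged
    )
    return {nerve for nerve in NERVES if votes[nerve] > n_models / 2}

# Hypothetical per-model outputs for a single report (not data from the study).
example = {
    "BERT": {"median"},
    "DistilBERT": {"median", "ulnar"},
    "mBART": {"radial"},
    "RoBERTa": {"median"},
    "Medical-ELECTRA": {"median", "ulnar"},
}
consensus = majority_vote(example)
```

With five voters a strict majority means at least three votes, so a single weak model (such as the outlier prediction above) cannot change the consensus on its own.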

MRI LLM Radiology Report Musculoskeletal Retrospective Clinical In Silico Academic Lab GenAI

Analysis of Research Hotspots and Development Trends in the Diagnosis of Lung Diseases Using Low-Dose CT Based on Bibliometrics.

Liu X, Chen X, Jiang Y, Chen Y, Zhang D, Fan L

•papers•Jun 5 2025

Among lung diseases, lung cancer is one of the main threats to global health. Low-Dose Computed Tomography (LDCT) provides significant benefits for its screening but also brings new diagnostic challenges that require close attention. By searching the Web of Science core collection, we selected articles and reviews published in English between 2005 and June 2024 on topics such as "Low-dose", "CT image", and "Lung". This literature was analyzed using bibliometric methods, and CiteSpace software was used to explore the cooperation between countries, the cooperative relationship between authors, highly cited literature, and the distribution of keywords to reveal the research hotspots and trends in this field. The number of LDCT research articles shows continuous growth between 2019 and 2022. The United States is at the forefront of research in this field, with a centrality of 0.31; China has also been rapidly conducting research, with a centrality of 0.26. The authors' co-occurrence map shows that research teams in this field are highly cooperative, and their research questions are closely related. The analysis of highly cited literature and keywords confirmed the significant advantages of LDCT in lung cancer screening, which can help reduce the mortality of lung cancer patients and improve the prognosis. "Lung cancer" and "CT" have always been high-frequency keywords, while "image quality" and "low dose CT" have become new hot keywords, indicating that LDCT using deep learning techniques has become a hot topic in early lung cancer research. The study revealed that advancements in CT technology have driven in-depth research from application challenges to image processing, with the research trajectory evolving from technical improvements to health risk assessments and subsequently to AI-assisted diagnosis. Currently, the research focus has shifted toward integrating deep learning with LDCT technology to address complex diagnostic challenges. The study also presents global research trends and geographical distributions of LDCT technology, along with the influence of key research institutions and authors. The comprehensive analysis aims to promote the development and application of LDCT technology in pulmonary disease diagnosis and enhance diagnostic accuracy and patient management efficiency. Future work will focus on LDCT reconstruction algorithms to balance image noise and radiation dose. AI-assisted multimodal imaging supports remote diagnosis and personalized health management by providing dynamic analysis, risk assessment, and follow-up recommendations to support early diagnosis.

CT Classification Chest Review Concept Academic Lab GenAI

Stable Vision Concept Transformers for Medical Diagnosis

Lijie Hu, Songning Lai, Yuan Hua, Shu Yang, Jingfeng Zhang, Di Wang

•preprint•Jun 5 2025

Transparency is a paramount concern in the medical field, prompting researchers to delve into the realm of explainable AI (XAI). Among these XAI methods, Concept Bottleneck Models (CBMs) aim to restrict the model's latent space to human-understandable high-level concepts by generating a conceptual layer for extracting conceptual features, which has drawn much attention recently. However, existing methods rely solely on concept features to determine the model's predictions, which overlook the intrinsic feature embeddings within medical images. To address this utility gap between the original models and concept-based models, we propose Vision Concept Transformer (VCT). Furthermore, despite their benefits, CBMs have been found to negatively impact model performance and fail to provide stable explanations when faced with input perturbations, which limits their application in the medical field. To address this faithfulness issue, this paper further proposes the Stable Vision Concept Transformer (SVCT) based on VCT, which leverages the vision transformer (ViT) as its backbone and incorporates a conceptual layer. SVCT employs conceptual features to enhance decision-making capabilities by fusing them with image features and ensures model faithfulness through the integration of Denoised Diffusion Smoothing. Comprehensive experiments on four medical datasets demonstrate that our VCT and SVCT maintain accuracy while remaining interpretable compared to baselines. Furthermore, even when subjected to perturbations, our SVCT model consistently provides faithful explanations, thus meeting the needs of the medical field.

Mixed Modality Classification Methodology In Silico Academic Lab GenAI

Performance analysis of large language models in multi-disease detection from chest computed tomography reports: a comparative study: Experimental Research.

Luo P, Fan C, Li A, Jiang T, Jiang A, Qi C, Gan W, Zhu L, Mou W, Zeng D, Tang B, Xiao M, Chu G, Liang Z, Shen J, Liu Z, Wei T, Cheng Q, Lin A, Chen X

•papers•Jun 5 2025

Computed Tomography (CT) is widely acknowledged as the gold standard for diagnosing thoracic diseases. However, the accuracy of interpretation significantly depends on radiologists' expertise. Large Language Models (LLMs) have shown considerable promise in various medical applications, particularly in radiology. This study aims to assess the performance of leading LLMs in analyzing unstructured chest CT reports and to examine how different questioning methodologies and fine-tuning strategies influence their effectiveness in enhancing chest CT diagnosis. This retrospective analysis evaluated 13,489 chest CT reports encompassing 13 common thoracic conditions across pulmonary, cardiovascular, pleural, and upper abdominal systems. Five LLMs (Claude-3.5-Sonnet, GPT-4, GPT-3.5-Turbo, Gemini-Pro, Qwen-Max) were assessed using dual questioning methodologies: multiple-choice and open-ended. Radiologist-curated datasets underwent rigorous preprocessing, including RadLex terminology standardization, multi-step diagnostic validation, and exclusion of ambiguous cases. Model performance was quantified via Subjective Answer Accuracy Rate (SAAR), Reference Answer Accuracy Rate (RAAR), and Area Under the Receiver Operating Characteristic (ROC) Curve analysis. GPT-3.5-Turbo underwent fine-tuning (100 iterations with one training epoch) on 200 high-performing cases to enhance diagnostic precision for initially misclassified conditions. GPT-4 demonstrated superior performance with the highest RAAR of 75.1% in multiple-choice questioning, followed by Qwen-Max (66.0%) and Claude-3.5 (63.5%), significantly outperforming GPT-3.5-Turbo (41.8%) and Gemini-Pro (40.8%) across the entire patient cohort. Multiple-choice questioning consistently improved both RAAR and SAAR for all models compared to open-ended questioning, with RAAR consistently surpassing SAAR. Model performance demonstrated notable variations across different diseases and organ conditions. 
Notably, fine-tuning substantially enhanced the performance of GPT-3.5-Turbo, which initially exhibited suboptimal results in most scenarios. This study demonstrated that general-purpose LLMs can effectively interpret chest CT reports, with performance varying significantly across models depending on the questioning methodology and fine-tuning approaches employed. For surgical practice, these findings provided evidence-based guidance for selecting appropriate AI tools to enhance preoperative planning, particularly for thoracic procedures. The integration of optimized LLMs into surgical workflows may improve decision-making efficiency, risk stratification, and diagnostic speed, potentially contributing to better surgical outcomes through more accurate preoperative assessment.

CT Classification Chest Retrospective Clinical In Silico Academic Lab GenAI

Epistasis regulates genetic control of cardiac hypertrophy.

Wang Q, Tang TM, Youlton M, Weldy CS, Kenney AM, Ronen O, Hughes JW, Chin ET, Sutton SC, Agarwal A, Li X, Behr M, Kumbier K, Moravec CS, Tang WHW, Margulies KB, Cappola TP, Butte AJ, Arnaout R, Brown JB, Priest JR, Parikh VN, Yu B, Ashley EA

•papers•Jun 5 2025

Although genetic variant effects often interact nonadditively, strategies to uncover epistasis remain in their infancy. Here we develop low-signal signed iterative random forests to elucidate the complex genetic architecture of cardiac hypertrophy, using deep learning-derived left ventricular mass estimates from 29,661 UK Biobank cardiac magnetic resonance images. We report epistatic variants near CCDC141, IGF1R, TTN and TNKS, identifying loci deemed insignificant in genome-wide association studies. Functional genomic and integrative enrichment analyses reveal that genes mapped from these loci share biological process gene ontologies and myogenic regulatory factors. Transcriptomic network analyses using 313 human hearts demonstrate strong co-expression correlations among these genes in healthy hearts, with significantly reduced connectivity in failing hearts. To assess causality, RNA silencing in human induced pluripotent stem cell-derived cardiomyocytes, combined with novel microfluidic single-cell morphology analysis, confirms that cardiomyocyte hypertrophy is nonadditively modifiable by interactions between CCDC141, TTN and IGF1R. Our results expand the scope of cardiac genetic regulation to epistasis.

MRI Segmentation Cardiac Retrospective Clinical In Silico Academic Lab GenAI

Regulating Generative AI in Radiology Practice: A Trilaminar Approach to Balancing Risk with Innovation.

Gowda V, Bizzo BC, Dreyer KJ

•papers•Jun 4 2025

Generative AI tools have proliferated across the market, garnered significant media attention, and increasingly found incorporation into the radiology practice setting. However, they raise a number of unanswered questions concerning governance and appropriate use. By their nature as general-purpose technologies, they strain the limits of existing FDA premarket review pathways to regulate them and introduce new sources of liability, privacy, and clinical risk. A multilayered governance approach is needed to balance innovation with safety. To address gaps in oversight, this piece establishes a trilaminar governance model for generative AI technologies. This treats federal regulations as a scaffold, upon which tiers of institutional guidelines and industry self-regulatory frameworks are added to create a comprehensive paradigm composed of interlocking parts. Doing so would provide radiologists with an effective risk management strategy for the future, foster continued technical development, and ultimately, promote patient care.

LLM Radiology Report Review Academic Lab Policy GenAI

An Unsupervised XAI Framework for Dementia Detection with Context Enrichment

Singh, D., Brima, Y., Levin, F., Becker, M., Hiller, B., Hermann, A., Villar-Munoz, I., Beichert, L., Bernhardt, A., Buerger, K., Butryn, M., Dechent, P., Duezel, E., Ewers, M., Fliessbach, K., D. Freiesleben, S., Glanz, W., Hetzer, S., Janowitz, D., Goerss, D., Kilimann, I., Kimmich, O., Laske, C., Levin, J., Lohse, A., Luesebrink, F., Munk, M., Perneczky, R., Peters, O., Preis, L., Priller, J., Prudlo, J., Prychynenko, D., Rauchmann, B.-S., Rostamzadeh, A., Roy-Kluth, N., Scheffler, K., Schneider, A., Droste zu Senden, L., H. Schott, B., Spottke, A., Synofzik, M., Wiltfang, J., Jessen, F., W

•preprint•Jun 4 2025

Introduction: Explainable Artificial Intelligence (XAI) methods enhance the diagnostic efficiency of clinical decision support systems by making the predictions of convolutional neural networks (CNNs) on brain imaging more transparent and trustworthy. However, their clinical adoption is limited due to limited validation of the explanation quality. Our study introduces a framework that evaluates XAI methods by integrating neuroanatomical morphological features with CNN-generated relevance maps for disease classification. Methods: We trained a CNN using brain MRI scans from six cohorts: ADNI, AIBL, DELCODE, DESCRIBE, EDSD, and NIFD (N=3253), including participants that were cognitively normal, with amnestic mild cognitive impairment, dementia due to Alzheimer's disease and frontotemporal dementia. Clustering analysis benchmarked different explanation space configurations by using morphological features as proxy-ground truth. We implemented three post-hoc explanation methods: i) by simplifying model decisions, ii) explanation-by-example, and iii) textual explanations. A qualitative evaluation by clinicians (N=6) was performed to assess their clinical validity. Results: Clustering performance improved in morphology-enriched explanation spaces, improving both homogeneity and completeness of the clusters. Post-hoc explanations by model simplification largely delineated converters and stable participants, while explanation-by-example presented possible cognition trajectories. Textual explanations gave rule-based summarization of pathological findings. Clinicians' qualitative evaluation highlighted challenges and opportunities of XAI for different clinical applications. Conclusion: Our study refines XAI explanation spaces and applies various approaches for generating explanations. Within the context of AI-based decision support systems in dementia research, we found the explanation methods to be promising towards enhancing diagnostic efficiency, backed up by the clinical assessments.
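
For readers unfamiliar with the two clustering metrics mentioned, the sketch below computes homogeneity and completeness from their standard entropy-based definitions, using only the Python standard library. It is not the study's evaluation code: the reference labels and cluster assignments are hypothetical, and scikit-learn's `homogeneity_score` and `completeness_score` provide equivalent measures.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (nats) of a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(target, given):
    """H(target | given) for two equal-length label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    marginal = Counter(given)
    return -sum(c / n * math.log(c / marginal[g]) for (g, _), c in joint.items())

def homogeneity_completeness(classes, clusters):
    """Homogeneity: each cluster holds one class. Completeness: each class sits in one cluster."""
    h_classes, h_clusters = entropy(classes), entropy(clusters)
    homogeneity = (
        1.0 if h_classes == 0
        else 1.0 - conditional_entropy(classes, clusters) / h_classes
    )
    completeness = (
        1.0 if h_clusters == 0
        else 1.0 - conditional_entropy(clusters, classes) / h_clusters
    )
    return homogeneity, completeness

# Hypothetical proxy ground-truth groups and cluster assignments for six participants.
reference = ["CN", "CN", "AD", "AD", "FTD", "FTD"]
clusters = [0, 0, 1, 1, 1, 1]
homogeneity, completeness = homogeneity_completeness(reference, clusters)
```

In this toy case every reference group falls entirely inside one cluster, so completeness is 1, while the cluster that mixes AD and FTD pulls homogeneity below 1.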

MRI Classification Neurological Methodology In Silico Academic Lab GenAI
