Latest Papers on Radiology AI. Category: papers, Sources: pubmed

Trustworthy AI for stage IV non-small cell lung cancer: Automatic segmentation and uncertainty quantification.

Dedeken S, Conze PH, Damerjian Pieters V, Gallinato O, Faure J, Colin T, Visvikis D

•papers•May 13 2025

Accurate segmentation of lung tumors is essential for advancing personalized medicine in non-small cell lung cancer (NSCLC). However, stage IV NSCLC presents significant challenges due to heterogeneous tumor morphology and the presence of associated conditions including infection, atelectasis and pleural effusion. The complexity of multicentric datasets further complicates robust segmentation across diverse clinical settings. In this study, we evaluate deep-learning-based approaches for automated segmentation of advanced-stage lung tumors using 3D architectures on 387 CT scans from the Deep-Lung-IV study. Through comprehensive experiments, we assess the impact of model design, HU windowing, and dataset size on delineation performance, providing practical guidelines for robust implementation. Additionally, we propose a confidence score using deep ensembles to quantify prediction uncertainty and automate the identification of complex cases that require further review. Our results demonstrate the potential of attention-based architectures and specific preprocessing strategies to improve segmentation quality in such a challenging clinical scenario, while emphasizing the importance of uncertainty estimation to build trustworthy AI systems in medical imaging. Code is available at: https://github.com/Sacha-Dedeken/SegStageIVNSCLC.
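The abstract names two techniques without giving details: HU windowing and a confidence score built from a deep ensemble. A minimal sketch of both follows. The window center and width, the function names, and the confidence formula are illustrative assumptions, not the authors' published method (their repository is the place to check that).

```python
import numpy as np


def apply_hu_window(ct_hu, center=-600.0, width=1500.0):
    """Clip a CT array (Hounsfield units) to a window and rescale to [0, 1].

    The default center/width are hypothetical values, not taken from the paper.
    """
    lo, hi = center - width / 2.0, center + width / 2.0
    return (np.clip(ct_hu, lo, hi) - lo) / (hi - lo)


def ensemble_confidence(member_probs, threshold=0.5):
    """Combine per-model foreground probabilities into a mask and a scalar score.

    member_probs has shape (n_models, *volume_shape). The score is an assumed
    formula: 1 minus the mean across-model standard deviation inside the
    predicted mask, normalized by 0.5 (the largest possible standard deviation
    for values in [0, 1]).
    """
    probs = np.asarray(member_probs, dtype=float)
    mean_prob = probs.mean(axis=0)
    mask = mean_prob >= threshold
    # Fall back to the whole volume when nothing is predicted as tumor.
    region = mask if mask.any() else np.ones_like(mask)
    disagreement = probs.std(axis=0)[region].mean()
    return mask, 1.0 - float(disagreement) / 0.5


# Three models that agree give a score near 1; two that fully disagree give 0.
agree_mask, agree_conf = ensemble_confidence(np.stack([np.full((2, 2), 0.9)] * 3))
split_mask, split_conf = ensemble_confidence(np.stack([np.ones((2, 2)), np.zeros((2, 2))]))
```

Cases whose score falls below a chosen cutoff could then be routed for manual review; the cutoff itself would need to be tuned on held-out data.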

CT Segmentation Chest Retrospective Clinical In Silico Academic Lab Open Code

Diagnosis of thyroid cartilage invasion by laryngeal and hypopharyngeal cancers based on CT with deep learning.

Takano Y, Fujima N, Nakagawa J, Dobashi H, Shimizu Y, Kanaya M, Kano S, Homma A, Kudo K

•papers•May 13 2025

To develop a convolutional neural network (CNN) model to diagnose thyroid cartilage invasion by laryngeal and hypopharyngeal cancers observed on computed tomography (CT) images and evaluate the model's diagnostic performance. We retrospectively analyzed 91 cases of laryngeal or hypopharyngeal cancer treated surgically at our hospital during the period April 2010 through May 2023, and we divided the cases into datasets for training (n = 61) and testing (n = 30). We reviewed the CT images and pathological diagnoses in all cases to determine the invasion-positive or -negative status as a ground truth. We trained the new CNN model to classify thyroid cartilage invasion-positive or -negative status from the pre-treatment axial CT images by transfer learning from Residual Network 101 (ResNet101), using the training dataset. We then used the test dataset to evaluate the model's performance. Two radiologists, one with extensive head and neck imaging experience (senior reader) and the other with less experience (junior reader), reviewed the CT images of the test dataset to determine whether thyroid cartilage invasion was present. The following were obtained by the CNN model with the test dataset: area under the curve (AUC), 0.82; accuracy, 90%; sensitivity, 80%; and specificity, 95%. The CNN model showed a significant difference in AUCs compared to the junior reader (p = 0.035) but not the senior reader (p = 0.61). The CNN-based diagnostic model can be a useful supportive tool for the assessment of thyroid cartilage invasion in patients with laryngeal or hypopharyngeal cancer.
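As a worked check on the reported test-set metrics, the sketch below computes accuracy, sensitivity, and specificity from a confusion matrix. The counts used (10 invasion-positive and 20 invasion-negative cases among the 30 test cases) are a hypothetical split that happens to reproduce the reported percentages; the abstract does not state the actual counts.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts for n = 30: 8 of 10 positives and 19 of 20 negatives correct.
metrics = diagnostic_metrics(tp=8, fn=2, tn=19, fp=1)
```

With these assumed counts the accuracy is 27/30 = 0.90, the sensitivity 8/10 = 0.80, and the specificity 19/20 = 0.95.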

CT Classification Retrospective Clinical In Silico Academic Lab

A survey of deep-learning-based radiology report generation using multimodal inputs.

Wang X, Figueredo G, Li R, Zhang WE, Chen W, Chen X

•papers•May 13 2025

Automatic radiology report generation can alleviate the workload for physicians and minimize regional disparities in medical resources, therefore becoming an important topic in the medical image analysis field. It is a challenging task, as the computational model needs to mimic physicians to obtain information from multi-modal input data (i.e., medical images, clinical information, medical knowledge, etc.), and produce comprehensive and accurate reports. Recently, numerous works have emerged to address this issue using deep-learning-based methods, such as transformers, contrastive learning, and knowledge-base construction. This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep-learning-based report generation with five main components, including multi-modality data acquisition, data preparation, feature learning, feature fusion and interaction, and report generation. The state-of-the-art methods for each of these components are highlighted. Additionally, we summarize the latest developments in large model-based methods and model explainability, along with public datasets, evaluation methods, current challenges, and future directions in this field. We have also conducted a quantitative comparison between different methods in the same experimental setting. This is the most up-to-date survey that focuses on multi-modality inputs and data fusion for radiology report generation. The aim is to provide comprehensive and rich information for researchers interested in automatic clinical report generation and medical image analysis, especially when using multimodal inputs, and to assist them in developing new algorithms to advance the field.

Mixed Modality Report Generation Review Academic Lab GenAI Open Dataset

Deep learning diagnosis of hepatic echinococcosis based on dual-modality plain CT and ultrasound images: a large-scale, multicenter, diagnostic study.

Zhang J, Zhang J, Tang H, Meng Y, Chen X, Chen J, Chen Y

•papers•May 12 2025

Given the current limited accuracy of imaging screening for hepatic echinococcosis (HCE) in under-resourced areas, the authors developed and validated a multimodal imaging system (HEAC) based on plain computed tomography (CT) combined with ultrasound for HCE screening in those areas. In this study, we developed a multimodal deep learning diagnostic system by integrating ultrasound and plain CT imaging data to differentiate hepatic echinococcosis, liver cysts, liver abscesses, and healthy liver conditions. We collected a dataset of 8979 cases spanning 18 years from eight hospitals in Xinjiang, China, including both retrospective and prospective data. To enhance the robustness and generalization of the diagnostic model, after modeling CT and ultrasound images using EfficientNet3D and EfficientNet-B0, external and prospective tests were conducted, and the model's performance was compared with diagnoses made by experienced physicians. Across internal and external test sets, the fused model of CT and ultrasound consistently outperformed the individual modality models and physician diagnoses. In the prospective test set from the same center, the fusion model achieved an accuracy of 0.816, sensitivity of 0.849, specificity of 0.942, and an AUC of 0.963, significantly exceeding physician performance (accuracy 0.900, sensitivity 0.800, specificity 0.933). The external test sets across seven other centers demonstrated similar results, with the fusion model achieving an overall accuracy of 0.849, sensitivity of 0.859, specificity of 0.942, and AUC of 0.961. The multimodal deep learning diagnostic system that integrates CT and ultrasound significantly increases the diagnostic accuracy for HCE, liver cysts, and liver abscesses. It outperforms standard single-modality approaches and physician diagnoses by lowering misdiagnosis rates and increasing diagnostic reliability. It emphasizes the promise of multimodal imaging systems in tackling diagnostic issues in low-resource areas, opening the path for improved medical care accessibility and outcomes.

Mixed Modality Classification Abdominal Retrospective Clinical In Silico Academic Lab

Prognostic Value Of Deep Learning Based RCA PCAT and Plaque Volume Beyond CT-FFR In Patients With Stent Implantation.

Huang Z, Tang R, Du X, Ding Y, Yang Z, Cao B, Li M, Wang X, Wang W, Li Z, Xiao J, Wang X

•papers•May 12 2025

The study aims to investigate the prognostic value of deep learning based pericoronary adipose tissue attenuation computed tomography (PCAT) and plaque volume beyond coronary computed tomography angiography (CTA)-derived fractional flow reserve (CT-FFR) in patients with percutaneous coronary intervention (PCI). A total of 183 patients with PCI who underwent coronary CTA were included in this retrospective study. Imaging assessment included PCAT, plaque volume, and CT-FFR, which were performed using an artificial intelligence (AI) assisted workstation. Kaplan-Meier survival curve analysis and multivariable Cox regression were used to estimate major adverse cardiovascular events (MACE), including non-fatal myocardial infarction (MI), stroke, and mortality. In total, 22 (12%) MACE occurred during a median follow-up period of 38.0 months (34.6-54.6 months). Kaplan-Meier analysis revealed that right coronary artery (RCA) PCAT (p = 0.007) and plaque volume (p = 0.008) were significantly associated with the increase in MACE. Multivariable Cox regression indicated that RCA PCAT (hazard ratio (HR): 2.94, 95% CI: 1.15-7.50, p = 0.025) and plaque volume (HR: 3.91, 95% CI: 1.20-12.75, p = 0.024) were independent predictors of MACE after adjustment for clinical risk factors. However, CT-FFR was not independently associated with MACE in multivariable Cox regression (p = 0.271). Deep learning based RCA PCAT and plaque volume derived from coronary CTA were found to be more strongly associated with MACE than CT-FFR in patients with PCI.

CT Segmentation Cardiac Retrospective Clinical In Silico Academic Lab

[Pulmonary vascular interventions: innovating through adaptation and advancing through differentiation].

Li J, Wan J

•papers•May 12 2025

Pulmonary vascular intervention technology, with its minimally invasive and precise advantages, has been a groundbreaking advancement in the treatment of pulmonary vascular diseases. Techniques such as balloon pulmonary angioplasty (BPA), pulmonary artery stenting, and percutaneous pulmonary artery denervation (PADN) have significantly improved the prognoses for conditions such as chronic thromboembolic pulmonary hypertension (CTEPH), pulmonary artery stenosis, and pulmonary arterial hypertension (PAH). Although based on coronary intervention (PCI) techniques such as guidewire manipulation and balloon dilatation, pulmonary vascular interventions require specific modifications to address the unique characteristics of the pulmonary circulation (low pressure, thin-walled vessels, and complex branching) and to mitigate risks of perforation and thrombosis. Future directions include the development of dedicated instruments, multi-modality imaging guidance, artificial intelligence-assisted procedures, and molecular interventional therapies. These innovations aim to establish an independent theoretical framework for pulmonary vascular interventions, facilitating their transition from "adjuvant therapies" to "core treatments" in clinical practice.

Mixed Modality Detection Cardiac Review Concept Academic Lab

Evaluating the reference accuracy of large language models in radiology: a comparative study across subspecialties.

Güneş YC, Cesur T, Çamur E

•papers•May 12 2025

This study aimed to compare six large language models (LLMs) [Chat Generative Pre-trained Transformer (ChatGPT) o1-preview, ChatGPT-4o, ChatGPT-4o with canvas, Google Gemini 1.5 Pro, Claude 3.5 Sonnet, and Claude 3 Opus] in generating radiology references, assessing accuracy, fabrication, and bibliographic completeness. In this cross-sectional observational study, 120 open-ended questions were administered across eight radiology subspecialties (neuroradiology, abdominal, musculoskeletal, thoracic, pediatric, cardiac, head and neck, and interventional radiology), with 15 questions per subspecialty. Each question prompted the LLMs to provide responses containing four references with in-text citations and complete bibliographic details (authors, title, journal, publication year/month, volume, issue, page numbers, and PubMed Identifier). References were verified using Medline, Google Scholar, the Directory of Open Access Journals, and web searches. Each bibliographic element was scored for correctness, and a composite final score [(FS): 0-36] was calculated by summing the correct elements and multiplying this by a 5-point verification score for content relevance. The FS values were then categorized into a 5-point Likert scale reference accuracy score (RAS: 0 = fabricated; 4 = fully accurate). Non-parametric tests (Kruskal-Wallis, Tamhane's T2, Wilcoxon signed-rank test with Bonferroni correction) were used for statistical comparisons. Claude 3.5 Sonnet demonstrated the highest reference accuracy, with 80.8% fully accurate references (RAS 4) and a fabrication rate of 3.1%, significantly outperforming all other models (P < 0.001). Claude 3 Opus ranked second, achieving 59.6% fully accurate references and a fabrication rate of 18.3% (P < 0.001). ChatGPT-based models (ChatGPT-4o, ChatGPT-4o with canvas, and ChatGPT o1-preview) exhibited moderate accuracy, with fabrication rates ranging from 27.7% to 52.9% and <8% fully accurate references. Google Gemini 1.5 Pro had the lowest performance, achieving only 2.7% fully accurate references and the highest fabrication rate of 60.6% (P < 0.001). Reference accuracy also varied by subspecialty, with neuroradiology and cardiac radiology outperforming pediatric and head and neck radiology. Claude 3.5 Sonnet significantly outperformed all other models in generating verifiable radiology references, and Claude 3 Opus showed moderate performance. In contrast, ChatGPT models and Google Gemini 1.5 Pro delivered substantially lower accuracy with higher rates of fabricated references, highlighting current limitations in automated academic citation generation. The high accuracy of Claude 3.5 Sonnet can improve radiology literature reviews, research, and education with dependable references. The poor performance of other models, with high fabrication rates, risks misinformation in clinical and academic settings and highlights the need for refinement to ensure safe and effective use.
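The composite final score (FS) described in this abstract can be sketched as follows. Two points are assumptions rather than statements from the abstract: that nine bibliographic elements are scored (publication year and month counted separately), and that the 5-point verification score runs from 0 to 4, which together would yield the stated 0-36 range. The mapping from FS to the RAS categories is not given, so it is not reproduced here.

```python
# Assumed element list: year and month are counted separately to reach nine items.
ELEMENTS = (
    "authors", "title", "journal", "year", "month",
    "volume", "issue", "pages", "pmid",
)


def final_score(correct, verification):
    """Number of correct bibliographic elements times the verification score.

    correct: dict mapping element name to True/False (missing keys count as False).
    verification: integer from 0 to 4 (assumed scale for content relevance).
    """
    if verification not in range(5):
        raise ValueError("verification score must be an integer from 0 to 4")
    unknown = set(correct) - set(ELEMENTS)
    if unknown:
        raise ValueError(f"unknown elements: {sorted(unknown)}")
    n_correct = sum(bool(correct.get(name, False)) for name in ELEMENTS)
    return n_correct * verification


perfect = final_score({name: True for name in ELEMENTS}, verification=4)
fabricated = final_score({name: True for name in ELEMENTS}, verification=0)
partial = final_score({"authors": True, "title": True, "journal": True}, verification=2)
```

Under these assumptions a fully correct, fully relevant reference scores 36, and any reference whose verification score is 0 scores 0 regardless of how plausible its fields look.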

LLM Radiology Report Retrospective Clinical In Silico Academic Lab GenAI Policy

AutoFRS: an externally validated, annotation-free approach to computational preoperative complication risk stratification in pancreatic surgery - an experimental study.

Kolbinger FR, Bhasker N, Schön F, Cser D, Zwanenburg A, Löck S, Hempel S, Schulze A, Skorobohach N, Schmeiser HM, Klotz R, Hoffmann RT, Probst P, Müller B, Bodenstedt S, Wagner M, Weitz J, Kühn JP, Distler M, Speidel S

•papers•May 12 2025

The risk of postoperative pancreatic fistula (POPF), one of the most dreaded complications after pancreatic surgery, can be predicted from preoperative imaging and tabular clinical routine data. However, existing studies suffer from limited clinical applicability due to a need for manual data annotation and a lack of external validation. We propose AutoFRS (automated fistula risk score software), an externally validated end-to-end prediction tool for POPF risk stratification based on multimodal preoperative data. We trained AutoFRS on preoperative contrast-enhanced computed tomography imaging and clinical data from 108 patients undergoing pancreatic head resection and validated it on an external cohort of 61 patients. Prediction performance was assessed using the area under the receiver operating characteristic curve (AUC) and balanced accuracy. In addition, model performance was compared to the updated alternative fistula risk score (ua-FRS), the current clinical gold standard method for intraoperative POPF risk stratification. AutoFRS achieved an AUC of 0.81 and a balanced accuracy of 0.72 in internal validation and an AUC of 0.79 and a balanced accuracy of 0.70 in external validation. In a patient subset with documented intraoperative POPF risk factors, AutoFRS (AUC: 0.84 ± 0.05) performed on par with the ua-FRS (AUC: 0.85 ± 0.06). The AutoFRS web application facilitates annotation-free prediction of POPF from preoperative imaging and clinical data based on the AutoFRS prediction model. POPF can be predicted from multimodal clinical routine data without human data annotation, automating the risk prediction process. We provide additional evidence of the clinical feasibility of preoperative POPF risk stratification and introduce a software pipeline for future prospective evaluation.

CT Classification Abdominal Retrospective Clinical In Silico Academic Lab

Real-world Evaluation of Computer-aided Pulmonary Nodule Detection Software Sensitivity and False Positive Rate.

El Alam R, Jhala K, Hammer MM

•papers•May 12 2025

To evaluate the false positive rate (FPR) of nodule detection software in real-world use. A total of 250 nonenhanced chest computed tomography (CT) examinations were randomly selected from an academic institution and submitted to the ClearRead nodule detection system (Riverain Technologies). Detected findings were reviewed by a thoracic imaging fellow. Findings were classified as true nodules, lymph nodes, or other findings (branching opacity, vessel, mucus plug, etc.), and FPR was recorded. FPR was compared with the initial published FPR in the literature. True diagnosis was based on pathology or follow-up stability. For cases with malignant nodules, we recorded whether malignancy was detected by clinical radiology report (which was performed without software assistance) and/or ClearRead. Twenty-one CTs were excluded due to a lack of thin-slice images, and 229 CTs were included. A total of 594 findings were reported by ClearRead, of which 362 (61%) were true nodules and 232 (39%) were other findings. Of the true nodules, 297 were solid nodules, of which 79 (27%) were intrapulmonary lymph nodes. The mean number of findings identified by ClearRead per scan was 2.59. ClearRead mean FPR was 1.36, greater than the published rate of 0.58 (P<0.0001). If we consider true lung nodules <6 mm as false positive, FPR is 2.19. A malignant nodule was present in 30 scans; ClearRead identified it in 26 (87%), and the clinical report identified it in 28 (93%) (P=0.32). In real-world use, ClearRead had a much higher FPR than initially reported but a similar sensitivity for malignant nodule detection compared with unassisted radiologists.
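The reported rates can be reproduced from the counts in the abstract. The sketch below assumes that false positives are the non-nodule findings plus the intrapulmonary lymph nodes, and that FPR is expressed per scan; the abstract does not state this definition explicitly, but it yields the reported 1.36.

```python
# Counts taken from the abstract.
scans = 229
findings = 594
true_nodules = 362
other_findings = 232
solid_nodules = 297
lymph_nodes = 79
malignant_scans = 30

findings_per_scan = findings / scans  # mean findings per scan

# Assumed definition: non-nodule findings and lymph nodes count as false positives.
false_positives = other_findings + lymph_nodes
fpr_per_scan = false_positives / scans

# Sensitivity for scans containing a malignant nodule.
sensitivity_software = 26 / malignant_scans
sensitivity_report = 28 / malignant_scans

print(f"findings per scan: {findings_per_scan:.2f}")
print(f"false positives per scan: {fpr_per_scan:.2f}")
print(f"sensitivity, software vs report: {sensitivity_software:.0%} vs {sensitivity_report:.0%}")
```

Under this reading, 311 false positives over 229 scans gives about 1.36 per scan, and 594 findings over 229 scans gives about 2.59 per scan, matching the figures above.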

CT Detection Chest Retrospective Clinical Post Market Academic Lab

Paradigm-Shifting Attention-based Hybrid View Learning for Enhanced Mammography Breast Cancer Classification with Multi-Scale and Multi-View Fusion.

Zhao H, Zhang C, Wang F, Li Z, Gao S

•papers•May 12 2025

Breast cancer poses a serious threat to women's health, and its early detection is crucial for enhancing patient survival rates. While deep learning has significantly advanced mammographic image analysis, existing methods struggle to balance view consistency with input adaptability. Furthermore, current models face challenges in accurately capturing multi-scale features, especially when subtle lesion variations across different scales are involved. To address these challenges, this paper proposes a Hybrid View Learning (HVL) paradigm that unifies traditional Single-View and Multi-View Learning approaches. The core component of this paradigm, our Attention-based Hybrid View Learning (AHVL) framework, incorporates two essential attention mechanisms: Contrastive Switch Attention (CSA) and Selective Pooling Attention (SPA). The CSA mechanism flexibly alternates between self-attention and cross-attention based on data integrity, integrating a pre-trained language model for contrastive learning to enhance model stability. Meanwhile, the SPA module employs multi-scale feature pooling and selection to capture critical features from mammographic images, overcoming the limitations of traditional models that struggle with fine-grained lesion detection. Experimental validation on the INbreast and CBIS-DDSM datasets shows that the AHVL framework outperforms both single-view and multi-view methods, especially under extreme view-missing conditions. Even with an 80% missing rate on both datasets, AHVL maintains the highest accuracy and experiences the smallest performance decline in metrics like F1 score and AUC-PR, demonstrating its robustness and stability. This study redefines mammographic image analysis by leveraging attention-based hybrid view processing, setting a new standard for precise and efficient breast cancer diagnosis.

Mammography Classification Breast Retrospective Clinical In Silico Benchmark SOTA
