
Evaluating the reference accuracy of large language models in radiology: a comparative study across subspecialties.

Güneş YC, Cesur T, Çamur E

PubMed | May 12, 2025
This study aimed to compare six large language models (LLMs) [Chat Generative Pre-trained Transformer (ChatGPT) o1-preview, ChatGPT-4o, ChatGPT-4o with canvas, Google Gemini 1.5 Pro, Claude 3.5 Sonnet, and Claude 3 Opus] in generating radiology references, assessing accuracy, fabrication, and bibliographic completeness. In this cross-sectional observational study, 120 open-ended questions were administered across eight radiology subspecialties (neuroradiology, abdominal, musculoskeletal, thoracic, pediatric, cardiac, head and neck, and interventional radiology), with 15 questions per subspecialty. Each question prompted the LLMs to provide responses containing four references with in-text citations and complete bibliographic details (authors, title, journal, publication year/month, volume, issue, page numbers, and PubMed Identifier). References were verified using Medline, Google Scholar, the Directory of Open Access Journals, and web searches. Each bibliographic element was scored for correctness, and a composite final score [(FS): 0-36] was calculated by summing the correct elements and multiplying this sum by a 5-point verification score for content relevance. The FS values were then categorized into a 5-point Likert scale reference accuracy score (RAS: 0 = fabricated; 4 = fully accurate). Non-parametric tests (Kruskal-Wallis, Tamhane's T2, Wilcoxon signed-rank test with Bonferroni correction) were used for statistical comparisons. Claude 3.5 Sonnet demonstrated the highest reference accuracy, with 80.8% fully accurate references (RAS 4) and a fabrication rate of 3.1%, significantly outperforming all other models (P < 0.001). Claude 3 Opus ranked second, achieving 59.6% fully accurate references and a fabrication rate of 18.3% (P < 0.001). ChatGPT-based models (ChatGPT-4o, ChatGPT-4o with canvas, and ChatGPT o1-preview) exhibited moderate accuracy, with fabrication rates ranging from 27.7% to 52.9% and <8% fully accurate references. Google Gemini 1.5 Pro had the lowest performance, achieving only 2.7% fully accurate references and the highest fabrication rate of 60.6% (P < 0.001). Reference accuracy also varied by subspecialty, with neuroradiology and cardiac radiology outperforming pediatric and head and neck radiology. Claude 3.5 Sonnet significantly outperformed all other models in generating verifiable radiology references, and Claude 3 Opus showed moderate performance. In contrast, the ChatGPT models and Google Gemini 1.5 Pro delivered substantially lower accuracy with higher rates of fabricated references, highlighting current limitations in automated academic citation generation. The high accuracy of Claude 3.5 Sonnet can improve radiology literature reviews, research, and education with dependable references. The poor performance of other models, with high fabrication rates, risks misinformation in clinical and academic settings and highlights the need for refinement to ensure safe and effective use.
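
For readers who want to see the scoring arithmetic, here is a minimal sketch of the FS/RAS scheme as the abstract describes it. It assumes nine scored bibliographic elements and a 0-4 verification multiplier (so the maximum is 9 × 4 = 36); the element breakdown and the RAS bin edges are illustrative assumptions, not details confirmed by the paper.

```python
# Hypothetical reconstruction of the reference-scoring scheme described above.
# Assumes 9 bibliographic elements and a 0-4 content-relevance score (9 * 4 = 36).

ELEMENTS = ["authors", "title", "journal", "year", "month",
            "volume", "issue", "pages", "pmid"]

def final_score(correct: dict, verification: int) -> int:
    """Composite final score FS in [0, 36]."""
    assert 0 <= verification <= 4
    return sum(correct[e] for e in ELEMENTS) * verification

def reference_accuracy_score(fs: int) -> int:
    """Map FS onto the 5-point Likert RAS (0 = fabricated, 4 = fully accurate).
    The bin edges here are illustrative; the paper does not publish them."""
    for upper, ras in [(0, 0), (9, 1), (18, 2), (27, 3), (36, 4)]:
        if fs <= upper:
            return ras
    return 4

fs = final_score({e: True for e in ELEMENTS}, verification=4)
print(fs, reference_accuracy_score(fs))  # 36 4
```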

JSover: Joint Spectrum Estimation and Multi-Material Decomposition from Single-Energy CT Projections

Qing Wu, Hongjiang Wei, Jingyi Yu, S. Kevin Zhou, Yuyao Zhang

arXiv preprint | May 12, 2025
Multi-material decomposition (MMD) enables quantitative reconstruction of tissue compositions in the human body, supporting a wide range of clinical applications. However, traditional MMD typically requires spectral CT scanners and pre-measured X-ray energy spectra, significantly limiting clinical applicability. To this end, various methods have been developed to perform MMD using conventional (i.e., single-energy, SE) CT systems, commonly referred to as SEMMD. Despite promising progress, most SEMMD methods follow a two-step image decomposition pipeline, which first reconstructs monochromatic CT images using algorithms such as FBP, and then performs decomposition on these images. The initial reconstruction step, however, neglects the energy-dependent attenuation of human tissues, introducing severe nonlinear beam hardening artifacts and noise into the subsequent decomposition. This paper proposes JSover, a fundamentally reformulated one-step SEMMD framework that jointly reconstructs multi-material compositions and estimates the energy spectrum directly from SECT projections. By explicitly incorporating physics-informed spectral priors into the SEMMD process, JSover accurately simulates a virtual spectral CT system from SE acquisitions, thereby improving the reliability and accuracy of decomposition. Furthermore, we introduce implicit neural representation (INR) as an unsupervised deep learning solver for representing the underlying material maps. The inductive bias of INR toward continuous image patterns constrains the solution space and further enhances estimation quality. Extensive experiments on both simulated and real CT datasets show that JSover outperforms state-of-the-art SEMMD methods in accuracy and computational efficiency.
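
As a rough illustration of the INR idea the abstract leans on, the sketch below defines a coordinate MLP that maps a spatial location to per-material fractions. This is a generic INR in the spirit of the description, not the JSover architecture; the layer sizes, material count, and softmax head are all assumptions.

```python
# Minimal coordinate-MLP sketch of an implicit neural representation (INR)
# for material maps: (x, y) -> material fractions. Illustrative only.
import torch
import torch.nn as nn

class MaterialINR(nn.Module):
    def __init__(self, n_materials: int = 3, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_materials),
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        # coords: (N, 2) in [-1, 1]; returns (N, n_materials) fractions
        return torch.softmax(self.net(coords), dim=-1)

# Query the continuous representation on a 128x128 grid.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 128),
                        torch.linspace(-1, 1, 128), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
maps = MaterialINR()(coords).reshape(128, 128, -1)
print(maps.shape)  # torch.Size([128, 128, 3])
```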

Accelerating prostate rs-EPI DWI with deep learning: Halving scan time, enhancing image quality, and validating in vivo.

Zhang P, Feng Z, Chen S, Zhu J, Fan C, Xia L, Min X

PubMed | May 12, 2025
This study aims to evaluate the feasibility and effectiveness of deep learning-based super-resolution techniques to reduce scan time while preserving image quality in high-resolution prostate diffusion-weighted imaging (DWI) with readout-segmented echo-planar imaging (rs-EPI). We retrospectively and prospectively analyzed prostate rs-EPI DWI data, employing deep learning super-resolution models, particularly the Multi-Scale Self-Similarity Network (MSSNet), to reconstruct low-resolution images into high-resolution images. Performance metrics such as the structural similarity index (SSIM), peak signal-to-noise ratio (PSNR), and normalized root mean squared error (NRMSE) were used to compare reconstructed images against the high-resolution ground truth (HRGT). Additionally, we evaluated the apparent diffusion coefficient (ADC) values and signal-to-noise ratio (SNR) across different models. The MSSNet model demonstrated superior performance in image reconstruction, achieving maximum SSIM values of 0.9798 and significant improvements in PSNR and NRMSE compared with other models. The deep learning approach reduced the rs-EPI DWI scan time by 54.4% while maintaining image quality comparable to HRGT. Pearson correlation analysis revealed a strong correlation between ADC values from deep learning-reconstructed images and the ground truth, with differences remaining within 5%. Furthermore, all models showed significant SNR enhancement, with MSSNet performing best in most cases. Deep learning-based super-resolution techniques, particularly MSSNet, effectively reduce scan time and enhance image quality in prostate rs-EPI DWI, making them promising tools for clinical applications.
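
The three image-quality metrics quoted here are standard and straightforward to reproduce; below is a minimal sketch using real scikit-image APIs on placeholder arrays.

```python
# Computing SSIM, PSNR, and NRMSE between a reconstruction and ground truth.
import numpy as np
from skimage.metrics import (structural_similarity,
                             peak_signal_noise_ratio,
                             normalized_root_mse)

rng = np.random.default_rng(0)
hr_gt = rng.random((256, 256))                        # stand-in for the HRGT image
recon = hr_gt + 0.01 * rng.normal(size=hr_gt.shape)  # stand-in reconstruction

ssim = structural_similarity(hr_gt, recon, data_range=1.0)
psnr = peak_signal_noise_ratio(hr_gt, recon, data_range=1.0)
nrmse = normalized_root_mse(hr_gt, recon)
print(f"SSIM={ssim:.4f}  PSNR={psnr:.2f} dB  NRMSE={nrmse:.4f}")
```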

Prognostic Value of Deep Learning-Based RCA PCAT and Plaque Volume Beyond CT-FFR in Patients With Stent Implantation.

Huang Z, Tang R, Du X, Ding Y, Yang Z, Cao B, Li M, Wang X, Wang W, Li Z, Xiao J, Wang X

PubMed | May 12, 2025
This study investigates the prognostic value of deep learning-based pericoronary adipose tissue attenuation computed tomography (PCAT) and plaque volume beyond coronary computed tomography angiography (CTA)-derived fractional flow reserve (CT-FFR) in patients with percutaneous coronary intervention (PCI). A total of 183 patients with PCI who underwent coronary CTA were included in this retrospective study. Imaging assessment included PCAT, plaque volume, and CT-FFR, which were performed using an artificial intelligence (AI)-assisted workstation. Kaplan-Meier survival curve analysis and multivariable Cox regression were used to estimate major adverse cardiovascular events (MACE), including non-fatal myocardial infarction (MI), stroke, and mortality. In total, 22 (12%) MACE occurred during a median follow-up of 38.0 months (34.6-54.6 months). Kaplan-Meier analysis revealed that right coronary artery (RCA) PCAT (p = 0.007) and plaque volume (p = 0.008) were significantly associated with increased MACE. Multivariable Cox regression indicated that RCA PCAT (hazard ratio (HR): 2.94, 95% CI: 1.15-7.50, p = 0.025) and plaque volume (HR: 3.91, 95% CI: 1.20-12.75, p = 0.024) were independent predictors of MACE after adjustment for clinical risk factors. However, CT-FFR was not independently associated with MACE in multivariable Cox regression (p = 0.271). Deep learning-based RCA PCAT and plaque volume derived from coronary CTA were more strongly associated with MACE than CT-FFR in patients with PCI.
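
The survival analysis reported here follows a standard Kaplan-Meier plus multivariable Cox pattern; the sketch below shows that pattern with the lifelines library on toy data. The column names, the dichotomized PCAT variable, and the adjustment covariate are placeholders, not the study's actual variables.

```python
# Kaplan-Meier curves and a multivariable Cox model, as in the analysis above.
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

df = pd.DataFrame({                       # toy stand-in for the cohort
    "months": [12, 38, 40, 7, 55, 30],
    "mace":   [1, 0, 0, 1, 0, 1],
    "rca_pcat_high": [1, 0, 0, 1, 0, 1],  # dichotomized RCA PCAT (assumed)
    "plaque_volume": [410, 120, 150, 520, 90, 480],
})

KaplanMeierFitter().fit(df["months"], df["mace"]).plot_survival_function()

cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="mace")
cph.print_summary()  # hazard ratios with 95% CIs
```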

Automated field-in-field planning for tangential breast radiation therapy based on digitally reconstructed radiograph.

Srikornkan P, Khamfongkhruea C, Intanin P, Thongsawad S

PubMed | May 12, 2025
The tangential field-in-field (FIF) technique is a widely used method in breast radiation therapy, known for its efficiency and the reduced number of fields required in treatment planning. However, it is labor-intensive, requiring manual shaping of the multileaf collimator (MLC) to minimize hot spots. This study aims to develop a novel automated FIF planning approach for tangential breast radiation therapy using digitally reconstructed radiograph (DRR) images. A total of 78 patients were selected to train and test a fluence map prediction model based on the U-Net architecture. DRR images were used as input data to predict the fluence maps. The predicted fluence maps for each treatment plan were then converted into MLC positions and exported as Digital Imaging and Communications in Medicine (DICOM) files. These files were used to recalculate the dose distribution and assess dosimetric parameters for both the planning target volume (PTV) and organs at risk (OARs). The mean absolute error (MAE) between the predicted and original fluence maps was 0.007 ± 0.002. Gamma analysis indicated strong agreement between the predicted and original fluence maps, with passing rates of 95.47 ± 4.27% for the 3%/3 mm criterion, 94.65 ± 4.32% for the 3%/2 mm criterion, and 83.4 ± 12.14% for the 2%/2 mm criterion. Plan quality, in terms of tumor coverage and doses to OARs, showed no significant differences between the automated FIF and original plans. The automated plans yielded promising results, with plan quality comparable to the original plans.
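
Fluence-map agreement is assessed here with MAE and gamma analysis. The sketch below computes the MAE and a simplified dose-difference pass rate; a full gamma analysis additionally runs a distance-to-agreement search (dedicated tools such as pymedphys provide one), which this toy version deliberately omits.

```python
# MAE and a simplified dose-difference pass rate between fluence maps.
# A true gamma analysis adds a distance-to-agreement (DTA) search; this
# sketch checks the dose criterion only, for illustration.
import numpy as np

def mae(pred: np.ndarray, ref: np.ndarray) -> float:
    return float(np.mean(np.abs(pred - ref)))

def dose_diff_pass_rate(pred, ref, percent: float = 3.0) -> float:
    """Fraction of pixels whose dose difference is within `percent`
    of the reference maximum (global normalization)."""
    tol = percent / 100.0 * ref.max()
    return float(np.mean(np.abs(pred - ref) <= tol))

rng = np.random.default_rng(1)
ref = rng.random((64, 64))                        # stand-in original fluence
pred = ref + 0.005 * rng.normal(size=ref.shape)   # stand-in predicted fluence
print(mae(pred, ref), dose_diff_pass_rate(pred, ref, percent=3.0))
```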

Paradigm-Shifting Attention-based Hybrid View Learning for Enhanced Mammography Breast Cancer Classification with Multi-Scale and Multi-View Fusion.

Zhao H, Zhang C, Wang F, Li Z, Gao S

PubMed | May 12, 2025
Breast cancer poses a serious threat to women's health, and its early detection is crucial for improving patient survival rates. While deep learning has significantly advanced mammographic image analysis, existing methods struggle to balance view consistency with input adaptability. Furthermore, current models face challenges in accurately capturing multi-scale features, especially when subtle lesion variations across different scales are involved. To address these challenges, this paper proposes a Hybrid View Learning (HVL) paradigm that unifies traditional single-view and multi-view learning approaches. The core component of this paradigm, our Attention-based Hybrid View Learning (AHVL) framework, incorporates two essential attention mechanisms: Contrastive Switch Attention (CSA) and Selective Pooling Attention (SPA). The CSA mechanism flexibly alternates between self-attention and cross-attention based on data integrity, integrating a pre-trained language model for contrastive learning to enhance model stability. Meanwhile, the SPA module employs multi-scale feature pooling and selection to capture critical features from mammographic images, overcoming the limitations of traditional models that struggle with fine-grained lesion detection. Experimental validation on the INbreast and CBIS-DDSM datasets shows that the AHVL framework outperforms both single-view and multi-view methods, especially under extreme view-missing conditions. Even with an 80% missing rate on both datasets, AHVL maintains the highest accuracy and shows the smallest performance decline in metrics such as F1 score and AUC-PR, demonstrating its robustness and stability. This study redefines mammographic image analysis by leveraging attention-based hybrid view processing, setting a new standard for precise and efficient breast cancer diagnosis.
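
The switching behavior attributed to CSA, self-attention when a companion view is missing and cross-attention when both views are present, can be sketched generically. The layer below illustrates that pattern only; it is not the authors' implementation, and the token shapes are assumptions.

```python
# Generic switch between self-attention (view missing) and cross-attention
# (both views present), in the spirit of the CSA mechanism described above.
from typing import Optional

import torch
import torch.nn as nn

class SwitchAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, other: Optional[torch.Tensor] = None):
        # x: (B, N, dim) tokens from one view; `other` is the second view.
        kv = x if other is None else other   # fall back to self-attention
        out, _ = self.attn(query=x, key=kv, value=kv)
        return out

layer = SwitchAttention()
cc = torch.randn(2, 196, 256)   # e.g. craniocaudal-view tokens
mlo = torch.randn(2, 196, 256)  # e.g. mediolateral-oblique-view tokens
print(layer(cc, mlo).shape)     # cross-attention: torch.Size([2, 196, 256])
print(layer(cc).shape)          # self-attention when the other view is missing
```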

Use of Artificial Intelligence in Recognition of Fetal Open Neural Tube Defect on Prenatal Ultrasound.

Kumar M, Arora U, Sengupta D, Nain S, Meena D, Yadav R, Perez M

PubMed | May 12, 2025
To compare the axial cranial ultrasound images of normal fetuses and fetuses with open neural tube defect (NTD) using a deep learning (DL) model and to assess its predictive accuracy in identifying open NTD. This was a prospective case-control study. Axial trans-thalamic fetal ultrasound images of participants with open fetal NTD and normal controls between 14 and 28 weeks of gestation were taken after consent. The images were randomly divided into training, testing, and validation datasets in a 70:15:15 ratio. The images were further processed and classified using DL convolutional neural network (CNN) transfer learning (TL) models. The TL models were trained for 50 epochs. The data were analyzed in terms of Cohen kappa score, accuracy score, area under the receiver operating characteristic curve (AUROC), F1 score, validity, sensitivity, and specificity of the test. A total of 59 cases and 116 controls were fully followed. EfficientNet-B0, Visual Geometry Group (VGG), and Inception V3 TL models were used. Both the EfficientNet-B0 and VGG16 models gave similarly high training and validation accuracy (100% and 95.83%, respectively). Using Inception V3, the training and validation accuracy was 98.28% and 95.83%, respectively. The sensitivity and specificity of EfficientNet-B0 were 100% and 89%, respectively, the best among the three models. The analysis of changes in axial images of the fetal cranium using the DL model EfficientNet-B0 proved it to be an effective model for clinical application in the identification of open NTD.
· Open spina bifida is often missed due to nonrecognition of the lemon sign on ultrasound.
· Image classification using DL identified open spina bifida with excellent accuracy.
· The research is clinically relevant in low- and middle-income countries.
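
The training setup described, pretrained CNN backbones with a new binary head, is a common transfer-learning recipe; here is a minimal torchvision sketch of it. Freezing the backbone and the two-class head are assumptions about details the abstract does not state.

```python
# Transfer learning with a pretrained EfficientNet-B0 for a binary task
# (normal vs. open NTD), as a generic sketch of the setup described above.
import torch.nn as nn
from torchvision import models

weights = models.EfficientNet_B0_Weights.IMAGENET1K_V1
model = models.efficientnet_b0(weights=weights)

for p in model.parameters():          # freeze the pretrained backbone (assumed)
    p.requires_grad = False

# Replace the classifier head with a fresh 2-class output layer.
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, 2)
# Train only the new head for ~50 epochs, as the study reports.
```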

MRI-Based Diagnostic Model for Alzheimer's Disease Using 3D-ResNet.

Chen D, Yang H, Li H, He X, Mu H

PubMed | May 12, 2025
Alzheimer's disease (AD), a progressive neurodegenerative disorder, is the leading cause of dementia worldwide and remains incurable once it begins. Therefore, early and accurate diagnosis is essential for effective intervention. Leveraging recent advances in deep learning, this study proposes a novel diagnostic model based on the 3D-ResNet architecture to classify three cognitive states: AD, mild cognitive impairment (MCI), and cognitively normal (CN) individuals, using MRI data. The model integrates the strengths of ResNet and 3D convolutional neural networks (3D-CNN), and incorporates a special attention mechanism (SAM) within the residual structure to enhance feature representation. The study utilized the ADNI dataset, comprising 800 brain MRI scans. The dataset was split in a 7:3 ratio for training and testing, and the network was trained using data augmentation and cross-validation strategies. The proposed model achieved 92.33% accuracy in the three-class classification task, and 97.61%, 95.83%, and 93.42% accuracy in binary classifications of AD vs. CN, AD vs. MCI, and CN vs. MCI, respectively, outperforming existing state-of-the-art methods. Furthermore, Grad-CAM heatmaps and 3D MRI reconstructions revealed that the cerebral cortex and hippocampus are critical regions for AD classification. These findings demonstrate a robust and interpretable AI-based diagnostic framework for AD, providing valuable technical support for its timely detection and clinical intervention.
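
The building block described, a 3D residual unit with an attention mechanism inside, can be pictured with the generic sketch below. Since the paper's exact SAM design is not given here, this uses a simple one-channel spatial gate as a stand-in.

```python
# A 3D residual block with a simple spatial attention gate, illustrating
# the "attention inside the residual structure" idea described above.
import torch
import torch.nn as nn

class AttnResBlock3D(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.conv1 = nn.Conv3d(ch, ch, 3, padding=1)
        self.bn1 = nn.BatchNorm3d(ch)
        self.conv2 = nn.Conv3d(ch, ch, 3, padding=1)
        self.bn2 = nn.BatchNorm3d(ch)
        self.attn = nn.Conv3d(ch, 1, kernel_size=1)  # 1-channel spatial gate
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        gate = torch.sigmoid(self.attn(out))  # (B, 1, D, H, W) in [0, 1]
        return self.relu(x + gate * out)      # gated residual connection

x = torch.randn(1, 32, 16, 32, 32)  # (batch, channels, depth, H, W)
print(AttnResBlock3D(32)(x).shape)  # torch.Size([1, 32, 16, 32, 32])
```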

Enhancing noninvasive pancreatic cystic neoplasm diagnosis with multimodal machine learning.

Huang W, Xu Y, Li Z, Li J, Chen Q, Huang Q, Wu Y, Chen H

PubMed | May 12, 2025
Pancreatic cystic neoplasms (PCNs) are a complex group of lesions with a spectrum of malignancy. Accurate differentiation of PCN types is crucial for patient management, as misdiagnosis can result in unnecessary surgeries or treatment delays, affecting quality of life. The need to improve patient outcomes and reduce the impact of these conditions underscores the significance of developing a non-invasive, accurate diagnostic model. We developed a machine learning model capable of accurately identifying different types of PCNs in a non-invasive manner, using a dataset comprising 449 MRI and 568 CT scans from adult patients, spanning 2009 to 2022. The results indicate that our multimodal machine learning algorithm, which integrates both clinical and imaging data, significantly outperforms single-source algorithms. Specifically, it demonstrated state-of-the-art performance in classifying PCN types, achieving an average accuracy of 91.2%, precision of 91.7%, sensitivity of 88.9%, and specificity of 96.5%. Remarkably, for patients with mucinous cystic neoplasms (MCNs), the model achieved a 100% prediction accuracy rate regardless of whether they underwent MRI or CT imaging. This indicates that our non-invasive multimodal machine learning model offers strong support for the early screening of MCNs and represents a significant advancement in PCN diagnosis, improving clinical practice and patient outcomes. We also achieved the best results on an additional pancreatic cancer dataset, further demonstrating the generalizability of our model.
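
One simple way to picture the multimodal fusion described, imaging-derived features concatenated with clinical variables ahead of a classifier, is sketched below with scikit-learn on placeholder features; the real pipeline's feature extractors, model family, and class count are not specified in the abstract.

```python
# Late fusion of imaging features and clinical variables for PCN typing.
# Placeholder features; illustrates the multimodal idea, not the actual model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
imaging_feats = rng.normal(size=(n, 64))   # e.g. embeddings of MRI/CT scans
clinical_feats = rng.normal(size=(n, 8))   # e.g. age, sex, cyst size (assumed)
X = np.concatenate([imaging_feats, clinical_feats], axis=1)  # fused input
y = rng.integers(0, 4, size=n)             # four PCN types (assumed)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```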

Automatic CTA analysis for blood vessels and aneurysm features extraction in EVAR planning.

Robbi E, Ravanelli D, Allievi S, Raunig I, Bonvini S, Passerini A, Trianni A

PubMed | May 12, 2025
Endovascular Aneurysm Repair (EVAR) is a minimally invasive procedure crucial for treating abdominal aortic aneurysms (AAA), where precise pre-operative planning is essential. Current clinical methods rely on manual measurements, which are time-consuming and prone to errors. Although AI solutions are increasingly being developed to automate aspects of these processes, most existing approaches primarily focus on computing volumes and diameters, falling short of delivering a fully automated pre-operative analysis. This work presents BRAVE (Blood Vessels Recognition and Aneurysms Visualization Enhancement), the first comprehensive AI-driven solution for vascular segmentation and AAA analysis using pre-operative CTA scans. BRAVE offers exhaustive segmentation, identifying both the primary abdominal aorta and secondary vessels, often overlooked by existing methods, providing a complete view of the vascular structure. The pipeline performs advanced volumetric analysis of the aneurysm sac, quantifying thrombotic tissue and calcifications, and automatically identifies the proximal and distal sealing zones, critical for successful EVAR procedures. BRAVE enables fully automated processing, reducing manual intervention and improving clinical workflow efficiency. Trained on a multi-center open-access dataset, it demonstrates generalizability across different CTA protocols and patient populations, ensuring robustness in diverse clinical settings. This solution saves time, ensures precision, and standardizes the process, enhancing vascular surgeons' decision-making.
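
The volumetric analysis in such a pipeline ultimately reduces to counting labeled voxels and scaling by voxel size; a minimal sketch is below. The HU cut-offs used to separate thrombus from calcification are illustrative assumptions, not BRAVE's actual thresholds.

```python
# Aneurysm-sac volumetrics from a segmentation mask over a CTA volume.
# HU cut-offs below are illustrative; they are not the BRAVE thresholds.
import numpy as np

rng = np.random.default_rng(2)
cta_hu = rng.normal(100, 200, size=(64, 64, 64))   # stand-in CTA volume, in HU
sac_mask = rng.random((64, 64, 64)) > 0.97         # stand-in sac segmentation
voxel_volume_ml = (0.8 * 0.8 * 1.0) / 1000.0       # 0.8x0.8x1.0 mm voxel -> mL

sac_ml = sac_mask.sum() * voxel_volume_ml
thrombus_ml = ((cta_hu < 100) & sac_mask).sum() * voxel_volume_ml
calcium_ml = ((cta_hu > 300) & sac_mask).sum() * voxel_volume_ml
print(f"sac={sac_ml:.1f} mL  thrombus={thrombus_ml:.1f} mL  "
      f"calcification={calcium_ml:.1f} mL")
```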