Benchmarking Radiology Report Generation From Noisy Free-Texts.

Yuan Y, Zheng Y, Qu L

PubMed · May 12, 2025
Automatic radiology report generation can enhance diagnostic efficiency and accuracy. However, clean open-source imaging scan-report pairs are limited in scale and variety, and the vast amount of radiological text available online is often too noisy to use directly. To address this challenge, we introduce a novel task, Noisy Report Refinement (NRR), which generates radiology reports from noisy free-texts. To achieve this, we propose a report refinement pipeline that leverages large language models (LLMs) enhanced with guided self-critique and report selection strategies. Because existing radiology report generation metrics cannot measure the cleanliness, radiological usefulness, and factual correctness of reports across the various modalities in the NRR task, we introduce a new benchmark, NRRBench, for NRR evaluation. This benchmark includes two online-sourced datasets and four clinically explainable LLM-based metrics: two metrics evaluate the matching rate of radiology entities and of modality-specific template attributes, respectively; one metric assesses report cleanliness; and a combined metric evaluates overall NRR performance. Experiments demonstrate that guided self-critique and report selection strategies significantly improve the quality of refined reports. Additionally, our proposed metrics correlate much more strongly with the noise rate and error count of reports than existing radiology report generation metrics do when evaluating NRR.
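
As a rough illustration of the guided self-critique and report selection strategies described above, the sketch below loops an LLM over draft-critique-revise rounds and then selects among candidate reports. The `llm` callable, the prompts, and the selection rule are hypothetical stand-ins; the paper's actual prompting and selection criteria are not specified here.

```python
# Hypothetical sketch of a self-critique refinement loop with report selection.
# `llm` is any callable that maps a prompt string to a completion string.

def refine_report(llm, noisy_text: str, n_candidates: int = 3, n_rounds: int = 2) -> str:
    """Refine noisy free-text into a radiology report via draft-critique-revise."""
    candidates = []
    for _ in range(n_candidates):
        draft = llm(f"Rewrite the following noisy text as a clean radiology report:\n{noisy_text}")
        for _ in range(n_rounds):
            critique = llm(f"Critique this radiology report for cleanliness and factual errors:\n{draft}")
            draft = llm(f"Revise the report to address the critique.\nReport:\n{draft}\nCritique:\n{critique}")
        candidates.append(draft)
    # Report selection: ask the LLM to pick the best candidate (one possible strategy).
    joined = "\n---\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    choice = llm(f"Reply with only the index of the cleanest, most factual report:\n{joined}")
    return candidates[int(choice.strip())]  # real code would validate this parse
```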

Automatic Quantification of Ki-67 Labeling Index in Pediatric Brain Tumors Using QuPath

Spyretos, C., Pardo Ladino, J. M., Blomstrand, H., Nyman, P., Snodahl, O., Shamikh, A., Elander, N. O., Haj-Hosseini, N.

medRxiv preprint · May 12, 2025
The quantification of the Ki-67 labeling index (LI) is critical for assessing tumor proliferation and prognosis, yet manual scoring remains common practice. This study presents an automated workflow for Ki-67 scoring in whole slide images (WSIs) using an Apache Groovy script for QuPath, complemented by a Python-based post-processing script that produces cell density maps and summary tables. Tissue and cell segmentation are performed using StarDist, a deep learning model, with adaptive thresholding to classify Ki-67-positive and Ki-67-negative nuclei. The pipeline was applied to a cohort of 632 pediatric brain tumor cases with 734 Ki-67-stained WSIs from the Children's Brain Tumor Network. Medulloblastoma showed the highest Ki-67 LI (median: 19.84), followed by atypical teratoid rhabdoid tumor (median: 19.36). Moderate values were observed in brainstem glioma-diffuse intrinsic pontine glioma (median: 11.50), high-grade glioma (grades 3 & 4) (median: 9.50), and ependymoma (median: 5.88). Lower indices were found in meningioma (median: 1.84), while the lowest were seen in low-grade glioma (grades 1 & 2) (median: 0.85), dysembryoplastic neuroepithelial tumor (median: 0.63), and ganglioglioma (median: 0.50). The results aligned with the oncology consensus, showing a significant correlation in Ki-67 LI across most tumor families/types: high-malignancy tumors had the highest proliferation indices and lower-malignancy tumors the lowest. The automated approach facilitates the assessment of large volumes of Ki-67-stained WSIs in research settings.
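
The labeling index itself is a simple ratio of positive to total tumor nuclei. Below is a minimal sketch of the kind of Python post-processing step described above, assuming a QuPath per-cell detection export; the `Image` and `Class` column names and class labels are hypothetical, not the study's actual schema.

```python
import pandas as pd

def ki67_li(detections: pd.DataFrame) -> pd.Series:
    """Ki-67 LI (%) = positive nuclei / all tumor nuclei * 100, per slide."""
    counts = detections.groupby(["Image", "Class"]).size().unstack(fill_value=0)
    positive = counts["Ki67-positive"]            # assumed class name
    negative = counts["Ki67-negative"]            # assumed class name
    return 100 * positive / (positive + negative)

# Usage: summarize per-slide indices from a hypothetical QuPath CSV export.
summary = ki67_li(pd.read_csv("detections.csv"))
print(summary.median())  # cohort median, as reported per tumor type above
```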

Automated scout-image-based estimation of contrast agent dosing: a deep learning approach

Schirrmeister, R., Taleb, L., Friemel, P., Reisert, M., Bamberg, F., Weiss, J., Rau, A.

medRxiv preprint · May 12, 2025
We developed and tested a deep-learning-based algorithm for the approximation of contrast agent dosage based on computed tomography (CT) scout images. We prospectively enrolled 817 patients undergoing clinically indicated CT imaging, predominantly of the thorax and/or abdomen. Patient weight was collected by study staff prior to the examination 1) with a weight scale and 2) as self-reported. Based on the scout images, we developed an EfficientNet convolutional neural network pipeline to estimate the optimal contrast agent dose from patient weight, and we provide a browser-based user interface as a versatile open-source tool to account for different contrast agent compounds. We additionally analyzed the body-weight-informative CT features by synthesizing representative examples for different weights using in-context learning and dataset distillation. The cohort consisted of 533 thoracic, 70 abdominal, and 229 thoracic-abdominal CT scout scans. Self-reported patient weight was statistically significantly lower than manual measurements (75.13 kg vs. 77.06 kg; p < 10^-5, Wilcoxon signed-rank test). Our pipeline predicted patient weight with a mean absolute error of 3.90 ± 0.20 kg (corresponding to a roughly 4.48-11.70 ml difference in contrast agent, depending on the agent) in 5-fold cross-validation and is publicly available at https://tinyurl.com/ct-scout-weight. Interpretability analysis revealed that both larger anatomical shape and higher overall attenuation were predictive of body weight. Our open-source deep learning pipeline allows for the automatic estimation of accurate contrast agent dosing based on scout images in routine CT imaging studies. This approach has the potential to streamline contrast agent dosing workflows, improve efficiency, and enhance patient safety by providing quick and accurate weight estimates without additional measurements or reliance on potentially outdated records. The model's performance may vary depending on patient positioning and scout image quality, and the approach requires validation on larger patient cohorts and at other clinical centers.

Author Summary: Automation of medical workflows using AI has the potential to increase reproducibility while saving costs and time. Here, we investigated automating the estimation of the required contrast agent dosage for CT examinations. We trained a deep neural network to predict body weight from the initial 2D CT scout images that are acquired before the actual CT examination. The predicted weight is then converted to a contrast agent dosage using contrast-agent-specific conversion factors. To facilitate application in clinical routine, we developed a user-friendly browser-based interface that allows clinicians to select a contrast agent or input a custom conversion factor to receive dosage suggestions, with local data processing in the browser. We also investigated which image characteristics predict body weight and found plausible relationships, such as higher attenuation and larger anatomical shapes correlating with higher body weights. Our work goes beyond prior work by implementing a single model for a variety of anatomical regions, providing an accessible user interface, and investigating the predictive characteristics of the images.
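
The final conversion step is plain arithmetic: predicted weight times an agent-specific factor. A minimal sketch follows; the factors shown are placeholders rather than clinically validated values, though the quoted 4.48-11.70 ml dose error for a 3.90 kg weight error implies factors of roughly 1.15-3.0 ml/kg.

```python
# Sketch of the weight-to-dose conversion the browser tool performs.
# Factors are hypothetical placeholders, not validated dosing values.

def contrast_dose_ml(predicted_weight_kg: float, factor_ml_per_kg: float) -> float:
    """Contrast volume = predicted body weight x agent-specific factor (ml/kg)."""
    return predicted_weight_kg * factor_ml_per_kg

# The 3.90 kg mean absolute weight error scales with the agent factor;
# factors of ~1.15 and ~3.0 ml/kg reproduce the 4.48-11.70 ml spread above.
for factor in (1.15, 3.0):
    print(f"factor {factor} ml/kg -> dose error ~{3.90 * factor:.2f} ml")
```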

LiteMIL: A Computationally Efficient Transformer-Based MIL for Cancer Subtyping on Whole Slide Images.

Kussaibi, H.

medRxiv preprint · May 12, 2025
Purpose: Accurate cancer subtyping is crucial for effective treatment; however, it presents challenges due to overlapping morphology and variability among pathologists. Although deep learning (DL) methods have shown potential, their application to gigapixel whole slide images (WSIs) is often hindered by high computational demands and the need for efficient, context-aware feature aggregation. This study introduces LiteMIL, a computationally efficient transformer-based multiple instance learning (MIL) network combined with Phikon, a pathology-tuned self-supervised feature extractor, for robust and scalable cancer subtyping on WSIs.

Methods: Initially, patches were extracted from the TCGA-THYM dataset (242 WSIs, six subtypes) and fed in real time to Phikon for feature extraction. To train the MIL models, features were arranged into uniform bags using a chunking strategy that maintains tissue context while increasing training data. LiteMIL utilizes a learnable query vector within an optimized multi-head attention module for effective feature aggregation. The model's performance was evaluated against established MIL methods on the thymic dataset and three additional TCGA datasets (breast, lung, and kidney cancer).

Results: LiteMIL achieved a 0.89 ± 0.01 F1 score and 0.99 AUC on the thymic dataset, outperforming other MIL methods. LiteMIL demonstrated strong generalizability across the external datasets, scoring best on the breast and kidney cancer datasets. Compared to TransMIL, LiteMIL significantly reduces training time and GPU memory usage. Ablation studies confirmed the critical role of the learnable query and layer normalization in enhancing performance and stability.

Conclusion: LiteMIL offers a resource-efficient, robust solution. Its streamlined architecture, combined with the compact Phikon features, makes it suitable for integration into routine histopathological workflows, particularly in resource-limited settings.
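
A minimal PyTorch sketch of the aggregation idea, a single learnable query attending over a bag of patch features via multi-head attention, is shown below. The dimensions, layer placement, and class count are assumptions rather than LiteMIL's exact design; 768 is used as the feature size on the assumption of ViT-B-style Phikon embeddings.

```python
import torch
import torch.nn as nn

class QueryAttentionMIL(nn.Module):
    """Sketch: one learnable query pools a bag of patch features."""

    def __init__(self, dim: int = 768, heads: int = 8, n_classes: int = 6):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))   # learnable query vector
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)  # the ablations above highlight its role
        self.head = nn.Linear(dim, n_classes)

    def forward(self, bag: torch.Tensor) -> torch.Tensor:
        # bag: (batch, n_patches, dim) patch features, e.g. from Phikon
        q = self.query.expand(bag.size(0), -1, -1)
        pooled, _ = self.attn(q, bag, bag)  # query attends over all patches
        return self.head(self.norm(pooled.squeeze(1)))

logits = QueryAttentionMIL()(torch.randn(2, 512, 768))  # -> shape (2, 6)
```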

A comparison of performance of DeepSeek-R1 model-generated responses to musculoskeletal radiology queries against ChatGPT-4 and ChatGPT-4o - A feasibility study.

Uldin H, Saran S, Gandikota G, Iyengar KP, Vaishya R, Parmar Y, Rasul F, Botchu R

PubMed · May 12, 2025
Artificial intelligence (AI) has transformed society, and chatbots built on large language models (LLMs) are playing an increasing role in scientific research. This study aims to assess and compare the efficacy of the newer DeepSeek-R1 model against ChatGPT-4 and ChatGPT-4o in answering scientific questions about recent research. We compared output generated by ChatGPT-4, ChatGPT-4o, and DeepSeek-R1 in response to ten standardized questions in the setting of musculoskeletal (MSK) radiology. The responses were independently analyzed by one MSK radiologist and one final-year MSK radiology trainee and graded on a Likert scale from 1 to 5 (1 = inaccurate, 5 = accurate). Five DeepSeek-R1 answers were significantly inaccurate and supplied references, all fictitious, only when prompted. All ChatGPT-4 and ChatGPT-4o answers were well written with good content, the latter including useful and comprehensive references. ChatGPT-4o generated structured answers to questions on recent MSK radiology research with useful references in all our cases, enabling reliable usage. DeepSeek-R1, on the other hand, generates answers that may appear authentic to the unsuspecting eye but contain a greater amount of falsified and inaccurate information in its current version. Further iterations may improve these accuracies.

AI-based volumetric six-tissue body composition quantification from CT cardiac attenuation scans for mortality prediction: a multicentre study.

Yi J, Marcinkiewicz AM, Shanbhag A, Miller RJH, Geers J, Zhang W, Killekar A, Manral N, Lemley M, Buchwald M, Kwiecinski J, Zhou J, Kavanagh PB, Liang JX, Builoff V, Ruddy TD, Einstein AJ, Feher A, Miller EJ, Sinusas AJ, Berman DS, Dey D, Slomka PJ

PubMed · May 12, 2025
CT attenuation correction (CTAC) scans are routinely obtained during cardiac perfusion imaging, but are currently only used for attenuation correction and visual calcium estimation. We aimed to develop a novel artificial intelligence (AI)-based approach to obtain volumetric measurements of chest body composition from CTAC scans and to evaluate these measures for all-cause mortality risk stratification. We applied AI-based segmentation and image-processing techniques to CTAC scans from a large international image-based registry at four sites (Yale University, University of Calgary, Columbia University, and University of Ottawa) to define the chest rib cage and multiple tissues. Volumetric measures of bone, skeletal muscle, subcutaneous adipose tissue, intramuscular adipose tissue (IMAT), visceral adipose tissue (VAT), and epicardial adipose tissue (EAT) were quantified between the automatically identified T5 and T11 vertebrae. The independent prognostic value of volumetric attenuation and indexed volumes was evaluated for predicting all-cause mortality, adjusting for established risk factors and 18 other body composition measures via Cox regression models and Kaplan-Meier curves. The end-to-end processing time was less than 2 min per scan with no user interaction. Between 2009 and 2021, we included 11 305 participants from four sites participating in the REFINE SPECT registry, who underwent single-photon emission computed tomography cardiac scans. After excluding patients who had incomplete T5-T11 scan coverage, missing clinical data, or who had been used for EAT model training, the final study group comprised 9918 patients; 5451 (55%) were male and 4467 (45%) were female. Median follow-up time was 2·48 years (IQR 1·46-3·65), during which 610 (6%) patients died. High VAT, EAT, and IMAT attenuation were associated with an increased all-cause mortality risk (adjusted hazard ratio 2·39, 95% CI 1·92-2·96; p<0·0001, 1·55, 1·26-1·90; p<0·0001, and 1·30, 1·06-1·60; p=0·012, respectively). Patients with high bone attenuation were at reduced risk of death (0·77, 0·62-0·95; p=0·016). Likewise, high skeletal muscle volume index was associated with a reduced risk of death (0·56, 0·44-0·71; p<0·0001). CTAC scans obtained routinely during cardiac perfusion imaging contain important volumetric body composition biomarkers that can be automatically measured and offer important additional prognostic value. Funding: the National Heart, Lung, and Blood Institute, National Institutes of Health.
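
For readers who want to reproduce this style of survival analysis, a hedged sketch using the lifelines package follows: a Cox model relating one body-composition measure to all-cause mortality, adjusted for covariates. The column names and covariates are hypothetical placeholders, not the registry's actual schema.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical per-patient export: follow-up time, death event, and measures.
df = pd.read_csv("body_composition.csv")

cph = CoxPHFitter()
cph.fit(
    df[["followup_years", "died", "vat_attenuation_high", "age", "sex"]],
    duration_col="followup_years",  # time to event or censoring
    event_col="died",               # 1 = died, 0 = censored
)
cph.print_summary()  # adjusted hazard ratios with 95% CIs, as reported above
```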

Groupwise image registration with edge-based loss for low-SNR cardiac MRI.

Lei X, Schniter P, Chen C, Ahmad R

PubMed · May 12, 2025
The purpose of this study is to perform image registration and averaging of multiple free-breathing single-shot cardiac images, where the individual images may have a low signal-to-noise ratio (SNR). To address low SNR encountered in single-shot imaging, especially at low field strengths, we propose a fast deep learning (DL)-based image registration method, called Averaging Morph with Edge Detection (AiM-ED). AiM-ED jointly registers multiple noisy source images to a noisy target image and utilizes a noise-robust pre-trained edge detector to define the training loss. We validate AiM-ED using synthetic late gadolinium enhanced (LGE) images from the MR extended cardiac-torso (MRXCAT) phantom and free-breathing single-shot LGE images from healthy subjects (24 slices) and patients (5 slices) under various levels of added noise. Additionally, we demonstrate the clinical feasibility of AiM-ED by applying it to data from patients (6 slices) scanned on a 0.55T scanner. Compared with a traditional energy-minimization-based image registration method and DL-based VoxelMorph, images registered using AiM-ED exhibit higher values of recovery SNR and three perceptual image quality metrics. An ablation study shows the benefit of both jointly processing multiple source images and using an edge map in AiM-ED. For single-shot LGE imaging, AiM-ED outperforms existing image registration methods in terms of image quality. With fast inference, minimal training data requirements, and robust performance at various noise levels, AiM-ED has the potential to benefit single-shot CMR applications.
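
The core idea of an edge-based loss, comparing edge maps of the warped source and the target rather than raw intensities, can be sketched as follows. AiM-ED uses a noise-robust pre-trained edge detector; a plain Sobel operator is substituted here purely for illustration.

```python
import torch
import torch.nn.functional as F

def sobel_edges(img: torch.Tensor) -> torch.Tensor:
    """img: (batch, 1, H, W) -> gradient magnitude map (a stand-in edge detector)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3).to(img)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def edge_loss(warped_source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Registration loss on edge maps instead of noisy raw intensities."""
    return F.l1_loss(sobel_edges(warped_source), sobel_edges(target))
```

Because edges are less sensitive to additive noise than raw intensities, a loss of this form keeps the registration driven by anatomical boundaries even at low SNR, which matches the motivation stated above.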

Deep Learning for Detecting Periapical Bone Rarefaction in Panoramic Radiographs: A Systematic Review and Critical Assessment.

da Silva-Filho JE, da Silva Sousa Z, de-Araújo APC, Fornagero LDS, Machado MP, de Aguiar AWO, Silva CM, de Albuquerque DF, Gurgel-Filho ED

PubMed · May 12, 2025
To evaluate deep learning (DL)-based models for detecting periapical bone rarefaction (PBRs) in panoramic radiographs (PRs), analyzing their feasibility and performance in dental practice. A search was conducted across seven databases and partial grey literature up to November 15, 2024, using Medical Subject Headings and entry terms related to DL, PBRs, and PRs. Studies assessing DL-based models for detecting and classifying PBRs in conventional PRs were included, while those using non-PR imaging or focusing solely on non-PBR lesions were excluded. Two independent reviewers performed screening, data extraction, and quality assessment using the Quality Assessment of Diagnostic Accuracy Studies-2 tool, with conflicts resolved by a third reviewer. Twelve studies met the inclusion criteria, mostly from Asia (58.3%). The risk of bias was moderate in 10 studies (83.3%) and high in 2 (16.7%). DL models showed moderate to high performance in PBR detection (sensitivity: 26-100%; specificity: 51-100%), with U-Net and YOLO being the most commonly used algorithms. Only one study (8.3%) distinguished periapical granulomas from periapical cysts, revealing a classification gap. Key challenges included limited generalization due to small datasets, anatomical superimpositions in PRs, and variability in reported metrics, which compromised comparison between models. This review underscores that DL-based models have the potential to become valuable tools in dental image diagnostics, but they cannot yet be considered definitive practice. Multicenter collaboration is needed to diversify data and democratize these tools. Standardized performance reporting is critical for fair comparability between different models.

Deep learning diagnosis of hepatic echinococcosis based on dual-modality plain CT and ultrasound images: a large-scale, multicenter, diagnostic study.

Zhang J, Zhang J, Tang H, Meng Y, Chen X, Chen J, Chen Y

PubMed · May 12, 2025
Given the currently limited accuracy of imaging-based screening for hepatic echinococcosis (HCE) in under-resourced areas, we developed and validated a multimodal imaging system (HEAC) based on plain computed tomography (CT) combined with ultrasound for HCE screening in those areas. We built a multimodal deep learning diagnostic system by integrating ultrasound and plain CT imaging data to differentiate hepatic echinococcosis, liver cysts, liver abscesses, and healthy liver conditions. We collected a dataset of 8979 cases spanning 18 years from eight hospitals in Xinjiang, China, including both retrospective and prospective data. To enhance the robustness and generalization of the diagnostic model, after modeling CT and ultrasound images with EfficientNet3D and EfficientNet-B0, we conducted external and prospective tests and compared the model's performance with diagnoses made by experienced physicians. Across internal and external test sets, the fused CT-ultrasound model consistently outperformed the single-modality models and physician diagnoses. In the prospective test set from the same center, the fusion model achieved an accuracy of 0.816, sensitivity of 0.849, specificity of 0.942, and an AUC of 0.963, significantly exceeding physician performance (accuracy 0.900, sensitivity 0.800, specificity 0.933). The external test sets across seven other centers demonstrated similar results, with the fusion model achieving an overall accuracy of 0.849, sensitivity of 0.859, specificity of 0.942, and AUC of 0.961. The multimodal deep learning diagnostic system that integrates CT and ultrasound significantly increases diagnostic accuracy for HCE, liver cysts, and liver abscesses. It outperforms standard single-modality approaches and physician diagnoses, lowering misdiagnosis rates and increasing diagnostic reliability. This highlights the promise of multimodal imaging systems in tackling diagnostic challenges in low-resource areas, paving the way for improved medical care accessibility and outcomes.
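
A minimal late-fusion sketch in the spirit of the description above: pooled features from a CT encoder (e.g., EfficientNet3D) and an ultrasound encoder (e.g., EfficientNet-B0, whose pooled features are 1280-dimensional) are concatenated and classified into the four categories. The fusion head and feature handling are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CTUSFusion(nn.Module):
    """Sketch: concatenate per-modality features, classify jointly."""

    def __init__(self, ct_dim: int = 1280, us_dim: int = 1280, n_classes: int = 4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(ct_dim + us_dim, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, n_classes),  # HCE, liver cyst, liver abscess, healthy
        )

    def forward(self, ct_feat: torch.Tensor, us_feat: torch.Tensor) -> torch.Tensor:
        return self.head(torch.cat([ct_feat, us_feat], dim=-1))

logits = CTUSFusion()(torch.randn(1, 1280), torch.randn(1, 1280))  # -> (1, 4)
```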

AutoFRS: an externally validated, annotation-free approach to computational preoperative complication risk stratification in pancreatic surgery - an experimental study.

Kolbinger FR, Bhasker N, Schön F, Cser D, Zwanenburg A, Löck S, Hempel S, Schulze A, Skorobohach N, Schmeiser HM, Klotz R, Hoffmann RT, Probst P, Müller B, Bodenstedt S, Wagner M, Weitz J, Kühn JP, Distler M, Speidel S

PubMed · May 12, 2025
The risk of postoperative pancreatic fistula (POPF), one of the most dreaded complications after pancreatic surgery, can be predicted from preoperative imaging and tabular clinical routine data. However, existing studies suffer from limited clinical applicability due to the need for manual data annotation and a lack of external validation. We propose AutoFRS (automated fistula risk score software), an externally validated end-to-end prediction tool for POPF risk stratification based on multimodal preoperative data. We trained AutoFRS on preoperative contrast-enhanced computed tomography imaging and clinical data from 108 patients undergoing pancreatic head resection and validated it on an external cohort of 61 patients. Prediction performance was assessed using the area under the receiver operating characteristic curve (AUC) and balanced accuracy. In addition, model performance was compared to the updated alternative fistula risk score (ua-FRS), the current clinical gold standard for intraoperative POPF risk stratification. AutoFRS achieved an AUC of 0.81 and a balanced accuracy of 0.72 in internal validation, and an AUC of 0.79 and a balanced accuracy of 0.70 in external validation. In a patient subset with documented intraoperative POPF risk factors, AutoFRS (AUC: 0.84 ± 0.05) performed on par with the ua-FRS (AUC: 0.85 ± 0.06). The AutoFRS web application facilitates annotation-free prediction of POPF from preoperative imaging and clinical data based on the AutoFRS prediction model. POPF can thus be predicted from multimodal clinical routine data without human data annotation, automating the risk prediction process. We provide additional evidence of the clinical feasibility of preoperative POPF risk stratification and introduce a software pipeline for future prospective evaluation.
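
Both headline metrics are standard and straightforward to compute; a small sketch with scikit-learn follows, using dummy arrays in place of AutoFRS's actual POPF probabilities and outcomes.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, balanced_accuracy_score

# Dummy stand-ins for model probabilities and observed POPF outcomes.
y_true = np.array([0, 1, 0, 0, 1, 1, 0, 1])
y_prob = np.array([0.2, 0.8, 0.4, 0.1, 0.6, 0.9, 0.3, 0.7])

auc = roc_auc_score(y_true, y_prob)                     # threshold-free ranking
bacc = balanced_accuracy_score(y_true, y_prob >= 0.5)   # mean of sens. and spec.
print(f"AUC={auc:.2f}, balanced accuracy={bacc:.2f}")
```

Balanced accuracy averages sensitivity and specificity, which keeps the metric honest on imbalanced cohorts like this one, where POPF is the minority outcome.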