
An end-to-end interpretable machine-learning-based framework for early-stage diagnosis of gallbladder cancer using multi-modality medical data.

Zhao H, Miao C, Zhu Y, Shu Y, Wu X, Yin Z, Deng X, Gong W, Yang Z, Zou W

PubMed | Jul 16, 2025
The accurate early-stage diagnosis of gallbladder cancer (GBC) is regarded as one of the major challenges in oncology, yet few studies have addressed comprehensive GBC classification based on multiple modalities. This study aims to develop a comprehensive diagnostic framework for GBC based on both imaging and non-imaging medical data. This retrospective study reviewed 298 patients with gallbladder disease and volunteers, imaged on two devices. A novel end-to-end interpretable diagnostic framework for GBC is proposed to handle multiple medical modalities, including CT imaging, demographics, tumor markers, coagulation function tests, and routine blood tests. To improve feature extraction and fusion of the imaging modality, a novel global-hybrid-local network, GHL-Net, has also been developed. An ensemble learning strategy is employed to fuse the multi-modality data and obtain the final classification result, and two interpretability methods are applied to help clinicians understand the model's decisions. Model performance was evaluated through accuracy, precision, specificity, sensitivity, F1-score, area under the curve (AUC), and Matthews correlation coefficient (MCC). In both binary and multi-class classification scenarios, the proposed method outperformed the comparison methods on both datasets. In the binary classification scenario in particular, it achieved the highest accuracy, sensitivity, specificity, precision, F1-score, ROC-AUC, PR-AUC, and MCC: 95.24%, 93.55%, 96.87%, 96.67%, 95.08%, 0.9591, 0.9636, and 0.9051, respectively. Visualizations produced by the interpretability methods also demonstrated high clinical relevance of the intermediate decision-making process, and ablation studies provided an in-depth understanding of the methodology. The machine-learning-based framework can effectively improve the accuracy of GBC diagnosis and is expected to have a significant impact in other cancer diagnosis scenarios as well.
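
As a point of reference for the metric suite above, here is a minimal sketch (not the authors' code) of how the reported binary-classification metrics could be computed with scikit-learn; the label and score arrays are placeholders.

```python
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)

def binary_report(y_true, y_pred, y_score):
    """Metric suite reported in the paper for a binary classification task."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "sensitivity": recall_score(y_true, y_pred),               # recall of the positive class
        "specificity": recall_score(y_true, y_pred, pos_label=0),  # recall of the negative class
        "precision": precision_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),                 # needs probabilities/scores
        "mcc": matthews_corrcoef(y_true, y_pred),
    }
```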

Automatic segmentation of liver structures in multi-phase MRI using variants of nnU-Net and Swin UNETR.

Raab F, Strotzer Q, Stroszczynski C, Fellner C, Einspieler I, Haimerl M, Lang EW

PubMed | Jul 16, 2025
Accurate segmentation of the liver parenchyma, portal veins, hepatic veins, and lesions from MRI is important for hepatic disease monitoring and treatment. Multi-phase contrast-enhanced imaging distinguishes hepatic structures better than single-phase approaches, but automated methods for detailed segmentation of these structures are lacking. This study evaluates deep learning architectures for segmenting liver structures from multi-phase Gd-EOB-DTPA-enhanced T1-weighted VIBE MRI scans. We utilized 458 T1-weighted VIBE scans of pathological livers, 78 of which were manually labeled for liver parenchyma, hepatic and portal veins, aorta, lesions, and ascites. An additional dataset of 47 labeled subjects was used for cross-scanner evaluation. Three models were evaluated using nested cross-validation: the conventional nnU-Net, the ResEnc nnU-Net, and the Swin UNETR. The late arterial phase was identified as the optimal fixed phase for co-registration. Both nnU-Net variants outperformed Swin UNETR across most tasks. The conventional nnU-Net achieved the highest segmentation performance for liver parenchyma (DSC: 0.97; 95% CI 0.97, 0.98), portal vein (DSC: 0.83; 95% CI 0.80, 0.87), and hepatic vein (DSC: 0.78; 95% CI 0.77, 0.80). Lesion and ascites segmentation proved challenging for all models, with the conventional nnU-Net performing best. This study demonstrates the effectiveness of deep learning, particularly nnU-Net variants, for detailed liver structure segmentation from multi-phase MRI. The developed models and preprocessing pipeline offer potential for improved liver disease assessment and surgical planning in clinical practice.
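
The Dice similarity coefficient (DSC) quoted above measures volumetric overlap between predicted and reference masks. A minimal sketch of the standard computation on binary masks; the convention of scoring 1.0 for two empty masks is our assumption, not taken from the paper:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * intersection / denom if denom else 1.0  # empty-vs-empty -> 1.0 (assumed convention)
```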

Comparative study of 2D vs. 3D AI-enhanced ultrasound for fetal crown-rump length evaluation in the first trimester.

Zhang Y, Huang Y, Chen C, Hu X, Pan W, Luo H, Huang Y, Wang H, Cao Y, Yi Y, Xiong Y, Ni D

PubMed | Jul 16, 2025
Accurate fetal growth evaluation is crucial for monitoring fetal health, with crown-rump length (CRL) being the gold standard for estimating gestational age and assessing growth during the first trimester. To improve the accuracy and efficiency of CRL evaluation, we developed an artificial intelligence (AI)-based model (3DCRL-Net) using the 3D U-Net architecture for automatic landmark detection, CRL plane localization, and measurement in 3D ultrasound, and compared its performance to that of experienced radiologists using both 2D and 3D ultrasound for fetal growth assessment. This prospective consecutive study collected fetal data from 1,326 ultrasound screenings conducted at 11-14 weeks of gestation (June 2021 to June 2023). Three experienced radiologists performed fetal screening using 2D video (2D-RAD) and 3D volume (3D-RAD) to obtain the CRL plane and measurement; the 3DCRL-Net model automatically outputs the landmark positions, CRL plane localization, and measurement. Three specialists audited the planes obtained by the radiologists and by 3DCRL-Net as standard or non-standard. Landmark detection, plane localization, measurement, and time efficiency were evaluated on the internal testing dataset against 3D-RAD; on the external dataset, plane localization, measurement accuracy, and time efficiency were compared among the three groups. The internal dataset comprised 126 testing cases (training:validation:testing = 8:1:1), and the external dataset included 245 cases. On the internal testing set, 3DCRL-Net achieved a mean absolute distance error of 1.81 mm for the nine landmarks, higher standard-plane localization accuracy than 3D-RAD (91.27% vs. 80.16%), and strong consistency in CRL measurements (mean absolute error (MAE): 1.26 mm; mean difference: 0.37 mm, P = 0.70). The average time required per fetal case was 2.02 s for 3DCRL-Net versus 2 min for 3D-RAD (P < 0.001). On the external testing dataset, 3DCRL-Net achieved standard-plane localization accuracy comparable to 2D-RAD and 3D-RAD (91.43% vs. 93.06% vs. 86.12%) and strong consistency with 2D-RAD in CRL measurements (MAE: 1.58 mm; mean difference: 1.12 mm, P = 0.25). For 2D-RAD vs. 3DCRL-Net, the Pearson correlation and R² were 0.96 and 0.93, respectively, with an MAE of 0.11 ± 0.12 weeks. The average time required per fetal case was 5 s for 3DCRL-Net, compared to 2 min for 3D-RAD and 35 s for 2D-RAD (P < 0.001). The 3DCRL-Net model provides a rapid, accurate, and fully automated solution for CRL measurement in 3D ultrasound, achieving expert-level performance and significantly improving the efficiency and reliability of first-trimester fetal growth assessment.
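
The 1.81 mm figure is a mean Euclidean distance over the nine detected landmarks. A minimal sketch of that computation, assuming predicted and ground-truth coordinates are already in physical (mm) space:

```python
import numpy as np

def mean_landmark_error(pred_mm: np.ndarray, gt_mm: np.ndarray) -> float:
    """Mean Euclidean distance (mm) over matched 3D landmarks.

    pred_mm, gt_mm: (n_landmarks, 3) arrays in physical (mm) coordinates.
    """
    return float(np.linalg.norm(pred_mm - gt_mm, axis=1).mean())
```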

Deep learning for appendicitis: development of a three-dimensional localization model on CT.

Takaishi T, Kawai T, Kokubo Y, Fujinaga T, Ojio Y, Yamamoto T, Hayashi K, Owatari Y, Ito H, Hiwatashi A

PubMed | Jul 16, 2025
To develop and evaluate a deep learning model for detecting appendicitis on abdominal CT. This retrospective single-center study included 567 CTs of appendicitis patients (330 males, age range 20-96 years) obtained between 2011 and 2020, randomly split into training (n = 517) and validation (n = 50) sets. The validation set was supplemented with 50 control CTs performed for acute abdomen. For the test dataset, 100 appendicitis CTs and 100 control CTs were consecutively collected from a separate period after 2021. Exclusion criteria were age < 20 years, perforation, unclear appendix, and appendiceal tumors. Appendicitis CTs were annotated with three-dimensional bounding boxes encompassing the inflamed appendix. CT protocols were unenhanced, with 5-mm slice thickness and a 512 × 512 pixel matrix. The deep learning algorithm was based on the Faster Region-based Convolutional Neural Network (Faster R-CNN). Two board-certified radiologists visually graded model predictions on the test dataset using a 5-point Likert scale (0: no detection, 1: false, 2: poor, 3: fair, 4: good), with scores ≥ 3 considered true positives. Inter-rater agreement was assessed using weighted kappa statistics. The effects of intra-abdominal fat, periappendiceal fat-stranding, presence of appendicolith, and appendix diameter on the model's recall were analyzed using binary logistic regression. The model showed a precision of 0.66 (87/132), a recall of 0.87 (87/100), and a false-positive rate per patient of 0.23 (45/200). The inter-rater agreement for Likert scores of 2-4 was κ = 0.76. The logistic regression analysis showed that only intra-abdominal fat had a significant impact on the model's precision (p = 0.02). We developed a model capable of detecting appendicitis on CT with a three-dimensional bounding box.
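
A quick sanity check of the reported detection metrics from the stated counts (our reading of the abstract, not the authors' evaluation script):

```python
tp, detections, positive_cases, fp, total_patients = 87, 132, 100, 45, 200

precision = tp / detections           # 87/132 ≈ 0.66
recall = tp / positive_cases          # 87/100 = 0.87
fp_per_patient = fp / total_patients  # 45/200 ≈ 0.23
print(f"{precision:.2f} {recall:.2f} {fp_per_patient:.2f}")
```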

Identifying Signatures of Image Phenotypes to Track Treatment Response in Liver Disease

Matthias Perkonigg, Nina Bastati, Ahmed Ba-Ssalamah, Peter Mesenbrink, Alexander Goehler, Miljen Martic, Xiaofei Zhou, Michael Trauner, Georg Langs

arXiv preprint | Jul 16, 2025
Quantifiable image patterns associated with disease progression and treatment response are critical tools for guiding individual treatment and for developing novel therapies. Here, we show that unsupervised machine learning can identify a pattern vocabulary of liver tissue in magnetic resonance images that quantifies treatment response in diffuse liver disease. Deep clustering networks simultaneously encode and cluster patches of medical images into a low-dimensional latent space to establish a tissue vocabulary. The resulting tissue types capture differential tissue change and its location in the liver associated with treatment response. We demonstrate the utility of the vocabulary on a randomized controlled trial cohort of non-alcoholic steatohepatitis patients. First, we use the vocabulary to compare longitudinal liver change in a placebo and a treatment cohort. Results show that the method identifies specific liver tissue change pathways associated with treatment and enables better separation between treatment groups than established non-imaging measures. Moreover, we show that the vocabulary can predict biopsy-derived features from non-invasive imaging data. Finally, we validate the approach on a separate replication cohort to demonstrate its applicability.
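
To make the encode-and-cluster idea concrete, here is a minimal sketch under our own assumptions (a toy encoder and random patch tensor). Note that the paper's deep clustering networks optimize the encoder and cluster assignments jointly, whereas this sketch separates the two steps for brevity:

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Toy patch encoder mapping 32x32 single-channel patches to a 16-dim latent code.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 32x32 -> 16x16
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 16x16 -> 8x8
    nn.Flatten(), nn.Linear(32 * 8 * 8, 16),
)

patches = torch.randn(1000, 1, 32, 32)   # stand-in for extracted MRI patches
with torch.no_grad():
    z = encoder(patches).numpy()          # low-dimensional latent codes

vocab = KMeans(n_clusters=10, n_init=10, random_state=0).fit(z)
tissue_type = vocab.labels_               # one "vocabulary word" per patch
```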

Multi-DECT image-based radiomics with interpretable machine learning for preoperative prediction of tumor budding grade and prognosis in colorectal cancer: a dual-center study.

Lin G, Chen W, Chen Y, Cao J, Mao W, Xia S, Chen M, Xu M, Lu C, Ji J

PubMed | Jul 16, 2025
This study evaluates the ability of multiparametric dual-energy computed tomography (multi-DECT) radiomics to predict tumor budding (TB) grade and prognosis in patients with colorectal cancer (CRC). The study comprised 510 CRC patients at two institutions. Radiomics features from multi-DECT images (including polyenergetic, virtual monoenergetic, iodine concentration [IC], and effective atomic number images) were screened to build radiomics models using nine machine learning (ML) algorithms, and an ML-based fusion model combining clinical-radiological variables and radiomics features was developed. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), and interpretability was examined with Shapley additive explanations (SHAP). The prognostic significance of the fusion model was determined via survival analysis. CT-reported lymph node status and normalized IC were used to develop a clinical-radiological model. Among the nine ML algorithms examined, extreme gradient boosting (XGB) performed best. The XGB-based fusion model containing multi-DECT radiomics features outperformed the clinical-radiological model in predicting TB grade, with AUCs of 0.969 in the training cohort, 0.934 in the internal validation cohort, and 0.897 in the external validation cohort. SHAP analysis identified the variables driving model predictions. Patients with a model-predicted high TB grade had worse recurrence-free survival (RFS) in both the training (P < 0.001) and internal validation (P = 0.016) cohorts. The XGB-based fusion model using multi-DECT radiomics could serve as a non-invasive preoperative tool to predict TB grade and RFS in patients with CRC.
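
As an illustration of the XGB-plus-SHAP workflow described above, a minimal sketch on synthetic stand-in data (not the study's features or labels):

```python
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                        # stand-in radiomics feature table
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)  # stand-in TB grade labels

model = xgb.XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X, y)

explainer = shap.TreeExplainer(model)  # efficient SHAP values for tree ensembles
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)      # global view of feature contributions
```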

Multimodal Large Language Model With Knowledge Retrieval Using Flowchart Embedding for Forming Follow-Up Recommendations for Pancreatic Cystic Lesions.

Zhu Z, Liu J, Hong CW, Houshmand S, Wang K, Yang Y

PubMed | Jul 16, 2025
BACKGROUND. The American College of Radiology (ACR) Incidental Findings Committee (IFC) algorithm provides guidance for pancreatic cystic lesion (PCL) management. Its implementation using plain-text large language model (LLM) solutions is challenging given that key components include multimodal data (e.g., figures and tables). OBJECTIVE. The purpose of this study was to evaluate a multimodal LLM approach incorporating knowledge retrieval using flowchart embedding for forming follow-up recommendations for PCL management. METHODS. This retrospective study included patients who underwent abdominal CT or MRI from September 1, 2023, to September 1, 2024, and whose report mentioned a PCL. The reports' Findings sections were input to a multimodal LLM (GPT-4o). For task 1 (198 patients: mean age, 69.0 ± 13.0 [SD] years; 110 women, 88 men), the LLM assessed PCL features (presence of PCL, PCL size and location, presence of main pancreatic duct communication, presence of worrisome features or high-risk stigmata) and formed a follow-up recommendation using three knowledge retrieval methods (default knowledge; plain-text retrieval-augmented generation [RAG] from the ACR IFC algorithm PDF document; and flowchart embedding, using the LLM's image-to-text conversion for in-context integration of the document's flowcharts and tables). For task 2 (85 patients: mean initial age, 69.2 ± 10.8 years; 48 women, 37 men), an additional relevant prior report was input; the LLM assessed interval PCL change and provided an adjusted follow-up schedule accounting for prior imaging, using flowchart embedding. Three radiologists assessed LLM accuracy in task 1 for PCL findings in consensus and for follow-up recommendations independently; one radiologist assessed accuracy in task 2. RESULTS. For task 1, the LLM with flowchart embedding had accuracy for PCL features of 98.0-99.0%. The accuracy of the LLM follow-up recommendations based on default knowledge, plain-text RAG, and flowchart embedding was 42.4%, 23.7%, and 89.9% for radiologist 1; 39.9%, 24.2%, and 91.9% for radiologist 2; and 40.9%, 25.3%, and 91.9% for radiologist 3 (each p < .001). For task 2, the LLM using flowchart embedding showed an accuracy of 96.5% for interval PCL change and 81.2% for adjusted follow-up schedules. CONCLUSION. Multimodal flowchart embedding aided the LLM's automated provision of follow-up recommendations adherent to a clinical guidance document. CLINICAL IMPACT. The framework could be extended to other incidental findings through the use of other clinical guidance documents as model input.
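
For readers unfamiliar with the mechanics, here is a hedged sketch of passing a guideline flowchart alongside report text to a multimodal model via the OpenAI API. The file name, prompt, and example findings are illustrative, and the study's flowchart-embedding step (image-to-text conversion before in-context integration) is simplified here to direct image input:

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("acr_ifc_pcl_flowchart.png", "rb") as f:  # hypothetical flowchart figure
    flowchart_b64 = base64.b64encode(f.read()).decode()

findings_text = "1.4-cm cyst in the pancreatic body without main duct communication."  # example input

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Using the attached management flowchart, recommend "
                     f"follow-up for the PCL in this report:\n{findings_text}"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{flowchart_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```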

Evaluating Artificial Intelligence-Assisted Prostate Biparametric MRI Interpretation: An International Multireader Study.

Gelikman DG, Yilmaz EC, Harmon SA, Huang EP, An JY, Azamat S, Law YM, Margolis DJA, Marko J, Panebianco V, Esengur OT, Lin Y, Belue MJ, Gaur S, Bicchetti M, Xu Z, Tetreault J, Yang D, Xu D, Lay NS, Gurram S, Shih JH, Merino MJ, Lis R, Choyke PL, Wood BJ, Pinto PA, Turkbey B

PubMed | Jul 16, 2025
Background: Variability in prostate biparametric MRI (bpMRI) interpretation limits diagnostic reliability for prostate cancer (PCa). Artificial intelligence (AI) has potential to reduce this variability and improve diagnostic accuracy. Objective: The objective of this study was to evaluate the impact of a deep learning AI model on lesion- and patient-level clinically significant PCa (csPCa) and PCa detection rates and on interreader agreement in bpMRI interpretations. Methods: This retrospective, multireader, multicenter study used a balanced incomplete block design for MRI randomization. Six radiologists of varying experience interpreted bpMRI scans with and without AI assistance in alternating sessions. The reference standard for lesion-level detection was whole-mount pathology after radical prostatectomy for cases and negative 12-core systematic biopsies for control patients. In all, 180 patients (120 in the case group, 60 in the control group) who underwent mpMRI and prostate biopsy or radical prostatectomy between January 2013 and December 2022 were included. Lesion-level sensitivity, PPV, patient-level AUC for csPCa and PCa detection, and interreader agreement in lesion-level PI-RADS scores and size measurements were assessed. Results: AI assistance improved lesion-level PPV (PI-RADS ≥ 3: 77.2% [95% CI, 71.0-83.1%] vs 67.2% [61.1-72.2%] for csPCa; 80.9% [75.2-85.7%] vs 69.4% [63.4-74.1%] for PCa; both p < .001) and reduced lesion-level sensitivity (PI-RADS ≥ 3: 44.4% [38.6-50.5%] vs 48.0% [42.0-54.2%] for csPCa, p = .01; 41.7% [37.0-47.4%] vs 44.9% [40.5-50.2%] for PCa, p = .01), with no difference in patient-level AUC (0.822 [95% CI, 0.768-0.866] vs 0.832 [0.787-0.868] for csPCa, p = .61; 0.833 [0.782-0.874] vs 0.835 [0.792-0.871] for PCa, p = .91). AI assistance improved interreader agreement for lesion-level PI-RADS scores (κ = 0.748 [95% CI, 0.701-0.796] vs 0.336 [0.288-0.381], p < .001), lesion size measurements (coverage probability of 0.397 [0.376-0.419] vs 0.367 [0.349-0.383], p < .001), and patient-level PI-RADS scores (κ = 0.704 [0.627-0.767] vs 0.507 [0.421-0.584], p < .001). Conclusion: AI improved lesion-level PPV and interreader agreement, with slightly lower lesion-level sensitivity. Clinical Impact: AI may enhance consistency and reduce false-positives in bpMRI interpretations. Further optimization is required to improve sensitivity without compromising specificity.
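
For context on the κ values above, a minimal sketch of a weighted kappa computation between two readers' PI-RADS scores with scikit-learn; the scores are toy data, and the quadratic weighting is an assumption, since the abstract does not specify the weighting scheme:

```python
from sklearn.metrics import cohen_kappa_score

reader_1 = [3, 4, 5, 3, 2, 4, 5, 3]  # toy lesion-level PI-RADS scores
reader_2 = [3, 4, 4, 3, 3, 4, 5, 2]

kappa = cohen_kappa_score(reader_1, reader_2, weights="quadratic")
print(f"weighted kappa = {kappa:.3f}")
```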

Automated microvascular invasion prediction of hepatocellular carcinoma via deep relation reasoning from dynamic contrast-enhanced ultrasound.

Wang Y, Xie W, Li C, Xu Q, Du Z, Zhong Z, Tang L

PubMed | Jul 16, 2025
Hepatocellular carcinoma (HCC) is a major global health concern, with microvascular invasion (MVI) being a critical prognostic factor linked to early recurrence and poor survival. Preoperative MVI prediction remains challenging, but recent advances in dynamic contrast-enhanced ultrasound (CEUS) imaging combined with artificial intelligence show promise in improving prediction accuracy. CEUS offers real-time visualization of tumor vascularity, providing unique insights into MVI characteristics. This study proposes a novel deep relation reasoning approach to address the challenges of modeling intricate temporal relationships and extracting complex spatial features from CEUS video frames. Our method integrates CEUS video sequences and introduces a visual graph reasoning framework that correlates intratumoral and peritumoral features across imaging phases. The system employs dual-path feature extraction, MVI pattern topology construction, Graph Convolutional Network learning, and an MVI pattern discovery module to capture complex features while providing interpretable results. Experimental findings demonstrate that our approach surpasses existing state-of-the-art models in accuracy, sensitivity, specificity, and AUC for MVI prediction. These advancements promise to enhance HCC diagnosis and management, potentially revolutionizing patient care. The method's robust performance, even with limited data, underscores its potential for practical clinical application in improving the efficacy and efficiency of HCC patient diagnosis and treatment planning.
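
To ground the Graph Convolutional Network component, here is a minimal sketch of one graph-convolution step, H' = relu(D̂^(-1/2) Â D̂^(-1/2) H W); the node and edge setup is a toy assumption and does not reproduce the paper's CEUS-specific graph construction:

```python
import torch

def gcn_layer(H: torch.Tensor, A: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """One GCN propagation step: relu(normalized-adjacency @ H @ W)."""
    A_hat = A + torch.eye(A.size(0))        # add self-loops
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
    D_inv_sqrt = torch.diag(d_inv_sqrt)     # symmetric degree normalization
    return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

H = torch.randn(6, 8)                       # 6 nodes (e.g., tumor/peritumor regions)
A = torch.tensor([[0, 1, 1, 0, 0, 0],
                  [1, 0, 1, 1, 0, 0],
                  [1, 1, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 1],
                  [0, 0, 1, 1, 0, 1],
                  [0, 0, 0, 1, 1, 0]], dtype=torch.float)
W = torch.randn(8, 4)
out = gcn_layer(H, A, W)                    # (6, 4) updated node features
```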