Multimodal artificial intelligence models for liver fibrosis staging: a scoping review.
Authors
Affiliations (3)
- Department of Gastroenterological Surgery I, Hokkaido University Graduate School of Medicine, Sapporo, Japan.
- Department of Diagnostic Imaging, Hokkaido University Graduate School of Medicine, Sapporo, Japan.
- Department of Gastroenterological Surgery I, Hokkaido University Graduate School of Medicine, Sapporo, Japan. [email protected].
Abstract
Multimodal artificial intelligence (AI) approaches integrating heterogeneous data sources represent an emerging frontier in liver fibrosis assessment. However, the use of multimodal AI for liver fibrosis staging has been only preliminarily explored, and the existing evidence is constrained by substantial methodological gaps. This scoping review aimed to comprehensively map the current evidence on multimodal AI models that integrate medical imaging with other data categories for predicting liver fibrosis stage. Following the Joanna Briggs Institute methodology and PRISMA-ScR guidelines, we searched MEDLINE, Web of Science, CENTRAL, and IEEE Xplore on August 12, 2025. Studies developing AI or machine learning models for liver fibrosis prediction integrating at least one imaging modality with heterogeneous data categories (e.g., clinical parameters or serum biomarkers) were included. Three reviewers independently screened records, and extracted data were independently verified by two additional reviewers. Of 2,849 records, 21 studies met the eligibility criteria, yielding 34 distinct multimodal AI models. Research was geographically concentrated in China (81%) and predominantly focused on hepatitis B-related liver disease. CT-based radiomics combined with serum biomarkers represented the most common approach, whereas deep learning architectures were less frequently applied. Across 107 AUC evaluations, the median AUC was 0.890 (interquartile range 0.850-0.925). External-validation AUCs (12 evaluations from 6 studies) ranged from 0.808 to 0.990; 3 internal-test AUCs from a single study fell below 0.70. However, external validation was reported for only 20.6% of models, with calibration and decision curve analysis reported in 23.1% and 24.1% of evaluations, respectively. This scoping review revealed a nascent field with encouraging diagnostic performance but with substantial gaps in external validation, calibration reporting, and clinical utility assessment. Future research should prioritize methodologically rigorous validation and evaluate the impact on clinical decision-making.