AI-driven software for automated quantification of skeletal metastases and treatment response evaluation using whole-body diffusion-weighted MRI (WB-DWI) in advanced prostate cancer.
Authors
Affiliations (5)
- The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND.
- The Royal Marsden NHS Foundation Trust, 203 Fulham Rd, London, England, SW3 6JJ, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND.
- Istituto Europeo di Oncologia, Via Giuseppe Ripamonti, 435, Milan, Lombardy, 20141, ITALY.
- Università Cattolica del Sacro Cuore, Facoltà di Medicina e Chirurgia, Largo Francesco Vito, Rome, Lazio, 00168, ITALY.
- University Hospital Basel, Petersgraben 4, Basel, BS, 4031, SWITZERLAND.
Abstract
Objective: Quantitative assessment of treatment response in Advanced Prostate Cancer (APC) with bone metastases remains an unmet clinical need. Whole-Body Diffusion-Weighted MRI (WB-DWI) provides two response biomarkers: Total Diffusion Volume (TDV) and global Apparent Diffusion Coefficient (gADC). However, tracking post-treatment changes in TDV and gADC from manually delineated lesions is cumbersome and increases inter-reader variability. We developed software to automate this process.

Approach: The core technologies are (i) a weakly-supervised Residual U-Net model that generates a skeleton probability map to isolate bone; (ii) a statistical framework for WB-DWI intensity normalisation that yields a signal-normalised b = 900 s/mm² (b900) image; and (iii) a shallow convolutional neural network that processes the outputs of (i) and (ii) to generate a mask of suspected bone lesions, which are characterised by higher b900 signal intensity due to restricted water diffusion. This mask is applied to the gADC map to extract TDV and gADC statistics. We tested the tool against expert-defined metastatic bone disease delineations on 66 datasets, assessed the repeatability of the imaging biomarkers (N=10), and compared software-based response assessment with a construct reference standard, defined as multidisciplinary consensus based on ≥12 months of imaging, clinical, and laboratory follow-up (N=118).
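The final step of the approach, applying a lesion mask to an ADC map to obtain a volume and an ADC summary, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the flat per-voxel layout, and the choice of millilitres as the volume unit are all assumptions made for illustration, and only the median ADC is computed here.

```python
from statistics import median


def tdv_and_median_adc(lesion_mask, adc_values, voxel_volume_ml):
    """Summarise a lesion mask applied to an ADC map (illustrative only).

    lesion_mask:     flat sequence of 0/1 flags, one per voxel (assumed layout).
    adc_values:      flat sequence of ADC values for the same voxels.
    voxel_volume_ml: volume of one voxel in millilitres (assumed unit).

    Returns (TDV in millilitres, median ADC inside the mask),
    with the median set to None when the mask is empty.
    """
    if len(lesion_mask) != len(adc_values):
        raise ValueError("mask and ADC map must have the same number of voxels")

    # Keep only the ADC values of voxels flagged as suspected disease.
    inside = [adc for flag, adc in zip(lesion_mask, adc_values) if flag]

    # Volume is the number of masked voxels times the single-voxel volume.
    tdv_ml = len(inside) * voxel_volume_ml
    median_adc = median(inside) if inside else None
    return tdv_ml, median_adc


# Toy example with five voxels, three of them inside the mask.
mask = [0, 1, 1, 0, 1]
adc = [1500, 700, 900, 1200, 800]
tdv_ml, median_adc = tdv_and_median_adc(mask, adc, voxel_volume_ml=0.05)
```

In the toy example, three masked voxels of 0.05 ml each give a volume of about 0.15 ml, and the median of the masked ADC values (700, 900, 800) is 800.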

Main results: The average Dice score between manual and automated delineations was 0.6 for lesions within the pelvis and spine, with an average surface distance of 2 mm. Relative differences for log-transformed TDV (log-TDV) and median gADC were 8.8% and 5%, respectively. Repeatability analysis showed coefficients of variation of 4.6% for log-TDV and 3.5% for median gADC, with intraclass correlation coefficients of 0.94 or higher. The software achieved 80.5% accuracy, 84.3% sensitivity, and 85.7% specificity in assessing response to treatment. Average computation time was 90 s per scan.
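For readers less familiar with the agreement measures reported above, the sketch below shows the standard textbook definitions of the Dice score and of accuracy, sensitivity, and specificity. The function names, the flat 0/1 mask layout, and the toy numbers are hypothetical; they do not reproduce the study data or the authors' evaluation code.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks agree perfectly by convention (an assumption made here).
    return 2.0 * overlap / total if total else 1.0


def response_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy masks: 3 voxels marked in each, 2 of them shared.
manual = [1, 1, 1, 0, 0]
automated = [0, 1, 1, 1, 0]
dice = dice_score(manual, automated)

# Toy confusion counts, unrelated to the reported results.
metrics = response_metrics(tp=8, fp=1, tn=9, fn=2)
```

With the toy masks, the overlap is 2 voxels out of 3 + 3 marked, so the Dice score is 2 × 2 / 6 ≈ 0.67. The toy counts give an accuracy of 0.85, a sensitivity of 0.8, and a specificity of 0.9.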

Significance: Our software enables reproducible TDV and gADC quantification from WB-DWI scans for monitoring metastatic bone disease response, thus providing potentially useful measurements for clinical decision-making in APC patients.