Quality over quantity: biopsy-anchored CT radiogenomics models outperform all-lesion training in a multi-tumour cohort despite a smaller sample size.
Authors
Affiliations (19)
- Department of Radiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.
- GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands.
- Department of Radiology, University Hospital Tuebingen, Tuebingen, Germany.
- Department of Radiology, American Hospital Tbilisi, Tbilisi, Georgia.
- Department of Radiology, Royal Marsden Hospital, London, UK.
- Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK.
- Radiology Unit, Sant'Andrea Hospital, Sapienza University of Rome, Rome, Italy.
- Department of Radiology, Stanford University, Palo Alto, CA, USA.
- Department of Innovative Technologies in Medicine & Dentistry, G. d'Annunzio University of Chieti-Pescara, Chieti, Italy.
- Institute for Advanced Biomedical Technologies, G. d'Annunzio University of Chieti-Pescara, Chieti, Italy.
- Feinberg School of Medicine, Northwestern University, NMH/Arkes Family Pavilion Suite 800, 676 N Saint Clair, Chicago, IL, 60611, USA.
- Clinic of Radiology, Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale (EOC), 6900, Lugano, Switzerland.
- Department of Nuclear Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
- Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland.
- GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands.
- The Netherlands Cancer Institute, Amsterdam, The Netherlands.
- Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark.
- Maastricht Radiation Oncology, Maastricht, The Netherlands.
Abstract
Radiogenomics aims to predict tumour genotypes non-invasively from imaging, but most studies assume molecular homogeneity by assigning a single biopsy-derived label to all lesions within a patient. Given well-documented interlesional heterogeneity, this approach risks substantial label noise. We investigated whether anchoring training to biopsy-confirmed lesions improves radiogenomic model performance and generalisability. We retrospectively analysed 1646 patients (11473 segmented lesions) with contrast-enhanced CT and EGFR mutation status from next-generation sequencing at the Netherlands Cancer Institute, alongside an external NSCLC radiogenomics cohort (n = 158). All visible lesions were segmented, and the exact biopsy site was matched to its segmentation. Radiomic features were extracted, and machine learning models were trained with three lesion selection strategies: all lesions, non-biopsied lesions only, and biopsy-confirmed lesions only. To disentangle label quality from sample size, we created size-matched variants (one lesion per patient) of the all-lesion and non-biopsied strategies. All models achieved significant discrimination of EGFR status on internal validation (AUC = 0.62-0.68). However, the performance of the all-lesion and non-biopsied models declined on external validation (AUC = 0.55-0.63), whereas the biopsy-anchored model remained stable (AUC = 0.62) despite having only one-tenth of the training sample size. When training sets were size-matched, the biopsy-anchored approach significantly outperformed a model trained on all available lesions on external validation (p = 0.037). Radiogenomic models trained on biopsy-confirmed lesions outperform conventional all-lesion strategies in external validation, despite using an order of magnitude fewer samples. Prioritising lesion-level label fidelity can mitigate heterogeneity-driven noise, enhancing the robustness and clinical translation of imaging-based genomic prediction.
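The three lesion selection strategies and their size-matched variants can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Lesion` record, the `select_lesions` function, the strategy names, and the toy cohort are all hypothetical, and drawing the single lesion per patient at random is an assumption, since the abstract does not state how that lesion was chosen.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesion:
    """Hypothetical lesion record; field names are illustrative only."""
    patient_id: str
    lesion_id: str
    is_biopsied: bool   # True if this segmentation was matched to the biopsy site
    egfr_mutant: bool   # patient-level label, copied to every lesion of the patient


STRATEGIES = ("all", "non_biopsied", "biopsy_only")


def select_lesions(lesions, strategy, size_matched=False, seed=0):
    """Return the training lesions for one selection strategy.

    With size_matched=True, keep one lesion per patient. The lesion is
    drawn at random here, which is an assumption of this sketch.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")

    if strategy == "all":
        pool = list(lesions)
    elif strategy == "non_biopsied":
        pool = [les for les in lesions if not les.is_biopsied]
    else:  # "biopsy_only"
        pool = [les for les in lesions if les.is_biopsied]

    if not size_matched:
        return pool

    # Group the remaining lesions by patient, then sample one per patient.
    by_patient = defaultdict(list)
    for les in pool:
        by_patient[les.patient_id].append(les)
    rng = random.Random(seed)
    return [rng.choice(group) for _, group in sorted(by_patient.items())]


# Toy cohort: two patients, one biopsied lesion each.
cohort = [
    Lesion("P1", "P1-a", True, True),
    Lesion("P1", "P1-b", False, True),
    Lesion("P1", "P1-c", False, True),
    Lesion("P2", "P2-a", True, False),
    Lesion("P2", "P2-b", False, False),
]

all_lesions = select_lesions(cohort, "all")
biopsy_only = select_lesions(cohort, "biopsy_only")
all_matched = select_lesions(cohort, "all", size_matched=True)
```

In this toy cohort, the size-matched all-lesion set contains the same number of lesions as the biopsy-only set (one per patient), so any difference between models trained on the two sets reflects which lesion carries the label rather than how many lesions were used.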
Question: Does assigning biopsy-derived molecular labels to all lesions introduce heterogeneity-driven label noise that reduces the generalisability of radiogenomic models?
Findings: Models trained exclusively on biopsy-confirmed lesions demonstrated superior external generalisability compared with all-lesion approaches, despite being trained on substantially fewer samples.
Clinical relevance: Biopsy-anchored radiogenomics improves the reliability of non-invasive mutation prediction by accounting for tumour heterogeneity, potentially supporting clinical decision-making when tissue sampling is limited or molecular results are discordant across lesions.