A computationally frugal, open-source chest CT foundation model for thoracic disease detection in lung cancer screening programmes.
Authors
Affiliations (11)
- Hawkes Institute, University College London, London, UK.
- Department of Computer Science, University College London, London, UK.
- Institute of Health Informatics, University College London, London, UK.
- Hawkes Institute, University College London, London, UK.
- Department of Computer Science, University College London, London, UK.
- Institute of Health Informatics, University College London, London, UK.
- Department of Respiratory Medicine, University College London, London, UK.
- Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK.
- Department of Medical Physics and Biomedical Engineering, University College London, London, UK.
- Hawkes Institute, University College London, London, UK.
- Department of Respiratory Medicine, University College London, London, UK.
Abstract
Low-dose computed tomography (LDCT) employed in lung cancer screening (LCS) programmes is increasing in uptake worldwide. LCS programmes herald a generational opportunity to simultaneously detect cancer and non-cancer-related early-stage lung disease, yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for volumetric LDCT analysis. Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. The model is pretrained using self-supervised learning on more than 98,000 thoracic LDCT scans, including the United Kingdom's largest LCS initiative to date and 27 public datasets. By extending a masked autoencoder framework to three-dimensional imaging, TANGERINE provides a scalable solution for LDCT analysis, combining architectural simplicity, public availability, and modest computational requirements. TANGERINE demonstrates superior computational and data efficiency in a retrospective multi-dataset analysis: it converges rapidly during fine-tuning, requiring significantly fewer graphics processing unit hours than models trained from scratch, and achieves comparable or superior performance using only a fraction of the fine-tuning data. The model achieves strong performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, and generalises robustly across diverse clinical centres. TANGERINE's accessible, open-source, lightweight design lays the foundation for rapid integration into next-generation medical imaging tools, enabling lung cancer screening programmes to pivot from a singular focus on lung cancer detection toward comprehensive respiratory disease management in high-risk populations.
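The abstract describes extending a masked autoencoder to three-dimensional imaging but gives no implementation detail. As a rough illustration of the general idea only, the NumPy sketch below splits a volume into non-overlapping cubic patches, hides a random subset, and scores reconstruction on the hidden patches alone. The patch size (16), masking ratio (0.75), volume shape, and all function names are assumptions chosen for illustration; they are not taken from TANGERINE, and no encoder or decoder network is included.

```python
import numpy as np


def patchify_3d(volume, patch):
    """Split a (D, H, W) volume into flattened, non-overlapping cubic patches."""
    d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    gd, gh, gw = d // patch, h // patch, w // patch
    x = volume.reshape(gd, patch, gh, patch, gw, patch)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # -> (gd, gh, gw, patch, patch, patch)
    return x.reshape(gd * gh * gw, patch ** 3)


def random_mask(num_patches, mask_ratio, rng):
    """Return sorted indices of visible patches and of masked patches."""
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))
    perm = rng.permutation(num_patches)
    return np.sort(perm[:num_keep]), np.sort(perm[num_keep:])


def masked_mse(pred, target, masked_idx):
    """Mean squared error computed over the masked patches only."""
    return float(np.mean((pred[masked_idx] - target[masked_idx]) ** 2))


# Hypothetical stand-in for a preprocessed LDCT volume (random values, assumed shape).
rng = np.random.default_rng(0)
volume = rng.standard_normal((32, 64, 64)).astype(np.float32)

patches = patchify_3d(volume, patch=16)          # assumed patch size
visible_idx, masked_idx = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# In a masked autoencoder, only the visible patches would be passed to the encoder;
# a decoder would then predict the masked patches, scored with masked_mse.
encoder_input = patches[visible_idx]
```

In this kind of scheme the encoder processes only the visible fraction of patches, which is one plausible source of the modest compute requirements the abstract emphasises, though the abstract itself does not state the mechanism.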