Harnessing Native-Resolution 2D Embeddings for Lung Cancer Classification: A Feasibility Study with the RAD-DINO Self-supervised Foundation Model.
Authors
Affiliations (3)
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA. [email protected].
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA.
- Department of Neuroscience, University of Arkansas for Medical Sciences, Little Rock, AR, USA.
Abstract
Low-dose CT (LDCT) screening reduces lung cancer mortality but produces many false-positive findings, motivating practical AI that aligns with slice-based clinical review. We evaluate a label-efficient 2D pipeline that uses frozen RAD-DINO embeddings from native-resolution axial slices and a lightweight multilayer perceptron (MLP) for patient-level risk estimation via mean aggregation and isotonic calibration. Using the CT arm of the National Lung Screening Trial (NLST) with outcomes defined over a 0-24-month window, we construct a fixed patient-level split (one CT per patient; no cross-split leakage) and perform 25 repeated imbalanced test draws (~6% prevalence) to approximate screening conditions. At screening prevalence, RAD-DINO + MLP achieves PR-AUC = 0.705 (calibrated; raw 0.554) and ROC-AUC = 0.817 (raw; calibrated 0.736), with improved probability reliability following calibration; operating points are selected on the validation set and reported on the test set. For secondary ablations only, a near-balanced cohort (N = 1984) yields accuracy 0.966, precision 0.974, recall 0.973, F1 0.973, and ROC-AUC 0.912. Beyond classification, retrieval with triplet-fine-tuned embeddings attains Precision@5 = 0.853. Interpretability analyses show that cancer cases sustain higher top-k slice scores and that directional SHAP attributions concentrate on a small subset of high-probability slices; label-colored t-SNE provides qualitative views of embedding structure. Limitations include single-cohort evaluation, the lack of Lung-RADS labels in public NLST data, and a pretraining shift from chest X-ray (CXR) to CT; future work will pursue external validation and CT-native self-supervised continuation. Overall, frozen 2D foundation embeddings provide a strong, transparent, and computationally practical starting point for LDCT screening workflows under realistic prevalence.
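
The patient-level aggregation and the repeated imbalanced test draws described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (aggregate_patient_scores, draw_at_prevalence), the toy patient pool, and the draw size of 100 are hypothetical, and it assumes that mean aggregation is applied to per-slice probabilities, which the abstract does not state. Isotonic calibration is omitted here; it would be fitted on validation scores before evaluation.

```python
import random
from collections import defaultdict


def aggregate_patient_scores(slice_scores):
    """Mean-aggregate per-slice probabilities into one score per patient.

    slice_scores: iterable of (patient_id, probability) pairs.
    Assumption: aggregation is over slice-level probabilities.
    """
    by_patient = defaultdict(list)
    for patient_id, prob in slice_scores:
        by_patient[patient_id].append(prob)
    return {pid: sum(ps) / len(ps) for pid, ps in by_patient.items()}


def draw_at_prevalence(labels, prevalence, n_total, rng):
    """Sample n_total patient ids without replacement at a target prevalence.

    labels: dict mapping patient_id -> 0 (no cancer) or 1 (cancer).
    rng: a random.Random instance, so each draw is reproducible.
    """
    positives = sorted(pid for pid, y in labels.items() if y == 1)
    negatives = sorted(pid for pid, y in labels.items() if y == 0)
    n_pos = max(1, round(prevalence * n_total))
    n_neg = n_total - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("test pool too small for the requested draw")
    return rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)


# Toy test pool (hypothetical): 20 cancer and 300 non-cancer patients.
labels = {f"p{i:03d}": int(i < 20) for i in range(320)}

# 25 repeated draws at roughly 6% prevalence, mirroring the protocol above.
rng = random.Random(0)
draws = [draw_at_prevalence(labels, 0.06, 100, rng) for _ in range(25)]
observed = sum(labels[pid] for pid in draws[0]) / len(draws[0])
print(f"prevalence in first draw: {observed:.2f}")
```

Metrics such as PR-AUC would then be computed on each draw and summarized across the 25 repetitions. Because PR-AUC depends on class prevalence, fixing the positive fraction per draw keeps the reported values comparable to screening conditions.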