Reliability Assessment Framework Based on Feature Separability for Pathological Cell Image Classification under Prior Bias
Authors
Abstract
\textbf{Background and objective:} Prior probability shift between training and deployment datasets challenges deep learning--based medical image classification. Standard correction methods reweight posterior probabilities to compensate for prior bias, yet their benefit is inconsistent. We developed a reliability assessment framework that identifies when prior correction helps or harms performance in pathological cell image analysis. \textbf{Methods:} We analyzed 303 colorectal cancer specimens with CD103/CD8 immunostaining, yielding 185{,}432 annotated cell images across 16 cell types. ResNet models were trained under varying bias ratios (1.1--20$\times$). Feature separability was quantified using cosine similarity--based likelihood quality scores, which reflect intra- versus inter-class distinctions in the learned feature space. Multiple linear regression, ANOVA, and generalized additive models (GAMs) were used to evaluate associations among feature separability, prior bias, sample adequacy, and F1 performance. \textbf{Results:} Feature separability dominated performance ($\beta = 1.650$, $p < 0.001$), with a regression coefficient approximately 412-fold larger than that of prior bias ($\beta = 0.004$, $p = 0.018$). GAM analysis showed strong predictive power ($R^2 = 0.876$) with mostly linear trends. A quality threshold of 0.294 identified cases requiring correction with modest discrimination (AUC = 0.610). Cell types with quality scores $>0.5$ were robust without correction, whereas those with scores $<0.3$ consistently required adjustment. \textbf{Conclusion:} Feature extraction quality, not bias magnitude, governs the benefit of correction. The proposed framework provides quantitative guidance for selective correction, supporting efficient deployment of reliable diagnostic AI.
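
For orientation, the posterior reweighting applied by standard prior correction can be sketched in its generic textbook form. The notation here is introduced for illustration only and is not taken from the study: $p_{\mathrm{t}}(y = k \mid x)$ is the posterior produced by the trained classifier, $\pi_{\mathrm{t}}(k)$ and $\pi_{\mathrm{d}}(k)$ are the training and deployment priors of class $k$, and $K$ is the number of classes (16 cell types here). The form assumes that the class-conditional distribution $p(x \mid y)$ is unchanged between training and deployment.
\begin{equation}
\hat{p}_{\mathrm{d}}(y = k \mid x) =
\frac{\dfrac{\pi_{\mathrm{d}}(k)}{\pi_{\mathrm{t}}(k)} \, p_{\mathrm{t}}(y = k \mid x)}
     {\displaystyle\sum_{j=1}^{K} \dfrac{\pi_{\mathrm{d}}(j)}{\pi_{\mathrm{t}}(j)} \, p_{\mathrm{t}}(y = j \mid x)}
\end{equation}
The reported fold difference between the two regression coefficients follows directly from their ratio:
\begin{equation}
\frac{\beta_{\text{separability}}}{\beta_{\text{prior bias}}} = \frac{1.650}{0.004} = 412.5 \approx 412 .
\end{equation}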