Large-sample PCA eigenvectors stabilize cortical thickness components and improve small sample brain behavior prediction.

May 25, 2026


DOI: 10.1038/s41598-026-52800-4 PMID: 42185407

Authors

Feng ZY, Hosokawa K, Hosoda C

Affiliations (5)

Graduate School of Information Sciences, Tohoku University, 6-3-09 Aoba, Aramaki-Aza Aoba-Ku, Sendai, 980-8579, Japan.
Institute of Development, Aging and Cancer, Tohoku University, 6-3-09 Aoba, Aramaki-Aza Aoba-Ku, Sendai, 980-8579, Japan.
Graduate School of Information Sciences, Tohoku University, 6-3-09 Aoba, Aramaki-Aza Aoba-Ku, Sendai, 980-8579, Japan.
Institute of Development, Aging and Cancer, Tohoku University, 6-3-09 Aoba, Aramaki-Aza Aoba-Ku, Sendai, 980-8579, Japan.
Graduate School of System Design and Management, Keio University, Yokohama, Japan.

Abstract

Reproducible brain-wide association studies remain challenging in structural MRI, in part because high-dimensional cortical measures yield unstable eigenspaces in small samples. Here, using cortical thickness data from the Human Connectome Project Young Adult cohort (N = 1,113), we examined how sample size influences the stability of principal component analysis (PCA) and whether eigenvectors derived from larger samples can improve brain-behavior prediction in independent small samples. PCA stability was quantified across overlapping and non-overlapping resampling schemes using cosine similarity and one-to-one Hungarian matching of components. Stability increased systematically with sample size: subsamples of fewer than 100 participants yielded few stable components, whereas larger subsamples yielded dozens. Thus, reproducibility hinges not only on statistical power but also on the stability of the representational basis: components learned in small subsamples are fragile, whereas eigenvectors from larger samples converge to stable, transferable axes. We then compared three prediction settings (500 vs. 500, 500 vs. 100, and 100 vs. 100) across 65 cognitive and personality traits using linear regression and machine-learning models. Transferring eigenvectors derived from larger samples to independent, smaller samples consistently improved prediction relative to deriving PCA components within the same small sample, although absolute effect sizes remained modest. Prediction performance peaked at an intermediate dimensionality of approximately 30 principal components, indicating that retaining more components does not necessarily improve generalization. These findings identify PCA eigenspace stability as a key determinant of reproducible brain-behavior inference and suggest that reusing larger-sample PCA eigenvectors is a practical strategy for stabilizing feature extraction in resource-limited neuroimaging studies.
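
The stability measure and the transfer setting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: PCA is computed by SVD of mean-centered data; components from two subsamples are matched one-to-one on absolute cosine similarity (a principal axis and its negation are the same component) using SciPy's `linear_sum_assignment`, which solves the same assignment problem as the Hungarian algorithm; a component counts as "stable" above a hypothetical |cos| threshold of 0.9; and prediction uses ordinary least squares on component scores. The function names, the threshold, the split sizes, and the synthetic data are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pca_components(X, k):
    """Top-k principal axes (k x p) of X (n x p), from the SVD of centered data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are unit-norm principal axes, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]


def matched_similarity(A, B):
    """Match rows of A to rows of B one-to-one, maximizing total |cosine|.

    Absolute cosine is used because PCA axes are sign-ambiguous.
    Returns the matched similarities in the row order of A.
    """
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    sim = np.abs(A @ B.T)
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return sim[rows, cols]


def split_half_stability(X, n_sub, k, threshold=0.9, rng=None):
    """Number of components with matched |cosine| >= threshold between two
    disjoint (non-overlapping) random subsamples of n_sub participants each.
    The 0.9 threshold is a hypothetical choice, not taken from the paper."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(X.shape[0])
    a, b = idx[:n_sub], idx[n_sub:2 * n_sub]
    sims = matched_similarity(pca_components(X[a], k), pca_components(X[b], k))
    return int(np.sum(sims >= threshold))


def transfer_r2(V, X_train, y_train, X_test, y_test):
    """Out-of-sample R^2 of OLS on component scores from fixed axes V (k x p).

    V may come from a larger, separate sample (transfer) or from X_train
    itself (within-sample PCA). Centering uses the training mean (an assumption).
    """
    mu = X_train.mean(axis=0)
    D_train = np.column_stack([np.ones(len(X_train)), (X_train - mu) @ V.T])
    D_test = np.column_stack([np.ones(len(X_test)), (X_test - mu) @ V.T])
    beta, *_ = np.linalg.lstsq(D_train, y_train, rcond=None)
    resid = y_test - D_test @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y_test - y_test.mean()) ** 2)


if __name__ == "__main__":
    # Synthetic stand-in for a cortical thickness matrix (NOT HCP data):
    # 1,113 "participants" x 68 "regions", 10 latent axes plus unit noise.
    rng = np.random.default_rng(0)
    n, p, r = 1113, 68, 10
    latent = rng.normal(size=(n, r)) * np.linspace(3.0, 1.0, r)
    X = latent @ rng.normal(size=(r, p)) + rng.normal(size=(n, p))

    # Stability vs. subsample size, using non-overlapping subsamples.
    for n_sub in (50, 100, 500):
        print(n_sub, split_half_stability(X, n_sub, k=10, rng=1))

    # Toy trait; axes from 500 participants vs. axes from the training set of a
    # disjoint 100-participant sample split 50/50 (sizes are illustrative).
    y = latent[:, 0] + rng.normal(scale=2.0, size=n)
    idx = rng.permutation(n)
    big, train, test = idx[:500], idx[500:550], idx[550:600]
    V_big = pca_components(X[big], 30)
    V_small = pca_components(X[train], 30)
    print("transfer R^2:", transfer_r2(V_big, X[train], y[train], X[test], y[test]))
    print("within-sample R^2:", transfer_r2(V_small, X[train], y[train], X[test], y[test]))
```

Swapping `V_big` for `V_small` is the only difference between the two prediction calls, which isolates where the eigenvectors were estimated; the printed values depend entirely on the synthetic data and say nothing about the study's results.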


Topics

Journal Article
