Vision transformer embeddings and quantum pyramidal circuits for biomedical image analysis.

May 25, 2026

Authors

F Aragones X, González Ballester MA

Affiliations (5)

  • BCN Medtech, Department of Engineering, Universitat Pompeu Fabra, Barcelona, Spain.
  • Parc Tecnològic TecnoCampus Mataró-Maresme, Universitat Pompeu Fabra, Mataró, Spain.
  • BCN Medtech, Department of Engineering, Universitat Pompeu Fabra, Barcelona, Spain.
  • Quantic, Barcelona Supercomputing Center, Barcelona, Spain.
  • ICREA, Barcelona, Spain.

Abstract

This work presents a novel quantum-hybrid pipeline for lung nodule classification in computed tomography (CT) scans, combining vision transformer (ViT) embeddings with quantum orthogonal pyramidal circuits (QOPCs). The approach was evaluated on 681 lung nodule CT scans across axial, coronal, and sagittal planes. Two ViT configurations were tested: ViT₁ (1 head, 4 layers) and a Bayesian-optimized ViT₂ (4 heads, 8 layers). Features extracted from ViT embedding layers were reduced via principal component analysis to 2-16 dimensions and classified using the QOPC with reconfigurable beam splitter (RBS) gates. The proposed approach achieved unprecedented compression, up to 1,470× (from 10,290 to 7 parameters), while preserving over 99% of baseline accuracy. The approach reached 83.7% accuracy (ViT₂, k = 8, h = 8) with only 46 trainable parameters and achieved computational efficiency (CE) up to 92.0. Training was accelerated up to 28× (0.030 vs. 0.833 min) while maintaining robust diagnostic performance (F1: 0.77-0.82, receiver operating characteristic area under the curve (ROC-AUC): 0.87-0.90). Ablation studies confirmed that the quantum layer outperforms conventional MLPs by +3.4% accuracy with 35% fewer parameters, while late fusion of multi-view predictions further improved performance to 85.4% accuracy and 0.92 ROC-AUC. These results establish hybrid ViT-QOPC architectures as a practical and resource-efficient framework for medical image analysis, demonstrating their ability to dramatically reduce computational cost without compromising clinical accuracy.
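
The abstract does not give implementation details, so the following is only a minimal classical sketch of the pipeline's shape: PCA reduction of embeddings followed by an orthogonal layer built from RBS-style rotations. It assumes that, in the unary basis, each RBS gate acts as a 2×2 Givens rotation on two adjacent wires, and that a full pyramid over n wires uses n(n-1)/2 angles. The gate ordering, the sign convention, the unit-norm encoding step, the function names, and the random "embeddings" are all illustrative assumptions, and the sketch does not attempt to reproduce the 7- or 46-parameter configurations reported above.

```python
import numpy as np

def pca_reduce(x, k):
    """Project rows of x onto their top-k principal components (via SVD)."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T

def rbs(n, i, theta):
    """n x n Givens rotation on wires i and i+1 (assumed unary-basis action of an RBS gate)."""
    g = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    g[i, i], g[i, i + 1] = c, s
    g[i + 1, i], g[i + 1, i + 1] = -s, c
    return g

def pyramid_pairs(n):
    """First-wire index of each gate in a triangular layout with n(n-1)/2 gates (assumed ordering)."""
    return [i for layer in range(n - 1) for i in range(n - 1 - layer)]

def pyramid_matrix(thetas, n):
    """Compose the rotations into the n x n orthogonal matrix the layer applies."""
    w = np.eye(n)
    for theta, i in zip(thetas, pyramid_pairs(n)):
        w = rbs(n, i, theta) @ w
    return w

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(32, 64))   # placeholder data, NOT real ViT embeddings
k = 8                                    # PCA dimension within the 2-16 range in the abstract

z = pca_reduce(embeddings, k)
z_unit = z / np.linalg.norm(z, axis=1, keepdims=True)  # assumed unit-norm (amplitude-style) encoding

thetas = rng.uniform(-np.pi, np.pi, size=k * (k - 1) // 2)  # one angle per gate
W = pyramid_matrix(thetas, k)
out = z_unit @ W.T                       # orthogonal layer preserves each row's norm

# Ratios quoted in the abstract, recomputed from its own figures.
compression = 10_290 / 7                 # parameters before / after
speedup = 0.833 / 0.030                  # training minutes, baseline / hybrid
print(len(thetas), round(compression), round(speedup))
```

Because every factor is a rotation, `W` is orthogonal for any choice of angles, which is what lets such a layer be trained over a small number of angles rather than a dense weight matrix.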

Topics

Journal Article
