A compact and interpretable multi-source framework for heterogeneous medical image classification.
Authors
Affiliations (7)
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
- Department of Information Systems and Operations Management, Vienna University of Economics and Business, 1020, Vienna, Austria.
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
- School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, China.
- School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
- NOMATEN CoE, National Centre for Nuclear Research, 05-400, Otwock, Poland.
- Scientific Research Center, Baku Engineering University, AZ0101, Baku, Azerbaijan.
Abstract
Deep learning models for medical image analysis often rely on large-scale parameterization, which may limit their practical use in resource-constrained settings. This study aims to design a structurally compact multi-source framework capable of delivering competitive diagnostic performance with reduced computational overhead. We propose ML-ConvNet, a lightweight architecture comprising approximately 4.2K parameters and 924M FLOPs at 512×512 input resolution. The network incorporates Multi-Branch Re-parameterized Convolutions for scale-aware feature extraction, Hierarchical Dual-Path Attention for feature localization, Feature Self-Transformation for cross-feature interaction, and a Local Variance Weighted optimization strategy to address class imbalance. The framework is evaluated independently on three publicly available benchmark datasets representing heterogeneous imaging modalities: brain MRI, lung CT, and chest X-ray. Ablation studies, precision-recall analysis, cross-modality validation, and computational benchmarking are conducted to assess performance, stability, and efficiency under controlled experimental conditions. Within the evaluated settings, results indicate competitive diagnostic accuracy relative to established lightweight baselines, including EfficientNet and MobileNet variants, while substantially reducing parameter count. Class-wise F1-scores and PR-AUC values suggest relatively stable minority-class performance under repeated cross-validation sampling. Attention visualizations show activations concentrated over regions broadly associated with pathological findings, though these observations are qualitative in nature. Inference latency measurements on CPU and mobile hardware suggest feasibility for low-latency deployment under the tested single-image batch configurations, though real-world throughput may differ depending on hardware and operational conditions.
These findings suggest that careful architectural design and domain-informed inductive biases may support competitive medical image classification on public benchmark datasets without extensive parameter scaling. The framework was evaluated exclusively under controlled conditions on publicly available data, and multi-institutional external validation is required before conclusions regarding generalizability or clinical applicability can be drawn.