Multimodal deep learning for enhanced breast cancer diagnosis on sonography.
Authors
Affiliations (3)
- Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, USA.
- Santa Clara Valley Medical Center, 751 S. Bascom Ave, San Jose, CA 95128, USA.
- Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, USA. Electronic address: [email protected].
Abstract
This study introduces a novel multimodal deep learning model tailored for the differentiation of benign and malignant breast masses using dual-view breast ultrasound images (radial and anti-radial views) in conjunction with corresponding radiology reports. The proposed multimodal model architecture includes specialized image and text encoders for independent feature extraction, along with a transformation layer to align the multimodal features for the subsequent classification task. The model achieved an area under the curve (AUC) of 85% and outperformed unimodal models by 6% and 8% in the Youden index. Additionally, our multimodal model surpassed zero-shot predictions generated by prominent foundation models such as CLIP and MedCLIP. In direct comparison with classification results based on physician-assessed ratings, our model exhibited clear superiority, highlighting its practical significance in diagnostics. By integrating both image and text modalities, this study exemplifies the potential of multimodal deep learning in enhancing diagnostic performance, laying the foundation for developing robust and transparent AI-assisted solutions.
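To illustrate the architecture described in the abstract (independent image and text encoders, a transformation layer aligning the two modalities, and a classification head), the following is a minimal PyTorch sketch. The encoder backbones, feature dimensions, fusion scheme, and input preprocessing are assumptions for illustration only; the abstract does not specify them, and this is not the authors' implementation.

```python
import torch
import torch.nn as nn


class DualViewMultimodalClassifier(nn.Module):
    """Hypothetical sketch: dual-view ultrasound image encoder plus report text
    encoder, with projection (transformation) layers aligning both modalities
    before benign/malignant classification."""

    def __init__(self, image_dim=512, text_dim=768, shared_dim=256, num_classes=2):
        super().__init__()
        # Placeholder image encoder; the paper's actual backbone is not given in the abstract.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, image_dim),
        )
        # Placeholder text encoder: assumes the radiology report arrives as a
        # precomputed embedding (e.g., from a transformer language model).
        self.text_encoder = nn.Sequential(nn.Linear(text_dim, text_dim), nn.ReLU())
        # Transformation layers projecting each modality into a shared space.
        self.image_proj = nn.Linear(2 * image_dim, shared_dim)  # radial + anti-radial views
        self.text_proj = nn.Linear(text_dim, shared_dim)
        # Classification head on the fused, aligned features.
        self.classifier = nn.Linear(2 * shared_dim, num_classes)

    def forward(self, radial_img, antiradial_img, report_emb):
        # Encode each ultrasound view independently, then concatenate.
        img_feat = torch.cat(
            [self.image_encoder(radial_img), self.image_encoder(antiradial_img)], dim=-1
        )
        txt_feat = self.text_encoder(report_emb)
        # Align both modalities in the shared space and fuse for classification.
        fused = torch.cat([self.image_proj(img_feat), self.text_proj(txt_feat)], dim=-1)
        return self.classifier(fused)


# Toy usage: batch of 4 grayscale 224x224 view pairs with 768-d report embeddings.
model = DualViewMultimodalClassifier()
logits = model(torch.randn(4, 1, 224, 224), torch.randn(4, 1, 224, 224), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 2])
```

The late-fusion design (concatenating projected features before a linear classifier) is one common way to realize the alignment-then-classification pipeline the abstract describes; the published model may use a different fusion or alignment mechanism.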