Toward Non-Invasive Voice Restoration: A Deep Learning Approach Using Real-Time MRI
Authors
Affiliations (1)
- University of Balamand
Abstract
Despite recent advances in brain-computer interfaces (BCIs) for speech restoration, existing systems remain invasive, costly, and inaccessible to individuals with congenital mutism or neurodegenerative disease. We present a proof-of-concept pipeline that synthesizes personalized speech directly from real-time magnetic resonance imaging (rtMRI) of the vocal tract, without requiring any acoustic input. Segmented rtMRI frames are mapped to articulatory class representations using a Pix2Pix conditional GAN; these representations are then transformed into synthetic audio waveforms by a convolutional neural network that models the articulatory-to-acoustic relationship. The outputs are rendered into audible form and evaluated with speaker-similarity metrics derived from Resemblyzer embeddings. While preliminary, our results suggest that silent articulatory motion alone encodes enough information to approximate a speaker's vocal characteristics, offering a non-invasive direction for future speech restoration in individuals who have lost their voice or never developed one.
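The abstract names Resemblyzer as the basis for the speaker-similarity metric but does not show how the score is computed. Below is a minimal sketch of one plausible evaluation step, assuming the pipeline's synthesized output has been written to disk; the WAV file paths are hypothetical placeholders. Resemblyzer's embeddings are L2-normalized, so cosine similarity reduces to a dot product.

```python
# Minimal sketch: speaker similarity between a reference recording and
# a synthesized utterance, using Resemblyzer's pretrained speaker encoder.
# The file names below are illustrative placeholders, not from the paper.
from pathlib import Path

import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()  # pretrained d-vector speaker encoder

# Load and preprocess the reference (real) and synthesized utterances.
ref_wav = preprocess_wav(Path("reference_speaker.wav"))
syn_wav = preprocess_wav(Path("synthesized_from_rtmri.wav"))

# embed_utterance returns an L2-normalized 256-dim embedding, so the
# dot product of two embeddings equals their cosine similarity.
ref_embed = encoder.embed_utterance(ref_wav)
syn_embed = encoder.embed_utterance(syn_wav)

similarity = float(np.dot(ref_embed, syn_embed))
print(f"Speaker similarity (cosine): {similarity:.3f}")
```

A score near 1.0 indicates the synthesized waveform closely matches the reference speaker's voice characteristics; how such scores aggregate over a test set would depend on the paper's full evaluation protocol.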