A 23-µJ-per-frame All-on-Chip TinyML U-Net Processor for Real-Time Autonomous Image Segmentation in Miniaturized Ultrasound Devices.
Authors
Abstract
Autonomous medical image segmentation enables critical applications, including urinary retention monitoring, prenatal fetal biometry, neuromodulation, and cardiovascular monitoring. Its deployment in wearable ultrasound patches demands on-device processing to preserve patient privacy and enable operation beyond clinical facilities. U-Net achieves state-of-the-art performance for biomedical segmentation, and recent binarized U-Nets retain high clinical accuracy at dramatically reduced computational cost. However, existing binary neural network (BNN) accelerators cannot support medical-grade segmentation due to missing accuracy-enhancing features, poor hardware utilization for compute-optimal layers, and memory bottlenecks requiring costly external DRAM. This work presents a 0.81 mm<sup>2</sup> fully-integrated U-Net processor in 28 nm featuring: 1) mixed-precision datapaths combining binary convolution with 4-bit skip connections for clinical accuracy; 2) systematic design space exploration across 9,390 configurations optimizing energy-latency tradeoffs; 3) interleaved memory representation and halo reuse for energy-efficient battery-powered operation; and 4) hardware-supported layer fusion and lossless compression eliminating external memory while reducing peak on-chip memory usage by 3.16× and 1.38×, respectively. Validated on bladder and fetal head segmentation datasets, the processor achieves 13.4 frames per second (fps) at 23 µJ per frame, enabling real-time autonomous monitoring in wearable medical devices.