
A 23-µJ-per-frame All-on-Chip TinyML U-Net Processor for Real-Time Autonomous Image Segmentation in Miniaturized Ultrasound Devices.

March 23, 2026

Authors

Song Z, Guler U, Chandrakasan A

Abstract

Autonomous medical image segmentation enables critical applications, including urinary retention monitoring, prenatal fetal biometry, neuromodulation, and cardiovascular monitoring. Its deployment in wearable ultrasound patches demands on-device processing to preserve patient privacy and enable operation beyond clinical facilities. U-Net achieves state-of-the-art performance for biomedical segmentation, and recent binarized U-Nets retain high clinical accuracy at dramatically reduced computational cost. However, existing binary neural network (BNN) accelerators cannot support medical-grade segmentation due to missing accuracy-enhancing features, poor hardware utilization for compute-optimal layers, and memory bottlenecks requiring costly external DRAM. This work presents a 0.81 mm<sup>2</sup> fully integrated U-Net processor in 28 nm featuring: 1) mixed-precision datapaths combining binary convolution with 4-bit skip connections for clinical accuracy; 2) systematic design space exploration across 9,390 configurations optimizing energy-latency tradeoffs; 3) interleaved memory representation and halo reuse for energy-efficient battery-powered operation; and 4) hardware-supported layer fusion and lossless compression eliminating external memory while reducing peak on-chip usage by 3.16× and 1.38×, respectively. Validated on bladder and fetal head segmentation datasets, the processor achieves 13.4 frames per second (fps) at 23 µJ per frame, enabling real-time autonomous monitoring in wearable medical devices.
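The mixed-precision idea in feature 1) can be illustrated in software: the convolution operates on sign-binarized activations and weights (reducing each multiply to sign agreement, implemented as XNOR-popcount in hardware), while the U-Net skip connection is carried at 4-bit precision before being added back in. The sketch below is a minimal 1-D NumPy illustration of that scheme; the function names, the 1-D shape, and the symmetric quantization scale are illustrative assumptions, not the paper's actual datapath.

```python
import numpy as np

def binarize(x):
    # Sign binarization to {-1, +1}, the standard BNN activation/weight format
    return np.where(x >= 0, 1.0, -1.0)

def quantize4(x, scale):
    # Symmetric 4-bit quantization: integers in [-8, 7] times a shared scale
    # (scale choice here is an illustrative assumption)
    return np.clip(np.round(x / scale), -8, 7) * scale

def binary_conv1d(x, w):
    # "Valid" 1-D convolution on binarized operands; each dot product
    # counts sign agreements, which hardware computes as XNOR + popcount
    xb, wb = binarize(x), binarize(w)
    k = len(wb)
    return np.array([np.dot(xb[i:i + k], wb) for i in range(len(xb) - k + 1)])

def fused_block(x, w, skip, skip_scale=0.25):
    # Mixed-precision block: binary convolution plus a 4-bit skip connection,
    # mirroring the abstract's binary-conv + 4-bit-skip datapath at a high level
    y = binary_conv1d(x, w)
    s = quantize4(skip[: len(y)], skip_scale)
    return y + s

x = np.array([0.5, -1.2, 0.3, 0.9, -0.4])
w = np.array([1.0, -0.5, 0.2])
skip = np.array([0.3, -0.1, 0.2])
out = fused_block(x, w, skip)
```

Keeping the skip path at 4 bits rather than 1 is what preserves the fine-grained spatial detail that binarized encoders lose, which the abstract identifies as necessary for clinical accuracy.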

Topics

Journal Article
