Sound of Touch
TL;DR: We treat a continuously excited vibrating string as an active acoustic tactile sensor, and use short two-channel audio windows to infer contact location, contact force, and slippage in real time.

Experiment Video


Abstract

Distributed tactile sensing is hard to scale over large robot surfaces because dense arrays increase wiring, cost, and fragility. Sound of Touch is an active acoustic tactile sensing method that uses continuously excited tensioned strings as sensing elements. Two contact microphones observe spectral changes caused by contact. From short-duration audio, the system estimates contact location and normal force, and detects slip. A physics-grounded string vibration simulator explains how contact position and force shift vibration modes. Experiments show millimeter-scale localization, reliable force estimation, and real-time slip detection.


Method Overview

The sensor uses a steel string, dual EBow-style electromagnetic drivers for continuous excitation, and dual contact microphones. Contact at position x with normal force F changes the string's effective vibrating length and tension, producing structured resonance shifts. A split-string simulator predicts these trends and supports design analysis, while a real-world inference model maps audio to the contact state tuple (contact, location, force, slip).
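The split-string trend can be sketched with the ideal fixed-fixed string formula f_n = n/(2L)·sqrt(T/μ): a pressed contact at x pins the string into two sub-strings of length x and L − x, each with its own harmonic series. This is a minimal illustration, not the paper's simulator; the tension and linear density values below are assumed for demonstration only.

```python
import numpy as np

def segment_modes(length_m, tension_n, mu_kg_per_m, n_modes=5):
    """Modal frequencies f_n = n / (2L) * sqrt(T / mu) of an ideal fixed-fixed string."""
    n = np.arange(1, n_modes + 1)
    return n / (2.0 * length_m) * np.sqrt(tension_n / mu_kg_per_m)

def split_string_modes(contact_x_m, total_len_m=0.10, tension_n=60.0,
                       mu_kg_per_m=4e-4, n_modes=5):
    """A pressed contact at x pins the string, yielding two sub-strings of
    lengths x and L - x; each contributes its own resonance series.
    (Tension and linear density here are illustrative, not measured values.)"""
    left = segment_modes(contact_x_m, tension_n, mu_kg_per_m, n_modes)
    right = segment_modes(total_len_m - contact_x_m, tension_n, mu_kg_per_m, n_modes)
    return left, right
```

Moving the contact toward one end shortens that segment and raises its modes, which is the structured frequency shift the microphones observe.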


Concept design and sensing principle


Contact-induced resonant frequency shifts


Hardware and System

The prototype is built on an aluminum frame with guitar hardware and a 10 cm sensing range. Microphone signals are independently amplified and streamed through a Focusrite Scarlett 4i4 interface at 44.1 kHz using a low-latency JACK setup for real-time inference.
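Real-time inference operates on short windows sliced from the continuous two-channel 44.1 kHz stream. The sketch below frames a stereo buffer into overlapping windows with plain numpy; the window and hop lengths are assumptions for illustration, not the system's actual parameters.

```python
import numpy as np

SAMPLE_RATE = 44_100   # Hz, matching the Scarlett 4i4 capture rate
WINDOW_SEC = 0.2       # short-duration window; exact length is an assumption
HOP_SEC = 0.05         # hop between successive windows (assumed)

def frame_stereo(audio, sr=SAMPLE_RATE, win_s=WINDOW_SEC, hop_s=HOP_SEC):
    """Slice a (n_samples, 2) contact-microphone stream into overlapping
    fixed-length windows of shape (n_windows, win_len, 2)."""
    win, hop = int(sr * win_s), int(sr * hop_s)
    n = 1 + max(0, (len(audio) - win) // hop)
    return np.stack([audio[i * hop : i * hop + win] for i in range(n)])
```

Each window is then converted to a spectral representation before being passed to the inference model.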


Sensor hardware


Audio acquisition pipeline


Learning Pipeline

Audio windows are transformed into spectral inputs and encoded using a frozen Audio Spectrogram Transformer (AST). Lightweight task-specific heads estimate contact and slip (binary classification), and location and force (regression). The model is optimized jointly with masked losses so regression is trained only when contact is present.
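The masked-loss idea can be sketched as follows: binary cross-entropy on the contact head is applied to every sample, while the regression losses are averaged only over samples where contact is present. This is a simplified numpy illustration (equal loss weights assumed, slip head omitted), not the paper's exact objective.

```python
import numpy as np

def masked_joint_loss(contact_logit, loc_pred, force_pred,
                      contact_label, loc_label, force_label):
    """Joint loss: BCE on contact everywhere; MSE on location/force only
    where contact is actually present (mask = contact_label)."""
    p = 1.0 / (1.0 + np.exp(-contact_logit))              # sigmoid
    bce = -(contact_label * np.log(p + 1e-9)
            + (1 - contact_label) * np.log(1 - p + 1e-9)).mean()
    mask = contact_label.astype(float)
    denom = mask.sum() + 1e-9                             # avoid divide-by-zero
    loc_mse = (mask * (loc_pred - loc_label) ** 2).sum() / denom
    force_mse = (mask * (force_pred - force_label) ** 2).sum() / denom
    return bce + loc_mse + force_mse                      # equal weights assumed
```

Masking keeps undefined location/force targets on no-contact windows from polluting the regression gradients.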

Learning architecture

Results

Contact and slip classification reached 100% accuracy on the evaluated test set. Detailed regression metrics are shown below.

Contact Location Estimation
| Object | MAE (mm) | ≤ 5 mm (%) | Pearson r | Condition |
| --- | --- | --- | --- | --- |
| Plastic | 2.7 | 87.8 | 0.994 | Clean |
| Plastic | 3.1 | 83.3 | 0.994 | Noise |
| Wood | 8.8 | 51.4 | 0.863 | Clean |
| Wood | 8.6 | 55.7 | 0.857 | Noise |
| Metal tube | 5.4 | 70.0 | 0.952 | Clean |
| Metal tube | 8.0 | 57.1 | 0.893 | Noise |
| Allen Key | 2.7 | 82.9 | 0.993 | Clean |
| Allen Key | 2.8 | 87.1 | 0.992 | Noise |
Contact Force Estimation
| Object | MAE (N) | ≤ 0.2 N (%) | Pearson r | Condition |
| --- | --- | --- | --- | --- |
| Plastic | 0.141 | 74.4 | 0.909 | Clean |
| Plastic | 0.131 | 75.6 | 0.928 | Noise |
| Wood | 0.111 | 84.3 | 0.903 | Clean |
| Wood | 0.115 | 82.9 | 0.927 | Noise |
| Metal tube | 0.172 | 70.0 | 0.908 | Clean |
| Metal tube | 0.173 | 65.7 | 0.906 | Noise |
| Allen Key | 0.167 | 67.1 | 0.844 | Clean |
| Allen Key | 0.165 | 65.7 | 0.819 | Noise |
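The three reported metrics (MAE, fraction within a tolerance, Pearson r) can be computed from predictions and ground truth as sketched below; `location_metrics` is a hypothetical helper name, not code from the project.

```python
import numpy as np

def location_metrics(pred_mm, true_mm, thresh_mm=5.0):
    """MAE (mm), percentage of errors within thresh_mm, and Pearson r."""
    err = np.abs(pred_mm - true_mm)
    mae = err.mean()
    within = (err <= thresh_mm).mean() * 100.0
    r = np.corrcoef(pred_mm, true_mm)[0, 1]
    return mae, within, r
```

Force metrics are computed identically with a 0.2 N threshold in place of 5 mm.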

Contact

For questions, contact Xili Yi.