RTMA
Real-time polyphonic guitar transcription: a thesis project 14 years in the making, revived in Rust
This project started as my master's thesis in 2012 — real-time polyphonic guitar audio transcription. The original was MATLAB and C++. The idea was simple: listen to a guitar through a microphone and show the player what notes they're hitting, in real time. The hard part is that guitars are polyphonic — multiple strings ring simultaneously, and their harmonics overlap.
I never stopped thinking about it. In 2024 I started rebuilding it in Rust.
The algorithm
The core uses Non-Negative Matrix Factorization (NMF): a spectrogram is approximated as the product of a dictionary of note templates and a matrix of their activations over time. The templates are parametric, built from the Fletcher physics model of guitar string harmonics rather than learned from data, so they need no training and generalize across guitars.
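To make the idea of a parametric template concrete, here is a minimal sketch, not the project's actual code. It assumes the stiff-string inharmonicity relation f_n = n · f0 · sqrt(1 + B · n²) for the partial frequencies. The Gaussian peak shape, the 1/n amplitude decay, the unit-sum normalization, and all function and parameter names are illustrative assumptions.

```rust
/// Frequency of the n-th partial of a stiff string.
/// `b` is the inharmonicity coefficient; b = 0.0 gives exactly harmonic partials.
fn partial_freq(f0: f64, n: u32, b: f64) -> f64 {
    let n = n as f64;
    n * f0 * (1.0 + b * n * n).sqrt()
}

/// Convert a MIDI note number to its fundamental in Hz (MIDI 69 = A4 = 440 Hz).
fn midi_to_hz(midi: u8) -> f64 {
    440.0 * 2f64.powf((midi as f64 - 69.0) / 12.0)
}

/// Build one spectral template over `n_bins` linearly spaced bins of width `bin_hz`.
/// Assumed for illustration: Gaussian peaks of width `width_hz`, 1/n amplitude
/// decay across partials, and normalization so the template sums to 1.
fn note_template(
    f0: f64,
    n_harmonics: u32,
    b: f64,
    n_bins: usize,
    bin_hz: f64,
    width_hz: f64,
) -> Vec<f64> {
    let mut t = vec![0.0; n_bins];
    for n in 1..=n_harmonics {
        let fc = partial_freq(f0, n, b);
        let amp = 1.0 / n as f64;
        for (k, v) in t.iter_mut().enumerate() {
            let d = (k as f64 * bin_hz - fc) / width_hz;
            *v += amp * (-0.5 * d * d).exp();
        }
    }
    let sum: f64 = t.iter().sum();
    if sum > 0.0 {
        for v in t.iter_mut() {
            *v /= sum;
        }
    }
    t
}
```

Calling `note_template(midi_to_hz(40), 16, 1e-4, 512, 10.0, 10.0)` would build a 16-partial template for the low E string. The value `1e-4` for the inharmonicity coefficient is only a placeholder.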
Key improvements over the thesis version:
- KL-divergence NMF instead of Euclidean distance — better for audio spectra
- STFT truncation to guitar-relevant frequencies — less computation, better results
- 16–20 harmonics per template instead of 8 — dramatically improved pitch separation
- Constraint to guitar range (E2–E6) — eliminates false detections outside playable range
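With a fixed dictionary, KL-divergence NMF reduces to estimating the activations for each incoming frame. The sketch below uses the standard multiplicative update for the KL cost. It is an illustration under assumptions, not the project's implementation: the function name, the all-ones initialization, and the `EPS` guard are mine. The range constraint from the list could be expressed by building `w` only for MIDI notes 40 to 88 (E2 to E6).

```rust
/// Estimate activations for one spectrum frame with a fixed dictionary,
/// minimizing the KL divergence between `v` and its reconstruction.
/// `w[k]` is the template for note k and must have the same length as `v`.
fn kl_activations(w: &[Vec<f64>], v: &[f64], iters: usize) -> Vec<f64> {
    const EPS: f64 = 1e-12; // guards against division by zero
    let n_bins = v.len();
    let mut h = vec![1.0; w.len()];
    // Denominator of the update: the sum of each template over frequency.
    let col_sums: Vec<f64> = w.iter().map(|t| t.iter().sum::<f64>() + EPS).collect();
    for _ in 0..iters {
        // Current reconstruction: v_hat[f] = sum over k of w[k][f] * h[k].
        let mut v_hat = vec![EPS; n_bins];
        for (t, &hk) in w.iter().zip(&h) {
            for (f, &wf) in t.iter().enumerate() {
                v_hat[f] += wf * hk;
            }
        }
        // Multiplicative update: h[k] *= sum_f w[k][f] * v[f] / v_hat[f] / sum_f w[k][f].
        for (k, t) in w.iter().enumerate() {
            let num: f64 = t
                .iter()
                .zip(v)
                .zip(&v_hat)
                .map(|((&wf, &vf), &vh)| wf * vf / vh)
                .sum();
            h[k] *= num / col_sums[k];
        }
    }
    h
}
```

Because the update only multiplies by non-negative factors, activations that start positive never go negative, which is what keeps the factorization non-negative without an explicit projection step.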
Neural experiments
The parametric NMF approach gets around 40–50% F1 on GuitarSet. The state of the art (GAPS) hits 86.3%. I've been experimenting with neural approaches to close the gap:
- HCQT features (Harmonic Constant-Q Transform) as input representation
- Causal Conv Mamba and BiMamba architectures for temporal modeling
- Evaluation against GuitarSet using mir_eval metrics
- CREPE (monophonic pitch detection) ported to Rust via candle as a baseline component
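For readers unfamiliar with how an F1 score is obtained for polyphonic output, here is a simplified frame-level sketch. It is not a port of mir_eval, and the post does not say which mir_eval metric the figures above refer to. The sketch assumes each frame is represented as a set of active MIDI pitches and counts only exact pitch matches. The function name is hypothetical.

```rust
use std::collections::HashSet;

/// Frame-level precision, recall, and F1 for multi-pitch estimates.
/// Each element is the set of MIDI pitches active in one frame.
/// Frames are compared pairwise; extra frames in the longer input are ignored.
fn frame_f1(reference: &[HashSet<u8>], estimate: &[HashSet<u8>]) -> (f64, f64, f64) {
    let (mut tp, mut fp, mut fn_) = (0usize, 0usize, 0usize);
    for (r, e) in reference.iter().zip(estimate) {
        tp += r.intersection(e).count(); // pitches correctly detected
        fp += e.difference(r).count(); // detected but not played
        fn_ += r.difference(e).count(); // played but missed
    }
    let precision = if tp + fp > 0 { tp as f64 / (tp + fp) as f64 } else { 0.0 };
    let recall = if tp + fn_ > 0 { tp as f64 / (tp + fn_) as f64 } else { 0.0 };
    let f1 = if precision + recall > 0.0 {
        2.0 * precision * recall / (precision + recall)
    } else {
        0.0
    };
    (precision, recall, f1)
}
```

Note-level metrics, which also match onsets within a time tolerance, are stricter than this frame-level count, so scores from the two are not directly comparable.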
The TUI
The application runs as a terminal UI built with ratatui. Live displays include a piano roll, spectrogram, activation chart, and fretboard visualization — all updating in real time from microphone input via cpal.
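A fretboard view has to map each detected pitch to the places it can be played. The sketch below shows one way to do that lookup. It assumes standard tuning and leaves the fret count as a parameter. The names are hypothetical, and the ratatui drawing code is omitted.

```rust
/// Open-string MIDI pitches in standard tuning, low E to high E
/// (E2, A2, D3, G3, B3, E4). Standard tuning is an assumption here.
const STANDARD_TUNING: [u8; 6] = [40, 45, 50, 55, 59, 64];

/// All (string_index, fret) positions where `midi` can be played.
/// String 0 is the low E string; `max_fret` is the highest fret on the neck.
fn fret_positions(midi: u8, max_fret: u8) -> Vec<(usize, u8)> {
    STANDARD_TUNING
        .iter()
        .enumerate()
        .filter_map(|(s, &open)| {
            // Notes below the open string cannot be played on it.
            let fret = midi.checked_sub(open)?;
            (fret <= max_fret).then_some((s, fret))
        })
        .collect()
}
```

With `max_fret = 24`, the playable range is MIDI 40 to 88, which matches the E2 to E6 constraint above. A pitch such as E4 (MIDI 64) maps to six positions, one per string, so a display has to either show all candidates or pick one with extra context.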
The long game
RTMA is the engine for what I actually want to build: real-time music education apps in the browser. The app listens to what you play, shows what you got right, and scores your exercises, all with sub-frame latency. The Rust core compiles to WebAssembly. The 14-year arc from thesis to product is the thread that connects everything I've worked on.