Selected Publications/Projects

From Hallucination to Articulation: Language Model-Driven Losses for Ultra Low-Bitrate Neural Speech Coding

Submitted to ICASSP 2026

“We propose language model-driven losses (LM loss), which we show to alleviate ‘phoneme hallucinations’ better than a semantic distillation objective in a very-low-bitrate personalized codec. The proposed LM losses build upon language models pretrained to associate speech with text.”

DDD: A Perceptually Superior Low-Response-Time DNN-Based Declipper

Published in Proc. ICASSP, 2024

“We introduce DDD (Demucs-Discriminator-Declipper), a real-time-capable speech-declipping deep neural network (DNN) that requires less response time by design, about 1/6 of the then-SOTA. Subjective evaluations on harshly clipped speech show that DDD outperforms the baselines by a wide margin. We also perform detailed waveform and spectral analyses to gain insight into the output behavior of DDD in comparison to the baselines.”

Beat-Aligned Spectrogram-to-Sequence Generation of Rhythm-Game Charts

Presented in the Late-Breaking/Demo Session of ISMIR, 2023

At the heart of “rhythm games” - games where players must perform actions in sync with a piece of music - are “charts”, the directives given to players. We formulate chart generation as a sequence generation task and train a Transformer on a large dataset. We also introduce tempo-informed preprocessing and training procedures, some of which appear to be integral to successful training. Our model outperforms the baselines on a large dataset and also benefits from pretraining and finetuning.