INVERSYNTH II: SOUND MATCHING VIA SELF-SUPERVISED SYNTHESIZER-PROXY AND INFERENCE-TIME FINETUNING

Oren Barkan, Shlomi Shvartzman, Noy Uzrad, Moshe Laufer, Almog Elharar, Noam Koenigstein

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Synthesizers are widely used electronic musical instruments. Given an input sound, inferring the underlying synthesizer's parameters to reproduce it is a difficult task known as sound matching. In this work, we tackle the problem of automatic sound matching, which is otherwise performed manually by professional audio experts. The novelty of our work stems from the introduction of a differentiable synthesizer-proxy that enables gradient-based optimization by comparing the input and reproduced audio signals. Additionally, we introduce a self-supervised finetuning mechanism that further refines the prediction at inference time. Both contributions lead to state-of-the-art results, outperforming previous methods across various metrics. Our code is available at: https://github.com/inversynth/InverSynth2.

Original language: English
Title of host publication: 24th International Society for Music Information Retrieval Conference, ISMIR 2023 - Proceedings
Editors: Augusto Sarti, Fabio Antonacci, Mark Sandler, Paolo Bestagini, Simon Dixon, Beici Liang, Gael Richard, Johan Pauwels
Publisher: International Society for Music Information Retrieval
Pages: 642-648
Number of pages: 7
ISBN (Electronic): 9781732729933
State: Published - 2023
Event: 24th International Society for Music Information Retrieval Conference, ISMIR 2023 - Milan, Italy
Duration: 5 Nov 2023 → 9 Nov 2023

Publication series

Name: 24th International Society for Music Information Retrieval Conference, ISMIR 2023 - Proceedings

Conference

Conference: 24th International Society for Music Information Retrieval Conference, ISMIR 2023
Country/Territory: Italy
City: Milan
Period: 5/11/23 → 9/11/23

Bibliographical note

Publisher Copyright:
© Barkan et al.
