This website provides supplementary materials for the above-titled paper, to appear in the Proceedings of ICASSP 2026. Here, we present several examples of estimation results obtained with the proposed methods.
Abstract
Audio effects play an essential role in sound design. This research addresses the task of audio effect estimation, which aims to estimate the configuration of applied effects from a wet signal. Existing approaches to this problem can be categorized into predictive approaches, which use models pre-trained in a data-driven manner, and search-based approaches, which are based on wet signal reconstruction. In this study, we propose a novel approach that integrates these approaches: first, DNNs predict the dry signal and effect configuration, and then a search is performed based on wet signal reconstruction using these predictions. By estimating the dry signal in the prediction stage, it becomes possible to complement or improve the predictions using reconstruction similarity as an objective function. The experimental evaluation showed that methods based on the proposed approach outperformed the method solely based on the predictive approach. Furthermore, the findings suggest that the task division of predicting the effect type combination followed by the search-based estimation of order and parameters was the most effective across various metrics.
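To make the search stage concrete, here is a minimal sketch of a search over effect order and parameters, given a dry-signal estimate and a predicted set of effect types. It is an illustration under stated assumptions, not the paper's implementation: the toy effects (`gain`, `hard_clip`), the parameter grid, the exhaustive grid search, and the use of mean squared error as the reconstruction objective are all hypothetical choices made for this example.

```python
import itertools
import math

# Hypothetical toy effects, each with a single parameter (assumption for illustration).
def gain(signal, amount):
    return [amount * x for x in signal]

def hard_clip(signal, threshold):
    return [max(-threshold, min(threshold, x)) for x in signal]

EFFECTS = {"gain": gain, "clip": hard_clip}

# Hypothetical candidate parameter values per effect type.
PARAM_GRID = {"gain": [0.5, 1.0, 2.0], "clip": [0.25, 0.5, 0.75]}

def apply_chain(dry, chain):
    """Apply (effect name, parameter) pairs to the signal in order."""
    signal = dry
    for name, param in chain:
        signal = EFFECTS[name](signal, param)
    return signal

def mse(a, b):
    """Mean squared error, used here as an assumed reconstruction objective."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def search_chain(wet, dry_estimate, predicted_types):
    """Try every order and grid parameter set; keep the best reconstruction."""
    best_chain, best_error = None, math.inf
    for order in itertools.permutations(predicted_types):
        grids = [PARAM_GRID[name] for name in order]
        for params in itertools.product(*grids):
            chain = list(zip(order, params))
            error = mse(apply_chain(dry_estimate, chain), wet)
            if error < best_error:
                best_chain, best_error = chain, error
    return best_chain, best_error

# Toy data: a sine wave as the dry signal, processed by a known chain.
dry = [math.sin(2 * math.pi * 3 * n / 200) for n in range(200)]
true_chain = [("clip", 0.5), ("gain", 2.0)]
wet = apply_chain(dry, true_chain)

# The predicted types are given without order; the search recovers order and parameters.
found_chain, found_error = search_chain(wet, dry, ["gain", "clip"])
print(found_chain, found_error)
```

In this toy setting the dry signal is known exactly, so the true chain reconstructs the wet signal with zero error. With an estimated dry signal the minimum would generally be nonzero, and a practical search would likely need something more efficient than an exhaustive grid.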
Results
In each example, we first show the process of applying effects to the ground-truth dry signal. Then, for the baseline and each proposed method, we demonstrate the process of effect removal, the estimated effect configuration, and the reconstructed wet signal. To evaluate the performance of effect configuration estimation independently of the performance of effect removal, we performed the reconstruction using ground-truth dry signals.
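The examples report SI-SDR values for both the effect-removal outputs and the reconstructed wet signals. As a reference point, the following sketch computes SI-SDR under the commonly used scale-invariant definition, which yields a value in dB. This page does not state which variant was used or whether signals were mean-normalized first, so treat the function below as an assumption rather than the authors' evaluation code.

```python
import math

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB (standard definition; assumed, not taken from the paper)."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # Project the estimate onto the reference to find the optimally scaled target.
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in reference]
    # Everything in the estimate that the scaled reference cannot explain is distortion.
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))

# Toy signals: a reference sine, a rescaled copy, and a copy with an added interferer.
reference = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
scaled = [0.5 * x for x in reference]
noisy = [x + 0.1 * math.cos(2 * math.pi * 13 * n / 100)
         for n, x in enumerate(reference)]

print(si_sdr(scaled, reference))  # very high: rescaling alone is not penalized
print(si_sdr(noisy, reference))   # about 20 dB: interferer energy is 1/100 of the target
```

Because the metric is invariant to the overall scale of the estimate, a reconstruction that differs from the reference only in level still scores very highly.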
Example 1
| Ground-truth | Dry | Wet |
|---|---|---|
| Bypass-Config-Iter (Baseline) | One effect removed (SI-SDR: 21.40) | Reconstructed (SI-SDR: 36.98) |
| Dry-Type-Direct + Search | Entire chain removed (SI-SDR: 19.39) | Reconstructed (SI-SDR: 42.22) |
| Bypass-Type-Iter + Search | One effect removed (SI-SDR: 21.40) | Reconstructed (SI-SDR: 45.46) |
| Bypass-Config-Iter + Search | One effect removed (SI-SDR: 21.40) | Reconstructed (SI-SDR: 36.27) |
Example 2
| Ground-truth | Dry | Wet |
|---|---|---|
| Bypass-Config-Iter (Baseline) | One effect removed (SI-SDR: 27.43) | Reconstructed (SI-SDR: 22.81) |
| Dry-Type-Direct + Search | Entire chain removed (SI-SDR: 24.61) | Reconstructed (SI-SDR: 28.14) |
| Bypass-Type-Iter + Search | One effect removed (SI-SDR: 27.43) | Reconstructed (SI-SDR: 29.34) |
| Bypass-Config-Iter + Search | One effect removed (SI-SDR: 27.43) | Reconstructed (SI-SDR: 29.29) |
Example 3
| Ground-truth | Dry | One effect applied | Wet |
|---|---|---|---|
| Bypass-Config-Iter (Baseline) | Two effects removed (SI-SDR: 19.44) | One effect removed | Reconstructed (SI-SDR: 30.65) |
| Dry-Type-Direct + Search | Entire chain removed (SI-SDR: 16.46) | | Reconstructed (SI-SDR: 35.20) |
| Bypass-Type-Iter + Search | Two effects removed (SI-SDR: 19.44) | One effect removed | Reconstructed (SI-SDR: 35.34) |
| Bypass-Config-Iter + Search | Two effects removed (SI-SDR: 19.44) | One effect removed | Reconstructed (SI-SDR: 35.33) |
Example 4
| Ground-truth | Dry | One effect applied | Two effects applied | Wet |
|---|---|---|---|---|
| Bypass-Config-Iter (Baseline) | Three effects removed (SI-SDR: 9.52) | Two effects removed | One effect removed | Reconstructed (SI-SDR: 9.30) |
| Dry-Type-Direct + Search | Entire chain removed (SI-SDR: 9.72) | | | Reconstructed (SI-SDR: 25.19) |
| Bypass-Type-Iter + Search | Three effects removed (SI-SDR: 9.52) | Two effects removed | One effect removed | Reconstructed (SI-SDR: 18.15) |
| Bypass-Config-Iter + Search | Three effects removed (SI-SDR: 9.52) | Two effects removed | One effect removed | Reconstructed (SI-SDR: 18.53) |
Example 5
| Ground-truth | Dry | One effect applied | Two effects applied | Wet |
|---|---|---|---|---|
| Bypass-Config-Iter (Baseline) | Three effects removed (SI-SDR: 4.74) | Two effects removed | One effect removed | Reconstructed (SI-SDR: 15.18) |
| Dry-Type-Direct + Search | Entire chain removed (SI-SDR: 9.94) | | | Reconstructed (SI-SDR: 16.84) |
| Bypass-Type-Iter + Search | Three effects removed (SI-SDR: 4.74) | Two effects removed | One effect removed | Reconstructed (SI-SDR: 14.73) |
| Bypass-Config-Iter + Search | Three effects removed (SI-SDR: 4.74) | Two effects removed | One effect removed | Reconstructed (SI-SDR: 13.49) |
Citation
@inproceedings{okita2026audio,
author={Okita, Youichi and Katayose, Haruhiro},
title={Audio Effect Estimation with {DNN}-Based Prediction and Search Algorithm},
booktitle={Proceedings of the 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2026},
}