Avinash Subramaniam*, Debottam Dutta*, Chaitanya Amballa*
Dept. of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
* = Equal contribution
We present Click2Hear, an interactive system that uses visual cues to generate localized binaural audio from a video with mono audio. The system achieves this by performing audio source separation and sound source localization. Mono audio by itself lacks the spatial information needed to localize or separate sources, so these tasks require external information such as speaker identity or visual cues. When no other prior is available, most existing methods rely on the visual information present in the video frames. They take inspiration from human perception: people also localize sounds with the help of what they see. Visual information thus helps separate the audio sources and, at the same time, indicates the spatial location of each source. Existing audio-visual methods either address these tasks separately or process them sequentially. We first investigate, through several experiments, how well a popular existing method for localizing and separating mono sound mixtures generalizes to more realistic scenarios. We then propose a unified framework that performs both tasks, together with the additional task of binauralization, in one shot. Our experiments show promising results on this task.
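To make the term binauralization concrete, the sketch below renders a mono signal to two channels from a horizontal source position, such as the x-coordinate of a clicked pixel. It is not the learned framework described above. It is a classic hand-written approximation based on interaural level and time differences, and the function name `pan_mono_to_binaural`, its parameters, and the default of 0.66 ms for the maximum interaural delay are all assumptions made for illustration.

```python
import math


def pan_mono_to_binaural(mono, x_norm, sample_rate=16000, max_itd_s=0.00066):
    """Render a mono signal to (left, right) from a horizontal position.

    Hypothetical helper, not the Click2Hear model.
    x_norm: 0.0 = left edge of the frame, 1.0 = right edge.
    max_itd_s: assumed maximum interaural time difference in seconds.
    """
    if not 0.0 <= x_norm <= 1.0:
        raise ValueError("x_norm must lie in [0, 1]")

    # Map the position to a pan value in [-1, 1] (negative = left).
    pan = 2.0 * x_norm - 1.0

    # Interaural level difference via constant-power panning.
    theta = (pan + 1.0) * math.pi / 4.0
    left_gain, right_gain = math.cos(theta), math.sin(theta)

    # Interaural time difference: the ear farther from the source
    # receives the signal a few samples later.
    delay = round(abs(pan) * max_itd_s * sample_rate)

    def delayed(signal, n):
        # Prepend n zeros and truncate to the original length.
        return ([0.0] * n + list(signal))[:len(signal)]

    left_delay = delay if pan > 0 else 0
    right_delay = delay if pan < 0 else 0
    left = [left_gain * s for s in delayed(mono, left_delay)]
    right = [right_gain * s for s in delayed(mono, right_delay)]
    return left, right
```

For example, `pan_mono_to_binaural(samples, 0.75)` makes the right channel louder and slightly earlier than the left, so the source is heard to the right of center. A learned system would replace these fixed gains and delays with a mapping estimated from the video and audio.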
Click on the video to play/pause the corresponding audio. You can change the time by clicking on the waveform.
Click on any pixel to play the corresponding audio. You can change the time by clicking on the waveform.
© 2023 Avinash Subramaniam, Debottam Dutta, Chaitanya Amballa. University of Illinois at Urbana-Champaign. All rights reserved.