Intel Sports enables immersive media experiences in sports such as American football and basketball by using volumetric video and microphone array capture. I worked on this project with Prof. Weiqing Gu at Harvey Mudd College during Summer 2017, together with Intel liaisons Steven Xing and Peter Sankhagowit.
Immersive Auditory Experiences for Basketball
This exploratory research aims to let users enjoy an immersive experience of sports games such as basketball, with a primary emphasis on immersive audio. In a typical basketball game, multiple cameras and microphones are placed at different locations. The key question is how to combine the signals from the individual microphones (or cameras) to reconstruct the entire sound field (or 3D scene) so that viewers can enjoy the immersive experience at a later time. Although there are many publications on 3D scene reconstruction, auditory scene (sound field) reconstruction has not yet been explored as deeply.
To enable immersive auditory experiences, our team designed a system that separates and localizes important sound objects (e.g., basketball impacts and cheering) in a basketball scene using multiple microphone arrays at known locations. With the separated audio sources and their locations, we can reconstruct the immersive auditory experience by rendering each source with Head-Related Transfer Functions (HRTFs), which simulate the 3D acoustic cues that humans perceive.
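To make the rendering step concrete, here is a minimal sketch of binaural rendering for one separated source at a known azimuth. It is an illustration under stated assumptions, not the system our team built: real HRTFs are measured per listener and direction, whereas this sketch substitutes a crude stand-in that models only an interaural time difference (Woodworth's approximation) and a made-up level difference. The names `toy_hrir_pair` and `render_binaural`, the head radius, and the attenuation rule are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def toy_hrir_pair(azimuth_rad, sample_rate):
    """Return (left, right) impulse responses for a source at `azimuth_rad`.

    Positive azimuth means the source is to the listener's right.
    This is NOT a measured HRIR: it only delays and attenuates the far ear.
    """
    # Woodworth's approximation is meant for |azimuth| <= pi/2, so clip to that range.
    theta = abs(float(np.clip(azimuth_rad, -np.pi / 2, np.pi / 2)))
    itd_seconds = HEAD_RADIUS / SPEED_OF_SOUND * (theta + np.sin(theta))
    delay = int(round(itd_seconds * sample_rate))

    # Hypothetical level difference: far ear attenuated by up to about 6 dB.
    far_gain = 1.0 - 0.5 * np.sin(theta)

    near = np.zeros(delay + 1)
    near[0] = 1.0            # near ear hears the source immediately
    far = np.zeros(delay + 1)
    far[delay] = far_gain    # far ear hears it later and quieter

    if azimuth_rad >= 0:
        return far, near     # source on the right: the left ear is the far ear
    return near, far


def render_binaural(mono, azimuth_rad, sample_rate):
    """Convolve a mono source with the toy impulse responses; returns shape (2, n)."""
    left_ir, right_ir = toy_hrir_pair(azimuth_rad, sample_rate)
    left = np.convolve(mono, left_ir)
    right = np.convolve(mono, right_ir)
    return np.stack([left, right])
```

For example, `render_binaural(signal, np.pi / 4, 48000)` returns a two-channel array in which the left channel is a delayed, quieter copy of the right. A full scene would sum the rendered channels of all separated sources, and a measured HRIR set would replace `toy_hrir_pair`.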
Our team explored several ideas for performing audio separation and localization at the same time. The biggest problem we encountered was that the captured audio data had no ground truth or labels for either the separation task or the localization task. We therefore explored using the Expectation-Maximization (EM) algorithm to separate and localize audio sources on synthetic data.
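As a rough illustration of why EM suits unlabeled data, the sketch below fits a one-dimensional Gaussian mixture with EM. This is a generic textbook example, not the model our team used: it assumes each observation is a single scalar feature (for instance, a hypothetical time-difference-of-arrival estimate) and that each mixture component corresponds to one sound source. The function name `em_gmm_1d` and the quantile-based initialization are my own choices for the example.

```python
import numpy as np


def em_gmm_1d(x, n_components, n_iters=100):
    """Fit a 1D Gaussian mixture with EM; no labels are required.

    Returns (weights, means, variances, responsibilities).
    """
    x = np.asarray(x, dtype=float)
    # Deterministic initialization: spread the means over quantiles of the data.
    quantiles = (np.arange(n_components) + 0.5) / n_components
    means = np.quantile(x, quantiles)
    variances = np.full(n_components, x.var() + 1e-6)
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iters):
        # E-step: posterior probability that each point came from each component.
        diff = x[:, None] - means[None, :]
        log_prob = (
            -0.5 * (diff ** 2 / variances + np.log(2 * np.pi * variances))
            + np.log(weights)
        )
        log_prob -= log_prob.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_prob)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from the soft assignments.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6

    return weights, means, variances, resp
```

The responsibilities play the role of a soft separation (which source each observation belongs to), while the fitted means play the role of a localization estimate. Both are learned jointly from unlabeled data, which is the property that made EM attractive here.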
Given the time constraints, I could not explore the EM idea further. Prof. Weiqing Gu and the team from Intel continued to explore it and obtained very promising results on real basketball recordings. They published their results at NeurIPS 2018, where they show separation and localization results on data synthesized from the Mozilla Common Voice dataset.
Latent Gaussian Activity Propagation: Using Smoothness and Structure to Separate and Localize Sounds in Large Noisy Environments. Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS), pp. 3469-3478, 2018. PDF: https://proceedings.neurips.cc/paper/2018/file/7dd0240cd412efde8bc165e864d3644f-Paper.pdf
Last updated: Jan 16, 2022