This paper describes the practical response- and performance-aware development of online speech enhancement for an augmented reality (AR) headset that helps a user understand conversations held in real noisy echoic environments (e.g., a cocktail party). One may use a state-of-the-art blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF), which works well in various environments thanks to its unsupervised nature. Its heavy computational cost, however, prevents its application to real-time processing. In contrast, a supervised beamforming method that uses a deep neural network (DNN) to estimate the spatial information of speech and noise readily fits real-time processing, but suffers from drastic performance degradation under mismatched conditions.

Our article “An analysis of environment, microphone and data simulation mismatches in robust speech recognition” received the ISCA Award for the Best Review Paper published in Computer Speech and Language (2016–2020).

Our article “Fast Multichannel Nonnegative Matrix Factorization With Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation” received the 15th IEEE Signal Processing Society (SPS) Japan Student Journal Paper Award.

Our article “Multichannel Audio Source Separation With Deep Neural Networks” received the 6th IEEE Signal Processing Society (SPS) Japan Young Author Best Paper Award.

Our paper “Flow-Based Fast Multichannel Nonnegative Matrix Factorization for Blind Source Separation” has been accepted to IEEE ICASSP 2022.

I’m happy to share that I’m starting a new position as a Research Scientist at RIKEN!

Our article “Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation” has been accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing.
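The supervised approach mentioned in the abstract above pairs a DNN with a statistical beamformer: the network predicts time-frequency masks for speech and noise, spatial covariance matrices are estimated from the masked mixture, and an MVDR beamformer is applied per frequency bin. The following is a minimal sketch of that back end, assuming numpy; the microphone count, the fixed placeholder masks (standing in for DNN outputs), and the simulated single-bin signals are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mvdr_weights(R_s, R_n, ref_mic=0):
    """MVDR weights from speech/noise spatial covariance matrices:
    w = (R_n^{-1} R_s e_ref) / trace(R_n^{-1} R_s)."""
    num = np.linalg.solve(R_n, R_s)        # R_n^{-1} R_s, shape (M, M)
    return num[:, ref_mic] / np.trace(num)

# Simulated single-frequency-bin mixture: M mics, T frames (illustrative).
rng = np.random.default_rng(0)
M, T = 4, 200
steer = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # steering vector
s = rng.standard_normal(T) + 1j * rng.standard_normal(T)       # speech signal
noise = 0.1 * (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T)))
X = steer[:, None] * s[None, :] + noise                        # observed mixture

# Placeholder masks standing in for DNN predictions (per frame).
mask_s = np.full(T, 0.9)
mask_n = 1.0 - mask_s

# Mask-weighted spatial covariance matrices.
R_s = (mask_s * X) @ X.conj().T / mask_s.sum()
R_n = (mask_n * X) @ X.conj().T / mask_n.sum()

# Beamform: one enhanced single-channel signal for this bin.
w = mvdr_weights(R_s, R_n)
y = w.conj() @ X
```

In a real frame-online system the covariance matrices would be updated recursively frame by frame rather than computed over a whole batch, which is what makes this family of methods fit low-latency processing.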
Our Sound Scene Understanding Team presented two papers at IEEE ICASSP 2022: ① “Flow-Based Fast Multichannel Nonnegative Matrix Factorization for Blind Source Separation” and ② “Neural Full-Rank Spatial Covariance Analysis for Blind Source Separation”.

We presented two papers at IWAENC 2022: ① “DNN-free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF” and ② “Joint Localization and Synchronization of Distributed Camera-Attached Microphone Arrays for Indoor Scene Analysis”.

Our paper “Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments” was presented at Interspeech 2022.

Our paper “Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments” was presented at IEEE/RSJ IROS 2022.