Advanced Data & Signal Processing Laboratory

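Project Outcomes

"Research on Azimuth (DOA) Estimation of Speech Sources Based on Acoustic Vector Sensor Arrays and Sparse Representation"

Project No.: 61271309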

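Chinese abstract (English translation):
Direction-of-arrival (DOA) estimation of spatial speech sources is one of the key technologies in the auditory system of service robots, with great application value and market potential. Traditional DOA estimation methods have inherent limitations in robustness, accuracy, system cost and physical size, which restrict their practical use. Targeting service-robot applications, this project studies new theory and methods for high-accuracy, robust DOA estimation of multiple speech sources based on sparse representation and the acoustic vector sensor (AVS). The results are summarized as follows:
(1) We studied DOA estimation based on AVS arrays and sparse representation and proposed two new DOA estimation algorithms based on AVS array/subarray data models (AVS-SS-LF and AVS-SS-ST); simulation results verify the effectiveness of the proposed methods;
(2) We studied speech-source DOA estimation based on a single AVS and sparse representation. We derived an approximate time-frequency-domain model of the inter-sensor data ratio (ISDR) between the AVS channels, which gives the DOA as a function of the ISDR; from it we derived an ISDR-based overcomplete-dictionary sparse representation model of the DOA and proposed a new DOA estimation algorithm, AVS-ISDR-SSR. Extensive simulations and real-world experiments verify its effectiveness;
(3) We studied DOA estimation based on the time-frequency sparsity of speech and a single AVS, and proposed a new multi-source DOA estimation algorithm, AVS-ISDR; experiments show that it can estimate the DOAs of up to 7 speech sources. Building on this, we proposed four algorithms for extracting time-frequency points with high local SNR, which enable AVS-ISDR to deliver stable, high-accuracy multi-source DOA estimation over a wide SNR range and under reverberation;
(4) We analyzed the bispectrum of the multi-channel AVS speech signals and, exploiting the fact that the bispectrum suppresses Gaussian noise, proposed two DOA estimation methods based on the bispectrum-domain data ratio (AVS-BISDR and AVS-MBISDR), which effectively suppress additive white Gaussian noise and directional Gaussian noise interference;
(5) Based on the time-frequency sparsity of speech and machine-learning strategies, we proposed two robust deep-learning-based DOA estimation methods (AVS-DNN-ISDR and AVS-WISDR-DNN) that achieve accurate DOA estimation at low SNR and in strongly reverberant environments;
(6) We independently designed and built AVS sensors and a DOA estimation prototype system, used them to validate the proposed DOA algorithms experimentally, and carried out related research on key robot-audition technologies, including speech enhancement, voiceprint (speaker) recognition and audio event detection.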
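Items (2) and (3) above rest on the inter-sensor data ratio (ISDR). The following is a minimal sketch under stated assumptions, not the project's implementation: an ideal 2-D AVS (omnidirectional channel o plus two orthogonal particle-velocity channels u and v with unit gains), far-field sources, and W-disjoint orthogonality (each time-frequency point dominated by one source). At such a point the ratios X_u/X_o and X_v/X_o reduce to cos θ and sin θ, so every point yields an azimuth estimate, and a histogram of those estimates peaks at the source DOAs. The energy-based point selection and the greedy peak picking are simple stand-ins chosen for illustration, not the project's four high-local-SNR extraction algorithms, and all function names are hypothetical.

```python
import numpy as np


def select_high_energy_points(Xo, keep_fraction=0.3):
    """Keep the TF points with the largest omni-channel energy.

    A crude stand-in for high-local-SNR point selection (illustrative only).
    """
    energy = np.abs(Xo) ** 2
    return energy >= np.quantile(energy, 1.0 - keep_fraction)


def isdr_azimuths(Xo, Xu, Xv, mask):
    """Per-TF-point azimuth (degrees) from the ISDRs Xu/Xo and Xv/Xo."""
    r_u = (Xu[mask] / Xo[mask]).real  # approx. cos(theta) at single-source points
    r_v = (Xv[mask] / Xo[mask]).real  # approx. sin(theta)
    return np.degrees(np.arctan2(r_v, r_u)) % 360.0


def histogram_peaks(az_deg, num_sources, min_sep_deg=10):
    """Pick num_sources peaks from a 1-degree circular histogram."""
    hist = np.bincount(np.round(az_deg).astype(int) % 360, minlength=360).astype(float)
    peaks = []
    for _ in range(num_sources):
        k = int(np.argmax(hist))
        peaks.append(k)
        # Zero a circular neighbourhood so the next peak belongs to another source.
        hist[np.arange(k - min_sep_deg, k + min_sep_deg + 1) % 360] = 0.0
    return sorted(peaks)


def demo_multisource(true_doas=(40, 200, 310), n_freq=257, n_frames=200, seed=1):
    """Synthetic STFT-domain test: each TF point is owned by exactly one source."""
    rng = np.random.default_rng(seed)
    shape = (n_freq, n_frames)

    def cnoise(scale):
        return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

    Xo = cnoise(1.0)                                     # omni-channel STFT
    owner = rng.integers(0, len(true_doas), size=shape)  # dominant source per point
    theta = np.radians(np.asarray(true_doas, dtype=float))[owner]
    Xu = np.cos(theta) * Xo + cnoise(0.01)               # x-velocity channel + sensor noise
    Xv = np.sin(theta) * Xo + cnoise(0.01)               # y-velocity channel + sensor noise
    mask = select_high_energy_points(Xo)
    return histogram_peaks(isdr_azimuths(Xo, Xu, Xv, mask), len(true_doas))


print(demo_multisource())
```

Because each estimate comes from a single time-frequency point, the number of resolvable sources is limited by how well the histogram separates, not by the number of sensors, which is why a single AVS can handle several simultaneous talkers under the sparsity assumption.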
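In summary, the team completed the research tasks according to the research plan. The results have attracted attention from companies including Huawei, Haier, Guangzhou Shiyuan Ltd, UBTECH and Shenzhen HaiAn Technology Ltd, and technology transfer is actively under way.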

English abstract:

Direction-of-arrival (DOA) estimation of spatial speech sources is one of the key technologies in the auditory system of service robots, with great application value and market potential. Traditional array-based DOA estimation methods have inherent limitations in robustness to noise, estimation accuracy, hardware cost and physical size, which restrict their practical application. Targeting service-robot applications, this project develops new theory and methods for high-accuracy, robust DOA estimation of multiple sound/speech sources based on the acoustic vector sensor (AVS) and sparse representation theory. The results are summarized as follows:

(1) Under the framework of AVS arrays and sparse representation theory, we proposed two new DOA estimation algorithms based on the AVS array/AVS-subarray data model, termed AVS-SS-LF and AVS-SS-ST. Numerical simulations verify the effectiveness of the proposed methods;

(2) Under the framework of a single AVS and sparse representation theory, we derived the inter-sensor data ratio (ISDR) model of the AVS and obtained the functional relationship between the DOA and the ISDR. On this basis, an overcomplete-dictionary sparse representation model of the ISDR was formed and a new DOA estimation algorithm, AVS-ISDR-SSR, was developed. A number of simulations and real-world experiments verify its effectiveness;

(3) Based on the time-frequency sparsity of speech and a single AVS, we proposed a new DOA estimation algorithm for single and multiple speech sources, AVS-ISDR; experiments show that it can estimate the DOAs of up to 7 sound sources. To further improve its performance, we put forward four algorithms that extract reliable time-frequency points with high local SNR. As a result, AVS-ISDR provides accurate and robust DOA estimation over a wide dynamic range of SNR as well as under reverberant conditions;

(4) By analyzing the bispectrum of the multi-channel speech signals of a single AVS and exploiting the fact that the bispectrum suppresses Gaussian noise, we proposed two DOA estimation methods based on the bispectrum-domain ISDR data model, termed AVS-BISDR and AVS-MBISDR. Simulations and real-world experiments verify that both reduce the effect of additive white Gaussian noise and directional Gaussian noise interference;

(5) Based on the time-frequency sparsity of speech and machine-learning strategies, we proposed two robust deep-learning-based DOA estimation methods, AVS-DNN-ISDR and AVS-WISDR-DNN, which obtain accurate DOA estimates under low SNR and strong reverberation;

(6) We independently designed and built AVS sensors and developed a DOA estimation prototype system. With them, numerous experiments were conducted to evaluate the proposed DOA algorithms, providing more reliable validation than computer-simulated data alone. Building on these results, we further studied key technologies for robot auditory systems, such as speech enhancement, speaker recognition and audio event detection.
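Item (1) treats DOA estimation with an AVS array as sparse recovery over a grid of candidate directions. The sketch below illustrates that general idea with simultaneous orthogonal matching pursuit (SOMP), a standard greedy solver swapped in for illustration; it is not the AVS-SS-LF or AVS-SS-ST algorithm. Assumptions (all hypothetical choices): a uniform linear array of four 2-D AVSs along the x-axis with half-wavelength spacing, narrowband far-field sources, and a 1° azimuth grid whose atoms are the Kronecker product of the pressure-array phase vector and the per-AVS response [1, cos θ, sin θ].

```python
import numpy as np


def avs_ula_steering(theta_deg, num_avs=4, spacing_wl=0.5):
    """Steering vector of a ULA of 2-D AVSs placed along the x-axis.

    Each AVS contributes [omni, x-velocity, y-velocity] = [1, cos, sin];
    the pressure-array phase term is shared by all three channels.
    """
    th = np.radians(theta_deg)
    phase = np.exp(-2j * np.pi * spacing_wl * np.arange(num_avs) * np.cos(th))
    return np.kron(phase, np.array([1.0, np.cos(th), np.sin(th)]))


def build_dictionary(grid_deg, **kw):
    """Overcomplete dictionary: one unit-norm steering vector per grid angle."""
    D = np.stack([avs_ula_steering(t, **kw) for t in grid_deg], axis=1)
    return D / np.linalg.norm(D, axis=0)


def somp_doa(Y, D, grid_deg, num_sources):
    """Simultaneous OMP: greedily pick atoms that best explain all snapshots."""
    residual = Y.copy()
    support = []
    for _ in range(num_sources):
        # Joint correlation of every atom with the residual over all snapshots.
        scores = np.linalg.norm(D.conj().T @ residual, axis=1)
        if support:
            scores[support] = 0.0  # never re-select an atom
        support.append(int(np.argmax(scores)))
        Ds = D[:, support]
        coeffs, *_ = np.linalg.lstsq(Ds, Y, rcond=None)  # refit on current support
        residual = Y - Ds @ coeffs
    return sorted(int(grid_deg[i]) for i in support)


def demo_sparse_doa(true_doas=(60, 250), snapshots=200, noise_std=0.05, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.arange(360)
    D = build_dictionary(grid)
    A = np.stack([avs_ula_steering(t) for t in true_doas], axis=1)
    K = len(true_doas)
    S = (rng.standard_normal((K, snapshots)) + 1j * rng.standard_normal((K, snapshots))) / np.sqrt(2)
    N = noise_std * (rng.standard_normal((A.shape[0], snapshots))
                     + 1j * rng.standard_normal((A.shape[0], snapshots)))
    return somp_doa(A @ S + N, D, grid, K)


print(demo_sparse_doa())
```

The velocity channels let the dictionary distinguish θ from −θ, an ambiguity that a pressure-only linear array along the x-axis would have, so the grid can cover the full 0° to 359° range.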

In conclusion, following the project research plan, we have completed the planned research tasks. The project's outcomes have drawn attention from several companies, including HUAWEI, Haier, Guangzhou Shiyuan Ltd and Shenzhen HaiAn Speech Technology Ltd, and technology transfer of some results is under way.
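Item (4) of the abstract relies on the fact that the bispectrum (third-order spectrum) of zero-mean Gaussian noise vanishes. The sketch below contrasts a second-order cross-spectrum ratio with a cross-bispectrum ratio B_x(k1, k2) / B_o(k1, k2), where B_x(k1, k2) is the frame average of X(k1) O(k2) O*(k1+k2) for x in {u, v} and O is the omnidirectional channel. Everything here is an assumption for illustration, not the published AVS-BISDR definition: an ideal 2-D AVS, a synthetic quadratically phase-coupled source standing in for non-Gaussian speech, and directional white Gaussian interference; function names are hypothetical.

```python
import numpy as np


def simulate_avs_spectra(theta_deg=30.0, interf_deg=120.0, num_frames=8000,
                         n_fft=64, k1=5, k2=9, interf_std=2.0,
                         sensor_std=0.1, seed=0):
    """Per-frame FFTs of the o/u/v channels of an ideal 2-D AVS (synthetic)."""
    rng = np.random.default_rng(seed)
    n = np.arange(n_fft)
    ph1 = rng.uniform(0.0, 2 * np.pi, (num_frames, 1))
    ph2 = rng.uniform(0.0, 2 * np.pi, (num_frames, 1))
    # Quadratic phase coupling: the bin k1+k2 component has phase ph1+ph2,
    # so the source has a non-zero bispectrum at (k1, k2).
    s = (np.cos(2 * np.pi * k1 * n / n_fft + ph1)
         + np.cos(2 * np.pi * k2 * n / n_fft + ph2)
         + np.cos(2 * np.pi * (k1 + k2) * n / n_fft + ph1 + ph2))
    g = interf_std * rng.standard_normal((num_frames, n_fft))  # directional Gaussian interference
    th, ph = np.radians(theta_deg), np.radians(interf_deg)
    gains = {"o": (1.0, 1.0), "u": (np.cos(th), np.cos(ph)), "v": (np.sin(th), np.sin(ph))}
    spectra = {}
    for ch, (a_s, a_g) in gains.items():
        x = a_s * s + a_g * g + sensor_std * rng.standard_normal((num_frames, n_fft))
        spectra[ch] = np.fft.fft(x, axis=1)
    return spectra


def cross_spectrum_doa(X, bins):
    """Azimuth from averaged cross-spectra E[U O*], E[V O*] (second order)."""
    o = X["o"][:, bins]
    c_u = np.mean(X["u"][:, bins] * np.conj(o)).real
    c_v = np.mean(X["v"][:, bins] * np.conj(o)).real
    return np.degrees(np.arctan2(c_v, c_u)) % 360.0


def bispectrum_ratio_doa(X, k1, k2):
    """Azimuth from cross-bispectrum ratios B_u/B_o and B_v/B_o at (k1, k2)."""
    o = X["o"]

    def cross_bispectrum(y):
        # Frame average of Y(k1) O(k2) O*(k1+k2); Gaussian terms average to 0.
        return np.mean(y[:, k1] * o[:, k2] * np.conj(o[:, k1 + k2]))

    b_o = cross_bispectrum(o)
    r_u = (cross_bispectrum(X["u"]) / b_o).real
    r_v = (cross_bispectrum(X["v"]) / b_o).real
    return np.degrees(np.arctan2(r_v, r_u)) % 360.0


X = simulate_avs_spectra()
print("cross-spectrum estimate:", round(cross_spectrum_doa(X, [5, 9, 14]), 1))
print("bispectrum estimate:", round(bispectrum_ratio_doa(X, 5, 9), 1))
```

With these settings the cross-spectrum estimate is pulled toward the 120° interferer (its expected value is roughly 44° instead of 30°), while the bispectrum ratio stays close to 30°, because every term involving the Gaussian interference has zero mean in the third-order average.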


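Figures:

Figure 1. Software interface of the real-time DOA estimation system

Figure 2. The real-time DOA estimation system in a real test scene

Figure 3. Embedded mobile-robot DOA estimation experimental platform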


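Project Achievements

To date, the team has produced the following results:

  • Hardware systems:
    • Three acoustic vector sensors (AVS Ⅰ, AVS Ⅱ, AVS Ⅲ);
    • One speech-source DOA estimation experimental system;
    • One embedded mobile-robot DOA estimation experimental platform.
  • Application software:
    • One MATLAB-based software system for DOA estimation algorithm research;
    • One software system for a smart trash can based on AVS speech localization and recognition;
    • One voiceprint-recognition-based attendance management platform (app);
    • One speaker-independent mixed Chinese-English command-phrase recognition system for intelligent robots;
    • One time-frequency-mask-based target speech enhancement system using a single acoustic vector sensor;
    • Vcamera, a voice-controlled camera app;
    • One MATLAB-based speaker verification system.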

Journal Publications

  • Zou Y X, Li B, Ritz C H. Multi-Source DOA Estimation Using an Acoustic Vector Sensor Array Under a Spatial Sparse Representation Framework[J]. Circuits, Systems, and Signal Processing, 2016, 35(3): 993-1020. [SCI-indexed: 000370819100014] [EI-indexed: 20160801981561];
  • Zou Y X, Guo Y F, Zheng W Q. Robust speaker source DOA estimation method based on AVS and sparse representation[J]. Journal of Data Acquisition and Processing, 2015, 30(2): 299-306 (in Chinese);
  • Zou Y X, Wang P, Wang Y Q, et al. Speech enhancement with an acoustic vector sensor: an effective adaptive beamforming and post-filtering approach[J]. EURASIP Journal on Audio, Speech, and Music Processing, 2014, 2014(1): 1-12. [SCI-indexed: 000347390400001] [EI-indexed: 20142417806603] [Impact factor: 38];
  • Zou Y X, Wang P, Wang W M. A spatial target speech enhancement method based on a single AVS[J]. Journal of Tsinghua University (Science and Technology), 2013(6): 883-887 (in Chinese). [EI-indexed: 20134416914714];
  • Hu X Y, Zou Y X, Wang W M. Noise-robust speech recognition algorithm based on MDT feature compensation[J]. Journal of Tsinghua University (Science and Technology), 2013(6): 753-756 (in Chinese). [EI-indexed: 20134416914686];

Conference Publications

  • Jin Y H, Zou Y X, Ritz C H. Robust speaker 3-D DOA estimation based on the inter-sensor data ratio model and mask estimation in the bispectrum domain[C]//2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2017). [Accepted];
  • Jin Y H, Zou Y X. Robust speaker DOA estimation with single AVS in bispectrum domain[C]//2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2016). IEEE, 2016: 3196-3200. [EI-indexed: 20162402488768];
  • Zheng W Q, Zou Y X, Ritz C. Spectral mask estimation using deep neural networks for inter-sensor data ratio model based robust DOA estimation[C]//2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2015). IEEE, 2015: 325-329. [EI-indexed: 20154501510470];
  • Zou Y X, Shi W, Li B, et al. Multisource DOA estimation based on time-frequency sparsity and joint inter-sensor data ratio with single acoustic vector sensor[C]//2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2013), 2013: 4011-4015. [EI-indexed: 20135217120852];
  • Zou Y, Guo Y, Zheng W, et al. An effective DOA estimation by exploring the spatial sparse representation of the inter-sensor data ratio model[C]//2014 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP2014), 2014: 42-46. [EI-indexed: 20152100870017];
  • Guo Y, Zou Y X, Wang Y. A robust high resolution speaker DOA estimation under reverberant environment[C]//IEEE 9th International Symposium on Chinese Spoken Language Processing (ISCSLP2014), 2014: 400-400. [EI-indexed];
  • Shi W, Zou Y, Liu Y. Long-term auto-correlation statistics based voice activity detection for strong noisy speech[C]//Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on. IEEE, 2014: 100-104. [EI-indexed: 20152100870666];
  • Zou Y X, Zheng W Q, Shi W, et al. Improved voice activity detection based on support vector machine with high separable speech feature vectors[C]//2014 19th International Conference on Digital Signal Processing. IEEE, 2014: 763-767. [EI-indexed: 20153601243014];
  • Zou Y X, Wang Y Q, Wang P, et al. An effective target speech enhancement with single acoustic vector sensor based on the speech time-frequency sparsity[C]//2014 19th International Conference on Digital Signal Processing. IEEE, 2014: 547-551. [EI-indexed: 20153601242972];
  • Wang C, Zou Y, Liu S, et al. An efficient learning based smartphone playback attack detection using GMM supervector[C]//Multimedia Big Data (BigMM), 2016 IEEE Second International Conference on. IEEE, 2016: 385-389. [EI-indexed];
  • Liu S H, Zou Y X. Multi-constraint nonnegative matrix factorization approach to speech enhancement with nonstationary noise[C]//International Conference on Intelligence Science and Big Data Engineering (IScIDE), Guangzhou, China, May 2016: 181-191;
  • Wang C, Zou Y X, Zheng W Q, Shi W. An efficient playback attack detection approach based on supervised learning[C]//IEEE International Conference on Intelligence Science and Big Data Engineering (IScIDE), Guangzhou, China, May 13-15, 2016;
  • Liu S H, Zou Y X, Ning H K. Nonnegative matrix factorization based noise robust speaker verification[C]//Signal and Information Processing (ChinaSIP), 2015 IEEE China Summit and International Conference on. IEEE, 2015: 35-39. [EI-indexed: 20160701912123];
  • Wang C, Shi W, Zou Y X. Multi-pronunciation dictionary construction for Mandarin-English bilingual phrase speech recognition system[C]//Signal and Information Processing (ChinaSIP), 2015 IEEE China Summit and International Conference on. IEEE, 2015: 15-19. [EI-indexed: 20160701912119];
  • Zheng W Q, Yu J S, Zou Y X. An experimental study of speech emotion recognition based on deep convolutional neural networks[C]//Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on. IEEE, 2015: 827-831. [EI-indexed: 20161502238751];
  • Liu J H, Zheng W Q, Zou Y X. A robust acoustic feature extraction approach based on stacked denoising autoencoder[C]//Multimedia Big Data (BigMM), 2015 IEEE International Conference on. IEEE, 2015: 124-127. [EI-indexed: 20153701270922];
  • Hu X Y, Zou Y X, Shi W. An effective missing feature compensation method for speech recognition at noisy environment[C]//Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on. IEEE, 2014: 133-137. [EI-indexed: 20152100870641];
  • Ning H, Zou Y X, Hu X. A new score normalization for text-independent speaker verification[C]//2014 19th International Conference on Digital Signal Processing. IEEE, 2014: 636-639. [EI-indexed: 20153601242990].

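Invention Patents

  • Zou Yuexian, Guo Yifan, Shi Wei, "A single-speaker source DOA estimation method based on AVS and sparse representation", Chinese invention patent No.: 1, filed 2013-12-25; granted 2016-05-18;
  • Zou Yuexian, Zheng Weiqiao, Wang Yongqing, Shi Wei, Wang Chun, Guo Yifan, Ning Hongke, Liu Shihan, "A voice-controlled smart trash can based on an acoustic vector sensor", filed 2014-09-03, invention patent application No.: 8; granted 2016-09-10;
  • Zou Yuexian, Wang Peng, "A time-frequency-mask-based target speech enhancement method using a single acoustic vector sensor", 2013, invention patent application No.: 0, under substantive examination;
  • Zou Yuexian, Zheng Weiqiao, Yu Jiasheng, Wang Yi, Liu Junhong, Chen Jin, Huang Xiaolin, Jin Yanhan, "Voice-controlled photographing software", 2015, invention patent application No.: 1, under substantive examination;
  • Zou Yuexian, Jin Yanhan, "A robust single-speaker source DOA estimation method based on an acoustic vector sensor and the bispectrum transform", 2016, invention patent application No.: 5, under substantive examination.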


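Technology Transfer and Application of Project Results

The project's results have led to 5 Chinese invention patent applications, 2 of which have been granted. The results have attracted attention from high-tech companies including Huawei, Haier, Guangzhou Shiyuan Ltd, UBTECH, Shenzhen HaiAn Technology Ltd and Shenzhen Ermu Technology Co., Ltd., and some companies are negotiating with the team to purchase patents.

Shi Wei, an outstanding master's graduate of Peking University trained in this project, founded HaiAn Speech Technology Co., Ltd. in March 2016. Building mainly on the project's intellectual property, the company carries out technology transfer and productization, aiming to provide key sound-source DOA estimation technology for smart home service robots.

In addition, other Peking University master's students trained in this project have gone on as follows: Wang Yongqing joined the LeEco speech team, Guo Yifan joined Tencent, Zheng Weiqiao joined AISpeech, a well-known Chinese speech company, Hu Xuyan joined the NetEase speech team, and Ren Mengqi and Li Bo went abroad for doctoral studies. The NSFC support has thus contributed not only to technology but also substantially to talent training.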

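Talent Training

With the support of this NSFC project, 12 master's students have been trained, 11 of whom have received the Master of Science degree from Peking University.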

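Student | Major / Research direction | Master's thesis title | Supervisor | Defense date
Ren Mengqi | Integrated Circuits and Systems / Embedded Systems and DSP Technology | Research and Implementation of Speaker Localization Using Small-Aperture Microphone Arrays | Zou Yuexian | July 2012
Li Bo | Integrated Circuits and Systems / Embedded Systems and DSP Technology | Research on Signal-Sparsity-Based DOA Estimation Methods for Acoustic Vector Sensors | Zou Yuexian | July 2012
Shi Wei | Integrated Circuits and Systems / Multimedia Technology | Research and Implementation of Robust DOA Estimation Methods Based on Acoustic Vector Sensors | Zou Yuexian | July 2013
Guo Yifan | Computer Application Technology / Multimedia Information Processing | Research on Robust Speaker DOA Estimation Algorithms Based on AVS and Sparse Representation | Zou Yuexian | July 2015
Zheng Weiqiao | Computer Application Technology / Multimedia Information Processing | Research on Speech Source DOA Estimation Techniques for Intelligent Service Robots | Zou Yuexian | July 2016
Jin Yanhan | Computer Application Technology / Multimedia Information Processing | Research on Robust Speaker DOA Estimation Algorithms Based on AVS and the Bispectrum | Zou Yuexian | July 2017
Hu Xuyan | Computer Application Technology / Multimedia Information Processing | Research on Robust Speech Recognition Algorithms Based on Noisy-Spectrum Compensation | Zou Yuexian | July 2014
Wang Peng | Computer Application Technology / Multimedia Technology | Research on Speech Enhancement Algorithms Based on Acoustic Vector Sensors | Zou Yuexian | July 2013
Wang Yongqing | Computer Application Technology / Multimedia Information Processing | Research on Spatial Target Speech Enhancement Algorithms Based on Time-Frequency Masking | Zou Yuexian | July 2015
Ning Hongke | Computer Application Technology / Multimedia Information Processing | Research on Channel-Robust Speaker Verification Algorithms | Zou Yuexian | July 2015
Liu Shihan | Computer Application Technology / Multimedia Information Processing | Research on Single-Channel Speech Enhancement Algorithms Based on Nonnegative Matrix Factorization | Zou Yuexian | July 2016
Wang Chun | Computer Application Technology / Multimedia Information Processing | Supervised-Learning-Based Replay Attack Detection Methods and Their Applications | Zou Yuexian | July 2016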