Advances in Multi-Speaker Separation Technology and Applications, Hong Qingyang (洪青阳)

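Advances in Multi-Speaker Separation Technology and Applications

Hong Qingyang (洪青阳)
Collaborators: Yu Hongyong (余洪涌), Jiang Yuemeng (姜跃猛), Li Chaoyang (李朝阳), Wang Jie (王捷), Li Lin (李琳)
Intelligent Speech Laboratory, Xiamen University
March 2024

Outline
1. Research background
2. Industrial modular system
3. Improvements
4. Deployed applications

1. Research background

Multi-speaker separation (speaker diarization): given a recording in which several people take turns speaking, the system must determine who is speaking in each time interval.

[Figure: audio is fed to the multi-speaker separation system, which outputs segmentation information.]

Application scenarios: meeting minutes, multi-speaker transcription, intelligent customer service, quality inspection of recordings, etc.
Terminal devices: smartphones, personal computers, voice recorders.
Vendors: iFLYTEK (科大讯飞, smart office notebook), Huawei (AI minutes), 声云 (speech transcription).

[Figure: timeline of the field, with marks at 2000, 2002, 2006, 2009, 2013, 2018, 2019, 2020, 2021, 2022, and 2023.]
Research trend: from simple scenarios to complex scenarios.
Architectures: modular architecture, end-to-end architecture.
Competitions/datasets: Rich Transcription (RT), AMI, CALLHOME, DIHARD (I, II, III), CHiME-6, M2MeT, AISHELL-4, MIXER6, VoxSRC (20, 21, 22, 23), M2MeT 2.0, CHiME-7, AliMeeting.
Challenges: noise interference, an unknown number of speakers, overlapping speech, etc.
Applications: offline and online, single-microphone and multi-microphone, adaptation to new scenarios.

1. Research background: modular systems

Clustering methods: AHC [1], SC [2, 3], VB/VBx [4, 5], UIS-RNN [6], DNC [7]

[1] K. C. Gowda and G. Krishna, "Agglomerative Clustering Using the Concept of Mutual Nearest Neighbourhood," Pattern Recognition, vol. 10, pp. 105-112, 1978.
[2] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, pp. 395-416, 2007.
[3] T. Park, Kyu J. Han, Manoj Kumar, and Shrikanth S. Narayanan, "Auto-tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap," IEEE Signal Processing Letters, vol. 27, pp. 381-385, 2020.
[4] M. Diez, L. Burget, S. Wang, J. Rohdin, H. Cernocky, "Bayesian HMM based x-vector Clustering for Speaker Diarization," Interspeech, 2019, pp. 346-350.
[5] M. Diez, L. Burget, F. Landini, J. Cernocky, "Analysis of Speaker Diarization based on Bayesian HMM with Eigenvoice Priors," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 355-368, 2020.
[6] A. Zhang, Q. Wang, Z. Zhu, J. Paisley, and C. Wang, "Fully Supervised Speaker Diarization," ICASSP, 2019.
[7] Q. J. Li, F. L. Kreyssig, C. Zhang, P. C. Woodland, "Discriminative Neural Clustering for Speaker Diarisation," IEEE Spoken Language Technology Workshop (SLT 2021), Jan 2021, Shenzhen, China.

1. Research background: end-to-end systems

End-to-end systems: EEND [1], SA-EEND [2], TS-VAD [4]
End-to-end model based on Bi-LSTM
Target-speaker voice activity detection model

[1] Y. Fujita, N. Kanda, S. Horiguchi, K. Nagamatsu, and S. Watanabe, "End-to-end Neural Speaker Diarization with Permutation-free Objectives," in Interspe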
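
The EEND reference above names "permutation-free objectives". As a rough illustration of that idea, and not the authors' implementation, the sketch below scores frame-level speaker-activity predictions against reference labels under every ordering of the reference speakers and keeps the lowest loss. The function names (bce, permutation_free_loss), the toy data, and the choice of mean binary cross-entropy are all assumptions made for this example.

```python
import math
from itertools import permutations


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and one 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def permutation_free_loss(pred, ref):
    """Hypothetical helper: minimum mean BCE over all reference-speaker orderings.

    pred, ref: lists of frames; each frame is a list with one value per speaker
    (predicted activity probability in pred, 0/1 activity label in ref).
    Returns (best_loss, best_perm), where best_perm[s] is the reference
    speaker assigned to model output s.
    """
    n_frames, n_spk = len(pred), len(pred[0])
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_spk)):
        total = sum(
            bce(pred[t][s], ref[t][perm[s]])
            for t in range(n_frames)
            for s in range(n_spk)
        )
        loss = total / (n_frames * n_spk)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy data (made up): 4 frames, 2 speakers, with overlap in the third frame.
ref = [[1, 0], [1, 0], [1, 1], [0, 1]]
# The model's output channels come out in the opposite speaker order.
pred = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.7], [0.8, 0.1]]

loss, perm = permutation_free_loss(pred, ref)
print(perm)  # (1, 0): output 0 matches reference speaker 1, and vice versa
```

Enumerating all orderings costs factorial time in the number of speakers, so this brute-force form is only practical for a handful of speakers.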

