Evaluation of CNN-Based Text-Independent Speaker Identification Using a Speech Dataset

大嵜郁弥; 京相雅樹


J-GLOBAL ID: 201902213979429508; Reference number: 19A0956313


A Speaker Recognition Framework Using Sound Spectrogram and Convolutional Neural Network Based Deep Learning Technique and Performance Evaluation with a Large-Scale Dataset of Human Speech


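Volume: 118; Issue: 436 (MBE2018 57-83) (Web); Pages: 117-121 (Web only); Published: 2019-01-24
JST source number: S0532B; ISSN: 0913-5685; Document type: Conference proceedings (C)
Article type: Original paper; Country of publication: Japan (JPN); Language: Japanese (JA)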

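Accurate speaker identification requires a system that generalizes across many different speakers and is robust to noise. In recent years, speaker identification methods based on convolutional neural networks (hereafter CNN) have attracted attention. Such deep learning methods promise high performance but require large amounts of training data. This study therefore examined how identification accuracy changes with the amount of available data. With 60 minutes or more of data, the identification rate was about 90%, whereas with only about 5 minutes of data it dropped to about 50%. In future work, we aim to close this accuracy gap by applying accuracy-improvement techniques studied in the field of CNN-based image recognition. (Author abstract)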
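The English title states that the CNN operates on a sound spectrogram, but this record gives no front-end settings. The sketch below shows one common way to turn a waveform into a log-magnitude spectrogram that a CNN can treat as a single-channel image. Every parameter here (16 kHz sampling, 25 ms Hamming frames, 10 ms hop, 512-point FFT) is an assumption, and `log_spectrogram` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def log_spectrogram(signal, sample_rate=16000, frame_ms=25.0, hop_ms=10.0, n_fft=512):
    """Return a (frames, n_fft // 2 + 1) log-magnitude spectrogram.

    Assumed settings only; the paper's actual front end is not given in the record.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)  # 400 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)      # 160 samples at 16 kHz
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")

    # Slice the waveform into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)
    frames = np.stack([
        signal[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])

    # rfft zero-pads each frame to n_fft points and keeps non-negative frequencies.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # A small floor avoids log(0) in silent frames.
    return np.log(magnitude + 1e-8)
```

For one second of a 1 kHz tone at 16 kHz, this yields a 98 × 257 array whose strongest bin in every frame is bin 32 (1000 Hz × 512 / 16000). Under this assumed setup, a CNN would then take fixed-size crops of such arrays as input; the crop length and network layout are not specified in this record.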


Classification: Pattern recognition

References cited (8):

D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Trans. Speech Audio Process., vol. 3, no. 1, pp. 72-83, 1995.
P. Kenny, G. Boulianne, P. Ouellet, and P. Dumouchel, "Speaker and session variability in GMM-based speaker verification," IEEE Trans. Audio Speech Lang. Process., vol. 15, no. 4, pp. 1448-1460, 2007.
N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification," IEEE Trans. Audio Speech Lang. Process., vol. 19, no. 4, pp. 788-798, 2011.
H. Lee, P. Pham, Y. Largman, and A. Y. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in Advances in Neural Information Processing Systems, vol. 22, pp. 1096-1104, 2009.
A. Nagrani, J. S. Chung, and A. Zisserman, "VoxCeleb: A large-scale speaker identification dataset," in Proc. Interspeech, 2017.
