A Study of Class Labels in Japanese End-to-End Speech Recognition

伊藤均; 萩原愛子; 一木麻乃; 三島剛; 佐藤庄衛; 小林彰夫

Article

J-GLOBAL ID: 201702200615650564 Reference number: 17A0979319


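Volume: 2017 Issue: SLP-117 Pages: Vol.2017-SLP-117, No.12, 1-6 (WEB ONLY) Publication date: July 20, 2017
JST material number: U0451A Material type: Proceedings (C)
Article type: Original paper Country of publication: Japan (JPN) Language: Japanese (JA)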

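This paper introduces class labels into the output labels of Japanese end-to-end speech recognition. End-to-end speech recognition is an approach that outputs characters directly, without going through a pronunciation dictionary, and studies of it have been reported mainly for English speech recognition. When this end-to-end approach is applied to Japanese, the large number of character types makes the parameter count far larger than for English, and with limited training data many characters have too few training samples, so training is difficult. For Japanese end-to-end speech recognition based on the connectionist temporal classification (CTC) criterion, this paper proposes a method that introduces a class model into the output labels, which reduces the number of parameters and at the same time resolves data sparsity, and that restores class labels to words with a language model. Introducing class labels into Japanese end-to-end speech recognition was confirmed to improve the speech recognition error rate. (Author abstract)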
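The effect of class labels on the size of the output label set can be illustrated with a minimal sketch. The abstract does not say how the classes are defined, so the frequency threshold used here is an assumption made for illustration, not the authors' method; the names `build_label_map`, `to_labels`, and `<CLASS>` are hypothetical. The language-model step that restores class labels to words is not shown.

```python
from collections import Counter


def build_label_map(transcripts, min_count, class_label="<CLASS>"):
    """Map each character to itself if it occurs at least min_count times,
    otherwise to one shared class label (assumed grouping rule)."""
    counts = Counter(ch for text in transcripts for ch in text)
    return {ch: (ch if n >= min_count else class_label) for ch, n in counts.items()}


def to_labels(text, label_map, class_label="<CLASS>"):
    """Convert a transcript into the label sequence used as the training target.
    Characters never seen in training also fall back to the class label."""
    return [label_map.get(ch, class_label) for ch in text]


# Toy training transcripts (invented for illustration only).
transcripts = ["今日は晴れです", "今日は雨です", "明日は晴れです"]

label_map = build_label_map(transcripts, min_count=2)
char_inventory = set(label_map)            # one output unit per character
label_inventory = set(label_map.values())  # output units after class merging

print(len(char_inventory), len(label_inventory))
print(to_labels("明日は雨です", label_map))
```

In this toy setting the two characters that occur only once share a single output unit, so the output layer needs fewer units and the shared unit sees more training samples than either character would alone.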


Pattern recognition, natural language processing, speech processing

