Investigation of Various Multi-task Learning for End-to-End Multi-Language Speech Recognition Model

早川友瑛; 西崎博光; 山本一公; 小林彰夫; 宇津呂武仁

Article

J-GLOBAL ID: 202002250231200465 Reference number: 20A2051018

Investigation of Various Multi-task Learning for End-to-End Multi-Language Speech Recognition Model


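Authors (5): 早川友瑛, 西崎博光, 山本一公, 小林彰夫, 宇津呂武仁
Source title:
Volume: 2020 Issue: Autumn Pages: ROMBUNNO.2-P1-1 Publication date: August 26, 2020
JST material number: G0381C ISSN: 1880-7658 Material type: Proceedings (C)
Article type: Short communication Country of publication: Japan (JPN) Language: Japanese (JA)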

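・Proposes multi-task learning for multilingual speech recognition within the framework of an End-to-End speech recognition system.
・Trains and evaluates multi-task models that combine a CTC-based End-to-End speech recognition system with a language identifier, a speaker identifier, and a speaking-style identifier.
・Covers Czech, English, French, German, Japanese, and Spanish as target languages, limits the training data for each language to about 30 hours or less, and demonstrates that speech recognition is feasible even with this small amount of data.
・The multi-task learning model outperforms the pretrained CTC model in recognition performance, and applying a GRL (Gradient Reversal Layer) to the auxiliary-task part is confirmed to improve recognition performance further.
・The multi-task learning model is comparable in recognition performance to the baseline DNN-HMM speech recognition system, confirming that multi-task learning is effective for multilingual speech recognition.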
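
The combination of a main CTC loss with auxiliary classifier losses, and the effect of a gradient reversal layer on the shared encoder, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the record does not state the loss weights, the value of the reversal coefficient, or which auxiliary tasks the GRL was applied to, so the function names, the weights, `lam`, and the choice of reversed task below are all hypothetical. The GRL behaviour itself (identity in the forward pass, sign-flipped and scaled gradient in the backward pass) follows the usual formulation from Ganin and Lempitsky.

```python
from typing import Dict, List, Set


def grl_forward(features: List[float]) -> List[float]:
    """Forward pass of a gradient reversal layer: the identity."""
    return list(features)


def grl_backward(grad_output: List[float], lam: float) -> List[float]:
    """Backward pass: flip the sign of the incoming gradient and scale by lam."""
    return [-lam * g for g in grad_output]


def multitask_loss(
    ctc_loss: float,
    aux_losses: Dict[str, float],
    aux_weights: Dict[str, float],
) -> float:
    """Weighted sum of the main CTC loss and the auxiliary task losses.

    The weights are an assumption; the record does not give them.
    """
    return ctc_loss + sum(aux_weights[name] * loss for name, loss in aux_losses.items())


def shared_encoder_gradient(
    grad_ctc: List[float],
    grads_aux: Dict[str, List[float]],
    aux_weights: Dict[str, float],
    lam: float,
    reversed_tasks: Set[str],
) -> List[float]:
    """Combine the per-task gradients that reach the shared encoder output.

    Auxiliary tasks listed in reversed_tasks pass through the GRL, so their
    gradient pushes the encoder to become less informative about that label.
    """
    total = list(grad_ctc)
    for name, grad in grads_aux.items():
        if name in reversed_tasks:
            grad = grl_backward(grad, lam)
        weight = aux_weights[name]
        total = [t + weight * g for t, g in zip(total, grad)]
    return total


if __name__ == "__main__":
    # Toy numbers only; they are not results from the paper.
    loss = multitask_loss(
        2.0,
        {"language": 1.0, "speaker": 0.5},
        {"language": 0.5, "speaker": 0.5},
    )
    grad = shared_encoder_gradient(
        [1.0, -2.0],
        {"speaker": [0.5, 0.5]},
        {"speaker": 1.0},
        lam=0.5,
        reversed_tasks={"speaker"},
    )
    print(loss, grad)
```

In an actual system the gradients would come from automatic differentiation through the encoder and the classifier heads; the lists here stand in for those tensors only to make the sign flip visible.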


Pattern recognition

References (3):

Alex Graves, Santiago Fernández, Faustino J. Gomez, and Jürgen Schmidhuber. “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks,” Proc. of ICML, 2006.
Yaroslav Ganin and Victor Lempitsky. “Unsupervised Domain Adaptation by Backpropagation,” Proc. of ICML, 2015.
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, and Karel Vesely. “The Kaldi Speech Recognition Toolkit,” Proc. of ASRU, 4 pages, 2011.
