The Application and Improvement of Deep Neural Networks in Environmental Sound Recognition [JST / Kyoto University machine translation]

Lin Yu-Kai; Su Mu-Chun; Hsieh Yi-Zeng

Article

J-GLOBAL ID: 202102232495976479 Reference number: 21A0577780

The Application and Improvement of Deep Neural Networks in Environmental Sound Recognition

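Volume: 10 Issue: 17 Page: 5965 Publication year: 2020
JST material number: U7135A ISSN: 2076-3417 Material type: Serial publication (A)
Article type: Original paper Country of publication: Switzerland (CHE) Language: English (EN)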

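Neural networks have achieved strong results in sound recognition, and many different kinds of acoustic features have been tried as training inputs to these networks. However, it remains an open question whether a neural network can efficiently extract features from raw audio signal input. This study improves a raw-signal-input network from other research by using a deeper network architecture, so that the raw signal is analyzed better in the proposed network. The authors also discuss several kinds of network settings; with a spectrogram-like transformation, their network reaches 73.55% accuracy on the open audio dataset Environmental Sound Classification 50 (ESC50). The study further proposes a network architecture that can combine different kinds of networks fed with different features. With the help of global pooling, a flexible fusion method is integrated into the network. The experiments successfully combine two different networks with different audio feature inputs (raw audio signal and log-mel spectrum). With the above settings, the proposed ParallelNet finally reaches 81.55% accuracy on ESC50, which also matches the human recognition level. Copyright 2021 The Author(s) All rights reserved. Translated from English into Japanese by JST. [JST / Kyoto University machine translation]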
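The fusion idea in the abstract can be illustrated with a minimal sketch. It assumes global average pooling and simple concatenation; the record does not state which pooling or merge operation the paper uses, and every name, shape, and value below is hypothetical. The point it shows is why global pooling makes fusion flexible: each branch is reduced to one value per channel, so branches whose feature maps differ in length can still be joined.

```python
# Illustrative sketch only: names, shapes, and values are assumptions,
# not the architecture described in the paper.

def global_average_pool(feature_map):
    """Collapse a [channels][positions] feature map to one mean per channel."""
    return [sum(channel) / len(channel) for channel in feature_map]


def fuse_branches(raw_branch_map, logmel_branch_map):
    """Pool each branch independently, then concatenate the pooled vectors."""
    return global_average_pool(raw_branch_map) + global_average_pool(logmel_branch_map)


# Hypothetical branch outputs with different sizes:
# raw-signal branch: 2 channels x 4 positions
# log-mel branch:    3 channels x 2 positions
raw_branch_map = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0]]
logmel_branch_map = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]

fused = fuse_branches(raw_branch_map, logmel_branch_map)
print(fused)  # one value per channel across both branches
```

In a real model the fused vector would feed a classifier layer; that part is omitted here because the record gives no details about it.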

Pattern recognition

Cited references (41):

Chen, J.; Cham, A.H.; Zhang, J.; Liu, N.; Shue, L. Bathroom Activity Monitoring Based on Sound. In Proceedings of the International Conference on Pervasive Computing, Munich, Germany, 8-13 May 2005.
Weninger, F.; Schuller, B. Audio Recognition in the Wild: Static and Dynamic Classification on a Real-World Database of Animal Vocalizations. In Proceedings of the Acoustics, Speech and Signal Processing (ICASSP) 2011 IEEE International Conference, Prague, Czech Republic, 22-27 May 2011.
Clavel, C.; Ehrette, T.; Richard, G. Events detection for an audio-based Surveillance system. In Proceedings of the ICME 2005 IEEE International Conference Multimedia and Expo., Amsterdam, The Netherlands, 6-8 July 2005.
Bugalho, M.; Portelo, J.; Trancoso, I.; Pellegrini, T.; Abad, A. Detecting Audio Events for Semantic Video search. In Proceedings of the Tenth Annual Conference of the International Speech Communication Association, Brighton, UK, 6-9 September 2009.
Mohamed, A.-R.; Hinton, G.; Penn, G. Understanding how deep Belief Networks Perform Acoustic Modelling. In Proceedings of the Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference, Kyoto, Japan, 23 April 2012.
