日本語文法誤り訂正における事前学習済みモデルを用いたデータ増強

加藤秀佳; 岡部格明; 北野道春; 宿久洋

Article

J-GLOBAL ID: 202302277993569157 Reference number: 23A2118158

Data Augmentation Using Pretrained Models in Japanese Grammatical Error Correction

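Authors (4): 加藤秀佳; 岡部格明; 北野道春; 宿久洋
Journal title:
Volume: 38 Issue: 4 Pages: A-L41_1-10 (J-STAGE) Publication year: 2023
JST material number: U0128A ISSN: 1346-8030 Material type: Serial publication (A)
Article type: Original paper Country of publication: Japan (JPN) Language: Japanese (JA)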

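Grammatical error correction (GEC) is commonly framed as a machine translation task that converts ungrammatical sentences into grammatically correct ones. The task requires large amounts of parallel data consisting of pairs of ungrammatical and grammatical sentences. For Japanese GEC, however, only a limited number of large-scale parallel datasets are available. Data augmentation (DA), which generates pseudo-parallel data, has therefore been studied actively. Many previous studies have focused on generating ungrammatical sentences rather than grammatical ones. To address this problem, this study proposes the BERT-DA algorithm, a DA algorithm that uses a pretrained BERT model to generate correct sentences. The experiments focus on two factors: the source data and the amount of generated data. Taking these factors into account proved to make BERT-DA more effective. Based on evaluation results across multiple domains, the BERT-DA model outperformed existing systems in terms of Max Match and GLEU⁺. (Translated author abstract)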

Artificial intelligence, Information processing in general, Mathematical linguistics, Systems and control theory in general

References (33):

[浅野 18] 浅野広樹, 水本智也, 乾健太郎: 文法性・流暢性・意味保存性に基づく文法誤り訂正の参照無し評価, 自然言語処理, Vol. 25, No. 5, pp. 555-576 (2018)
[Bryant 15] Bryant, C. and Ng, H. T.: How far are we from fully automatic high quality grammatical error correction?, in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 697-707 (2015)
[Chollampatt 18] Chollampatt, S. and Ng, H. T.: A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction, in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (2018)
[Dahlmeier 12] Dahlmeier, D. and Ng, H. T.: Better Evaluation for Grammatical Error Correction, in Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 568-572, Montréal, Canada (2012)
[Devlin 19] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171-4186, Minneapolis, Minnesota (2019)
