Automated toxicity classification on social media using LSTM and word embedding [JST/Kyoto University machine translation]

Alsharef Ahmad; Aggarwal Karan; Sonia; Koundal Deepika; Alyami Hashem; Ameyed Darine

J-GLOBAL ID: 202202224302563722  Reference number: 22A1037875

An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding

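Authors: 6 (listed above)
Source title:
Volume: 2022  Pages: not given  Publication year: 2022
JST material number: U7694A  ISSN: 1687-5265  Material type: Serial publication (A)
Article type: Original paper  Country of publication: United Kingdom (GBR)  Language: English (EN)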

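Automated identification of toxicity in text is an important area of text analysis, because social media is replete with unfiltered content that ranges from mildly abusive to downright hateful. Researchers have found unintended bias and unfairness caused by training datasets, which led to inaccurate classification of toxic words in context. This paper assesses and presents several approaches for locating toxicity in text, with the aim of enhancing the overall quality of text classification. General unsupervised methods were used, relying on state-of-the-art models and external embeddings, to improve accuracy while mitigating bias and increasing the F1 score. The proposed approaches combine a long short-term memory (LSTM) deep learning model with GloVe word embeddings, and an LSTM with word embeddings generated by Bidirectional Encoder Representations from Transformers (BERT). These models were trained and tested on large secondary qualitative data containing a large number of comments classified as toxic or not. The results show that the LSTM with BERT word embeddings achieved an acceptable accuracy of 94% and an F1 score of 0.89 in the binary classification of comments (toxic and nontoxic). The combination of LSTM and BERT performed better than both the LSTM alone and the LSTM with GloVe word embeddings. The paper attempts to solve the problem of classifying comments with high accuracy by using models that draw on a larger corpus of text (high-quality word embeddings) rather than on the training data alone. Copyright 2022 Ahmad Alsharef et al. Translated from English into Japanese by JST. [JST/Kyoto University machine translation]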
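
The abstract reports both accuracy and an F1 score for the binary toxic/nontoxic task. The following is a minimal sketch of how these two metrics are computed from predicted and true labels. The function name `binary_metrics` and the toy labels are hypothetical, chosen only for illustration; they are not the paper's data or code.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1, treating label 1 as 'toxic'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # toxic, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # nontoxic, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # toxic, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nontoxic, passed

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels: 3 toxic comments out of 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data such as this toy example, accuracy can exceed F1 because correctly passing the many nontoxic comments raises accuracy without affecting precision or recall on the toxic class. This is one reason for reporting both figures.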

Natural language processing, Artificial intelligence

Cited references (43):

B. Van Aken, J. Risch, A. Krestel, A. Löser, "Challenges for toxic comment classification: An in-depth error analysis," 2018, https://arxiv.org/abs/1809.07572.
D. Borkan, L. Dixon, J. Sorensen, N. Thain, L. Vasserman, "Nuanced metrics for measuring unintended bias with real data for text classification," Proceedings of the Companion Proceedings of the 2019 World Wide Web Conference, pp. 491-500, San Francisco, CA, USA, May 2019.
F. Ahmadi, Sonia, G. Gupta, S. R. Zahra, P. Baglat, P. Thakur, "Multi-factor biometric authentication approach for fog computing to ensure security perspective," Proceedings of the 2021 8th International Conference on Computing for Sustainable Global Development (INDIACom), pp. 172-176, IEEE, New Delhi, India, March 2021.
Data.world, "Twitter," https://data.world/marcusyyy/twitter.
A. Onan, "Topic-enriched word embeddings for sarcasm identification," Advances in Intelligent Systems and Computing, pp. 293-304, Springer, New York, NY, USA, 2019.
