Dynamic Synonym Candidates Extraction for Searching Documents in a Corpus (コーパス検索支援のための動的同義語候補抽出)

Minoru Yoshida (吉田稔); Hiroshi Nakagawa (中川裕志); Akira Terada (寺田昭)

Article

J-GLOBAL ID: 201002255364342919  Reference number: 10A0089002

Dynamic Synonym Candidates Extraction for Searching Documents in a Corpus


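Authors (3): Minoru Yoshida, Hiroshi Nakagawa, Akira Terada
Journal title:
Volume: 25  Issue: 1  Pages: 122-132 (J-STAGE)  Publication year: 2010
JST material number: U0128A  ISSN: 1346-8030  Material type: Serial publication (A)
Article type: Original paper  Country of publication: Japan (JPN)  Language: Japanese (JA)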

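Web document collections contain many synonyms, so retrieving only the documents that contain the user's input query gives low coverage. This paper proposes a dynamic synonym extraction algorithm that, as a query input support system, presents synonym candidates for an arbitrary user query. The method is based on the assumption that "words with similar meanings are used in similar contexts" and uses a suffix array, an index structure for full-text search, to quickly retrieve the strings adjacent to the query string as its contexts. First, the set of adjacent strings obtained is represented as a trie, and the top N₁ contexts are selected by scoring them with a measure similar to TF-IDF (Term Frequency-Inverse Document Frequency). Next, the strings adjacent to those context strings are retrieved and represented as a trie, and the top N₂ candidates are taken as synonym candidates according to a score function that grows the more contexts a candidate adjoins. The algorithm was applied to a specific document collection and shown to be applicable to multiple languages, such as English, without any special preprocessing. Furthermore, in an experiment with a 7 Mbyte corpus, a response to one query was obtained in about 2 seconds, with extraction accuracy slightly lower than that of a conventional method based on the vector space model.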
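The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several simplifications are assumptions made here for brevity: the suffix array is built naively, contexts are fixed-length strings to the right of the query only, plain counters stand in for the tries, the context score is one possible TF-IDF-like formula (the abstract does not give the exact one), and a candidate is taken to be the space-delimited string just before a context. The names `build_suffix_array`, `find_occurrences`, `synonym_candidates`, `ctx_len`, `n1`, and `n2` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a toy corpus only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    # Two binary searches over the suffix array locate the block of
    # suffixes that start with `pattern`.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-char prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-char prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def synonym_candidates(text, query, ctx_len=8, n1=5, n2=3):
    sa = build_suffix_array(text)

    # Stage 1: collect the strings that follow the query and keep the top n1.
    tf = Counter()
    for pos in find_occurrences(text, sa, query):
        ctx = text[pos + len(query):pos + len(query) + ctx_len]
        if len(ctx) == ctx_len:
            tf[ctx] += 1

    def ctx_score(ctx):
        # Assumed TF-IDF-like score: frequent after the query, rare overall.
        total = len(find_occurrences(text, sa, ctx))
        return tf[ctx] * math.log(len(text) / total)

    contexts = sorted(tf, key=lambda c: (-ctx_score(c), c))[:n1]

    # Stage 2: collect the strings that precede each selected context and
    # score each one by the number of distinct contexts it adjoins.
    attached = defaultdict(set)
    for ctx in contexts:
        for pos in find_occurrences(text, sa, ctx):
            start = text.rfind(" ", 0, pos) + 1  # assumed candidate boundary
            cand = text[start:pos]
            if cand and cand != query:
                attached[cand].add(ctx)

    ranked = sorted(attached, key=lambda c: (-len(attached[c]), c))
    return [(c, len(attached[c])) for c in ranked[:n2]]


corpus = ("we drive a car to work . they drive a car to town . "
          "we drive an automobile to work . they drive an automobile to town .")
result = synonym_candidates(corpus, "car")
print(result)
```

On this toy corpus, "automobile" shares both right-hand contexts of "car" and is therefore ranked first. A real corpus would need a faster suffix array construction and variable-length contexts, which is where the tries mentioned in the abstract come in.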


General information processing, Human-machine systems, Retrieval techniques

References (18):

[Chakrabarti 02] S. Chakrabarti: Mining the Web: Discovering Knowledge from Hypertext Data, Morgan-Kaufmann Publishers (2002)
[Collins 00] M. Collins: Discriminative Reranking for Natural Language Parsing, Proceedings of the Seventeenth International Conference on Machine Learning, pp. 175-182 (2000)
[Gasperin 01] C. Gasperin, P. Gamallo, A. Agustini, G. P. Lopes, and V. de Lima: Using Syntactic Contexts for Measuring Word Similarity, Proceedings of the ESSLLI'01 Workshop on Semantic Knowledge Acquisition and Categorisation (2001)
[Gorman 06] J. Gorman and J. R. Curran: Scaling distributional similarity to large corpora, Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL 2006), pp. 361-368 (2006)
[Grossi 00] R. Grossi and J. Vitter: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching, Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, pp. 397-406 (2000)
