文字列の高速類似度計算・検索のための要素技術について-系列分割手法-

磯上貞雄; 城所引泰; 村上登志男; 久保山哲二

文献

J-GLOBAL ID：201202255688660179 整理番号：12A1111355

文字列の高速類似度計算・検索のための要素技術について-系列分割手法-

出版者サイト {{ this.onShowPLink() }} 複写サービスで全文入手
高度な検索・分析はJDreamⅢで {{ this.onShowJLink("http://jdream3.com/lp/jglobal/index.html?docNo=12A1111355&from=J-GLOBAL&jstjournalNo=L5808A") }}

著者 (4件)： , , ,
資料名：
巻： 31 ページ： 123-128 発行年： 2010年12月
JST資料番号： L5808A ISSN： 0913-4514 資料種別：逐次刊行物 (A)
記事区分：短報発行国：日本 (JPN) 言語：日本語 (JA)

2つの文字列間の非類似度や共通部分を計算する方法としてWagnerとFischerによるアルゴリズムが知られている。しかし,このアルゴリズムは,Web上の大規模なデータや,数百万塩基対の遺伝子配列等の大規模な文字列に対しては計算量が大きく適用が困難である。そこで,より高速な近似編集距離計算や類似パターン検索の要素技術の1つとして,「局所一致性」を保証した系列分割手法が使われるようになってきた。これは,乱数を使って構造の対称性を崩して分割を行うことにより,もともと指数個あつた分割候補から瞬時に1つの候補を選べるようにして,劇的な高速化を実現したものである。本稿では,この系列分割アルゴリズムの既存研究について概観し,遺伝子配列からの類似部分列の発見や,文章からの剽窃発見等の大規模な文字列への応用を目指している,高速な検索手法の開発のための展望について述べる。

, , ,
,

検索技術

引用文献 (9件)：

[1] T. Batu, F. Ergiin, and S. C. Sahinalp. Oblivious string embeddings and edit distance approximations. In Proc. of 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 792-801, 2006.
[2] R. Cole and U. Vishkin. Deterministic coin tossing with applications to optimal parallel list ranking. Information and Control, 70(1):32-53, 1986.
[3] G. Cormode and S. Muthukrishnan. The string edit distance matching problem with moves. In SODA, pages 667-676, 2002.
[4] G. Cormode and S. Muthukrishnan. The string edit distance matching problem with moves. ACM Transactions on Algorithms, 3(1), 2007.
[5] M. Crochemore, G. M. Landau, and M. Ziv-Ukelson. A sub-quadratic sequence alignment algorithm for unrestricted cost matrices. In Proc. of 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 679-688, 2002.

, , , , , ,

前のページに戻る