J-GLOBAL ID: 201002240508220999   Reference number: 10A0089012

Finding and Classifying Near-Duplicate Pages based on Identical Sentences Detection

同一文抽出に基づく類似ページの検出と分類
Author (3):
Material:
Volume: 25  Issue:  Page: 224-232 (J-STAGE)  Publication year: 2010
JST Material Number: U0128A  ISSN: 1346-8030  Document type: Article
Article type: Original paper  Country of issue: Japan (JPN)  Language: Japanese (JA)

JST classification (2):
Other information processing, Retrieval technology
Reference (13):
  • [BarYossef 07] Bar-Yossef, Z., Keidar, I., and Schonfeld, U.: Do Not Crawl in the DUST: Different URLs with Similar Text, in Proceedings of WWW2007, pp. 111--120 (2007)
  • [Broder 93] Broder, A. Z.: Some applications of Rabin's fingerprinting method, in Sequences II: Methods in Communications, Security, and Computer Science, pp. 143--152 (1993)
  • [Broder 97] Broder, A. Z., Glassman, S. C., Manasse, M. S., and Zweig, G.: Syntactic clustering of the Web, in Proceedings of the 6th International Conference on World Wide Web, pp. 1157--1166 (1997)
  • [Charikar 02] Charikar, M. S.: Similarity estimation techniques from rounding algorithms, in STOC '02: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pp. 380--388 (2002)
  • [Henzinger 06] Henzinger, M.: Finding Near-Duplicate Web Pages: A Large-Scale Evaluation of Algorithms, in Proceedings of 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 284--291 (2006)