Sciweavers

» Identifying Content Blocks from Web Documents
ADC
2009
Springer
16 years 1 month ago
Ranking-Constrained Keyword Sequence Extraction from Web Documents
Given a large volume of Web documents, we consider the problem of finding the shortest keyword sequences for each of the documents such that a keyword sequence can be rendered to a g...
Ding-Yi Chen, Xue Li, Jing Liu, Xia Chen
KDD
2006
ACM
16 years 6 months ago
Understanding Content Reuse on the Web: Static and Dynamic Analyses
In this paper we present static and dynamic studies of duplicate and near-duplicate documents on the Web. The static and dynamic studies involve the analysis of similar c...
Ricardo A. Baeza-Yates, Álvaro R. Pereira J...
WIDM
2004
ACM
15 years 12 months ago
Stylistic and lexical co-training for web block classification
Many applications that use web data extract information from a limited number of regions on a web page. As such, web page division into blocks and the subsequent block classifica...
Chee How Lee, Min-Yen Kan, Sandra Lai
DOCENG
2008
ACM
15 years 8 months ago
Interactive office documents: a new face for web 2.0 applications
As the world wide web transforms from a vehicle of information dissemination and e-commerce transactions into a writable nexus of human collaboration, the Web 2.0 technologies at ...
John M. Boyer
ERCIMDL
2006
Springer
15 years 10 months ago
Design and Selection Criteria for a National Web Archive
Web archives and Digital Libraries are conceptually similar, as they both store and provide access to digital contents. The process of loading documents into a Digital Library usua...
Daniel Gomes, Sérgio Freitas, Mário ...