Sciweavers

» Improved CHAID algorithm for document structure modelling
ISAAC
2005
Springer
Improved Algorithms for Largest Cardinality 2-Interval Pattern Problem
The 2-Interval Pattern problem is to find the largest constrained pattern in a set of 2-intervals. The constrained pattern is a subset of the given 2-intervals such that ...
Hao Yuan, Linji Yang, Erdong Chen
SIGMOD
2009
ACM
Robust web extraction: an approach based on a probabilistic tree-edit model
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
Nilesh N. Dalvi, Philip Bohannon, Fei Sha
CIKM
2009
ACM
Effective and efficient structured retrieval
Search engines that support structured documents typically support structure created by the author (e.g., title, section), and may also support structure added by an annotation pr...
Le Zhao, Jamie Callan
CIKM
2009
ACM
Improving web page classification by label-propagation over click graphs
In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled si...
Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffn...
TREC
2004
Language Models for Searching in Web Corpora
We describe our participation in the TREC 2004 Web and Terabyte tracks. For the web track, we employ mixture language models based on document full-text, incoming anchortext, and...
Jaap Kamps, Gilad Mishne, Maarten de Rijke