PAPER DIGEST
Most Influential WWW 2002 Paper · 2026-03 edition

A Flexible Learning System For Wrapping Tables And Lists In HTML Documents

William W. Cohen; Matthew Hurst; Lee S. Jensen

Venue
ACM Web Conference (WWW) 2002
Recognition
Most Influential WWW 2002 Paper (Rank No. 15)
Edition
2026-03
Impact factor
6
Certificate ID
d7ecd4641e45935f

Abstract

A program that makes an existing website look like a database is called a wrapper. Wrapper learning is the problem of learning website wrappers from examples. We present a wrapper-learning system called WL² that can exploit several different representations of a document. Examples of such representations include DOM-level and token-level representations, as well as two-dimensional geometric views of the rendered page (for tabular data) and representations of the visual appearance of text as it will be rendered. Additionally, the learning system is modular and can be easily adapted to new domains and tasks. The learning system described is part of an "industrial-strength" wrapper management system that is in active use at WhizBang Labs. Controlled experiments show that the learner has broader coverage and a faster learning rate than earlier wrapper-learning systems.
