PAPER DIGEST
Most Influential CIKM 2001 Paper · 2026-03 edition

Using LSI For Text Classification In The Presence Of Background Text

Sarah Zelikovitz; Haym Hirsh

Venue
ACM Conference on Information and Knowledge Management (CIKM) 2001
Recognition
Most Influential CIKM 2001 Paper (Rank No. 11)
Edition
2026-03
Impact factor
4
Certificate ID
aa2f8f60ae4f820a

Abstract

This paper presents work that uses Latent Semantic Indexing (LSI) for text classification. Beyond relying on labeled training data, we improve classification accuracy by also using unlabeled data and other forms of available "background" text in the classification process. Rather than performing LSI's singular value decomposition (SVD) solely on the training data, we use an expanded term-by-document matrix that includes both the labeled data and any available, relevant background text. We report the performance of this approach on data sets with and without the inclusion of the background text, and compare our work to other efforts that can incorporate unlabeled data and other background text in the classification process.
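The central idea, running the SVD over a term-by-document matrix that contains the labeled documents together with background text, can be illustrated with a small sketch. This is not the authors' implementation. It assumes raw term counts with no weighting scheme, a toy whitespace tokenizer, folding documents into the reduced space as d^T U_k, and a nearest-neighbour rule based on cosine similarity. All function names (`build_matrix`, `lsi_basis`, `project`, `classify`) are hypothetical.

```python
import numpy as np
from collections import Counter


def tokenize(text):
    # Hypothetical toy tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_matrix(docs, vocab=None):
    # Term-by-document matrix of raw counts (rows = terms, columns = docs).
    # When vocab is given, terms outside it are ignored.
    if vocab is None:
        terms = sorted({t for d in docs for t in tokenize(d)})
        vocab = {t: i for i, t in enumerate(terms)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t, c in Counter(tokenize(d)).items():
            if t in vocab:
                A[vocab[t], j] = c
    return A, vocab


def lsi_basis(train_docs, background_docs, k):
    # SVD over the *expanded* matrix: labeled docs plus background text.
    A, vocab = build_matrix(list(train_docs) + list(background_docs))
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k], vocab


def project(docs, U_k, vocab):
    # Fold documents into the k-dimensional space: d -> d^T U_k.
    D, _ = build_matrix(docs, vocab)
    return D.T @ U_k


def cosine(X, Y):
    # Pairwise cosine similarity between rows of X and rows of Y.
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-12)
    return Xn @ Yn.T


def classify(test_docs, train_docs, train_labels, background_docs, k=2):
    U_k, vocab = lsi_basis(train_docs, background_docs, k)
    train_vecs = project(train_docs, U_k, vocab)
    test_vecs = project(test_docs, U_k, vocab)
    sims = cosine(test_vecs, train_vecs)
    # Assumed nearest-neighbour rule: take the label of the most similar training doc.
    return [train_labels[i] for i in sims.argmax(axis=1)]


if __name__ == "__main__":
    train = ["physics quantum particle", "stock market trading"]
    labels = ["physics", "finance"]
    background = ["quantum particle energy physics", "market stock price trading"]
    print(classify(["energy particle", "price"], train, labels, background))
    # -> ['physics', 'finance']
```

In this toy run, "price" never occurs in the labeled training documents. It enters the vocabulary and becomes associated with the finance terms only through the background document, so without the background text it would project to a zero vector. This illustrates, in miniature, why adding background text to the matrix before the SVD can help.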
