PAPER DIGEST
Most Influential SIGIR 2005 Paper · 2026-03 edition

Multi-label Informed Latent Semantic Indexing

Kai Yu; Shipeng Yu; Volker Tresp

Venue
ACM SIGIR Conference (SIGIR) 2005
Recognition
Most Influential SIGIR 2005 Paper (Rank No. 10)
Edition
2026-03
Impact factor
5
Certificate ID
5036f0b9de8e7cf0

Abstract

Latent semantic indexing (LSI) is a well-known unsupervised approach for dimensionality reduction in information retrieval. However, if output information (i.e., category labels) is available, it is often beneficial to derive the indexing not only from the inputs but also from the target values in the training data set. This is of particular importance in applications with multiple labels, in which each document can belong to several categories simultaneously. In this paper, we introduce the multi-label informed latent semantic indexing (MLSI) algorithm, which preserves the information in the inputs while also capturing the correlations between the multiple outputs. The recovered "latent semantics" thus incorporate the human-annotated category information and can be used to greatly improve prediction accuracy. An empirical study on two data sets, Reuters-21578 and RCV1, demonstrates very encouraging results.
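To make the contrast between unsupervised LSI and a label-informed projection concrete, here is a minimal sketch. It assumes NumPy, a dense documents-by-terms matrix X and a binary documents-by-labels matrix Y. `lsi` is standard LSI via truncated SVD. `label_informed_lsi` and its `beta` trade-off are hypothetical illustrations of combining input and label information. They are not the MLSI objective or algorithm from the paper.

```python
import numpy as np

def lsi(X, k):
    """Plain LSI: project documents onto the top-k right singular vectors of X.

    X is a (documents x terms) matrix. Returns (Z, V), where V (terms x k)
    has orthonormal columns and Z = X @ V holds k-dimensional document vectors.
    """
    # Rows of Vt are right singular vectors, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T
    return X @ V, V

def label_informed_lsi(X, Y, k, beta=0.5):
    """Hypothetical label-informed variant (NOT the paper's exact MLSI).

    Takes the top-k eigenvectors of a blend of the term covariance X^T X
    (input information) and X^T Y Y^T X (how term directions co-vary with
    the labels). Each part is divided by its trace so that beta in [0, 1]
    weighs them on a common scale.
    """
    A = X.T @ X
    B = X.T @ Y @ Y.T @ X
    M = (1 - beta) * A / np.trace(A) + beta * B / np.trace(B)
    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so reverse the columns to take the k leading eigenvectors.
    _, vecs = np.linalg.eigh(M)
    V = vecs[:, ::-1][:, :k]
    return X @ V, V
```

With `beta=0` the sketch spans the same subspace as plain LSI (eigenvectors may differ in sign). Increasing `beta` tilts the projection toward term directions that are predictive of the multiple labels jointly. This mirrors, in simplified form, the abstract's idea of preserving input information while capturing correlations between outputs.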
