PAPER DIGEST
Most Influential CIKM 2007 Paper · 2026-03 edition

Spam Filtering For Short Messages

Gordon V. Cormack; José María Gómez Hidalgo; Enrique Puertas Sánz

Venue
ACM Conference on Information and Knowledge Management (CIKM) 2007
Recognition
Most Influential CIKM 2007 Paper (Rank No. 14)
Edition
2026-03
Impact factor
4
Certificate ID
018dd9f33146edef

Abstract

We consider the problem of content-based spam filtering for short text messages that arise in three contexts: mobile (SMS) communication, blog comments, and email summary information such as might be displayed by a low-bandwidth client. Short messages often consist of only a few words, and therefore present a challenge to traditional bag-of-words based spam filters. Using three corpora of short messages and message fields derived from real SMS, blog, and spam messages, we evaluate feature-based and compression-model-based spam filters. We observe that bag-of-words filters can be improved substantially using different features, while compression-model filters perform quite well as-is. We conclude that content filtering for short messages is surprisingly effective.
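The two filter families contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Character n-grams are assumed here as one example of "different features" that stay informative when a message has only a few words, and zlib stands in for a compression model, since the abstract does not name a specific compressor. The function names (char_ngrams, extra_bytes, classify) and the toy corpora are hypothetical.

```python
import zlib


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, lowercased."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def extra_bytes(corpus, message):
    """Bytes added to the compressed corpus when message is appended.

    A small increase means the message resembles text already in the corpus.
    zlib is an assumed stand-in for a real compression model.
    """
    base = len(zlib.compress(corpus.encode("utf-8"), 9))
    joined = len(zlib.compress((corpus + " " + message).encode("utf-8"), 9))
    return joined - base


def classify(message, spam_corpus, ham_corpus):
    """Label the message with the class whose corpus compresses it more cheaply."""
    spam_cost = extra_bytes(spam_corpus, message)
    ham_cost = extra_bytes(ham_corpus, message)
    return "spam" if spam_cost < ham_cost else "ham"


# Toy corpora, invented for illustration only.
SPAM = ("WIN a FREE prize now! Call 0800 to claim your FREE prize. "
        "Txt WIN to 80085 for a FREE ringtone.")
HAM = ("Are we still meeting for lunch tomorrow? I will be home late tonight. "
       "Can you pick up the kids?")

if __name__ == "__main__":
    print(char_ngrams("free", 2))
    print(classify("claim your FREE prize now", SPAM, HAM))
    print(classify("be home late tonight", SPAM, HAM))
```

The compression-based classifier needs no tokenization or feature engineering, which is consistent with the abstract's observation that compression-model filters work well as-is, whereas the n-gram function shows the kind of feature change a bag-of-words filter would need.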
