PAPER DIGEST
Most Influential SIGIR 2013 Paper · 2026-03 edition

Sumblr: Continuous Summarization Of Evolving Tweet Streams

Lidan Shou; Zhenhua Wang; Ke Chen; Gang Chen

Venue: ACM SIGIR Conference (SIGIR) 2013
Recognition: Most Influential SIGIR 2013 Paper (Rank No. 10)
Edition: 2026-03
Impact factor: 4
Certificate ID: f360f3deae37fd24

Abstract

With the explosive growth of microblogging services, short-text messages (also known as tweets) are being created and shared at an unprecedented rate. Tweets, in their raw form, can be incredibly informative but also overwhelming. For both end-users and data analysts, it is a nightmare to plow through millions of tweets that contain enormous amounts of noise and redundancy. In this paper, we study continuous tweet summarization as a solution to this problem. While traditional document summarization methods focus on static and small-scale data, we aim to deal with dynamic, quickly arriving, and large-scale tweet streams. We propose a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams. We first propose an online tweet stream clustering algorithm to cluster tweets and maintain distilled statistics called Tweet Cluster Vectors (TCVs). Then we develop a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations. Finally, we describe a topic evolvement detection method, which consumes online and historical summaries to produce timelines automatically from tweet streams. Our experiments on large-scale real tweets demonstrate the efficiency and effectiveness of our approach.
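
The online clustering step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the `ClusterSummary` fields, the cosine-similarity assignment rule, the whitespace tokenizer, and the 0.3 threshold are all assumptions made for illustration, and the paper's actual Tweet Cluster Vector definition and TCV-Rank are not reproduced here. The sketch only shows the general idea of keeping compact per-cluster statistics instead of the raw tweets.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ClusterSummary:
    """Compact statistics for one cluster (an assumed stand-in for a TCV)."""
    term_sum: Counter = field(default_factory=Counter)  # summed term counts
    count: int = 0                                      # tweets absorbed
    time_sum: float = 0.0                               # sum of timestamps

    def add(self, terms, timestamp):
        self.term_sum.update(terms)
        self.count += 1
        self.time_sum += timestamp

    def mean_time(self):
        return self.time_sum / self.count


class StreamClusterer:
    """Assigns each arriving tweet to the most similar cluster, or opens a new one."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed value, not taken from the paper
        self.clusters = []

    def add(self, text, timestamp):
        terms = tokenize(text)
        best_index, best_sim = -1, 0.0
        for index, cluster in enumerate(self.clusters):
            sim = cosine(cluster.term_sum, terms)
            if sim > best_sim:
                best_index, best_sim = index, sim
        if best_index == -1 or best_sim < self.threshold:
            # No sufficiently similar cluster: start a new one.
            self.clusters.append(ClusterSummary())
            best_index = len(self.clusters) - 1
        self.clusters[best_index].add(terms, timestamp)
        return best_index
```

Each call to `StreamClusterer.add` touches only the cluster summaries, so memory grows with the number of clusters rather than the number of tweets. A real system would also need to merge or expire old clusters, which this sketch omits.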
