PAPER DIGEST
Most Influential SIGIR 2017 Paper · 2026-03 edition

Attentive Collaborative Filtering: Multimedia Recommendation With Item- And Component-Level Attention

Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, Tat-Seng Chua

Venue
ACM SIGIR Conference (SIGIR) 2017
Recognition
Most Influential SIGIR 2017 Paper (Rank No. 2)
Edition
2026-03
Impact factor
8
Certificate ID
bff4cddbbb68eb94

Abstract

Multimedia content is dominating today's Web information. The nature of multimedia user-item interactions is 1/0 binary implicit feedback (<i>e.g.</i>, photo likes, video views, song downloads, etc.), which can be collected at a larger scale with a much lower cost than explicit feedback (<i>e.g.</i>, product ratings). However, the majority of existing collaborative filtering (CF) systems are not well-designed for multimedia recommendation, since they ignore the implicitness in users' interactions with multimedia content. We argue that, in multimedia recommendation, there exists <i>item</i>- and <i>component-level</i> implicitness which blurs the underlying users' preferences. The item-level implicitness means that users' preferences on items (<i>e.g.</i> photos, videos, songs, etc.) are unknown, while the component-level implicitness means that inside each item users' preferences on different components (<i>e.g.</i> regions in an image, frames of a video, etc.) are unknown. For example, a 'view' on a video does not provide any specific information about how the user likes the video (<i>i.e.</i> item-level) and which parts of the video the user is interested in (<i>i.e.</i> component-level). In this paper, we introduce a novel <i>attention</i> mechanism in CF to address the challenging item- and component-level implicit feedback in multimedia recommendation, dubbed Attentive Collaborative Filtering (ACF). Specifically, our attention model is a neural network that consists of two attention modules: the component-level attention module, starting from any content feature extraction network (<i>e.g.</i> CNN for images/videos), which learns to select informative components of multimedia items, and the item-level attention module, which learns to score the item preferences. ACF can be seamlessly incorporated into classic CF models with implicit feedback, such as BPR and SVD++, and efficiently trained using SGD. Through extensive experiments on two real-world multimedia Web services: Vine and Pinterest, we show that ACF significantly outperforms state-of-the-art CF methods.
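The abstract describes the two attention modules only at a high level. The pure-Python sketch below is a hedged illustration of how they could fit together, not the authors' implementation. Every modeling detail is an assumption: additive attention with a ReLU, component attention conditioned on the user, an SVD++-style user vector formed as the user factor plus an attention-weighted sum of per-item auxiliary vectors, inner-product scoring, and a BPR pairwise loss. Bias terms and the SGD update are omitted. Component features (for example CNN region or frame features) are taken as given. All function and parameter names (`component_attention`, `user_representation`, `init_params`, `Wu`, `Wv`, `Wp`, `Wx`, `w`) are hypothetical.

```python
import math
import random

# Tiny vector helpers (pure Python, no NumPy).
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # W is a list of rows; computes W @ x.
    return [dot(row, x) for row in W]

def add(*vecs):
    return [sum(vals) for vals in zip(*vecs)]

def scale(c, v):
    return [c * x for x in v]

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys, w):
    # Assumed additive attention: score_k = w . relu(query + key_k), then softmax.
    return softmax([dot(w, relu(add(query, k))) for k in keys])

def component_attention(u, comps, P):
    """Hypothetical component-level module: user-conditioned weights over one
    item's component features, returned with the weighted average of them."""
    keys = [matvec(P["Wx"], x) for x in comps]
    beta = attention_weights(matvec(P["Wu"], u), keys, P["w"])
    pooled = [0.0] * len(comps[0])
    for b, x in zip(beta, comps):
        pooled = add(pooled, scale(b, x))
    return pooled, beta

def user_representation(u, history, P):
    """Hypothetical item-level module. `history` holds (v_l, p_l, components_l)
    for each interacted item; returns u + sum_l alpha_l * p_l and the alphas."""
    query = matvec(P["item"]["Wu"], u)
    keys = []
    for v, p, comps in history:
        pooled, _ = component_attention(u, comps, P["comp"])
        keys.append(add(matvec(P["item"]["Wv"], v),
                        matvec(P["item"]["Wp"], p),
                        matvec(P["item"]["Wx"], pooled)))
    alpha = attention_weights(query, keys, P["item"]["w"])
    rep = list(u)
    for a, (_, p, _) in zip(alpha, history):
        rep = add(rep, scale(a, p))
    return rep, alpha

def score(u, history, v_j, P):
    rep, _ = user_representation(u, history, P)
    return dot(rep, v_j)

def bpr_loss(u, history, v_pos, v_neg, P):
    # BPR: -log sigmoid(score(pos) - score(neg)), computed in a numerically stable form.
    diff = score(u, history, v_pos, P) - score(u, history, v_neg, P)
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

def init_params(d, f, k, rng):
    # d: latent factor size, f: component feature size, k: attention size.
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    def vec(n):
        return [rng.gauss(0.0, 0.1) for _ in range(n)]
    return {
        "comp": {"Wu": mat(k, d), "Wx": mat(k, f), "w": vec(k)},
        "item": {"Wu": mat(k, d), "Wv": mat(k, d), "Wp": mat(k, d),
                 "Wx": mat(k, f), "w": vec(k)},
    }

if __name__ == "__main__":
    # Usage on random toy data: one user, two past items with three components each.
    rng = random.Random(0)
    d, f, k = 4, 6, 3
    P = init_params(d, f, k, rng)
    rand = lambda n: [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = rand(d)
    history = [(rand(d), rand(d), [rand(f) for _ in range(3)]) for _ in range(2)]
    rep, alpha = user_representation(u, history, P)
    print("item attention:", [round(a, 3) for a in alpha])
    print("BPR loss:", round(bpr_loss(u, history, rand(d), rand(d), P), 4))
```

In a full system these parameters, together with the user and item factors, would be learned by minimizing the BPR loss with SGD. The sketch stops at the forward pass and the loss so that it stays short and dependency-free.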
