Multi-modal Graph Contrastive Learning for Micro-video Recommendation

Zixuan Yi; Xi Wang; Iadh Ounis; Craig Macdonald

Venue: ACM SIGIR Conference (SIGIR) 2022
Recognition: Most Influential SIGIR 2022 Paper (Rank No. 9)
Edition: 2026-03
Impact factor: 4
Certificate ID: e1d86419367c2f89

Abstract

Recently micro-videos have become more popular in social media platforms such as TikTok and Instagram. Engagements in these platforms are facilitated by multi-modal recommendation systems. Indeed, such multimedia content can involve diverse modalities, often represented as visual, acoustic, and textual features to the recommender model. Existing works in micro-video recommendation tend to unify the multi-modal channels, thereby treating each modality with equal importance. However, we argue that these approaches are not sufficient to encode item representations with multiple modalities, since the used methods cannot fully disentangle the users' tastes on different modalities. To tackle this problem, we propose a novel learning method named Multi-Modal Graph Contrastive Learning (MMGCL), which aims to explicitly enhance multi-modal representation learning in a self-supervised learning manner. In particular, we devise two augmentation techniques to generate the multiple views of a user/item: modality edge dropout and modality masking. Furthermore, we introduce a novel negative sampling technique that allows to learn the correlation between modalities and ensures the effective contribution of each modality. Extensive experiments conducted on two micro-video datasets demonstrate the superiority of our proposed MMGCL method over existing state-of-the-art approaches in terms of both recommendation performance and training convergence speed.

Download PDF certificate