PAPER DIGEST
Most Influential ICML 2009 Paper · 2026-03 edition

Identifying Suspicious URLs: An Application Of Large-scale Online Learning

Justin Ma; Lawrence K. Saul; Stefan Savage; Geoffrey M. Voelker

Venue
International Conference on Machine Learning (ICML) 2009
Recognition
Most Influential ICML 2009 Paper (Rank No. 9)
Edition
2026-03
Impact factor
7
Certificate ID
807e4168fb15d0b8

Abstract

This paper explores online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. We show that this application is particularly appropriate for online algorithms as the size of the training data is larger than can be efficiently processed in batch and because the distribution of features that typify malicious URLs is changing continuously. Using a real-time system we developed for gathering URL features, combined with a real-time source of labeled URLs from a large Web mail provider, we demonstrate that recently-developed online algorithms can be as accurate as batch techniques, achieving classification accuracies up to 99% over a balanced data set.
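To make the online setting concrete, here is a minimal sketch of a classifier that sees one labeled URL at a time, predicts, and then updates its weights. It is an illustration under stated assumptions, not the authors' system: it uses a plain perceptron chosen for brevity (not necessarily one of the algorithms the paper evaluates), a hypothetical bag-of-tokens scheme for lexical features, and made-up URLs and labels. Host-based features are omitted here.

```python
import re


def lexical_features(url):
    """Split a URL into lowercase tokens (hypothetical bag-of-words scheme)."""
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]
    return {f"tok={t}": 1.0 for t in tokens}


class OnlinePerceptron:
    """Mistake-driven linear classifier over sparse features; labels are +1 or -1."""

    def __init__(self):
        self.weights = {}
        self.bias = 0.0

    def score(self, features):
        return self.bias + sum(
            self.weights.get(name, 0.0) * value for name, value in features.items()
        )

    def predict(self, features):
        return 1 if self.score(features) > 0 else -1

    def update(self, features, label):
        """Predict first, then adjust weights only on a mistake. Returns the prediction."""
        prediction = self.predict(features)
        if prediction != label:
            for name, value in features.items():
                self.weights[name] = self.weights.get(name, 0.0) + label * value
            self.bias += label
        return prediction


# Toy labeled stream (made-up URLs): +1 = malicious, -1 = benign.
stream = [
    ("http://example.com/login/verify-account", 1),
    ("http://example.org/docs/index.html", -1),
    ("http://example.net/verify/paypal-update", 1),
    ("http://example.org/news/today", -1),
]

model = OnlinePerceptron()
mistakes = 0
for url, label in stream:
    # Each example is processed once and then discarded; no batch is stored.
    if model.update(lexical_features(url), label) != label:
        mistakes += 1
```

Because the model is updated per example, it never needs the full training set in memory and can follow a feature distribution that shifts over time, which are the two properties the abstract cites as reasons to prefer online algorithms.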
