PAPER DIGEST
Most Influential CIKM 2008 Paper · 2026-03 edition

BNS Feature Scaling: An Improved Representation Over TF-IDF For SVM Text Classification

George Forman

Venue
ACM Conference on Information and Knowledge Management (CIKM) 2008
Recognition
Most Influential CIKM 2008 Paper (Rank No. 13)
Edition
2026-03
Impact factor
5
Certificate ID
7a88b390c53b15de

Abstract

In the realm of machine learning for text classification, TF-IDF is the most widely used representation for real-valued feature vectors. However, IDF is oblivious to the training class labels and naturally scales some features inappropriately. We replace IDF with Bi-Normal Separation (BNS), which has been previously found to be excellent at ranking words for feature selection filtering. Empirical evaluation on a benchmark of 237 binary text classification tasks shows substantially better accuracy and F-measure for a Support Vector Machine (SVM) by using BNS scaling. A wide variety of other feature representations were later tested and found inferior, as were binary features with no scaling. Moreover, BNS scaling yielded better performance without feature selection, obviating the need for feature selection.
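The abstract contrasts label-oblivious IDF with label-aware BNS. The Python sketch below computes both weights on a toy labeled corpus so the difference is visible. It assumes the BNS definition from Forman's feature-selection work, |F⁻¹(tpr) − F⁻¹(fpr)|, where F⁻¹ is the standard normal inverse CDF and tpr and fpr are the fractions of positive and negative training documents that contain the word. The clipping bound of 0.0005, the "binary presence × weight" representation, the L2 normalization, and all function names (bns_weight, idf_weight, fit_weights, scale_binary) are assumptions made for illustration. This is not the paper's reference implementation.

```python
import math
from statistics import NormalDist

# Assumed clipping bound: keeps rates strictly inside (0, 1) so F^-1 stays finite.
EPS = 0.0005
_inv_cdf = NormalDist().inv_cdf  # standard normal quantile function, F^-1


def bns_weight(tp, fp, pos, neg, eps=EPS):
    """Bi-Normal Separation |F^-1(tpr) - F^-1(fpr)| (hypothetical helper).

    tp / fp: positive / negative training docs containing the word.
    pos / neg: total positive / negative training docs (both assumed > 0).
    """
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(_inv_cdf(tpr) - _inv_cdf(fpr))


def idf_weight(df, n_docs):
    """Classic IDF, log(N / df). Note that it never looks at class labels."""
    return math.log(n_docs / df)


def fit_weights(docs, labels):
    """Compute per-word BNS and IDF weights from docs (sets of words) and 0/1 labels."""
    pos = sum(labels)
    neg = len(labels) - pos
    vocab = {w for d in docs for w in d}
    bns, idf = {}, {}
    for w in vocab:
        tp = sum(1 for d, y in zip(docs, labels) if y and w in d)
        fp = sum(1 for d, y in zip(docs, labels) if not y and w in d)
        bns[w] = bns_weight(tp, fp, pos, neg)
        idf[w] = idf_weight(tp + fp, len(docs))
    return bns, idf


def scale_binary(doc_words, weights):
    """Binary presence features scaled by per-word weights, then L2-normalized (assumed)."""
    vec = {w: weights[w] for w in set(doc_words) if w in weights}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else vec


# Toy corpus: 1 = positive class (e.g. sports), 0 = negative class.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "team"}, {"vote", "law"}]
labels = [1, 1, 0, 0]
bns, idf = fit_weights(docs, labels)
print(idf["goal"], idf["team"])  # equal: each word occurs in 2 of 4 docs
print(bns["goal"], bns["team"])  # "team" is split evenly across classes
print(scale_binary({"goal", "team"}, bns))
```

Here "goal" and "team" have the same document frequency, so IDF weights them identically. BNS gives "team" a weight of 0 because it appears equally often in both classes. It gives the class-indicative words "goal" and "vote" equal, large weights. The scaled vectors would then be passed to an SVM, as in the abstract.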
