PAPER DIGEST
Most Influential CIKM 2002 Paper · 2026-03 edition

On Arabic Search: Improving The Retrieval Effectiveness Via A Light Stemming Approach

Mohammed Aljlayl; Ophir Frieder

Venue
ACM Conference on Information and Knowledge Management (CIKM) 2002
Recognition
Most Influential CIKM 2002 Paper (Rank No. 13)
Edition
2026-03
Impact factor
5
Certificate ID
b96bf6fed1546358

Abstract

The inflectional structure of a word affects the retrieval accuracy of information retrieval systems for Latin-based languages. We present two stemming algorithms for Arabic information retrieval systems. We first empirically investigate the effectiveness of surface-based retrieval. This approach degrades retrieval precision because Arabic is a highly inflected language. Accordingly, we propose root-based retrieval, and we observe a statistically significant improvement over the surface-based approach. However, many variant word senses share an identical root; thus, the root-based algorithm creates invalid conflation classes, which produce ambiguous queries that degrade performance by adding extraneous terms. To resolve this ambiguity, we propose a novel light-stemming algorithm for Arabic texts. This automatic, rule-based stemming algorithm is less aggressive than the root extraction algorithm. We show that the light-stemming algorithm significantly outperforms the root-based algorithm. We also show that a significant improvement in retrieval precision can be achieved with light inflectional analysis of Arabic words.
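To make the contrast between root extraction and light stemming concrete, here is a minimal Python sketch of a light stemmer that strips at most one prefix and one suffix instead of reducing a word to its root. The affix lists, the `MIN_STEM_LEN` guard, and the function names `light_stem` and `conflation_classes` are hypothetical choices for illustration only; they follow the general prefix/suffix-stripping idea the abstract describes, but they are not the authors' published rule set.

```python
from collections import defaultdict

# Illustrative affix lists (an ASSUMPTION, not the paper's rule set).
# Longer prefixes come first so "وال" (and-the) is tried before "و" (and).
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال", "و"]
SUFFIXES = ["ها", "ات", "ان", "ون", "ين", "ية", "ه", "ة", "ي"]
MIN_STEM_LEN = 2  # hypothetical guard: never leave fewer than 2 letters


def light_stem(word: str) -> str:
    """Strip at most one prefix, then at most one suffix; no root extraction."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[: -len(suffix)]
            break
    return word


def conflation_classes(words):
    """Group surface forms that reduce to the same light stem."""
    classes = defaultdict(set)
    for w in words:
        classes[light_stem(w)].add(w)
    return dict(classes)


if __name__ == "__main__":
    # "the library", "library", "and the library", "libraries",
    # "book", "the book", "writer" (all derived from the root k-t-b)
    words = ["المكتبة", "مكتبة", "والمكتبة", "مكتبات", "كتاب", "الكتاب", "كاتب"]
    for stem, forms in conflation_classes(words).items():
        print(stem, sorted(forms))
```

On these seven words the sketch yields three conflation classes: `مكتب` (the library forms), `كتاب` (the book forms), and `كاتب` (writer). A root-based stemmer would typically reduce all seven to the single root ك ت ب (k-t-b), which illustrates the kind of over-broad conflation class that adds extraneous terms to a query.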
