Probabilistic Document Indexing From Relevance Feedback Data
Abstract
Based on the binary independence indexing model, we apply three new concepts to probabilistic document indexing from relevance feedback data:
- Abstraction from specific terms and documents, which overcomes the limited relevance information available for parameter estimation.
- Flexibility of the representation, which allows new text analysis and knowledge-based methods to be integrated into our approach, and also allows more complex document structures or different types of terms (e.g. single words and noun phrases) to be considered.
- Probabilistic learning or classification methods for estimating the indexing weights, which make better use of the available relevance information.

We give experimental results for five test collections that show improvements over other indexing methods.
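To illustrate what abstraction from specific terms and documents can mean for parameter estimation, the following is a minimal sketch, not the method used in this work. It assumes that each (term, document) pair is mapped to a hypothetical "relevance description" (here a capped within-document frequency and a title flag). Relevance judgments are then pooled over all pairs that share a description, so a weight can be estimated even for terms that never occurred in the feedback data. The feature choice, the additive smoothing, and the names `relevance_description` and `estimate_weights` are all illustrative assumptions. The simple counting estimator stands in for the more elaborate probabilistic learning methods mentioned above.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical abstract representation of a (term, document) pair.
Description = Tuple[Hashable, ...]


def relevance_description(tf: int, in_title: bool) -> Description:
    """Map a term occurrence in a document to an abstract description.

    The features (term frequency capped at 3, and whether the term appears
    in the title) are illustrative assumptions, not the paper's features.
    """
    tf_bucket = min(tf, 3)  # 1, 2, or "3 or more" occurrences
    return (tf_bucket, in_title)


def estimate_weights(feedback: Iterable[Tuple[Description, bool]],
                     alpha: float = 1.0) -> Dict[Description, float]:
    """Estimate P(relevant | description) from pooled relevance feedback.

    Each item is (description of a query term in a judged document,
    whether that document was judged relevant). Counts are pooled over
    all terms and documents sharing a description. Additive smoothing
    with parameter alpha keeps sparse estimates away from 0 and 1.
    """
    relevant = defaultdict(int)
    total = defaultdict(int)
    for desc, is_relevant in feedback:
        total[desc] += 1
        if is_relevant:
            relevant[desc] += 1
    return {d: (relevant[d] + alpha) / (total[d] + 2 * alpha) for d in total}


# Usage: five hypothetical judged (term, document) pairs.
feedback = [
    (relevance_description(1, False), False),
    (relevance_description(1, False), True),
    (relevance_description(4, True), True),
    (relevance_description(3, True), True),
    (relevance_description(3, True), False),
]
weights = estimate_weights(feedback)

# A new, unseen term occurring 5 times in a title shares the description
# (3, True), so it receives that description's pooled weight; fall back
# to 0.5 for descriptions never seen in the feedback data.
new_weight = weights.get(relevance_description(5, True), 0.5)
```

Because the estimate is attached to the description rather than to a particular term or document, the feedback judged for some terms also informs the weights of other terms whose occurrences look alike under the chosen features.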