Incorporating structured text retrieval into the extended Boolean model

Authors

  • Mathys C. du Plessis
  • Gideon de V. de Kock

Abstract

Conventional information retrieval models are inappropriate for use in databases containing semi-structured biographical data. This article presents a hybrid algorithm that effectively addresses many of the problems in searching biographical databases. An overview of applicable structured text retrieval algorithms is given, with a specific focus on the tree matching model. Small adaptations to the Extended Boolean Model that make it more applicable to biographical databases are described. The adaptation of tree matching models to the hierarchical nature of data in a person record is described, and a distance function between query and record is defined. A hybrid of the Extended Boolean Model and the adapted Tree Matching Model is then presented. Two ranking algorithms are given: a fast one appropriate for general searches and a more effective (but more resource-intensive) one for more advanced searches. It is shown how dates can be incorporated into the hybrid model to create a more powerful search algorithm. The hybrid algorithm can be used to rank records in descending order of relevance to a user's query.

Published

2012-01-26

How to Cite

du Plessis, M. C., & de Kock, G. de V. (2012). Incorporating structured text retrieval into the extended Boolean model. Computing and Informatics, 28(5), 581–597. Retrieved from http://www.cai.sk/ojs/index.php/cai/article/view/50