A MapReduce Based Distributed LSI for Scalable Information Retrieval

Authors

  • Yang Liu, School of Electrical Engineering and Information, Sichuan University
  • Maozhen Li, School of Engineering and Design, Brunel University, Uxbridge, UB8 3PH
  • Mukhtaj Khan, School of Engineering and Design, Brunel University, Uxbridge, UB8 3PH
  • Man Qi, Department of Computing, Canterbury Christ Church University, Canterbury, Kent, CT1 1QU

Keywords

Information retrieval, latent semantic indexing, MapReduce, load balancing, genetic algorithms

Abstract

Latent Semantic Indexing (LSI) has been widely used in information retrieval because it handles polysemy and synonymy effectively. However, LSI is computationally intensive, owing to the complexity of the singular value decomposition and filtering operations it involves. This paper presents MR-LSI, a MapReduce-based distributed LSI algorithm for scalable information retrieval. The performance of MR-LSI is evaluated first in a small-scale experimental cluster environment and then in large-scale simulation environments. By partitioning the dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computing nodes, MR-LSI reduces its overhead significantly while maintaining a high level of accuracy in retrieving documents of user interest. A genetic-algorithm-based load balancing scheme is designed to optimize the performance of MR-LSI in heterogeneous computing environments, in which the computing nodes have varied resources.
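
To illustrate the synonymy handling and the singular value decomposition cost mentioned in the abstract, here is a minimal single-machine sketch of textbook LSI in Python. It is not the MR-LSI algorithm: there is no MapReduce partitioning and no genetic load balancing. The toy term-document matrix, the function names (`top_k_right_singular`, `lsi_scores`) and the choice of k = 2 are assumptions made for illustration, and power iteration with deflation stands in for a library SVD routine so that the example has no dependencies.

```python
import math

# Toy term-document matrix (rows = terms, columns = documents).
# All data here is illustrative and does not come from the paper.
TERMS = ["car", "automobile", "engine", "flower", "petal"]
A = [
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def top_k_right_singular(A, k, iters=500):
    """Top-k singular values and right singular vectors of A.

    Uses power iteration with deflation on the Gram matrix A^T A
    (a simple stand-in for a library SVD).
    """
    n_docs = len(A[0])
    cols = [[row[j] for row in A] for j in range(n_docs)]
    G = [[dot(ci, cj) for cj in cols] for ci in cols]  # A^T A
    sigmas, vs = [], []
    for _ in range(k):
        v = normalize([1.0] * n_docs)
        for _ in range(iters):
            v = normalize([dot(row, v) for row in G])
        lam = dot(v, [dot(row, v) for row in G])  # Rayleigh quotient
        sigmas.append(math.sqrt(lam))
        vs.append(v)
        # Deflate: remove the component just found before the next pass.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n_docs)]
             for i in range(n_docs)]
    return sigmas, vs


def lsi_scores(A, terms, query_terms, k=2):
    """Cosine similarity between a query and each document in the k-dim latent space."""
    sigmas, vs = top_k_right_singular(A, k)
    q = [1.0 if t in query_terms else 0.0 for t in terms]
    # Left singular vectors: u_i = A v_i / sigma_i.
    us = [[dot(row, v) / s for row in A] for s, v in zip(sigmas, vs)]
    # Fold the query into the latent space: q_hat_i = (q . u_i) / sigma_i.
    q_hat = [dot(q, u) / s for s, u in zip(sigmas, us)]
    q_norm = math.sqrt(dot(q_hat, q_hat))
    scores = []
    for j in range(len(A[0])):
        d = [v[j] for v in vs]  # latent coordinates of document j
        scores.append(dot(q_hat, d) / (q_norm * math.sqrt(dot(d, d))))
    return scores


scores = lsi_scores(A, TERMS, {"car"})
print([round(s, 3) for s in scores])
```

In this toy corpus the query "car" scores close to 1 against both the first document and the second, even though the second contains only "automobile", because the two terms share the latent dimension built from their common co-occurrence with "engine". The two flower documents score close to 0. The decomposition step is the part whose cost grows with corpus size, which is the motivation the abstract gives for distributing the computation.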

Published

2014-06-24

How to Cite

Liu, Y., Li, M., Khan, M., & Qi, M. (2014). A MapReduce Based Distributed LSI for Scalable Information Retrieval. COMPUTING AND INFORMATICS, 33(2), 259–280. Retrieved from https://www.cai.sk/ojs/index.php/cai/article/view/995
