Revisiting Multiple Pattern Matching

Authors

  • Robert Susik Institute of Applied Computer Science, Lodz University of Technology, 90 924 Lódź
  • Szymon Grabowski Institute of Applied Computer Science, Lodz University of Technology, 90 924 Lódź
  • Kimmo Fredriksson School of Computing, University of Eastern Finland, Kuopio

Keywords:

Text algorithms, pattern matching, multiple pattern matching, q-grams, bit-parallelism, compressed pattern matching

Abstract

We consider the classical exact multiple string matching problem. The proposed solution is based on a combination of a few ideas: using q-grams instead of single characters, pattern superimposition, bit-parallelism and alphabet size reduction. We discuss the pros and cons of various alternatives to achieve the possibly best combination of techniques. The main contribution of this paper are different alphabet mapping methods that allow to reduce memory requirements and use larger q-grams. The experimental results show that the presented algorithm is competitive in most practical cases. One of the tests shows also that tailoring our scheme to search over a byte-encoded text results in speedups in comparison to searching over a plain text.

Downloads

Download data is not yet available.

Author Biographies

Robert Susik, Institute of Applied Computer Science, Lodz University of Technology, 90 924 Lódź

Robert Susik received his M.Sc.Eng. degree in Lodz University of Technology in 2012. During the study he participated in the student exchange programme (Erasmus) and studied at the University of Eastern Finland. Currently, he is a Ph.D. student at the Institute of Applied Computer Science of Lodz University of Technology. He is interested in data processing, data archiving and string matching algorithms.

Szymon Grabowski, Institute of Applied Computer Science, Lodz University of Technology, 90 924 Lódź

Szymon Grabowski received his M.Sc. degree in University of Lodz in 1996, Ph.D. degree in AGH University of Science and Technology in Cracow in 2003, and the habilitation degree in Systems Research Institute of Polish Academy of Sciences in Warsaw in 2011. His former research, including Ph.D. dissertation, involved nearest neighbor classification methods in pattern recognition, also with applications in image processing. Currently, his main interests are focused in string matching and text indexing algorithms, and data compression. Some of his particular research topics include various approximate string matching problems, compressed text indexes, and XML compression. He has published about 120 papers in journals and conferences. He is currently a professor at Institute of Applied Computer Science of Lodz University of Technology.

Kimmo Fredriksson, School of Computing, University of Eastern Finland, Kuopio

Kimmo Fredriksson received his computer science M.Sc. and Ph.D. degrees in University of Helsinki in 1997 and 2001, respectively. His research interests include wide variety of string matching problems as well as indexing techniques for searching in metric spaces. He has published about 60 papers in these topics in international conferences and journals. He has the position of adjunct professor in the University of Eastern Finland.

Downloads

Published

2019-12-30

How to Cite

Susik, R., Grabowski, S., & Fredriksson, K. (2019). Revisiting Multiple Pattern Matching. COMPUTING AND INFORMATICS, 38(4), 937–962. Retrieved from https://www.cai.sk/ojs/index.php/cai/article/view/2019_4_937