Lightweight Fingerprints for Fast Approximate Keyword Matching Using Bitwise Operations

Aleksander Cisłak; Szymon Grabowski

doi:10.31577/cai_2019_2_367

Authors

Aleksander Cisłak Lodz University of Technology, Institute of Applied Computer Science, 90–924 Lódź, Poland
Szymon Grabowski Lodz University of Technology, Institute of Applied Computer Science, 90–924 Lódź, Poland

DOI:

https://doi.org/10.31577/cai_2019_2_367

Keywords:

Fingerprint, keyword matching, approximate matching, bitwise

Abstract

We aim to speed up approximate keyword matching with the use of a lightweight, fixed-size block of data for each string, called a fingerprint. These work in a similar way to hash values; however, they can be also used for matching with errors. They store information regarding symbol occurrences using individual bits, and they can be compared against each other with a constant number of bitwise operations. In this way, certain strings can be deduced to be at least within the distance k from each other (using Hamming or Levenshtein distance) without performing an explicit verification. We show experimentally that for a preprocessed collection of strings, fingerprints can provide substantial speedups for k = 1, namely over 2.5 times for the Hamming distance and over 30 times for the Levenshtein distance. Tests were conducted on synthetic and real-world English and URL data.

Downloads

Download data is not yet available.

Lightweight Fingerprints for Fast Approximate Keyword Matching Using Bitwise Operations

Authors

DOI:

Keywords:

Abstract

Downloads

Downloads

Published

How to Cite

Issue

Section

Most read articles by the same author(s)

Information

Make a Submission

Keywords