Streamed Data Analysis Using Adaptable Bloom Filter

Amritpal Singh; Shalini Batra

Authors

Amritpal Singh, Department of Computer Science and Engineering, Thapar University, Patiala, Punjab
Shalini Batra, Department of Computer Science and Engineering, Thapar University, Patiala, Punjab

Keywords:

Bloom filter, partition hashing, double hashing, Kalman filter

Abstract

With the emergence of a plethora of web applications and technologies such as sensors, IoT, and cloud computing, the number of data-generating sources has increased exponentially. Stream processing requires real-time analytics of data in motion, performed in a single pass. This paper proposes a framework for hourly analysis of streamed data using a Bloom filter, a probabilistic data structure, in which hashing combines double hashing and partition hashing, leading to fewer inter-hash-function collisions and lower computational overhead. When the size of the incoming data is not known, a static Bloom filter suffers a high collision rate if the data volume is large and wastes storage space if it is small. In such cases it is difficult to determine the optimal Bloom filter parameters (the filter size m and the number of hash functions k) in advance, so a target false positive threshold (f_p) cannot be guaranteed. To accommodate growing data, a major requirement is that the filter size m grow dynamically. A Kalman filter is used to predict the array size of the Bloom filter. Experiments show that the proposed Adaptable Bloom Filter (ATBF) efficiently supports peak-hour analysis and server utilization analysis, and reduces the time and space required for querying dynamic datasets.
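The abstract does not give the exact hashing scheme or Kalman model used in ATBF, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the common form of double hashing, g_i(x) = h1(x) + i*h2(x), with each of the k hash functions confined to its own partition of the bit array, a one-dimensional random-walk Kalman filter for the hourly item count, and the standard Bloom filter sizing formulas for m and k. All class names, function names, noise parameters, and the sample hourly counts are hypothetical and chosen for illustration.

```python
import hashlib
import math


def optimal_params(n, fp):
    """Standard Bloom filter sizing: bits m and hash count k for n items."""
    m = math.ceil(-n * math.log(fp) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


class PartitionedBloomFilter:
    """Bloom filter with k disjoint partitions indexed via double hashing."""

    def __init__(self, m, k):
        self.k = k
        self.part = math.ceil(m / k)          # slots per partition
        self.bits = bytearray(self.part * k)  # one byte per slot, for clarity

    def _indexes(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, never zero
        for i in range(self.k):
            # g_i(x) = h1 + i*h2, mapped into partition i only
            yield i * self.part + (h1 + i * h2) % self.part

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def __contains__(self, item):
        return all(self.bits[idx] for idx in self._indexes(item))


class ScalarKalman:
    """One-dimensional Kalman filter with a random-walk state model."""

    def __init__(self, x0, p0=1e6, q=1e4, r=1e4):
        # x: estimate, p: estimate variance, q: process noise, r: measurement noise
        self.x, self.p, self.q, self.r = x0, p0, q, r

    def update(self, z):
        self.p += self.q                   # predict: uncertainty grows
        gain = self.p / (self.p + self.r)  # Kalman gain
        self.x += gain * (z - self.x)      # correct with measurement z
        self.p *= (1 - gain)
        return self.x                      # doubles as next-step prediction


# Hypothetical hourly item counts (made-up numbers, not from the paper).
hourly_counts = [1000, 1200, 1100, 1300]
kf = ScalarKalman(x0=hourly_counts[0])
for count in hourly_counts[1:]:
    predicted_n = kf.update(count)

# Size the next hour's filter from the predicted count and a target f_p.
m, k = optimal_params(math.ceil(predicted_n), fp=0.01)
bf = PartitionedBloomFilter(m, k)
bf.add("192.168.0.1")
print("192.168.0.1" in bf)
```

Because each hash function writes only to its own partition, two different hash functions can never collide on the same slot, and only one digest is computed per item regardless of k. The predicted count feeds directly into the sizing formulas, which is one plausible way a filter could be re-dimensioned each hour.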
