Data De-Duplication with Adaptive Chunking and Accelerated Modification Identifying

Xingjun Zhang; Guofeng Zhu; Endong Wang; Scott Fowler; Xiaoshe Dong

Authors

Xingjun Zhang Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710049
Guofeng Zhu Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710049
Endong Wang Inspur (Beijing) Electronic Information Industry Co. Ltd., 100085, Beijing
Scott Fowler Department of Science and Technology, Linköping University, Campus Norrköping, SE-601 74
Xiaoshe Dong Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710049

Keywords:

Data de-duplication, self-adaptive, FastCDC

Abstract

A data de-duplication system should achieve not only a high de-duplication rate (the aggregate reduction in storage requirements gained from de-duplication) but also a high de-duplication speed. To address the arbitrary parameter settings of Content Defined Chunking (CDC), a self-adaptive data chunking algorithm is proposed. The algorithm improves the de-duplication rate by running a preliminary de-duplication pass over samples of the classified files and then selecting appropriate algorithm parameters. Meanwhile, FastCDC, a fast content-based data chunking algorithm, is adopted to address the low de-duplication speed of CDC. By introducing a de-duplication factor and an acceleration factor, FastCDC can significantly increase de-duplication speed with little loss in de-duplication rate when these two parameters are tuned. Experimental results show that the proposed method improves the de-duplication rate by about 5%, while FastCDC increases de-duplication speed by 50% to 200% at a cost of less than 3% in de-duplication rate.
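The abstract does not spell out the chunking or parameter-selection procedure. As an illustration only, the Python sketch below shows generic content-defined chunking with a Gear rolling hash, the sub-minimum cut-point skipping that FastCDC uses to gain speed, a de-duplication rate measured as the fraction of bytes saved, and a hypothetical sampling-based parameter choice. The function names `chunk`, `dedup_rate` and `select_params`, the candidate (min, avg, max) size triples, and the random Gear table are all assumptions. This is not the authors' algorithm, and it does not model the paper's de-duplication factor or acceleration factor.

```python
import hashlib
import random

# ASSUMPTION: a 256-entry table of random 64-bit values for the Gear rolling hash.
# FastCDC uses such a table; the seed and values here are arbitrary, not the paper's.
_GEAR_RNG = random.Random(2024)
GEAR = [_GEAR_RNG.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def chunk(data: bytes, min_size: int, avg_size: int, max_size: int) -> list[bytes]:
    """Split data into content-defined chunks using a Gear rolling hash.

    A cut point is declared when the top log2(avg_size) bits of the hash are zero,
    which happens with probability of roughly 1/avg_size per hashed byte.
    """
    if avg_size <= 0 or avg_size & (avg_size - 1):
        raise ValueError("avg_size must be a power of two")
    if not 0 < min_size <= avg_size <= max_size:
        raise ValueError("need 0 < min_size <= avg_size <= max_size")
    shift = 64 - (avg_size.bit_length() - 1)  # keep only the top log2(avg_size) bits
    chunks = []
    start, n = 0, len(data)
    while start < n:
        end = min(start + max_size, n)  # hard upper bound on chunk length
        cut = end
        h = 0
        # Cut-point skipping (as in FastCDC): no hashing inside the first min_size bytes.
        for i in range(start + min_size, end):
            h = ((h << 1) + GEAR[data[i]]) & MASK64
            if (h >> shift) == 0:
                cut = i + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks


def dedup_rate(files: list[bytes], min_size: int, avg_size: int, max_size: int) -> float:
    """Fraction of bytes saved: 1 - (bytes of unique chunks) / (total input bytes)."""
    seen = set()
    total = stored = 0
    for data in files:
        total += len(data)
        for c in chunk(data, min_size, avg_size, max_size):
            fp = hashlib.sha1(c).digest()  # chunk fingerprint
            if fp not in seen:
                seen.add(fp)
                stored += len(c)
    return 1.0 - stored / total if total else 0.0


def select_params(samples: list[bytes], candidates: list[tuple]) -> tuple:
    """HYPOTHETICAL self-adaptive step: trial-deduplicate the sample files with each
    candidate (min, avg, max) triple and keep the one with the highest rate."""
    return max(candidates, key=lambda p: dedup_rate(samples, *p))


if __name__ == "__main__":
    rng = random.Random(0)
    base = bytes(rng.getrandbits(8) for _ in range(100_000))
    # A second "version" with a small insertion; CDC boundaries typically resynchronize after it.
    edited = base[:1000] + b"inserted bytes" + base[1000:]
    candidates = [(256, 1024, 4096), (1024, 4096, 16384), (4096, 16384, 65536)]
    best = select_params([base, edited], candidates)
    print("selected (min, avg, max):", best,
          "rate:", round(dedup_rate([base, edited], *best), 3))
```

Because hashing starts only after `min_size` bytes, the expected chunk length in this sketch is roughly `min_size + avg_size` (capped at `max_size`), and the skipped bytes are where the speed-up comes from. Larger minimum sizes save more hashing work but coarsen chunk boundaries. That is the same speed versus de-duplication-rate trade-off the abstract describes.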
