New Algorithm for Clustering Distributed Data Using k-Means

Authors

  • Ahmed M. Khedr, Computer Science Department, Faculty of Sciences, University of Sharjah, Sharjah, UAE & Mathematics Department, Faculty of Science, Zagazig University, Zagazig
  • Raj K. Bhatnagar, School of Electronic and Computing Systems, University of Cincinnati, Cincinnati, 45221

Keywords:

Data privacy, decomposable algorithm, k-means clustering, vertically and horizontally distributed data

Abstract

The internet era and high-speed networks have made large amounts of geographically distributed data readily accessible. Individuals, businesses, and governments recognize the value of this resource to those who can transform the data into information. These databases, though valuable individually, become significantly more valuable when they function as parts of a federated database and their data can be aggregated for collective mining or computation. This requires algorithms that work efficiently with federated databases rather than with single databases. In this paper, we propose a new decomposable version of the popular k-means clustering algorithm that works in this manner with a set of networked databases. We show that the global computation can be performed in a reasonably secure manner for either horizontally or vertically distributed databases, and that it is completed by exchanging only a few local summaries among the databases. An empirical and analytical validation of our results is also presented.
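
To make the idea of "exchanging only a few local summaries" concrete, the following is a minimal sketch for the horizontally distributed case, where each site holds different rows described by the same attributes. It is not the algorithm from the paper: it only illustrates the standard observation that a k-means centroid update decomposes into per-site sums and counts. All function names (`local_summary`, `merge_summaries`, `distributed_kmeans`) and the toy data are hypothetical, and the sketch makes no attempt at the privacy guarantees or the vertically distributed case discussed in the paper.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Summary = Tuple[List[List[float]], List[int]]  # (per-cluster sums, per-cluster counts)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for j, c in enumerate(centroids):
        dist = sum((p - q) ** 2 for p, q in zip(point, c))
        if dist < best_dist:
            best, best_dist = j, dist
    return best


def local_summary(points: Sequence[Point], centroids: Sequence[Point]) -> Summary:
    """Runs at one site: assign local rows and return only sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = nearest(p, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def merge_summaries(summaries: Sequence[Summary],
                    centroids: Sequence[Point]) -> List[List[float]]:
    """Runs at a coordinator: combine the site summaries into new centroids."""
    k, dim = len(centroids), len(centroids[0])
    new_centroids = []
    for j in range(k):
        total = sum(counts[j] for _, counts in summaries)
        if total == 0:
            # Assumption of this sketch: an empty cluster keeps its old centroid.
            new_centroids.append(list(centroids[j]))
        else:
            new_centroids.append(
                [sum(sums[j][d] for sums, _ in summaries) / total for d in range(dim)]
            )
    return new_centroids


def distributed_kmeans(sites: Sequence[Sequence[Point]],
                       centroids: Sequence[Point],
                       iterations: int = 10) -> List[List[float]]:
    """Fixed number of rounds; raw rows never leave the site that holds them."""
    current = [list(c) for c in centroids]
    for _ in range(iterations):
        summaries = [local_summary(site, current) for site in sites]
        current = merge_summaries(summaries, current)
    return current


# Toy data: three sites, two well-separated groups of points.
sites = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 10.0), (10.0, 12.0)],
    [(0.0, 1.0), (10.0, 11.0)],
]
result = distributed_kmeans(sites, [(0.0, 0.0), (10.0, 10.0)], iterations=5)
```

In this sketch each site sends k sums and k counts per round rather than its rows, and the merged centroids are the same as those a single k-means update would produce on the pooled data, since a mean over the union is the total sum divided by the total count.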

How to Cite

Khedr, A. M., & Bhatnagar, R. K. (2015). New Algorithm for Clustering Distributed Data Using k-Means. COMPUTING AND INFORMATICS, 33(4), 943–964. Retrieved from https://www.cai.sk/ojs/index.php/cai/article/view/2809