New Algorithm for Clustering Distributed Data Using k-Means

Ahmed M. Khedr; Raj K. Bhatnagar

Authors

Ahmed M. Khedr, Computer Science Department, Faculty of Sciences, University of Sharjah, Sharjah, UAE & Mathematics Department, Faculty of Science, Zagazig University, Zagazig
Raj K. Bhatnagar, School of Electronic and Computing Systems, University of Cincinnati, Cincinnati, 45221

Keywords:

Data privacy, decomposable algorithm, k-means clustering, vertically and horizontally distributed data

Abstract

The internet era and high-speed networks have made large amounts of geographically distributed data readily accessible. Individuals, businesses, and governments recognize the value of this resource to those who can transform the data into information. These databases, though valuable individually, become significantly more valuable when they function as parts of a federated database whose data can be aggregated for collective mining or computation. This requires algorithms to shift their focus from working with a single database to working efficiently with federated databases. In this paper, we propose a new decomposable version of the popular k-means clustering algorithm that works in this manner with a set of networked databases. We show that global computation can be performed in a reasonably secure manner for either horizontally or vertically distributed databases. The computation is completed by exchanging only a few local summaries among the databases. An empirical and analytical validation of our results is also presented.
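
To make the idea of exchanging local summaries concrete, the following is a minimal sketch for the horizontally distributed case only. It is not the authors' algorithm, which the abstract does not specify. It assumes each site holds complete records for a disjoint subset of the data, and that each site sends only per-cluster coordinate sums and counts to a coordinator. The function names (`local_summary`, `merge_summaries`, `distributed_kmeans`) are hypothetical, and no privacy mechanism beyond withholding the raw records is modeled.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Summary = Tuple[List[List[float]], List[int]]  # (per-cluster sums, per-cluster counts)


def local_summary(points: Sequence[Point], centroids: Sequence[Point]) -> Summary:
    """Run at one site: assign local points to the nearest centroid and
    return only per-cluster coordinate sums and counts (no raw records)."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the centroid with the smallest squared Euclidean distance.
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def merge_summaries(summaries: Sequence[Summary],
                    centroids: Sequence[Point]) -> List[List[float]]:
    """Run at the coordinator: add up the site summaries and recompute centroids."""
    k, dim = len(centroids), len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in summaries:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    updated = []
    for j in range(k):
        if total_counts[j] == 0:
            # Assumption: an empty cluster keeps its previous centroid.
            updated.append(list(centroids[j]))
        else:
            updated.append([s / total_counts[j] for s in total_sums[j]])
    return updated


def distributed_kmeans(sites: Sequence[Sequence[Point]],
                       centroids: Sequence[Point],
                       iterations: int = 20) -> List[List[float]]:
    """Iterate: broadcast centroids, collect local summaries, merge, repeat."""
    current = [list(c) for c in centroids]
    for _ in range(iterations):
        summaries = [local_summary(site, current) for site in sites]
        updated = merge_summaries(summaries, current)
        if updated == current:  # centroids stopped moving
            break
        current = updated
    return current
```

Because sums and counts are additive across sites, the merged centroids in this sketch match what a centralized k-means update would compute on the pooled data from the same starting centroids. The vertically distributed case, where each site holds different attributes of the same records, needs a different decomposition and is not covered here.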

