Privacy Preserving Multi-server k-means Computation over Horizontally Partitioned Data

Ghosal, R and Chatterjee, S (2018) Privacy Preserving Multi-server k-means Computation over Horizontally Partitioned Data. In: 14th International Conference on Information Systems Security, ICISS 2018, 17 - 19 December 2018, Bangalore, pp. 189-208.

PDF
ICISS 2018_11281_189-208_2018.pdf - Published Version

Official URL: https://doi.org/10.1007/978-3-030-05171-6_10

Abstract

k-means is one of the most popular clustering algorithms in data mining. Much recent research has focused on running the algorithm when the data set is divided among multiple parties or is too large for the data owner to handle. In the latter case, servers are usually hired to perform the clustering: the data owner divides the data set among the servers, which together compute the k-means and return the cluster labels to the owner. The major challenge in this setting is to prevent the servers from gaining substantial information about the owner's actual data. Several earlier algorithms provide cryptographic solutions for privacy-preserving k-means. We propose a new method for performing k-means over a large data set using multiple servers. Our technique avoids heavy cryptographic computations and instead uses a simple randomization technique to preserve the privacy of the data. The k-means computed has essentially the same efficiency and accuracy as the k-means computed over the original data set without any randomization. We argue that our algorithm is secure against honest-but-curious, non-colluding adversaries.
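
To make the multi-server setting concrete, the following is a minimal sketch of k-means over horizontally partitioned data, where each server holds a subset of the rows. It is not the authors' protocol: the abstract does not specify the randomization step, so none is included here and the sketch offers no privacy protection. All function names (`local_stats`, `merge_stats`, `multi_server_kmeans`) and the toy data are hypothetical, chosen only for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def local_stats(shard, centroids):
    """One server's step: per-cluster coordinate sums and counts for its own rows."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in shard:
        j = nearest(point, centroids)
        counts[j] += 1
        for d, value in enumerate(point):
            sums[j][d] += value
    return sums, counts


def merge_stats(stats, centroids):
    """Combine every server's sums and counts into updated centroids."""
    k, dim = len(centroids), len(centroids[0])
    updated = []
    for j in range(k):
        total = sum(counts[j] for _, counts in stats)
        if total == 0:
            # No rows assigned anywhere: keep the previous centroid.
            updated.append(list(centroids[j]))
            continue
        updated.append(
            [sum(sums[j][d] for sums, _ in stats) / total for d in range(dim)]
        )
    return updated


def multi_server_kmeans(shards, centroids, iterations=20):
    """Iterate until the centroids stop changing or the iteration cap is hit."""
    for _ in range(iterations):
        stats = [local_stats(shard, centroids) for shard in shards]
        updated = merge_stats(stats, centroids)
        if updated == centroids:
            break
        centroids = updated
    labels = [[nearest(p, centroids) for p in shard] for shard in shards]
    return centroids, labels


# Toy data: two servers, each holding three rows of a six-row data set.
shards = [
    [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
    [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)],
]
centroids, labels = multi_server_kmeans(shards, [[0.0, 0.0], [10.0, 10.0]])
print(labels)  # [[0, 0, 1], [0, 1, 1]]
```

Because the per-cluster sums and counts are additive across servers, the merged centroids match those of k-means run on the undivided data set with the same starting centroids, up to floating-point rounding. That property is what lets a partitioned computation match the accuracy of the original.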

Item Type:	Conference Paper
Publication:	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher:	Springer Verlag
Additional Information:	The copyright for this article belongs to the Authors.
Keywords:	Cryptography; Data mining; Data privacy; Information systems; Information use; Random processes; Cryptographic computations; Horizontally partitioned data; K-means clustering; K-means; Multiple servers; Privacy preserving; Privacy preserving computation; Randomization techniques; Clustering algorithms
Department/Centre:	Division of Electrical Sciences > Computer Science & Automation
Date Deposited:	02 Sep 2022 04:08
Last Modified:	02 Sep 2022 04:08
URI:	https://eprints.iisc.ac.in/id/eprint/76354
