Fair k-Center Clustering in MapReduce and Streaming Settings

Autor(en)
Suman Bera, Syamantak Das, Sainyam Galhotra, Sagar Kale
Abstrakt

Center-based clustering techniques are fundamental to many real-world applications such as data summarization and social network analysis. In this work, we study the problem of fairness aware k-center clustering over large datasets. We are given an input dataset comprising a set of n points, where each point belongs to a specific demographic group characterized by a protected attribute, such as race or gender. The goal is to identify k clusters such that all clusters have considerable representation from all groups and the maximum radius of these clusters is minimized. The majority of the prior techniques do not scale beyond 100K points for k = 50. To address the scalability challenges, we propose an efficient 2-round algorithm for the MapReduce setting that is guaranteed to be a 9-approximation to the optimal solution. Additionally, we develop a 2-pass streaming algorithm that is efficient and has a low memory footprint. These theoretical results are complemented with an empirical evaluation on million-scale datasets, demonstrating that our techniques are effective to identify high-quality fair clusters and efficient as compared to the state-of-the-art.

Organisation(en)
Forschungsgruppe Theory and Applications of Algorithms
Externe Organisation(en)
Katana Graph, Indraprastha Institute of Information Technology Delhi, University of Chicago
Seiten
1414-1422
Anzahl der Seiten
9
DOI
https://doi.org/10.1145/3485447.3512188
Publikationsdatum
04-2022
Peer-reviewed
Ja
ÖFOS 2012
102031 Theoretische Informatik
Schlagwörter
ASJC Scopus Sachgebiete
Software, Computer Networks and Communications
Link zum Portal
https://ucris.univie.ac.at/portal/de/publications/fair-kcenter-clustering-in-mapreduce-and-streaming-settings(d2b68bd7-ec5d-4986-a371-a43901837783).html