Fair k-Center Clustering in MapReduce and Streaming Settings
- Autor(en)
- Suman Bera, Syamantak Das, Sainyam Galhotra, Sagar Kale
- Abstrakt
Center-based clustering techniques are fundamental to many real-world applications such as data summarization and social network analysis. In this work, we study the problem of fairness aware k-center clustering over large datasets. We are given an input dataset comprising a set of n points, where each point belongs to a specific demographic group characterized by a protected attribute, such as race or gender. The goal is to identify k clusters such that all clusters have considerable representation from all groups and the maximum radius of these clusters is minimized. The majority of the prior techniques do not scale beyond 100K points for k = 50. To address the scalability challenges, we propose an efficient 2-round algorithm for the MapReduce setting that is guaranteed to be a 9-approximation to the optimal solution. Additionally, we develop a 2-pass streaming algorithm that is efficient and has a low memory footprint. These theoretical results are complemented with an empirical evaluation on million-scale datasets, demonstrating that our techniques are effective to identify high-quality fair clusters and efficient as compared to the state-of-the-art.
- Organisation(en)
- Forschungsgruppe Theory and Applications of Algorithms
- Externe Organisation(en)
- Katana Graph, Indraprastha Institute of Information Technology Delhi, University of Chicago
- Seiten
- 1414-1422
- Anzahl der Seiten
- 9
- DOI
- https://doi.org/10.1145/3485447.3512188
- Publikationsdatum
- 04-2022
- Peer-reviewed
- Ja
- ÖFOS 2012
- 102031 Theoretische Informatik
- Schlagwörter
- ASJC Scopus Sachgebiete
- Software, Computer Networks and Communications
- Link zum Portal
- https://ucrisportal.univie.ac.at/de/publications/d2b68bd7-ec5d-4986-a371-a43901837783