Yaoqing Yang (UC Berkeley)

Date: Oct 9, 2020

Title and Abstract

Benchmarking Semi-supervised Federated Learning

Abstract: Federated learning promises to use the computational power of edge devices while maintaining user data privacy. Current frameworks, however, typically make the unrealistic assumption that the data stored on user devices come with ground-truth labels, while the server has no data. In this work, we consider the more realistic scenario in which the users have only unlabeled data and the server has a limited amount of labeled data. We introduce a novel semi-supervised federated learning (SSFL) framework to characterize this scenario, and we make several careful design choices to address two problems within this framework. First, we find that batch normalization (BN) introduces instability, measured by the variance of the learned BN statistics, in both the semi-supervised and the non-iid settings, where non-iid means that the distribution of classes differs across users. We analyze this problem and show that group normalization (GN) can significantly reduce the instability and improve accuracy. Second, through extensive empirical evaluations in various scenarios, we find that the solution with GN still degrades significantly when the number of users is large. We thoroughly analyze this problem and propose a novel grouping-based model averaging method to replace the widely used FedAvg averaging method. Overall, we find that our grouping-based averaging, combined with GN, achieves better generalization accuracy than four existing supervised federated learning algorithms.
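To make the averaging step concrete: FedAvg combines client models by a weighted average, with each client weighted by its local data size. The abstract does not give the details of the proposed grouping-based method, so the sketch below is only illustrative: `fedavg` is the standard baseline, and `grouped_average` shows one plausible two-level instantiation (average within assumed client groups, then average across groups); the function names and the uniform cross-group weighting are assumptions, not the authors' specification.

```python
# Illustrative sketch: FedAvg-style weighted averaging of per-client
# parameter vectors, and a hypothetical two-level grouped variant.
# Models are represented as flat lists of floats for simplicity.

def fedavg(client_weights, client_sizes):
    """Standard FedAvg: average client parameters weighted by data size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    avg = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i, wi in enumerate(w):
            avg[i] += (n / total) * wi
    return avg

def grouped_average(client_weights, client_sizes, groups):
    """Hypothetical grouping-based averaging (assumed form, not the
    authors' exact method): FedAvg within each group of client indices,
    then a uniform average of the resulting group models."""
    group_models = []
    for g in groups:
        gw = [client_weights[i] for i in g]
        gn = [client_sizes[i] for i in g]
        group_models.append(fedavg(gw, gn))
    # Uniform weighting across groups: one possible design choice that
    # limits the influence of any single large client.
    return fedavg(group_models, [1] * len(group_models))
```

For example, with two clients holding 3 and 1 samples, plain FedAvg tilts the result toward the larger client, whereas the grouped variant with singleton groups weights them equally; the talk's method presumably forms groups more carefully than this sketch does.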