N&O seminar: Marten van Dijk (CWI)

When
16 Mar 2022, 11 a.m. to noon CET (GMT+0100)
Where
L017 and online

Everyone is welcome to attend the next N&O seminar with Marten van Dijk, leader of the Computer Security group at CWI. The title of the seminar is 'Asynchronous SGD over Distributed Local Data Sets with Differential Privacy'.

The talk will take place in L017 at CWI, with Zoom support for remote participants. For more information, or to register and receive the Zoom link by e-mail, please contact Willem Feijen (willem.feijen at cwi.nl), Samarth Tiwari (samarth.tiwari at cwi.nl), or Sven Polak (sven.polak at cwi.nl).

Abstract: We first explain Hogwild!, which implements asynchronous Stochastic Gradient Descent (SGD): multiple threads access a common repository of training data in parallel, perform SGD iterations, and update a shared state that represents the jointly learned (global) model. We show how local compute nodes can start with small mini-batch sizes that increase to larger ones in order to reduce communication cost (rounds of interaction with the aggregator). Our convergence analysis shows O(sqrt{K}) communication rounds, where K is the total number of gradient computations across all local compute nodes, for strongly convex problems with heterogeneous data. The analysis does not use the bounded-gradient assumption and is tight in this setting.
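
As a rough illustration (not the authors' code), here is a minimal Python sketch of a Hogwild!-style scheme as described in the abstract: threads update a shared model without locks, and each thread's mini-batch size grows linearly per round. The toy least-squares objective, the linear growth schedule, and the step size are illustrative assumptions.

```python
import threading
import numpy as np

# Toy strongly convex problem: least squares, minimize (1/2n)||Xw - y||^2.
n, d = 10_000, 20
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.01 * rng.standard_normal(n)

w = np.zeros(d)  # shared (global) model; threads update it without locks

def worker(seed: int, num_rounds: int, base_batch: int, lr: float) -> None:
    local_rng = np.random.default_rng(seed)
    for t in range(num_rounds):
        # Mini-batch size grows linearly with the round index t, so fewer
        # (but larger) interactions with the shared state are needed.
        b = base_batch * (t + 1)
        idx = local_rng.integers(0, n, size=b)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / b
        # Lock-free in-place update of the shared model (Hogwild!-style
        # benign races; w[:] avoids rebinding the global name).
        w[:] -= lr * grad

threads = [threading.Thread(target=worker, args=(s, 50, 8, 0.05))
           for s in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print("distance to true model:", np.linalg.norm(w - w_true))
```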

Next, we show that for attaining (epsilon,delta)-differential privacy, sigma can be chosen equal to sqrt{2(epsilon + ln(1/delta))/epsilon} for epsilon = Omega(T/N^2), where T is the total number of rounds and N is the size of the local data set. In many machine learning problems N is large and T = O(N), so sigma becomes "independent" of any choice T = O(N) with epsilon = Omega(1/N); that is, our sigma depends only on N rather than on T. This differential privacy characterization allows one to select the parameters of DP-SGD a priori, based on a fixed privacy budget (in terms of epsilon and delta), so as to optimize the anticipated utility (test accuracy) as much as possible. This ability to plan ahead, together with sigma's independence of T, leads to an adaptive DP-SGD algorithm that allows a client to balance its privacy budget against the accuracy of the learned global model based on local test data.
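
A small Python sketch of the "planning ahead" idea, using only the formula quoted in the abstract: sigma is computed from the fixed budget (epsilon, delta) and does not change with T. The function names are hypothetical, and the constant c in the epsilon = Omega(T/N^2) condition is a placeholder (the abstract does not give it).

```python
import math

def dp_sigma(epsilon: float, delta: float) -> float:
    """Noise multiplier from the abstract's formula; independent of T."""
    return math.sqrt(2.0 * (epsilon + math.log(1.0 / delta)) / epsilon)

def epsilon_feasible(epsilon: float, T: int, N: int, c: float = 1.0) -> bool:
    """Check epsilon >= c * T / N^2, i.e. the abstract's epsilon = Omega(T/N^2).
    The constant c is a hypothetical placeholder, not specified in the talk."""
    return epsilon >= c * T / (N * N)

N = 100_000                  # size of the local data set
epsilon, delta = 1.0, 1e-5   # privacy budget, fixed a priori
print(f"sigma = {dp_sigma(epsilon, delta):.3f}")

# sigma stays the same for any admissible T = O(N), which is what lets a
# client plan parameters ahead of training.
for T in (N // 10, N // 2, N):
    print(T, epsilon_feasible(epsilon, T, N), round(dp_sigma(epsilon, delta), 3))
```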

The presentation is based on joint work:

N. Nguyen, T. Nguyen, Q. Tran-Dinh, L. M. Nguyen, P. H. Nguyen, and M. van Dijk. Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes. AISTATS 2021; and

M. van Dijk, N. V. Nguyen, T. N. Nguyen, L. M. Nguyen, and P. H. Nguyen. Bringing Differential Private SGD to Practice: On the Independence of Gaussian Noise and the Number of Training Rounds. CoRR abs/2102.09030 (2021).