Machine learning with big data often involves very large optimization models. In distributed optimization over a cluster of machines, the number of parameters or optimization variables can be too large for frequent communication and synchronization. To address this difficulty, we can set up dedicated parameter servers that store different subsets of the model parameters and update them at different machines simultaneously, in an asynchronous manner. In this talk, we focus on distributed empirical risk minimization with convex linear models, and propose a family of randomized algorithms called DSCOVR (Doubly Stochastic Coordinate Optimization with Variance Reduction). These algorithms exploit the simultaneous partitioning of both the data and the model to gain parallelism, and employ stochastic variance reduction techniques to achieve fast convergence rates. This is joint work with Adams Wei Yu, Qihang Lin and Weizhu Chen.
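To make the "doubly stochastic" idea concrete, the following is a minimal serial sketch: the data are split into row blocks and the model into coordinate blocks, each update samples one block of each, and an SVRG-style correction keeps the block gradient unbiased with shrinking variance. All names, block sizes, and step sizes here are illustrative assumptions; this is not the actual DSCOVR method, which operates on a primal-dual saddle-point formulation and runs asynchronously across machines.

```python
import numpy as np

def full_grad(X, y, w, lam):
    """Full gradient of the ridge-regression objective (illustrative problem)."""
    n = X.shape[0]
    return X.T @ (X @ w - y) / n + lam * w

def doubly_stochastic_ridge(X, y, lam=0.1, n_row_blocks=4, n_col_blocks=4,
                            step=0.2, epochs=50, inner=200, seed=0):
    """Serial sketch of doubly stochastic block updates with SVRG-style
    variance reduction (hypothetical parameters, not DSCOVR's)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    row_blocks = np.array_split(np.arange(n), n_row_blocks)  # data partition
    col_blocks = np.array_split(np.arange(d), n_col_blocks)  # model partition
    for _ in range(epochs):
        w_snap = w.copy()                      # snapshot point
        g_snap = full_grad(X, y, w_snap, lam)  # full gradient at the snapshot
        for _ in range(inner):
            rows = row_blocks[rng.integers(n_row_blocks)]  # random data block
            cols = col_blocks[rng.integers(n_col_blocks)]  # random model block
            Xi, m = X[rows], len(rows)
            # block gradients on the sampled (data, model) pair
            g_cur = Xi[:, cols].T @ (Xi @ w - y[rows]) / m + lam * w[cols]
            g_old = Xi[:, cols].T @ (Xi @ w_snap - y[rows]) / m + lam * w_snap[cols]
            # SVRG correction: unbiased estimate of the block of the full
            # gradient, with variance vanishing as w approaches w_snap
            w[cols] -= step * (g_cur - g_old + g_snap[cols])
    return w
```

Because each update touches only one data block and one coordinate block, different (data, model) block pairs could in principle be processed on different machines, which is the source of parallelism the abstract refers to.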
Lin Xiao is a principal researcher at Microsoft Research in Redmond, Washington. He obtained his PhD in Aeronautics and Astronautics from Stanford University in 2004, and spent two years as a postdoctoral fellow at the California Institute of Technology before joining Microsoft. His current research interests include large-scale optimization, machine learning, randomized algorithms, and parallel and distributed computing.