Tofu: Distributing Tensor Computation Automatically for Deep Learning Systems
Speaker
Dr. Minjie Wang, New York University
Time
2017-12-22 14:00:00 ~ 2017-12-22 15:30:00
Location
SEIEE-3-412
Host
Minyi Guo, Weinan Zhang
Abstract
We present Tofu, which improves the scaling performance and programmability of a tensor dataflow-based DNN system by performing automatic distribution. Tofu can explore a spectrum of distribution strategies, including data parallelism, model parallelism, and others in between. Such exploration is enabled through the development of a tensor description language (TDL), which allows Tofu to discover all feasible ways of distributing an operator by partitioning its tensors along different dimensions. To find the best strategy with minimal communication cost for the overall dataflow graph, Tofu uses a novel search algorithm that exploits the layer-by-layer characteristics of neural network computation. We implement Tofu in MXNet and show its performance benefits for several DNN applications.
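To make the idea of partitioning an operator along different tensor dimensions concrete, the following is a minimal NumPy sketch (not Tofu's actual TDL or API) of the two endpoints of the strategy spectrum for a matrix multiplication Y = X @ W: splitting along the batch dimension (data parallelism) versus splitting along the output dimension (model parallelism). Both partitionings reconstruct the same result; they differ in what must be communicated between workers.

```python
import numpy as np

# Hypothetical illustration (not Tofu's API): partitioning Y = X @ W
# across 2 workers along different tensor dimensions.

rng = np.random.default_rng(0)
X = rng.random((4, 6))   # input:  batch x features
W = rng.random((6, 8))   # weight: features x hidden

# Data parallelism: split X along the batch dimension (axis 0).
# Each worker holds a full replica of W and computes a slice of the
# output rows; gradients of W would need to be aggregated.
X_parts = np.split(X, 2, axis=0)
Y_data_parallel = np.concatenate([x @ W for x in X_parts], axis=0)

# Model parallelism: split W along the hidden dimension (axis 1).
# Each worker holds all of X and computes a slice of the output
# columns; the input activations must be broadcast to every worker.
W_parts = np.split(W, 2, axis=1)
Y_model_parallel = np.concatenate([X @ w for w in W_parts], axis=1)

# Either partitioning reproduces the undistributed computation.
assert np.allclose(Y_data_parallel, X @ W)
assert np.allclose(Y_model_parallel, X @ W)
```

Tofu's contribution is to enumerate all such feasible partitionings per operator and search for the combination that minimizes communication across the whole dataflow graph.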
Bio
Minjie Wang is a fourth-year Ph.D. student at New York University and a member of the NYU systems group. Before joining NYU, Minjie received his master's and bachelor's degrees from Shanghai Jiao Tong University. He also spent two years as a research intern at Microsoft Research Asia, where he developed his research interest in machine learning systems and built his first deep learning system, Minerva. Minjie was also one of the founding members of the Deep Machine Learning Community. He is one of the main developers of the MXNet, NNVM, and MinPy projects. He is a recipient of the 2016 NVIDIA Graduate Fellowship.