Tofu: Distributing Tensor Computation Automatically for Deep Learning Systems


Speaker

Dr. Minjie Wang, New York University

Time

2017-12-22 14:00:00 ~ 2017-12-22 15:30:00

Location

SEIEE-3-412

Host

Minyi Guo, Weinan Zhang

Abstract
We present Tofu, which improves the scaling performance and programmability of a tensor dataflow-based DNN system by performing automatic distribution. Tofu can explore a spectrum of distribution strategies, including data parallelism, model parallelism, and others in between. Such exploration is enabled by a tensor description language (TDL), which allows Tofu to discover all feasible ways of distributing an operator by partitioning its tensors along different dimensions. To find the strategy with the minimal communication cost for the overall dataflow graph, Tofu uses a novel search algorithm that exploits the layer-by-layer characteristics of neural network computation. We implement Tofu in MXNet and show its performance benefits for several DNN applications.
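To illustrate the idea of distributing an operator by partitioning its tensors along different dimensions, here is a minimal NumPy sketch (not Tofu's actual API or TDL): the same matrix multiplication can be computed under data parallelism by splitting the input along the batch dimension, or under model parallelism by splitting the weights along the output dimension, and both recover the undistributed result.

```python
import numpy as np

# Hypothetical illustration, not Tofu code: Y = X @ W distributed two ways.
rng = np.random.default_rng(0)
X = rng.random((4, 3))   # input: batch x features
W = rng.random((3, 2))   # weights: features x hidden

# Data parallelism: partition X along the batch dimension;
# each worker holds a full replica of W.
Y_data = np.concatenate([x @ W for x in np.split(X, 2, axis=0)], axis=0)

# Model parallelism: partition W along the output dimension;
# each worker holds a full replica of X.
Y_model = np.concatenate([X @ w for w in np.split(W, 2, axis=1)], axis=1)

# Both partitionings produce the same result as the undistributed matmul.
assert np.allclose(Y_data, X @ W)
assert np.allclose(Y_model, X @ W)
```

Each partitioning implies a different communication pattern (gathering outputs vs. replicating weights), which is the cost Tofu's search algorithm aims to minimize across the whole dataflow graph.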
Bio
Minjie Wang is a fourth-year Ph.D. student at New York University and a member of the NYU systems group. Before joining NYU, Minjie received his master's and bachelor's degrees from Shanghai Jiao Tong University. He also spent two years as a research intern at Microsoft Research Asia, where he developed his research interest in machine learning systems and built his first deep learning system, Minerva. Minjie was also one of the founding members of the Deep Machine Learning Community. He is one of the main developers of the MXNet, NNVM, and MinPy projects. He is the recipient of the 2016 NVIDIA Graduate Fellowship.
© John Hopcroft Center for Computer Science, Shanghai Jiao Tong University
Address: Expert Building, Software Building, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai
Email: jhc@sjtu.edu.cn Tel: 021-54740299
Postal code: 200240