Neural Networks Through the Lens of the Hessian


Zhewei Yao, University of California at Berkeley


2020-01-08 15:00:00 ~ 2020-01-08 16:30:00


Room 3-412 SEIEE Building


Jingwen Leng, Assistant Professor, John Hopcroft Center for Computer Science

We will describe recent results in applying second-order methods to the training of deep neural networks. We will first introduce a new systematic approach to model compression using second-order information, resulting in unprecedentedly small models for a range of challenging tasks in image classification, object detection, and natural language processing, exceeding *all* industry-level results, including those of expensive AutoML-based methods that search at a massive scale.
Second, we will address the common misconception that computing second-order information is slow by presenting a new scalable framework for computing Hessian information. We will show that Hessian information can be used during training with little overhead, resulting in a 3.58x speed-up in total training time compared to state-of-the-art first-order methods for ResNet18 training on ImageNet.
Finally, we will discuss some future directions involving stochastic second-order methods for accelerating the training of neural networks, and how the curvature of the loss landscape could be used as a reward function when searching for new architectures.
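For readers unfamiliar with why Hessian information need not be expensive, the following is a minimal sketch of the general matrix-free idea; it is not the framework presented in the talk. It assumes only that a gradient function is available, approximates Hessian-vector products by a central difference of gradients, and runs power iteration to estimate the largest-magnitude Hessian eigenvalue, one common curvature measure. The names hvp, top_hessian_eigenvalue, and the toy quadratic loss are hypothetical and chosen for illustration.

```python
import numpy as np


def hvp(grad_fn, w, v, eps=1e-4):
    """Approximate H(w) @ v with a central difference of gradients.

    The full Hessian is never formed; only two gradient evaluations are needed.
    """
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2.0 * eps)


def top_hessian_eigenvalue(grad_fn, w, iters=100, seed=0):
    """Estimate the largest-magnitude Hessian eigenvalue by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    eig = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, w, v)
        eig = float(v @ hv)  # Rayleigh quotient for the current unit vector
        norm = np.linalg.norm(hv)
        if norm == 0.0:
            break
        v = hv / norm
    return eig


# Toy quadratic loss f(w) = 0.5 * w^T A w, whose Hessian is exactly A.
A = np.diag([1.0, 2.0, 5.0])


def grad_fn(w):
    return A @ w


w0 = np.ones(3)
top_eig = top_hessian_eigenvalue(grad_fn, w0)
print(top_eig)
```

On this toy problem the estimate should approach the largest diagonal entry of A. In a deep learning setting the gradient function would typically come from automatic differentiation, and the Hessian-vector product is usually computed exactly by differentiating twice rather than by finite differences; that substitution is left out here to keep the sketch dependency-free.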
Zhewei Yao is a Ph.D. student in the BAIR, RISELab (former AMPLab), BDD, and Math Department at the University of California at Berkeley. He is advised by Michael Mahoney and also works very closely with Kurt Keutzer. His research interests lie in computational statistics, optimization, and machine learning. Currently, he is interested in leveraging tools from randomized linear algebra to provide efficient and scalable solutions for large-scale optimization and learning problems. He is also working on the theory and application of deep learning. Before joining UC Berkeley, he received his B.S. in Math from Zhiyuan Honor College at Shanghai Jiao Tong University.
© John Hopcroft Center for Computer Science, Shanghai Jiao Tong University

Email: jhc@sjtu.edu.cn Tel: 021-54740299