Efficiency Optimization in Distributed Computing Systems
Speaker
Chen Chen, Huawei Hong Kong Research Center
Time
2021-08-27 10:00:00 ~ 2021-08-27 11:30:00
Location
Conference Room 1-418A, 电信群楼 (Telecom Building Complex); Tencent Meeting (meeting ID: 206726851, password: 357976)
Host
金海明
Abstract
With the explosive growth of data volume and computational complexity, modern applications are increasingly served in a distributed manner. For operators of such distributed computing systems, resource efficiency is often a primary concern. Since typical distributed computing systems host a mixture of data-parallel jobs, efficiency optimization can be conducted at both the inter-job and intra-job levels. At the inter-job level, efficiency optimization means designing better job scheduling algorithms for resource allocation and service order, so as to reduce the overall job completion time. At the intra-job level, efficiency optimization means speeding up job execution by mitigating the performance bottlenecks caused by network collisions or stragglers. In particular, for distributed machine learning jobs, efficiency optimization can also be achieved through system-algorithm co-designs that exploit the statistical redundancy in the model convergence process. In this talk, I will present my research on all of these aspects and introduce my research vision for the future.
Bio
Chen Chen is a researcher at Huawei Hong Kong Research Center. He obtained his Ph.D. degree in 2018 from the Department of Computer Science and Engineering, Hong Kong University of Science and Technology, under the supervision of Prof. Bo Li and Prof. Wei Wang. Prior to that, he received his bachelor's degree in 2014 from the Department of Automation, Tsinghua University. His recent research interests include cluster scheduling, distributed machine learning, and federated learning.