Analyzing Programs in the Era of Software 2.0
Xin Zhang, Peking University
2021-05-10 10:00:00 ~ 2021-05-10 11:30:00
With the software industry experiencing a major shift to machine learning, the programming systems community is facing both opportunities and challenges. On one hand, advances in machine learning provide new toolkits to build better programming systems to ensure software quality. On the other hand, as machine learning programs are increasingly being used in critical applications, it is now paramount to ensure their quality as well. In this talk, I will describe a set of new analysis techniques that address these opportunities and challenges.
First, I will talk about a data-driven framework for improving program analyses. It enables both online and offline learning by incorporating probabilities into the analysis representation, which is conventionally purely logical. While the logical part still encodes the expert knowledge of the analysis designer and ensures correctness, the probabilistic part adds the ability to handle uncertainty. Our approach reduces the number of false positives by 70% for foundational program analyses such as data race detection and pointer analysis. In addition, our inference engine can solve problems containing up to 10^30 clauses from various domains, including program analysis, statistical AI, and Big Data analytics.
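To make the idea of pairing logical rules with probabilities concrete, here is a minimal sketch. It is not the framework or inference engine described in the talk. It assumes a toy setting in which each grounded rule carries a hypothetical confidence, and it scores every derived fact by its single most likely derivation, treating rule firings as independent. The names `best_derivation_scores`, `inputs`, and `rules`, as well as all facts and numbers, are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Fact = str
# A grounded rule: (head fact, body facts, assumed probability that the rule holds).
Rule = Tuple[Fact, List[Fact], float]


def best_derivation_scores(inputs: Dict[Fact, float],
                           rules: List[Rule]) -> Dict[Fact, float]:
    """Score each derivable fact by its most likely derivation.

    The logical part (which facts are derivable) is unchanged; the
    probabilities only rank the derived facts. Independence of rule
    firings is an assumption of this sketch.
    """
    score = dict(inputs)
    changed = True
    while changed:  # iterate to a fixpoint, as in bottom-up Datalog evaluation
        changed = False
        for head, body, prob in rules:
            if all(b in score for b in body):
                s = prob
                for b in body:
                    s *= score[b]
                # Only strict improvements are recorded, so the loop terminates.
                if s > score.get(head, 0.0) + 1e-12:
                    score[head] = s
                    changed = True
    return score


# Hypothetical input facts and rule confidences, for illustration only.
inputs = {"edge(a,b)": 1.0, "edge(b,c)": 1.0}
rules = [
    ("path(a,b)", ["edge(a,b)"], 0.95),
    ("path(b,c)", ["edge(b,c)"], 0.95),
    ("path(a,c)", ["path(a,b)", "path(b,c)"], 0.80),
    ("alarm(a,b)", ["path(a,b)"], 0.90),
    ("alarm(a,c)", ["path(a,c)"], 0.90),
]

scores = best_derivation_scores(inputs, rules)
# Rank alarms so that those with more confident derivations come first.
ranked_alarms = sorted(
    (f for f in scores if f.startswith("alarm")),
    key=lambda f: scores[f],
    reverse=True,
)
```

In this toy setting, an alarm that depends on a longer chain of uncertain rule firings is ranked lower, which is one simple way a probabilistic layer could help prioritize likely true alarms over likely false positives.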
While existing program analyses work well on conventional programs, they cannot be applied to the novel properties that arise in machine learning. To address this challenge, we have developed program analyses for emerging properties such as interpretability and fairness. Our interpretability analysis is the first to use corrections as actionable feedback on judgments made by a neural network. Our fairness analysis scales to models that are more than five orders of magnitude larger than the largest previously verified model. To enable building machine learning programs that satisfy these properties by construction, we have also developed a probabilistic programming language that supports distributional inference and causal inference.
Xin Zhang is an assistant professor in the Department of Computer Science and Technology at Peking University. His research areas are programming languages and software engineering, with a focus on the interplay between programming systems and machine learning. On one hand, he leverages machine learning ideas to improve the usability of programming systems. On the other hand, he develops new analyses and languages to ensure the quality of machine learning programs. His work has received Distinguished Paper Awards from PLDI'14 and FSE'15. Xin was a postdoctoral associate at MIT CSAIL from 2017 to 2020. He received his Ph.D. from Georgia Tech in 2017; his doctoral work was partly supported by a Facebook Fellowship.