Learning to Find Bugs
Abstract
Automated bug detection, e.g., through pattern-based static analysis,
is an increasingly popular technique to find programming errors.
Traditionally, bug detectors are program analyses that are manually
written and carefully tuned by an analysis expert. Unfortunately, the
huge number of possible bug patterns makes it difficult to cover more
than a small fraction of all bugs. This talk presents a new approach
toward creating bug detectors. The key idea is to replace manually
writing a program analysis with training a machine learning model that
distinguishes buggy from non-buggy code. We present a general
framework to create large amounts of positive and negative training
examples, to train a model that distinguishes the two, and to use the
trained model to identify anomalies in previously unseen code. As
a proof of concept, we create five bug detectors for JavaScript that
find a diverse set of programming mistakes, e.g., accidentally swapped
function arguments, incorrect assignments, and incorrect binary
operations. To find bugs, the trained models use information that is
usually discarded by program analyses, such as identifier names of
variables and functions. Our evaluation shows that the trained bug
detectors find real-world bugs with precision and recall as high as,
or even higher than, those of manually tuned analyses.
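To make the framework concrete, here is a minimal sketch in Python of
one instance of the idea, a swapped-arguments detector. One plausible
way to obtain the negative examples the abstract mentions is to seed
bugs artificially, e.g., by swapping the arguments of calls taken from
presumably correct code. The corpus, feature encoding, and classifier
below are illustrative assumptions, not the implementation presented
in the talk.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Toy corpus of (callee, arg1, arg2) triples harvested from code
    # that is assumed to be correct.
    correct_calls = [
        ("setTimeout", "callback", "delayMs"),
        ("substring", "startIndex", "endIndex"),
        ("assertEquals", "expected", "actual"),
        ("copyFile", "sourcePath", "targetPath"),
    ]

    def featurize(callee, a, b):
        # Represent a call by the identifier names involved --
        # information that classic program analyses usually discard.
        return f"callee={callee} arg1={a} arg2={b}"

    # Positive examples: the calls as written. Negative examples: the
    # same calls with their arguments swapped (artificially seeded bugs).
    examples = [featurize(c, a, b) for c, a, b in correct_calls] \
             + [featurize(c, b, a) for c, a, b in correct_calls]
    labels = [0] * len(correct_calls) + [1] * len(correct_calls)

    vectorizer = CountVectorizer(token_pattern=r"\S+")
    model = LogisticRegression().fit(
        vectorizer.fit_transform(examples), labels)

    # Score a previously unseen call: the callee is new, but the
    # argument names match the swapped pattern seen during training.
    suspect = featurize("setInterval", "delayMs", "callback")
    print(model.predict_proba(vectorizer.transform([suspect]))[0][1])

The actual detectors operate on a large JavaScript corpus with richer
models and features, but the training signal sketched here is the
same: artificially created negative examples teach the model what
buggy code looks like.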
Bio
Michael Pradel is an assistant professor at TU Darmstadt, which he
joined after a PhD at ETH Zurich and a postdoc at UC Berkeley. His
research interests span software engineering and programming
languages, with a focus on tools and techniques for building reliable,
efficient, and secure software. In particular, he is interested in
dynamic program analysis, test generation, concurrency, performance
profiling, and JavaScript-based web applications.