Abstract

During maintenance, developers spend a lot of time transforming existing code: refactoring, optimizing, and adding checks to make it more robust. Much of this work is the drudgery of identifying and replacing specific patterns, yet it resists automation, because meaningful patterns are hard to find automatically. We present a technique for mining loop idioms, surprisingly probable semantic patterns that occur in loops, from big code to find meaningful patterns. First, we show that automatically identifiable patterns exist, in great numbers, with a large-scale empirical study of loops over 25 MLOC. We find that loops in this corpus are simple and predictable: 90% of them have fewer than 15 LOC and 90% have no nesting and very simple control structure. Encouraged by this result, we coil loops to abstract away syntactic diversity and define information-rich loop idioms. We show that only 50 loop idioms cover 50% of the concrete loops. We show how loop idioms can help tool developers identify and prioritize refactorings. We also show how our framework opens the door to data-driven tool and language design, discovering opportunities to introduce new API calls and language constructs: loop idioms show that LINQ would benefit from an Enumerate operator, a result confirmed by the fact that precisely this feature is one of the most requested features on Stack Overflow, with 197 votes and 95k views.
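
As a rough illustration of the kind of loop that an Enumerate operator would replace, the sketch below contrasts a loop that maintains its own index counter with the same loop written using a built-in enumeration construct. This is an assumption about the shape of the idiom, not an example taken from the study: the function names are hypothetical, and Python's built-in `enumerate` stands in for the proposed LINQ operator.

```python
def label_lines_manual(lines):
    """Assumed loop idiom: iterate over a collection while
    maintaining a separate index counter by hand."""
    labelled = []
    index = 0
    for line in lines:
        labelled.append(f"{index}: {line}")
        index += 1  # bookkeeping that an enumerate construct removes
    return labelled


def label_lines_enumerate(lines):
    """Same behavior, with the index supplied by enumerate."""
    return [f"{index}: {line}" for index, line in enumerate(lines)]
```

Both functions return the same list for any input sequence; the second simply has no counter to initialize or increment, which is the sort of simplification a mined idiom might suggest.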

Bio

Charles Sutton is a Reader (equivalent to Associate Professor: bit.ly/1W9UhqT) in Machine Learning at the University of Edinburgh. He has published over 50 papers in a broad range of applications of probabilistic machine learning and deep learning, including natural language processing (NLP), analysis of computer systems, software engineering, sustainable energy, and exploratory data analysis. His work in software engineering has won an ACM Distinguished Paper Award. His PhD is from the University of Massachusetts Amherst, and he has done postdoctoral work at the University of California, Berkeley. He is currently Director of the EPSRC Centre for Doctoral Training in Data Science at the University of Edinburgh, and a Turing Fellow of the Alan Turing Institute.

This page was last modified on 17 Oct 2017.