Automatic Feature Generation for Predicting Program Properties

Abstract

We present a novel approach for automatic feature generation for predicting program properties. Our approach automatically produces features that capture long-distance syntactic relationships between program elements. The features are purely syntactic, and the method is applicable to any programming language. We propose to represent program elements based on relations captured in an Abstract Syntax Tree (AST). We show that this representation is general and can (1) cover a number of different prediction tasks, (2) drive two different learning algorithms (for both generative and discriminative models), and (3) work across different programming languages.

We evaluate our approach on three tasks: predicting variable names, method names, and the types of expressions. We use the generated features to drive both CRF-based and word2vec-based learning for programs in four languages: JavaScript, Java, Python, and C#. Our evaluation shows that automatically generated features based on our representation capture semantic similarities, achieve good performance across several tasks and languages, and decrease the error rate relative to an existing method by 18.2%.
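To give a feel for what purely syntactic, AST-based features might look like, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation presented in the talk: it assumes a feature is a triple made of two identifiers and the sequence of AST node types connecting them through their lowest common ancestor. The helper names `leaf_chains`, `path_between`, and `path_features` are hypothetical.

```python
import ast
from itertools import combinations


def leaf_chains(tree):
    """Collect (identifier, root-to-node chain) for every Name node."""
    chains = []

    def visit(node, chain):
        chain = chain + [node]
        if isinstance(node, ast.Name):
            chains.append((node.id, chain))
        for child in ast.iter_child_nodes(node):
            visit(child, chain)

    visit(tree, [])
    return chains


def path_between(chain_a, chain_b):
    """Node types from node A up to the lowest common ancestor, then down to node B."""
    # Count how many leading nodes the two root-to-node chains share.
    common = 0
    while (common < min(len(chain_a), len(chain_b))
           and chain_a[common] is chain_b[common]):
        common += 1
    up = [type(n).__name__ for n in reversed(chain_a[common:])]
    top = [type(chain_a[common - 1]).__name__]
    down = [type(n).__name__ for n in chain_b[common:]]
    return tuple(up + top + down)


def path_features(source):
    """Assumed feature form: (identifier, AST path, identifier) for each pair of names."""
    chains = leaf_chains(ast.parse(source))
    return [(name_a, path_between(chain_a, chain_b), name_b)
            for (name_a, chain_a), (name_b, chain_b) in combinations(chains, 2)]


for feature in path_features("done = count > limit"):
    print(feature)
```

Because the features depend only on the tree structure, the same idea carries over to any language for which a parser is available. Triples of this kind could then serve as input to a learner such as a CRF or word2vec, the two mentioned in the abstract.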

Bio

After serving for seven years as an officer on board a missile boat in the Israeli Navy, Uri graduated summa cum laude from the Technion, Israel Institute of Technology, as an alumnus of the Rothschild Technion Program for Excellence. From 2014 to 2016 he worked at the Microsoft R&D center in Israel. He is currently a second-year Ph.D. student at the Technion, working with Prof. Eran Yahav and focusing on machine learning and deep learning techniques for Programming Language Processing. In addition to a B.Sc. in Computer Science, Uri holds a B.A. in Humanities from Haifa University.

This page was last modified on 26 Oct 2017.