Detecting uninformative comments in software


Comments in software are critical for maintenance and reuse. But apart
from prescriptive advice, there is little practical support or quantitative
understanding of what makes a comment useful. In this talk, I will present
a novel task: identifying comments which are uninformative about the
code they document. Specifically, I will introduce the notion of comment
entailment from code, which is key to our definition of how informative
a comment is. When a comment's natural language semantics can be
inferred directly from the code, we say the comment is highly entailed
by the code, and hence less informative than comments which explain or
describe constructs beyond the obvious details.
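
As a purely illustrative sketch (not taken from the talk; the function
total_price and both comments are hypothetical), the contrast between a
highly entailed comment and a more informative one might look like this:

```python
def total_price(prices, tax_rate):
    """Return the tax-inclusive total for a list of item prices."""
    # Highly entailed (hypothetical example): the comment only restates
    # what the next line already says.
    # Add up the prices.
    subtotal = sum(prices)

    # More informative (hypothetical example): the comment records a
    # design decision that cannot be read off the code itself.
    # Tax is applied once to the subtotal rather than per item, so that
    # per-item rounding errors do not accumulate.
    return round(subtotal * (1 + tax_rate), 2)
```

Under the definition above, the first comment could be inferred from the
code alone, while the second adds a rationale the code does not express.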

Based on this idea, we have developed a tool called CRAIC which identifies
uninformative comments that can be expanded or alternatively removed by
the developer. CRAIC uses deep language models which are trained in an
unsupervised manner on large corpora of software projects, without expensive
annotations of entailment. Our models perform the comment entailment task
with over 80% agreement with human judgements.


Annie Louis is a research associate in the Institute for Adaptive and Neural
Computation at the University of Edinburgh. Previously, she was a Newton
International Fellow in the Institute for Language, Cognition and Computation
at Edinburgh and, before that, a PhD student in the natural language processing
group at the University of Pennsylvania.

Her research focuses on language processing and machine learning techniques
for both natural and programming languages. In particular, she is interested in
models of documents, conversations, and programs which can successfully
combine multiple sources of information: from the language side (e.g. word-
and sentence-level information combined with discourse structure), but also
pragmatic and task knowledge from the target application. She develops such
technology for diverse problems such as automatically assessing the quality
of documents, summarizing and generating text, improving search of social
media content, and software engineering.

This page was last modified on 25 Sep 2017.