Detecting uninformative comments in software
Abstract
Comments in software are critical for maintenance and reuse. But apart
from prescriptive advice, there is little practical support or quantitative
understanding of what makes a comment useful. In this talk, I will present
a novel task of identifying comments which are uninformative about the
code they document. Specifically, I will introduce the notion of comment
entailment from code which is key to our definition of how informative
a comment is. When a comment's natural language semantics can be
inferred directly from the code, we say the comment is highly entailed
by the code, and hence less informative than comments which explain or
describe constructs beyond the obvious details.
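As a rough illustration of the distinction above, the sketch below scores a comment by how many of its words already appear in the identifiers of the code it documents. This word-overlap heuristic is only an assumed stand-in chosen for illustration; it is not the CRAIC approach, which relies on deep language models. The function names and the example snippets are hypothetical.

```python
import re


def identifier_words(code: str) -> set[str]:
    """Split the identifiers in a code snippet into lowercase words.

    Handles snake_case and camelCase, e.g. retry_count -> {"retry", "count"}.
    """
    words = set()
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code):
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", ident):
            words.add(part.lower())
    return words


def overlap_score(comment: str, code: str) -> float:
    """Fraction of comment words that also occur in the code's identifiers.

    A high score suggests the comment mostly restates the code (a crude
    proxy for "highly entailed"); a low score suggests it adds information.
    """
    comment_words = [w.lower() for w in re.findall(r"[A-Za-z]+", comment)]
    if not comment_words:
        return 0.0
    code_words = identifier_words(code)
    return sum(w in code_words for w in comment_words) / len(comment_words)


# Hypothetical example: one line of code with two candidate comments.
code = "retry_count = retry_count + 1"
restating = "increment retry count"
explaining = "the server drops idle connections, so we try again"

restating_score = overlap_score(restating, code)
explaining_score = overlap_score(explaining, code)
```

Under this heuristic the restating comment scores higher than the explaining one, which matches the intuition that it says little beyond what the code already shows. A surface measure like this cannot recognise paraphrases, which is one reason to model entailment with learned language models instead.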
Based on this idea, we have developed a tool called CRAIC which identifies
uninformative comments that can be expanded or alternatively removed by
the developer. CRAIC uses deep language models that are trained in an unsupervised
manner on large corpora of software projects, without expensive annotations
of entailment. Our models perform the comment entailment task with over 80%
agreement with human judgements.
Bio
Annie Louis is a research associate in the Institute for Adaptive and Neural
Computation at the University of Edinburgh. Previously, she was a Newton
International Fellow in the Institute for Language, Cognition and Computation
at Edinburgh and before that, a PhD student in the natural language processing
group at the University of Pennsylvania.
Her research focuses on language processing and machine learning techniques
for both natural and programming languages. In particular, she is interested in
models of documents, conversations, and programs which can successfully
combine multiple sources of information: from the language side (e.g. word-
and sentence-level information combined with discourse structure), but also
pragmatic and task knowledge from the target application. She develops such
technology for diverse problems such as automatically assessing the quality
of documents, summarizing and generating text, improving search of social media
content, and software engineering.