Abstract
Source code is bimodal: it comprises a formal algorithmic channel and a natural-language channel of identifiers and comments. In this talk, we present name-flows, a model of this bimodality, and use them to mine conceptual types that suggest nominal type refinements. Name-flows, which are built from identifiers and the assignment flows between them, are semantically rich and provide strong evidence about latent conceptual types. Our tool mines a lattice of conceptual types from name-flows and refines nominal types in an explicitly defined type lattice. Our evaluation on real-life C# projects shows that the tool finds coherent refinements for primitive and user-defined types (UDTs), achieving an accuracy of 77% on the exact reconstruction of pre-existing UDTs. We also show that the tool minimises the co-occurrence of disparate conceptual types that share the same primitive type in the same scope, thereby reducing the chance of vulnerabilities arising from unintended flows.
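To give a rough intuition for the idea, the sketch below groups identifiers that are connected by assignments and labels each group by its most frequent name subtoken. It is only an illustration under strong assumptions, not the tool presented in the talk: the identifiers, the helper names (subtokens, flow_groups, concept_label) and the use of connected components in place of a mined type lattice are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def subtokens(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Za-z][a-z]*|\d+", identifier.replace("_", " "))
    return [p.lower() for p in parts]


def flow_groups(assignments):
    """Group identifiers linked by assignments (undirected connected components)."""
    neighbours = defaultdict(set)
    for target, source in assignments:
        neighbours[target].add(source)
        neighbours[source].add(target)

    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


def concept_label(group):
    """Pick the most frequent subtoken in a group; ties break alphabetically."""
    counts = Counter(tok for name in group for tok in subtokens(name))
    return min(counts, key=lambda tok: (-counts[tok], tok))


# Hypothetical (target, source) assignments; every variable is a plain string.
assignments = [
    ("userPassword", "inputPassword"),
    ("storedPassword", "userPassword"),
    ("userName", "inputName"),
    ("displayName", "userName"),
]

concepts = {concept_label(g): sorted(g) for g in flow_groups(assignments)}
```

In this toy input all six variables share one primitive type, yet the flows separate them into a "name" group and a "password" group. That is the kind of latent distinction a nominal type refinement could make explicit, so that an assignment from one group to the other would no longer type-check.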
Bio
Santanu Dash is a Research Associate in the Software Systems Engineering Group at University College London. His current research focuses on Bimodal Software Engineering and Data-driven Program Construction. Previously, he was a Post-doctoral Research Assistant in the Information Security Group at Royal Holloway, University of London. Santanu received a PhD in Computer Science from the University of Hertfordshire, UK, and Bachelor's and Master's degrees in Computer Engineering from Nanyang Technological University in Singapore.