The 62nd CREST Open Workshop - Automated Program Repair and Genetic Improvement
Date: 20th and 21st January 2020
Venue: George Fox room Friends House, 173-177 Euston Road, London, NW1 2BJ
Overview:
Program repair has the potential to reduce the significant manual effort that developers devote to finding and fixing software bugs. Recent years have witnessed a dramatic growth of research in program repair. Researchers have proposed a large number of techniques aimed to address fundamental challenges of program repair such as scalability and test-overfitting, and have successfully deployed program repair in industry. One of the techniques that has been used in the repair field has been genetic improvement. GI uses automated search in order to improve existing software. Aside from bug fixing, GI has been used to optimise other software properties, such as runtime, memory and energy consumption. It has also been used for other kinds of improvement such as specialising and porting. The goal of this workshop is to reflect on the progress that the research community has made over the last years in those two closely related fields, share experience in research and deployment, and identify key challenges that need to be addressed in future research.
Organizers:
Dr Justyna Petke, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Dr Sergey Mechtaev, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Prof Mark Harman, Facebook, and UCL, UK
Speakers:
Dr Marija Selakovic, Huawei, Germany
Prof Orna Grumberg, Israel Institute of Technology, Israel
Prof Martin Monperrus, KTH Royal Institute of Technology, Sweden
Dr Alexandru Marginean, Facebook UK and University College London, UK
Dr Julia Lawall, INRIA, France
Dr Shin Hwei Tan, Southern University of Science and Technology (SUSTech) (China)
Dr Michail Basios, TurinTech, University College London, UK
Prof Bill Langdon, University College London, UK
Dr Sandy Brownlee, University of Stirling, UK
Prof Abhik Roychoudhury, National University of Singapore, Singapore
Dr Markus Wagner, University of Adelaide, Australia
Dr Thomas Durieux, University of Lisbon, Portugal
Prof Sarfraz Khurshid, The University of Texas at Austin, USA
Dr Justyna Petke, University College London, UK
Dr Sergey Mechtaev, University College London, UK
Speakers and schedule:
Day 1 - Monday 20th January 2020
10:00 Pastries
10:30 Welcome and introductions
11:00 Dr Julia Lawall, INRIA, France -- "SPINFER: Inferring Software Maintenance Rules for the Linux Kernel" (Slides, Videos: )
Abstract: In a large software system such as the Linux kernel, there is a continual need for large-scale changes across many source files, triggered by new needs or refined design decisions. In this work, we propose to ease such
changes by suggesting transformation rules to developers, inferred automatically from a collection of examples. Our approach can help automate large-scale changes as well as help understand existing large-scale changes, by highlighting the various cases that the developer who performed the changes has taken into account. We have implemented our approach as a tool, Spinfer. We evaluate Spinfer on a range of challenging large-scale changes from the Linux kernel and obtain rules that achieve precision and recall of up to 100%.
This is joint work with Lucas Serrano, Ferdian Thung, David Lo, Lingxiao Jiang, and Gilles Muller
11:30 Prof Orna Grumberg, Israel Institute of Technology, Israel -- "Automated Program Repair" (Slides, Videos: )
Abstract: In the process of software development and maintenance, much efforts are invested in order to ensure that the product is as bug free as possible. Automating the repair process is highly desired.
In this talk we present two approaches to automated program repair.
The first approach automatically repairs an erroneous program using a predefined set of mutations to the code (e.g., replacing ‘+’ with ‘-‘ ; ‘>’ with ‘<’, etc.). The repair algorithm goes through generate-validate iterations, which involve fault localization mechanism in order to disregard sets of unsuccessful repairs. The approach guarantees soundness and completeness with respect to a bounded notion of correctness.
The second approach intertwines compositional Assume-Guarantee verification with repair. Automata learning is used to verify the program or find an erroneous behavior. In the latter case, abduction assists in repairing the system by inferring new constraints that eliminate the error. Then, the verification proceeds for the repaired system.
This is a joint work with: Hadar Frenkel, Corina Pasareanu, Bat-chen Rothenberg and Sarai Sheinvald
12:00 Dr Shin Hwei Tan, Southern University of Science and Technology (SUSTech) (China) -- "Collaborative bug finding and bug-fixing for Android Apps." (Slides, Videos: )
Abstract: Many automated test generation techniques have been proposed for finding crashes in Android apps. Despite recent advancement in these techniques, a study shows that Android app developers prefer reading test cases written in natural language. Meanwhile, there exist redundancies in bug reports across different apps that have not been previously reused. In this paper, we propose collaborative bug finding, a novel approach that uses similarities between different Android apps for finding relevant bug reports that help in the discovery of new bugs. Given an app under test, our approach uses interactions between programmers for deriving specialized test scenarios. We design three settings with varying degrees of interactions between programmers where each setting emulates a different real-world scenarios. Our studies of the first two settings in a software testing course show that collaborative bug finding helps students who are novice Android app testers to discover 17 new bugs. However, students admit that searching for relevant bug reports could be time-consuming. Based on students’ feedback, we introduce Bugine, an approach that automatically recommends relevant GitHub issues for a given app. Bugine Our results show that Bugine is able to find 34 new bugs. In total, collaborative bug finding helps us to find 51 new bugs, in which five have been confirmed and seven have been fixed by the developers. These results confirm our intuition that our proposed technique is useful in discovering new bugs for Android apps. In this talk, I will first introduce my recent work on collaborative bug finding and discuss how this concept may be applied for bug-fixing.
12.30 Lunch
13:30 Dr Alexandru Marginean, Facebook UK and University College London, UK -- "SapFix: Automated End-to-End Repair at Scale"
Abstract: We report our experience with SAPFIX: the first deployment of automated end-to-end fault fixing, from test case design through to deployed repairs in production code. We have used SAPFIX at Facebook to repair 6 production systems, each consisting of tens of millions of lines of code, and which are collectively used by hundreds of millions of people worldwide.
14:00 Prof Abhik Roychoudhury, National University of Singapore, Singapore -- "Overfitting in Program Repair" (Slides, Videos: )
Abstract: Automated program repair is an emerging technology which seeks to automatically rectify program errors and vulnerabilities. Repair techniques are driven by a correctness criterion which is often in the form of a test-suite. Such test-based repair produces over-fitting patches, where the patches produced may fail on tests outside the test-suite driving the repair. In this talk, we will study and discuss various mechanisms to combat the patch over-fitting problem. This may include quality indicators to choose patches, automatic test generation via fuzz testing to reduce over-fitting, or more recently constraint extraction and propagation techniques to repair software vulnerabilities. The constraint extraction to bootstrap the propagation is achieved via a modification of various sanitizers. The overall vision of these works, is to explore architectures which can combine a program repair engine with a fuzz testing engine to provide a find-and-fix capability for program vulnerabilities.
14:30 Dr Thomas Durieux, University of Lisbon, Portugal -- "Empirical Review of Java Program Repair Tools: A Large-Scale Experiment on 2,141 Bugs and 23,551 Repair Attempts" (Slides, Videos: )
Abstract: In the past decade, research on test-suite-based automatic program repair has grown significantly. Each year, new approaches and implementations are featured in major software engineering venues. However, most of those approaches are evaluated on a single benchmark of bugs, which are also rarely reproduced by other researchers. In this paper, we present a large-scale experiment using 11 Java test-suite-based repair tools and 5 benchmarks of bugs. Our goal is to have a better understanding of the current state of automatic program repair tools on a large diversity of benchmarks. Our investigation is guided by the hypothesis that the repairability of repair tools might not be generalized across different benchmarks of bugs. We found that the 11 tools 1) are able to generate patches for 21% of the bugs from the 5 benchmarks, and 2) have better performance on Defects4J compared to other benchmarks, by generating patches for 47% of the bugs from Defects4J compared to 10-30% of bugs from the other benchmarks. Our experiment comprises 23,551 repair attempts in total, which we used to find the causes of non-patch generation. These causes are reported in this paper, which can help repair tool designers to improve their techniques and tools.
15:00 Refreshments
15:30 Prof Sarfraz Khurshid, The University of Texas at Austin, USA -- "Automated Testing and Debugging for Declarative Models"
Abstract: While software models have a vital role in the development of reliable systems, writing the models correctly remains a challenging problem. Our key insight is that a test-driven approach that takes foundational
concepts in software testing and debugging -- as defined and widely practiced for imperative code -- and re-defines them for the declarative paradigm promises an effective methodology for building correct models. This talk gives an overview of our approach that supports testing and debugging of declarative models written in Alloy,
which is a first-order relational logic with transitive closure.
16:00 Dr Sergey Mechtaev, University College London, UK -- "Synthesis of Software Patches Using Symbolic Execution" (Slides, Videos: )
Abstract: Program synthesis automatically constructs a program that satisfies a given specification. Program repair can be viewed as an instance of program synthesis, whose goal is to modify a given incorrect program to satisfy the specification. One of the key challenges of program repair is its limited scalability when it is applied to large programs. This talk presents an approach of using symbolic execution to analyse the semantic impact of synthesized patches in order to alleviate the scalability problem. The presented approach scales to complex real-world software, and is able to repair complex defects, such as the famous Heartbleed vulnerability. Besides, this approach can provide correctness guarantees of the generated patches when a formal specification or a reference implementation are available.
16:30 Final discussion and wrap up
17:00 Close
Day 2 - Tuesday 21st January 2020
10:00
10:30 Pastries
11:00 Prof Martin Monperrus, KTH Royal Institute of Technology, Sweden -- "Automated Patch Assessment for Program Repair at Scale"
Abstract: In this paper, we do automatic correctness assessment for patches generated by program repair techniques. We consider the human patch as ground truth oracle and randomly generate tests based on it, i.e., Random testing with Ground Truth -- RGT. We build a curated dataset of 638 patches for Defects4J generated by 14 state-of-the-art repair systems. We evaluate automated patch assessment on our dataset which is, to our knowledge, the largest ever. The results of this study are novel and significant. First, we show that 10 patches from previous research classified as correct by their respective authors are actually overfitting. Second, we demonstrate that the human patch is not the perfect ground truth. Third, we precisely measure the trade-off between the time spent for test generation and the benefits for automated patch assessment at scale.
URL: arxiv.org/abs/1909.13694
11:30 Dr Marija Selakovic, Huawei, Germany -- "Actionable Performance Analyses" (Slides, Videos: )
Abstract: The application’s performance has become the essential concern in software development. Unfortunately, many applications still suffer from performance issues: coding or design errors that lead to performance degradation. Fixing performance issues is a challenging task: current performance profilers report only where resources are spent, but not where resources are wasted. In this talk I will focus on actionable performance analyses that help developers optimize their software by suggesting relatively simple code changes. More concretely, these analyses propose optimizations based on reordering opportunities and method inlining and they share the three key properties. First, the proposed optimizations are effective, that is, the changes suggested by the analysis lead to statistically significant performance improvements. Second, the optimizations are exploitable, that is, they are easy to understand and apply. Finally, the optimizations are recurring, that is, they are applicable across multiple projects.
12:00 Lunch
13:30 Dr Michail Basios, TurinTech, University College London, UK -- "Artemis: Automatic Code Optimisation - from Research to Industry"
Abstract: Data structure selection and tuning is laborious but can vastly improve an application’s performance and memory footprint. Some data structures share a common interface and enjoy multiple implementations. We call them Darwinian Data Structures (DDS), since we can subject their implementations to survival of the fittest. We introduce Artemis a multi-objective, cloud-based search-based optimisation framework that automatically finds optimal, tuned DDS, with respect to a test suite, then changes an application to use that DDS. Artemis achieved substantial performance improvements when applied on 43 open source Java projects (programs and libraries) and 5 C++ projects. For execution time, CPU usage, and memory consumption, Artemis finds at least one solution that improves all measures for 86% (37/43) of the projects. The median improvement across the best solutions is 4.8%, 10.1%, 5.1% for runtime, memory and CPU usage. In this workshop, we will talk about how to extend Artemis in a wider range of applications such as android mobile apps and big industrial codebases. We will discuss the difficulties and issues of applying such optimisations and show the approaches we followed to overcome them. Last, we will talk about potential extensions in the framework that can allow Artemis to achieve even better optimisations.
14:00 Dr Sandy Brownlee, University of Stirling, UK -- "Search-based approaches to improving the energy consumption of Java programs" (Slides, Videos: )
Abstract: There is growing interest in approaches to improve the performance of software with respect to non-functional properties. Reducing computational energy consumption is one such property to be targeted, and is particularly important at the extremes (i.e., mobile devices and datacentres). In many cases there is even a trade-off between functionality and energy consumption. Given the huge number of design choices that can impact on the energy performance of software, it is an area where search-based and Genetic Improvement approaches have much potential. This talk focuses on a variety of approaches to reducing the energy consumption of Java programs, using a tool called Opacitor to estimate energy consumption, while still considering the functionality of the code.
14:30 Dr Markus Wagner, University of Adelaide, Australia -- "Optimising energy consumption using GI" (Slides, Videos: )
Abstract: Code execution consumes energy, and this can affect us in more than just one way. In this talk, I will outline two of them: (i) high energy consumption of apps running on modern smartphones negatively impacts the user experience, and (ii) patterns emerging from the energy consumption can lead to leakage of secret information. Optimisation engineers tackling these problems with actual devices have to overcome a surprisingly large number of challenges. I will outline overcome and open challenges as well as achievements, ranging from the characterisation of devices and the improvement of simulators, to the source-code level optimisation of the energy consumption of Facebook’s rebound library and to the assembly-level minimisation the power-based information leakage of an AES implementation.
15:00 Refreshments
15:30 Prof Bill Langdon, University College London, UK -- "GI4SBSE Using Geentic Improvement to speed up search" (Slides, Videos: )
Abstract: Evolutionary computation is applied to optimise run time of the tree interpreter in fastest single computer floating point genetic programming system.
Up to two fold speed up is obtained.
Performance varies with tree size. The GI version of Andy Singleton's GPquick is demonstrated on random trees of up to 79 million opcodes on Intel AVX512 SIMD parallel compute servers
16:00 Dr Justyna Petke, University College London, UK -- "Gin and PyGGI: General Frameworks for Genetic Improvement" (Slides, Videos: )
Abstract: Genetic Improvement is a young field of research that uses automated search to improve existing software. The cost of re-implementing GI to investigate new approaches is hindering progress. Therefore I will present Gin, an extensible and modifiable toolbox for GI experimentation in Java, as well as PyGGI, a lightweight framework for GI for multiple programming languages.
16:30 Final discussion and wrap up
17:00 Close
Registered Attendees:
1. Dr. Giovani Guizzo, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
2. Dr. Maria Kechagia, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
3. Dr. Aymeric Blot, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
4. Dr. Earl Barr, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
5. Dr. Jie Zhang, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
6. Dr Marija Katic, Birkbeck, University of London, UK
7. Dr. Jens Krinke, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
8. James Callun, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
9. Dr John Woodward, Dept of Computer Science, Queen Mary University of London
10. Nikhil Parasaram, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
11. Artur Meski, Institute of Computer Science, Polish Academy of Sciences
12. Dr Leon Moonen, Simula Research Laboratory
13. Professor Tracy Hall, Department: Computing and Communications , University of Lancaster
14. Dr Michal Knapik, Institute of Computer Science, PAS, Warsaw, Poland,
15. Piotr Dziurzanski Dept. of Computer Science, University of York
16. Nicholas Matragkas Department of Computer Science, University of York
17. Dario Asprone, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
18. Seongmin Lee, Korea Advanced Institute of Science and Technology
19. Robert White, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
20. Derek Jones, Knowledge Software
21. Max Hort, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
22. Emily Winter, University of Lancaster
Dr Vesna Nowack, Queen Mary University of London, UK
P
Professor Steve Counsell, Brunel University, UK
Daniel Kroening, University of Oxford, UK
Dr Matias Martinez, Université Polytechnique Hauts-de-France, France
Dr David Clark, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Dr David Bowes, University of Lancaster
Dr Julio Cesar Cortest Rios, University of Manchester
David Kelly, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Dan Bruce, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Rebecca Moussa, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Leonid Joffe, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Alex Burdusel, Kings College London, UK
Pavel Reich, Queen Mary University of London, UK
Louis Vines, CREST Centre, SSE Group, Department of Computer Science, UCL, UK