The 50th CREST Open Workshop - Genetic Improvement
Date: 30 and 31 January 2017
Venue: William Penn Suite, Friends House, 173-177 Euston Rd, London NW1 2BJ
Overview:
Genetic Improvement (GI) uses automated search to find improved versions of existing software. GI has been used to improve many system aspects such as its correctness (through bug fixing) and resource consumption such as time, memory and energy. It has also been used for other kinds of improvement such as specialising and porting. Many and varied techniques have proved successful at improving programs, such as loop perforation, genetic programming, guided random search, transplantation and constraint-based synthesis. This workshop will bring together researchers working on search based software engineering, program synthesis, genetic programming, program analysis, data mining and machine learning. We will explore the possible applications and connections between the various fields that will hopefully lead to development of new GI techniques for automated software improvement.
Speakers:
Alexandru Marginean, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Andrea Arcuri, Westerdals, Oslo, Norway
Bill Langdon, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Bobby Bruce, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Cristian Cadar, Department of Computing, Imperial College London, UK
Cristina Videira Lopes, Department of Informatics, Donald Bren School of Information and Computer Sciences, University of California Irvine, USA
Doron Peled, Department of Computer Science, Bar Ilan University, Isreal
James Clause, Computer and Information Sciences, University of Delaware, USA
Julia Rubin, Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
Mark Harman, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Michele Sebag, Laboratoire de Recherche en Informatique, Université Paris-Sud, France
Robert Feldt, Software Engineering, Blekinge Institute of Technology, Sweden
Saemundur O. Haraldsson, Computer Science, University of Stirling, UK
Sandy Brownlee, Computer Science, University of Stirling, UK
Shin Hwei Tan, Computing, National University of Singapore, Singapore
Shin Yoo, KAIST School of Computing, South Korea
Schedule:
Day 1
10:00 Pastries
10:30 Welcome by Justyna Petke and Introductions (Videos: 480p, 720p)
11:00 Andrea Arcuri - 10 Years After: Automatic Software Generation and Improvement Through Search Based Techniques (Slides, Videos: 480p, 720p)
Ten years have passed since "Coevolving Programs and Unit Tests from their Specification" was published in 2007. Although generating software automatically turned out to be not viable, the resulting co-evolutionary framework can be applied to "simpler" tasks, like bug fixing and non-functional criteria improvement. This talk will discuss the early work from the 2007-2009 period. Given a hammer (co-evolution) which failed at its task (code generation), it was all about trying to find some other nails (bug fixing and code improvement) before you give up and switch PhD topic yet again. At that time, results were obtained only on toy problems. However, looking at what achieved by others during the following 10 years, much
progress has been done.
11:30 Alexandru Marginean - Multilingual Software Transplantation
Automated software transplantation would open many exciting avenues for software development: suppose we could autotransplant code from one system (donor) into another (host), entirely unrelated, system. Leveraging lightweight annotation, program analysis identifies an organ (interesting behavior to transplant); testing validates that the organ exhibits the desired behavior during its extraction and after its implantation into a host. In previous work, we autotransplanted code from a donor program, into a host one, when both the donor and the host were written in the same programming language. This talk introduces multilingual software transplantation: code transplants, when the host, and the donor are written in different programming languages. We propose a novel rule-based translation - genetic programming hybrid system, to translate, and implant the organ in the language, and in the context, of the host system.
12:00 Shin Hwei Tan - Design of repair operators for automated program repair (Slides: PPTX, PDF, Videos: 480p, 720p)
Fixing software bugs manually is time-consuming. Furthermore, fixing bugs may introduce regressions due to low-quality patches and inadequate testing. Several automated program repair approaches have been proposed to reduce the time and effort spent for bug fixing. This talk starts by discussing how to design contextual operators for automated repair of software regressions. Since existing program repair techniques rely on tests as correctness criteria, automatically generated patches may not be correct and may introduce regressions as in human patches. This talk then introduces anti-patterns for search-based program repair techniques that address the problem of incorrect patches. Finally, the talk explains the ineffectiveness of the current design of repair operators as a motivation for future design.
12:30 Lunch
13:30 Cristian Cadar - Symbolic Execution for Evolving Software (Slides)
One of the distinguishing characteristics of software systems is that they evolve: new patches are committed to software repositories and new versions are released to users on a continuous basis. Unfortunately, many of these changes bring unexpected bugs that break the stability of the system or affect its security. Furthermore, recent techniques for automatically improving software are proposing ways to generate patches, some of which might be incorrect. In this talk, I describe some of our work on devising novel symbolic execution techniques for increasing the confidence in evolving software: a technique for reasoning about the correctness of optimisations and refactorings; a technique for high-coverage patch testing, and a technique for revealing behavioural divergences across versions.
14:00 Bill Langdon - Genetic improvement of GPU software (Slides, Videos: 480p, 720p)
The GI special issue of "Genetic Programming and Evolvable Machines" includes a survey of genetic improvement for general purpose computing on graphics cards. It showed existing GI-4-GPGPU work falls into four themes. Experiments with growing and grafting new code with genetic programming (GGGP) in combination with human input on existing CUDA GPU code show that semi-automated approaches can give useful speed ups. In the case of BarraCUDA these have been adopted and the GI version, 0.7.107x, exceeds 2000 sourceforge downloads.
14:30 Bobby Bruce - Automatic Parallelisation of Software using Genetic Improvement (Slides: Keynote, PDF, Videos: 480p, 720p)
There is little doubt that the exploitation of multi-core hardware by the majority of software engineers has been lacklustre. Most software continues to run sequentially when, in many instances, distribution of work across many cores is possible. We believe the development of tools to do this automatically is by far the best solution to this problem. As such we present early investigations into using Genetic Improvement to automatically insert OpenACC compiler directive to sequential C/C++ programs, thus offloading computationally intensive activity to a GPU without human intervention
15:00 Refreshments
15:30 Saemundur Haraldsson - Janus Manager, the self-improving software system (Slides, Videos: 480p, 720p)
In the ever expanding field of GI, many interesting and successful applications have improved both functional and non-functional properties. This talk presents Janus Manager, a bespoke software system for Icelandic vocational rehabilitation centres. During business hours it provides overview and control for many specialists to simultaneously schedule and observe the rehabilitation process for multiple clients at a time. However in the evening, after the last user logs out, it starts a self-analysis based on the day's recorded interactions and a self-healing process using GI if any exceptions are recorded. When the developer next logs in to the system it will display a report, listing encountered errors and suggested changes to the source code. The system has already been under testing for 4 months and demonstrates the power of GI for improving real, live code.
16:00 Shin Yoo - Amortised optimisation as a mean of Genetic Improvement (Slides, Videos: 480p, 720p)
Many successful applications of Genetic Improvement have dealt with non-functional properties, perhaps because they are precisely the issues human developers find hard to tackle. In addition to their non-intuitive nature, these properties are often heavily affected by the environment in which the target program is executed. Amortised optimisation is an attempt to take the optimisation to the execution environment to achieve GI in-situ, instead of in-vitro. We introduce a few early exploratory findings and discuss potential future research.
Day 2
10:30 Pastries
11:00 Robert Feldt - GI++ == Focused Automated Programming? (Slides: short, full, Videos: 480p, 720p)
Genetic Improvement (GI) is a recent and exciting way to apply search and optimisation to better support software developers. Even though it has shown impressive results it can also be argued to be quite distant from a long-time, more ambitious and, in some sense, end goal of search-based software engineering (SBSE): fully automated programming. In this talk we propose that one can view GI as one point or range of techniques on a continuum of SBSE software development approaches and discuss what could be a next, and maybe even more ambitious, goal on such a 'scale'. We briefly touch on our recent results on focused automated programming.
11:30 Cristina Videira Lopes - The Curious Case of Code Duplication in GitHub
Previous studies have shown that there is a non-trivial amount of duplication in source code. We recently analyzed a corpus of 2.6 million non-fork projects hosted on github representing over 258 million files written in Java, C++ Python and JavaScript, and found a surprisingly large amount of duplication, much more than we anticipated. This finding made us be much more careful when using open source repositories for drawing statistical conclusions. I will talk both about the findings and the consequences of these findings for some of our other work.
12:00 Michele Sebag - Algorithm Selection via Algorithm Recommendation
Algorithm selection (AS), selecting the algorithm best suited for a particular problem instance, is acknowledged to be a key issue to make the best out of algorithm portfolios. The talk will discuss a collaborative filtering approach to AS. Collaborative filtering, popularized by the Netflix challenge, aims to recommend the items that a user will most probably like, based on the previous items she liked, and the items that have been liked by other users. As first noted by Stern et al. (2010), algorithm selection can be formalized as a collaborative filtering problem, by considering that a problem instance ``likes better`` the algorithms that achieve better performance on this particular instance. Applying collaborative filtering on the benchmark datasets (reporting the performance of each algorithm in the portfolio on a set of problem instances) yields a latent representation of algorithms and of problem instances, which enables algorithm recommendation. The main challenge is to handle the so-called cold-start problem, and to recommend an algorithm for a brand new problem instance.
12:30 Lunch
13:30 Doron Peled - Synthesis of Concurrent Programs using Genetic Programming (Slides: PPTX, PDF, Videos: 480p, 720p)
We present a method to automatically generate concurrent code using genetic programming, based on automatic verification (model checking). As the problem of constructing concurrent code is in general undecidable, the user needs the intervene by tuning various parameters and supplying specification and hints that would steer the search for correct code in the right direction. We demonstrate how various hard-to-program protocols are generated using our method and our developed tool. We show how a commonly used protocol for coordinating concurrent interactions was found to be incorrect using our tool, and was then subsequently fixed.
14:00 Julia Rubin - Energy-Efficiency in Mobile Software (Slides)
Poor battery life is a problem that affects all mobile users. A dramatic increase in computing capabilities of small devices, coupled with only a limited increase in battery capacity, renders battery lifetime as one of the major usability concerns. Yet, developers rarely have time to invest in optimizing software for energy-efficiency. In many cases, they also lack tools and skills needed for the task. In this talk, I will outline major sources of energy-related problems in mobile software. I will then discuss existing approaches for detecting energy bugs and bad smells, and present ideas for possible future work on optimizing the energy consumption in mobile applications.
14:30 James Clause - SEEDS: The Software Engineer's Energy-optimization Decision Support Framework (Slides, Videos: 480p, 720p)
Reducing the energy usage of software is becoming more important in many environments. Empirical studies indicate that software engineers can support this goal by making design and implementation decisions in ways that take into consideration how such decisions impact energy usage. However, the large number of possible choices and the lack of feedback and information available to software engineers necessitates some form of automated decision-making support. In this talk, I will describes SEEDS, our framework for systematically optimizing the energy usage of applications by making code-level changes, including recent efforts to improve the search process via GI techniques.
15:00 Refreshments
15:30 Sandy Brownlee - Hyper-parameter tuning to improve existing software (Slides, Videos: 480p, 720p)
We describe a systematic approach to the automated improvement of a schedule simulation application used by a major airline. Our initial experiments considered the optimisation of existing hyper-parameters in the software. Single and multi-objective approaches were applied to improving the speed and simulation accuracy of the software. Strong results were obtained, with configurations making the application both faster running and more accurate. In addition, analysis of the search space and optimal solutions added value to the optimisation by providing insight into the relative importance of the hyper-parameters, as well as any possible interactions between them. This allowed the end-users to have confidence in the results and provided feedback for the software developers. We will conclude with some commentary on the next stage of this work, which will consider genetic improvement of the software at the level of code.
16:00 Mark Harman - GI overview (Videos: 480p, 720p)
In this talk we revisit the foundations of genetic improvement, considering connections between genetic improvement and other forms of program manipulation, such as traditional program transformation, synthesis, and various definitions of program slicing.
Photos:
This workshop is supported by the following sponsors:
Registered attendees:
1. Mark Harman, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
2. Matheus Paixao, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
3. Justyna Petke, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
4. Andrea Arcuri, Westerdals, Oslo, Norway
5. Cristian Cader, Department of Computing, Imperial College London, UK
6. Cristina Videira Lopes, Department of Informatics, Donald Bren School of Information and Computer Sciences, University of California Irvine, USA
7. Doron Peled, Department of Computer Science, Bar Ilan University, Isreal
8. Julia Rubin, Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
9. Michele Sebag, Laboratoire de Recherche en Informatique, Université Paris-Sud, France
10. Robert Feldt, Software Engineering, Blekinge Institute of Technology, Sweden
11. Shin Yoo, KAIST School of Computing, South Korea
12. James Clause, Computer and Information Sciences, University of Delaware, USA
13. Derek Jones, Knowledge Software, UK
14. Bobby Bruce, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
15. Bill Langdon, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
16. Martin Shepperd, Department of Computer Science, Brunel University London, UK
17. Mike Papadakis, SnT, University of Luxembourg , Luxembourg
18. David White, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
19. Stephen Cullum, School of Computer Science and Electronic Engineering, University of Essex, UK
20. Marco Micucci, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
21. Nassim Seghir, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
22. David Clark, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
23. Earl Barr, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
24. Tracy Hall, Department of Computer Science, Brunel University, London, UK
25. Carlos Gavidia, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
26. Chaiyong Ragkhitwetsagul, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
27. Yuri Bykov, Computer Science, Queen Mary University London, UK
28. Saemundur O. Haraldsson, Computer Science, University of Stirling, UK
29. Hector D Menendez, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
30. John R. Woodward, Computing Science and Mathematics, University of Stirling, UK
31. Shin Hwei Tan, Computing, National University of Singapore, Singapore
32. Byeonghyeon You, COINSE Lab, KAIST School of Computing, South Korea
33. Seongmin Lee, COINSE Lab, KAIST School of Computing, South Korea
34. Junhwi Kim, COINSE Lab, KAIST School of Computing, South Korea
35. Jeongju Sohn, COINSE Lab, KAIST School of Computing, South Korea
36. Syed Islam, School of Architecture, Computing and Engineering, University of East London
37. Jose Miguel Rojas, Computer Science, University of Sheffield, UK
38. Brendan Cody-Kenny, Natural Computing Research and Applications Group, University College Dublin, Ireland
39. Andrea Mattavelli, Department of Computing, Imperial college London, UK
40. Alexandru Marginean, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
41. Sandy Brownlee, Computer Science, University of Stirling, UK
42. Federica Sarro, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
43. Kwaku Yeboah-Antwi, DiverSE Group, INRIA, University of Rennes I, France
44. Titcheu Chekam Thierry, Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, Luxembourg
45. Stefan Forstenlechner, Natural Computing Research and Applications Group, University College Dublin, Ireland
46. Dongsum Kim, Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, Luxembourg
47. Greta Yorsh, Computer Science, Queen Mary University London, UK
48. Santanu Dash, CREST Centre, SSE Group, Department of Computer Science, UCL, UK