Technical Program 

Monday, May 22


09:00 - 10:30

ICPC 2017 Opening

Room: Auditorio 1  |  Session Chairs: David Lo and Alexander Serebrenik

Keynote: The ABCs of Software Engineering: Affect, Biometrics, and Cognition

Andrew Begel, Microsoft Research  
Researchers have long investigated how people read, write, and speak about software on their computers to identify the skills, education, and practices needed to acquire expertise and perform development duties effectively and efficiently. However, until now the methods used to study developer comprehension, expression, and communication have been limited and coarse-grained because there was no way to identify what a developer thought or felt unless it was expressed out loud.
The world has changed. With the introduction of low-cost, widely available, high-fidelity biometric sensors, we can now more directly observe a software developer's cognitive and affective (emotional) processes. The ABCs of Software Engineering is a set of techniques that modernize classic approaches to program comprehension and human interaction by combining (A) principles governing the influence of human *affect* on behavior, (B) *biometric* sensors, and (C) models of *cognition* informed by advances in cognitive neuroscience. Technologies like electroencephalography (EEG), electro-dermal activity sensors (EDA), capacitive sensors, and eye trackers can reveal a software developer's internal emotional states, for example identifying when the developer is confused, frustrated, surprised, stressed, fatigued, or in a highly productive flow state. These affective states can be correlated with code quality, software complexity, development productivity, and effective communication --- the same software outcomes already correlated with developer activities in other research areas such as mining software repositories (MSR) and cooperative and human aspects of software engineering (CHASE). By developing a better understanding of what programmers think and feel when they create and maintain software, we can design tools and interventions to improve their productivity and reduce the impact of their errors.
11:00 - 12:30

Technical Research: Developer Observation

Room: Auditorio 1  |  Session Chair: David Shepherd
Do Software Developers Understand Open Source Licenses?
Daniel Almeida, Gail Murphy, Greg Wilson and Mike Hoye  
Software provided under open source licenses is widely used, from forming high-profile stand-alone applications (e.g., Mozilla Firefox) to being embedded in commercial offerings (e.g., network routers). Despite the high frequency of use of open source licenses, there has been little work on whether software developers understand the open source licenses they use. To our knowledge, only one survey has been conducted, which focused on which licenses developers choose and when they encounter problems with licensing open source software. To help fill this gap, we conducted a survey that posed development scenarios involving three popular open source licenses (GNU GPL 3.0, GNU LGPL 3.0 and MPL 2.0), both alone and in combination. The 375 respondents to the survey, who were largely developers, gave answers consistent with a legal expert’s opinion in 62% of the 42 cases. Although developers clearly understood cases involving one license, they struggled when multiple licenses were involved. An analysis of the quantitative and qualitative results of the study indicates a need for tool support to help guide developers in understanding this critical information attached to software components.

Software Engineers' Information Seeking Behavior in Change Impact Analysis – An Interview Study
Markus Borg, Emil Alégroth and Per Runeson  
Software engineers working in large projects must navigate complex information landscapes. Change Impact Analysis (CIA) is a task that relies on engineers’ successful information seeking in databases storing, e.g., source code, requirements, design descriptions, and test case specifications. Several previous approaches to support information seeking are task-specific, thus understanding engineers’ seeking behavior in specific tasks is fundamental. We present an industrial case study on how engineers seek information in CIA, with a particular focus on traceability and development artifacts that are not source code. We show that engineers have different information seeking behavior, and that some do not consider traceability particularly useful when conducting CIA. Furthermore, we observe a tendency for engineers to prefer less rigid types of support rather than formal approaches, i.e., engineers value support that allows flexibility in how to practically conduct CIA. Finally, due to diverse information seeking behavior, we argue that future CIA support should embrace individual preferences to identify change impact by empowering several seeking alternatives, including searching, browsing, and tracing.

How Developers Document Pull Requests with External References
Fiorella Zampetti, Luca Ponzanelli, Andrea Mocci, Gabriele Bavota, Massimiliano Di Penta and Michele Lanza  
Online resources of formal and informal documentation–such as reference manuals, forum discussions and tutorials–have become an asset to software developers, as they allow them to tackle problems and to learn about new tools, libraries, and technologies. This study investigates to what extent and for which purpose developers refer to external online resources when they contribute changes to a repository by raising a pull request. Our study involved (i) a quantitative analysis of over 150k URLs occurring in pull requests posted in GitHub; (ii) a manual coding of the kinds of software evolution activities performed in commits related to a statistically significant sample of 2,130 pull requests referencing external documentation resources; (iii) a survey with 69 participants, who provided feedback on how they use online resources and how they refer to them when filing a pull request. Results of the study indicate that, on the one hand, developers find external resources useful to learn something new or to solve specific problems, and they perceive it as useful to reference such resources to better document changes. On the other hand, both interviews and repository mining suggest that external resources are still rarely referenced when documenting changes.

Variability through the Eyes of the Programmer
Jean Melo, Fabricio Batista Narcizo, Dan Witzner Hansen, Claus Brabrand and Andrzej Wasowski  
Preprocessor directives (#ifdefs) are often used to implement compile-time variability, despite the critique that they increase complexity, hamper maintainability, and impair code comprehensibility. Previous studies have shown that the time of bug finding increases linearly with variability. However, little is known about the cognitive process of debugging programs with variability. We carry out an experiment to understand how developers debug programs with variability. We ask developers to debug programs with and without variability, while recording their eye movements using an eye tracker. The results indicate that debugging time increases for code fragments containing variability. Interestingly, debugging time also seems to increase for code fragments without variability in the proximity of fragments that do contain variability. The presence of variability correlates with an increase in the number of gaze transitions between definitions and usages for fields and methods. Variability also appears to prolong the “initial scan” of the entire program with which most developers begin debugging.
14:00 - 15:30

Technical Research: Naming and Complexity

Room: Auditorio 1  |  Session Chair: Andrew Begel
Meaningful Identifier Names: The Case of Single-Letter Variables
Gal Beniamini, Sarah Gingichashvili, Alon Klein Orbach and Dror Feitelson  
It is widely accepted that variable names in computer programs should be meaningful, and that this aids program comprehension. “Meaningful” is commonly interpreted as favoring long descriptive names. However, there is at least some use of short and even single-letter names: using i in loops is very common, and we show (by extracting variable names from 1000 popular GitHub projects in 5 languages) that some other letters are also widely used. In addition, controlled experiments with different versions of the same functions (specifically, different variable names) failed to show significant differences in ability to modify the code. Finally, an online survey showed that certain letters are strongly associated with certain types and meanings. This implies that a single letter can in fact convey meaning. The conclusion from all this is that single letter variables can indeed be used beneficially in certain cases, leading to more concise code.
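
To make the point concrete, a minimal sketch (in Python, chosen for brevity; not taken from any of the surveyed projects) of single-letter names whose conventional meaning most readers recognize immediately:

```python
def mean_length(strings):
    # Conventional single-letter names: n for a count, i for a loop index,
    # s for a string. Their meaning is carried by convention, not by length.
    n = len(strings)
    total = 0
    for i in range(n):
        s = strings[i]
        total += len(s)
    return total / n if n else 0.0

print(mean_length(["ab", "cdef"]))  # 3.0
```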

Effects of Variable Names on Comprehension: An Empirical Study
Eran Avidan and Dror Feitelson  
It is widely accepted that meaningful variable names are important for comprehension. We conducted a controlled experiment in which 9 professional developers tried to understand 6 methods from production utility classes, either with the original variable names or with names replaced by meaningless single letters. Results show that parameter names are more significant for comprehension than local variables. But, surprisingly, we also found that in 3 of the methods there were no significant differences between the control and experimental groups, due to poor and even misleading variable names. These disturbingly common bad names reflect the subjective nature of naming, and highlight the need for additional research on how variable names are interpreted and how better names can be chosen.

Syntax, Predicates, Idioms -- What Really Affects Code Complexity?
Shulamyt Ajami, Yonatan Woodbridge and Dror Feitelson  
Program comprehension concerns the ability to understand code written by others. But not all code is the same. We use an experimental platform fashioned as an online game-like environment to measure how quickly and accurately 222 professional programmers can interpret code snippets with similar functionality but different structures. The results indicate, inter alia, that for loops are significantly harder than ifs, that some but not all negations make a predicate harder, and that loops counting down are slightly harder than loops counting up. This demonstrates how the effect of syntactic structures, different ways to express predicates, and the use of known idioms can be measured empirically, and that syntactic structures are not necessarily the most important factor. By amassing many more empirical results like these it may be possible to derive better code complexity metrics than we have today.

Exploiting Type Hints in Method Argument Names to Improve Lightweight Type Inference
Nevena Milojković, Mohammad Ghafari and Oscar Nierstrasz  
The lack of static type information is one of the main obstacles to program comprehension in dynamically-typed languages. While static type inference algorithms try to remedy this problem, they usually suffer from the problem of false positives or false negatives. In order to partially compensate for the lack of static type information, a common practice in dynamically-typed languages is to name or annotate method arguments in such a way that they reveal their expected type, e.g., aString, anInt, or string: String. Recent studies confirmed that these type annotations are indeed frequently used by developers in dynamically-typed languages. We propose a lightweight heuristic that uses these hints from method argument names to augment the performance of a static type inference algorithm. The evaluation through a proof-of-concept prototype implemented in Pharo Smalltalk shows that the augmented algorithm outperforms the basic algorithm, and correctly infers types for 81% more method arguments.
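
As an illustration of the naming convention exploited here (a minimal sketch in Python rather than the Pharo Smalltalk studied in the paper; the class and method names are hypothetical):

```python
class Logger:
    def emit(self, aString, anInt):
        # In a dynamically-typed language no parameter types are declared,
        # but names such as aString and anInt hint at the expected types,
        # and a lightweight type-inference heuristic can pick these hints up.
        return aString * anInt  # repeat the message anInt times

print(Logger().emit("retry ", 3))  # 'retry retry retry '
```
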
16:00 - 17:30

Technical Research: Smells and Clones

Room: Auditorio 1  |  Session Chair: Mike Godfrey
Binary Code Clone Detection across Architectures and Compiling Configurations
Yikun Hu, Yuanyuan Zhang, Juanru Li and Dawu Gu  
Binary code clone detection (or similarity comparison) is a fundamental technique for many important applications, such as plagiarism detection, malware analysis, software vulnerability assessment and program comprehension. With the prevalence of smart and IoT (Internet of Things) devices, more and more programs are ported from traditional desktop platforms (e.g., IA-32) to ARM and MIPS architectures. It becomes imperative to detect cloned binary code across architectures. However, because of incomparable instruction sets of different architectures as well as alternative compiling configurations, it is difficult to conduct binary code clone detection with traditional syntax- or structure-based methods. To address this, we propose a semantics-based approach. We recognize arguments and indirect jump targets of each binary function, and emulate executions of those functions, extracting semantic signatures to measure the similarity of functions. The approach has been implemented in a prototype system named CACOMPARE to detect cloned binary functions across architectures and compiling configurations. It supports comparisons between mainstream architectures (IA-32, ARM and MIPS) and is able to analyse binaries on the Linux platform. The experimental results show that CACOMPARE not only is effective in dealing with binaries of different architectures and variant compiling configurations, but also improves the accuracy of binary code clone detection compared to state-of-the-art solutions.

Identifying Code Clones having High Possibilities of Containing Bugs
Manishankar Mondal, Chanchal K. Roy and Kevin Schneider  
Code cloning has emerged as a controversial term in software engineering research and practice because of its positive and negative impacts on software evolution and maintenance. Researchers suggest managing code clones through refactoring and tracking. Given the huge number of code clones in a software system’s code-base, it is essential to identify the most important ones to manage. In our research, we investigate which clone fragments have high possibilities of containing bugs so that such clones can be prioritized for refactoring and tracking to help minimize future bug-fixing tasks. Existing studies on clone bug-proneness cannot pinpoint code clones that are likely to experience bug-fixes in the future. According to our analysis on thousands of revisions of four diverse subject systems written in Java, change frequency of code clones does not indicate their bug-proneness (i.e., does not indicate their tendencies of experiencing bug-fixes in future). Bug-proneness is mainly related to change recency of code clones. In other words, more recently changed code clones have a higher possibility of containing bugs. Moreover, for the code clones that were not changed previously we observed that clones that were created more recently have higher possibilities of experiencing bug-fixes. Thus, our research reveals the fact that bug-proneness of code clones mainly depends on how recently they were changed or created (for the ones that were not changed before). It invalidates the common intuition regarding the relatedness between high change frequency and bug-proneness. We believe that code clones should be prioritized for management considering their change recency or recency of creation (for the unchanged ones).

Smells are sensitive to developers! On the efficiency of (un)guided customized detection
Mario Hozano, Alessandro Garcia, Nuno Antunes, Baldoino Fonseca and Evandro Costa  
Code smells indicate poor implementation choices that may hinder program comprehension and maintenance. Their informal definition allows developers to follow different heuristics to detect smells in their projects. Machine learning has been used to customize smell detection according to the developer’s perception. However, such customization is not guided (i.e. constrained) to consider alternative heuristics used by developers when detecting smells. As a result, their customization might not be efficient, requiring a considerable effort to reach high effectiveness. In fact, there is no empirical knowledge yet about the efficiency of such unguided approaches for supporting developer-sensitive smell detection. This paper presents Histrategy, a guided customization technique to improve the efficiency of smell detection. Histrategy considers a limited set of detection strategies, produced from different detection heuristics, as input of a customization process. The output of the customization process consists of a detection strategy tailored to each developer. The technique was evaluated in an experimental study with 48 developers and four types of code smells. The results showed that Histrategy is able to outperform six widely adopted machine learning algorithms – used in unguided approaches – both in effectiveness and efficiency. It was also confirmed that most developers benefit from using alternative heuristics to: (i) build their tailored detection strategies, and (ii) achieve efficient smell detection.

On the Uniqueness of Code Redundancies
Bin Lin, Luca Ponzanelli, Andrea Mocci, Gabriele Bavota and Michele Lanza  
Code redundancy widely occurs in software projects. Researchers have investigated the existence, causes, and impacts of code redundancy, showing that it can be put to good use, for example in the context of code completion. When analyzing source code redundancy, previous studies considered software projects as sequences of tokens, neglecting the role of the syntactic structures enforced by programming languages. However, differences in the redundancy of such structures may jeopardize the performance of applications leveraging code redundancy. We present a study of the redundancy of several types of code constructs in a large-scale dataset of active Java projects mined from GitHub, unveiling that redundancy is not uniform and mainly resides in specific code constructs. We further investigate the implications of the locality of redundancy by analyzing the performance of language models when applied to code completion. Our study discloses the perils of exploiting code redundancy without taking into account its strong locality in specific code constructs.

Tuesday, May 23


09:00 - 10:30

Most Influential Paper

Room: Auditorio 1  |  Session Chairs: David Lo and Alexander Serebrenik
11:00 - 12:30

Technical Research: Android and Security

Room: Auditorio 1  |  Session Chair: Jacques Klein

Industry and Tool Demo

Room: 126  |  Session Chairs: Felienne Hermans and David Shepherd  
RepDroid: An Automated Tool for Android Application Repackaging Detection
Shengtao Yue, Weizan Feng, Jun Ma, Yanyan Jiang, Xianping Tao, Chang Xu and Jian Lu  
In recent years, with the explosive growth of mobile smartphones, the number of Android applications (apps) has increased rapidly. Attackers usually leverage the popularity of Android apps by inserting malware, modifying the original apps, and repackaging and releasing them for their own illegal purposes. To prevent repackaged apps from being detected, they usually use various obfuscation and encryption tools. As a result, it is important to detect which apps are repackaged. People often intuitively judge whether two apps are a repackaged pair by executing them and observing their runtime user interface (UI) traces. Hence, we propose the layout group graph (LGG), built from UI traces, to model those UI behaviors and use LGG as the birthmark of Android apps for identification. Based on LGG, we also implement a dynamic repackaging detection tool, RepDroid. Since our method does not require the apps’ source code, it is resilient to app obfuscation and encryption. We conducted an experiment with two data sets. The first set contains 98 pairs of repackaged apps. The original apps and repackaged ones were compared and we could detect all of these repackaged pairs. The second set contains 125 commercial apps. We compared them pair-wisely and the false positive rate was 0.08%.

Comprehension of Ads-supported and Paid Android Applications: Are They Different?
Rubén Saborido Infantes, Foutse Khomh, Giuliano Antoniol and Yann-Gaël Guéhéneuc  
The Android market is a place where developers offer paid and/or free apps to users. Free apps can follow the freemium or the ads-business model. While the former offers fewer features and the user is charged for unlocking additional features, the latter includes ads to allow developers to earn revenue. Free apps are interesting to users because they can try them immediately without incurring a monetary cost. However, free apps often have limited features and/or contain ads when compared to their paid counterparts. Thus, users may eventually need to pay to get additional features and/or remove ads. While paid apps have clear market values, their ads-supported versions are not entirely free because ads have an impact on performance. The hidden costs of ads, and the recent possibility to form family groups in Google Play to share purchased apps, make it difficult for developers and users to balance between visible and hidden costs of paid and ads-supported apps. In this paper, first, we perform an exploratory study about ads-supported and paid apps to understand their differences in terms of implementation and development process. We analyze 40 Android apps and we observe that (i) ads-supported apps are preferred by users although paid apps have a better rating, (ii) developers do not usually offer a paid app without a corresponding free version, (iii) ads-supported apps usually have more releases and are released more often than their corresponding paid versions, (iv) there is no clear strategy for the way developers set prices of paid apps, (v) paid apps do not usually include more functionalities than their corresponding ads-supported versions, (vi) developers do not always remove ad networks in paid versions of their ads-supported apps, and (vii) paid apps require fewer permissions than ads-supported apps. Second, we carry out an experimental study to compare the performance of ads-supported and paid apps and we propose four equations to estimate the cost of ads-supported apps. We find that (i) ads-supported apps use more resources than their corresponding paid versions, with statistically significant differences, and (ii) paid apps could be considered a more cost-effective choice for users because their cost can be amortized in a short period of time, depending on their usage.

How Professional Hackers Understand Protected Code while Performing Attack Tasks
Mariano Ceccato, Paolo Tonella, Aldo Basile, Bart Coppens, Bjorn De Sutter, Paolo Falcarin, and Marco Torchiano  
Code protections aim at blocking (or at least delaying) reverse engineering and tampering attacks on critical assets within programs. Knowing the way hackers understand protected code and perform attacks is important to achieve a stronger protection of the software assets, based on realistic assumptions about the hackers’ behaviour. However, building such knowledge is difficult because hackers can hardly be involved in controlled experiments and empirical studies. The FP7 European project ASPIRE has given the authors of this paper the unique opportunity to have access to the professional penetration testers employed by the three industrial partners. In particular, we have been able to perform a qualitative analysis of three reports of professional penetration tests performed on protected industrial code. Our qualitative analysis of the reports consists of open coding, carried out by 7 annotators and resulting in 459 annotations, followed by concept extraction and model inference. We identified the main activities: understanding, building attacks, choosing and customizing tools, and working around or defeating protections. We built a model of how such activities take place. We used this model to identify a set of research directions for the creation of stronger code protections.

NetDroid: Summarizing Network Behavior of Android Apps for Network Code Maintenance
Shaikh Mostafa, Rodney Rodriguez and Xiaoyin Wang  
Network access is one of the most common features of Android applications. Statistics show that almost 80% of Android apps ask for network permission and thus may have some network-related features. Android apps may access multiple servers to retrieve or post various types of data, and the code to handle such network features often needs to change as a result of server API evolution or the content change of data transferred. Since various network code is used by multiple features, maintenance of network-related code is often difficult because the code may be scattered in different places in the code base, and it may not be easy to predict the impact of a code change on the network behavior of an Android app. In this paper, we present an approach to statically summarize network behavior from the byte code of Android apps. Our approach is based on string taint analysis, and generates a summary of network requests by statically estimating the possible values of network API arguments. To evaluate our technique, we applied it to the top 500 Android apps from the official Google Play market, and the result shows that our approach is able to summarize network behavior for most apps efficiently (on average less than 50 seconds per app). Furthermore, we performed an empirical evaluation on 8 real-world maintenance tasks extracted from bug reports of open-source Android projects on GitHub. The empirical evaluation shows that our technique is effective in locating relevant network code.
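
To make concrete what a statically derived network-behavior summary describes, a hypothetical sketch (in Python for brevity; NetDroid itself analyzes Android byte code, and the endpoint below is invented):

```python
BASE_URL = "https://api.example.com"  # hypothetical server

def fetch(url):
    # Stand-in for a real network API call.
    print("GET", url)

def sync_settings(device_id):
    # The request URL is assembled from string constants plus one variable
    # part; a string analysis can statically summarize the argument as
    # "https://api.example.com/v1/devices/<device_id>/settings".
    fetch(BASE_URL + "/v1/devices/" + device_id + "/settings")

sync_settings("42")
```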

Removing Code Clones from Industrial Systems Using Compiler Directives
Tomomi Hatano and Akihiko Matsuo  
Refactoring of code clones is an effective method for improving software maintainability. Existing studies have proposed automated techniques and tools for refactoring. However, it is difficult to apply refactoring to our industrial systems in practice for three main reasons. First, we have many industrial systems written in COBOL, which requires a particular refactoring method compared with current techniques because Type-2 clones in COBOL are generated by renaming parts of identifiers. Second, nested clones must be refactored, in which an instance of a clone set is contained within an instance of another clone set. They also make it difficult to estimate the size reduction achieved by refactoring. Third, refactoring requires testing, which is time-consuming and laborious. To overcome these problems, we developed an approach for refactoring of Type-2 clones in COBOL programs. Our approach identifies actually refactorable clone sets and includes a string comparison technique to parameterize partial differences in identifier names. The clone sets are extracted as shared code fragments and transformed into the refactored code using compiler directives. It is easy to confirm that refactoring using compiler directives preserves program behavior, because they do not change program structure. We also provide a method that makes it possible to refactor nested clones by ordering their refactoring. This method also enables us to estimate how many lines can be reduced by refactoring. We applied the approach to four industrial systems to assess how many lines can be reduced. The results show that lines of code could be reduced by 10 to 15%, and one system was reduced by 27%. We also discuss the number of parameters required for our refactoring approach.

Language-Independent Information Flow Tracking Engine for Program Comprehension Tools
Mohammad Reza Azadmanesh, Michael Van De Vanter and Matthias Hauswirth  
Program comprehension tools are often developed for a specific programming language. Developing such a tool from scratch requires significant effort. In this paper, we report on our experience developing a language-independent framework that enables the creation of program comprehension tools, specifically tools gathering insight from deep dynamic analysis, with little effort. Our framework is language independent, because it is built on top of Truffle, an open-source platform, developed in Oracle Labs, for implementing dynamic languages in the form of AST interpreters. Our framework supports the creation of a diverse variety of program comprehension techniques, such as query, program slicing, and back-in-time debugging, because it is centered around a powerful information-flow tracking engine. Tools developed with our framework get access to the information flow through a program execution. While it is possible to develop similarly powerful tools without our framework, for example by tracking information flow through bytecode instrumentation, our approach leads to information that is closer to source code constructs, thus more comprehensible by the user. To demonstrate the effectiveness of our framework, we applied it to two Truffle-based languages, namely Simple Language and TruffleRuby, and we distill our experience into guidelines for developers of other Truffle-based languages who want to develop program comprehension tools for their language.

The Code Time Machine
Emad Aghajani, Andrea Mocci, Gabriele Bavota, Michele Lanza  
Exploring and analyzing the history of changes is an intrinsic part of software evolution comprehension. Existing tools that exploit the data residing in version control repositories provide only limited support for the intuitive navigation of code changes from a historical perspective. We present the Code Time Machine, a lightweight IDE plugin which uses visualization techniques to depict the history of any chosen file augmented with information mined from the underlying versioning system. Inspired by Apple’s Time Machine, our tool allows both developers and the system itself to seamlessly move through time.

Docio: Documenting API Input/Output Examples
Siyuan Jiang, Ameer Armaly, Collin McMillan, Qiyu Zhi, Ronald Metoyer  
When learning to use an Application Programming Interface (API), programmers need to understand the inputs and outputs (I/O) of the API functions. Current documentation tools automatically document the static information of I/O, such as parameter types and names. What is missing from these tools is dynamic information, such as I/O examples—actual valid values of inputs that produce certain outputs. In this paper, we demonstrate Docio, a prototype toolset we built to generate I/O examples. Docio logs I/O values when API functions are executed, for example in running test suites. Then, Docio puts I/O values into API documents as I/O examples. Docio has three programs: 1) funcWatch, which collects I/O values when API developers run test suites, 2) ioSelect, which selects one I/O example from a set of I/O values, and 3) ioPresent, which embeds the I/O examples into documents. In a preliminary evaluation, we used Docio to generate four hundred I/O examples for three C libraries: ffmpeg, libssh, and protobuf-c.
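
For instance, an embedded I/O example of the kind described above might look like the following (a hypothetical Python sketch; Docio itself targets C libraries such as ffmpeg, libssh, and protobuf-c):

```python
def clamp(value, low, high):
    """Restrict value to the inclusive range [low, high].

    I/O example (recorded while running the test suite):
        inputs:  value=17, low=0, high=10
        output:  10
    """
    return max(low, min(value, high))

assert clamp(17, 0, 10) == 10  # matches the documented I/O example
```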

MetricAttitude++: Enhancing Polymetric Views with Information Retrieval
Rita Francese, Michele Risi, Genoveffa Tortora  
MetricAttitude is a visualization tool, based on static analysis, that provides a mental picture of an object-oriented software system by means of polymetric views. In this tool demonstration paper, we integrate an information retrieval engine into MetricAttitude and name this new version MetricAttitude++. This new tool allows the software engineer to formulate free-form textual queries and shows the results on the polymetric views. In particular, MetricAttitude++ shows on the visual representation of a subject software the elements that are most similar to that query. The navigation among elements of interest can then be driven by the polymetric views of the depicted elements and/or by reformulating the query and applying customizable filters on the software view. Due to its peculiarities, MetricAttitude++ can be applied to many kinds of software maintenance and evolution tasks (e.g., concept location and program comprehension).

FindSmells: Flexible Composition of Bad Smell Detection Strategies
Bruno Sousa, Priscila Souza, Eduardo Fernandes, Kecia Ferreira, Mariza Bigonha  
Bad smells are symptoms of problems in the source code of software systems. They may harm the maintenance and evolution of systems on different levels. Thus, detecting smells is essential to support software quality improvement. Since even small systems may contain several bad smell instances, and considering that developers have to prioritize their elimination, automated detection is necessary support for developers. To this end, detection strategies have been proposed to formalize rules for detecting specific bad smells, such as Large Class and Feature Envy. Several tools like JDeodorant and JSpIRIT implement these strategies but, in general, they do not provide full customization of the formal rules that define a detection strategy. In this paper, we propose FindSmells, a tool for detecting bad smells in software systems through software metrics and their thresholds. With FindSmells, the user can compose and manage different strategies, which run without source code analysis. We also provide a running example of the tool.
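
To give an idea of what a metric-based detection strategy expresses (an illustrative sketch in Python; the metrics and thresholds below are hypothetical and not FindSmells defaults):

```python
# A detection strategy combines software metrics and thresholds with
# logical operators, e.g. "Large Class": LOC > 500 AND WMC > 47.
def large_class(metrics):
    return metrics["loc"] > 500 and metrics["wmc"] > 47

classes = {
    "OrderManager": {"loc": 812, "wmc": 63},
    "Money":        {"loc": 140, "wmc": 12},
}
print([name for name, m in classes.items() if large_class(m)])  # ['OrderManager']
```
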
14:00 - 15:30

Technical Research: Communities and Changes

Room: Auditorio 1  |  Session Chair: Dror Feitelson

Early Research Achievement

Room: 126  |  Session Chairs: Sonia Haiduc and Martin Pinzger  
An Exploratory Study on the Relationship between Changes and Refactoring
Fabio Palomba, Andy Zaidman, Rocco Oliveto and Andrea De Lucia  
Refactoring aims at improving the internal structure of a software system without changing its external behavior. Previous studies empirically assessed, on the one hand, the benefits of refactoring in terms of code quality and developers’ productivity, and on the other hand, the underlying reasons that push programmers to apply refactoring. Results achieved in the latter investigations indicate that besides personal motivation such as the responsibility concerned with code authorship, refactoring is mainly performed as a consequence of changes in the requirements rather than driven by software quality. However, these findings have been derived by surveying developers, and therefore no software repository study has been carried out to corroborate the achieved findings. To bridge this gap, we provide a quantitative investigation of the relationship between different types of code changes (i.e., Fault Repairing Modification, Feature Introduction Modification, and General Maintenance Modification) and 28 different refactoring types coming from 3 open source projects. Results showed that developers tend to apply a higher number of refactoring operations aimed at improving maintainability and comprehensibility of the source code when fixing bugs. Instead, when new features are implemented, more complex refactoring operations are performed to improve code cohesion. Most of the time, the underlying reasons behind the application of such refactoring operations are represented by the presence of duplicate code or previously introduced self-admitted technical debts.

Developer-Related Factors in Change Prediction: An Empirical Assessment
Gemma Catolino, Fabio Palomba, Andrea De Lucia, Filomena Ferrucci and Andy Zaidman  
Predicting the areas of the source code having a higher likelihood to change in the future is a crucial activity to allow developers to plan preventive maintenance operations such as refactoring or peer code reviews. In the past, the research community was active in devising change prediction models based on structural metrics extracted from the source code. More recently, Elish et al. showed how evolution metrics can be more efficient for predicting change-prone classes. In this paper, we aim at taking a step further by investigating the role of different developer-related factors, which are able to capture the complexity of the development process under different perspectives, in the context of change prediction. We also compared such models with existing change-prediction models based on evolution and code metrics. Our findings reveal the capabilities of developer-based metrics in identifying classes of a software system more likely to be changed in the future. Moreover, we observed interesting complementarities among the experimented prediction models, that may possibly lead to the definition of new combined models exploiting developer-related factors as well as product and evolution metrics.

Analyzing User Comments on YouTube Coding Tutorial Videos
Elizabeth Poché, Nishant Jha, Grant Williams, Jazmine Staten, Miles Visper and Anas Mahmoud  
Video coding tutorials enable expert and novice programmers to visually observe real developers write, debug, and execute code. Previous research in this domain has focused on helping programmers find relevant content in coding tutorial videos as well as understanding the motivation and needs of content creators. In this paper, we focus on the link connecting programmers creating coding videos with their audience. More specifically, we analyze user comments on YouTube coding tutorial videos. Our main objective is to help content creators to effectively understand the needs and concerns of their viewers, thus respond faster to these concerns and deliver higher-quality content. A dataset of 6000 comments sampled from 12 YouTube coding videos is used to conduct our analysis. Important user questions and concerns are then automatically classified and summarized. The results show that Support Vector Machines can detect useful viewers’ comments on coding videos with an average accuracy of 77%. The results also show that SumBasic, an extractive frequency-based summarization technique with redundancy control, can sufficiently capture the main concerns present in viewers’ comments.

A Comparison of Three Algorithms for Computing Truck Factors
Mivian Ferreira, Kecia Ferreira and Marco Tulio Valente  
Truck Factor (also known as Bus Factor or Lottery Number) is the minimal number of developers that have to be hit by a truck (or leave) before a project is incapacitated. Therefore, it is a measure that reveals the concentration of knowledge and the key developers in a project. Due to the importance of this information to project managers, algorithms were proposed to automatically compute Truck Factors, using maintenance activity data extracted from version control systems. However, to the best of our knowledge, we still lack studies that compare the accuracy of the results produced by such algorithms. Therefore, in this paper, we evaluate and compare the results of three Truck Factor algorithms. To this end, we empirically determine the truck factors of 35 open-source systems by consulting their developers. Our results show that two algorithms are very accurate, especially when the systems have a small Truck Factor. We also evaluate the impact of different thresholds and configurations in algorithm results.

Comprehending Studies on Program Comprehension
Ivonne Schröter, Jacob Krüger, Janet Siegmund and Thomas Leich  
Program comprehension is an important aspect of developing and maintaining software, as programmers spend most of their time comprehending source code. Thus, it is the focus of many studies and experiments to evaluate approaches and techniques that aim to improve program comprehension. As the amount of corresponding work increases, the question arises how researchers address program comprehension. To answer this question, we conducted a literature review of papers published at the International Conference on Program Comprehension, the major venue for research on program comprehension. In this article, we i) present preliminary results of the literature review and ii) derive further research directions. The results indicate the necessity for a more detailed analysis of program comprehension and empirical research.

It's Duck (Typing) Season!
Nevena Milojković, Mohammad Ghafari and Oscar Nierstrasz  
Duck typing provides a way to reuse code and allow a developer to write more extensible code. At the same time, it scatters the implementation of a functionality over multiple classes and causes difficulties in program comprehension. The extent to which duck typing is used in real programs is not very well understood. We report on a preliminary study of the prevalence of duck typing in more than a thousand dynamically-typed open source software systems developed in Smalltalk. Although a small portion of the call sites in these systems is duck-typed, in half of the analysed systems at least 20% of methods are duck-typed.
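
For readers unfamiliar with the term, a minimal sketch of a duck-typed call site (shown in Python for brevity; the study itself analyses Smalltalk systems):

```python
class Duck:
    def quack(self):
        return "Quack!"

class Person:
    def quack(self):
        return "I'm quacking!"

def make_it_quack(speaker):
    # Duck-typed call site: any receiver that understands quack() is accepted,
    # so the implementation of the behavior is scattered across classes and
    # the concrete type of speaker cannot be pinned down at this call site.
    return speaker.quack()

print(make_it_quack(Duck()))
print(make_it_quack(Person()))
```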

Replicating Parser Behavior using Neural Machine Translation
Carol V. Alexandru, Sebastiano Panichella and Harald C. Gall  
More than other machine learning techniques, neural networks have been shown to excel at tasks where humans traditionally outperform computers: recognizing objects in images, distinguishing spoken words from background noise or playing “Go”. These are hard problems, where hand-crafting solutions is rarely feasible due to their inherent complexity. Higher level program comprehension is not dissimilar in nature: while a compiler or program analysis tool can extract certain facts from (correctly written) code, it has no intrinsic ‘understanding’ of the data and for the majority of real-world problems, a human developer is needed - for example to find and fix a bug or to summarize the behavior of a method. We perform a pilot study to determine the suitability of neural machine translation (NMT) for processing plain-text source code. We find that, on one hand, NMT is too fragile to accurately tokenize code, while on the other hand, it can precisely recognize different types of tokens and make accurate guesses regarding their relative position in the local syntax tree. Our results suggest that NMT may be exploited for annotating and enriching out-of-context code snippets to support automated tooling for code comprehension problems. We also identify several challenges in applying neural networks to learning from source code and determine key differences between the application of existing neural network models to source code instead of natural language.

Towards Automatic Generation of Short Summaries of Commits
Siyuan Jiang and Collin McMillan  
Committing to a version control system means submitting a software change to the system. Each commit can have a message to describe the submission. Several approaches have been proposed to automatically generate the content of such messages. However, the quality of the automatically generated messages falls far short of what humans write. In studying the differences between auto-generated and human-written messages, we found that 82% of the human-written messages have only one sentence, while the automatically generated messages often have multiple lines. Furthermore, we found that the commit messages often begin with a verb followed by a direct object. This finding inspired us to use a “verb+object” format in this paper to generate short commit summaries. We split the approach into two parts: verb generation and object generation. As our first try, we trained a classifier to classify a diff to a verb. We are seeking feedback from the community before we continue to work on generating direct objects for the commits.

Android Repository Mining for Detecting Publicly Accessible Functions Missing Permission Checks
Hoang H. Nguyen, Lingxiao Jiang and Tho Quan  
Android has become the most popular mobile operating system. Millions of applications, including many malicious ones, have been developed for it. Even though its overall system architecture and many APIs are documented, many other methods and implementation details are not, not to mention potential bugs and vulnerabilities that may be exploited. Manual documentation may also be easily outdated as Android evolves constantly with changing features and higher complexities. Techniques and tool supports are thus needed to automatically extract information from different versions of Android to facilitate whole-system analysis of undocumented code. This paper presents an approach for alleviating the challenges associated with whole-system analysis. It performs the usual program analyses for different versions of Android, namely control-flow and data-flow analyses. More importantly, it integrates information retrieval and query heuristics to customize the graphs for purposes related to the queries and make whole-system analyses more efficient. In particular, we use the approach to identify functions in Android that can be invoked by applications in either a benign or malicious way, which are referred to as publicly accessible functions in this paper, and, with the queries we provided, identify functions that may access sensitive system and/or user data and should be protected by certain permission checks. Based on such information, we can detect some publicly accessible functions in the system that may miss sufficient permission checks. As a proof of concept, this paper analyzes six Android versions, shows basic statistics about the publicly accessible functions in the Android versions, and detects and verifies several system functions that miss permission checks and may have security implications.

Studying the Prevalence of Exception Handling Anti-Patterns
Guilherme Bicalho de Padua and Weiyi Shang  
Modern programming languages, such as Java and C#, typically provide features that handle exceptions. These features separate error-handling code from regular source code and are proven to enhance software reliability, comprehension, and maintenance. Despite the acknowledged advantages of exception handling features, their misuse can still cause catastrophic software failures, such as application crashes. Prior studies suggested anti-patterns of exception handling; however, little is known about the prevalence of these anti-patterns. In this paper, we investigate the prevalence of exception-handling anti-patterns. We collected a thorough list of exception anti-patterns from 16 open-source Java and C# libraries and applications using an automated exception flow analysis tool. We found that although exception handling anti-patterns widely exist in all of our subjects, only a few anti-patterns (e.g. Unhandled Exceptions, Catch Generic, Unreachable Handler, Over-catch, and Destructive Wrapping) can be commonly identified. On the other hand, we find that the prevalence of anti-patterns differs between C# and Java. Our results call for further in-depth analyses of the exception handling practices across different languages.
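
As an illustration of two of the anti-patterns named above (a minimal sketch in Python; the study itself examines Java and C# code, and the function names are hypothetical):

```python
def read_config_generic(path):
    try:
        with open(path) as f:
            return f.read()
    except Exception:  # "Catch Generic": every failure is handled identically
        return None

def read_config_wrapping(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        # "Destructive Wrapping": the original exception and its stack trace
        # are discarded and replaced by an unrelated one.
        raise RuntimeError("could not load configuration") from None
```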

On the Properties of Design-relevant Classes for Design Anomaly Assessment
Liliane Nascimento Vale and Marcelo Maia  
Several object-oriented systems have their respective designs documented by using only a few design-relevant classes, which we will refer to as key classes. In this paper, we automatically detect key classes, investigate some of their properties, and evaluate their role in assessing design. We propose focusing on such classes when making design decisions during maintenance tasks, as classes of this type are, by definition, more relevant than non-key classes. First, we show that key classes are more prone to bad smells than non-key classes. Although structural metrics of key classes tend to be, in general, higher than those of non-key classes, there is still a significant set of non-key classes with poor structural metrics, suggesting that prioritizing design anomaly assessment using key classes is likely to be more effective.
16:00 - 17:30

Technical Research: Bugs

Room: Auditorio 1  |  Session Chair: Lingxiao Jiang

Technical Research: Variability and Comprehensibility

Room: 126  |  Session Chair: Mika Mäntylä
Bug Localization with Combination of Deep Learning and Information Retrieval
An Lam, Anh Nguyen, Hoan Nguyen and Tien Nguyen  
The automated task of locating the potential buggy files in a software project given a bug report is called bug localization. Bug localization helps developers focus on crucial files. However, the existing automated bug localization approaches face a key challenge, called lexical mismatch. Specifically, the terms used in bug reports to describe a bug are different from the terms and code tokens used in source files. To address that, we present a novel approach that uses a deep neural network (DNN) in combination with rVSM, an information retrieval (IR) technique. rVSM captures the textual similarity between bug reports and source files. The DNN is used to learn to relate the terms in bug reports to potentially different code tokens and terms in source files. Our empirical evaluation on real-world bug reports in open-source projects shows that DNN and IR complement each other well to achieve higher bug localization accuracy than individual models. Importantly, our new model, DNNLOC, with a combination of the features built from the DNN, rVSM, and the project’s bug-fixing history, achieves higher accuracy than state-of-the-art IR and machine learning techniques. In half of the cases, it is correct with just a single suggested file. In 66% of the time, a correct buggy file is in the list of three suggested files. With 5 suggested files, it is correct in almost 70% of the cases.

Bug Report Enrichment with Application of Automated Fixer Recommendation
Tao Zhang, Jiachi Chen, He Jiang, Xiapu Luo and Xin Xia  
For large open source projects (e.g., Eclipse, Mozilla), developers usually utilize bug reports to facilitate software maintenance tasks such as fixer assignment. However, there is a large portion of short reports in bug repositories. We find that 78.1% of bug reports in Eclipse include fewer than 100 words and require bug fixers to spend more time on resolving them due to limited informative content. To address this problem, in this paper, we propose a novel approach to enrich bug reports. Concretely, we design a sentence ranking algorithm based on a new textual similarity metric to select the proper contents for bug report enrichment. For the enriched bug reports, we conduct a user study to assess whether the additional sentences can provide further help to fixer assignment. Moreover, we assess whether the enriched versions can improve the performance of automated fixer recommendation. In particular, we perform three popular automated fixer recommendation approaches on the enriched bug reports of Eclipse, Mozilla, and GNU Compiler Collection (GCC). The experimental results show that enriched bug reports improve the average F-measure scores of the automated fixer recommendation approaches by up to 10% for DREX, 13.37% for DRETOM, and 8% for DevRec when top-10 bug fixers are recommended.

Automatically Detecting Integrity Violations In Database-Centric Applications
Boyang Li, Denys Poshyvanyk and Mark Grechanik  
Database-centric applications (DCAs) are widely used by many companies and organizations to perform various control and analytical tasks using large databases. Real-world databases are described by complex schemas that oftentimes contain hundreds of tables consisting of thousands of attributes. However, when software engineers develop DCAs, they may write code that can inadvertently violate the integrity of these databases. Alternatively, business analysts and database administrators can also make errors that lead to integrity violations (semantic bugs). To detect these violations, stakeholders must create assertions that check the validity of the data in the rows of the database tables. Unfortunately, creating assertions is a manual, laborious and error-prone task. Thus, a fundamental problem of testing DCAs is how to find such semantic bugs automatically. We propose a novel solution, namely DACITE, that enables stakeholders to automatically obtain constraints that semantically relate database attributes and code statements using a combination of static analysis of the source code and associative rule mining of the databases. We rely on SAT-solvers to validate if a solution to the combined constraints exists and issue warnings on possible semantic bugs to stakeholders. We evaluated our approach on eight open-source DCAs and our results suggest that semantic bugs can be found automatically with high precision. The results of the study with developers show that warnings produced by DACITE are useful and enable them to find semantic bugs faster.

How Does Execution Information Help with Information-Retrieval Based Bug Localization?
Tung Dao, Lingming Zhang and Na Meng  
Bug localization is challenging and time-consuming. Given a bug report, a developer may spend tremendous time comprehending the bug description together with code in order to locate bugs. To facilitate bug report comprehension, information retrieval (IR)-based bug localization techniques have been proposed to automatically search for and rank potential buggy code elements (i.e., classes or methods). However, these techniques do not leverage any dynamic execution information of buggy programs. In this paper, we perform the first systematic study on how dynamic execution information can help with static IR-based bug localization. More specifically, with the fixing patches and bug reports of 157 real bugs, we investigated the impact of various execution information (i.e. coverage, slicing, and spectrum) on three IR-based techniques: the baseline technique, BugLocator, and BLUiR. Our experiments demonstrate that both the coverage and slicing information of failed tests can effectively reduce the search space and improve IR-based techniques at both class and method levels. Using additional spectrum information can further improve bug localization at the method but not the class level. Some of our investigated ways of augmenting IR-based bug localization with execution information even outperform a state-of-the-art technique, which merges spectrum with an IR-based technique in a complicated way. Different from prior work, by investigating various easy-to-understand ways to combine execution information with IR-based techniques, this study shows for the first time that execution information can generally bring considerable improvement to IR-based bug localization.
Constructing Feature Model by Identifying Variability-aware Module
Yutian Tang and Hareton Leung  
Modeling variability, known as building feature models, should be an essential step in the whole process of product line development, maintenance and testing. Work on feature model recovery serves as a foundation and further contributes to product line development and variability-aware analysis. Although feature model recovery shares much of its process with architecture recovery, variability is not considered in existing architecture recovery techniques. In this paper, we propose a feature model recovery technique, VMS, which performs a variability-aware analysis of the program and further constructs modules for feature model mining. With our work, we bring variability information into the architecture and build the feature model directly from the source code base. Our experimental results suggest that our approach performs competitively and outperforms six other representative approaches for architecture recovery.

An Empirical Study on Code Comprehension: Data Context Interaction compared to classical Object Oriented
Héctor Adrián Valdecantos, Katy Tarrit, Mehdi Mirakhorli and James O. Coplien  
Source code comprehension affects software development — especially its maintenance — where code reading is one of the most time-consuming activities. A programming language, together with the programming paradigm it supports, is a strong factor that profoundly impacts how programmers comprehend code. We conducted a human-subject controlled experiment to evaluate comprehension of code written using the Data Context Interaction (DCI) paradigm relative to code written with commonly used Object-Oriented (OO) programming. We used a new research-level language called Trygve, which implements DCI concepts, and Java, a pervasive OO language in industry. DCI revisits lost roots of the OO paradigm to address problems that are inherent to Java and most other contemporary OO languages. We observed correctness, time consumption, and locality of reference during reading comprehension tasks. We present a method which relies on the Eigenvector Centrality metric from Social Network Analysis to study locality of reference in programmers by inspecting their sequencing of reading language element declarations and their permanence time in the code. Results indicate that DCI code in Trygve is more comprehensible in terms of correctness and improves locality of reference, reducing context switching during the software discovery process. Regarding reading time consumption, we found no statistically significant differences between the two approaches.

The Effect of Delocalized Plans on Spreadsheet Comprehension - A Controlled Experiment
Bas Jansen and Felienne Hermans  
Spreadsheets are widely used in industry. Spreadsheets also suffer from typical software engineering issues. Previous research shows that they contain code smells, lack documentation and tests, and have a long life span during which they are transferred multiple times among users. These transfers highlight the importance of spreadsheet comprehension. Therefore, in this paper, we analyze the effect of the organization of formulas on spreadsheet comprehension. To that end, we conduct a controlled experiment with 107 spreadsheet users, divided into two groups. One group receives a model where the formulas are organized such that all related components are grouped closely together, while the other group receives a model where the components are spread far and wide across the spreadsheet. All subjects perform the same set of comprehension tasks on their spreadsheet. The results indicate that the way formulas are located relative to each other in a spreadsheet influences the performance of the subjects in their ability to comprehend and adapt the spreadsheet. Especially for the comprehension tasks, the subjects perform better on the model where the formulas were grouped closely together. For the adaptation tasks, we found that the length of the calculation chain influences the performance of the subjects more than the location of the formulas itself.

The Discipline of Preprocessor-Based Annotations Does #ifdef TAG n't #endif Matter
Romero Malaquias, Márcio Ribeiro, Rodrigo Bonifácio, Eduardo Monteiro, Flávio Medeiros, Alessandro Garcia and Rohit Gheyi  
The C preprocessor is a simple, effective, and language-independent tool. Developers use the preprocessor in practice to deal with portability and variability issues. Despite the widespread usage, the C preprocessor suffers from severe criticism, such as negative effects on code understandability and maintainability. In particular, these problems may get worse when using undisciplined annotations, i.e., when a preprocessor directive encompasses only parts of C syntactical units. Nevertheless, despite the criticism and guidelines found in systems like Linux to avoid undisciplined annotations, the results of a previous controlled experiment indicated that the discipline of annotations has no influence on program comprehension and maintenance. To better understand whether developers care about the discipline of preprocessor-based annotations and whether they can really influence maintenance tasks, in this paper we conduct mixed-method research involving two studies. In the first one, we identify undisciplined annotations in 110 open-source C/C++ systems of different domains, sizes, and popularity (in terms of GitHub metrics). We then refactor the identified undisciplined annotations to make them disciplined. Right away, we submit pull requests with our code changes. Our results show that almost two thirds of our pull requests have been accepted and are now merged. In the second study, we conduct a controlled experiment that differs from the aforementioned one in several ways, such as the blocking of confounding effects and more replications. We find evidence that maintaining undisciplined annotations is more time-consuming and error-prone, a different result compared to the previous experiment. Overall, we conclude that undisciplined annotations should not be neglected.
17:30 - 18:30

Open Steering Committee Meeting

Room: Auditorio 1