Technical Research: Android and Security | Room: Auditorio 1 | Session Chair: Jacques Klein
Industry and Tool Demo† | Room: 126 | Session Chairs: Felienne Hermans and David Shepherd
RepDroid: An Automated Tool for Android Application Repackaging Detection
Shengtao Yue, Weizan Feng, Jun Ma, Yanyan Jiang, Xianping Tao, Chang Xu and Jian Lu
In recent years, with the explosive growth of mobile smartphones, the number of Android applications (apps) has increased rapidly. Attackers often exploit the popularity of Android apps by inserting malware, modifying the original apps, and repackaging and releasing them for their own illegal purposes. To prevent repackaged apps from being detected, they typically apply obfuscation and encryption tools. It is therefore important to detect which apps are repackaged. People often intuitively judge whether two apps are a repackaged pair by executing them and observing their runtime user interface (UI) traces. Hence, we propose the layout group graph (LGG), built from UI traces, to model these UI behaviors, and we use the LGG as a birthmark of Android apps for identification. Based on LGG, we also implement a dynamic repackaging detection tool, RepDroid. Since our method does not require the apps’ source code, it is resilient to app obfuscation and encryption. We conducted an experiment with two data sets. The first set contains 98 pairs of repackaged apps; comparing the original apps with their repackaged counterparts, we detected all of the repackaged pairs. The second set contains 125 commercial apps; we compared them pairwise, and the false positive rate was 0.08%.
Comprehension of Ads-supported and Paid Android Applications: Are They Different?
Rubén Saborido Infantes, Foutse Khomh, Giuliano Antoniol and Yann-Gaël Guéhéneuc
The Android market is a place where developers offer paid and/or free apps to users. Free apps can follow the freemium or the ads-business model. While the former offers fewer features and charges the user for unlocking additional ones, the latter includes ads to allow developers to earn revenue. Free apps are interesting to users because they can try them immediately without incurring a monetary cost. However, free apps often have limited features and/or contain ads when compared to their paid counterparts. Thus, users may eventually need to pay to get additional features and/or remove ads. While paid apps have clear market values, their ads-supported versions are not entirely free because ads have an impact on performance. The hidden costs of ads, and the recent possibility of forming family groups in Google Play to share purchased apps, make it difficult for developers and users to balance the visible and hidden costs of paid and ads-supported apps. In this paper, we first perform an exploratory study of ads-supported and paid apps to understand their differences in terms of implementation and development process. We analyze 40 Android apps and observe that (i) ads-supported apps are preferred by users although paid apps have better ratings, (ii) developers do not usually offer a paid app without a corresponding free version, (iii) ads-supported apps usually have more releases and are released more often than their corresponding paid versions, (iv) there is no clear strategy in the way developers set prices of paid apps, (v) paid apps do not usually include more functionality than their corresponding ads-supported versions, (vi) developers do not always remove ad networks in paid versions of their ads-supported apps, and (vii) paid apps require fewer permissions than ads-supported apps.
Second, we carry out an experimental study to compare the performance of ads-supported and paid apps, and we propose four equations to estimate the cost of ads-supported apps. We find that (i) ads-supported apps use more resources than their corresponding paid versions, with statistically significant differences, and (ii) paid apps may be the more cost-effective choice for users because their cost can be amortized over a short period of time, depending on usage.
How Professional Hackers Understand Protected Code while Performing Attack Tasks
Mariano Ceccato, Paolo Tonella, Aldo Basile, Bart Coppens, Bjorn De Sutter, Paolo Falcarin, and Marco Torchiano
Code protections aim at blocking (or at least delaying) reverse engineering and tampering attacks on critical assets within programs. Knowing how hackers understand protected code and perform attacks is important for achieving stronger protection of software assets, based on realistic assumptions about hackers’ behaviour. However, building such knowledge is difficult because hackers can hardly be involved in controlled experiments and empirical studies. The FP7 European project Aspire has given the authors of this paper the unique opportunity to access the professional penetration testers employed by its three industrial partners. In particular, we were able to perform a qualitative analysis of three reports of professional penetration tests performed on protected industrial code. Our qualitative analysis of the reports consists of open coding, carried out by 7 annotators and resulting in 459 annotations, followed by concept extraction and model inference. We identified the main activities: understanding, building the attack, choosing and customizing tools, and working around or defeating protections. We built a model of how these activities take place and used it to identify a set of research directions for the creation of stronger code protections.
NetDroid: Summarizing Network Behavior of Android Apps for Network Code Maintenance
Shaikh Mostafa, Rodney Rodriguez and Xiaoyin Wang
Network access is one of the most common features of Android applications. Statistics show that almost 80% of Android apps ask for network permission and thus may have network-related features. Android apps may access multiple servers to retrieve or post various types of data, and the code handling such network features often needs to change as server APIs evolve or the content of the transferred data changes. Since network code may serve multiple features, maintaining network-related code is often difficult: the code may be scattered across the code base, and it may not be easy to predict the impact of a code change on the network behavior of an Android app. In this paper, we present an approach to statically summarize network behavior from the bytecode of Android apps. Our approach is based on string taint analysis and generates a summary of network requests by statically estimating the possible values of network API arguments. To evaluate our technique, we applied it to the top 500 Android apps from the official Google Play market; the results show that our approach can summarize the network behavior of most apps efficiently (on average, less than 50 seconds per app). Furthermore, we performed an empirical evaluation on 8 real-world maintenance tasks extracted from bug reports of open-source Android projects on GitHub. The evaluation shows that our technique is effective in locating relevant network code.
Removing Code Clones from Industrial Systems Using Compiler Directives
Tomomi Hatano and Akihiko Matsuo
Refactoring of code clones is an effective method for improving software maintainability. Existing studies have proposed automated refactoring techniques and tools. However, it is difficult to apply these to our industrial systems in practice, for three main reasons. First, many of our industrial systems are written in COBOL, which requires a refactoring method different from current techniques because Type-2 clones in COBOL are generated by renaming parts of identifiers. Second, nested clones, in which an instance of one clone set is contained within an instance of another clone set, must be refactored; they also make it difficult to estimate the size reduction achievable by refactoring. Third, refactoring requires testing, which is time-consuming and laborious. To overcome these problems, we developed an approach for refactoring Type-2 clones in COBOL programs. Our approach identifies actually refactorable clone sets and includes a string comparison technique to parameterize partial differences in identifier names. The clone sets are extracted as shared code fragments and transformed into refactored code using compiler directives. It is easy to confirm that refactoring with compiler directives preserves program behavior, because the directives do not change the program structure. We also provide a method that makes it possible to refactor nested clones by ordering their refactoring, which enables us to estimate how many lines can be reduced. We applied the approach to four industrial systems to assess the achievable reduction. The results show that the systems could be reduced by 10 to 15% in lines of code, and one system by 27%. We also discuss the number of parameters required by our refactoring approach.
Language-Independent Information Flow Tracking Engine for Program Comprehension Tools
Mohammad Reza Azadmanesh, Michael Van De Vanter and Matthias Hauswirth
Program comprehension tools are often developed for a specific programming language. Developing such a tool from scratch requires significant effort. In this paper, we report on our experience developing a language-independent framework that enables the creation of program comprehension tools, specifically tools gathering insight from deep dynamic analysis, with little effort. Our framework is language independent because it is built on top of Truffle, an open-source platform developed in Oracle Labs for implementing dynamic languages in the form of AST interpreters. Our framework supports the creation of a diverse range of program comprehension techniques, such as query, program slicing, and back-in-time debugging, because it is centered around a powerful information-flow tracking engine. Tools developed with our framework get access to the information flow through a program execution. While it is possible to develop similarly powerful tools without our framework, for example by tracking information flow through bytecode instrumentation, our approach leads to information that is closer to source code constructs and thus more comprehensible by the user. To demonstrate the effectiveness of our framework, we applied it to two Truffle-based languages, namely Simple Language and TruffleRuby, and we distill our experience into guidelines for developers of other Truffle-based languages who want to build program comprehension tools for their language.
The Code Time Machine†
Emad Aghajani, Andrea Mocci, Gabriele Bavota, Michele Lanza
Exploring and analyzing the history of changes is an intrinsic part of software evolution comprehension. Existing tools that exploit the data residing in version control repositories provide only limited support for the intuitive navigation of code changes from a historical perspective. We present the Code Time Machine, a lightweight IDE plugin which uses visualization techniques to depict the history of any chosen file augmented with information mined from the underlying versioning system. Inspired by Apple’s Time Machine, our tool allows both developers and the system itself to seamlessly move through time.
Docio: Documenting API Input/Output Examples†
Siyuan Jiang, Ameer Armaly, Collin McMillan, Qiyu Zhi, Ronald Metoyer
When learning to use an Application Programming Interface (API), programmers need to understand the inputs and outputs (I/O) of the API functions. Current documentation tools automatically document the static information of I/O, such as parameter types and names. What is missing from these tools is dynamic information, such as I/O examples—actual valid values of inputs that produce certain outputs. In this paper, we demonstrate Docio, a prototype toolset we built to generate I/O examples. Docio logs I/O values when API functions are executed, for example in running test suites. Then, Docio puts I/O values into API documents as I/O examples. Docio has three programs: 1) funcWatch, which collects I/O values when API developers run test suites, 2) ioSelect, which selects one I/O example from a set of I/O values, and 3) ioPresent, which embeds the I/O examples into documents. In a preliminary evaluation, we used Docio to generate four hundred I/O examples for three C libraries: ffmpeg, libssh, and protobuf-c.
MetricAttitude++: Enhancing Polymetric Views with Information Retrieval†
Rita Francese, Michele Risi, Genoveffa Tortora
MetricAttitude is a visualization tool based on static analysis that provides a mental picture of an object-oriented software system by means of polymetric views. In this tool demonstration paper, we integrate an information retrieval engine into MetricAttitude and name the new version MetricAttitude++. The new tool allows the software engineer to formulate free-form textual queries and shows the results on the polymetric views. In particular, MetricAttitude++ highlights, on the visual representation of a subject system, the elements that are most similar to the query. Navigation among the elements of interest can then be driven by the polymetric views of the depicted elements and/or by reformulating the query and applying customizable filters on the software view. Thanks to these characteristics, MetricAttitude++ is applicable to many kinds of software maintenance and evolution tasks (e.g., concept location and program comprehension).
FindSmells: Flexible Composition of Bad Smell Detection Strategies†
Bruno Sousa, Priscila Souza, Eduardo Fernandes, Kecia Ferreira, Mariza Bigonha
Bad smells are symptoms of problems in the source code of software systems. They may harm the maintenance and evolution of systems at different levels. Thus, detecting smells is essential to support software quality improvement. Since even small systems may contain several bad smell instances, and considering that developers have to prioritize their elimination, automated detection is a necessary support for developers. To this end, detection strategies have been proposed that formalize rules for detecting specific bad smells, such as Large Class and Feature Envy. Several tools, such as JDeodorant and JSpIRIT, implement these strategies but, in general, do not allow full customization of the formal rules that define a detection strategy. In this paper, we propose FindSmells, a tool for detecting bad smells in software systems through software metrics and their thresholds. With FindSmells, the user can compose and manage different strategies, which run without source code analysis. We also provide a running example of the tool.
12:30 - 14:00
12:30 - 13:30
Informal Session for tool demos
Room: 126 | Session Chair: Csaba Nagy
Technical Research: Communities and Changes | Room: Auditorio 1 | Session Chair: Dror Feitelson
Early Research Achievement | Room: 126 | Session Chairs: Sonia Haiduc and Martin Pinzger
An Exploratory Study on the Relationship between Changes and Refactoring
Fabio Palomba, Andy Zaidman, Rocco Oliveto and Andrea De Lucia
Refactoring aims at improving the internal structure of a software system without changing its external behavior. Previous studies have empirically assessed, on the one hand, the benefits of refactoring in terms of code quality and developers’ productivity, and on the other hand, the underlying reasons that push programmers to apply refactoring. Results of the latter investigations indicate that, besides personal motivations such as the responsibility that comes with code authorship, refactoring is mainly performed as a consequence of changes in the requirements rather than driven by software quality. However, these findings were derived by surveying developers, and no software repository study has been carried out to corroborate them. To bridge this gap, we provide a quantitative investigation of the relationship between different types of code changes (i.e., Fault Repairing Modification, Feature Introduction Modification, and General Maintenance Modification) and 28 different refactoring types in 3 open source projects. Results show that developers tend to apply a higher number of refactoring operations aimed at improving maintainability and comprehensibility of the source code when fixing bugs. Instead, when new features are implemented, more complex refactoring operations are performed to improve code cohesion. Most of the time, the underlying reasons behind these refactoring operations are the presence of duplicate code or previously introduced self-admitted technical debt.
Developer-Related Factors in Change Prediction: An Empirical Assessment
Gemma Catolino, Fabio Palomba, Andrea De Lucia, Filomena Ferrucci and Andy Zaidman
Predicting the areas of the source code with a higher likelihood of change in the future is a crucial activity that allows developers to plan preventive maintenance operations such as refactoring or peer code reviews. In the past, the research community was active in devising change prediction models based on structural metrics extracted from the source code. More recently, Elish et al. showed that evolution metrics can be more effective for predicting change-prone classes. In this paper, we take a further step by investigating the role of different developer-related factors, which capture the complexity of the development process from different perspectives, in the context of change prediction. We also compare such models with existing change prediction models based on evolution and code metrics. Our findings reveal the ability of developer-based metrics to identify the classes of a software system most likely to change in the future. Moreover, we observed interesting complementarities among the experimented prediction models, which may lead to the definition of new combined models exploiting developer-related factors as well as product and evolution metrics.
Analyzing User Comments on YouTube Coding Tutorial Videos
Elizabeth Poché, Nishant Jha, Grant Williams, Jazmine Staten, Miles Visper and Anas Mahmoud
Video coding tutorials enable expert and novice programmers to visually observe real developers write, debug, and execute code. Previous research in this domain has focused on helping programmers find relevant content in coding tutorial videos as well as understanding the motivation and needs of content creators. In this paper, we focus on the link connecting programmers creating coding videos with their audience. More specifically, we analyze user comments on YouTube coding tutorial videos. Our main objective is to help content creators effectively understand the needs and concerns of their viewers, and thus respond to these concerns faster and deliver higher-quality content. A dataset of 6,000 comments sampled from 12 YouTube coding videos is used to conduct our analysis. Important user questions and concerns are then automatically classified and summarized. The results show that Support Vector Machines can detect useful viewer comments on coding videos with an average accuracy of 77%. The results also show that SumBasic, an extractive frequency-based summarization technique with redundancy control, can adequately capture the main concerns present in viewers’ comments.
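The SumBasic technique mentioned in the abstract can be sketched in a few lines: it repeatedly picks the sentence whose words are most probable in the corpus, then squares (down-weights) the probabilities of the chosen words so that later picks cover different concerns. The sketch below, with made-up comment data, is only an illustration of the general algorithm, not the authors' implementation.

```python
from collections import Counter

def sumbasic(sentences, k=2):
    """Minimal SumBasic sketch: greedily select the k sentences with
    the highest average word probability, down-weighting used words
    after each pick to control redundancy."""
    words = [s.lower().split() for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    total = sum(freq.values())
    prob = {w: c / total for w, c in freq.items()}
    chosen = []
    candidates = list(range(len(sentences)))
    while candidates and len(chosen) < k:
        # score = average probability of the sentence's words
        best = max(candidates,
                   key=lambda i: sum(prob[w] for w in words[i]) / len(words[i]))
        chosen.append(sentences[best])
        candidates.remove(best)
        for w in words[best]:
            prob[w] **= 2  # redundancy control: penalize reused words
    return chosen
```

On a handful of hypothetical viewer comments, the sentence sharing the most frequent vocabulary is selected first.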
A Comparison of Three Algorithms for Computing Truck Factors
Mivian Ferreira, Kecia Ferreira and Marco Tulio Valente
Truck Factor (also known as Bus Factor or Lottery Number) is the minimal number of developers that have to be hit by a truck (or leave) before a project is incapacitated. It is therefore a measure that reveals the concentration of knowledge and the key developers in a project. Given the importance of this information to project managers, algorithms have been proposed to automatically compute Truck Factors using maintenance activity data extracted from version control systems. However, to the best of our knowledge, no study has compared the accuracy of the results produced by such algorithms. Therefore, in this paper, we evaluate and compare the results of three Truck Factor algorithms. To this end, we empirically determine the truck factors of 35 open-source systems by consulting their developers. Our results show that two algorithms are very accurate, especially when the systems have a small Truck Factor. We also evaluate the impact of different thresholds and configurations on the algorithms’ results.
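To illustrate what such algorithms compute, the following sketch implements a simple greedy heuristic: repeatedly remove the developer who is a main author of the most files until more than half of the files are left without any main author. This is only an idealized approximation of the published algorithms (which differ in how they compute authorship), and the file-to-author map is hypothetical.

```python
def truck_factor(authorship, coverage_threshold=0.5):
    """Greedy Truck Factor estimate.

    authorship: dict mapping each file to the set of its main authors.
    Returns the number of developers whose departure would leave more
    than `coverage_threshold` of the files without any main author.
    """
    files = set(authorship)
    active = set().union(*authorship.values())

    def orphaned():
        # files with no remaining (non-departed) main author
        return sum(1 for f in files if not (authorship[f] & active))

    removed = 0
    while orphaned() <= coverage_threshold * len(files):
        # remove the author who is a main author of the most files
        top = max(active,
                  key=lambda a: sum(1 for f in files if a in authorship[f]))
        active.remove(top)
        removed += 1
        if not active:
            break
    return removed
```

For instance, in a project where one developer is the main author of half the files, losing that developer plus one more already orphans most of the code.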
Comprehending Studies on Program Comprehension
Ivonne Schröter, Jacob Krüger, Janet Siegmund and Thomas Leich
Program comprehension is an important aspect of developing and maintaining software, as programmers spend most of their time comprehending source code. Thus, it is the focus of many studies and experiments that evaluate approaches and techniques aiming to improve program comprehension. As the amount of such work increases, the question arises of how researchers address program comprehension. To answer this question, we conducted a literature review of papers published at the International Conference on Program Comprehension, the major venue for research on program comprehension. In this article, we i) present preliminary results of the literature review and ii) derive further research directions. The results indicate the need for a more detailed analysis of program comprehension and for more empirical research.
It's Duck (Typing) Season!
Nevena Milojkovic, Mohammad Ghafari and Oscar Nierstrasz
Duck typing provides a way to reuse code and allows developers to write more extensible code. At the same time, it scatters the implementation of a functionality across multiple classes and hampers program comprehension. The extent to which duck typing is used in real programs is not well understood. We report on a preliminary study of the prevalence of duck typing in more than a thousand dynamically-typed open-source software systems written in Smalltalk. Although only a small portion of the call sites in these systems is duck-typed, in half of the analysed systems at least 20% of the methods are duck-typed.
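The study targets Smalltalk, but the phenomenon is the same in any dynamically-typed language. The hypothetical Python snippet below shows a duck-typed call site: two unrelated classes both understand `write`, so either can be passed to `flush` without sharing a superclass or interface.

```python
class Logger:
    def write(self, text):
        return f"log: {text}"

class NetworkSocket:
    def write(self, text):
        return f"sent: {text}"

def flush(sink, message):
    # Duck-typed call site: `sink` may be any object that
    # understands `write`; no common superclass is required.
    return sink.write(message)
```

This is exactly what makes such code extensible (any future class with a `write` method works) and also what makes it hard to comprehend statically: the set of possible receivers at the call site is not declared anywhere.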
Replicating Parser Behavior using Neural Machine Translation
Carol V. Alexandru, Sebastiano Panichella and Harald C. Gall
More than other machine learning techniques, neural networks have been shown to excel at tasks where humans traditionally outperform computers: recognizing objects in images, distinguishing spoken words from background noise, or playing “Go”. These are hard problems, where hand-crafting solutions is rarely feasible due to their inherent complexity. Higher-level program comprehension is not dissimilar in nature: while a compiler or program analysis tool can extract certain facts from (correctly written) code, it has no intrinsic ‘understanding’ of the data, and for the majority of real-world problems a human developer is needed, for example to find and fix a bug or to summarize the behavior of a method. We perform a pilot study to determine the suitability of neural machine translation (NMT) for processing plain-text source code. We find that, on the one hand, NMT is too fragile to accurately tokenize code, while on the other hand, it can precisely recognize different types of tokens and make accurate guesses about their relative position in the local syntax tree. Our results suggest that NMT may be exploited for annotating and enriching out-of-context code snippets to support automated tooling for code comprehension problems. We also identify several challenges in applying neural networks to learning from source code and determine key differences between applying existing neural network models to source code rather than natural language.
Towards Automatic Generation of Short Summaries of Commits
Siyuan Jiang and Collin McMillan
Committing to a version control system means submitting a software change to the system, and each commit can carry a message describing the submission. Several approaches have been proposed to automatically generate the content of such messages. However, the quality of the automatically generated messages falls far short of what humans write. In studying the differences between auto-generated and human-written messages, we found that 82% of the human-written messages have only one sentence, while the automatically generated messages often span multiple lines. Furthermore, we found that commit messages often begin with a verb followed by a direct object. This finding inspired us to use a “verb+object” format in this paper to generate short commit summaries. We split the approach into two parts: verb generation and object generation. As a first step, we trained a classifier that maps a diff to a verb. We are seeking feedback from the community before we continue to work on generating direct objects for the commits.
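The verb-generation step can be pictured with a toy stand-in for the trained classifier: guess the leading verb of a commit summary from simple features of the diff text. The rules below are invented for illustration; the paper trains a classifier rather than hand-coding heuristics.

```python
def diff_to_verb(diff):
    """Toy heuristic mapping a unified diff to the leading verb of a
    "verb+object" commit summary (illustrative only)."""
    lines = diff.splitlines()
    added = [l for l in lines
             if l.startswith("+") and not l.startswith("+++")]
    removed = [l for l in lines
               if l.startswith("-") and not l.startswith("---")]
    if added and not removed:
        return "add"
    if removed and not added:
        return "remove"
    if any("fix" in l.lower() or "bug" in l.lower() for l in added):
        return "fix"
    return "update"
```

A diff that only introduces lines yields "add"; one that mixes additions mentioning a bug with deletions yields "fix".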
Android Repository Mining for Detecting Publicly Accessible Functions Missing Permission Checks
Hoang H. Nguyen, Lingxiao Jiang and Tho Quan
Android has become the most popular mobile operating system. Millions of applications, including much malware, have been developed for it. Even though its overall system architecture and many APIs are documented, many other methods and implementation details are not, not to mention potential bugs and vulnerabilities that may be exploited. Manual documentation may also quickly become outdated as Android evolves constantly, with changing features and growing complexity. Techniques and tool support are thus needed to automatically extract information from different versions of Android to facilitate whole-system analysis of undocumented code. This paper presents an approach for alleviating the challenges associated with whole-system analysis. It performs standard control-flow and data-flow analyses for different versions of Android. More importantly, it integrates information retrieval and query heuristics to customize the resulting graphs for query-specific purposes and make whole-system analyses more efficient. In particular, we use the approach to identify functions in Android that can be invoked by applications in either benign or malicious ways, referred to as publicly accessible functions in this paper, and, with the queries we provide, to identify functions that may access sensitive system and/or user data and should be protected by permission checks. Based on such information, we can detect publicly accessible functions in the system that may lack sufficient permission checks. As a proof of concept, we analyzed six Android versions, report basic statistics about their publicly accessible functions, and detect and verify several system functions that miss permission checks and may have security implications.
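The core check described above, finding publicly accessible functions that can reach a sensitive API without passing a permission check, amounts to reachability over a call graph. The sketch below uses a hypothetical toy call graph and function names; the paper's analysis operates on real Android control-flow and data-flow graphs.

```python
from collections import deque

def unprotected_functions(call_graph, entry_points, sensitive, checks):
    """Flag publicly accessible functions (entry points) that can reach
    a sensitive API along some call path that never passes through a
    permission-check function."""
    flagged = set()
    for entry in entry_points:
        # BFS over the call graph, refusing to traverse through
        # permission-check functions, so any sensitive function we
        # reach is reachable on a check-free path.
        seen = {entry}
        queue = deque([entry])
        while queue:
            fn = queue.popleft()
            if fn in sensitive:
                flagged.add(entry)
                break
            for callee in call_graph.get(fn, ()):
                if callee not in seen and callee not in checks:
                    seen.add(callee)
                    queue.append(callee)
    return flagged
```

An entry point whose only path to the sensitive API goes through a check is considered protected; one with a check-free path is flagged.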
Studying the Prevalence of Exception Handling Anti-Patterns
Guilherme Bicalho de Padua and Weiyi Shang
Modern programming languages, such as Java and C#, typically provide features for handling exceptions. These features separate error-handling code from regular source code and have been shown to enhance software reliability, comprehension, and maintenance. Despite the acknowledged advantages of exception handling features, their misuse can still cause catastrophic software failures, such as application crashes. Prior studies have suggested anti-patterns of exception handling, yet little is known about the prevalence of these anti-patterns. In this paper, we investigate the prevalence of exception-handling anti-patterns. We collected a thorough list of exception anti-patterns from 16 open-source Java and C# libraries and applications using an automated exception flow analysis tool. We found that although exception handling anti-patterns exist widely in all of our subjects, only a few anti-patterns (e.g., Unhandled Exceptions, Catch Generic, Unreachable Handler, Over-catch, and Destructive Wrapping) are commonly identified. On the other hand, we find that the prevalence of anti-patterns differs between C# and Java. Our results call for further in-depth analyses of exception handling practices across different languages.
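Two of the anti-patterns named above, Catch Generic and Destructive Wrapping, are easy to picture. The study targets Java and C#, but the same shapes occur in any language with exceptions; the hypothetical sketch below shows them in Python alongside a cleaner alternative.

```python
def parse_config_bad(path):
    try:
        with open(path) as f:
            return f.read()
    except Exception:  # anti-pattern: Catch Generic (catches everything,
                       # including bugs unrelated to I/O)
        # anti-pattern: Destructive Wrapping; `from None` discards the
        # original exception, hiding what actually went wrong
        raise RuntimeError("config error") from None

def parse_config_better(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:  # catch only the specific, expected failure
        # preserve the original exception as the cause
        raise RuntimeError("config error") from e
```

With the first version, a caller inspecting the `RuntimeError` cannot tell whether the file was missing, unreadable, or something else entirely; the second keeps the cause attached.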
On the Properties of Design-relevant Classes for Design Anomaly Assessment
Liliane Nascimento Vale and Marcelo Maia
Several object-oriented systems have their designs documented using only a few design-relevant classes, which we refer to as key classes. In this paper, we automatically detect key classes, investigate some of their properties, and evaluate their role in assessing design. We propose focusing on such classes when making design decisions during maintenance tasks, as classes of this type are, by definition, more relevant than non-key classes. First, we show that key classes are more prone to bad smells than non-key classes. Although the structural metrics of key classes tend to be, in general, higher than those of non-key classes, there is still a significant set of non-key classes with poor structural metrics, suggesting that prioritizing design anomaly assessment using key classes would likely be more effective.
Technical Research: Bugs | Room: Auditorio 1 | Session Chair: Lingxiao Jiang
Technical Research: Variability and Comprehensibility | Room: 126 | Session Chair: Mika Mäntylä
Bug Localization with Combination of Deep Learning and Information Retrieval
An Lam, Anh Nguyen, Hoan Nguyen and Tien Nguyen
The automated task of locating the potentially buggy files in a software project, given a bug report, is called bug localization. Bug localization helps developers focus on crucial files. However, existing automated bug localization approaches face a key challenge, called lexical mismatch: the terms used in bug reports to describe a bug differ from the terms and code tokens used in source files. To address this, we present a novel approach that combines a deep neural network (DNN) with rVSM, an information retrieval (IR) technique. rVSM computes a feature capturing the textual similarity between bug reports and source files. The DNN learns to relate the terms in bug reports to potentially different code tokens and terms in source files. Our empirical evaluation on real-world bug reports from open-source projects shows that DNN and IR complement each other well, achieving higher bug localization accuracy than either model individually. Importantly, our new model, DNNLOC, which combines features built from the DNN, rVSM, and the project’s bug-fixing history, achieves higher accuracy than state-of-the-art IR and machine learning techniques. In half of the cases, it is correct with just a single suggested file; 66% of the time, a correct buggy file is in the list of three suggested files; and with 5 suggested files, it is correct in almost 70% of the cases.
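As background for the rVSM component, a plain vector space model ranks source files by TF-IDF cosine similarity to the bug report (rVSM additionally corrects for document length, which is omitted here). A minimal stdlib sketch with hypothetical file contents:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: {name: list of terms} -> {name: {term: tf-idf weight}}"""
    n = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))
    vecs = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        vecs[name] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vecs

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_files(report_terms, source_files):
    # Rank source files by textual similarity to the bug report.
    vecs = tfidf_vectors({**source_files, "__report__": report_terms})
    query = vecs.pop("__report__")
    return sorted(source_files,
                  key=lambda f: cosine(query, vecs[f]), reverse=True)
```

A file sharing distinctive terms with the report ("parse", "error") is ranked above one with no overlap; the lexical-mismatch problem arises precisely when the buggy file shares no such terms, which is where the DNN component comes in.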
Bug Report Enrichment with Application of Automated Fixer Recommendation
Tao Zhang, Jiachi Chen, He Jiang, Xiapu Luo and Xin Xia
For large open source projects (e.g., Eclipse, Mozilla), developers usually rely on bug reports to facilitate software maintenance tasks such as fixer assignment. However, a large portion of the reports in bug repositories are short. We find that 78.1% of the bug reports in Eclipse contain fewer than 100 words, and their limited informative content forces bug fixers to spend more time resolving them. To address this problem, in this paper we propose a novel approach to enrich bug reports. Concretely, we design a sentence ranking algorithm based on a new textual similarity metric to select suitable content for bug report enrichment. For the enriched bug reports, we conduct a user study to assess whether the additional sentences provide further help for fixer assignment. Moreover, we assess whether the enriched versions can improve the performance of automated fixer recommendation. In particular, we run three popular automated fixer recommendation approaches on the enriched bug reports of Eclipse, Mozilla, and GNU Compiler Collection (GCC). The experimental results show that enriched bug reports improve the average F-measure scores of the automated fixer recommendation approaches by up to 10% for DREX, 13.37% for DRETOM, and 8% for DevRec when the top-10 bug fixers are recommended.
Automatically Detecting Integrity Violations In Database-Centric Applications
Boyang Li, Denys Poshyvanyk and Mark Grechanik
Database-centric applications (DCAs) are widely used by many companies and organizations to perform various control and analytical tasks using large databases. Real-world databases are described by complex schemas that often contain hundreds of tables consisting of thousands of attributes. However, when software engineers develop DCAs, they may write code that inadvertently violates the integrity of these databases. Alternatively, business analysts and database administrators can also make errors that lead to integrity violations (semantic bugs). To detect these violations, stakeholders must create assertions that check the validity of the data in the rows of the database tables. Unfortunately, creating assertions is a manual, laborious, and error-prone task. Thus, a fundamental problem of testing DCAs is how to find such semantic bugs automatically. We propose a novel solution, DACITE, that enables stakeholders to automatically obtain constraints that semantically relate database attributes and code statements, using a combination of static analysis of the source code and association rule mining of the databases. We rely on SAT solvers to check whether a solution to the combined constraints exists and issue warnings about possible semantic bugs to stakeholders. We evaluated our approach on eight open-source DCAs, and our results suggest that semantic bugs can be found automatically with high precision. The results of a study with developers show that the warnings produced by DACITE are useful and enable them to find semantic bugs faster.
How Does Execution Information Help with Information-Retrieval Based Bug Localization?
Tung Dao, Lingming Zhang and Na Meng
Bug localization is challenging and time-consuming. Given a bug report, a developer may spend tremendous time comprehending the bug description together with the code in order to locate bugs. To facilitate bug report comprehension, information retrieval (IR)-based bug localization techniques have been proposed to automatically search for and rank potentially buggy code elements (i.e., classes or methods). However, these techniques do not leverage any dynamic execution information of buggy programs. In this paper, we perform the first systematic study on how dynamic execution information can help static IR-based bug localization. More specifically, with the fixing patches and bug reports of 157 real bugs, we investigated the impact of various kinds of execution information (i.e., coverage, slicing, and spectrum information) on three IR-based techniques: a baseline technique, BugLocator, and BLUiR. Our experiments demonstrate that both the coverage and slicing information of failed tests can effectively reduce the search space and improve IR-based techniques at both the class and method levels. Using additional spectrum information can further improve bug localization at the method level but not the class level. Some of our investigated ways of augmenting IR-based bug localization with execution information even outperform a state-of-the-art technique that merges spectrum information with an IR-based technique in a complicated way. Unlike prior work, by investigating various easy-to-understand ways to combine execution information with IR-based techniques, this study shows for the first time that execution information can generally bring considerable improvement to IR-based bug localization.
Constructing Feature Model by Identifying Variability-aware Module
Yutian Tang and Hareton Leung
Modeling variability, i.e., building feature models, is an essential step in product line development, maintenance, and testing. Work on feature model recovery serves as a foundation for, and further contributes to, product line development and variability-aware analysis. Although feature model recovery shares much of its process with architecture recovery, existing architecture recovery techniques do not take variability into account. In this paper, we propose VMS, a feature model recovery technique that performs a variability-aware analysis of the program and constructs modules for feature model mining. With our work, we bring variability information into the architecture and build the feature model directly from the code base. Our experimental results suggest that our approach performs competitively and outperforms six other representative approaches for architecture recovery.
An Empirical Study on Code Comprehension: Data Context Interaction compared to classical Object Oriented
Héctor Adrián Valdecantos, Katy Tarrit, Mehdi Mirakhorli and James O. Coplien
Source code comprehension affects software development, especially maintenance, where code reading is one of the most time-consuming activities. A programming language, together with the programming paradigm it supports, is a strong factor that profoundly impacts how programmers comprehend code. We conducted a controlled human-subject experiment to evaluate comprehension of code written in the Data Context Interaction (DCI) paradigm relative to code written with commonly used Object-Oriented (OO) programming. We used Trygve, a new research-level language that implements DCI concepts, and Java, a pervasive OO language in industry. DCI revisits lost roots of the OO paradigm to address problems inherent to Java and most other contemporary OO languages. We observed correctness, time consumption, and locality of reference during reading comprehension tasks. We present a method that relies on the Eigenvector Centrality metric from Social Network Analysis to study programmers' locality of reference by inspecting the sequence in which they read language element declarations and their permanence time in the code. Results indicate that DCI code in Trygve is more comprehensible in terms of correctness and improves locality of reference, reducing context switching during the software discovery process. Regarding reading time consumption, we found no statistically significant difference between the two approaches.
The Effect of Delocalized Plans on Spreadsheet Comprehension - A Controlled Experiment
Bas Jansen and Felienne Hermans
Spreadsheets are widely used in industry, and they suffer from typical software engineering issues. Previous research shows that they contain code smells, lack documentation and tests, and have a long life span during which they are transferred multiple times among users. These transfers highlight the importance of spreadsheet comprehension. Therefore, in this paper, we analyze the effect of the organization of formulas on spreadsheet comprehension. To that end, we conduct a controlled experiment with 107 spreadsheet users, divided into two groups. One group receives a model in which the formulas are organized such that all related components are grouped closely together, while the other group receives a model in which the components are spread far and wide across the spreadsheet. All subjects perform the same set of comprehension tasks on their spreadsheet. The results indicate that the way formulas are located relative to each other in a spreadsheet influences the subjects' ability to comprehend and adapt it. Especially for the comprehension tasks, the subjects perform better on the model where the formulas are grouped closely together. For the adaptation tasks, we found that the length of the calculation chain influences the subjects' performance more than the location of the formulas itself.
The Discipline of Preprocessor-Based Annotations Does #ifdef TAG n't #endif Matter
Romero Malaquias, Márcio Ribeiro, Rodrigo Bonifácio, Eduardo Monteiro, Flávio Medeiros, Alessandro Garcia and Rohit Gheyi
The C preprocessor is a simple, effective, and language-independent tool. Developers use the preprocessor in practice to deal with portability and variability issues. Despite its widespread usage, the C preprocessor suffers from severe criticism, such as negative effects on code understandability and maintainability. In particular, these problems may get worse when using undisciplined annotations, i.e., when a preprocessor directive encompasses only parts of C syntactical units. Nevertheless, despite this criticism and the guidelines found in systems like Linux to avoid undisciplined annotations, the results of a previous controlled experiment indicated that the discipline of annotations has no influence on program comprehension and maintenance. To better understand whether developers care about the discipline of preprocessor-based annotations and whether it really influences maintenance tasks, in this paper we conduct mixed-method research involving two studies. In the first, we identify undisciplined annotations in 110 open-source C/C++ systems of different domains, sizes, and GitHub popularity metrics. We then refactor the identified undisciplined annotations to make them disciplined and submit pull requests with our code changes. Our results show that almost two thirds of our pull requests have been accepted and are now merged. In the second study, we conduct a controlled experiment that differs from the aforementioned one in several respects, such as the blocking of confounding effects and a larger number of replications. We found evidence that maintaining undisciplined annotations is more time consuming and error prone, a different result compared to the previous experiment. Overall, we conclude that undisciplined annotations should not be neglected.