Inspecting code churns to prioritize test cases
[BIBTEX]Abstract
Within the context of software evolution, due to time-to-market pressure, it is not uncommon that a company has not enough time and/or resources to re-execute the whole test suite on the new software version, to check for non-regression. To face this issue, many Regression Test Prioritization techniques have been proposed, aimed at ranking test cases in a way that tests more likely to expose faults have higher priority. Some of these techniques exploit code churn metrics, i.e. some quantification of code changes between two subsequent versions of a software artifact, which have been proven to be effective indicators of defect-prone components. In this paper, we first present three new Regression Test Prioritization strategies, based on a novel code churn metric, that we empirically assessed on an open source software system. Results highlighted that the proposal is promising, but that it might be further improved by a more detailed analysis on the nature of the changes introduced between two subsequent code versions. To this aim, in this paper we also sketch a more refined approach we are currently investigating, that quantifies changes in a code base at a finer grained level. Intuitively, we seek to prioritize tests that stress more fault-prone changes (e.g., structural changes in the control flow), w.r.t. those that are less likely to introduce errors (e.g., the renaming of a variable). To do so, we propose the exploitation of the Abstract Syntax Tree (AST) representation of source code, and to quantify differences between ASTs by means of specifically designed Tree Kernel functions, a type of similarity measure for tree-based data structures, which have shown to be very effective in other domains, thanks to their customizability.