Luigi Libero Lucio Starace, Ph.D.

Assistant Professor @ Università degli Studi di Napoli Federico II, Italy.

ReCover: a Curated Dataset for Regression Testing Research

Authors: Francesco Altiero, Anna Corazza, Sergio Di Martino, Adriano Peron, and Luigi Libero Lucio Starace.
Conference: MSR 2022 - Mining Software Repositories Conference.
DOI: 10.1145/3524842.3528490

Abstract

It is recognized in the literature that finding representative data to conduct regression testing research is non-trivial. In our experience within this field, existing datasets are often affected by issues that limit their applicability. Indeed, these datasets often lack fine-grained coverage information, reference software repositories that are no longer available, or do not allow researchers to readily build and run the software projects, e.g., to obtain additional information. As a step towards better replicability and data availability in regression testing research, we introduce ReCover, a dataset of 114 pairs of subsequent versions from 28 open source Java projects from GitHub. In particular, ReCover is intended as a consolidation and enrichment of recent dedicated regression testing datasets proposed in the literature, to overcome some of the issues described above, and to make them ready to use with a broader range of regression testing techniques. To this end, we developed a custom mining tool, which we also make available, to automatically process two recent, massive regression testing datasets, retaining pairs of software versions for which we were able to (1) retrieve the full source code; (2) build the software in a general-purpose Java/Maven environment (which we provide as a Docker container for ease of replication); and (3) compute fine-grained test coverage metrics. ReCover can be readily employed in regression testing studies, as it bundles in a single package full, buildable source code and detailed coverage reports for all the projects. We envision that its use could foster regression testing research, improving replicability and long-term data availability.
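The three retention criteria described in the abstract amount to a simple filter over candidate version pairs. The Java sketch below illustrates that idea only; the `VersionPair` record, its fields, and the `retain` method are hypothetical names chosen for this example and are not taken from the authors' mining tool, which actually performs the retrieval, build, and coverage steps rather than reading precomputed flags.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ReCoverFilter {

    // Hypothetical model of a candidate version pair (not the mining tool's API).
    // Each flag records whether the corresponding pipeline step succeeded.
    record VersionPair(String project, String oldCommit, String newCommit,
                       boolean sourceRetrieved, boolean buildSucceeded,
                       boolean coverageComputed) {

        // A pair is retained only if (1) the full source code was retrieved,
        // (2) the software builds, and (3) coverage metrics were computed.
        boolean isRetained() {
            return sourceRetrieved && buildSucceeded && coverageComputed;
        }
    }

    // Keeps only the candidate pairs that satisfy every retention criterion.
    static List<VersionPair> retain(List<VersionPair> candidates) {
        return candidates.stream()
                .filter(VersionPair::isRetained)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative, made-up candidates; commit identifiers are placeholders.
        List<VersionPair> candidates = List.of(
                new VersionPair("projA", "a1", "a2", true, true, true),
                new VersionPair("projB", "b1", "b2", true, false, false),   // build failed
                new VersionPair("projC", "c1", "c2", false, false, false)); // source unavailable
        List<VersionPair> kept = retain(candidates);
        System.out.println(kept.size() + " of " + candidates.size() + " pairs retained");
    }
}
```

Running `main` prints `1 of 3 pairs retained`, since only the first candidate passes all three checks.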

Data

The ReCover dataset includes 114 version pairs from 28 real-world Java (Maven) projects with extensive JUnit test suites. In each version pair, a regression fault is introduced in the more recent version (i.e., at least one test fails on it). The ReCover dataset, including both the full source code of each version and detailed test coverage reports, is available for download at DOI.
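The defining property of each pair, namely that at least one test fails on the newer version, can be expressed as a small predicate over test outcomes. The sketch below assumes, for illustration only, that the newer version's JUnit results have already been collected into a map from test name to a pass/fail flag; the class, method, and test names are hypothetical and do not reflect the dataset's actual report format.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RegressionCheck {

    // Returns the names of the tests that did not pass on the newer version,
    // sorted so that the output is deterministic. The map is assumed to hold
    // test name -> true if the test passed (a hypothetical representation).
    static List<String> failingTests(Map<String, Boolean> newVersionPassed) {
        return newVersionPassed.entrySet().stream()
                .filter(e -> !e.getValue())
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    // A version pair qualifies if at least one test fails on the newer version.
    static boolean exhibitsRegression(Map<String, Boolean> newVersionPassed) {
        return !failingTests(newVersionPassed).isEmpty();
    }

    public static void main(String[] args) {
        // Made-up test results for illustration.
        Map<String, Boolean> results = Map.of(
                "FooTest#testParse", true,
                "FooTest#testFormat", false,
                "BarTest#testEmpty", true);
        System.out.println(exhibitsRegression(results) + " " + failingTests(results));
    }
}
```

Here `main` prints `true [FooTest#testFormat]`. Returning the list of failing tests, rather than only a boolean, keeps the information a regression test selection or prioritization study would typically need.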

The source code of the Java Mining Tool we developed to build ReCover is available at DOI.

Lastly, we also provide a Dockerfile to set up the environment for building and running the projects included in the ReCover dataset. The Dockerfile is available at DOI.