Luigi Libero Lucio Starace, Ph.D.

Assistant Professor @ Università degli Studi di Napoli Federico II, Italy.

Detecting Near-duplicate States in Web Application Model Inference: a Tree Kernel-based Approach

Author: Luigi Libero Lucio Starace.
Conference: ECOOP/ISSTA 2021 - Doctoral Symposium Track.

Abstract

In the context of End-to-End testing of web applications, automated exploration techniques (a.k.a. crawling) are widely used to infer state-based models of the application under test. These models, in which states represent dynamic web pages and transitions represent reachability relationships, can be used for several analysis and testing tasks, such as test case or test artifact generation. However, current crawling techniques often produce models affected by near-duplicates, i.e., multiple states representing slightly different pages that are in fact instances of the same functionality. This degrades the subsequent model-based testing tasks, affecting, for example, the size, running time, and achieved coverage of the generated test suites.
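To make the effect of near-duplicates on model size concrete, here is a minimal Python sketch of how a crawler might map visited pages to model states. It is an illustration under simplifying assumptions, not a description of any specific crawler: pages are plain strings standing in for serialized DOMs, and `StateModel`, `exact_match`, `ignore_numbers`, and `count_states` are hypothetical names. The near-duplicate predicate is pluggable: with exact matching, two product pages that differ only in an id become two separate states, while a deliberately naive id-insensitive check merges them into one.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Page = str  # hypothetical: a page is represented by its serialized DOM
NearDupFn = Callable[[Page, Page], bool]


@dataclass
class StateModel:
    """Toy inferred model: states hold a representative page; transitions are (src, dst) state ids."""
    is_near_duplicate: NearDupFn
    states: List[Page] = field(default_factory=list)
    transitions: Set[Tuple[int, int]] = field(default_factory=set)

    def state_for(self, page: Page) -> int:
        # Reuse the first existing state the predicate deems equivalent; otherwise create a new one.
        for sid, representative in enumerate(self.states):
            if self.is_near_duplicate(representative, page):
                return sid
        self.states.append(page)
        return len(self.states) - 1

    def add_transition(self, src: Page, dst: Page) -> None:
        self.transitions.add((self.state_for(src), self.state_for(dst)))


def exact_match(a: Page, b: Page) -> bool:
    return a == b


def ignore_numbers(a: Page, b: Page) -> bool:
    # Hypothetical, deliberately naive near-duplicate check: mask digit runs before comparing.
    return re.sub(r"\d+", "#", a) == re.sub(r"\d+", "#", b)


def count_states(pred: NearDupFn, crawl: List[Tuple[Page, Page]]) -> int:
    model = StateModel(pred)
    for src, dst in crawl:
        model.add_transition(src, dst)
    return len(model.states)


# Two links from a list page to product pages that differ only in the product id.
crawl = [("<ul>products</ul>", "<div>product 1</div>"),
         ("<ul>products</ul>", "<div>product 2</div>")]
print(count_states(exact_match, crawl))     # 3
print(count_states(ignore_numbers, crawl))  # 2
```

Keeping the predicate as a parameter isolates the decision the abstract identifies as critical: the size of the inferred model depends directly on how equivalence between pages is decided.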

In my research, I aim to improve the model inference of web applications by devising novel near-duplicate detection techniques. My vision is to leverage Tree Kernel (TK) functions which, thanks to their flexibility, have been extensively investigated and applied in the Natural Language Processing domain to compute the similarity between tree-structured objects. I envision designing TK functions specifically suited to the peculiarities of the Document Object Model (DOM), the tree-structured representation of web pages, in order to detect near-duplicate web pages and thus improve the quality of the inferred models.
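As a rough illustration of what a tree kernel computes, the following Python sketch implements the classic subset-tree kernel of Collins and Duffy from the NLP literature, applied to a toy DOM representation that keeps only tag names (attributes and text are dropped). It is not the DOM-specific kernel envisioned above; the node encoding, the helper names (`el`, `delta`, `tree_kernel`, `normalized_tk`), and the decay factor `lam = 0.5` are all illustrative assumptions. The kernel sums, over every pair of nodes, a weighted count of the tree fragments rooted at both; cosine-style normalization then maps the score into [0, 1].

```python
import math
from functools import lru_cache
from typing import Iterator, Tuple

# Toy DOM node: (tag, children). Attributes and text are ignored in this sketch.
Node = Tuple[str, tuple]


def el(tag: str, *children: Node) -> Node:
    return (tag, tuple(children))


def nodes(t: Node) -> Iterator[Node]:
    # Pre-order traversal over all nodes of the tree.
    yield t
    for c in t[1]:
        yield from nodes(c)


def production(n: Node) -> tuple:
    # A node's "production": its tag plus the ordered tags of its children.
    return (n[0], tuple(c[0] for c in n[1]))


@lru_cache(maxsize=None)
def delta(a: Node, b: Node, lam: float) -> float:
    # Weighted count of common fragments rooted at a and b (subset-tree kernel recursion).
    if production(a) != production(b):
        return 0.0
    result = lam
    for ca, cb in zip(a[1], b[1]):
        result *= 1.0 + delta(ca, cb, lam)
    return result


def tree_kernel(t1: Node, t2: Node, lam: float = 0.5) -> float:
    return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))


def normalized_tk(t1: Node, t2: Node, lam: float = 0.5) -> float:
    # Identical trees score 1.0; trees sharing no fragment score 0.0.
    k12 = tree_kernel(t1, t2, lam)
    return k12 / math.sqrt(tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam))


list_2 = el("body", el("nav", el("a"), el("a")), el("ul", el("li"), el("li")))
list_3 = el("body", el("nav", el("a"), el("a")), el("ul", el("li"), el("li"), el("li")))
form = el("body", el("form", el("input"), el("input"), el("button")))
print(normalized_tk(list_2, list_3))  # about 0.71
print(normalized_tk(list_2, form))    # 0.0
```

With this toy encoding, a list page with two items and the same page with three items share many fragments and score well above zero, while a structurally unrelated form page shares none and scores 0. How to treat DOM peculiarities such as attributes, text content, and repeated sibling structures is where a specifically-suited kernel could differ from this generic one.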