When AI codes, who tests? A teaching case on risk, quality, and governance in LLM-assisted software development

Authors	Luigi Libero Lucio Starace.
Journal	Journal of Systems and Software.
DOI	10.1016/j.jss.2026.112989

Abstract

Large Language Models (LLMs) are increasingly integrated into software development workflows, promising dramatic productivity gains but introducing distinctive risks. This teaching case explores different bug patterns in LLM-generated code through a realistic organizational scenario. Set in a mid-sized software company piloting AI-assisted coding tools, the case follows a junior engineer tasked with investigating anomalies in LLM-generated code. Students analyze realistic examples of bugs, ranging from misinterpretations and missing corner cases to hallucinated APIs, and classify them using an empirically grounded framework. In addition to classification, students reflect on how these defects differ from traditional human-introduced bugs and evaluate strategies for detecting them through testing, code review, and governance mechanisms. The narrative culminates in a strategic dilemma: should the company scale up LLM adoption, restrict it to low-risk modules, or suspend the initiative? Designed for software engineering courses, the case fosters critical thinking on quality assurance, risk management, and governance in AI-assisted development, bridging technical analysis with managerial decision-making.

Additional material

Teaching notes and associated materials are available online on the publisher’s platform.