From Online User Feedback to Requirements: Evaluating Large Language Models for Classification and Specification Tasks (Scientific Evaluation Paper)
This program is tentative and subject to change.
[Context and Motivation] Online user feedback provides valuable information to support requirements engineering (RE). However, analysing online user feedback is challenging due to its large volume and noise. Large language models (LLMs) show strong potential to automate this process and outperform previous techniques. They can also enable new tasks, such as generating requirements specifications. [Question/Problem] Despite their potential, the use of LLMs to analyse user feedback for RE remains underexplored. Existing studies offer limited empirical evidence, lack thorough evaluation, and rarely provide replication packages, undermining validity and reproducibility. [Principal Idea/Results] We evaluate five lightweight open-source LLMs on three RE tasks: user request classification, NFR classification, and requirements specification generation. Classification performance was measured on two feedback datasets, and specification quality via human evaluation. LLMs achieved moderate-to-high classification accuracy (F1 ≈ 0.47–0.68) and moderately high specification quality (mean ≈ 3/5). [Contributions] We newly explore lightweight LLMs for feedback-driven requirements development. Our contributions are: (i) an empirical evaluation of lightweight LLMs on three RE tasks, (ii) a replication package, and (iii) insights into their capabilities and limitations for RE.
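The abstract reports classification performance as F1 scores. As a minimal illustrative sketch of how such an evaluation is typically computed (the labels, categories, and predictions below are hypothetical and not taken from the paper's datasets), macro-averaged F1 over a user-request classification task looks like this:

```python
# Illustrative sketch: per-class and macro F1 for a hypothetical
# user-request classification task. Labels and predictions are
# invented examples, not data from the evaluated datasets.

def f1_score(y_true, y_pred, positive):
    """F1 for one class, treating `positive` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical gold labels and model predictions for six feedback items.
gold = ["bug", "feature", "bug", "other", "feature", "bug"]
pred = ["bug", "bug", "bug", "other", "feature", "other"]

labels = sorted(set(gold))
macro_f1 = sum(f1_score(gold, pred, c) for c in labels) / len(labels)
print(round(macro_f1, 2))  # → 0.67
```

Macro-averaging weights each class equally, which is common for imbalanced feedback datasets; a micro-averaged or weighted F1 would be computed analogously.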
Tue 24 Mar. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.
14:00 - 15:30 | LLMs use in RE (Research Track) at CW 8. Chair(s): Oliver Karras (TIB - Leibniz Information Centre for Science and Technology)
14:00 (30m) | Scientific evaluation | Opportunities and Limitations of GenAI in RE: Viewpoints from Practice (Scientific Evaluation Paper, Research Track). Anne Hess (Technical University of Applied Sciences Würzburg-Schweinfurt), Andreas Vogelsang (paluno – The Ruhr Institute for Software Technology, University of Duisburg-Essen), Xavier Franch (Universitat Politècnica de Catalunya), Andrea Herrmann (Herrmann & Ehrlich), Sylwia Kopczyńska (Poznan University of Technology), Alexander Rachmann (Hochschule Niederrhein)
14:30 (30m) | Scientific evaluation | A Comparative Study of Large and Small Language Models for Conceptual Model Extraction (Scientific Evaluation Paper, Research Track)
15:00 (30m) | Scientific evaluation | From Online User Feedback to Requirements: Evaluating Large Language Models for Classification and Specification Tasks (Scientific Evaluation Paper, Research Track). Manjeshwar Mallaya, Alessio Ferrari (CNR-ISTI), Mohammad Amin Zadenoori (ISTI-CNR), Jacek Dąbrowski (Lero - the Research Ireland Centre for Software). Pre-print available.