From Online User Feedback to Requirements: Evaluating Large Language Models for Classification and Specification Tasks (Scientific Evaluation Paper)
This program is tentative and subject to change.
[Context and Motivation] Online user feedback provides valuable information to support requirements engineering (RE). However, analysing online user feedback is challenging due to its large volume and noise. Large language models (LLMs) show strong potential to automate this process and outperform previous techniques. They can also enable new tasks, such as generating requirements specifications. [Question/Problem] Despite their potential, the use of LLMs to analyse user feedback for RE remains underexplored. Existing studies offer limited empirical evidence, lack thorough evaluation, and rarely provide replication packages, undermining validity and reproducibility. [Principal Idea/Results] We evaluate five lightweight open-source LLMs on three RE tasks: user request classification, NFR classification, and requirements specification generation. Classification performance was measured on two feedback datasets, and specification quality via human evaluation. LLMs achieved moderate-to-high classification accuracy (F1 ≈ 0.47–0.68) and moderately high specification quality (mean ≈ 3/5). [Contributions] We newly explore lightweight LLMs for feedback-driven requirements development. Our contributions are: (i) an empirical evaluation of lightweight LLMs on three RE tasks, (ii) a replication package, and (iii) insights into their capabilities and limitations for RE.
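The abstract reports classification performance as F1 scores. As a minimal illustrative sketch of how such an evaluation is typically computed (the labels, categories, and predictions below are hypothetical and not taken from the paper's datasets), macro-averaged F1 over a user-request classification task looks like this:

```python
# Illustrative sketch: per-class and macro F1 for a hypothetical
# user-request classification task. Labels and predictions are
# invented examples, not data from the evaluated datasets.

def f1_score(y_true, y_pred, positive):
    """F1 for one class, treating `positive` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical gold labels and model predictions for six feedback items.
gold = ["bug", "feature", "bug", "other", "feature", "bug"]
pred = ["bug", "bug", "bug", "other", "feature", "other"]

labels = sorted(set(gold))
macro_f1 = sum(f1_score(gold, pred, c) for c in labels) / len(labels)
print(round(macro_f1, 2))  # → 0.67
```

Macro-averaging weights each class equally, which is common for imbalanced feedback datasets; a micro-averaged or weighted F1 would be computed analogously.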
Tue 24 Mar. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.
14:00 - 15:30 | LLMs use in RE (Research Track) at CW 8. Chair(s): Oliver Karras (TIB - Leibniz Information Centre for Science and Technology)
14:00 (30m) | Scientific evaluation | Opportunities and Limitations of GenAI in RE: Viewpoints from Practice (Scientific Evaluation Paper, Research Track). Anne Hess (Technical University of Applied Sciences Würzburg-Schweinfurt), Andreas Vogelsang (paluno – The Ruhr Institute for Software Technology, University of Duisburg-Essen), Xavier Franch (Universitat Politècnica de Catalunya), Andrea Herrmann (Herrmann & Ehrlich), Sylwia Kopczyńska (Poznan University of Technology), Alexander Rachmann (Hochschule Niederrhein)
14:30 (30m) | Scientific evaluation | A Comparative Study of Large and Small Language Models for Conceptual Model Extraction (Scientific Evaluation Paper, Research Track)
15:00 (30m) | Scientific evaluation | From Online User Feedback to Requirements: Evaluating Large Language Models for Classification and Specification Tasks (Scientific Evaluation Paper, Research Track). Manjeshwar Mallaya, Alessio Ferrari (CNR-ISTI), Mohammad Amin Zadenoori (ISTI-CNR), Jacek Dąbrowski (Lero - the Research Ireland Centre for Software). Pre-print available.