“A real person will always be better:” Student Perceptions of GPT-produced Feedback on a CS1 Non-Coding Assignment

Researcher(s)

Matthew Nadar, Computer Science, University of Delaware

Faculty Mentor(s)

John Aromando, Computer & Information Sciences, University of Delaware

Abstract

Programmers, especially beginners, often rely on feedback to correct issues in their code. With the advent of Large Language Models (LLMs), it is now theoretically possible to generate human-like feedback automatically. This has significant implications for educational contexts, where timely and clear feedback is crucial for student learning and development. LLMs can potentially streamline the grading process, provide personalized feedback, and reduce the workload on educators, allowing them to focus on more complex teaching tasks. However, the quality and reception of LLM-generated feedback remain under-explored, particularly across different demographic groups. Understanding how various student populations perceive and benefit from such feedback is essential for creating inclusive and effective educational tools. With this in mind, we evaluated 223 CS1 students’ perceptions of LLM-generated feedback across demographic groups. After completing an assignment, students received feedback on their submissions and then completed a survey. The survey assessed several aspects of the feedback experience, including perceived usefulness, ease of understanding, and overall satisfaction compared to traditional human feedback. We conducted significance tests comparing responses across demographic groups and across different problem questions. Additionally, we performed sentiment and thematic analyses on the open-response questions. Our findings revealed some significant differences between men and women, suggesting that men rated the feedback much more harshly than women did. Furthermore, sentiment analysis showed that the feedback itself was positive in tone, and while students’ reception of the feedback was more mixed, it was generally positive overall.
We conclude by presenting these results and offering design recommendations for system designers and educators working on LLM feedback generation, aimed at enhancing the inclusivity and effectiveness of GPT-generated feedback for all students.