Tampa, Florida
June 15, 2019
June 15, 2019
June 19, 2019
Computers in Education
13
10.18260/1-2--32906
https://peer.asee.org/32906
441
James Perretta is currently pursuing a master's degree in Computer Science at the University of Michigan, where he also develops automated grading systems. His research interests and prior work focus on using automated grading systems and feedback policies to enhance student learning.
Andrew DeOrio is a teaching faculty member at the University of Michigan and a consultant for web and machine learning projects. His research interests are in ensuring the correctness of computer systems, including medical and IOT devices and digital hardware, as well as engineering education. In addition to teaching software and hardware courses, he teaches Creative Process and works with students on technology-driven creative projects. His teaching has been recognized with the Provost's Teaching Innovation Prize, and he has twice been named Professor of the Year by the students in his department.
Human vs. Automated Coding Style Grading in Computing Education
Computer programming courses often evaluate student coding style manually. Static analysis tools provide an opportunity to automate this process. In this paper, we explore the tradeoffs of human style graders and general-purpose static analysis tools to evaluate student code. We investigate the following research questions: - Are human coding style evaluation scores consistent with static analysis tools? - Which style grading criteria are best evaluated with existing static analysis tools and which are more effectively evaluated by human graders?
We analyze data from a second-semester programming course at a large research institution with 943 students enrolled. Hired student graders evaluated student code with rubric criteria such as “Lines are not too long” or “Code is not too deeply nested.” We also ran several static analysis tools on the same student code to evaluate the same criteria. We then analyzed the correlation between the number of static analysis warnings and human style grading score for each criterion.
In our preliminary results, we see that static analysis tools tend to be more effective at evaluating objective code style criteria. We found a weak negative or no correlation between the human style grading score and number of static analysis warnings. Note that we expect student code with more static analysis warnings to receive fewer human style grading points. When comparing the “Lines are not too long” human style grading criterion to a related line-length static analysis inspection, we see a Pearson correlation score of r=-0.21. We also see trends in the distributions of human style grading scores that suggest human graders perform inconsistently. For example, 50% of students who received full human style grading points for the line-length criterion had 3 or more static analysis warnings from a related line-length inspection. Additionally, 23% of students who received no points on the same criterion had no static analysis warnings for the line-length inspection.
We also found that some code style criteria are not well suited to the general-purpose static analysis tools we investigated. For example, none of the static analysis tools we investigated provide a robust way of evaluating the quality of variable and function names in a program. Some tools provide an inspection for detecting variable names that are shorter than a user-specified length threshold; however, this inspection fails to identify low-quality variable names that happen to be longer than the minimum allowed length. Furthermore, there are some common scenarios where a short variable name is acceptable by convention.
Static analysis tools have the benefit of integration with an automated grading system, facilitating faster and more frequent feedback compared to human grading. The literature suggests that frequent feedback encourages students to actively improve on their work (Spacco et al. 2006). There is also evidence to suggest that increased engagement is most beneficial to students with less experience (Carini et al. 2006). Our results suggest that automated code quality evaluation could be one tool that benefits student learning in intro CS courses, helping most those students with least access to CS training pre-college.
References - Carini, R.M., Kuh, G.D. & Klein, S.P. Res High Educ (2006) 47: 1. - Spacco, Jaime and Pugh, William. Helping students appreciate test-driven development (TDD). Proceedings of OOPSLA, pages 907–913, 2006.
Perretta, J., & Weimer, W., & Deorio, A. (2019, June), Human vs. Automated Coding Style Grading in Computing Education Paper presented at 2019 ASEE Annual Conference & Exposition , Tampa, Florida. 10.18260/1-2--32906
ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2019 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015