June 15, 2019
October 19, 2019
Computers in Education
Real-time feedback is critical to novice programmers' ability to learn to program. Higher enrollment in introductory computer science courses reduces the time available for individual student-instructor interaction, which in turn reduces both the time for and the amount of instructor feedback. Building on our work involving manual classification and analysis of student source code comments, in this full paper we explore how machine learning techniques can be leveraged to provide real-time automated feedback to students with regard to their computational thinking processes. This paper discusses the initial classification of student source code comments using supervised machine learning methods. In this phase of classification, we focus on whether a comment is sufficient or insufficient. The classification process is broken down into three steps: text processing, data exploration, and comment classification using a Multinomial Naive Bayes classifier and a Random Forest classifier. We detail the text processing requirements, including how to prepare the raw student data using natural language processing techniques such as stop word filtration, tokenization, and lemmatization. We also show how the data preparation process can affect the final classification outcome. Using Multinomial Naive Bayes, we achieved a precision of 82%. Using a Random Forest classifier and lemmatization, we achieved a precision of 90%. We conclude with a description of how the current classification results can be used to provide real-time feedback to students while they are learning to program. Toward our ultimate goal of providing comprehensive real-time feedback to students, we describe future research plans, which include using unsupervised machine learning techniques to move beyond basic binary classification.
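The text processing and classification steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the stop word list, the suffix-stripping `normalize` function (a crude stand-in for true lemmatization), the training comments, and all names below are hypothetical and chosen only for illustration. The Random Forest classifier is omitted for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical stop word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "to", "of", "is", "in", "and", "this", "it", "for"}


def normalize(token):
    """Crude stand-in for lemmatization: strip a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(comment):
    """Tokenize, drop stop words, and normalize the remaining tokens."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [normalize(t) for t in tokens if t not in STOP_WORDS]


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(preprocess(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [t for t in preprocess(doc) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # log prior
            for t in tokens:
                smoothed = (self.word_counts[c][t] + 1) / (total_words + len(self.vocab))
                score += math.log(smoothed)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def precision(y_true, y_pred, positive):
    """Fraction of predictions labeled `positive` that are truly positive."""
    hits = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(1 for t in hits if t == positive) / len(hits) if hits else 0.0


# Invented training comments, labeled by hand for this sketch only.
TRAIN = [
    ("loop over the list and sum the even numbers", "sufficient"),
    ("check whether the input is negative before computing the square root", "sufficient"),
    ("store each score in the list so the average can be computed later", "sufficient"),
    ("loop", "insufficient"),
    ("variable", "insufficient"),
    ("do stuff here", "insufficient"),
    ("code", "insufficient"),
]

model = MultinomialNB().fit([d for d, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("sum the numbers in the list"))
print(model.predict("stuff"))
```

Calling `model.predict` on a new comment returns one of the two labels, which is the kind of binary signal that could drive feedback while a student is typing. With so little training data the result is only illustrative; the precision figures reported in the abstract come from the authors' own data and tooling.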
Beck, P., Mohammadi-Aragh, M. J., & Archibald, C. (2019, June). An Initial Exploration of Machine Learning Techniques to Classify Source Code Comments in Real-time. Paper presented at 2019 ASEE Annual Conference & Exposition, Tampa, Florida. 10.18260/1-2--32065
ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2019 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015