
An Initial Exploration of Machine Learning Techniques to Classify Source Code Comments in Real-time



2019 ASEE Annual Conference & Exposition


Tampa, Florida

Publication Date

June 15, 2019

Start Date

June 15, 2019

End Date

June 19, 2019

Conference Session

Technical Session 3: The Best of Computers in Education

Tagged Division

Computers in Education


Paper Authors


Phyllis Beck, Mississippi State University


Mahnas Jean Mohammadi-Aragh, Mississippi State University


Dr. Jean Mohammadi-Aragh is an assistant professor in the Department of Electrical and Computer Engineering at Mississippi State University. Dr. Mohammadi-Aragh investigates the use of digital systems to measure and support engineering education, specifically through learning analytics and the pedagogical uses of digital systems. She also investigates fundamental questions critical to improving undergraduate engineering degree pathways. She earned her Ph.D. in Engineering Education from Virginia Tech. In 2013, Dr. Mohammadi-Aragh was honored as a promising new engineering education researcher when she was selected as an ASEE Educational Research and Methods Division Apprentice Faculty.


Christopher Archibald, Mississippi State University



Providing real-time feedback to novice programmers is critical to their ability to learn to program. Higher enrollment in introductory computer science courses reduces the time available for individual student-instructor interaction, which in turn reduces both the time available for instructor feedback and the amount of feedback given. Building on our work involving manual classification and analysis of student source code comments, in this full paper we explore how machine learning techniques can be leveraged to provide real-time automated feedback to students with regard to their computational thinking processes. This paper discusses the initial classification of student source code comments using supervised machine learning methods. In this phase of classification, we focus on whether a comment is sufficient or insufficient. The classification process is broken down into three steps: text processing, data exploration, and comment classification using a Multinomial Naive Bayes classifier and a Random Forest classifier. We detail the text processing requirements, including how to prepare the raw student data using natural language processing techniques such as stop word filtration, tokenization, and lemmatization. We also show how the data preparation process can affect the final classification outcome. Using Multinomial Naive Bayes, we achieved a precision of 82%. Using a Random Forest classifier with lemmatization, we achieved a precision of 90%. We conclude with a description of how the current classification results can be used to provide real-time feedback to students while they are learning to program. Toward our ultimate goal of providing comprehensive real-time feedback to students, we describe future research plans, which include using unsupervised machine learning techniques to move beyond basic binary classification.
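The abstract names the preprocessing steps (tokenization, stop word filtration, lemmatization) and a Multinomial Naive Bayes classifier evaluated by precision, but gives no implementation details. The following is a minimal, dependency-free Python sketch under stated assumptions: the stop-word list, the lemma lookup table, the toy training comments, and the names `preprocess`, `MultinomialNaiveBayes`, and `precision` are hypothetical illustrations, not the authors' data or code. The lookup table is only a stand-in for a real lemmatizer (for example NLTK's `WordNetLemmatizer`), and the Random Forest variant is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "the", "this", "that", "to", "of", "and", "is", "it", "in", "for"}

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"loops": "loop", "values": "value", "variables": "variable",
          "counts": "count", "returns": "return"}


def preprocess(comment):
    """Tokenize a comment, drop stop words, and map tokens to a base form."""
    tokens = re.findall(r"[a-z]+", comment.lower())  # lowercase word tokenizer
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d)
            # Smoothed denominator: class token total + alpha * |vocabulary|.
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((counts[t] + self.alpha) / denom)
                               for t in self.vocab}
        return self

    def predict(self, doc):
        # Sum log-likelihoods of known tokens; unseen tokens are ignored.
        scores = {c: self.log_prior[c]
                  + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
                  for c in self.classes}
        return max(scores, key=scores.get)


def precision(y_true, y_pred, positive):
    """Precision = TP / (TP + FP) for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0


# Toy, made-up training comments (not the study's data).
train_comments = [
    "Loop over the list and count each even value",
    "Return the count of even values in the list",
    "stuff",
    "fix later",
]
train_labels = ["sufficient", "sufficient", "insufficient", "insufficient"]
model = MultinomialNaiveBayes().fit([preprocess(c) for c in train_comments], train_labels)

test_comments = ["count the even values", "stuff to fix"]
predictions = [model.predict(preprocess(c)) for c in test_comments]
print(predictions)  # ['sufficient', 'insufficient']
print(precision(["sufficient", "insufficient"], predictions, "sufficient"))  # 1.0
```

Precision here counts only comments predicted as the positive label, which matches the metric the abstract reports. In practice, scikit-learn's `CountVectorizer`, `MultinomialNB`, `RandomForestClassifier`, and `precision_score` would replace the hand-written pieces; the sketch avoids them only to stay self-contained.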

Beck, P., Mohammadi-Aragh, M. J., & Archibald, C. (2019, June). An Initial Exploration of Machine Learning Techniques to Classify Source Code Comments in Real-time. Paper presented at 2019 ASEE Annual Conference & Exposition, Tampa, Florida. 10.18260/1-2--32065

ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2019 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015