An Initial Exploration of Machine Learning Techniques to Classify Source Code Comments in Real-time

Phyllis Beck; Mahnas Jean Mohammadi-Aragh; Christopher Archibald


Conference: 2019 ASEE Annual Conference & Exposition
Location: Tampa, Florida
Publication Date: June 15, 2019
Start Date: June 15, 2019
End Date: June 19, 2019
Conference Session: Technical Session 3: The Best of Computers in Education
Tagged Division: Computers in Education
Page Count: 10
DOI: 10.18260/1-2--32065
Permanent URL: https://peer.asee.org/32065
Download Count: 621

Paper Authors


Phyllis Beck, Mississippi State University


Mahnas Jean Mohammadi-Aragh, Mississippi State University, orcid.org/0000-0002-3094-3734


Dr. Jean Mohammadi-Aragh is an assistant professor in the Department of Electrical and Computer Engineering at Mississippi State University. Dr. Mohammadi-Aragh investigates the use of digital systems to measure and support engineering education, specifically through learning analytics and the pedagogical uses of digital systems. She also investigates fundamental questions critical to improving undergraduate engineering degree pathways. She earned her Ph.D. in Engineering Education from Virginia Tech. In 2013, Dr. Mohammadi-Aragh was honored as a promising new engineering education researcher when she was selected as an ASEE Educational Research and Methods Division Apprentice Faculty.



Christopher Archibald, Mississippi State University


Abstract

Providing real-time feedback to novice programmers is critical to their ability to learn to program. Higher enrollment in introductory computer science courses reduces the time available for individual student-instructor interaction, and less interaction time means less instructor feedback. Building on our work involving manual classification and analysis of student source code comments, in this full paper we explore how machine learning techniques can be leveraged to provide real-time automated feedback to students with regard to their computational thinking processes. This paper discusses the initial classification of student source code comments using supervised machine learning methods. In this phase of classification, we focus on whether a comment is sufficient or insufficient. The classification process is broken down into three steps: text processing, data exploration, and comment classification using a Multinomial Naive Bayes classifier and a Random Forest classifier. We detail the text processing requirements, including how to prepare the raw student data using natural language processing techniques such as stop word filtration, tokenization, and lemmatization. We also show how the data preparation process can affect the final classification outcome. Using Multinomial Naive Bayes, we achieved a precision of 82%. Using a Random Forest classifier with lemmatization, we achieved a precision of 90%. We conclude with a description of how the current classification results can be used to provide real-time feedback to students while they are learning to program. Toward our ultimate goal of providing comprehensive real-time feedback to students, we describe future research plans, which include using unsupervised machine learning techniques to move beyond basic binary classification.
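The abstract combines text preprocessing (tokenization, stop word filtration) with supervised Multinomial Naive Bayes classification of comments as sufficient or insufficient. The following is a minimal plain-Python sketch of that general technique, not the authors' implementation or data. The stop-word list, the toy comments, the string labels, and the helper names `preprocess`, `MultinomialNB`, and `precision` are all hypothetical. Lemmatization is omitted; in practice it would typically come from a library such as NLTK's `WordNetLemmatizer`.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration; a real
# pipeline would use a fuller list (for example, NLTK's English stop words).
STOP_WORDS = {"the", "a", "an", "this", "to", "of", "and", "is", "it", "in"}


def preprocess(comment: str) -> list[str]:
    """Lowercase a comment, tokenize it into alphabetic words, and drop stop words."""
    # Keeping only runs of letters also discards comment markers such as '#' or '//'.
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class MultinomialNB:
    """Bag-of-words Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # Prior: fraction of training comments carrying this label.
            self.log_prior[c] = math.log(len(class_docs) / len(labels))
            counts = Counter(w for d in class_docs for w in d)
            denom = sum(counts.values()) + len(self.vocab)
            # Smoothed per-class word probabilities, stored as logs to avoid underflow.
            self.log_likelihood[c] = {
                w: math.log((counts[w] + 1) / denom) for w in self.vocab
            }
        return self

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            # Words never seen in training are skipped.
            scores[c] = self.log_prior[c] + sum(
                self.log_likelihood[c][w] for w in doc if w in self.vocab
            )
        return max(scores, key=scores.get)


def precision(y_true, y_pred, positive):
    """Precision for one label: true positives divided by all predictions of that label."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted:
        return 0.0
    return sum(t == positive for t in predicted) / len(predicted)


# Hypothetical toy training comments; these are NOT data from the paper.
TRAIN = [
    ("# loop over each student score and add it to the running total", "sufficient"),
    ("// compute the average score by dividing the total by the count", "sufficient"),
    ("# check whether the input value is negative before taking the square root", "sufficient"),
    ("# loop", "insufficient"),
    ("// stuff", "insufficient"),
    ("# fix later", "insufficient"),
]

model = MultinomialNB().fit(
    [preprocess(text) for text, _ in TRAIN], [label for _, label in TRAIN]
)
print(model.predict(preprocess("# add each score to the total")))  # prints "sufficient"
print(model.predict(preprocess("# loop")))  # prints "insufficient"
```

The paper also reports results with a Random Forest classifier. A comparable variant would usually vectorize comments into count features (for example with scikit-learn's `CountVectorizer`) and fit `RandomForestClassifier`; that variant is not shown here. The `precision` helper mirrors the metric the abstract reports, computed for whichever label is treated as the positive class.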

Citation

Beck, P., & Mohammadi-Aragh, M. J., & Archibald, C. (2019, June), An Initial Exploration of Machine Learning Techniques to Classify Source Code Comments in Real-time. Paper presented at 2019 ASEE Annual Conference & Exposition, Tampa, Florida. 10.18260/1-2--32065