Kalamazoo, Michigan
March 22-23, 2024
DOI: 10.18260/1-2--45589
https://peer.asee.org/45589
Joseph C. Sheils is an undergraduate researcher at Marshall University. With a background in statistics, he has conducted research on machine learning, probability theory, and natural language processing.
Dr. Dave Dampier is Dean of the College of Engineering and Computer Sciences and Professor in the Department of Computer Sciences and Electrical Engineering at Marshall University, where he serves as the university lead for engineering and computer sciences. He is also Director of the Institute for Cyber Security.
Dr. Malik is an Associate Professor in the Department of Computer Sciences and Electrical Engineering at Marshall University, WV, USA.
Student evaluations, whether collected within a higher education institution or externally on RateMyProfessors.com, leverage the individual experiences of students to provide comprehensive assessments of their teachers and schools. RateMyProfessors.com, the world’s largest crowd-sourced web service for student evaluations, has accumulated a vast historical repository of student perceptions over time.
To foster a more cohesive educational environment among students, university administrators, educators, and policymakers, topic models offer an efficient way to study large-scale collections of student evaluations. This research evaluates the efficacy of multiple topic modeling techniques on academic feedback, identifying an appropriate method for the higher education community to use in uncovering themes and patterns in student perceptions of their educators and schools.
Though quantitative Likert-scale student evaluations can be easily compared and analyzed, the unstructured nature of textual comments poses challenges for analysis. To enable efficient, large-scale analysis of the textual data in student evaluations, past research efforts have successfully applied topic modeling, a natural language processing (NLP) technique. Topic models automatically discover the main topics present in a collection of student evaluations, making it easy to compare student comments across the discovered topics. The most widely used topic modeling technique is Latent Dirichlet Allocation (LDA); however, it often performs poorly on short texts, produces overlapping topics, and requires extensive text preprocessing to yield interpretable topics. Since LDA was introduced in 2003, various advanced topic modeling methods leveraging neural techniques such as transformers and word embeddings have been developed, reducing the need for extensive text preprocessing while improving performance on short texts such as those found in student evaluations.
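To make the LDA baseline concrete, the sketch below fits a small LDA model with scikit-learn on a handful of invented evaluation-style comments; the comments, number of topics, and preprocessing choices are illustrative assumptions, not the corpus or configuration used in this study.

```python
# A minimal LDA sketch with scikit-learn (not the study's code): the example
# comments, number of topics, and preprocessing are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

comments = [
    "Great lecturer, explains concepts clearly and grades fairly.",
    "Too much homework and the exams are nothing like the lectures.",
    "Office hours were helpful; the professor really cares about students.",
    "Boring class, harsh grader, would not recommend.",
]

# LDA generally needs preprocessing (here, just stop-word removal) to yield
# interpretable topics, one of the drawbacks noted above.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(comments)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Show each topic as its five highest-weight words.
terms = vectorizer.get_feature_names_out()
for idx, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"Topic {idx}: {', '.join(top)}")
```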
To pinpoint a topic modeling technique suited to academic feedback on educators and higher education institutions, we conduct a comparative study of the performance of four topic modeling techniques: (1) Latent Dirichlet Allocation (LDA), (2) Nonnegative Matrix Factorization (NMF), (3) BERTopic, and (4) Top2Vec. LDA and NMF are traditional techniques that extract topics statistically from the structure of documents, while BERTopic and Top2Vec extract topics through word embeddings. The four techniques were chosen to represent both conventional approaches to topic modeling and recently developed, more complex ones. Comments from student evaluations of schools and educators, collected from RateMyProfessors.com, serve as the textual basis on which model performance is assessed.
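As a sketch of the embedding-based side of the comparison, the snippet below runs BERTopic end to end with its defaults; these settings are assumptions rather than the study's actual configuration, and the public 20 Newsgroups corpus stands in for the RateMyProfessors.com comments, which are not bundled here.

```python
# A hedged BERTopic sketch: the defaults (transformer embeddings, UMAP, HDBSCAN)
# are assumed, and 20 Newsgroups substitutes for the RateMyProfessors corpus.
from sklearn.datasets import fetch_20newsgroups
from bertopic import BERTopic

docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))["data"]

# BERTopic works on raw documents, so the heavy preprocessing LDA needs
# can largely be skipped.
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info().head())  # topic sizes and representative words
print(topic_model.get_topic(0))             # (word, c-TF-IDF weight) pairs for topic 0
```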
Though the chosen techniques span a wide range of implementations, the topics produced by each are held to the same evaluation standard. The metrics used to evaluate the performance of the chosen topic modeling techniques are topic coherence, topic diversity, and human interpretation of the topics. Topic coherence is a measure of topic quality described by Hoyle et al. (2021) as “an intangible sense, available to human readers, that a set of terms, when viewed together, enable human recognition of an identifiable category.” Alongside coherence, we evaluate topic diversity, which measures how different the discovered topics are from one another. While these automated metrics are important proxies for model performance, we also investigate the human interpretability of topics and provide visualizations of model results.
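For concreteness, the sketch below computes both automated metrics under common definitions, which are assumptions rather than necessarily the study's exact formulations: coherence via gensim's CoherenceModel (the c_v variant) and topic diversity as the fraction of unique words among all topics' top words, following Dieng et al. (2020). The topics and reference texts are toy placeholders.

```python
# A worked sketch of both automated metrics under assumed, commonly used
# definitions: coherence via gensim's CoherenceModel (c_v) and diversity as
# the fraction of unique words among all topics' top words (Dieng et al., 2020).
from gensim.corpora import Dictionary
from gensim.models.coherencemodel import CoherenceModel

# Top words per topic, as produced by any of the four models.
topics = [
    ["lecture", "clear", "helpful", "explains", "engaging"],
    ["exam", "homework", "grading", "hard", "unfair"],
]
# Tokenized reference corpus over which co-occurrence statistics are computed.
texts = [
    ["clear", "lecture", "helpful", "professor"],
    ["hard", "exam", "unfair", "grading", "homework"],
    ["engaging", "explains", "lecture", "clear"],
]

dictionary = Dictionary(texts)
coherence = CoherenceModel(
    topics=topics, texts=texts, dictionary=dictionary, coherence="c_v"
).get_coherence()

# Diversity: unique top words / total top words (1.0 means no overlap at all).
all_words = [word for topic in topics for word in topic]
diversity = len(set(all_words)) / len(all_words)

print(f"coherence (c_v): {coherence:.3f}, diversity: {diversity:.2f}")
```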
Sheils, J. C., Dampier, D. A., & Malik, H. (2024, March). A Comparative Study of Topic Models for Student Evaluations. Paper presented at the 2024 ASEE North Central Section Conference, Kalamazoo, Michigan. https://doi.org/10.18260/1-2--45589
© 2024 American Society for Engineering Education.