June 20, 2010
June 23, 2010
Educational Research and Methods
15.298.1 - 15.298.18
Comparison of Four Methodologies for Modeling Student Retention in Engineering
Several methodologies based on statistical methods or machine learning theory have been applied in previous studies to model student retention. However, most prior studies relied on a single modeling method of the authors' choice, and direct comparisons of competing methods on an identical set of student retention data have rarely been reported.
The purpose of this paper is to present a direct comparison of prominent methods for modeling student retention using the same data. Four modeling methodologies (neural networks, logistic regression, discriminant analysis, and structural equation modeling) are included in this study. These competing methods were applied to five retention models built from different collections of cognitive and non-cognitive factors, ranging from 9 to 71 variables. The retention data were collected from more than 1,500 first-year engineering students at a large Midwestern university. The eleven cognitive attributes include high school GPAs, standardized test scores, and the grades and number of semesters in high school math, science, and English courses. The non-cognitive variables were collected through the Student Attitudinal Success Instrument (SASI), which covers nine constructs: Leadership, Deep Learning, Surface Learning, Teamwork, Academic Self-efficacy, Motivation, Metacognition, Expectancy-value, and Major Decision.
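Because all four methods are applied to the same students, their predictions can be scored with one shared routine. This abstract does not list the performance indices the authors used, so the sketch below assumes three common ones (overall accuracy, sensitivity for students who were retained, and specificity for students who left), computed at a hypothetical 0.5 probability cutoff. The names `performance_indices` and `PerformanceIndices`, the cutoff, and the six-student example are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PerformanceIndices:
    accuracy: float     # share of all students classified correctly
    sensitivity: float  # share of retained students predicted as retained
    specificity: float  # share of students who left predicted as leaving


def performance_indices(
    retained: Sequence[int],         # observed outcome: 1 = retained, 0 = left
    prob_retained: Sequence[float],  # one method's predicted retention probability
    cutoff: float = 0.5,             # hypothetical classification cutoff
) -> PerformanceIndices:
    """Score one method's predictions against the observed retention outcomes."""
    if len(retained) != len(prob_retained):
        raise ValueError("outcomes and predictions must have the same length")
    tp = tn = fp = fn = 0
    for y, p in zip(retained, prob_retained):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        elif predicted == 1:
            fp += 1  # predicted retained, but the student left
        else:
            fn += 1  # predicted leaving, but the student was retained
    n = tp + tn + fp + fn
    return PerformanceIndices(
        accuracy=(tp + tn) / n if n else 0.0,
        sensitivity=tp / (tp + fn) if tp + fn else 0.0,
        specificity=tn / (tn + fp) if tn + fp else 0.0,
    )


if __name__ == "__main__":
    # Hypothetical outcomes for six students and two hypothetical methods' outputs.
    outcomes = [1, 1, 0, 1, 0, 0]
    outputs = {
        "method A": [0.9, 0.7, 0.4, 0.6, 0.2, 0.55],
        "method B": [0.6, 0.4, 0.5, 0.8, 0.3, 0.1],
    }
    for name, probs in outputs.items():
        print(name, performance_indices(outcomes, probs))
```

Reporting sensitivity and specificity alongside accuracy is useful when one outcome class dominates, because a model that predicts the majority class for every student can still reach a high accuracy.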
This study yields four main findings. First, among the five retention models, the two hybrid models combining cognitive and non-cognitive factors always outperform models that contain only cognitive or only non-cognitive factors. Second, when applied properly, adding non-cognitive items can significantly improve the prediction performance of a cognitive-only model. Third, neural networks outperform the other three methodologies on the performance indices, followed by logistic regression; however, logistic regression may appeal to some researchers because it is easier to implement and requires less computational power. Finally, the authors found that the commonly used threshold (0.05) for entering variables in the stepwise selection process of logistic regression may not yield the model with the best prediction performance. The authors strongly suggest that researchers explore thresholds beyond this typical value to find the best-performing collection of variables.
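The fourth finding concerns the entry threshold used in stepwise selection. The sketch below shows where that threshold acts, assuming a simplified pure-Python forward selection (entry only, no removal step) in which each candidate is scored by a likelihood-ratio test against the current logistic regression model: a variable enters only while its p-value is below `alpha_enter`, so loosening `alpha_enter` beyond 0.05 can admit additional variables. The function names (`fit_loglik`, `forward_select`, `make_hypothetical_data`) and the synthetic data (four standardized predictors with made-up effect sizes) are hypothetical and do not reproduce the study's cognitive or SASI variables.

```python
import math
import random
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting for the small Newton system a x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_loglik(X: Sequence[Sequence[float]], y: Sequence[int], cols: Sequence[int]) -> float:
    """Fit an intercept plus columns `cols` by Newton-Raphson; return the maximized log-likelihood."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(50):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X^T W X
        for r, yi in zip(rows, y):
            p = _sigmoid(sum(b * v for b, v in zip(beta, r)))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * r[i] * r[j]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-9:
            break
    ll = 0.0
    for r, yi in zip(rows, y):
        z = sum(b * v for b, v in zip(beta, r))
        # Bernoulli log-likelihood y*z - log(1 + e^z), written stably for large |z|.
        ll += yi * z - (z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z)))
    return ll


def chi2_1df_pvalue(stat: float) -> float:
    # Upper-tail p-value of a chi-square(1) statistic: P(Z^2 > s) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def forward_select(X, y, candidates: Sequence[int], alpha_enter: float) -> List[int]:
    """Greedy forward selection: add the candidate with the smallest likelihood-ratio
    p-value while that p-value is below alpha_enter (no removal step)."""
    selected: List[int] = []
    remaining = list(candidates)
    current_ll = fit_loglik(X, y, selected)
    while remaining:
        scored = []
        for j in remaining:
            ll = fit_loglik(X, y, selected + [j])
            scored.append((chi2_1df_pvalue(2.0 * (ll - current_ll)), j, ll))
        p_best, j_best, ll_best = min(scored)
        if p_best >= alpha_enter:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        current_ll = ll_best
    return selected


def make_hypothetical_data(n: int = 300, seed: int = 7):
    # Synthetic students: column 0 strong, 1 moderate, 2 weak, 3 pure noise.
    rng = random.Random(seed)
    true_coefs = [1.5, 0.5, 0.15, 0.0]
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_coefs]
        z = 0.3 + sum(c * v for c, v in zip(true_coefs, x))
        X.append(x)
        y.append(1 if rng.random() < _sigmoid(z) else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_hypothetical_data()
    for alpha in (0.05, 0.10, 0.20):
        print(alpha, forward_select(X, y, range(4), alpha))
```

Because the greedy path is identical until the stricter threshold stops it, the variables chosen at 0.05 are always a prefix of those chosen at a looser threshold. Whether the extra variables actually improve prediction has to be judged on a performance index computed from the fitted models, not on the p-values themselves.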
Exceptional high school graduates with excellent grade point averages and standardized test scores enter engineering programs across the country. However, as reported in various studies, the number of students switching out of engineering majors continues to be a major issue [1, 2]. In a study of over 300 universities, Astin found that only 47% of first-year engineering students eventually completed an engineering degree [3].
Imbrie, P., Lin, J. J., & Reid, K. (2010, June). Comparison of Four Methodologies for Modeling Student Retention in Engineering. Paper presented at 2010 Annual Conference & Exposition, Louisville, Kentucky. https://peer.asee.org/16677
ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2010 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015